Concurrency != Parallelism

A while ago I was sharing some features of the Go language with colleagues at work. When I got to the concept of concurrency, everyone said they were lost, and when I pulled up Rob Pike's slides, "Concurrency is not Parallelism", that only made things worse; apparently a lot of the old fundamentals had gone missing. Afterwards I worked through Pike's slides together with some good observations about concurrency and parallelism from around the web, and the result is this article.

In "Concurrency is not Parallelism" (http://talks.golang.org/2012/waza.slide), Rob Pike says that our world is parallel: networks, for example, or large numbers of independent people. But those parts need to cooperate, and that is where concurrency comes in. Many people think concurrency is cool and take it to mean parallelism; in Pike's view that is a mistake. For example, someone wrote a prime sieve program, ran it on a 4-core machine, and found it ran slowly: that programmer had wrongly assumed that the concurrency Go provides is parallel computation.

Pike's slides illustrate the idea with gophers burning books. When my colleagues and I first watched it we did not quite follow, so I did some homework to understand it.

To make Pike's "concurrency is not parallelism" point easier to grasp, we first need to pin down how concurrency and parallelism actually differ.

In Pike's view, concurrency is a program-design approach: the composition of independently executing processes.

Parallelism is the simultaneous execution of computations that may be related (related in their results, not coupled by dependencies) or entirely independent.

Pike sums it up:

Concurrency is about dealing with lots of things at once.

Parallelism is about doing lots of things at once. Not the same, but related.

Concurrency is about structure, parallelism is about execution.

Concurrency provides a way to structure a solution to solve a problem that may (but not necessarily) be parallelizable.

Rob continues:

Concurrency is a way to structure a program by breaking it into pieces that can be executed independently.

Communication is the means to coordinate the independent executions.

This is the GO model and it's based on CSP.
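
As a minimal sketch of that model (my own illustration, not from the talk): two independently executing pieces in Go whose only coordination is communication over a channel.

package main

import "fmt"

func main() {
	ch := make(chan string)

	// One independent execution: produce the work items.
	go func() {
		for i := 1; i <= 3; i++ {
			ch <- fmt.Sprintf("manual %d", i)
		}
		close(ch)
	}()

	// Another: consume them. The channel is the coordination.
	for m := range ch {
		fmt.Println("burning", m)
	}
}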

[Slide 1]

Here a single gopher pushes a cart, moving a pile of language manuals to an incinerator to burn them. If there are many manuals and the distance is long, the whole job takes a long time.

[Slide 2]

Now add a second gopher to help. But with two workers and only one cart, not much improves, so we need more carts.

[Slide 3]

With more carts, the pile of manuals or the incinerator may become the bottleneck, and the two gophers have to coordinate whenever they load books into a cart, or negotiate who gets the incinerator first. Efficiency is still not great.

[Slide 4]

So we make them truly independent, each with its own pile and its own incinerator. Now they need no coordination, and both have plenty of manuals to burn.

Although the two gophers now run independently, there is still concurrent composition between them.

Suppose only one gopher is working at any given moment: then they are not parallel, yet they are still concurrent.

The design was not made parallel on purpose, but it translates naturally into a parallel one ("The design is not automatically parallel. However, it's automatically parallelizable").

And this concurrent composition hints at other models as well.

[Slide 5]

Here three gophers are in action, with some delays likely between them. Each gopher is an independent execution, and they cooperate by communicating.

[Slide 6]

Here one more gopher is added just to return the empty carts to the book pile. Each gopher now does exactly one job. The concurrency is finer-grained than before.

If we get everything arranged correctly (unlikely as it may sound, it is not impossible), this design is four times faster than the single gopher we started with.

We added a concurrent execution to the existing design and improved its performance.

Different concurrent designs enable different ways to parallelize.

This concurrent design makes it easy to run the whole flow in parallel, for example like this, with eight gophers hard at work:

[Slide 7]

Keep in mind that even if only one gopher is active at any instant, so that nothing is parallel, this design is still a correct concurrent solution.

There is another way to structure the two gophers' concurrency: add a staging pile of manuals between them.

[Slide 8]

And we can just as easily run this version in parallel:

[Slide 9]

And here is yet another arrangement:

[Slide 10]

And then parallelize it:

[Slide 11]

Now sixteen gophers are at work!

There are many ways to break the processing down, and that decomposition is the concurrent design. Once we have the breakdown, parallelization can fall out, and correctness is easy.
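
To make the decomposition concrete, here is a sketch of the book-burning line as a Go pipeline (my own mapping, with the gopher roles as stages and the channels as carts); whether the stages actually run in parallel is then purely a scheduling matter:

package main

import "fmt"

// load produces books onto its output channel (the pile).
func load(n int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for book := 0; book < n; book++ {
			out <- book
		}
	}()
	return out
}

// cart moves books from one stage to the next.
func cart(in <-chan int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for book := range in {
			out <- book
		}
	}()
	return out
}

func main() {
	// Compose the stages; the incinerator is the final consumer.
	for book := range cart(load(5)) {
		fmt.Println("burned book", book)
	}
}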

A complex problem can be decomposed into simple, understandable pieces and those pieces composed concurrently. The result is a design that is understandable, efficient, scalable, and correct, and it can even be run in parallel.

Concurrency is powerful. It is not parallelism, but it enables parallelism, makes parallelism easy to achieve, and brings scalability and much else besides.

To map the gopher example onto a computer: the pile of books is, say, web content; the gophers are CPUs; the cart is the serializer, the renderer, or the network; and the incinerator is the final consumer, such as a browser.

In other words, the browser issues a request, the gophers set to work rendering the requested page content, and the page goes back over the network to the browser. The picture has become a concurrent design for a scalable web service.
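
In Go this shape comes almost for free: the standard net/http server already runs each request in its own goroutine. In the sketch below, the render function is a hypothetical stand-in for the gophers' work:

package main

import (
	"fmt"
	"log"
	"net/http"
)

// render is a stand-in for whatever page-building work the gophers do.
func render(path string) string {
	return fmt.Sprintf("<html><body>rendered %s</body></html>", path)
}

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// net/http invokes this handler in a fresh goroutine per request,
		// so the service is concurrent by construction.
		fmt.Fprint(w, render(r.URL.Path))
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}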

In Go, a goroutine is a function executing independently, in the same address space as other goroutines. A goroutine is not a thread; it resembles one, but it is much lighter.

Goroutines are multiplexed onto operating system threads as needed. When a goroutine blocks, the thread it runs on blocks too, but no other goroutine is blocked. (Note: this blocking refers to certain system calls. For things like network I/O or channel operations, the goroutine is parked on a wait queue and, once it can proceed, say the network operation completes or the channel send or receive is ready, it is moved back to a run queue and executed.)
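
A quick illustration of that property (my own sketch, with time.Sleep standing in for a blocking call): one goroutine blocking does not stop the rest.

package main

import (
	"fmt"
	"time"
)

func main() {
	go func() {
		time.Sleep(time.Hour) // "blocked"; only this goroutine waits
	}()

	// The rest of the program keeps making progress regardless.
	for i := 0; i < 3; i++ {
		fmt.Println("still making progress:", i)
		time.Sleep(10 * time.Millisecond)
	}
}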

Go also provides channels for synchronization and data exchange between goroutines. The select statement resembles a switch, except that each case tests which channel is ready to communicate.
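
A minimal select sketch (mine, not from the talk): wait on several channels and proceed with whichever is ready first.

package main

import (
	"fmt"
	"time"
)

func main() {
	fast := make(chan string)
	slow := make(chan string)

	go func() { time.Sleep(10 * time.Millisecond); fast <- "fast result" }()
	go func() { time.Sleep(time.Second); slow <- "slow result" }()

	// select blocks until one of its communications can proceed.
	select {
	case msg := <-fast:
		fmt.Println(msg)
	case msg := <-slow:
		fmt.Println(msg)
	case <-time.After(5 * time.Second):
		fmt.Println("timed out")
	}
}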

Go genuinely supports concurrency: a single Go program can create an enormous number of goroutines (one test program created 1.3 million), and each goroutine starts with a small stack that grows and shrinks as needed. Goroutines are not free, but they are lightweight.
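
As a rough sketch of how cheap goroutines are (the count here is illustrative, not a benchmark):

package main

import (
	"fmt"
	"sync"
)

func main() {
	const n = 100000 // far more than any sane OS thread count

	var wg sync.WaitGroup
	results := make([]int, n)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = i * i // trivial independent work
		}(i)
	}
	wg.Wait()
	fmt.Println("done:", results[n-1])
}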


Summary


The best example for telling concurrency and parallelism apart is the dining philosophers problem. In the classic version, when there are not enough forks to go around, the philosophers must coordinate so that someone can always eat. If there are enough utensils for everyone, no coordination is needed, and each philosopher can happily eat on their own.
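
Here is a compact sketch of the coordination half (my own, using the standard trick of a fixed global fork ordering to avoid deadlock):

package main

import (
	"fmt"
	"sync"
)

func main() {
	const n = 5
	forks := make([]sync.Mutex, n) // shared forks force coordination

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			first, second := i, (i+1)%n
			if first > second { // always lock the lower-numbered fork first
				first, second = second, first
			}
			forks[first].Lock()
			forks[second].Lock()
			fmt.Printf("philosopher %d eats\n", i)
			forks[second].Unlock()
			forks[first].Unlock()
		}(i)
	}
	wg.Wait()
}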

The biggest difference between concurrency and parallelism is that concurrency requires the independent executions to cooperate with one another, using tools such as locks, semaphores, or queues.

Parallelism also involves many independent executions, but they do not need to cooperate; each simply runs, computes its result, and the results are gathered at the end.

Concurrency is a way of decomposing and organizing a problem; parallelism is a way of executing. The two are different, but related.

Most of our problems can be cut into small pieces, each run as an independent execution that cooperates with the others; that is concurrency.

Such a design is not made for the sake of parallelism, yet it converts to a parallel one naturally, and it also helps guarantee correctness.

Concurrency scales well whether the platform is a single-core machine, a multi-core machine, a network, or anything else;

parallelism, by contrast, is impossible to achieve on a single-core machine, whereas on a multi-core machine a concurrent design may evolve into parallelism quite naturally.

The outcome of a concurrent system is indeterminate; to make it deterministic you need locks or other mechanisms. The outcome of a parallel computation is determinate.

Go supports concurrency and can execute goroutines in parallel (the number of P's is set with runtime.GOMAXPROCS). Built on CSP, it provides goroutines, channels, and the select statement, and it encourages cooperation through message passing rather than through tools like locks. This style of cooperation scales better.
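
A small sketch of that message-passing style (mine, echoing the Go proverb "share memory by communicating"): instead of guarding a counter with a mutex, one goroutine owns it and everyone else talks to it over a channel.

package main

import (
	"fmt"
	"sync"
)

func main() {
	inc := make(chan struct{})
	total := make(chan int)

	// The owning goroutine: the counter needs no lock because
	// nobody else ever touches it.
	go func() {
		count := 0
		for range inc {
			count++
		}
		total <- count
	}()

	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			inc <- struct{}{} // "increment" is a message, not a lock
		}()
	}
	wg.Wait()
	close(inc)
	fmt.Println("count:", <-total)
}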


Wikipedia defines concurrent computing as follows:

Concurrent computing is a form of computing in which several computations are executing during overlapping time periods – concurrently – instead of sequentially (one completing before the next starts). This is a property of a system – this may be an individual program, a computer, or a network – and there is a separate execution point or "thread of control" for each computation ("process"). A concurrent system is one where a computation can make progress without waiting for all other computations to complete – where more than one computation can make progress at "the same time".

Concurrent computing is related to but distinct from parallel computing, though these concepts are frequently confused, and both can be described as "multiple processes executing at the same time". In parallel computing, execution literally occurs at the same instant, for example on separate processors of a multi-processor machine – parallel computing is impossible on a (single-core) single processor, as only one computation can occur at any instant (during any single clock cycle).(This is discounting parallelism internal to a processor core, such as pipelining or vectorized instructions. A single-core, single-processor machine may be capable of some parallelism, such as with a coprocessor, but the processor itself is not.) By contrast, concurrent computing consists of process lifetimes overlapping, but execution need not happen at the same instant.

For example, concurrent processes can be executed on a single core by interleaving the execution steps of each process via time slices: only one process runs at a time, and if it does not complete during its time slice, it is paused, another process begins or resumes, and then later the original process is resumed. In this way multiple processes are part-way through execution at a single instant, but only one process is being executed at that instant.

Concurrent computations may be executed in parallel, for example by assigning each process to a separate processor or processor core, or distributing a computation across a network. This is known as task parallelism, and this type of parallel computing is a form of concurrent computing.

The exact timing of when tasks in a concurrent system are executed depends on the scheduling, and tasks need not always be executed concurrently. For example, given two tasks, T1 and T2:

  • T1 may be executed and finished before T2
  • T2 may be executed and finished before T1
  • T1 and T2 may be executed alternately (time-slicing)
  • T1 and T2 may be executed simultaneously at the same instant of time (parallelism)

The word "sequential" is used as an antonym for both "concurrent" and "parallel"; when these are explicitly distinguished, concurrent/sequential and parallel/serial are used as opposing pairs

Its definition of concurrency reads:

In computer science, concurrency is a property of systems in which several computations are executing simultaneously, and potentially interacting with each other. 

The computations may be executing on multiple cores in the same chip, preemptively time-shared threads on the same processor, or executed on physically separated processors. 

A number of mathematical models have been developed for general concurrent computation including Petri nets, process calculi, the Parallel Random Access Machine model, the Actor model and the Reo Coordination Language.

Because computations in a concurrent system can interact with each other while they are executing, the number of possible execution paths in the system can be extremely large, and the resulting outcome can be indeterminate. Concurrent use of shared resources can be a source of indeterminacy leading to issues such as deadlock, and starvation.

The design of concurrent systems often entails finding reliable techniques for coordinating their execution, data exchange, memory allocation, and execution scheduling to minimize response time and maximize throughput.

And parallel computing is defined as follows:

Parallel computing is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently ("in parallel").

By contrast, parallel computing by data parallelism may or may not be concurrent computing – a single process may control all computations, in which case it is not concurrent, or the computations may be spread across several processes, in which case this is concurrent. For example, SIMD (single instruction, multiple data) processing is (data) parallel but not concurrent – multiple computations are happening at the same instant (in parallel), but there is only a single process. Examples of this include vector processors and graphics processing units (GPUs). By contrast, MIMD (multiple instruction, multiple data) processing is both data parallel and task parallel, and is concurrent; this is commonly implemented as SPMD (single program, multiple data), where multiple programs execute concurrently and in parallel on different data.

Concurrency has a property that parallelism lacks: interaction. In a concurrent program, the independent executions can affect one another as they run.
