Distributed Systems: Time, Clocks, and the Ordering of Events

Date: 2019-11-09


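Programs often need to know the order in which events occurred. In a monolithic application this is fairly easy, and the simplest approach is to use timestamps. In a distributed system, however, ordering events is hard. In the paper Time, Clocks, and the Ordering of Events in a Distributed System, Leslie Lamport discusses time, clocks, and event ordering in distributed systems.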

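【1】Problems with physical clocks in distributed systems

A logical clock is defined in contrast to a physical clock. Logical clocks were proposed because physical clocks run into a series of problems in a distributed system. Multiple processes on one machine can all read timestamps from the same physical clock. Whether or not that clock is accurate, timestamps taken from a single clock always give the relative order of the events. In a distributed system there is no single physical clock to read from: each process reads the clock of its own machine, and the clocks of different machines are very hard to synchronize exactly. Even with NTP, the precision is limited. Physical clocks therefore cannot be used to decide the order of events across machines.

Physical clocks are not useless in a distributed system. At the very least they can, to some extent, determine the order of events on a single machine, and it is still worthwhile to keep the clocks of different machines synchronized to within some precision. They are simply not used as the method for deciding event order.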
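To make the problem concrete, here is a minimal sketch with invented numbers. The `Machine` class and the clock offsets are hypothetical, chosen only to show how skewed clocks can contradict the causal order of a send and its receive.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    """A host whose wall clock is offset from true time (hypothetical model)."""
    name: str
    offset_ms: int  # how far this clock runs ahead (+) or behind (-)

    def timestamp(self, true_time_ms: int) -> int:
        # A machine can only read its own, skewed clock.
        return true_time_ms + self.offset_ms


sender = Machine("m1", offset_ms=40)     # clock runs 40 ms fast
receiver = Machine("m2", offset_ms=-30)  # clock runs 30 ms slow

# The message is sent at true time 1000 ms and received 20 ms later.
send_ts = sender.timestamp(1000)
recv_ts = receiver.timestamp(1020)

# Causally the send precedes the receive, yet the local timestamps disagree.
print(send_ts, recv_ts)   # 1040 990
print(send_ts < recv_ts)  # False
```

A total skew of 70 ms is larger than the 20 ms message delay, so comparing the two wall-clock timestamps puts the receive before the send.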

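【2】Partial Ordering

There are two kinds of event ordering: partial and total. A partial order defines a before/after relationship for only some pairs of events in the system, namely the pairs that are causally related. In the paper Time, Clocks, and the Ordering of Events in a Distributed System, the partial order is derived from the "happened before" relation (written "->"). Its definition is: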

Definition. The relation "->" on the set of events of a system is the smallest relation satisfying the following three conditions:

(1) If a and b are events in the same process, and a comes before b, then a->b.

(2) If a is the sending of a message by one process and b is the receipt of the same message by another process, then a->b.

(3) If a->b and b->c then a->c.

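In a distributed system, we only care which of two events came first when the events are related, that is, when one may have caused the other. For concurrent events, we do not care which one happened first. The partial order exists to define the order of causally related events, which is exactly "happened before". Concurrent events have no causal relationship, so no order can be determined between them, and that is why "happened before" is only a partial order.

“If two entities do not exchange any messages, then they probably do not need to share a common clock; events occurring on those entities are termed as concurrent events.”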
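The three conditions of the definition can be sketched directly in code. This is only an illustration: the function names `happened_before` and `concurrent`, and the event labels, are made up for the example and do not come from the paper.

```python
from itertools import product


def happened_before(processes, messages):
    """Return the set of pairs (a, b) such that a -> b.

    processes: dict mapping a process name to its events in local order.
    messages:  list of (send_event, receive_event) pairs.
    """
    relation = set()
    # Condition (1): order within a single process.
    for events in processes.values():
        for i, a in enumerate(events):
            for b in events[i + 1:]:
                relation.add((a, b))
    # Condition (2): a send precedes the matching receive.
    relation.update(messages)
    # Condition (3): transitive closure, repeated until nothing new appears.
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(relation), repeat=2):
            if b == c and (a, d) not in relation:
                relation.add((a, d))
                changed = True
    return relation


def concurrent(a, b, relation):
    """Two distinct events are concurrent if neither happened before the other."""
    return a != b and (a, b) not in relation and (b, a) not in relation


# Hypothetical run: P has events p1, p2; Q has q1, q2, q3; p1 sends to q2.
processes = {"P": ["p1", "p2"], "Q": ["q1", "q2", "q3"]}
messages = [("p1", "q2")]
hb = happened_before(processes, messages)

print(("p1", "q3") in hb)          # True, via p1 -> q2 -> q3
print(concurrent("p2", "q3", hb))  # True, no chain links them
print(concurrent("p1", "q2", hb))  # False, they are causally related
```

The brute-force closure is quadratic in the size of the relation per pass, which is fine for a handful of events but not meant for real traces.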

【3】Logical clocks

The paper contains this sentence: We begin with an abstract point of view in which a clock is just a way of assigning a number to an event, where the number is thought of as the time at which the event occurred. In other words, time can be abstracted: a time value is treated as a sequence number for the order in which events occur. The values could be <20190515,20190516,20190517>, or just as well <1,2,3>. This leads to the concept of a logical clock, defined as follows:
we define a clock Ci for each process Pi to be a function which assigns a number Ci(a) to any event a in that process.

Clock Condition. For any events a,b: if a->b then C(a) < C(b).
C1. If a and b are events in process Pi, and a comes before b, then Ci(a) < Ci(b).
C2. If a is the sending of a message by process Pi and b is the receipt of that message by process Pj, then Ci(a) < Cj(b).

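Concretely, the conditions above can be met with the following implementation rules:

Every event is assigned a Lamport timestamp; each process's counter starts at 0
If an event occurs inside a node, the local process increments its timestamp by 1
If the event is a send, the local process increments its timestamp by 1 and attaches that timestamp to the message
If the event is a receive, the local timestamp = Max(local timestamp, timestamp in the message) + 1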
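The rules above can be sketched as a small class. The class and method names (`LamportClock`, `local_event`, `send`, `receive`) are illustrative choices, not taken from the paper.

```python
class LamportClock:
    """A per-process logical clock following the rules above (sketch)."""

    def __init__(self):
        self.time = 0  # initial value is 0

    def local_event(self):
        # An event inside the node: increment by 1.
        self.time += 1
        return self.time

    def send(self):
        # A send event: increment by 1 and return the timestamp
        # that should be attached to the outgoing message.
        self.time += 1
        return self.time

    def receive(self, message_time):
        # A receive event: max(local, message) + 1.
        self.time = max(self.time, message_time) + 1
        return self.time


p = LamportClock()
q = LamportClock()

p.local_event()           # p's clock becomes 1
sent = p.send()           # p's clock becomes 2, message carries 2
q.local_event()           # q's clock becomes 1
received = q.receive(sent)

print(sent, received)     # 2 3
```

Because the receiver jumps past the timestamp carried by the message, the send always gets a smaller value than the receive, which is what condition C2 requires.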

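From the definition above, a->b implies C(a)<C(b). But what if C(a)=C(b)? In that case a and b cannot be causally related, so their relative order does not affect the result. All we need is a deterministic rule for ordering them, and that rule gives us a total order.

One workable approach is to number the processes and break ties by process number. Suppose a and b occur on nodes P and Q, and Pi and Qj are the numbers assigned to P and Q. If C(a)<C(b), or if C(a)=C(b) and Pi<Qj, we define a as occurring before b, written a⇒b (the total order). For example, number processes A, B, and C in the example figure as Ai=1, Bj=2, Ck=3. Since C(B4)=C(C3) and Bj<Ck, we get B4⇒C3.

With these definitions we can order all events and obtain a total order. For the example in the figure, the order is: C1⇒B1⇒B2⇒A1⇒B3⇒A2⇒C2⇒B4⇒C3⇒A3⇒B5⇒C4⇒C5⇒A4. Looking at this total order, you may notice that on the time axis B5 occurs earlier than A3, yet the total order defined above places A3 before B5. This is because Lamport logical clocks only guarantee that causal (partial) order is correct; they do not guarantee that absolute time order is correct.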
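In code, the total order ⇒ amounts to sorting by the pair (timestamp, process number). The sketch below uses invented events rather than the ones from the figure, and the `Event` type is a hypothetical helper.

```python
from typing import NamedTuple


class Event(NamedTuple):
    timestamp: int   # Lamport timestamp C(e)
    process_id: int  # fixed number assigned to the process
    label: str


# Invented events; two pairs share a timestamp and need the tie-break.
events = [
    Event(2, 3, "c"),
    Event(1, 2, "b"),
    Event(2, 1, "a"),
    Event(1, 1, "d"),
]

# Compare timestamps first, then process numbers.
ordered = sorted(events, key=lambda e: (e.timestamp, e.process_id))

print([e.label for e in ordered])  # ['d', 'b', 'a', 'c']
```

Any fixed numbering of the processes works, as long as every process uses the same one. A different numbering yields a different, equally valid total order.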

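【4】Using logical clocks to tackle the distributed lock problem

Multiple processes on a single machine can be synchronized with a lock because they all run on one operating system, which acts as a center that orders their requests and knows everything about the processes that need to be synchronized. In a distributed system each process runs on its own host, and there is no such center. How, then, should processes in a distributed system synchronize? In other words, how can a distributed lock be implemented? The paper states that an algorithm solving this problem must satisfy the following three conditions: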

(I) A process which has been granted the resource must release it before it can be granted to another process.

(II) Different requests for the resource must be granted in the order in which they are made.

(III) If every process which is granted the resource eventually releases it, then every request is eventually granted.

To simplify the problem, we make the following assumptions:

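For any two processes Pi and Pj, messages sent from Pi to Pj are received in the same order in which they were sent, and every message is eventually received.
Each process maintains its own request queue, which no other process can see. Every request queue initially contains the single request T0:P0, where P0 is the process that initially holds the shared resource and T0 is less than the initial value of any clock.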

The algorithm is as follows:

To request the resource, process Pi sends the message Tm:Pi requests resource to every other process, and puts that message on its request queue, where Tm is the timestamp of the message. (To request the resource, send the request to all other processes and add it to your own request queue.)
When process Pj receives the message Tm:Pi requests resource, it places it on its request queue and sends a (timestamped) acknowledgment message to Pi. (On receiving another process's request, put it on the request queue and reply to the requesting process.)
To release the resource, process Pi removes any Tm:Pi requests resource message from its request queue and sends a (timestamped) Pi releases resource message to every other process. (To release the resource, remove the request from your own request queue and tell all other processes that you have released it.)
When process Pj receives a Pi releases resource message, it removes any Tm:Pi requests resource message from its request queue. (On receiving a release message from another process, remove that process's request from the request queue.)
Process Pi is granted the resource when the following two conditions are satisfied: (i) There is a Tm:Pi requests resource message in its request queue which is ordered before any other request in its queue by the relation ⇒ . (ii) Pi has received a message from every other process timestamped later than Tm.
(A process decides for itself whether it may take the resource. There are two conditions: first, after sorting by the total order, the request Tm:Pi is at the head of its request queue; second, Pi has received a message with a timestamp greater than Tm from every other process.)
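The five rules can be simulated in a single program. This is a sketch under strong assumptions: direct method calls stand in for a reliable, in-order network, T0 is taken to be 0, and all names (`Process`, `can_enter`, and so on) are invented for the illustration.

```python
class Process:
    """One participant in the mutual exclusion sketch."""

    def __init__(self, pid, network):
        self.pid = pid
        self.network = network  # shared dict: pid -> Process
        self.clock = 0          # Lamport clock
        self.queue = {(0, 0)}   # initial request T0:P0, assuming T0 = 0
        self.last_seen = {}     # peer pid -> latest timestamp received
        self.my_request = None  # pending request as (timestamp, pid)

    def _peers(self):
        return [p for p in self.network.values() if p is not self]

    def _deliver(self, kind, timestamp, sender):
        # Receive rule: max(local, message) + 1.
        self.clock = max(self.clock, timestamp) + 1
        self.last_seen[sender] = timestamp
        if kind == "request":   # rule 2: enqueue and acknowledge
            self.queue.add((timestamp, sender))
            self.clock += 1
            self.network[sender]._deliver("ack", self.clock, self.pid)
        elif kind == "release":  # rule 4: drop the sender's requests
            self.queue = {r for r in self.queue if r[1] != sender}

    def request(self):           # rule 1
        self.clock += 1
        self.my_request = (self.clock, self.pid)
        self.queue.add(self.my_request)
        for peer in self._peers():
            peer._deliver("request", self.my_request[0], self.pid)

    def release(self):           # rule 3
        self.queue = {r for r in self.queue if r[1] != self.pid}
        self.my_request = None
        self.clock += 1
        for peer in self._peers():
            peer._deliver("release", self.clock, self.pid)

    def can_enter(self):         # rule 5, for a pending request only
        if self.my_request is None or min(self.queue) != self.my_request:
            return False
        tm = self.my_request[0]
        return all(self.last_seen.get(p.pid, -1) > tm for p in self._peers())


network = {}
for pid in range(3):
    network[pid] = Process(pid, network)
p0, p1, p2 = network[0], network[1], network[2]

# P0 holds the resource initially by assumption (the 0:0 entry).
p1.request()
print(p1.my_request)   # (1, 1)
print(p1.can_enter())  # False: 0:0 is still ahead in the queue
p0.release()
print(p1.can_enter())  # True
p2.request()
print(p2.can_enter())  # False: P1's request is ordered first
p1.release()
print(p2.can_enter())  # True
```

Tuples compare by timestamp first and process number second, so `min(self.queue)` is the head of the queue under ⇒. The sketch does not model message delay, loss, or process failure.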

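The following example walks through the algorithm:
Initially P0 holds the resource, and the request queue contains 0:0 (short for T0:P0). P1 then requests the resource and adds 1:1 to the request queue. At this point P0 still holds the resource, so P1 cannot acquire it yet. Once P0 releases the resource, 0:0 is removed from the request queue (this step is not drawn in the figure). The request 1:1 is now at the head of the request queue, and P1 has received replies with timestamps greater than 1 from the other two processes, so the conditions for taking the resource are met and P1 acquires it.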

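The key idea is simple. Since a distributed system has no "center", a process that wants the shared resource tells every other process that it is requesting it. A process that holds the resource also tells every process when it releases it, so that those waiting can take it in order. The order comes from the logical clock and, to repeat the earlier point, is not guaranteed to match the order of the requests in absolute physical time. Every process then knows the state of every other process, which is equivalent to having a "center".

For the distributed lock problem, requests do not have to be ordered by absolute physical time. It is enough to have an algorithm under which all processes always arrive at the same ordering of the requests. Ordering by absolute physical time is just one possible algorithm of that kind.

Is everything solved at this point? Not really. This implementation is fragile: it requires every process to be highly reliable, and it stops working as soon as one process crashes or a network partition occurs. The network assumptions are also very strict. Every message sent must be received, which is hard to achieve in a practical system. This is therefore an algorithm for ideal conditions, not one suited to industrial use. It is still very valuable, because it offers insight into how consistency and consensus problems in distributed systems can be approached.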

References:
Lamport's paper: Time, Clocks, and the Ordering of Events in a Distributed System
Lamport timestamps
Distributed Systems: Lamport Logical Clocks
