ZooKeeper Official Documentation Translation: ZooKeeper Overview 3.4.6


ZooKeeper: A Distributed Coordination Service for Distributed Applications

ZooKeeper is a distributed, open-source coordination service for distributed applications. It exposes a simple set of primitives that distributed applications can build upon to implement higher-level services for synchronization, configuration maintenance, and groups and naming. It is designed to be easy to program to, and uses a data model styled after the familiar directory tree structure of file systems. It runs in Java and has bindings for both Java and C.

Coordination services are notoriously hard to get right. They are especially prone to errors such as race conditions and deadlock. The motivation behind ZooKeeper is to relieve distributed applications of the responsibility of implementing coordination services from scratch.

Design Goals

ZooKeeper is simple. ZooKeeper allows distributed processes to coordinate with each other through a shared hierarchical namespace which is organized similarly to a standard file system. The namespace consists of data registers (called znodes, in ZooKeeper parlance), and these are similar to files and directories. Unlike a typical file system, which is designed for storage, ZooKeeper data is kept in memory, which means ZooKeeper can achieve high throughput and low latency.

The ZooKeeper implementation puts a premium on high performance, high availability, and strictly ordered access. The performance aspects of ZooKeeper mean it can be used in large, distributed systems. The reliability aspects keep it from being a single point of failure. The strict ordering means that sophisticated synchronization primitives can be implemented at the client.

ZooKeeper is replicated. Like the distributed processes it coordinates, ZooKeeper itself is intended to be replicated over a set of hosts called an ensemble.

Figure: ZooKeeper Service

The servers that make up the ZooKeeper service must all know about each other. They maintain an in-memory image of state, along with transaction logs and snapshots in a persistent store. As long as a majority of the servers are available, the ZooKeeper service will be available.

Clients connect to a single ZooKeeper server. The client maintains a TCP connection through which it sends requests, gets responses, gets watch events, and sends heartbeats. If the TCP connection to the server breaks, the client will connect to a different server.
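The majority rule above reduces to simple arithmetic. The following is a minimal sketch of that arithmetic only; the function names are hypothetical and are not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def is_available(ensemble_size: int, servers_up: int) -> bool:
    """The service is available while a majority of servers is up."""
    return servers_up >= quorum_size(ensemble_size)


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail before the majority is lost."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (3, 4, 5, 7):
        print(n, quorum_size(n), tolerated_failures(n))
```

Under this rule an ensemble of 4 tolerates no more failures than an ensemble of 3, which is one reason odd ensemble sizes are a common choice.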

ZooKeeper is ordered. ZooKeeper stamps each update with a number that reflects the order of all ZooKeeper transactions. Subsequent operations can use the order to implement higher-level abstractions, such as synchronization primitives.

ZooKeeper is fast. It is especially fast in "read-dominant" workloads. ZooKeeper applications run on thousands of machines, and ZooKeeper performs best where reads are more common than writes, at ratios of around 10:1.

Data model and the hierarchical namespace

The namespace provided by ZooKeeper is much like that of a standard file system. A name is a sequence of path elements separated by a slash (/). Every node in ZooKeeper's namespace is identified by a path.
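As an illustration of slash-separated paths, here is a small sketch that splits an absolute path into its elements and derives a node's parent. The helper names and the validation rules are assumptions made for this example; the real service applies its own, more detailed path rules.

```python
from typing import List


def split_path(path: str) -> List[str]:
    """Split an absolute path such as "/app1/p_1" into its elements."""
    if not path.startswith("/"):
        raise ValueError("path must start with '/'")
    if path == "/":
        return []  # the root has no path elements
    if path.endswith("/"):
        raise ValueError("path must not end with '/'")
    parts = path[1:].split("/")
    if any(part == "" for part in parts):
        raise ValueError("path contains an empty element")
    return parts


def parent_of(path: str) -> str:
    """Return the path of the parent node."""
    parts = split_path(path)
    if not parts:
        raise ValueError("the root has no parent")
    return "/" + "/".join(parts[:-1])


if __name__ == "__main__":
    print(split_path("/app1/p_1"))
    print(parent_of("/app1/p_1"))
```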

   

Figure: ZooKeeper's Hierarchical Namespace

Nodes and ephemeral nodes

Unlike standard file systems, each node in a ZooKeeper namespace can have data associated with it as well as children. It is like having a file system that allows a file to also be a directory. (ZooKeeper was designed to store coordination data: status information, configuration, location information, etc., so the data stored at each node is usually small, in the byte to kilobyte range.) We use the term znode to make it clear that we are talking about ZooKeeper data nodes.

Znodes maintain a stat structure that includes version numbers for data changes and ACL changes, and timestamps, to allow cache validations and coordinated updates. Each time a znode's data changes, the version number increases. For instance, whenever a client retrieves data it also receives the version of the data.

The data stored at each znode in a namespace is read and written atomically. Reads get all the data bytes associated with a znode and a write replaces all the data. Each node has an Access Control List (ACL) that restricts who can do what.

ZooKeeper also has the notion of ephemeral nodes. These znodes exist as long as the session that created the znode is active. When the session ends the znode is deleted. Ephemeral nodes are useful when you want to implement [tbd].
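To make the version and ephemeral-node behaviour concrete, the sketch below models both in a single process. It is a toy under stated assumptions, not the ZooKeeper client API: `ToyStore`, `BadVersion`, and the session strings are hypothetical names, paths are flat keys, and no parent checks or ACLs are modelled.

```python
class BadVersion(Exception):
    """Raised when a write names a version that is no longer current."""


class ToyStore:
    def __init__(self):
        # path -> {"data": bytes, "version": int, "owner": session or None}
        self._nodes = {}

    def create(self, path, data, session=None):
        """Create a node; passing a session makes it ephemeral."""
        if path in self._nodes:
            raise KeyError("node exists: " + path)
        self._nodes[path] = {"data": data, "version": 0, "owner": session}

    def exists(self, path):
        return path in self._nodes

    def get(self, path):
        """A read returns all of the data together with its version."""
        node = self._nodes[path]
        return node["data"], node["version"]

    def set(self, path, data, expected_version):
        """Replace all data, but only if the caller saw the latest version."""
        node = self._nodes[path]
        if node["version"] != expected_version:
            raise BadVersion(path)
        node["data"] = data
        node["version"] += 1
        return node["version"]

    def close_session(self, session):
        """Ending a session deletes every ephemeral node it created."""
        owned = [p for p, n in self._nodes.items() if n["owner"] == session]
        for path in owned:
            del self._nodes[path]


if __name__ == "__main__":
    store = ToyStore()
    store.create("/config", b"v1")
    data, version = store.get("/config")
    store.set("/config", b"v2", version)      # succeeds, version becomes 1
    store.create("/workers/w1", b"", session="session-1")
    store.close_session("session-1")          # ephemeral node disappears
    print(store.get("/config"), store.exists("/workers/w1"))
```

A second writer that still holds the old version would get `BadVersion` and would have to re-read before retrying, which is the kind of coordinated update the version number makes possible.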

Conditional updates and watches

ZooKeeper supports the concept of watches. Clients can set a watch on a znode. A watch will be triggered and removed when the znode changes. When a watch is triggered the client receives a packet saying that the znode has changed. If the connection between the client and one of the ZooKeeper servers is broken, the client will receive a local notification. These can be used to [tbd].
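The "triggered and removed" behaviour means a watch fires at most once. The following in-process sketch illustrates only that one-shot property; `WatchedNode` and its callback signature are hypothetical, and nothing here involves a network or the real client library.

```python
class WatchedNode:
    """A single node whose watches fire once and are then removed."""

    def __init__(self, data):
        self.data = data
        self._watchers = []

    def get(self, watcher=None):
        """Read the data and optionally leave a one-shot watch."""
        if watcher is not None:
            self._watchers.append(watcher)
        return self.data

    def set(self, data):
        """Change the data, then trigger and remove all pending watches."""
        self.data = data
        fired, self._watchers = self._watchers, []
        for watcher in fired:
            watcher("changed")


if __name__ == "__main__":
    events = []
    node = WatchedNode(b"v1")
    node.get(watcher=events.append)
    node.set(b"v2")   # the watch fires here
    node.set(b"v3")   # no watch is left, so nothing fires
    print(events)
```

A client that wants to keep observing a node therefore has to set a new watch when it reads the node again after each notification.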

Guarantees

ZooKeeper is very fast and very simple. Since its goal, though, is to be a basis for the construction of more complicated services, such as synchronization, it provides a set of guarantees. These are:

    • Sequential Consistency - Updates from a client will be applied in the order that they were sent.
    • Atomicity - Updates either succeed or fail. No partial results.
    • Single System Image - A client will see the same view of the service regardless of the server that it connects to.
    • Reliability - Once an update has been applied, it will persist from that time forward until a client overwrites the update.
    • Timeliness - The client's view of the system is guaranteed to be up-to-date within a certain time bound.

For more information on these, and how they can be used, see [tbd]

Simple API

One of the design goals of ZooKeeper is to provide a very simple programming interface. As a result, it supports only these operations:

    • create - creates a node at a location in the tree
    • delete - deletes a node
    • exists - tests if a node exists at a location
    • get data - reads the data from a node
    • set data - writes data to a node
    • get children - retrieves a list of children of a node
    • sync - waits for data to be propagated
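To show how the operations in this list fit together on a tree, here is a single-process toy model. It is a sketch under assumptions, not the real client API: `ToyTree` and its method names are invented for this example, and `sync` is left out because it only has meaning when there are several replicas.

```python
class ToyTree:
    """In-memory tree keyed by absolute path; the root always exists."""

    def __init__(self):
        self._data = {"/": b""}

    @staticmethod
    def _parent(path):
        head = path.rsplit("/", 1)[0]
        return head or "/"

    def create(self, path, data=b""):
        if path in self._data:
            raise KeyError("node exists: " + path)
        if self._parent(path) not in self._data:
            raise KeyError("parent missing for: " + path)
        self._data[path] = data

    def delete(self, path):
        # This toy refuses to delete a node that still has children.
        if self.get_children(path):
            raise ValueError("node has children: " + path)
        del self._data[path]

    def exists(self, path):
        return path in self._data

    def get_data(self, path):
        return self._data[path]

    def set_data(self, path, data):
        if path not in self._data:
            raise KeyError("no such node: " + path)
        self._data[path] = data

    def get_children(self, path):
        if path not in self._data:
            raise KeyError("no such node: " + path)
        return sorted(
            p.rsplit("/", 1)[1]
            for p in self._data
            if p != "/" and self._parent(p) == path
        )


if __name__ == "__main__":
    tree = ToyTree()
    tree.create("/app1")
    tree.create("/app1/p_1", b"status")
    tree.create("/app1/p_2")
    tree.set_data("/app1/p_2", b"ready")
    print(tree.get_children("/app1"), tree.get_data("/app1/p_2"))
```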

For a more in-depth discussion of these, and how they can be used to implement higher-level operations, please refer to [tbd]

Implementation

The figure ZooKeeper Components shows the high-level components of the ZooKeeper service. With the exception of the request processor, each of the servers that make up the ZooKeeper service replicates its own copy of each of the components.

Figure: ZooKeeper Components

The replicated database is an in-memory database containing the entire data tree. Updates are logged to disk for recoverability, and writes are serialized to disk before they are applied to the in-memory database.

Every ZooKeeper server services clients. Clients connect to exactly one server to submit requests. Read requests are serviced from the local replica of each server database. Requests that change the state of the service, that is, write requests, are processed by an agreement protocol.

As part of the agreement protocol all write requests from clients are forwarded to a single server, called the leader. The rest of the ZooKeeper servers, called followers, receive message proposals from the leader and agree upon message delivery. The messaging layer takes care of replacing leaders on failures and syncing followers with leaders.

ZooKeeper uses a custom atomic messaging protocol. Since the messaging layer is atomic, ZooKeeper can guarantee that the local replicas never diverge. When the leader receives a write request, it calculates what the state of the system is when the write is to be applied and transforms this into a transaction that captures this new state.
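The idea that the leader turns a write request into a transaction capturing the resulting state, and that each server logs a transaction before applying it, can be sketched as follows. This is a deliberately simplified, single-process illustration: `Replica`, `Leader`, and `txn_id` are hypothetical names, every replica is assumed to accept every transaction, and the agreement protocol, failures, and leader election are not modelled.

```python
class Replica:
    def __init__(self):
        self.log = []    # stand-in for the on-disk transaction log
        self.tree = {}   # in-memory database: path -> (data, version)

    def apply(self, txn):
        """Record the transaction first, then update the in-memory tree."""
        self.log.append(txn)
        _txn_id, path, data, version = txn
        self.tree[path] = (data, version)


class Leader(Replica):
    def __init__(self, followers):
        super().__init__()
        self.followers = followers
        self.next_txn_id = 1

    def write(self, path, data):
        """Turn a write request into a transaction holding the new state."""
        _, current_version = self.tree.get(path, (None, -1))
        # The transaction states the resulting version outright instead of
        # saying "increment", so every replica ends up in the same state.
        txn = (self.next_txn_id, path, data, current_version + 1)
        self.next_txn_id += 1
        for replica in [self] + self.followers:
            replica.apply(txn)
        return txn


if __name__ == "__main__":
    followers = [Replica(), Replica()]
    leader = Leader(followers)
    leader.write("/config", b"v1")
    leader.write("/config", b"v2")
    print(leader.tree, followers[0].tree == leader.tree)
```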

Uses

The programming interface to ZooKeeper is deliberately simple. With it, however, you can implement higher-order operations, such as synchronization primitives, group membership, ownership, etc. Some distributed applications have used it to: [tbd: add uses from white paper and video presentation.] For more information, see [tbd]

Performance

ZooKeeper is designed to be highly performant. But is it? The results from ZooKeeper's development team at Yahoo! Research indicate that it is. (See ZooKeeper Throughput as the Read-Write Ratio Varies.) It performs especially well in applications where reads outnumber writes, since writes involve synchronizing the state of all servers. (Reads outnumbering writes is typically the case for a coordination service.)

Figure: ZooKeeper Throughput as the Read-Write Ratio Varies

The figure ZooKeeper Throughput as the Read-Write Ratio Varies is a throughput graph of ZooKeeper release 3.2 running on servers with dual 2GHz Xeon processors and two SATA 15K RPM drives. One drive was used as a dedicated ZooKeeper log device. The snapshots were written to the OS drive. Write requests were 1K writes and the reads were 1K reads. "Servers" indicates the size of the ZooKeeper ensemble, that is, the number of servers that make up the service. Approximately 30 other servers were used to simulate the clients. The ZooKeeper ensemble was configured such that leaders do not allow connections from clients.

Note

In version 3.2 r/w performance improved by ~2x compared to the previous 3.1 release.

Benchmarks also indicate that it is reliable. Reliability in the Presence of Errors shows how a deployment responds to various failures. The events marked in the figure are the following:

    1. Failure and recovery of a follower
    2. Failure and recovery of a different follower
    3. Failure of the leader
    4. Failure and recovery of two followers
    5. Failure of another leader

Reliability

To show the behavior of the system over time as failures are injected, we ran a ZooKeeper service made up of 7 machines. We ran the same saturation benchmark as before, but this time we kept the write percentage at a constant 30%, which is a conservative ratio of our expected workloads.

Figure: Reliability in the Presence of Errors

There are a few important observations from this graph. First, if followers fail and recover quickly, then ZooKeeper is able to sustain a high throughput despite the failure. Second, and perhaps more importantly, the leader election algorithm allows the system to recover fast enough to prevent throughput from dropping substantially. In our observations, ZooKeeper takes less than 200ms to elect a new leader. Third, as followers recover, ZooKeeper is able to raise throughput again once they start processing requests.

The ZooKeeper Project

ZooKeeper has been successfully used in many industrial applications. It is used at Yahoo! as the coordination and failure recovery service for Yahoo! Message Broker, which is a highly scalable publish-subscribe system managing thousands of topics for replication and data delivery. It is used by the Fetching Service for Yahoo! crawler, where it also manages failure recovery. A number of Yahoo! advertising systems also use ZooKeeper to implement reliable services.

All users and developers are encouraged to join the community and contribute their expertise. See the ZooKeeper Project on Apache for more information.

 

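*The translator's ability is limited, so the translation is bound to contain inaccurate wording. Please bear with it, and please point out anything that is translated incorrectly or imprecisely so that we can discuss it and improve together. Thank you.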
