SolrCloud Wiki Translation (5): Read and Write Side Fault Tolerance

Date: 2019-11-16

Read Side Fault Tolerance

With earlier versions of Solr, you had to set up your own load balancer. Now each individual node load balances requests across the replicas in the cluster. You still need either a load balancer on the 'outside' that talks to the cluster, or a smart client. (Solr provides a smart Java client in SolrJ called CloudSolrServer.)
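The per-node load balancing described above can be pictured with a small simulation. This is only a sketch, not Solr's actual implementation: the `ReplicaBalancer` class and the replica names are hypothetical, and round-robin is assumed purely for illustration.

```python
import itertools


class ReplicaBalancer:
    """Hypothetical round-robin balancer that skips replicas marked down."""

    def __init__(self, replicas):
        self.replicas = list(replicas)
        self.down = set()
        self._cycle = itertools.cycle(self.replicas)

    def mark_down(self, replica):
        self.down.add(replica)

    def mark_up(self, replica):
        self.down.discard(replica)

    def next_replica(self):
        # Look at each replica at most once per call.
        for _ in range(len(self.replicas)):
            candidate = next(self._cycle)
            if candidate not in self.down:
                return candidate
        raise RuntimeError("no live replicas")


balancer = ReplicaBalancer(["replica1", "replica2", "replica3"])
balancer.mark_down("replica2")  # simulate a failed replica
picks = [balancer.next_replica() for _ in range(4)]
print(picks)
```

Requests keep flowing to the remaining replicas while `replica2` is down, which is the read-side fault tolerance the paragraph describes.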

A smart client understands how to read and interact with ZooKeeper, and needs only the ZooKeeper ensemble's address to start discovering which nodes it should send requests to.
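To make the discovery step concrete, here is a minimal sketch of how a smart client might pick request targets once it has read the cluster state. The dictionary below is a simplified, assumed shape loosely modeled on the cluster state SolrCloud keeps in ZooKeeper; the host names and the helper `active_replica_urls` are hypothetical, and no real ZooKeeper connection is made.

```python
def active_replica_urls(cluster_state, live_nodes, collection):
    """Return core URLs of replicas that are active and hosted on a live node."""
    urls = []
    for shard in cluster_state[collection]["shards"].values():
        for replica in shard["replicas"].values():
            if replica["state"] == "active" and replica["node_name"] in live_nodes:
                urls.append(replica["base_url"] + "/" + replica["core"])
    return sorted(urls)


# Assumed, simplified cluster state; a real client would read this from ZooKeeper.
cluster_state = {
    "collection1": {
        "shards": {
            "shard1": {
                "replicas": {
                    "core_node1": {
                        "state": "active",
                        "core": "collection1",
                        "node_name": "host1:8983_solr",
                        "base_url": "http://host1:8983/solr",
                    },
                    "core_node2": {
                        "state": "down",
                        "core": "collection1",
                        "node_name": "host2:8983_solr",
                        "base_url": "http://host2:8983/solr",
                    },
                }
            }
        }
    }
}
live_nodes = {"host1:8983_solr"}

targets = active_replica_urls(cluster_state, live_nodes, "collection1")
print(targets)
```

The client only needs to know where the cluster state lives; everything else, including which nodes are currently usable, is derived from it.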

Write Side Fault Tolerance

SolrCloud supports near real-time actions, elasticity, high availability, and fault tolerance. In practice this means that, with a large cluster, you can always make requests to the cluster, and once a request is acknowledged you can be sure it is durable; that is, you won't lose data. Updates are visible right after they are made, and the cluster can be expanded or contracted.

Recovery

A Transaction Log is created for each node so that every change to content or organization is recorded. The log is used to determine which content in the node should be included in a replica. When a new replica is created, it consults the Leader and the Transaction Log to learn which content to include. If this fails, it retries.

Because the Transaction Log is a record of updates, it makes indexing more robust: if indexing is interrupted, the uncommitted updates can be replayed from the log.
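The replay idea can be sketched with a toy in-memory model. This is an illustration under stated assumptions, not how Solr stores its log: the `Node` class, its fields, and the `crash`/`recover` methods are all hypothetical.

```python
class Node:
    """Toy node: a durable index, a volatile buffer, and a durable update log."""

    def __init__(self):
        self.committed = {}  # survives a crash
        self.pending = {}    # in memory only, lost on a crash
        self.tlog = []       # survives a crash; holds updates since last commit

    def update(self, doc_id, doc):
        self.tlog.append((doc_id, doc))  # log first, then apply
        self.pending[doc_id] = doc

    def commit(self):
        self.committed.update(self.pending)
        self.pending.clear()
        self.tlog.clear()

    def crash(self):
        self.pending.clear()  # uncommitted in-memory state is gone

    def recover(self):
        # Redo every logged update that never reached a commit.
        for doc_id, doc in self.tlog:
            self.pending[doc_id] = doc
        self.commit()


node = Node()
node.update("1", "first")
node.commit()
node.update("2", "second")  # logged but not yet committed
node.crash()
node.recover()
print(node.committed)
```

Because the update to document `"2"` was logged before the crash, recovery can restore it even though it was never committed.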

If a leader goes down, it may have sent requests to some replicas and not to others. So when a new potential leader is identified, it runs a sync process against the other replicas. If this succeeds, everything should be consistent, the leader registers as active, and normal operation proceeds. If a replica is too far out of sync, the system asks for a full replication/replay-based recovery.
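The choice between a lightweight sync and a full recovery might look roughly like the following. This is a sketch only: the function `plan_recovery`, the use of integer update versions, and the gap threshold of 3 are assumptions made for illustration, not Solr's actual logic or default values.

```python
def plan_recovery(leader_versions, replica_versions, max_sync_gap):
    """Decide how a replica catches up with the leader's update versions."""
    missing = sorted(set(leader_versions) - set(replica_versions))
    if not missing:
        return ("in_sync", [])
    if len(missing) <= max_sync_gap:
        # Close enough: fetch only the missing updates from a peer.
        return ("peer_sync", missing)
    # Too far behind: fall back to copying the full index.
    return ("full_replication", [])


leader = [101, 102, 103, 104, 105, 106]
slightly_behind = plan_recovery(leader, [101, 102, 103, 104], max_sync_gap=3)
far_behind = plan_recovery(leader, [101], max_sync_gap=3)
print(slightly_behind, far_behind)
```

A replica missing only a few updates is brought up to date cheaply, while one that has drifted too far is rebuilt from the leader.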

If an update fails because cores are reloading schemas and some have finished but others have not, the leader tells the nodes that the update failed and starts the recovery procedure.
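As a rough picture of this failure path, the sketch below has a leader forward one update to several replicas and collect those that need recovery. The names `forward_update`, `healthy`, and `reloading` are hypothetical, and modelling replicas as plain callables is an assumption made to keep the example self-contained.

```python
def forward_update(update, replicas):
    """Send an update to each replica; return the names that must recover."""
    needs_recovery = []
    for name, apply_update in replicas.items():
        try:
            apply_update(update)
        except Exception:
            # The leader notes the failure so this replica can be told to recover.
            needs_recovery.append(name)
    return needs_recovery


applied = []


def healthy(update):
    applied.append(update)


def reloading(update):
    # Stand-in for a core that is still reloading its schema.
    raise RuntimeError("schema reload in progress")


to_recover = forward_update(
    {"id": "1"},
    {"replica1": healthy, "replica2": reloading},
)
print(to_recover)
```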