SolrCloud Wiki Translation (6): Near Real Time Search, Index Replication, and Disaster Recovery

SolrCloud and Replication

Replication ensures redundancy for your data, and enables you to send an update request to any node in the shard. If that node is a replica, it will forward the request to the leader, which then forwards it to all existing replicas, using versioning to make sure every replica has the most up-to-date version. This architecture enables you to be certain that your data can be recovered in the event of a disaster, even if you are using Near Real Time searching.
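
As an illustration of this update flow, here is a minimal SolrJ sketch, assuming Java 11+ and a SolrJ release whose CloudSolrClient.Builder accepts a list of ZooKeeper hosts; the ZooKeeper addresses and the collection name collection1 are placeholders. The client can hand the document to any node; if that node is not the shard leader, the request is forwarded to the leader and from there to the replicas.

    import java.util.List;
    import java.util.Optional;

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class IndexAnyNode {
        public static void main(String[] args) throws Exception {
            // Placeholder ZooKeeper ensemble and collection name.
            try (CloudSolrClient client = new CloudSolrClient.Builder(
                    List.of("zk1:2181", "zk2:2181", "zk3:2181"), Optional.empty()).build()) {

                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "doc-1");
                doc.addField("title_t", "hello solrcloud");

                // The receiving node forwards this to the shard leader, which
                // distributes the versioned update to every replica.
                client.add("collection1", doc);
                client.commit("collection1");
            }
        }
    }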

Near Real Time Searching

If you want to use Near Real Time search support, enable auto soft commits in your solrconfig.xml file before storing it in ZooKeeper. Otherwise, you can send explicit soft commits to the cluster as needed.
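
For reference, auto soft commits are configured with the <autoSoftCommit>/<maxTime> element inside <updateHandler> in solrconfig.xml. An explicit soft commit can also be issued from a client; the SolrJ sketch below (ZooKeeper address and collection name are placeholders) uses the four-argument commit overload with softCommit=true, so new documents become searchable without a hard commit.

    import java.util.List;
    import java.util.Optional;

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class SoftCommitExample {
        public static void main(String[] args) throws Exception {
            try (CloudSolrClient client = new CloudSolrClient.Builder(
                    List.of("zk1:2181"), Optional.empty()).build()) {

                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "doc-2");
                client.add("collection1", doc);

                // Explicit soft commit: waitFlush=true, waitSearcher=true, softCommit=true.
                // The document becomes visible to searchers without a hard commit.
                client.commit("collection1", true, true, true);
            }
        }
    }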

SolrCloud doesn't work very well with separated data clusters connected by an expensive pipe. The root problem is that SolrCloud's architecture sends documents to all the nodes in the cluster (on a per-shard basis), and that architecture is really dictated by the NRT functionality.

Imagine that you have a set of servers in China and another in the US that are aware of each other. Assuming 5 replicas, a single update to a shard may make multiple trips over the expensive pipe before it's all done, probably slowing indexing speed unacceptably.

So the SolrCloud recommendation for this situation is to maintain these clusters separately; nodes in China don't even know that nodes exist in the US and vice-versa. When indexing, you send the update request to one node in the US and one in China and all the node-routing after that is local to the separate clusters. Requests can go to any node in either country and maintain a consistent view of the data.
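
A sketch of that indexing pattern, assuming two independent clusters with their own ZooKeeper ensembles (all hostnames and the collection name are placeholders): the indexing program sends each update to one node in each cluster, and everything after that stays local.

    import java.util.List;
    import java.util.Optional;

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class DualClusterIndexer {
        public static void main(String[] args) throws Exception {
            // Placeholder ZooKeeper ensembles for the two independent clusters.
            try (CloudSolrClient usCluster = new CloudSolrClient.Builder(
                         List.of("zk-us1:2181", "zk-us2:2181"), Optional.empty()).build();
                 CloudSolrClient cnCluster = new CloudSolrClient.Builder(
                         List.of("zk-cn1:2181", "zk-cn2:2181"), Optional.empty()).build()) {

                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "doc-3");
                doc.addField("title_t", "indexed in both regions");

                // Send the same update to each cluster; leader forwarding and
                // replication then happen locally inside each cluster.
                usCluster.add("collection1", doc);
                cnCluster.add("collection1", doc);
            }
        }
    }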

However, if your US cluster goes down, you have to re-synchronize the down cluster with up-to-date information from China. The process requires you to replicate the index from China to the repaired US installation and then get everything back up and working.
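
The bulk copy itself is done by Solr's replication handler. SolrCloud triggers it automatically during recovery (see the steps below), but for reference, a fetchindex command can also be issued against the repaired core by hand. The sketch below assumes Java 11+; the hostnames and core names are placeholders, and the masterUrl parameter may be named leaderUrl in newer Solr releases.

    import java.net.URI;
    import java.net.URLEncoder;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.nio.charset.StandardCharsets;

    public class FetchIndexExample {
        public static void main(String[] args) throws Exception {
            // Placeholder hosts and core names: the source is a healthy core in the
            // intact cluster, the target is the freshly repaired core.
            String sourceReplication =
                    "http://solr-cn1:8983/solr/collection1_shard1_replica1/replication";
            String targetCore = "http://solr-us1:8983/solr/collection1_shard1_replica1";

            String url = targetCore + "/replication?command=fetchindex&masterUrl="
                    + URLEncoder.encode(sourceReplication, StandardCharsets.UTF_8);

            HttpClient http = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
            HttpResponse<String> response =
                    http.send(request, HttpResponse.BodyHandlers.ofString());

            // The handler replies immediately; the index fetch runs in the background.
            System.out.println(response.body());
        }
    }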

Disaster Recovery for an NRT system

Use of Near Real Time (NRT) searching affects the way that systems using SolrCloud behave during disaster recovery.

The procedure outlined below assumes that you are maintaining separate clusters, as described above. Consider, for example, an event in which the US cluster goes down (say, because of a hurricane), but the China cluster is intact. Disaster recovery consists of creating the new system and letting the intact cluster create a replica for each shard on it, then promoting those replicas to be leaders of the newly created US cluster.

Here are the steps to take:

  1. Take the downed system offline to all end users.
  2. Take the indexing process offline.
  3. Repair the system.
  4. Bring up one machine per shard in the repaired system as part of the ZooKeeper cluster on the good system, and wait for replication to happen, creating a replica on that machine. (Soft commits will not be repeated, but data will be pulled from the transaction logs if necessary.)

    Note: SolrCloud will automatically use old-style replication for the bulk load. By temporarily having only one replica, you'll minimize data transfer across a slow connection.

  5. Bring the machines of the repaired cluster down, and reconfigure them to be a separate ZooKeeper cluster again, optionally adding more replicas for each shard (see the sketch after this list).
  6. Make the repaired system visible to end users again.
  7. Start the indexing program again, delivering updates to both systems.
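
For step 5, extra replicas can be requested through the Collections API (ADDREPLICA) once the repaired cluster is running on its own ZooKeeper ensemble again. A minimal SolrJ sketch follows; the ZooKeeper address, collection, and shard names are placeholders, and the call would be repeated for each shard that needs another replica.

    import java.util.List;
    import java.util.Optional;

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;
    import org.apache.solr.client.solrj.response.CollectionAdminResponse;

    public class AddReplicaExample {
        public static void main(String[] args) throws Exception {
            // Placeholder ZooKeeper ensemble of the repaired, now separate cluster.
            try (CloudSolrClient client = new CloudSolrClient.Builder(
                    List.of("zk-us1:2181"), Optional.empty()).build()) {

                // Ask the Collections API for one more replica of shard1;
                // Solr picks a live node to host it.
                CollectionAdminResponse rsp = CollectionAdminRequest
                        .addReplicaToShard("collection1", "shard1")
                        .process(client);

                System.out.println("ADDREPLICA status: " + rsp.getStatus());
            }
        }
    }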

