Replication ensures redundancy for your data and enables you to send an update request to any node in a shard. If that node hosts a replica, it forwards the request to the shard leader, which then forwards it to all existing replicas, using versioning to make sure every replica has the most up-to-date version. This architecture lets you be certain that your data can be recovered in the event of a disaster, even if you are using Near Real Time searching.
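To make the forwarding behavior concrete, here is a minimal SolrJ sketch that indexes through an arbitrary node; the host, port, and collection name (solr-node2, collection1) are placeholders, and the forwarding to the leader happens inside Solr, not in client code:

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class AnyNodeUpdate {
    public static void main(String[] args) throws Exception {
        // Any node in the cluster will do: if it hosts a replica rather than
        // the shard leader, Solr forwards the update to the leader, which
        // then distributes it to the remaining replicas with a version check.
        try (SolrClient client = new HttpSolrClient.Builder(
                "http://solr-node2:8983/solr/collection1").build()) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-1");
            doc.addField("title_t", "replicated update");
            client.add(doc);
            client.commit();
        }
    }
}
```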
If you want to use the Near Real Time search (NearRealtimeSearch) support, enable auto soft commits in your solrconfig.xml file before storing it in ZooKeeper. Otherwise you can send explicit soft commits to the cluster as needed.
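As a concrete illustration, a minimal auto soft commit fragment for solrconfig.xml might look like the following; the interval values here are placeholders to adapt, not recommendations:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Soft commit: open a new searcher every 5 seconds so updates
       become visible quickly, without flushing segments to disk. -->
  <autoSoftCommit>
    <maxTime>5000</maxTime>
  </autoSoftCommit>
  <!-- Hard commit less frequently to truncate the transaction log;
       openSearcher=false keeps it from affecting search visibility. -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```

Alternatively, an explicit soft commit can be sent from SolrJ with `client.commit(true, true, true)`; the third boolean selects a soft commit.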
SolrCloud doesn't work very well with separated data clusters connected by an expensive pipe. The root problem is that SolrCloud's architecture sends documents to all the nodes in the cluster (on a per-shard basis), and that architecture is really dictated by the NRT functionality.
Imagine that you have a set of servers in China and another in the US, and that the two sets are aware of each other. Assuming 5 replicas, a single update to a shard may make multiple trips over the expensive pipe before it's all done, probably slowing indexing speed unacceptably.
So the SolrCloud recommendation for this situation is to maintain these clusters separately; nodes in China don't even know that nodes exist in the US, and vice versa. When indexing, you send the update request to one node in the US and one in China, and all the node routing after that is local to the separate clusters. Requests can go to any node in either country and maintain a consistent view of the data.
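A sketch of that dual-send pattern with SolrJ follows; the ZooKeeper addresses (zk-us, zk-cn) and the collection name are hypothetical, and each CloudSolrClient only ever talks to its own cluster:

```java
import java.util.Collections;
import java.util.Optional;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class DualClusterIndexer {
    public static void main(String[] args) throws Exception {
        // One client per independent cluster; neither cluster knows the other exists.
        try (CloudSolrClient us = new CloudSolrClient.Builder(
                 Collections.singletonList("zk-us:2181"), Optional.empty()).build();
             CloudSolrClient cn = new CloudSolrClient.Builder(
                 Collections.singletonList("zk-cn:2181"), Optional.empty()).build()) {
            us.setDefaultCollection("collection1");
            cn.setDefaultCollection("collection1");

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-1");
            doc.addField("title_t", "indexed in both regions");

            // The same update goes to each cluster once; leader forwarding and
            // replica distribution then stay local to each side of the pipe.
            // Visibility is handled by autoSoftCommit (see the config above).
            us.add(doc);
            cn.add(doc);
        }
    }
}
```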
However, if your US cluster goes down, you have to re-synchronize the downed cluster with up-to-date information from China. The process requires you to replicate the index from China to the repaired US installation and then bring everything back up and working.
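One way to perform that copy is Solr's replication handler, pulling the index from the surviving side with a fetchindex command. Below is a sketch using plain HTTP; the host and core names are hypothetical, and the parameter is masterUrl in older Solr releases (renamed leaderUrl in newer ones):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class FetchIndex {
    public static void main(String[] args) throws Exception {
        // Ask the repaired US core to pull its index from the intact China core.
        String url = "http://us-node1:8983/solr/collection1_shard1_replica1/replication"
                   + "?command=fetchindex"
                   + "&masterUrl=http://cn-node1:8983/solr/collection1_shard1_replica1/replication";
        HttpClient http = HttpClient.newHttpClient();
        HttpResponse<String> resp = http.send(
                HttpRequest.newBuilder(URI.create(url)).GET().build(),
                HttpResponse.BodyHandlers.ofString());
        System.out.println(resp.body()); // expect an OK status once the fetch is queued
    }
}
```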
Use of Near Real Time (NRT) searching affects the way that systems using SolrCloud behave during disaster recovery.
The procedure outlined below assumes that you are maintaining separate clusters, as described above. Consider, for example, an event in which the US cluster goes down (say, because of a hurricane), but the China cluster is intact. Disaster recovery consists of creating the new system, letting the intact cluster create a replica for each shard on it, and then promoting those replicas to be the leaders of the newly created US cluster.
Here are the steps to take:
SolrCloud will automatically use old-style replication for the bulk load. By temporarily having only one replica, you'll minimize data transfer across a slow connection.
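Once that single replica has the full index, the shard can be grown back to its normal replica count through the Collections API. A sketch follows, with hypothetical host, collection, and shard names; repeat per shard and per additional replica as needed:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class AddReplicaAfterBulkLoad {
    public static void main(String[] args) throws Exception {
        // Add a replica to shard1 now that the bulk load over the slow
        // connection is done; further copying happens inside the US cluster.
        String url = "http://us-node1:8983/solr/admin/collections"
                   + "?action=ADDREPLICA&collection=collection1&shard=shard1";
        HttpClient http = HttpClient.newHttpClient();
        HttpResponse<String> resp = http.send(
                HttpRequest.newBuilder(URI.create(url)).GET().build(),
                HttpResponse.BodyHandlers.ofString());
        System.out.println(resp.body());
    }
}
```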