Manually Re-Sharding a Collection in a SolrCloud Cluster: Notes on Solr Sharding

Date: 2019-11-09

SolrCloud routing

SolrCloud provides two routing algorithms: compositeId and implicit. When creating a collection, you select the routing strategy with the router.name parameter; the default is compositeId.

compositeId

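This router is hash-based (the original describes it as consistent hashing): the shards' hash ranges together cover 80000000 to 7fffffff. numShards must be specified when the collection is first created, and the compositeId router computes each shard's hash range from that number, so under this routing strategy shards cannot simply be added later.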
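To make the range calculation concrete, here is a minimal sketch that divides the signed 32-bit hash space evenly among numShards shards. The class and method names are made up for illustration, and this is not Solr's own partitioning code, which may round range boundaries differently when numShards is not a power of two.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: splits the signed 32-bit hash space
// (80000000..7fffffff) into numShards contiguous ranges.
final class HashRangeSketch {

    /** Returns one "min-max" hex range per shard, in shard order. */
    static List<String> partition(int numShards) {
        if (numShards <= 0) {
            throw new IllegalArgumentException("numShards must be positive");
        }
        long min = Integer.MIN_VALUE; // 0x80000000
        long total = 1L << 32;        // number of distinct 32-bit hash values
        List<String> ranges = new ArrayList<>();
        for (int i = 0; i < numShards; i++) {
            long start = min + total * i / numShards;
            long end = min + total * (i + 1) / numShards - 1;
            ranges.add(Integer.toHexString((int) start) + "-" + Integer.toHexString((int) end));
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<String> ranges = partition(4);
        for (int i = 0; i < ranges.size(); i++) {
            System.out.println("shard" + (i + 1) + ": " + ranges.get(i));
        }
    }
}
```

With two shards this yields 80000000-ffffffff and 0-7fffffff. Because every hash value is already owned by some shard, a shard added afterwards has no range left to claim.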

implicit

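With this router, you specify exactly which shard each document is indexed into, unlike compositeId, which spreads documents evenly across the shards. It is also the only routing strategy under which new shards can be created.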

When indexing with SolrJ, specify the target shard in code by adding:

doc.addField("_route_", "shard_X");

Also add the field to schema.xml:

<field name="_route_" type="string"/>

To create a collection with implicit routing via URL:

http://10.21.17.200:9580/solr-5.0.0-web/admin/collections?action=CREATE&name=testimplicit&router.name=implicit&shards=shard1,shard2,shard3
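If such URLs are assembled in code rather than typed by hand, a small helper can keep the parameters properly encoded. The following is a sketch that only builds the request string and does not send it; the class and method names are hypothetical, while the host, path, and parameters are the ones from the URL above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: assembles a Collections API request URL from its parts.
final class CollectionsUrlSketch {

    static String build(String solrBase, String action, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&", solrBase + "/admin/collections?", "");
        query.add("action=" + encode(action));
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return query.toString();
    }

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("name", "testimplicit");
        params.put("router.name", "implicit");
        params.put("shards", "shard1,shard2,shard3");
        System.out.println(build("http://10.21.17.200:9580/solr-5.0.0-web", "CREATE", params));
    }
}
```

The commas in the shards value come out percent-encoded as %2C, which a server decodes back to the same list.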

SolrRouter source code

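In the Solr source code, the base class for routing is the abstract class DocRouter. HashBasedRouter and ImplicitDocRouter extend DocRouter, and CompositeIdRouter in turn extends the abstract class HashBasedRouter, implementing the document routing strategy with the help of the Hash utility class.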
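The shape of that hierarchy can be sketched with simplified stand-in classes. Everything below is an assumption made for illustration: the method names, the Map-based document, and the use of String.hashCode() in place of Solr's Hash utility do not reflect Solr's real signatures.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified stand-ins for the classes named above. Method names and
// signatures are hypothetical and do not match Solr's real API.
final class RouterSketch {

    /** Base type: every router maps a document to a shard name. */
    abstract static class DocRouter {
        abstract String targetShard(Map<String, String> doc);
    }

    /** Implicit routing: the document itself names its shard via _route_. */
    static final class ImplicitDocRouter extends DocRouter {
        @Override
        String targetShard(Map<String, String> doc) {
            String shard = doc.get("_route_");
            if (shard == null) {
                throw new IllegalArgumentException("document has no _route_ field");
            }
            return shard;
        }
    }

    /** Hash routing: hash the document, then find the shard whose range holds the hash. */
    abstract static class HashBasedRouter extends DocRouter {
        private final Map<String, int[]> ranges; // shard name -> {min, max}, inclusive

        HashBasedRouter(Map<String, int[]> ranges) {
            this.ranges = ranges;
        }

        abstract int hash(Map<String, String> doc);

        @Override
        String targetShard(Map<String, String> doc) {
            int h = hash(doc);
            for (Map.Entry<String, int[]> e : ranges.entrySet()) {
                if (h >= e.getValue()[0] && h <= e.getValue()[1]) {
                    return e.getKey();
                }
            }
            throw new IllegalStateException("no shard covers hash " + Integer.toHexString(h));
        }
    }

    static final class CompositeIdRouter extends HashBasedRouter {
        CompositeIdRouter(Map<String, int[]> ranges) {
            super(ranges);
        }

        @Override
        int hash(Map<String, String> doc) {
            // Placeholder: String.hashCode() stands in for Solr's Hash utility class.
            return doc.get("id").hashCode();
        }
    }

    /** Two shards that together cover 80000000..7fffffff. */
    static Map<String, int[]> twoShardRanges() {
        Map<String, int[]> ranges = new LinkedHashMap<>();
        ranges.put("shard1", new int[] {Integer.MIN_VALUE, -1});
        ranges.put("shard2", new int[] {0, Integer.MAX_VALUE});
        return ranges;
    }

    public static void main(String[] args) {
        DocRouter implicit = new ImplicitDocRouter();
        DocRouter composite = new CompositeIdRouter(twoShardRanges());
        System.out.println(implicit.targetShard(Map.of("id", "1", "_route_", "shard_X")));
        System.out.println(composite.targetShard(Map.of("id", "abc")));
    }
}
```

The point of the layering is that callers only see DocRouter, while the hash-based branch shares the range lookup and leaves the hash function to the concrete subclass.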

Creating a collection

Solr offers two ways to create a collection:

Creating a collection through Add Core in the admin UI

Because -DnumShards=7 is set in Tomcat's setenv.sh, this collection has 7 shards.

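Note: for a collection created with compositeId routing, the shards cannot be extended once numShards has been specified. Even if you force an extra shard in, newly indexed documents will not land on it. In clusterstate.json, the new shard shows "range":null.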

Creating a collection via URL
Creating a collection via URL requires: numShards × replicationFactor ≤ maxShardsPerNode × number of live nodes

The test environment has 3 Solr machines, and the collection is created with this URL:

http://10.21.17.200:9580/solr-4.10.0/admin/collections?action=CREATE&name=collection1&router.name=compositeId&numShards=5&replicationFactor=1

The request fails with:

org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:Cannot create collection collection1. Value of maxShardsPerNode is 1, and the number of live nodes is 3. This allows a maximum of 3 to be created. Value of numShards is 5 and value of replicationFactor is 1. This requires 5 shards to be created (higher than the allowed number)


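The cause: the request needs 5 × 1 = 5 shard replicas, but with maxShardsPerNode = 1 and 3 live nodes, at most 1 × 3 = 3 can be created.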
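The limit reported in the error message can be written as a simple pre-check. This is a sketch of the arithmetic only, under the assumption that the message describes the whole rule; the class and method names are invented here, and Solr performs its own validation on the server.

```java
// Sketch of the capacity rule stated in the error message:
// numShards * replicationFactor must not exceed maxShardsPerNode * liveNodes.
final class ShardCapacitySketch {

    static boolean canCreate(int numShards, int replicationFactor,
                             int maxShardsPerNode, int liveNodes) {
        long required = (long) numShards * replicationFactor; // shard replicas to place
        long allowed = (long) maxShardsPerNode * liveNodes;   // slots the cluster offers
        return required <= allowed;
    }

    public static void main(String[] args) {
        // The failing request from the text: 5 shards, 1 replica each, 3 nodes.
        System.out.println(canCreate(5, 1, 1, 3)); // false
        // Allowing 2 shards per node gives 6 slots, enough for 5.
        System.out.println(canCreate(5, 1, 2, 3)); // true
    }
}
```

Under this rule, the request above would fit either with numShards of 3 or fewer, or with a larger maxShardsPerNode value passed on the CREATE call.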

Data migration

In some scenarios, a SolrCloud cluster needs to be scaled out or have its data migrated.

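Of the two routing algorithms discussed above, implicit makes this fairly simple: create a new shard and direct new documents to it. Queries that specify the collection name still return results from the whole cluster.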

With compositeId routing it takes slightly more work and is done with the split (SPLITSHARD) operation. To split shard1, the URL is:

http://10.21.17.200:9580/solr-4.10.0-web/admin/collections?action=SPLITSHARD&collection=log4j201503&shard=shard1

After the split, shard1's data is divided evenly between shard1_0 and shard1_1. Then delete shard1 with the DELETESHARD API so that no data is stored twice.
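What the split means for the hash ranges can be illustrated with a short sketch. It assumes, purely as an example, that shard1 owns the range 80000000-ffffffff (as it would in a two-shard collection) and that the range is cut into two equal halves; the class and method names are hypothetical.

```java
// Sketch: cut one shard's hash range into two equal sub-ranges, as a
// split of shard1 into shard1_0 and shard1_1 would need to do.
final class SplitRangeSketch {

    /** Returns {lowerHalf, upperHalf} as "min-max" hex strings. */
    static String[] splitInHalf(int min, int max) {
        long size = (long) max - min;     // computed in long to avoid int overflow
        int mid = (int) (min + size / 2); // last hash value of the lower half
        return new String[] {
            hex(min) + "-" + hex(mid),
            hex(mid + 1) + "-" + hex(max)
        };
    }

    private static String hex(int value) {
        return Integer.toHexString(value);
    }

    public static void main(String[] args) {
        // Assumed parent range for shard1: 80000000-ffffffff.
        String[] halves = splitInHalf(0x80000000, 0xffffffff);
        System.out.println("shard1_0: " + halves[0]);
        System.out.println("shard1_1: " + halves[1]);
    }
}
```

For the assumed parent range this prints 80000000-bfffffff for shard1_0 and c0000000-ffffffff for shard1_1. The two sub-shards cover exactly the parent's range, which is why the parent can be deleted afterwards without losing coverage.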

Possible improvements to SolrCloud index sharding and querying

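Solr 4.0 includes SolrCloud, Solr's distributed solution. It supports sharding, the nodes within each shard take their roles (leader, replica) through an election algorithm, and queries are load-balanced within a shard.
When the cluster starts, information such as shards and collections must be declared, and during startup the cluster state is maintained in ZooKeeper nodes.
Any server in the cluster can respond to client requests, for both indexing and query operations.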

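For indexing, SolrCloud provides a simple sharding algorithm: it hashes the ID of the document being indexed, then uses the cluster state kept in ZooKeeper (Collection, RangeInfo, Range<min,max>) to find which Range contains the hash value, and from that the corresponding shard. The document is indexed on that shard's leader; once the leader has finished its update, it forwards the version number and the document to the replicas in the same shard. However, shard assignment at index time does not take load balancing into account, so documents may keep landing in a single shard. You therefore have to handle shard splitting and load balancing yourself.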
