Dynamically Increasing the Shard Count of an Index in Elasticsearch (ES) 7.x

Date: 2021-01-19


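Background

In older ES versions (2.3, for example), the number of shards of an index could not be changed once it was set; the only option was to rebuild the index data.

Starting with ES 6.1, the shard count of an index can be increased online (note: writes to the index must still be blocked during the operation).

Starting with ES 7.0, a split no longer requires the index to have been created with an explicit index.number_of_routing_shards setting.

See the official documentation for details:

https://www.elastic.co/guide/en/elasticsearch/reference/7.5/indices-split-index.html

https://www.elastic.co/guide/en/elasticsearch/reference/6.1/indices-split-index.html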

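The split process:

1. Create a new target index with the same definition as the source index, but with more primary shards.

2. Hard-link the segments from the source index into the target index. (If the file system does not support hard links, all segments are copied into the new index, which is far more time-consuming.)

3. Once the low-level files are in place, hash all documents again and delete the documents that belong to a different shard.

4. Recover the target index as though it were a closed index that had just been reopened.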

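Why doesn't ES support incremental resharding?

Incremental resharding means going from N shards to N + 1 shards, and it is indeed a feature that many key-value stores support. Simply adding a new shard and pushing new data only to it is not an option: it would likely become an indexing bottleneck, and working out which shard a document belongs to from its _id, which get, delete and update requests all require, would become quite complex. This means existing data has to be rebalanced using a different hashing scheme.

The most common way for key-value stores to do this efficiently is consistent hashing: when the number of shards grows from N to N + 1, only 1/N of the keys need to be relocated. However, the unit of storage in Elasticsearch, the shard, is a Lucene index. Because of its search-oriented data structures, taking a significant portion of a Lucene index, even only 5% of the documents, deleting those documents and indexing them on another shard typically costs far more than it does in a key-value store. As the preceding section of the official documentation describes, this cost stays reasonable when the shard count grows by a multiplicative factor: Elasticsearch can then perform the split locally, which in turn allows the split to happen at the index level rather than reindexing the documents that need to move, and allows hard links to be used for efficient file copying.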
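To make the "split locally" point concrete, here is a small sketch of the routing formula given in the Elasticsearch `_routing` field documentation: `shard_num = (hash(_routing) % num_routing_shards) / routing_factor`, with `routing_factor = num_routing_shards / num_primary_shards`. The helper name `shard_for`, the value 1024 for the routing shards, and the sample hash values are assumptions made for illustration; a real cluster hashes the routing value itself.

```shell
#!/usr/bin/env bash
# shard_for HASH ROUTING_SHARDS PRIMARY_SHARDS
# Hypothetical helper: prints the shard number for an already-hashed routing value.
shard_for() {
  local hash=$1 routing_shards=$2 primaries=$3
  local routing_factor=$((routing_shards / primaries))
  echo $(( (hash % routing_shards) / routing_factor ))
}

# Stand-in hash values (assumed); Elasticsearch derives the hash from _routing.
for hash in 7 300 511 512 900 1023; do
  echo "hash=$hash  2 shards -> $(shard_for "$hash" 1024 2)  8 shards -> $(shard_for "$hash" 1024 8)"
done
```

In this run every value that lands on shard 0 of the 2-shard index lands on shards 0 to 3 of the 8-shard index, and every value on shard 1 lands on shards 4 to 7, so no document has to move between unrelated shards.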

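For append-only data, more flexibility is available by creating a new index and pushing new data to it, while adding an alias that covers both the old and the new index for read operations. Assuming the old and new indices have M and N shards respectively, this has no overhead compared with searching a single index that has M + N shards.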

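Preconditions for splitting an index:

1. The target index must not exist.

2. The source index must have fewer primary shards than the target index.

3. The number of primary shards in the target index must be a multiple of the number of primary shards in the source index.

4. The node handling the split must have enough free disk space to hold a second copy of the existing index.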
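The two shard-count rules in the list above can be checked before sending a split request. The following is a minimal sketch; `can_split` is a hypothetical helper, not an Elasticsearch command, and it covers only those two rules. Elasticsearch applies further checks of its own, for example against the index's routing shards setting.

```shell
#!/usr/bin/env bash
# can_split SOURCE_SHARDS TARGET_SHARDS
# Hypothetical helper: checks only the shard-count preconditions for a split.
can_split() {
  local source_shards=$1 target_shards=$2
  # The target must have more primary shards than the source ...
  if [ "$target_shards" -le "$source_shards" ]; then
    echo "no: target must have more primary shards than source"
    return 1
  fi
  # ... and its count must be a multiple of the source count.
  if [ $((target_shards % source_shards)) -ne 0 ]; then
    echo "no: $target_shards is not a multiple of $source_shards"
    return 1
  fi
  echo "ok: each source shard splits into $((target_shards / source_shards))"
}

can_split 2 8 || true   # the 2 -> 8 split used in this article
can_split 2 5 || true   # rejected: 5 is not a multiple of 2
```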

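Procedure

The hands-on part follows.

Tip: test hardware is limited, so every index here is created with 0 replicas; in production use at least 1 replica.

Create an index with 2 primary shards and no replicas: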

curl -s -X PUT "http://localhost:9200/twitter?pretty" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index.number_of_shards": 2,
    "index.number_of_replicas": 0
  },
  "aliases": {
    "my_search_indices": {}
  }
}'

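# index.number_of_shards: number of primary shards
# index.number_of_replicas: number of replica shards; one replica is one extra full copy of the index
# aliases: defines the index alias "my_search_indices"

# Write a few test documents (through the alias)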

curl -s -X PUT "http://localhost:9200/my_search_indices/_doc/11?pretty" -H 'Content-Type: application/json' -d '{
  "id": 11,
  "name":"lee",
  "age":"23"
}'
curl -s -X PUT "http://localhost:9200/my_search_indices/_doc/22?pretty" -H 'Content-Type: application/json' -d '{
  "id": 22,
  "name":"amd",
  "age":"22"
}'

# Query the data

curl -s -XGET "http://localhost:9200/my_search_indices/_search" | jq .

Block writes on the index so that the split below can run:

curl -s -X PUT "http://localhost:9200/twitter/_settings?pretty" -H 'Content-Type: application/json' -d '{
  "settings": {
    "index.blocks.write": true
  }
}'

# index.blocks.write: write block; the index can be read but not written

# Test a write to confirm that the write block is in effect

curl -s -X PUT "http://localhost:9200/twitter/_doc/33?pretty" -H 'Content-Type: application/json' -d '{
  "id": 33,
  "name":"amd",
  "age":"33"
}'

# The write is rejected

# Remove the alias from the twitter index

curl -s -X POST "http://localhost:9200/_aliases?pretty" -H 'Content-Type: application/json' -d '{
    "actions" : [
        { "remove" : { "index" : "twitter", "alias" : "my_search_indices" } }
    ]
}'

curl -s -X GET "http://localhost:9200/_cat/aliases"

Alternative method:

# Remove the index alias
curl -s -X DELETE "http://localhost:9200/twitter/_alias/my_search_indices"

curl -s -X GET "http://localhost:9200/_cat/aliases"

Run the split. The resulting index is named new_twitter and has 8 primary shards:

curl -s -X POST "http://localhost:9200/twitter/_split/new_twitter?pretty" -H 'Content-Type: application/json' -d '{
  "settings": {
    "index.number_of_shards": 8,
    "index.number_of_replicas": 0
  }
}'

# Add the alias to the new index

curl -s -X POST "http://localhost:9200/_aliases?pretty" -H 'Content-Type: application/json' -d '{
    "actions" : [
        { "add" : { "index" : "new_twitter", "alias" : "my_search_indices" } }
    ]
}'

Alternative method:

# Create the index alias
curl -s -X PUT "http://localhost:9200/new_twitter/_alias/my_search_indices"

Response returned by the split request:

{
 "acknowledged" : true,
 "shards_acknowledged" : true,
 "index" : "new_twitter"
}
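In the steps above the alias is removed from twitter before the split and added to new_twitter afterwards, so for a while it points at nothing. The `_aliases` API accepts several actions in one request and applies them atomically, so if the alias does not need to be detached beforehand, the switch could instead be done in one call after the split. The sketch below only builds the request body; `alias_swap_body` is a hypothetical helper, and the commented curl call assumes the same localhost:9200 cluster used in this article.

```shell
#!/usr/bin/env bash
# alias_swap_body OLD_INDEX NEW_INDEX ALIAS
# Hypothetical helper: prints an _aliases request body that moves ALIAS
# from OLD_INDEX to NEW_INDEX in a single request.
alias_swap_body() {
  local old_index=$1 new_index=$2 alias=$3
  printf '{"actions":[{"remove":{"index":"%s","alias":"%s"}},{"add":{"index":"%s","alias":"%s"}}]}\n' \
    "$old_index" "$alias" "$new_index" "$alias"
}

alias_swap_body twitter new_twitter my_search_indices

# To apply it (assumes the cluster from this article on localhost:9200):
# curl -s -X POST "http://localhost:9200/_aliases?pretty" \
#   -H 'Content-Type: application/json' \
#   -d "$(alias_swap_body twitter new_twitter my_search_indices)"
```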

Additional notes:

Query the data in the new index; it can be read normally:

curl -s -XGET "http://localhost:9200/my_search_indices/_search" | jq .

To watch the progress of the split, use the _cat/recovery API or check the cerebro UI.

curl -s -X GET "http://localhost:9200/_cat/recovery"
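The full `_cat/recovery` output is wide, so it can help to request only a few columns and count the shards that are still recovering. This is a rough sketch: `pending_recoveries` is a hypothetical helper, the sample rows are made up, and the commented call assumes the index name and host used in this article. The `h=index,shard,stage` parameter selects columns that the cat recovery API provides.

```shell
#!/usr/bin/env bash
# pending_recoveries
# Hypothetical helper: reads lines of "index shard stage" on stdin and
# prints how many shards have not yet reached the "done" stage.
pending_recoveries() {
  awk 'NF >= 3 && $3 != "done" { n++ } END { print n + 0 }'
}

# Sample input in the shape of _cat/recovery?h=index,shard,stage (made-up values).
printf '%s\n' \
  'new_twitter 0 done' \
  'new_twitter 1 index' \
  'new_twitter 2 done' | pending_recoveries

# Against a live cluster (assumes localhost:9200 as in this article):
# curl -s "http://localhost:9200/_cat/recovery/new_twitter?h=index,shard,stage" | pending_recoveries
```

A result of 0 means every listed shard has finished recovering.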

# Test a write to the new index: it fails, because the write block was carried over from the source index

curl -s -X PUT "localhost:9200/my_search_indices/_doc/33?pretty" -H 'Content-Type: application/json' -d '{
  "id": 33,
  "name":"amd",
  "age":"33"
}'
# The write is rejected

# Re-enable writes on the index

curl -s -X PUT "localhost:9200/my_search_indices/_settings?pretty" -H 'Content-Type: application/json' -d '{
  "settings": {
    "index.blocks.write": false 
  }
}'

# Test a write to the new index again: this time it succeeds

curl -s -X PUT "localhost:9200/my_search_indices/_doc/33?pretty" -H 'Content-Type: application/json' -d '{
  "id": 33,
  "name":"amd",
  "age":"33"
}'

curl -s -X PUT "localhost:9200/my_search_indices/_doc/44?pretty" -H 'Content-Type: application/json' -d '{
  "id": 44,
  "name":"intel",
  "age":"4"
}'

# At this point the old index is still read-only. Once the new index is confirmed to be healthy, the old twitter index can be closed or deleted.

Test writing new data through the alias:

curl -s -X PUT "localhost:9200/my_search_indices/_doc/44?pretty" -H 'Content-Type: application/json' -d '{
  "id": 44,
  "name":"amd",
  "age":"44"
}'


This write succeeds as well.

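Delete the index. Note that the command below removes the new index new_twitter, which only makes sense as cleanup for this test; in production, the index to retire after a successful split is the old twitter index.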

curl -s -X DELETE "http://localhost:9200/new_twitter"

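Summary

A screenshot of the indices taken after running this in production shows that each shard of the new index is only half the size of a shard in the old index, which spreads the load on the index: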