[Translation] Escaping Elasticsearch Cluster RED Headaches with the Explain API

Date: 2019-11-10

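Original article: https://www.elastic.co/blog/r...html

"Beep... beep... beep!" PagerDuty is paging you again. Perhaps a node crashed, a server rack became unavailable, or the whole Elasticsearch cluster restarted. Whatever the cause, the cluster status is now RED: some shards cannot be assigned to a node, so part of your data is unavailable.

Situations like this always arrive unannounced. So what do you do?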

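In earlier versions of Elasticsearch, finding out why shards were unavailable usually took the analytical skills of a bomb-disposal expert. You had to work through the cluster state API, cat-shards API, cat-allocation API, cat-indices API, indices-recovery API, indices-shard-stores API and more to assess the cluster state and track down the likely root cause.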

Fortunately, things are much better now: with the cluster-allocation-explain API alone, you can readily analyze the current state of shard allocation.

The cluster-allocation-explain API was first introduced in Elasticsearch 5.0 and was reworked in version 5.2. It is mainly meant to help answer two questions:

For unassigned shards: explain why they cannot be assigned to a node.
For assigned shards: explain why they were assigned to a particular node.

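Note that shard allocation problems should not happen often in a cluster. They usually stem from node or cluster configuration problems (for example, misconfigured shard allocation filtering), from every node that holds a copy of a shard being disconnected from the cluster, from disk problems, and the like. When such a problem does occur, the cluster administrator needs the right tool to locate it and bring the cluster back to a healthy state, and that is exactly what the cluster allocation explain API provides.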

This article walks through several concrete examples of using the explain API to pinpoint shard allocation problems.

What is shard allocation?

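Shard allocation is the process of assigning a shard to a node in the cluster. To handle large volumes of documents and provide high availability, Elasticsearch splits an index's documents into shards and distributes those shards across different nodes in the cluster.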

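When a primary shard fails to allocate, the index is missing data and cannot accept new writes.
When a replica shard fails to allocate, the cluster risks losing data if the corresponding primary shard is lost for good (for example, through a disk failure).
When shards are allocated to slower nodes, indices with heavy traffic suffer from those slow shards, and cluster performance degrades.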

Allocating shards and assigning them to the best possible node is therefore a fundamental piece of Elasticsearch's internals.

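The allocation process differs between newly created indices and existing ones. In both cases, though, Elasticsearch relies on two basic components: allocators and deciders. Allocators try to find the best node for a shard, while deciders judge whether that allocation is permitted.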

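For a newly created index, the allocators list the nodes in ascending order of the number of shards they hold, so nodes with fewer shards are preferred. The allocators' goal here is to spread the new index's shards across the cluster as evenly as possible. The deciders then walk through the nodes proposed by the allocators and decide, node by node, whether the shard may be allocated there. For example, if an allocation filtering rule forbids node A from holding any shard of index idx, the filter decider blocks allocation of idx's shards to node A, even if A is the best node from a load-balancing point of view. Note that the allocators only look at the number of shards on each node, not at the size of each shard. Size is handled by the deciders: one of their jobs is to block allocation to a node when the shard would push that node over its disk usage threshold.
For an existing index, primary and replica shards are treated differently. For a primary shard, the allocators only permit allocation to a node that already holds a good copy of the shard. If they did otherwise and assigned the primary to a node without the latest data, the cluster would lose data. For a replica shard, the allocators first check whether any other node already holds a copy of the shard's data (even a stale one). If such nodes exist, the allocators prefer one of them. A replica must sync from the primary shard once it is allocated, so a node that already holds part of the shard's data only needs to copy the missing part from the primary. This can markedly speed up replica recovery.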
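The division of labor between allocators and deciders for a new index can be sketched in a few lines of Python. This is a simplified illustration only, not Elasticsearch's actual implementation: the `Node` class, `rank_nodes`, `filter_decider`, and `allocate_new_shard` are hypothetical names invented for this example, and the real deciders cover many more rules (disk thresholds, same-shard checks, and so on).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    shard_count: int  # number of shards already on this node
    excluded_indices: set = field(default_factory=set)  # indices filtered away from this node


def rank_nodes(nodes):
    """Allocator step: order candidate nodes by shard count, fewest first."""
    return sorted(nodes, key=lambda n: n.shard_count)


def filter_decider(node, index):
    """Decider step: answer NO when a filter rule excludes the index from the node."""
    return "NO" if index in node.excluded_indices else "YES"


def allocate_new_shard(nodes, index, deciders=(filter_decider,)):
    """Return the first ranked node that every decider accepts, or None."""
    for node in rank_nodes(nodes):
        if all(decider(node, index) == "YES" for decider in deciders):
            return node
    return None


nodes = [
    Node("A", shard_count=2, excluded_indices={"idx"}),
    Node("B", shard_count=5),
]
chosen = allocate_new_shard(nodes, "idx")
print(chosen.name)
```

Node A ranks first because it holds fewer shards, but the filter decider rejects it, so the shard lands on node B. If every node is rejected, the function returns None, which corresponds to a shard that stays unassigned.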

Diagnosing unassigned primary shards

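An unassigned primary shard is about the worst thing that can happen in Elasticsearch. If the unassigned shard belongs to a newly created index, no data can be indexed into that shard. If it belongs to an existing index, not only can no data be indexed, but previously indexed data cannot be searched either.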

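Let's start by creating an index named test_idx on a cluster with two nodes, A and B, giving it 1 shard and no replicas. At creation time we also set an allocation filtering rule so that the index cannot be placed on node A or node B. The create-index command is as follows: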

PUT /test_idx?wait_for_active_shards=0
{
    "settings":
    {
        "number_of_shards": 1,
        "number_of_replicas": 0,
        "index.routing.allocation.exclude._name": "A,B"
    }
}

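Although the index is created successfully, the filter rule prevents any of its shards from being allocated to A or B, the only two nodes in the cluster. This example is contrived and may sound unlikely in practice, but shards really do end up unassigned because of misconfigured allocation filtering settings.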

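At this point the cluster is RED. We can use the explain API to get information about the first unassigned shard it finds (in this example, the cluster has only one unassigned shard).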

GET /_cluster/allocation/explain

The output is as follows:

{
  "index" : "test_idx",
  "shard" : 0,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "INDEX_CREATED", 
    "at" : "2017-01-16T18:12:39.401Z",
    "last_allocation_status" : "no"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",   
  "node_allocation_decisions" : [ 
    {
      "node_id" : "tn3qdPdnQWuumLxVVjJJYQ",
      "node_name" : "A", 
      "transport_address" : "127.0.0.1:9300",
      "node_decision" : "no",
      "weight_ranking" : 1,
      "deciders" : [
        {
          "decider" : "filter",  
          "decision" : "NO", 
          "explanation" : "node matches index setting [index.routing.allocation.exclude.] filters [_name:\"A OR B\"]" 
        }
      ]
    },
    {
      "node_id" : "qNgMCvaCSPi3th0mTcyvKQ",
      "node_name" : "B", 
      "transport_address" : "127.0.0.1:9301",
      "node_decision" : "no",
      "weight_ranking" : 2,
      "deciders" : [
        {
          "decider" : "filter",
          "decision" : "NO",
          "explanation" : "node matches index setting [index.routing.allocation.exclude.] filters [_name:\"A OR B\"]"
        }
      ]
    }
  ]
}

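The explain API describes primary shard 0 of index test_idx. The shard is unassigned (current_state) because the index was just created (unassigned_info). It cannot be allocated (can_allocate) because no node is permitted to hold it (allocate_explanation). Looking at the per-node decisions (node_allocation_decisions), we see that because nodes A and B were filtered out when the index was created, the filter decider (decider) decided against allocating the shard to node A (node_decision; the decider's explanation says why), and likewise for node B. The explanation also names the setting that needs to change to resolve the situation.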
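When a cluster has many nodes, reading node_allocation_decisions by eye gets tedious. The sketch below shows one way to condense an explain response into "which decider said NO on which node". The helper name `blocking_deciders` is hypothetical, and the embedded JSON is a trimmed copy of the response shown above rather than live output.

```python
import json

# Trimmed copy of the explain response shown above.
RESPONSE = json.loads("""
{
  "index": "test_idx",
  "shard": 0,
  "primary": true,
  "current_state": "unassigned",
  "can_allocate": "no",
  "node_allocation_decisions": [
    {"node_name": "A", "node_decision": "no",
     "deciders": [{"decider": "filter", "decision": "NO"}]},
    {"node_name": "B", "node_decision": "no",
     "deciders": [{"decider": "filter", "decision": "NO"}]}
  ]
}
""")


def blocking_deciders(explain):
    """Map each node name to the deciders that returned NO for it."""
    blocked = {}
    for node in explain.get("node_allocation_decisions", []):
        names = [d["decider"] for d in node.get("deciders", [])
                 if d["decision"] == "NO"]
        if names:
            blocked[node["node_name"]] = names
    return blocked


summary = blocking_deciders(RESPONSE)
print(summary)
```

For this response, both nodes are blocked by the filter decider, which points straight at the index.routing.allocation.exclude setting.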

Update the allocation filtering configuration through the _settings API:

PUT /test_idx/_settings
{
    "index.routing.allocation.exclude._name": null
}

Running the explain API again then returns the following message:

unable to find any unassigned shards to explain

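In other words, there are no unassigned shards left, because the only shard in test_idx has now been allocated successfully. If we run the explain API for the primary shard specifically, as follows (note that this is a GET request with a body):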

GET /_cluster/allocation/explain
{
    "index": "test_idx",
    "shard": 0,
    "primary": true
}

it returns information about the node the shard is assigned to (output abbreviated):

{
    "index": "test_idx",
    "shard": 0,
    "primary": true,
    "current_state": "started",
    "current_node": {
        "id" : "tn3qdPdnQWuumLxVVjJJYQ",
        "name" : "A",
        "transport_address" : "127.0.0.1:9300",
        "weight_ranking" : 1
    }
}

We can see that the shard has been allocated successfully (started) and is assigned to node A.
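Because the shard-specific explain call is a GET request that carries a JSON body, it is easy to get wrong from a script. Here is a minimal sketch using only the Python standard library. It builds the request without sending it; the base URL `http://localhost:9200`, the lack of authentication, and the helper name `build_explain_request` are assumptions made for this example.

```python
import json
import urllib.request

# Assumption: the cluster listens on localhost:9200 without authentication.
BASE_URL = "http://localhost:9200"


def build_explain_request(index, shard, primary):
    """Build (but do not send) an explain request for one specific shard."""
    body = json.dumps({"index": index, "shard": shard, "primary": primary})
    return urllib.request.Request(
        BASE_URL + "/_cluster/allocation/explain",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="GET",  # without this, urllib would switch to POST because a body is present
    )


request = build_explain_request("test_idx", 0, True)
print(request.get_method(), request.full_url)
# To send it against a live cluster: urllib.request.urlopen(request)
```

Setting `method="GET"` explicitly matters here, since `urllib` otherwise defaults to POST whenever `data` is supplied.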

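Now let's index some data into test_idx so that the primary shard holds some documents. If we then stop node A, the primary shard disappears with it. Because we configured no replicas, the cluster goes RED again. Run the explain API for the primary shard once more: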

GET /_cluster/allocation/explain
{
    "index": "test_idx",
    "shard": 0,
    "primary": true
}

It returns the following:

{
  "index" : "test_idx",
  "shard" : 0,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {             
    "reason" : "NODE_LEFT",    
    "at" : "2017-01-16T17:24:21.157Z",
    "details" : "node_left[qU98BvbtQu2crqXF2ATFdA]",
    "last_allocation_status" : "no_valid_shard_copy"
  },
  "can_allocate" : "no_valid_shard_copy", 
  "allocate_explanation" : "cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster" 
}

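The output tells us that the primary shard is currently unassigned (current_state) because the node it was allocated to has left the cluster (unassigned_info). The shard cannot be allocated because no valid copy of it exists in the cluster (can_allocate), and allocate_explanation gives more detail.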

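The explain API is telling us that the primary shard has no valid shard copy left, meaning that no node holding a usable copy of the shard remains in the cluster. The only thing to do at this point is to wait for the node to come back and rejoin the cluster. In more extreme cases, where the nodes are gone for good, you have to accept the data loss and use reroute commands to allocate an empty primary shard.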
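As a rough illustration of that last resort, the sketch below builds the request body for the cluster reroute API's allocate_empty_primary command. The helper name `empty_primary_command` and the choice of node B as the target are assumptions for this example. The code only prints the body and sends nothing, since running this command against a real cluster permanently discards whatever the shard used to hold.

```python
import json


def empty_primary_command(index, shard, node):
    """Build a reroute body that allocates an empty primary shard on `node`.

    accept_data_loss has to be set explicitly, as an acknowledgment that the
    shard's previous contents will not come back.
    """
    return {
        "commands": [
            {
                "allocate_empty_primary": {
                    "index": index,
                    "shard": shard,
                    "node": node,
                    "accept_data_loss": True,
                }
            }
        ]
    }


body = empty_primary_command("test_idx", 0, "B")
print("POST /_cluster/reroute")
print(json.dumps(body, indent=2))
```

Keeping the command in a small builder like this makes it harder to fire off by accident and easier to review before it is sent.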

Diagnosing unassigned replica shards

Back to the test_idx index from above. Increase its number of replicas to 1:

PUT /test_idx/_settings
{
    "number_of_replicas": 1
}

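test_idx now has 2 shard copies: primary shard 0 and replica shard 0. Since node A already holds the primary, the replica should be assigned to node B to keep the cluster balanced. Now run the explain API for the replica shard (again a GET request):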

GET /_cluster/allocation/explain
{
    "index": "test_idx",
    "shard": 0,
    "primary": false
}

The output is as follows:

{
  "index" : "test_idx",
  "shard" : 0,
  "primary" : false,
  "current_state" : "started",
  "current_node" : {
    "id" : "qNgMCvaCSPi3th0mTcyvKQ",
    "name" : "B",
    "transport_address" : "127.0.0.1:9301",
    "weight_ranking" : 1
  },
  …
}

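The result shows that the replica shard has been allocated to node B.

Next, we set allocation filtering on the index again, but this time we only block allocation to node B:

PUT /test_idx/_settings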
{
    "index.routing.allocation.exclude._name": "B"
}

Restart node B, then run the explain API for the replica shard again. This time the result is:

{
  "index" : "test_idx",
  "shard" : 0,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "NODE_LEFT",
    "at" : "2017-01-16T19:10:34.478Z",
    "details" : "node_left[qNgMCvaCSPi3th0mTcyvKQ]",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no", 
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "qNgMCvaCSPi3th0mTcyvKQ",
      "node_name" : "B",
      "transport_address" : "127.0.0.1:9301",
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "filter",  
          "decision" : "NO",
          "explanation" : "node matches index setting [index.routing.allocation.exclude.] filters [_name:\"B\"]" 
        }
      ]
    },
    {
      "node_id" : "tn3qdPdnQWuumLxVVjJJYQ",
      "node_name" : "A",
      "transport_address" : "127.0.0.1:9300",
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "same_shard",  
          "decision" : "NO",
          "explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists [[test_idx][0], node[tn3qdPdnQWuumLxVVjJJYQ], [P], s[STARTED], a[id=JNODiTgYTrSp8N2s0Q7MrQ]]" 
        }
      ]
    }
  ]
}

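The result shows that the replica shard currently cannot be allocated (can_allocate). The allocation filtering rule forbids placing it on node B (explanation), and because node A already holds the primary, another copy of the same shard may not be assigned to node A (explanation). Keeping two identical copies of the data on one machine serves no purpose, so Elasticsearch refuses to do it.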

Analyzing assigned shards

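If a shard is allocated normally, why care about its explain output? A common reason is that a shard (primary or replica) is already on one node, you then use allocation filtering to move it to another node (perhaps you are trying out a hot-warm architecture), and yet for some reason the shard stays where it is. This is exactly where the explain API helps clarify what is going on with shard allocation.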

First, let's clear the allocation filtering settings on test_idx so that both the primary and the replica can be allocated normally:

PUT /test_idx/_settings
{
    "index.routing.allocation.exclude._name": null
}

Now we set a new filter rule to move the primary shard off its current node:

PUT /test_idx/_settings
{
    "index.routing.allocation.exclude._name": "A"
}

We expect this filter rule to move the primary shard from its current node A to another node, but that is not what happens. Let's use the explain API to find out why:

GET /_cluster/allocation/explain
{
    "index": "test_idx",
    "shard": 0,
    "primary": true
}

The output is as follows:

{
  "index" : "test_idx",
  "shard" : 0,
  "primary" : true,
  "current_state" : "started",
  "current_node" : {
    "id" : "tn3qdPdnQWuumLxVVjJJYQ",
    "name" : "A",  
    "transport_address" : "127.0.0.1:9300"
  },
  "can_remain_on_current_node" : "no", 
  "can_remain_decisions" : [   
    {
      "decider" : "filter",
      "decision" : "NO",
      "explanation" : "node matches index setting [index.routing.allocation.exclude.] filters [_name:\"A\"]"   
    }
  ],
  "can_move_to_other_node" : "no", 
  "move_explanation" : "cannot move shard to another node, even though it is not allowed to remain on its current node",
  "node_allocation_decisions" : [
    {
      "node_id" : "qNgMCvaCSPi3th0mTcyvKQ",
      "node_name" : "B",
      "transport_address" : "127.0.0.1:9301",
      "node_decision" : "no",
      "weight_ranking" : 1,
      "deciders" : [
        {
          "decider" : "same_shard", 
          "decision" : "NO",
          "explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists [[test_idx][0], node[qNgMCvaCSPi3th0mTcyvKQ], [R], s[STARTED], a[id=dNgHLTKwRH-Dp-rIX4Hkqg]]" 
        }
      ]
    }
  ]
}

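From the result we can see that the primary shard is still on node A (current_node). The cluster clearly states that the shard should no longer remain on its current node (can_remain_on_current_node), because that node matches the configured allocation filtering rule (can_remain_decisions). However, the explain API also reports that the shard cannot be moved to another node (can_move_to_other_node): the cluster has only one other node (node B), node B already holds the replica, and the same shard may not be allocated more than once on a single node. So the primary cannot move to B, and therefore cannot leave node A (node_allocation_decisions).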
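The same reading of the response can be done programmatically. The sketch below turns the "can it stay?" and "can it move?" parts of an explain response into short, readable reasons. The function name `why_shard_is_stuck` is hypothetical, and the dictionary is a trimmed copy of the response above, not live output.

```python
def why_shard_is_stuck(explain):
    """Return readable reasons why a started shard can neither stay nor move."""
    reasons = []
    if explain.get("can_remain_on_current_node") == "no":
        current = explain["current_node"]["name"]
        for d in explain.get("can_remain_decisions", []):
            if d["decision"] == "NO":
                reasons.append(f"cannot remain on {current}: {d['decider']}")
    if explain.get("can_move_to_other_node") == "no":
        for node in explain.get("node_allocation_decisions", []):
            for d in node.get("deciders", []):
                if d["decision"] == "NO":
                    reasons.append(f"cannot move to {node['node_name']}: {d['decider']}")
    return reasons


# Trimmed copy of the explain response shown above.
response = {
    "current_node": {"name": "A"},
    "can_remain_on_current_node": "no",
    "can_remain_decisions": [{"decider": "filter", "decision": "NO"}],
    "can_move_to_other_node": "no",
    "node_allocation_decisions": [
        {"node_name": "B",
         "deciders": [{"decider": "same_shard", "decision": "NO"}]}
    ],
}
for line in why_shard_is_stuck(response):
    print(line)
```

Here the filter decider says the shard may not stay on A, while the same_shard decider says it may not go to B, which is the deadlock described above.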

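Summary

In this article we walked through three different scenarios to help Elasticsearch administrators understand shard allocation in their clusters with the explain API. The explain API has other uses as well, such as showing node weights to explain why a shard sits on its current node instead of being rebalanced to another one. It is a powerful tool for diagnosing shard allocation in production clusters. It has already helped us enormously and saved us a lot of time during the development of Elasticsearch itself, and many of our customers have benefited from it when diagnosing the state of their clusters.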