Elasticsearch: Deploying a Hot-Warm (Hot/Cold) Architecture

Date: 2020-04-18


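Background

I have recently been working on storing order data in Elasticsearch. Because the data volume is fairly large, I adopted a hot/cold architecture: a new index is created every month, data is first written to the hot index, and a tool automatically migrates indices older than three months to the cold nodes.

Elasticsearch version: 6.2.4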

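Hot/Cold Architecture

Official name: the "Hot-Warm" Architecture.

In plain terms: hot nodes hold the hot data that users care about most; warm or cold nodes hold the warm or cold data that users care about less or that has lower priority.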

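1.1 The official explanation of the hot/cold architecture

To guarantee Elasticsearch read and write performance, the official recommendation is to use SSDs. However, Elasticsearch is meant to store and search massive data sets, and massive data means a large amount of storage space. If all of it sits on SSDs, cost becomes a major problem, and this is one of the factors that keeps many companies and individuals from using Elasticsearch. The hot/cold separation architecture emerged to solve this problem.

The hot/cold architecture is a powerful feature that lets you divide an Elasticsearch deployment into "hot" data nodes and "cold" data nodes.

Hot data nodes handle all newly ingested data and use faster storage, so that data can be ingested and retrieved quickly.
Cold nodes have higher storage density and are a cost-effective way to keep log data over a longer retention period.

Combining the two types of data nodes lets you process incoming data efficiently and make it available for queries, while still retaining data for a long time at lower cost. This architecture is especially helpful for logging use cases, where most attention goes to recent logs (for example, the last two weeks), while older logs (which must still be kept for compliance or other reasons) can tolerate slower query times.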

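1.2 Typical use case

In one sentence: on a limited budget, isolate the real-time data customers care about from historical data at the hardware level, to address customer complaints about slow response times as far as possible. Business scenario:
Log data grows by 6 TB per day. Write and query rates are both high at peak times, the cluster is under heavy load, and queries against ES are often slow.

Index write and query speed in an ES cluster depends mainly on disk I/O speed. The key to hot/cold separation is storing hot data on SSDs to improve query efficiency.
Using SSDs for everything costs too much and is wasteful for cold data, so mixing ordinary SATA disks with SSDs makes full use of the resources while substantially improving performance.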

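How It Works

The architecture relies on Elasticsearch's shard allocation policy. More precisely:

First: at the cluster node level, nodes can be assigned a type, which is the prerequisite for separating hot and warm nodes.

This is done by adding the following setting to elasticsearch.yml: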

node.attr.{attribute}: {value}

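Here attribute is any user-defined tag name and value is that node's value for the tag. For hot/cold separation, for example, you can use the following settings (one per node):

node.attr.temperature: hot     # on a hot node
node.attr.temperature: cold    # on a cold node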

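Second: at the index level, data can be routed to given nodes, which guarantees that data is written to the intended hot or cold nodes.

This is done by specifying the following settings when creating a template or an index:

index.routing.allocation.include.{attribute}    // the index may be allocated to nodes whose {attribute} has at least one of the listed values
index.routing.allocation.require.{attribute}    // the index must be allocated to nodes whose {attribute} has all of the listed values (usually a single value)
index.routing.allocation.exclude.{attribute}    // the index must not be allocated to nodes whose {attribute} has any of the listed values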
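
The three filters above can be modelled in a few lines. The following is a simplified Python sketch, not Elasticsearch's actual implementation: the function name node_matches is hypothetical, wildcards are ignored, and each node attribute is assumed to hold a single value.

```python
def node_matches(node_attrs, include=None, require=None, exclude=None):
    """Return True if a node with `node_attrs` may hold shards of an index.

    Each filter maps an attribute name to a comma-separated string of values,
    mirroring index.routing.allocation.{include,require,exclude}.{attribute}.
    Simplified model: no wildcard support.
    """
    def values(spec):
        return [v.strip() for v in spec.split(",") if v.strip()]

    # require: every listed value must match the node's attribute
    for attr, spec in (require or {}).items():
        if any(node_attrs.get(attr) != v for v in values(spec)):
            return False

    # exclude: none of the listed values may match
    for attr, spec in (exclude or {}).items():
        if node_attrs.get(attr) in values(spec):
            return False

    # include: at least one listed value must match
    if include:
        if not any(node_attrs.get(attr) in values(spec)
                   for attr, spec in include.items()):
            return False

    return True


hot_node = {"hotwarm_type": "hot"}
cold_node = {"hotwarm_type": "cold"}
print(node_matches(hot_node, require={"hotwarm_type": "hot"}))        # True
print(node_matches(cold_node, require={"hotwarm_type": "hot"}))       # False
print(node_matches(cold_node, include={"hotwarm_type": "hot,cold"}))  # True
```

Because a node attribute has only one value here, a require filter with several values can never match, which is why require is normally given a single value.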

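Implementation

1.1 Cluster design:

Node name	Server spec	Data stored
es-master1	4C 16G 1T SATA	Metadata
es-master2	4C 16G 1T SATA	Metadata
es-master3	4C 16G 1T SATA	Metadata
es-hot1	16C 64G 1T SSD	Hot
es-hot2	16C 64G 1T SSD	Hot
es-hot3	16C 64G 1T SSD	Hot
es-cold1	8C 32G 5T SATA	Cold
es-cold2	8C 32G 5T SATA	Cold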

2.1 Configure the master nodes

Master1 node configuration (the other master nodes are configured similarly)

[root@es-master1 ~]# cd /etc/elasticsearch/
[root@es-master1 elasticsearch]# vim elasticsearch.yml
cluster.name: linuxplus
node.name: es-master1.linuxplus.com
node.attr.rack: r6
node.master: true
node.data: false
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
discovery.zen.ping.unicast.hosts: ["es-master1.linuxplus.com:9300","es-master2.linuxplus.com:9300","es-master3.linuxplus.com:9300","es-hot1.linuxplus.com:9300","es-hot2.linuxplus.com:9300","es-hot3.linuxplus.com:9300","es-stale1.linuxplus.com:9300","es-stale2.linuxplus.com:9300"]
discovery.zen.minimum_master_nodes: 2    # quorum of the 3 master-eligible nodes; a value of 1 risks split brain
bootstrap.system_call_filter: false
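
With Zen discovery in Elasticsearch 6.x, discovery.zen.minimum_master_nodes is normally sized as a quorum of the master-eligible nodes, that is (master_eligible / 2) + 1 with integer division. A minimal Python sketch of that arithmetic (the function name is hypothetical):

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum of master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible // 2 + 1


# The cluster in this article has three master-eligible nodes.
print(minimum_master_nodes(3))  # 2
```

With three master nodes the quorum is 2, so the cluster keeps a master when one master node fails, while two isolated halves can never both elect one.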

2.2 Configure the hot nodes

Hot1 node configuration (the other hot nodes are configured similarly)

[root@es-hot1 elasticsearch]# vim elasticsearch.yml
cluster.name: linuxplus
node.name: es-hot1.linuxplus.com     # Note: change the name on each of the other nodes
node.attr.rack: r1
node.master: false
node.data: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 10.10.10.24            # Note: change the IP on each of the other nodes
discovery.zen.ping.unicast.hosts: ["es-master1.linuxplus.com:9300","es-master2.linuxplus.com:9300","es-master3.linuxplus.com:9300"]
discovery.zen.minimum_master_nodes: 2    # quorum of the 3 master-eligible nodes
bootstrap.system_call_filter: false
node.attr.hotwarm_type: hot          # marks this node as a hot data node
[root@es-hot1 elasticsearch]# /etc/init.d/elasticsearch start

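2.3 Configure the cold nodes

Cold1 node configuration (the host is named es-stale1 in this configuration; the other cold nodes are configured similarly)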

[root@es-stale1 elasticsearch]# vim elasticsearch.yml
cluster.name: linuxplus
node.name: es-stale1.linuxplus.com   # Note: change the name on each of the other nodes
node.attr.rack: r1
node.master: false
node.data: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 10.10.10.27            # Note: change the IP on each of the other nodes
discovery.zen.ping.unicast.hosts: ["es-master1.linuxplus.com:9300","es-master2.linuxplus.com:9300","es-master3.linuxplus.com:9300"]
discovery.zen.minimum_master_nodes: 2    # quorum of the 3 master-eligible nodes
bootstrap.system_call_filter: false
node.attr.hotwarm_type: cold         # marks this node as a cold data node
[root@es-stale1 elasticsearch]# /etc/init.d/elasticsearch start

3.1 Writing data

Option 1: assign the hot/cold data nodes through an index template

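# New order_* indices default to the hot data nodes
PUT _template/order_template
{
    "index_patterns": ["order_*"],
    "settings": {
        "index.routing.allocation.require.hotwarm_type": "hot",
        "index.number_of_replicas": "0"
    }
}

Note: every index whose name starts with "order_" will have its data placed on the hot nodes.

Option 2: assign the hot/cold data nodes on the index itself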

# Pin this index to the hot data nodes
PUT /order_2019-12
{
  "settings": {
    "index.routing.allocation.require.hotwarm_type": "hot",
    "number_of_replicas": 0
  }
}
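
Since a new index is created every month, the index name and the creation request can be derived from the date. A minimal Python sketch, assuming the order_YYYY-MM naming used above; the helper names and the "order" prefix are illustrative, and the request is only built, not sent:

```python
import json
from datetime import date


def monthly_index(prefix: str, day: date) -> str:
    """Name of the monthly index that `day` belongs to, e.g. order_2019-12."""
    return f"{prefix}_{day:%Y-%m}"


def create_index_request(prefix: str, day: date, tier: str = "hot", replicas: int = 0):
    """Build (method, path, JSON body) for creating the month's index on `tier`."""
    body = {
        "settings": {
            "index.routing.allocation.require.hotwarm_type": tier,
            "number_of_replicas": replicas,
        }
    }
    return "PUT", "/" + monthly_index(prefix, day), json.dumps(body)


method, path, body = create_index_request("order", date(2019, 12, 5))
print(method, path)  # PUT /order_2019-12
print(body)
```

When the index template from Option 1 is installed, the explicit settings are redundant, because any index matching order_* already inherits the hot routing.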

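Result on the hot nodes (figure): two indices were created, each with 3 shards and 1 replica.

4.1 Migrating data to the cold nodes

Option 1: manually change the index routing to cold

When Elasticsearch sees the new tag, it automatically relocates the index to the cold data nodes.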

# Run in Kibana: route the index's data to the cold data nodes

PUT /order_stpprdinf_2019-12/_settings
{
  "settings": {
    "index.routing.allocation.require.hotwarm_type": "cold"
  }
}
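
Relocation runs in the background, so it can be useful to check when it has finished. A minimal Python sketch, assuming the rows come from GET _cat/shards/<index>?format=json (a list of objects with index, state, and node fields); the function name and the sample rows are made up for illustration:

```python
def shards_pending(shard_rows, index, allowed_nodes):
    """Return the shards of `index` that are not yet STARTED on an allowed node."""
    pending = []
    for row in shard_rows:
        if row["index"] != index:
            continue
        if row["state"] != "STARTED" or row["node"] not in allowed_nodes:
            pending.append(row)
    return pending


# Illustrative data only, shaped like the _cat/shards JSON output.
rows = [
    {"index": "order_stpprdinf_2019-12", "shard": "0", "prirep": "p",
     "state": "STARTED", "node": "es-stale1.linuxplus.com"},
    {"index": "order_stpprdinf_2019-12", "shard": "1", "prirep": "p",
     "state": "STARTED", "node": "es-hot2.linuxplus.com"},
]
cold_nodes = {"es-stale1.linuxplus.com", "es-stale2.linuxplus.com"}

left = shards_pending(rows, "order_stpprdinf_2019-12", cold_nodes)
print(len(left))  # 1 (shard 1 is still on a hot node)
```

An empty result means every shard of the index is running on a cold node.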

Option 2: migrate data periodically with a shell script

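#!/bin/bash
# Migrate hot data to the cold nodes (hot data is kept for 7 days).
# The date format targets daily indices named <prefix>_YYYY.MM.DD.

Time=$(date -d "1 week ago" +"%Y.%m.%d")
Hostname=$(hostname)
arr=("order_stpprdinf" "order_stppayinf")
for var in "${arr[@]}"
do
    # Braces are required: "$var_" would expand an unset variable named var_.
    curl -H "Content-Type: application/json" -XPUT "http://${Hostname}:9200/${var}_${Time}/_settings?pretty" -d'
    {
       "settings": {
             "index.routing.allocation.require.hotwarm_type": "cold"
        }
    }'
done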
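
The same job can be sketched in Python using only the standard library. This mirrors the shell script above under the same assumptions (daily indices named <prefix>_YYYY.MM.DD, HTTP on port 9200, no authentication); the function names and the localhost default are illustrative:

```python
import json
import urllib.request
from datetime import date, timedelta


def migration_requests(prefixes, today, host="localhost", days=7):
    """Build (url, body) pairs that re-route the indices from `days` ago to cold."""
    stamp = (today - timedelta(days=days)).strftime("%Y.%m.%d")
    body = json.dumps(
        {"settings": {"index.routing.allocation.require.hotwarm_type": "cold"}}
    )
    return [(f"http://{host}:9200/{p}_{stamp}/_settings", body) for p in prefixes]


def send(url, body):
    """Send one PUT _settings request and return the response text."""
    req = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


requests_to_send = migration_requests(
    ["order_stpprdinf", "order_stppayinf"], date(2020, 4, 18)
)
for url, _ in requests_to_send:
    print(url)
# http://localhost:9200/order_stpprdinf_2020.04.11/_settings
# http://localhost:9200/order_stppayinf_2020.04.11/_settings
```

Building the requests separately from sending them makes the date arithmetic easy to check before anything touches the cluster; call send(url, body) for each pair to apply the change.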

Option 3: migrate data periodically with Curator

Step 1: create config.yml and fill in the Elasticsearch cluster connection details.

# Remember, leave a key empty if there is no value.  None will be a string,
# not a Python "NoneType"
client:
  hosts: ["10.0.101.100", "10.0.101.101", "10.0.101.102"]
  port: 9200
  url_prefix:
  use_ssl: False
  certificate:
  client_cert:
  client_key:
  ssl_no_validate: False
  http_auth:
  timeout: 30
  master_only: False

logging:
  loglevel: INFO
  logfile: /opt/elasticsearch-curator/logs/run.log
  logformat: default
  blacklist: ['elasticsearch', 'urllib3']

Step 2: create action.yml

# Remember, leave a key empty if there is no value.  None will be a string,
# not a Python "NoneType"
#
# Also remember that all examples have 'disable_action' set to True.  If you
# want to use this action as a template, be sure to set this to False after
# copying it.
actions:
  1:
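    action: allocation                 # the action type is shard allocation (it does not delete indices)
    description: >-
      Apply shard allocation routing to 'require' 'hotwarm_type=cold' for hot/cold node
      setup for order_ indices older than 3 months, based on the date in the index name.
    options:
      key: hotwarm_type                # the attribute defined on the ES nodes
      value: cold                      # the new value, which moves the index to the cold nodes
      allocation_type: require         # the allocation type
      disable_action: false
    filters:
    - filtertype: pattern
      kind: prefix                     # matches indices whose names start with "order_"; regex and other kinds are also supported, see the official docs
      value: order_
    - filtertype: age                  # matches by age
      source: name                     # age is derived from the index name; other sources such as a field are also available, see the official docs
      direction: older
      timestring: "%Y-%m"              # used to match and extract the timestamp in the index or snapshot name
      unit: months                     # months here; days, weeks, etc. are also available. The total age is unit * unit_count
      unit_count: 3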
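
To see which indices the two filters above would select, the logic can be approximated in Python. This is a rough sketch, not Curator's code: it assumes a month is counted as a fixed 30 days (believed to match Curator's age arithmetic, but worth verifying against the Curator documentation), and the function name is hypothetical.

```python
import re
from datetime import datetime, timezone

# Assumed unit lengths in seconds; "months" as 30 days is an assumption.
SECONDS_PER_UNIT = {"days": 86400, "weeks": 7 * 86400, "months": 30 * 86400}


def older_than(index_names, prefix, unit, unit_count, now):
    """Select indices starting with `prefix` whose %Y-%m date is older than the cutoff."""
    cutoff = now.timestamp() - SECONDS_PER_UNIT[unit] * unit_count
    selected = []
    for name in index_names:
        if not name.startswith(prefix):      # pattern filter, kind: prefix
            continue
        match = re.search(r"\d{4}-\d{2}", name)
        if match is None:                    # no timestamp in the name
            continue
        stamp = datetime.strptime(match.group(), "%Y-%m").replace(tzinfo=timezone.utc)
        if stamp.timestamp() < cutoff:       # age filter, direction: older
            selected.append(name)
    return selected


names = ["order_2019-12", "order_2020-01", "order_2020-02", "order_2020-04",
         "logstash-2019-01"]
now = datetime(2020, 4, 18, tzinfo=timezone.utc)
print(older_than(names, "order_", "months", 3, now))
# ['order_2019-12', 'order_2020-01']
```

Under the 30-day assumption the cutoff on 2020-04-18 is 2020-01-19, so the January index already counts as older than three months because its timestamp is the first day of the month.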

Step 3: run Curator

Single run:

cd /opt/elasticsearch-curator
curator --config config.yml action.yml

Run as a scheduled cron job:

crontab -e
# Add the following entry to run once a day at 00:00
0 0 */1 * * curator --config /opt/elasticsearch-curator/config.yml /opt/elasticsearch-curator/action.yml

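Result after migration to the cold nodes (figure):

Application

Because the data is split into multiple indices by time, a query can span several indices. Scoring, sorting, and paging work the same way as when searching a single index.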

    /**
     * Query.
     *
     * @param indexName    index names
     * @param type         index type
     * @param conditionMap map of query conditions
     * @param orderByMap   map of sort orders
     * @param page         pagination info
     * @return query results
     */
    @Override
    public List<Map<String, Object>> query(final String[] indexName, final String type,
                                           final Map<String, Object> conditionMap, final Map<String, String> orderByMap,
                                           final Page page) {
        logger.info("Querying Elasticsearch data......");
        logger.info("indexName={}", Arrays.toString(indexName));
        logger.info("conditionMap={}", conditionMap.toString());
        logger.info("orderByMap={}", orderByMap.toString());

        final long currentTimeMillis = System.currentTimeMillis();
        RestHighLevelClient client = null;
        List<Map<String, Object>> resultList = new ArrayList<>();
        try {
            // 1. Create the connection
            client = createConnect();

            // 2. Build the search request
            SearchRequest searchRequest = new SearchRequest(indexName);
            searchRequest.types(type);


            // ... several hundred lines of code omitted here ...

}
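
The list of index names passed to a multi-index search can be derived from the query's date range. A minimal Python sketch, assuming the monthly order_YYYY-MM naming from this article; the helper names and the "order" prefix are illustrative:

```python
from datetime import date


def indices_between(prefix, start, end):
    """Monthly index names covering every month from `start` to `end`, inclusive."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}_{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


def search_path(index_names):
    """REST path that searches several indices at once (comma-separated names)."""
    return "/" + ",".join(index_names) + "/_search"


names = indices_between("order", date(2019, 11, 20), date(2020, 1, 5))
print(names)               # ['order_2019-11', 'order_2019-12', 'order_2020-01']
print(search_path(names))  # /order_2019-11,order_2019-12,order_2020-01/_search
```

Restricting the search to the months that the query actually covers avoids touching shards on the cold nodes when only recent data is needed.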

References:

铭毅天下: Elasticsearch Hot/Cold Cluster Architecture in Practice

http://www.javashuo.com/article/p-ttujyktn-bb.html

https://cloud.tencent.com/developer/article/1544261

https://elasticsearch.cn/article/6127#tip3