Elasticsearch from Basics to Advanced (7) Search: what _search means, multi-index search, paginated search and the deep-paging performance problem, query string search syntax, and the _all metadata

Date: 2019-11-06



What _search means

Analysis of the data returned by a _search query

GET _search

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 16,
    "successful": 16,
    "failed": 0
  },
  "hits": {
    "total": 19,
    "max_score": 1,
    "hits": [
      {
        "_index": ".kibana",
        "_type": "config",
        "_id": "5.2.0",
        "_score": 1,
        "_source": {
          "buildNum": 14695
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "AWypxxLYFCl_S-ox4wvd",
        "_score": 1,
        "_source": {
          "test_content": "my test"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "8",
        "_score": 1,
        "_source": {
          "test_field": "test client 2"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_doc",
        "_id": "10",
        "_score": 1,
        "_source": {
          "test_field": "test10 routing _id"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_doc",
        "_id": "11",
        "_score": 1,
        "_routing": "12",
        "_source": {
          "test_field": "test routing not _id"
        }
      },
      {
        "_index": "ecommerce",
        "_type": "product",
        "_id": "2",
        "_score": 1,
        "_source": {
          "name": "jiajieshi yagao",
          "desc": "youxiao fangzhu",
          "price": 25,
          "producer": "jiajieshi producer",
          "tags": [
            "fangzhu"
          ]
        }
      },
      {
        "_index": "ecommerce",
        "_type": "product",
        "_id": "4",
        "_score": 1,
        "_source": {
          "name": "special yagao",
          "desc": "special meibai",
          "price": 50,
          "producer": "special yagao producer",
          "tags": [
            "meibai"
          ]
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "6",
        "_score": 1,
        "_source": {
          "test_field": "test test"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "4",
        "_score": 1,
        "_source": {
          "test_field": "test4"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "2",
        "_score": 1,
        "_source": {
          "test_field": "replaces test2"
        }
      }
    ]
  }
}


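took: how long the whole search request took, in milliseconds
timed_out: whether the request timed out
hits.total: the total number of matching documents. In ES 7.x and later this is an object: value is the count, and relation says how value relates to the true count (eq means exact, gte means a lower bound). In older versions, as in the response above, it is a plain integer.
hits.max_score: the highest relevance score among all results of this search. Every document gets a _score for how relevant it is to the search; the more relevant it is, the higher its _score and the higher it ranks.
hits.hits: the list of matching documents
_shards.total: the total number of shards the request was sent to
_shards.successful: how many of those shards executed the query successfully
_shards.skipped: how many of those shards were skipped (reported only by newer versions; the response above does not include it)
_shards.failed: how many of those shards failed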
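To make the field list concrete, here is a minimal Python sketch that reads these fields from a raw response string. It assumes the response has already been fetched as JSON text. The helper name `summarize` and the 7.x-style sample are illustrative and not part of any ES client. The sketch handles both shapes of hits.total described above.

```python
import json

# Trimmed response in the older shape shown above: hits.total is an integer.
RESPONSE_OLD = """
{"took": 1, "timed_out": false,
 "_shards": {"total": 16, "successful": 16, "failed": 0},
 "hits": {"total": 19, "max_score": 1, "hits": []}}
"""

# Hypothetical 7.x-style response: hits.total is an object and _shards has "skipped".
RESPONSE_7X = """
{"took": 1, "timed_out": false,
 "_shards": {"total": 16, "successful": 16, "skipped": 0, "failed": 0},
 "hits": {"total": {"value": 19, "relation": "eq"}, "max_score": 1, "hits": []}}
"""


def summarize(raw: str) -> dict:
    """Pull out the top-level fields described above (illustrative helper)."""
    body = json.loads(raw)
    total = body["hits"]["total"]
    if isinstance(total, dict):
        # 7.x+: {"value": n, "relation": "eq" | "gte"}
        count, relation = total["value"], total["relation"]
    else:
        # Older versions: a plain integer, treated here as an exact count.
        count, relation = total, "eq"
    shards = body["_shards"]
    return {
        "took_ms": body["took"],
        "timed_out": body["timed_out"],
        "total": count,
        "relation": relation,
        "max_score": body["hits"]["max_score"],
        "shards_total": shards["total"],
        "shards_successful": shards["successful"],
        "shards_skipped": shards.get("skipped", 0),  # missing in older responses
        "shards_failed": shards["failed"],
    }


if __name__ == "__main__":
    print(summarize(RESPONSE_OLD))
    print(summarize(RESPONSE_7X))
```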

The search timeout mechanism

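By default ES has no timeout. Consider a search application that is very time-sensitive, such as an e-commerce site: you cannot make users wait 10 minutes, or they will simply leave without buying anything.

So we need a timeout mechanism. Each shard gets only the timeout period. When the timeout expires, the shard returns whatever data it has found so far (possibly all of it) to be sent back to the client, instead of waiting until all data has been found.

This ensures that a search request completes within the timeout the user specified, which gives good support to time-sensitive search applications. When the timeout is hit, the response reports timed_out: true.

Note: by default ES has no timeout at all. If your search is very slow and every shard needs several minutes to find all of its data, your search request will also wait several minutes before it returns.
The diagram below gives a simple illustration of the timeout mechanism.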

Syntax:

GET _search?timeout=10ms
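The behaviour described above can be modelled with a toy simulation. This is not how ES implements timeouts internally. The model assumes each shard's search is a precomputed timeline of when it finds each hit; the `ShardRun` class and the sample shards are hypothetical. It shows that a deadline yields partial results plus timed_out: true.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ShardRun:
    """Hypothetical timeline of one shard's local search."""
    name: str
    # (elapsed_ms, doc_id): the moment the shard found each matching doc
    found: List[Tuple[int, str]] = field(default_factory=list)


def search_with_timeout(shards: List[ShardRun], timeout_ms: Optional[int] = None) -> dict:
    """Toy model of ?timeout=: every shard contributes only the hits it found
    before the deadline; timed_out is True if any shard was cut short."""
    hits: List[str] = []
    timed_out = False
    for shard in shards:
        for elapsed_ms, doc_id in shard.found:
            if timeout_ms is None or elapsed_ms <= timeout_ms:
                hits.append(doc_id)
            else:
                timed_out = True  # this shard still had work left at the deadline
    return {"timed_out": timed_out, "hits": hits}


SHARDS = [
    ShardRun("shard0", [(2, "a"), (4, "b")]),
    ShardRun("shard1", [(3, "c"), (25, "d")]),  # a slow shard
]

if __name__ == "__main__":
    print(search_with_timeout(SHARDS))      # no timeout: waits for every shard
    print(search_with_timeout(SHARDS, 10))  # like ?timeout=10ms
```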

_multi-index & multi-type search modes

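First, a note: in older versions of ES one index could hold multiple types, so there is also a multi-type search mode. It is not covered in detail here, because it works essentially the same way as multi-index search, and newer versions of ES deprecate types.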

The multi-index search mode

/_search: searches all data in all indices
```
GET /_search
```
/{index}/_search: searches all data in the specified index
```
GET /test/_search
```
/index1,index2/_search: searches the data in both indices at once
```
GET /test_index,test/_search
```
/*1,*2/_search: matches multiple indices by wildcard and searches the data in all of them
```
GET /test*/_search
```
/_all/_search: _all stands for all indices
```
GET /_all/_search
```
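A minimal sketch of how the index part of the URL selects indices. It assumes a hypothetical cluster holding the indices named in the examples plus `index1` and `index2`. It uses Python's `fnmatch` for the wildcard, which only approximates ES's own pattern matching, and it ignores request options such as `ignore_unavailable`.

```python
from fnmatch import fnmatchcase
from typing import List

# Hypothetical cluster: index names from the examples, plus index1/index2.
INDICES = [".kibana", "ecommerce", "index1", "index2", "test", "test_index"]


def resolve(expression: str, indices: List[str] = INDICES) -> List[str]:
    """Expand the index part of /{expression}/_search into concrete indices."""
    if expression in ("", "_all"):          # /_search and /_all/_search
        return sorted(indices)
    resolved = set()
    for part in expression.split(","):      # /index1,index2/_search
        part = part.strip()
        if "*" in part:                     # /test*/_search, /*1,*2/_search
            resolved.update(i for i in indices if fnmatchcase(i, part))
        elif part in indices:               # /{index}/_search
            resolved.add(part)
        else:
            # A missing concrete index makes ES reject the request by default.
            raise KeyError(f"no such index: {part}")
    return sorted(resolved)
```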

A brief look at how search works

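When a client sends a search request to ES, the request has to reach every shard of the index. Each shard holds only part of the data, so any of them may contain matching documents. It does not have to be the primary shard, though: if a primary shard has replica shards, the request for that shard can be served by one of the replicas instead. Each shard is searched once, on one of its copies.
As shown in the figure below: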
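The rule above (every shard is searched exactly once, on either its primary or a replica) can be sketched as follows. The shard layout is hypothetical. The round-robin choice is a simplification chosen for readability; ES's actual copy selection differs between versions.

```python
from typing import Dict, List

# Hypothetical layout: 3 primary shards (P), each with one replica (R).
COPIES: Dict[int, List[str]] = {
    0: ["P0@node1", "R0@node2"],
    1: ["P1@node2", "R1@node3"],
    2: ["P2@node3", "R2@node1"],
}


def pick_copies(request_no: int, copies: Dict[int, List[str]] = COPIES) -> List[str]:
    """Choose exactly one copy of every shard for one search request.

    Plain round-robin by request number, for illustration only; real ES
    versions use their own selection logic (adaptive replica selection in 7.x+).
    """
    return [group[request_no % len(group)] for _, group in sorted(copies.items())]


if __name__ == "__main__":
    for n in range(3):
        print(n, pick_copies(n))
```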

Paginated search and the deep-paging performance problem explained

Pagination is indispensable in real applications. For example, front-end pages usually show data to users one page at a time.

Pagination in ES

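Elasticsearch paginates with from + size. from is the offset of the first result to return (counting from 0), and size is how many documents to return starting at that offset (defaults: from=0, size=10).
Example: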

GET test_index/test_type/_search?from=0&size=3

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 9,
    "max_score": 1,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "AWypxxLYFCl_S-ox4wvd",
        "_score": 1,
        "_source": {
          "test_content": "my test"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "8",
        "_score": 1,
        "_source": {
          "test_field": "test client 2"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "6",
        "_score": 1,
        "_source": {
          "test_field": "test test"
        }
      }
    ]
  }
}
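The from/size arithmetic is just an offset and a limit over the ranked results. Below is a minimal sketch that assumes the hits are already globally sorted. The ranking in `RANKED` beyond the first three ids is hypothetical. `page_to_from` is an illustrative helper that converts a 1-based page number into from.

```python
from typing import List, Sequence


def page_to_from(page: int, size: int) -> int:
    """Convert a 1-based page number into the `from` offset."""
    if page < 1 or size < 1:
        raise ValueError("page and size must both be >= 1")
    return (page - 1) * size


def paginate(sorted_hits: Sequence[str], from_: int = 0, size: int = 10) -> List[str]:
    """from/size semantics: skip the first `from_` hits, return at most `size`."""
    return list(sorted_hits[from_:from_ + size])


# Hypothetical ranked ids; the first three match the response above.
RANKED = ["AWypxxLYFCl_S-ox4wvd", "8", "6", "4", "2", "1", "11", "3", "7"]

if __name__ == "__main__":
    print(paginate(RANKED, 0, 3))                    # ?from=0&size=3
    print(paginate(RANKED, page_to_from(3, 3), 3))  # third page of 3
```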

The deep-paging performance problem

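What is deep paging? Simply put, it means paging very deep into the results. Say there are 60,000 documents in total on three primary shards, 20,000 per shard, and each page holds 10 documents. If you want page 1,001 (from=10000, size=10), what you actually need are results 10,001 to 10,010.

Be careful not to think of this as each shard returning only 10 documents. That is wrong!

Here is a detailed analysis:
The request first arrives at some node, which acts as the coordinating node (it may not even hold any shard of this index). The coordinating node forwards the search request to the nodes holding the index's three shards. In our example, to get page 1,001 of the 60,000 documents, a shard cannot simply send back its own documents 10,001 to 10,010. It does not know how its results will interleave with those of the other shards, so it must send its top 10,010 documents, not 10. With 3 shards each returning 10,010 documents, the coordinating node receives 30,030 documents in total, sorts them by _score, and takes results 10,001 to 10,010, which are the 10 documents of page 1,001 we asked for.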
As shown in the figure below:

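That is the deep-paging problem: when from + size goes too deep, every shard has to send a large amount of data to the coordinating node, and the coordinating node has to sort all of it. This consumes a lot of network bandwidth, memory, and CPU.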
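The cost of deep paging is easy to reproduce in a small simulation of the coordinating node's merge. It uses the numbers from the example above: 60,000 hypothetical documents with random scores on 3 shards, and page 1,001 with size 10. As a simplification, shards here send (score, doc_id) pairs. Real ES is similar: in the query phase shards send only ids and sort values, and full documents are fetched afterwards for the final page only. Either way, the coordinator still merges from + size entries per shard.

```python
import heapq
import random
from itertools import islice
from typing import List, Tuple

Hit = Tuple[float, int]  # (score, doc_id)


def make_shards(n_docs: int = 60000, n_shards: int = 3, seed: int = 42) -> List[List[Hit]]:
    """Hypothetical data: random scores spread evenly over the shards."""
    rng = random.Random(seed)
    shards: List[List[Hit]] = [[] for _ in range(n_shards)]
    for doc_id in range(n_docs):
        shards[doc_id % n_shards].append((rng.random(), doc_id))
    return shards


def deep_page(shards: List[List[Hit]], from_: int, size: int) -> Tuple[List[Hit], int]:
    """Coordinating-node view of from/size paging.

    Each shard must send its top from_+size hits, because it cannot know which
    of them end up on the requested page once all shards are merged.
    """
    k = from_ + size
    per_shard = [heapq.nlargest(k, docs) for docs in shards]  # each sorted descending
    received = sum(len(top) for top in per_shard)            # what the coordinator receives
    merged = heapq.merge(*per_shard, reverse=True)             # global descending order
    return list(islice(merged, from_, from_ + size)), received


if __name__ == "__main__":
    page, received = deep_page(make_shards(), from_=10000, size=10)  # page 1,001
    print(f"coordinator received {received} hits to return {len(page)}")
```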

query string search syntax and how the _all metadata field works

Basic query string syntax

GET /test_index/test_type/_search?q=test_field:test

{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.843298,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "6",
        "_score": 0.843298,
        "_source": {
          "test_field": "test test"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "8",
        "_score": 0.43445712,
        "_source": {
          "test_field": "test client 2"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "7",
        "_score": 0.25316024,
        "_source": {
          "test_field": "test client 1"
        }
      }
    ]
  }
}


GET /test_index/test_type/_search?q=+test_field:test

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.843298,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "6",
        "_score": 0.843298,
        "_source": {
          "test_field": "test test"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "8",
        "_score": 0.43445712,
        "_source": {
          "test_field": "test client 2"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "7",
        "_score": 0.25316024,
        "_source": {
          "test_field": "test client 1"
        }
      }
    ]
  }
}


GET /test_index/test_type/_search?q=-test_field:test

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 1,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "AWypxxLYFCl_S-ox4wvd",
        "_score": 1,
        "_source": {
          "test_content": "my test"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "4",
        "_score": 1,
        "_source": {
          "test_field": "test4"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "2",
        "_score": 1,
        "_source": {
          "test_field": "replaces test2"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "1",
        "_score": 1,
        "_source": {
          "test_field1": "test field1",
          "test_field2": "partial updated test1"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "11",
        "_score": 1,
        "_source": {
          "num": 0,
          "tags": []
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "3",
        "_score": 1,
        "_source": {
          "test_field": "test3"
        }
      }
    ]
  }
}


For query string search, you mainly need to master the q=field:search content syntax and the meaning of + and -:

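+: results must match this condition (in a URL a literal + is decoded as a space, so it should be sent encoded as %2B, e.g. q=%2Btest_field:test)
-: results must not match this condition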
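The +/- rules can be sketched on the sample documents from the responses above. This is a simplified model, not Lucene's query parser. It takes the already URL-decoded q value, supports only space-separated `field:term` clauses, uses a crude tokenizer, and ignores scoring. Running it on the three example queries reproduces the ids returned above.

```python
import re
from typing import Dict, List, Set, Tuple

# Sample documents copied from the responses above (test_index/test_type).
DOCS: Dict[str, dict] = {
    "AWypxxLYFCl_S-ox4wvd": {"test_content": "my test"},
    "1": {"test_field1": "test field1", "test_field2": "partial updated test1"},
    "2": {"test_field": "replaces test2"},
    "3": {"test_field": "test3"},
    "4": {"test_field": "test4"},
    "6": {"test_field": "test test"},
    "7": {"test_field": "test client 1"},
    "8": {"test_field": "test client 2"},
    "11": {"num": 0, "tags": []},
}


def tokens(value) -> Set[str]:
    """Crude stand-in for ES's standard analyzer: lowercase word tokens."""
    return set(re.findall(r"\w+", str(value).lower()))


def matches(doc: dict, field: str, term: str) -> bool:
    return field in doc and term.lower() in tokens(doc[field])


def query_string(q: str, docs: Dict[str, dict] = DOCS) -> List[str]:
    """Evaluate space-separated [+|-]field:term clauses, Lucene style."""
    must: List[Tuple[str, str]] = []
    must_not: List[Tuple[str, str]] = []
    should: List[Tuple[str, str]] = []
    for clause in q.split():
        bucket = should
        if clause.startswith("+"):
            bucket, clause = must, clause[1:]
        elif clause.startswith("-"):
            bucket, clause = must_not, clause[1:]
        field, term = clause.split(":", 1)
        bucket.append((field, term))

    hits = []
    for doc_id, doc in docs.items():
        if not all(matches(doc, f, t) for f, t in must):
            continue  # a + clause is missing
        if any(matches(doc, f, t) for f, t in must_not):
            continue  # a - clause is present
        if should and not must and not any(matches(doc, f, t) for f, t in should):
            continue  # with no + clauses, at least one plain clause must match
        hits.append(doc_id)
    return sorted(hits)
```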

_all metadata

GET /test_index/test_type/_search?q=test

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 0.843298,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "6",
        "_score": 0.843298,
        "_source": {
          "test_field": "test test"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "AWypxxLYFCl_S-ox4wvd",
        "_score": 0.3794414,
        "_source": {
          "test_content": "my test"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "8",
        "_score": 0.31387395,
        "_source": {
          "test_field": "test client 2"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "7",
        "_score": 0.18232156,
        "_source": {
          "test_field": "test client 1"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "1",
        "_score": 0.16203022,
        "_source": {
          "test_field1": "test field1",
          "test_field2": "partial updated test1"
        }
      }
    ]
  }
}


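In other words, when a query string does not specify a field, it searches _all by default. The _all metadata field is produced at index time. When we insert a document containing several fields, ES automatically concatenates the values of all fields as strings into one long string. That long string is the value of the _all field, and it is indexed as well. (This describes older versions: _all was deprecated in 6.0 and removed in 7.0, where a query string without a field searches the fields listed in index.query.default_field, * by default.)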
For example, take this document:

{
  "name": "jack",
  "age": 26,
  "email": "jack@sina.com",
  "address": "guangzhou"
}

Then "jack 26 jack@sina.com guangzhou" becomes the value of this document's _all field, which is analyzed (tokenized) and added to the inverted index.
Note that query string search is generally not used in production.
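The _all idea can be sketched in a few lines: join the field values into one string, tokenize it, and index every token under the document's id. This is a toy model for flat documents only. The helper names are illustrative, and the tokenizer is much cruder than ES's standard analyzer.

```python
import re
from typing import Dict, List, Set

# The example document from the text above.
DOC = {"name": "jack", "age": 26, "email": "jack@sina.com", "address": "guangzhou"}


def build_all_field(source: dict) -> str:
    """Join every field value into one string, like the old _all field."""
    parts: List[str] = []
    for value in source.values():
        if isinstance(value, list):
            parts.extend(str(v) for v in value)
        else:
            parts.append(str(value))
    return " ".join(parts)


def analyze(text: str) -> List[str]:
    """Crude tokenizer; ES's standard analyzer splits e-mail addresses differently."""
    return re.findall(r"\w+", text.lower())


def index_all(docs: Dict[str, dict]) -> Dict[str, Set[str]]:
    """Toy inverted index over each document's _all value."""
    inverted: Dict[str, Set[str]] = {}
    for doc_id, source in docs.items():
        for token in analyze(build_all_field(source)):
            inverted.setdefault(token, set()).add(doc_id)
    return inverted


if __name__ == "__main__":
    print(build_all_field(DOC))
    index = index_all({"1": DOC, "2": {"name": "tom", "address": "beijing"}})
    print(sorted(index.get("jack", set())))  # what q=jack (no field) would hit
```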