GET _search
{ "took": 1, "timed_out": false, "_shards": { "total": 16, "successful": 16, "failed": 0 }, "hits": { "total": 19, "max_score": 1, "hits": [ { "_index": ".kibana", "_type": "config", "_id": "5.2.0", "_score": 1, "_source": { "buildNum": 14695 } }, { "_index": "test_index", "_type": "test_type", "_id": "AWypxxLYFCl_S-ox4wvd", "_score": 1, "_source": { "test_content": "my test" } }, { "_index": "test_index", "_type": "test_type", "_id": "8", "_score": 1, "_source": { "test_field": "test client 2" } }, { "_index": "test_index", "_type": "test_doc", "_id": "10", "_score": 1, "_source": { "test_field": "test10 routing _id" } }, { "_index": "test_index", "_type": "test_doc", "_id": "11", "_score": 1, "_routing": "12", "_source": { "test_field": "test routing not _id" } }, { "_index": "ecommerce", "_type": "product", "_id": "2", "_score": 1, "_source": { "name": "jiajieshi yagao", "desc": "youxiao fangzhu", "price": 25, "producer": "jiajieshi producer", "tags": [ "fangzhu" ] } }, { "_index": "ecommerce", "_type": "product", "_id": "4", "_score": 1, "_source": { "name": "special yagao", "desc": "special meibai", "price": 50, "producer": "special yagao producer", "tags": [ "meibai" ] } }, { "_index": "test_index", "_type": "test_type", "_id": "6", "_score": 1, "_source": { "test_field": "test test" } }, { "_index": "test_index", "_type": "test_type", "_id": "4", "_score": 1, "_source": { "test_field": "test4" } }, { "_index": "test_index", "_type": "test_type", "_id": "2", "_score": 1, "_source": { "test_field": "replaces test2" } } ] } }
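ES has no timeout by default, so let's start with the scenario. Suppose we have a search application that is very sensitive to response time, such as an e-commerce site. You cannot make users wait 10 minutes; they would have left long ago without buying anything.
So we need a timeout mechanism: each shard gets only the timeout window, and whatever it has found by then (possibly everything) is returned to the client right away instead of waiting until every match has been collected.
This ensures that a search request completes within the timeout the user specifies, which gives time-sensitive search applications good support.
Note: by default ES has no timeout at all. If your search is very slow and every shard needs several minutes to find all of its data, your search request will also wait several minutes before it returns.
The diagram below gives a simple picture of the timeout mechanism.
Syntax: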
GET _search?timeout=10ms
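As a rough illustration of how a client might work with this parameter, the sketch below builds the request path and checks the timed_out flag of a response body. The helper names search_path_with_timeout and is_partial are hypothetical, no HTTP request is actually sent, and the sample response is a trimmed-down dict in the shape of the response shown at the top of this post.

```python
from urllib.parse import urlencode


def search_path_with_timeout(timeout: str) -> str:
    """Build a search path such as /_search?timeout=10ms (hypothetical helper)."""
    return "/_search?" + urlencode({"timeout": timeout})


def is_partial(response: dict) -> bool:
    """True when the response reports that the timeout was hit."""
    return bool(response.get("timed_out", False))


# Trimmed-down response body, same shape as the one shown earlier.
sample = {"took": 1, "timed_out": False, "hits": {"total": 19, "hits": []}}

print(search_path_with_timeout("10ms"))  # /_search?timeout=10ms
print(is_partial(sample))                # False
```

When timed_out is true, the hits in the response should be treated as a possibly incomplete result set.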
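A note first: older versions of ES support multiple types in one index, so there is also a multi-type search mode. It is not covered in detail here because it works basically the same way as multi-index search, and newer versions of ES deprecate types.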
GET /_search
GET /test/_search
GET /test_index,test/_search
GET /test*/_search
GET /_all/_search
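The request lines above all follow one pattern: an optional comma-separated list of index names or wildcard patterns, followed by _search. A minimal sketch of that pattern, assuming a hypothetical helper named search_path that only builds the path string:

```python
def search_path(indices=None):
    """Return the _search path for zero or more index names or patterns."""
    if not indices:
        return "/_search"  # no index given: search every index
    return "/" + ",".join(indices) + "/_search"


print(search_path())                        # /_search
print(search_path(["test"]))                # /test/_search
print(search_path(["test_index", "test"]))  # /test_index,test/_search
print(search_path(["test*"]))               # /test*/_search
```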
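When a client sends a search request to ES, the request is sent to all primary shards for execution: each shard holds part of the data, so any shard may contain results for the search. If a primary shard has replica shards, the request for that shard can go to a replica instead.
As shown in the figure below: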
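The routing idea can also be sketched in a few lines: for every shard, exactly one copy (the primary or one of its replicas) executes the search. The shard layout and the round-robin choice below are assumptions made purely for illustration; they are not taken from this post and are not a description of how ES actually selects a copy.

```python
from itertools import cycle

# Hypothetical layout: 3 shards, each with one primary (P) and one replica (R).
shard_copies = {
    0: ["P0", "R0"],
    1: ["P1", "R1"],
    2: ["P2", "R2"],
}

# One round-robin iterator per shard (an assumption for illustration only).
pickers = {shard: cycle(copies) for shard, copies in shard_copies.items()}


def route_search():
    """Pick exactly one copy of every shard to run the search on."""
    return [next(pickers[shard]) for shard in sorted(pickers)]


first = route_search()
second = route_search()
print(first)   # ['P0', 'P1', 'P2']
print(second)  # ['R0', 'R1', 'R2']
```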
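In real applications paging is indispensable; for example, front-end pages almost always show data to users one page at a time.
Elasticsearch paginates search results with from + size. from is the starting offset into the results, and size is the number of documents to return from that offset.
Example: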
GET test_index/test_type/_search?from=0&size=3
{ "took": 1, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 9, "max_score": 1, "hits": [ { "_index": "test_index", "_type": "test_type", "_id": "AWypxxLYFCl_S-ox4wvd", "_score": 1, "_source": { "test_content": "my test" } }, { "_index": "test_index", "_type": "test_type", "_id": "8", "_score": 1, "_source": { "test_field": "test client 2" } }, { "_index": "test_index", "_type": "test_type", "_id": "6", "_score": 1, "_source": { "test_field": "test test" } } ] } }
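Front ends usually think in page numbers rather than offsets, so a small conversion is often needed. The sketch below assumes 1-based page numbers; page_to_from and paginate are hypothetical helpers, and the list of document ids stands in for a sorted result set.

```python
def page_to_from(page: int, size: int) -> int:
    """Convert a 1-based page number into the zero-based `from` offset."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    return (page - 1) * size


def paginate(hits: list, page: int, size: int) -> list:
    """Return the slice of hits that from + size would select."""
    start = page_to_from(page, size)
    return hits[start:start + size]


doc_ids = ["a", "b", "c", "d", "e", "f", "g"]
print(page_to_from(1, 3))       # 0
print(paginate(doc_ids, 1, 3))  # ['a', 'b', 'c']
print(paginate(doc_ids, 3, 3))  # ['g']
```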
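What is deep paging? Simply put, it means paging very deep into the results. Say there are 60000 documents in total on three primary shards, 20000 per shard, and each page holds 10 documents. If you now ask for page 1001 (from=10000, size=10), what you actually want is hits 10001~10010.
Do not take this to mean that each shard returns just 10 documents. That understanding is wrong!
Here is a detailed analysis:
The request may first land on a node that holds no shard of this index. That node acts as the coordinating node and forwards the search request to the nodes that hold the index's three shards. In the scenario above, fetching page 1001 out of 60000 documents does not mean each shard hands over only hits 10001~10010 of its own 20000 documents. Each shard returns not 10 but its top 10010 hits. With all 3 shards returning 10010 hits each, the coordinating node receives 30030 hits in total, sorts them by _score (the relevance score), and then takes hits 10001~10010. Those 10 are the page we asked for.
As shown in the figure below: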
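The deep paging problem, then, is this: when from + size reaches too deep, every shard has to return a large amount of data to the coordinating node, which consumes a lot of bandwidth, memory, and CPU.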
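To make the cost concrete, here is a small simulation of the scatter and gather steps under simplified assumptions: three hypothetical shards hold random relevance scores, each shard returns its top from + size scores, and the coordinating node merges them and cuts out the page. The function names are made up for this sketch, and real ES exchanges more than bare scores.

```python
import heapq
import random


def shard_top_k(scores, k):
    """Each shard returns its k best scores (k = from + size), not just size."""
    return heapq.nlargest(k, scores)


def coordinate(shards, from_, size):
    """Merge the per-shard candidates and cut out the requested page."""
    k = from_ + size
    candidates = []
    for scores in shards:
        candidates.extend(shard_top_k(scores, k))
    merged = sorted(candidates, reverse=True)
    return merged[from_:from_ + size], len(candidates)


random.seed(0)
# 3 hypothetical shards with 200 random relevance scores each.
shards = [[random.random() for _ in range(200)] for _ in range(3)]

page, transferred = coordinate(shards, from_=100, size=10)
print(len(page), transferred)  # 10 330
```

Only 10 hits are wanted, yet 330 candidates travel to the coordinating node, and that number grows with both the paging depth and the shard count.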
GET /test_index/test_type/_search?q=test_field:test
{ "took": 7, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 3, "max_score": 0.843298, "hits": [ { "_index": "test_index", "_type": "test_type", "_id": "6", "_score": 0.843298, "_source": { "test_field": "test test" } }, { "_index": "test_index", "_type": "test_type", "_id": "8", "_score": 0.43445712, "_source": { "test_field": "test client 2" } }, { "_index": "test_index", "_type": "test_type", "_id": "7", "_score": 0.25316024, "_source": { "test_field": "test client 1" } } ] } }
GET /test_index/test_type/_search?q=+test_field:test
{ "took": 2, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 3, "max_score": 0.843298, "hits": [ { "_index": "test_index", "_type": "test_type", "_id": "6", "_score": 0.843298, "_source": { "test_field": "test test" } }, { "_index": "test_index", "_type": "test_type", "_id": "8", "_score": 0.43445712, "_source": { "test_field": "test client 2" } }, { "_index": "test_index", "_type": "test_type", "_id": "7", "_score": 0.25316024, "_source": { "test_field": "test client 1" } } ] } }
GET /test_index/test_type/_search?q=-test_field:test
{ "took": 2, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 6, "max_score": 1, "hits": [ { "_index": "test_index", "_type": "test_type", "_id": "AWypxxLYFCl_S-ox4wvd", "_score": 1, "_source": { "test_content": "my test" } }, { "_index": "test_index", "_type": "test_type", "_id": "4", "_score": 1, "_source": { "test_field": "test4" } }, { "_index": "test_index", "_type": "test_type", "_id": "2", "_score": 1, "_source": { "test_field": "replaces test2" } }, { "_index": "test_index", "_type": "test_type", "_id": "1", "_score": 1, "_source": { "test_field1": "test field1", "test_field2": "partial updated test1" } }, { "_index": "test_index", "_type": "test_type", "_id": "11", "_score": 1, "_source": { "num": 0, "tags": [] } }, { "_index": "test_index", "_type": "test_type", "_id": "3", "_score": 1, "_source": { "test_field": "test3" } } ] } }
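For query string search, all you need to master is the q=field:search content syntax and the meaning of + and -: + means the condition must match, and - means it must not match.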
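One practical detail when sending these requests from code rather than from the Kibana console: in a URL query string a literal + is commonly decoded as a space, so a client would typically percent-encode the q value. A minimal sketch using Python's standard library follows; query_string_path is a hypothetical helper and nothing is sent over the network.

```python
from urllib.parse import urlencode


def query_string_path(index, doc_type, q):
    """Build a query-string search path, percent-encoding the q parameter."""
    return "/{}/{}/_search?{}".format(index, doc_type, urlencode({"q": q}))


print(query_string_path("test_index", "test_type", "+test_field:test"))
# /test_index/test_type/_search?q=%2Btest_field%3Atest
print(query_string_path("test_index", "test_type", "-test_field:test"))
# /test_index/test_type/_search?q=-test_field%3Atest
```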
GET /test_index/test_type/_search?q=test
{ "took": 1, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 5, "max_score": 0.843298, "hits": [ { "_index": "test_index", "_type": "test_type", "_id": "6", "_score": 0.843298, "_source": { "test_field": "test test" } }, { "_index": "test_index", "_type": "test_type", "_id": "AWypxxLYFCl_S-ox4wvd", "_score": 0.3794414, "_source": { "test_content": "my test" } }, { "_index": "test_index", "_type": "test_type", "_id": "8", "_score": 0.31387395, "_source": { "test_field": "test client 2" } }, { "_index": "test_index", "_type": "test_type", "_id": "7", "_score": 0.18232156, "_source": { "test_field": "test client 1" } }, { "_index": "test_index", "_type": "test_type", "_id": "1", "_score": 0.16203022, "_source": { "test_field1": "test field1", "test_field2": "partial updated test1" } } ] } }
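In other words, when a query string does not specify a field, the search runs against _all by default. The _all metadata field is produced at indexing time: when we insert a document containing several fields, ES automatically concatenates the values of all of them into one long string. That long string is the value of the _all field, and it is indexed as well.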
For example:
Take this document:
{ "name": "jack", "age": 26, "email": "jack@sina.com", "address": "guangzhou" }
Then "jack 26 jack@sina.com guangzhou" becomes the value of this document's _all field, and it is analyzed into terms and added to the inverted index.
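The concatenation step can be mimicked in a couple of lines. This is only a rough approximation of what ES does internally; build_all_field is a hypothetical name and the sketch ignores analysis and indexing.

```python
def build_all_field(document: dict) -> str:
    """Join every field value into one string, roughly what _all holds."""
    return " ".join(str(value) for value in document.values())


doc = {"name": "jack", "age": 26, "email": "jack@sina.com", "address": "guangzhou"}
print(build_all_field(doc))  # jack 26 jack@sina.com guangzhou
```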
Note that the query string style of search is generally not used in production.