1. ik_max_word
Splits text at the finest granularity. For example, "中华人民共和国人民大会堂" is split into 中华人民共和国, 中华人民, 中华, 华人, 人民共和国, 人民, 共和国, 大会堂, 大会, 会堂, and so on.
2. ik_smart
Splits text at the coarsest granularity. For example, "中华人民共和国人民大会堂" is split into 中华人民共和国 and 人民大会堂.
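The difference between the two granularities can be illustrated with a toy dictionary segmenter. This is only a sketch: the dictionary below is hypothetical and tiny, and the two functions are simplified stand-ins, not the algorithm the IK plugin actually implements.

```python
# Hypothetical toy dictionary; a real IK installation ships a far larger one.
DICTIONARY = {
    "中华人民共和国", "中华人民", "中华", "华人", "人民共和国", "人民",
    "共和国", "人民大会堂", "大会堂", "大会", "会堂",
}
MAX_LEN = max(len(word) for word in DICTIONARY)


def finest_split(text):
    """Emit every dictionary word found at any offset (overlaps allowed)."""
    found = []
    for start in range(len(text)):
        for end in range(min(len(text), start + MAX_LEN), start, -1):
            if text[start:end] in DICTIONARY:
                found.append(text[start:end])
    return found


def coarsest_split(text):
    """Greedy longest match from the left, with no overlaps."""
    found, start = [], 0
    while start < len(text):
        for end in range(min(len(text), start + MAX_LEN), start, -1):
            if text[start:end] in DICTIONARY:
                found.append(text[start:end])
                start = end
                break
        else:
            start += 1  # no dictionary word starts here; skip one character
    return found


print(finest_split("中华人民共和国人民大会堂"))
print(coarsest_split("中华人民共和国人民大会堂"))
```

The fine-grained function keeps overlapping words, which helps recall when searching; the coarse-grained one yields fewer, longer tokens.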
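Elasticsearch term and match: a summary
In real project queries, term and match are the two most commonly used queries, yet it is easy to confuse them. This section summarizes the differences.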
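term usage
First, the definition: term means an exact match. The search value is not analyzed (split into tokens) before the search runs.
An example makes this clearer. First, index some data: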
{ "title": "love China", "content": "people very love China", "tags": ["China", "love"] }
{ "title": "love HuBei", "content": "people very love HuBei", "tags": ["HuBei", "love"] }
Now run a term query:
{ "query": { "term": { "title": "love" } } }
Both documents above are returned:
{ "took": 1, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 2, "max_score": 0.6931472, "hits": [ { "_index": "test", "_type": "doc", "_id": "8", "_score": 0.6931472, "_source": { "title": "love HuBei", "content": "people very love HuBei", "tags": ["HuBei","love"] } }, { "_index": "test", "_type": "doc", "_id": "7", "_score": 0.6931472, "_source": { "title": "love China", "content": "people very love China", "tags": ["China","love"] } } ] } }
Every document whose title contains the token love is returned. But I only want an exact match on love China. Let's see whether the following query finds it:
{ "query": { "term": { "title": "love China" } } }
It returns no data. Conceptually, term is an exact match and can only look up a single token. To match several tokens with term, use terms:
{ "query": { "terms": { "title": ["love", "China"] } } }
The result is:
{ "took": 1, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 2, "max_score": 0.6931472, "hits": [ { "_index": "test", "_type": "doc", "_id": "8", "_score": 0.6931472, "_source": { "title": "love HuBei", "content": "people very love HuBei", "tags": ["HuBei","love"] } }, { "_index": "test", "_type": "doc", "_id": "7", "_score": 0.6931472, "_source": { "title": "love China", "content": "people very love China", "tags": ["China","love"] } } ] } }
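Both documents come back. Why? The values inside the terms array [ ] are combined with OR, so matching any one of them is enough. To require both tokens at once, use must inside a bool query, as follows: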
{ "query": { "bool": { "must": [ { "term": { "title": "love" } }, { "term": { "title": "china" } } ] } } }
Note that the query above uses lowercase china. Searching with uppercase China returns nothing. Why? The title field was analyzed when it was stored, here with the default analyzer. We can check how the text was analyzed:
Analyzer
GET test/_analyze { "text" : "love China" }
The result is:
{ "tokens": [ { "token": "love", "start_offset": 0, "end_offset": 4, "type": "<ALPHANUM>", "position": 0 }, { "token": "china", "start_offset": 5, "end_offset": 10, "type": "<ALPHANUM>", "position": 1 } ] }
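The analysis produces two tokens, love and china. A term query matches only these tokens exactly as indexed, without altering the search value in any way, so a query for China fails. A later section covers analyzers in detail.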
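The behaviour of term, terms, and bool must described in this section can be reproduced with a toy inverted index. This is a minimal sketch, not Elasticsearch itself: the analyze function is an assumed simplification of the default analyzer (whitespace split plus lowercasing), and the function names are hypothetical helpers.

```python
from collections import defaultdict


def analyze(text):
    """Rough stand-in for the default analyzer: split on whitespace, lowercase."""
    return text.lower().split()


# Titles of the two sample documents, keyed by their _id.
DOCS = {7: "love China", 8: "love HuBei"}

# Inverted index: token -> set of document ids whose title contains it.
index = defaultdict(set)
for doc_id, title in DOCS.items():
    for token in analyze(title):
        index[token].add(doc_id)


def term(value):
    """Exact lookup; the search value is NOT analyzed."""
    return set(index.get(value, set()))


def terms(values):
    """OR over several exact lookups."""
    return set().union(*(term(v) for v in values))


def must(*results):
    """AND: keep only the ids present in every result set."""
    return set.intersection(*results)


print(term("love"))                       # both documents
print(term("love China"))                 # nothing: no such single token
print(terms(["love", "China"]))           # both, because "love" alone is enough
print(must(term("love"), term("china")))  # only document 7
```

Because the index holds only the lowercased tokens love, china, and hubei, an unanalyzed lookup for China or for love China finds nothing.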
match usage
First, match against love China.
GET test/doc/_search { "query": { "match": { "title": "love China" } } }
The result is:
{ "took": 1, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 2, "max_score": 1.3862944, "hits": [ { "_index": "test", "_type": "doc", "_id": "7", "_score": 1.3862944, "_source": { "title": "love China", "content": "people very love China", "tags": [ "China", "love" ] } }, { "_index": "test", "_type": "doc", "_id": "8", "_score": 0.6931472, "_source": { "title": "love HuBei", "content": "people very love HuBei", "tags": [ "HuBei", "love" ] } } ] } }
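Both documents are returned. Why? A match query first analyzes the search text and only then matches the resulting tokens. The titles of the two documents above produce the tokens love, china, and hubei. Our search text love China is analyzed into love and china, and these tokens are combined with OR, so a document matches if it contains any one of them. What if love and China must both match? Use match_phrase.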
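The analyze-then-match behaviour can be sketched in a few lines. The analyzer here is an assumed simplification (whitespace split plus lowercasing), and ranking by the number of shared tokens is only a crude proxy for relevance, not how Elasticsearch computes _score. The operator argument mirrors the operator parameter that the match query accepts, where "and" requires every token to be present.

```python
def analyze(text):
    """Rough stand-in for the default analyzer: split on whitespace, lowercase."""
    return text.lower().split()


# Titles of the two sample documents, keyed by their _id.
DOCS = {7: "love China", 8: "love HuBei"}


def match(query, operator="or"):
    """Analyze the query, then match its tokens against each analyzed title."""
    query_tokens = set(analyze(query))
    hits = []
    for doc_id, title in DOCS.items():
        common = query_tokens & set(analyze(title))
        if operator == "and":
            matched = common == query_tokens  # every query token must be present
        else:
            matched = bool(common)            # any one query token is enough
        if matched:
            hits.append((doc_id, len(common)))
    # More shared tokens first (toy ranking only).
    hits.sort(key=lambda hit: -hit[1])
    return [doc_id for doc_id, _ in hits]


print(match("love China"))                  # both documents, 7 ranked first
print(match("love China", operator="and"))  # only document 7
```

Requiring all tokens still ignores word order and adjacency, which is where a phrase query differs.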
match_phrase usage
match_phrase is a phrase search: all tokens of the search text must appear in the document, in the same order and adjacent to one another.
GET test/doc/_search { "query": { "match_phrase": { "title": "love china" } } }
The result is:
{ "took": 5, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 1, "max_score": 1.3862944, "hits": [ { "_index": "test", "_type": "doc", "_id": "7", "_score": 1.3862944, "_source": { "title": "love China", "content": "people very love China", "tags": [ "China", "love" ] } } ] } }
This time the result fits our requirement: only one record is returned.
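The adjacency requirement can be sketched as a search for the query tokens as a contiguous run inside the document's tokens. This is a simplified model under the same assumed analyzer (whitespace split plus lowercasing); it does not model options such as slop.

```python
def analyze(text):
    """Rough stand-in for the default analyzer: split on whitespace, lowercase."""
    return text.lower().split()


def match_phrase(field_value, phrase):
    """True when the phrase tokens occur in the field consecutively and in order."""
    doc_tokens = analyze(field_value)
    query_tokens = analyze(phrase)
    n = len(query_tokens)
    if n == 0:
        return False
    return any(
        doc_tokens[i:i + n] == query_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


print(match_phrase("love China", "love china"))              # True
print(match_phrase("love HuBei", "love china"))              # False
print(match_phrase("people very love China", "very china"))  # False: not adjacent
```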
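The bool query maps to Lucene's BooleanQuery. It is built from one or more clauses, and each clause has a specific occurrence type (must, filter, should, or must_not).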
The minimum_should_match parameter sets the minimum number of should clauses a document must match.
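The bool query also supports the disable_coord option, which turns off coordination scoring. By default, the coordination factor adjusts the score according to how many of the query's conditions a document matches. (disable_coord applies to older versions; it was removed in Elasticsearch 6.0.)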
The bool query also follows a more-matches-is-better approach: the scores of the matching must and should clauses are combined into the document's final score.
{ "bool" : { "must" : { "term" : { "user" : "kimchy" } }, "filter": { "term" : { "tag" : "tech" } }, "must_not" : { "range" : { "age" : { "from" : 10, "to" : 20 } } }, "should" : [ { "term" : { "tag" : "wow" } }, { "term" : { "tag" : "elasticsearch" } } ], "minimum_should_match" : 1, "boost" : 1.0 } }
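How the four clause types combine can be sketched as a predicate over a single document. This toy model only decides whether a document matches and ignores scoring entirely. The helper names and sample documents are hypothetical, and the range bounds are assumed to be inclusive.

```python
def term(field, value):
    """Clause: the field equals value, or contains it when the field is a list."""
    def clause(doc):
        actual = doc.get(field)
        if isinstance(actual, list):
            return value in actual
        return actual == value
    return clause


def in_range(field, low, high):
    """Clause: the field exists and low <= field <= high (assumed inclusive)."""
    def clause(doc):
        return field in doc and low <= doc[field] <= high
    return clause


def bool_matches(doc, must=(), filters=(), must_not=(), should=(),
                 minimum_should_match=0):
    """Decide whether doc satisfies the combined clauses (no scoring)."""
    if not all(clause(doc) for clause in must):
        return False
    if not all(clause(doc) for clause in filters):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    matched_should = sum(1 for clause in should if clause(doc))
    return matched_should >= minimum_should_match


def example_query(doc):
    """Mirror of the bool query shown above."""
    return bool_matches(
        doc,
        must=[term("user", "kimchy")],
        filters=[term("tag", "tech")],
        must_not=[in_range("age", 10, 20)],
        should=[term("tag", "wow"), term("tag", "elasticsearch")],
        minimum_should_match=1,
    )


print(example_query({"user": "kimchy", "tag": ["tech", "wow"], "age": 30}))  # True
print(example_query({"user": "kimchy", "tag": ["tech", "wow"], "age": 15}))  # False
```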
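Clauses under filter have no effect on scoring; on their own they yield a score of 0. The score is affected only by the scoring queries that are specified.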
For example, the following three queries all return every document whose status field is active.
The first query assigns every document a score of 0:
GET _search { "query": { "bool": { "filter": { "term": { "status": "active" } } } } }
The bool query below includes a match_all, so every document gets a score of 1:
GET _search { "query": { "bool": { "must": { "match_all": {} }, "filter": { "term": { "status": "active" } } } } }
constant_score returns the same result as the query above and also gives every document a score of 1:
GET _search { "query": { "constant_score": { "filter": { "term": { "status": "active" } } } } }
To find out which clauses inside a bool query actually matched, use named queries:
{ "bool" : { "should" : [ {"match" : { "name.first" : {"query" : "shay", "_name" : "first"} }}, {"match" : { "name.last" : {"query" : "banon", "_name" : "last"} }} ], "filter" : { "terms" : { "name.last" : ["banon", "kimchy"], "_name" : "test" } } } }
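With named queries, Elasticsearch reports the names of the clauses that matched in a matched_queries array on each hit. The idea can be sketched as follows. This is a toy model: the helpers and sample documents are hypothetical, the analyzer is an assumed simplification, and unlike the real bool query it does not drop documents that fail the filter.

```python
def analyze(text):
    """Rough stand-in for the default analyzer: split on whitespace, lowercase."""
    return text.lower().split()


def match(field, query):
    """Clause: the analyzed field shares at least one token with the analyzed query."""
    def clause(doc):
        return bool(set(analyze(doc.get(field, ""))) & set(analyze(query)))
    return clause


def terms(field, values):
    """Clause: the raw field value equals one of the given values."""
    def clause(doc):
        return doc.get(field) in values
    return clause


# Clause name (_name) -> clause, mirroring the query above.
NAMED_CLAUSES = {
    "first": match("name.first", "shay"),
    "last": match("name.last", "banon"),
    "test": terms("name.last", ["banon", "kimchy"]),
}


def matched_queries(doc):
    """Return the names of all clauses that this document satisfies."""
    return [name for name, clause in NAMED_CLAUSES.items() if clause(doc)]


print(matched_queries({"name.first": "Shay", "name.last": "banon"}))
print(matched_queries({"name.first": "Tom", "name.last": "kimchy"}))
```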