This is day 11 of my participation in the August Writing Challenge. Event details: August Writing Challenge.
The Elasticsearch version used in this article is 7.4.2.
Test data:
POST /match_phrase_test/_doc/1
{
"my_text": "my favorite dialet is cold porridge"
}
POST /match_phrase_test/_doc/2
{
"my_text": "when it's cold his favorite food is porridge"
}
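A match_phrase query first analyzes the query text into tokens and then runs a phrase query on those tokens: every token must appear in the document, in the same order and at consecutive positions.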
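To make the analysis stage concrete, here is a minimal Python sketch that turns text into (term, position) pairs, which is the information a phrase query compares. The regex tokenizer is an assumption, only a rough stand-in for Elasticsearch's standard analyzer (the real one follows Unicode word-boundary rules), and the name `analyze` is hypothetical.

```python
import re
from typing import List, Tuple

# Hypothetical, simplified stand-in for the standard analyzer:
# runs of letters/digits (keeping in-word apostrophes such as "it's"), lowercased.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*")


def analyze(text: str) -> List[Tuple[str, int]]:
    """Return (term, position) pairs; positions count tokens from 0."""
    return [(m.group(0).lower(), pos) for pos, m in enumerate(TOKEN_RE.finditer(text))]


print(analyze("my favorite dialet is cold porridge"))
print(analyze("when it's cold his favorite food is porridge"))
```

The positions are what matters for match_phrase: two terms form a phrase only if their positions differ by exactly the distance they have in the query.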
Example:
POST /match_phrase_test/_search
{
"query": {
"match_phrase": {
"my_text": {
"query": "my favorite"
}
}
}
}
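Analysis:
The query text my favorite is analyzed into ["my", "favorite"]. A document matches only if my is immediately followed by favorite. doc2 contains favorite but not my, so it does not satisfy the phrase, and only doc1 is returned:
{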
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.6520334,
"hits" : [
{
"_index" : "match_phrase_test",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.6520334,
"_source" : {
"my_text" : "my favorite dialet is cold porridge"
}
}
]
}
}
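The result above can be reproduced with a small Python sketch of an exact phrase check (slop 0): look up every position of the first query term, then require each following term at the next position. The regex tokenizer is again an assumed simplification of the standard analyzer, and `positions_of` and `exact_phrase_match` are hypothetical names, not Elasticsearch APIs.

```python
import re
from typing import Dict, List

TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*")


def analyze(text: str) -> List[str]:
    # Simplified stand-in for the standard analyzer (assumption).
    return [m.group(0).lower() for m in TOKEN_RE.finditer(text)]


def positions_of(tokens: List[str]) -> Dict[str, List[int]]:
    # Tiny positional index for one document: term -> list of positions.
    index: Dict[str, List[int]] = {}
    for pos, term in enumerate(tokens):
        index.setdefault(term, []).append(pos)
    return index


def exact_phrase_match(doc_text: str, query_text: str) -> bool:
    """slop = 0: each query term must sit right after the previous one."""
    index = positions_of(analyze(doc_text))
    query = analyze(query_text)
    if not query:
        return False
    # Try each occurrence of the first term as the start of the phrase.
    for start in index.get(query[0], []):
        if all(start + i in index.get(term, []) for i, term in enumerate(query)):
            return True
    return False


docs = {
    "1": "my favorite dialet is cold porridge",
    "2": "when it's cold his favorite food is porridge",
}
hits = [doc_id for doc_id, text in docs.items() if exact_phrase_match(text, "my favorite")]
print(hits)  # ['1']
```

Only doc1 qualifies, matching the Elasticsearch response: doc2 has no my at all.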
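The slop parameter sets the maximum number of positions the query terms may be moved and still match, i.e. the maximum edit distance between the query phrase and the document. It defaults to 0 and accepts any non-negative integer: a slop of 1 already lets one extra word sit between two query terms, while reordering terms costs more, because transposed terms have a slop of 2. For example, if a document contains favorite food and the query text is food favorite, bringing the query into the document's order takes two steps: move food to the position of favorite, then move favorite to the position of food.
Summary:
Swapping one term therefore costs 2 slop. As a rough rule of thumb, swapping two terms costs 4 and swapping n terms costs at least 2*n. Another way to estimate it: (number of out-of-order terms - 1) * 2.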
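The rule can be checked numerically with a Python sketch modelled on Lucene's sloppy phrase matching (an assumption about the internals, not a copy of Lucene code): each query term gets an offset, its document position minus its query position, and the phrase matches when the spread between the largest and smallest offset is at most the slop. `min_slop` is a hypothetical helper that brute-forces every combination of occurrences, so it only suits tiny examples and does not treat repeated query terms the way Lucene does.

```python
from itertools import product
from typing import List, Optional


def min_slop(doc_tokens: List[str], query_tokens: List[str]) -> Optional[int]:
    """Smallest slop at which query_tokens matches doc_tokens, or None."""
    offsets_per_term = []
    for q_pos, term in enumerate(query_tokens):
        # offset = document position - query position, for every occurrence
        offsets = [d_pos - q_pos for d_pos, t in enumerate(doc_tokens) if t == term]
        if not offsets:
            return None  # a missing term never matches, whatever the slop
        offsets_per_term.append(offsets)
    if not offsets_per_term:
        return None  # empty query
    # Spread of the offsets, minimised over all combinations of occurrences.
    return min(max(combo) - min(combo) for combo in product(*offsets_per_term))


doc1 = ["my", "favorite", "dialet", "is", "cold", "porridge"]
print(min_slop(["favorite", "food"], ["food", "favorite"]))  # 2: a transposition
print(min_slop(doc1, ["my", "favorite"]))  # 0: exact phrase
print(min_slop(doc1, ["my", "dialet"]))    # 1: one extra word in between
```

The last line shows why slop need not be even: an odd value still admits extra words between terms, while each transposition costs 2.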
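Example:
Suppose the query text is my dialet favorite and it should match doc1, my favorite dialet is cold porridge. Only dialet favorite is out of order, and swapping that one pair fixes it, so the minimum slop is 1*2 = 2. With the other formula: (number of out-of-order terms - 1) * 2 = (2 - 1) * 2 = 2. The query below also appends is, which is already in its correct position after the other three terms, so it adds no slop: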
POST /match_phrase_test/_search
{
"query": {
"match_phrase": {
"my_text": {
"query": "my dialet favorite is",
"slop": 2
}
}
}
}
Query result:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.9197583,
"hits" : [
{
"_index" : "match_phrase_test",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.9197583,
"_source" : {
"my_text" : "my favorite dialet is cold porridge"
}
}
]
}
}
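To confirm that 2 really is the minimum for this query, here is a compact, self-contained Python sketch of the same offset-spread idea applied end to end (analysis, then sloppy matching). As before, the regex tokenizer is an assumed simplification of the standard analyzer and `phrase_match` is a hypothetical helper, not an Elasticsearch API.

```python
import re
from itertools import product
from typing import List, Optional

TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*")


def analyze(text: str) -> List[str]:
    # Simplified stand-in for the standard analyzer (assumption).
    return [m.group(0).lower() for m in TOKEN_RE.finditer(text)]


def min_slop(doc_tokens: List[str], query_tokens: List[str]) -> Optional[int]:
    # Spread of (doc position - query position) offsets, minimised over occurrences.
    per_term = []
    for q_pos, term in enumerate(query_tokens):
        offsets = [d - q_pos for d, t in enumerate(doc_tokens) if t == term]
        if not offsets:
            return None
        per_term.append(offsets)
    if not per_term:
        return None
    return min(max(c) - min(c) for c in product(*per_term))


def phrase_match(doc_text: str, query_text: str, slop: int = 0) -> bool:
    needed = min_slop(analyze(doc_text), analyze(query_text))
    return needed is not None and needed <= slop


doc1 = "my favorite dialet is cold porridge"
doc2 = "when it's cold his favorite food is porridge"
query = "my dialet favorite is"
for slop in (1, 2):
    print(slop, phrase_match(doc1, query, slop), phrase_match(doc2, query, slop))
```

With slop 1 doc1 is not matched and with slop 2 it is, in line with the response above; doc2 never matches because it contains neither my nor dialet.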
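You can also use the analyzer parameter to choose the analyzer that tokenizes the query text. By default, the query uses the field's search-time analyzer: the search_analyzer explicitly set in the field mapping if there is one, otherwise the field's analyzer, which itself falls back to the index's default analyzer.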
POST /match_phrase_test/_search
{
"query": {
"match_phrase": {
"my_text": {
"query": "favorite Dialet",
"analyzer": "whitespace"
}
}
}
}
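Because the analyzer is set to whitespace, the query text is split on whitespace only and its case is preserved, giving ["favorite", "Dialet"].
doc1's my_text was indexed with the standard analyzer, which splits text on word boundaries and lowercases the tokens, so its inverted index contains ["my", "favorite", "dialet", "is", "cold", "porridge"].
Terms are matched case-sensitively, and Dialet does not exist in doc1's inverted index, so doc1 is not matched and the result is empty: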
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
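The mismatch comes down to comparing terms produced by two different analyzers, which a short Python sketch can show. Both functions are hypothetical approximations: `whitespace_analyzer` mimics the whitespace analyzer (split on whitespace, no lowercasing), and `standard_like_analyzer` is the same simplified stand-in for the standard analyzer used for indexing.

```python
import re
from typing import List


def whitespace_analyzer(text: str) -> List[str]:
    # Splits on whitespace only and keeps the original case.
    return text.split()


def standard_like_analyzer(text: str) -> List[str]:
    # Simplified stand-in for the standard analyzer (assumption): word tokens, lowercased.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*", text)]


indexed_terms = set(standard_like_analyzer("my favorite dialet is cold porridge"))

query_terms = whitespace_analyzer("favorite Dialet")
missing = [t for t in query_terms if t not in indexed_terms]
print(query_terms, missing)  # ['favorite', 'Dialet'] ['Dialet']

# With the field's own (standard-like) analysis, the term would be lowercased.
print(standard_like_analyzer("favorite Dialet"))  # ['favorite', 'dialet']
```

Under the default analysis, favorite and dialet are adjacent in doc1, so the same match_phrase without the analyzer override would be expected to hit doc1.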