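Continuing from the previous post: https://segmentfault.com/a/11...
most_fields is field-centric: it favors documents in which the largest number of fields match.
Suppose we let users search for addresses, and the index contains the following two documents: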
PUT /test_index/_create/1
{ "street": "5 Poland Street", "city": "Poland", "country": "United W1V", "postcode": "W1V 3DG" }

PUT /test_index/_create/2
{ "street": "5 Poland Street W1V", "city": "London", "country": "United Kingdom", "postcode": "3DG" }
Query in the most_fields style, written out as a bool query:
GET /test_index/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "street": "Poland Street W1V" } },
        { "match": { "city": "Poland Street W1V" } },
        { "match": { "country": "Poland Street W1V" } },
        { "match": { "postcode": "Poland Street W1V" } }
      ]
    }
  }
}
Repeating the query string for every field quickly becomes verbose, so we can simplify it with multi_match:
GET /test_index/_search
{
  "query": {
    "multi_match": {
      "query": "Poland Street W1V",
      "type": "most_fields",
      "fields": ["street", "city", "country", "postcode"]
    }
  }
}
Result:
{ "took" : 4, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 2, "relation" : "eq" }, "max_score" : 2.3835402, "hits" : [ { "_index" : "test_index", "_type" : "_doc", "_id" : "1", "_score" : 2.3835402, "_source" : { "street" : "5 Poland Street", "city" : "Poland", "country" : "United W1V", "postcode" : "W1V 3DG" } }, { "_index" : "test_index", "_type" : "_doc", "_id" : "2", "_score" : 0.99938464, "_source" : { "street" : "5 Poland Street W1V", "city" : "London", "country" : "United Kingdom", "postcode" : "3DG" } } ] } }
With best_fields instead, doc2 ranks ahead of doc1:
GET /test_index/_search
{
  "query": {
    "multi_match": {
      "query": "Poland Street W1V",
      "type": "best_fields",
      "fields": ["street", "city", "country", "postcode"]
    }
  }
}
Result:
{ "took" : 3, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 2, "relation" : "eq" }, "max_score" : 0.99938464, "hits" : [ { "_index" : "test_index", "_type" : "_doc", "_id" : "2", "_score" : 0.99938464, "_source" : { "street" : "5 Poland Street W1V", "city" : "London", "country" : "United Kingdom", "postcode" : "3DG" } }, { "_index" : "test_index", "_type" : "_doc", "_id" : "1", "_score" : 0.6931472, "_source" : { "street" : "5 Poland Street", "city" : "Poland", "country" : "United W1V", "postcode" : "W1V 3DG" } } ] } }
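The two rankings differ because of how per-field scores are combined. The sketch below is only an illustration: the per-field scores are made-up numbers (not taken from the responses above), and `combine_most_fields`, `combine_best_fields`, and `rank` are hypothetical helper names. It assumes most_fields adds up the scores of all matching fields, while best_fields keeps the best field's score plus tie_breaker times the rest, with tie_breaker defaulting to 0.

```python
# Hypothetical per-field scores for the query "Poland Street W1V".
# These numbers are illustrative only; real scores come from BM25.
doc1_scores = {"street": 0.6, "city": 0.7, "country": 0.5, "postcode": 0.5}
doc2_scores = {"street": 1.0}  # only the street field matches in doc 2


def combine_most_fields(field_scores):
    """most_fields: sum the scores of all matching fields."""
    return sum(field_scores.values())


def combine_best_fields(field_scores, tie_breaker=0.0):
    """best_fields: best field score plus tie_breaker * the other scores."""
    scores = list(field_scores.values())
    best = max(scores)
    return best + tie_breaker * (sum(scores) - best)


def rank(combine):
    """Return document ids ordered by combined score, highest first."""
    docs = {"1": doc1_scores, "2": doc2_scores}
    return sorted(docs, key=lambda doc_id: combine(docs[doc_id]), reverse=True)


print(rank(combine_most_fields))  # doc 1 first: many fields contribute
print(rank(combine_best_fields))  # doc 2 first: its single field is the best match
```

Doc 2 matches in one field only, so its score is the same under both types, which agrees with the 0.99938464 shown in both responses above.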
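(1) most_fields is designed to find the largest number of fields matching any of the words, rather than the words that best match across all fields.
(2) operator and minimum_should_match are applied to each field individually, so they cannot be used to trim the long tail of low-relevance results.
(3) Term frequencies differ from field to field and can interfere with one another, producing a poorly ordered result.
Those are the problems with most_fields; now let's solve them. The first approach is the copy_to parameter.
copy_to lets us combine several fields into a single field.
Create the following index: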
DELETE /test_index

PUT /test_index
{
  "mappings": {
    "properties": {
      "street": { "type": "text", "copy_to": "full_address" },
      "city": { "type": "text", "copy_to": "full_address" },
      "country": { "type": "text", "copy_to": "full_address" },
      "postcode": { "type": "text", "copy_to": "full_address" },
      "full_address": { "type": "text" }
    }
  }
}
Insert the same data as before:
PUT /test_index/_create/1
{ "street": "5 Poland Street", "city": "Poland", "country": "United W1V", "postcode": "W1V 3DG" }

PUT /test_index/_create/2
{ "street": "5 Poland Street W1V", "city": "London", "country": "United Kingdom", "postcode": "3DG" }
Query:
GET /test_index/_search
{
  "query": {
    "match": { "full_address": "Poland Street W1V" }
  }
}
Result:
{ "took" : 2, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 2, "relation" : "eq" }, "max_score" : 0.68370587, "hits" : [ { "_index" : "test_index", "_type" : "_doc", "_id" : "1", "_score" : 0.68370587, "_source" : { "street" : "5 Poland Street", "city" : "Poland", "country" : "United W1V", "postcode" : "W1V 3DG" } }, { "_index" : "test_index", "_type" : "_doc", "_id" : "2", "_score" : 0.5469647, "_source" : { "street" : "5 Poland Street W1V", "city" : "London", "country" : "United Kingdom", "postcode" : "3DG" } } ] } }
Once the fields are combined into the single field full_address, the problems of most_fields go away.
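To see what the combined field looks like, here is a rough Python model of copy_to. It assumes a simplified analyzer (lowercase, split on whitespace) in place of the standard analyzer, and `analyze` and `full_address_terms` are hypothetical helpers, not Elasticsearch APIs.

```python
from collections import Counter

FIELDS = ["street", "city", "country", "postcode"]

docs = {
    "1": {"street": "5 Poland Street", "city": "Poland",
          "country": "United W1V", "postcode": "W1V 3DG"},
    "2": {"street": "5 Poland Street W1V", "city": "London",
          "country": "United Kingdom", "postcode": "3DG"},
}


def analyze(text):
    """Simplified stand-in for the standard analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def full_address_terms(doc):
    """Collect the terms of every source field into one combined field."""
    terms = []
    for field in FIELDS:
        terms.extend(analyze(doc[field]))
    return terms


query_terms = analyze("Poland Street W1V")
for doc_id, doc in docs.items():
    tf = Counter(full_address_terms(doc))
    # Term frequency of each query term inside the combined field
    print(doc_id, {term: tf[term] for term in query_terms})
```

In this simplified model both documents have eight terms in full_address, and doc 1 contains poland and w1v twice each, which is consistent with doc 1 ranking first in the result above.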
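The second way to solve the problems of most_fields is the cross_fields query.
If we can use _all (older versions only; it was deprecated in 6.0) or define copy_to before indexing documents, there is no problem. However, Elasticsearch also offers a search-time solution: the cross_fields query. cross_fields takes a term-centric approach, which is very different from the field-centric approach of best_fields and most_fields. It treats all of the fields as one big field and looks for each term in any field.
The following explains the difference between field-centric and term-centric.
Run the query: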
GET /test_index/_validate/query?explain
{
  "query": {
    "multi_match": {
      "query": "Poland Street W1V",
      "type": "best_fields",
      "fields": ["street", "city", "country", "postcode"]
    }
  }
}
This returns:
{ "_shards" : { "total" : 1, "successful" : 1, "failed" : 0 }, "valid" : true, "explanations" : [ { "index" : "test_index", "valid" : true, "explanation" : "((postcode:poland postcode:street postcode:w1v) | (country:poland country:street country:w1v) | (city:poland city:street city:w1v) | (street:poland street:street street:w1v))" } ] }
((postcode:poland postcode:street postcode:w1v) |
(country:poland country:street country:w1v) |
(city:poland city:street city:w1v) |
(street:poland street:street street:w1v))
This is the query it is rewritten into.
Setting operator to and turns it into:
((+postcode:poland +postcode:street +postcode:w1v) |
(+country:poland +country:street +country:w1v) |
(+city:poland +city:street +city:w1v) |
(+street:poland +street:street +street:w1v))
which means all three terms must appear in the same field.
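The field-centric rewrite above can be reproduced with a few lines of Python. This is only a sketch of the string shape shown in the explain output; `field_centric_query` is a hypothetical helper, not part of any Elasticsearch client.

```python
def field_centric_query(fields, terms, operator="or"):
    """Build one clause group per field; the groups are alternatives joined by ' | '."""
    prefix = "+" if operator == "and" else ""  # '+' marks a required clause
    groups = []
    for field in fields:
        clauses = " ".join(f"{prefix}{field}:{term}" for term in terms)
        groups.append(f"({clauses})")
    return "(" + " | ".join(groups) + ")"


fields = ["postcode", "country", "city", "street"]
terms = ["poland", "street", "w1v"]

print(field_centric_query(fields, terms))
print(field_centric_query(fields, terms, operator="and"))
```

Each parenthesized group belongs to exactly one field, so with operator set to and a document only satisfies a group if that one field contains every term.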
Run the query:
GET /test_index/_validate/query?explain
{
  "query": {
    "multi_match": {
      "query": "Poland Street W1V",
      "type": "cross_fields",
      "operator": "and",
      "fields": ["street", "city", "country", "postcode"]
    }
  }
}
This returns:
{ "_shards" : { "total" : 1, "successful" : 1, "failed" : 0 }, "valid" : true, "explanations" : [ { "index" : "test_index", "valid" : true, "explanation" : "+blended(terms:[postcode:poland, country:poland, city:poland, street:poland]) +blended(terms:[postcode:street, country:street, city:street, street:street]) +blended(terms:[postcode:w1v, country:w1v, city:w1v, street:w1v])" } ] }
+blended(terms:[postcode:poland, country:poland, city:poland, street:poland]) +blended(terms:[postcode:street, country:street, city:street, street:street]) +blended(terms:[postcode:w1v, country:w1v, city:w1v, street:w1v])
This is the rewritten query. In other words, every term must appear in at least one of the fields.
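The difference between the two and semantics can be checked against the two sample documents. The sketch below assumes a simplified analyzer (lowercase, split on whitespace), and `matches_field_centric_and` and `matches_term_centric_and` are hypothetical names used only for illustration.

```python
FIELDS = ["street", "city", "country", "postcode"]

docs = {
    "1": {"street": "5 Poland Street", "city": "Poland",
          "country": "United W1V", "postcode": "W1V 3DG"},
    "2": {"street": "5 Poland Street W1V", "city": "London",
          "country": "United Kingdom", "postcode": "3DG"},
}


def analyze(text):
    """Simplified stand-in for the standard analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def matches_field_centric_and(doc, terms):
    """Field-centric and: a single field must contain every term."""
    return any(all(t in analyze(doc[f]) for t in terms) for f in FIELDS)


def matches_term_centric_and(doc, terms):
    """Term-centric and: every term must appear in at least one field."""
    return all(any(t in analyze(doc[f]) for f in FIELDS) for t in terms)


terms = analyze("Poland Street W1V")
for doc_id, doc in docs.items():
    print(doc_id,
          "field-centric:", matches_field_centric_and(doc, terms),
          "term-centric:", matches_term_centric_and(doc, terms))
```

Under this model doc 1 fails the field-centric check because no single field holds all three terms, but it passes the term-centric check; doc 2 passes both because its street field contains every term.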
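The cross_fields type first analyzes the query string into a list of terms, then searches for each term in any field. It addresses the term-frequency problem by blending inverse document frequencies across the fields, which neatly solves the problems of most_fields.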
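As a rough illustration of blending, the sketch below computes each term's document frequency per field and then uses one shared value for all fields. Taking the highest per-field document frequency (that is, the lowest IDF) is an assumption made here for simplicity; the real blending logic in Lucene is more involved. `document_frequency` and `blended_document_frequency` are hypothetical helpers.

```python
FIELDS = ["street", "city", "country", "postcode"]

docs = {
    "1": {"street": "5 Poland Street", "city": "Poland",
          "country": "United W1V", "postcode": "W1V 3DG"},
    "2": {"street": "5 Poland Street W1V", "city": "London",
          "country": "United Kingdom", "postcode": "3DG"},
}


def analyze(text):
    """Simplified stand-in for the standard analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def document_frequency(term, field):
    """Number of documents whose given field contains the term."""
    return sum(1 for doc in docs.values() if term in analyze(doc[field]))


def blended_document_frequency(term):
    """Assumed simplification: share the highest per-field df across all fields."""
    return max(document_frequency(term, field) for field in FIELDS)


for term in analyze("Poland Street W1V"):
    per_field = {field: document_frequency(term, field) for field in FIELDS}
    print(term, per_field, "blended:", blended_document_frequency(term))
```

Without blending, a term such as poland looks rare in city and common in street, and those per-field differences are what distort the ranking under most_fields.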
Compared with copy_to, cross_fields lets you boost individual fields at query time.
Example:
GET /test_index/_search
{
  "query": {
    "multi_match": {
      "query": "Poland Street W1V",
      "type": "cross_fields",
      "fields": ["street^2", "city", "country", "postcode"]
    }
  }
}
Here the boost of the street field is 2, and every other field has a boost of 1.
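The caret syntax is simply a field name followed by its boost. A minimal sketch of how such a field list can be read, assuming a default boost of 1 when no caret is present (`parse_field` is a hypothetical helper, not an Elasticsearch API):

```python
def parse_field(spec):
    """Split 'street^2' into ('street', 2.0); fields without '^' get boost 1.0."""
    name, sep, boost = spec.partition("^")
    return name, float(boost) if sep else 1.0


fields = ["street^2", "city", "country", "postcode"]
print(dict(parse_field(f) for f in fields))
```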