Multi Match Query

Date: 2019-11-30


The multi_match query builds on the match query to allow multi-field queries:

GET /_search
{
  "query": {
    "multi_match" : {
      "query":    "this is a test", 　　【1】
      "fields": [ "subject", "message" ]  【2】
    }
  }
}

【1】 The query string.

【2】 The fields to be queried.

`fields` and per-field boosting

Fields can be specified with wildcards, for example:

GET /_search
{
  "query": {
    "multi_match" : {
      "query":    "Will Smith",
      "fields": [ "title", "*_name" ] 【1】
    }
  }
}

【1】 Queries the title, first_name and last_name fields.
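The wildcard expansion can be pictured with a small Python sketch. It assumes a hypothetical list of mapped field names and uses glob-style matching from the standard library as a stand-in; it is an illustration, not a description of how Elasticsearch resolves patterns internally.

```python
from fnmatch import fnmatchcase

# Hypothetical field names in an index mapping (illustration only).
MAPPED_FIELDS = ["title", "first_name", "last_name", "body"]


def expand_fields(patterns, mapped_fields):
    """Return the mapped fields matched by any pattern, in mapping order."""
    return [
        field
        for field in mapped_fields
        if any(fnmatchcase(field, pattern) for pattern in patterns)
    ]


print(expand_fields(["title", "*_name"], MAPPED_FIELDS))
```

With these assumed field names, the pattern list from the query above selects title, first_name and last_name, and leaves body out.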

Individual fields can be boosted with the caret (^) notation:

GET /_search
{
  "query": {
    "multi_match" : {
      "query" : "this is a test",
      "fields" : [ "subject^3", "message" ] 【1】
    }
  }
}

【1】 The subject field is three times as important as the message field.
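To make the caret notation concrete, here is a minimal Python sketch that splits a field spec into a name and a boost. The helper name parse_field is hypothetical, and the sketch assumes a boost of 1 when no caret is present.

```python
def parse_field(spec):
    """Split a field spec such as "subject^3" into (name, boost)."""
    name, caret, boost = spec.partition("^")
    # Assumption: a field without an explicit boost counts as boost 1.
    return name, float(boost) if caret else 1.0


for spec in ["subject^3", "message"]:
    print(parse_field(spec))
```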

Types of `multi_match` query:

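The way the multi_match query is executed internally depends on the type parameter, which can be set to:

best_fields    (default) Finds documents which match any field, but uses the _score from the best field. See best_fields.

most_fields    Finds documents which match any field and combines the _score from each field. See most_fields.

cross_fields   Treats fields with the same analyzer as though they were one big field. Looks for each word in any field. See cross_fields.

phrase         Runs a match_phrase query on each field and uses the _score from the best field. See phrase and phrase_prefix.

phrase_prefix  Runs a match_phrase_prefix query on each field and uses the _score from the best field. See phrase and phrase_prefix.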

`best_fields`

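The best_fields type is most useful when you are searching for multiple words that are best found in the same field. For instance, "brown fox" in a single field is more meaningful than "brown" in one field and "fox" in another.

The best_fields type generates a match query for each field and wraps them in a dis_max query to find the single best matching field. For instance, this query: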

GET /_search
{
  "query": {
    "multi_match" : {
      "query":      "brown fox",
      "type":       "best_fields",
      "fields":     [ "subject", "message" ],
      "tie_breaker": 0.3
    }
  }
}

would be executed as:

GET /_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "subject": "brown fox" }},
        { "match": { "message": "brown fox" }}
      ],
      "tie_breaker": 0.3
    }
  }
}

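Normally the best_fields type uses the score of the single best matching field, but if tie_breaker is specified, it calculates the score as follows:

the score from the best matching field
plus tie_breaker * _score for all other matching fields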
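The two-step rule above can be sketched in a few lines of Python. The function name and the per-field scores are hypothetical; the sketch only mirrors the arithmetic described here, not the actual scoring code.

```python
def best_fields_score(field_scores, tie_breaker=0.0):
    """Best field score plus tie_breaker times every other matching score."""
    matching = [score for score in field_scores.values() if score > 0]
    if not matching:
        return 0.0
    best = max(matching)
    return best + tie_breaker * (sum(matching) - best)


# Hypothetical per-field _score values for one document.
scores = {"subject": 2.0, "message": 1.0}
print(best_fields_score(scores))        # only the best field counts
print(best_fields_score(scores, 0.3))   # the other field contributes 30%
```

With a tie_breaker of 0.3, the message field adds 0.3 * 1.0 on top of the best score of 2.0.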

Also accepts analyzer, boost, operator, minimum_should_match, fuzziness, lenient, prefix_length, max_expansions, rewrite, zero_terms_query and cutoff_frequency, as explained in match query.

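Important: operator and minimum_should_match

The best_fields and most_fields types are field-centric (they generate a match query per field). This means that the operator and minimum_should_match parameters are applied to each field individually, which is probably not what you want.

Take this query for example: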

GET /_search
{
  "query": {
    "multi_match" : {
      "query":      "Will Smith",
      "type":       "best_fields",
      "fields":     [ "first_name", "last_name" ],
      "operator":   "and" 【1】
    }
  }
}

【1】 All terms must be present.

This query is executed as:

  (+first_name:will +first_name:smith)
| (+last_name:will  +last_name:smith)

In other words, all terms must be present in a single field for a document to match. See cross_fields for a better solution.
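A rough Python model of this field-centric logic is shown below. The documents are hypothetical, and whitespace splitting with lowercasing stands in for real analysis.

```python
def field_centric_and(doc, terms, fields):
    """True if at least one field contains every term."""
    return any(
        all(term in doc.get(field, "").lower().split() for term in terms)
        for field in fields
    )


fields = ["first_name", "last_name"]
split_doc = {"first_name": "Will", "last_name": "Smith"}        # hypothetical
single_doc = {"first_name": "Will Smith", "last_name": "Jones"}  # hypothetical

print(field_centric_and(split_doc, ["will", "smith"], fields))
print(field_centric_and(single_doc, ["will", "smith"], fields))
```

The first document does not match even though both terms occur in it, because no single field holds both of them.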

`most_fields`

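The most_fields type is most useful when querying multiple fields that contain the same text analyzed in different ways. For instance, the main field may contain synonyms, stemming and terms without diacritics, a second field may contain the original terms, and a third field may contain shingles. By combining the scores from all three fields, we can match as many documents as possible with the main field, while using the second and third fields to push the most similar results to the top of the list.

This query: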

GET /_search
{
  "query": {
    "multi_match" : {
      "query":      "quick brown fox",
      "type":       "most_fields",
      "fields":     [ "title", "title.original", "title.shingles" ]
    }
  }
}

would be executed as:

GET /_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title":          "quick brown fox" }},
        { "match": { "title.original": "quick brown fox" }},
        { "match": { "title.shingles": "quick brown fox" }}
      ]
    }
  }
}

The score from each match clause is added together, then divided by the number of match clauses.
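Taking that sentence literally, the combination can be sketched as follows. The function name and the scores are hypothetical, and the sketch assumes that a clause which does not match contributes a score of 0.

```python
def most_fields_score(clause_scores):
    """Sum the per-clause scores, then divide by the number of clauses."""
    return sum(clause_scores) / len(clause_scores)


# Hypothetical _score values for title, title.original and title.shingles.
print(most_fields_score([1.2, 0.9, 0.0]))
```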

Also accepts analyzer, boost, operator, minimum_should_match, fuzziness, lenient, prefix_length, max_expansions, rewrite, zero_terms_query and cutoff_frequency, as explained in match query, but see operator and minimum_should_match.

`phrase` and `phrase_prefix`

The phrase and phrase_prefix types behave just like best_fields, but they use a match_phrase or match_phrase_prefix query instead of a match query.

This query:

GET /_search
{
  "query": {
    "multi_match" : {
      "query":      "quick brown f",
      "type":       "phrase_prefix",
      "fields":     [ "subject", "message" ]
    }
  }
}

would be executed as:

GET /_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match_phrase_prefix": { "subject": "quick brown f" }},
        { "match_phrase_prefix": { "message": "quick brown f" }}
      ]
    }
  }
}

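Also accepts analyzer, boost, lenient, slop and zero_terms_query, as explained in match query. The phrase_prefix type additionally accepts max_expansions.

Important: phrase, phrase_prefix and fuzziness: the fuzziness parameter cannot be used with the phrase or phrase_prefix type.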

`cross_fields`

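The cross_fields type is particularly useful with structured documents where multiple fields should match. For instance, when querying the first_name and last_name fields for "Will Smith", the best match is likely to have "Will" in one field and "Smith" in the other.

This sounds like a job for most_fields, but there are two problems with that approach. The first problem is that operator and minimum_should_match are applied per field instead of per term (see the explanation above).

The second problem has to do with relevance: the different term frequencies in the first_name and last_name fields can produce unexpected results.

For instance, imagine we have two people: "Will Smith" and "Smith Jones". "Smith" as a last name is very common (and so is of low importance), but "Smith" as a first name is very uncommon (and so is of great importance).

If we search for "Will Smith", the "Smith Jones" document will probably appear above the better matching "Will Smith", because the score of first_name:smith has trumped the combined scores of first_name:will plus last_name:smith.

One way of dealing with these types of queries is simply to index the first_name and last_name fields into a single full_name field. Of course, this can only be done at index time.

The cross_fields type tries to solve these problems at query time by taking a term-centric approach. It first analyzes the query string into individual terms, then looks for each term in any of the fields, as though they were one big field.

A query like: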

GET /_search
{
  "query": {
    "multi_match" : {
      "query":      "Will Smith",
      "type":       "cross_fields",
      "fields":     [ "first_name", "last_name" ],
      "operator":   "and"
    }
  }
}

is executed as:

+(first_name:will  last_name:will)
+(first_name:smith last_name:smith)

In other words, all terms must be present in at least one field for a document to match. (Compare this to the logic used for best_fields and most_fields.)
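The term-centric rule can be modelled with a short Python sketch. The documents are hypothetical, and whitespace splitting with lowercasing stands in for real analysis.

```python
def term_centric_and(doc, terms, fields):
    """True if every term appears in at least one of the fields."""
    return all(
        any(term in doc.get(field, "").lower().split() for field in fields)
        for term in terms
    )


fields = ["first_name", "last_name"]
will_smith = {"first_name": "Will", "last_name": "Smith"}    # hypothetical
smith_jones = {"first_name": "Smith", "last_name": "Jones"}  # hypothetical

print(term_centric_and(will_smith, ["will", "smith"], fields))
print(term_centric_and(smith_jones, ["will", "smith"], fields))
```

Here the "Will Smith" document matches although its terms sit in different fields, while "Smith Jones" is rejected because "will" appears in neither field.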

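That solves one of the two problems. The problem of differing term frequencies is solved by blending the term frequencies for all fields in order to even out the differences.

In practice, first_name:smith will be treated as though it has the same frequencies as last_name:smith, plus one. This makes matches on first_name and last_name have comparable scores, with a tiny advantage for last_name, since it is the field most likely to contain smith.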
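The effect of blending can be illustrated with a deliberately simplified Python sketch. Everything in it is an assumption made for illustration: the index size and document frequencies are invented, only a BM25-style inverse document frequency is scored (term frequency and length normalization are ignored), and blending is approximated by giving every field the highest document frequency seen for the term, without the "plus one" adjustment.

```python
import math

TOTAL_DOCS = 1_000_000  # hypothetical index size

# Hypothetical document frequencies per (field, term).
DOC_FREQ = {
    ("first_name", "will"): 100_000,
    ("last_name", "will"): 10,
    ("first_name", "smith"): 1,
    ("last_name", "smith"): 100_000,
}


def idf(doc_freq, total_docs=TOTAL_DOCS):
    """BM25-style inverse document frequency: rarer terms weigh more."""
    return math.log(1 + (total_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def blended_idf(term, fields):
    """Simplification: every field shares the highest frequency of the term."""
    return idf(max(DOC_FREQ[(field, term)] for field in fields))


fields = ["first_name", "last_name"]

# Per-field statistics: the rare first_name:smith outweighs two common terms.
will_smith = idf(DOC_FREQ[("first_name", "will")]) + idf(DOC_FREQ[("last_name", "smith")])
smith_jones = idf(DOC_FREQ[("first_name", "smith")])
print(will_smith < smith_jones)

# Blended statistics: "Will Smith" matches two terms, "Smith Jones" only one.
will_smith_blended = blended_idf("will", fields) + blended_idf("smith", fields)
smith_jones_blended = blended_idf("smith", fields)
print(will_smith_blended > smith_jones_blended)
```

Under these assumed numbers, per-field statistics rank "Smith Jones" first, while blended statistics put "Will Smith" back on top.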

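Note that cross_fields is usually only useful on short string fields that all have a boost of 1. Otherwise boosts, term frequencies and length normalization contribute to the score in such a way that the blending of term statistics is no longer meaningful.

If you run the above query through the Validate API, it returns this explanation: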

+blended("will",  fields: [first_name, last_name])
+blended("smith", fields: [first_name, last_name])

Also accepts analyzer, boost, operator, minimum_should_match, lenient, zero_terms_query and cutoff_frequency, as explained in match query.

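`cross_fields` and analysis

The cross_fields type can only work in term-centric mode on fields that have the same analyzer. Fields with the same analyzer are grouped together as in the example above. If there are multiple groups, they are combined with a bool query.

For instance, if we have a first and a last field which share the same analyzer, plus a first.edge and a last.edge field which both use an edge_ngram analyzer, this query: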

GET /_search
{
  "query": {
    "multi_match" : {
      "query":      "Jon",
      "type":       "cross_fields",
      "fields":     [
        "first", "first.edge",
        "last",  "last.edge"
      ]
    }
  }
}

would be executed as:

    blended("jon", fields: [first, last])
| (
    blended("j",   fields: [first.edge, last.edge])
    blended("jo",  fields: [first.edge, last.edge])
    blended("jon", fields: [first.edge, last.edge])
)

In other words, first and last would be grouped together and treated as a single field, and first.edge and last.edge would be grouped together and treated as a single field.
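The grouping step can be sketched in Python as below. The field-to-analyzer mapping is hypothetical (the analyzer shared by first and last is assumed to be standard), and the helper only illustrates the idea of grouping by analyzer.

```python
from collections import defaultdict

# Hypothetical field -> analyzer mapping, mirroring the example above.
FIELD_ANALYZERS = {
    "first": "standard",
    "first.edge": "edge_ngram",
    "last": "standard",
    "last.edge": "edge_ngram",
}


def group_by_analyzer(fields, analyzers):
    """Group fields that share an analyzer, keeping first-seen order."""
    groups = defaultdict(list)
    for field in fields:
        groups[analyzers[field]].append(field)
    return list(groups.values())


print(group_by_analyzer(["first", "first.edge", "last", "last.edge"], FIELD_ANALYZERS))
```

Each resulting group corresponds to one blended sub-query in the explanation above.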

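Having multiple groups is fine, but when combined with operator or minimum_should_match, it can suffer from the same problem as most_fields or best_fields.

You can easily rewrite this query yourself as two separate cross_fields queries combined with a bool query, and apply the minimum_should_match parameter to just one of them: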

GET /_search
{
  "query": {
    "bool": {
      "should": [
        {
          "multi_match" : {
            "query":      "Will Smith",
            "type":       "cross_fields",
            "fields":     [ "first", "last" ],
            "minimum_should_match": "50%" 【1】
          }
        },
        {
          "multi_match" : {
            "query":      "Will Smith",
            "type":       "cross_fields",
            "fields":     [ "*.edge" ]
          }
        }
      ]
    }
  }
}

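【1】 Either will or smith must be present in either the first or the last field.

You can force all fields into the same group by specifying the analyzer parameter in the query.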

GET /_search
{
  "query": {
    "multi_match" : {
      "query":      "Will Smith",
      "type":       "cross_fields",
      "analyzer":   "standard", 【1】
      "fields":     [ "first", "last", "*.edge" ]
    }
  }
}

【1】 Use the standard analyzer for all fields.

This will be executed as:

blended("will",  fields: [first, first.edge, last.edge, last])
blended("smith", fields: [first, first.edge, last.edge, last])

`tie_breaker`

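By default, each per-term blended query will use the best score returned by any field in a group, then these scores are added together to give the final score. The tie_breaker parameter can change the default behaviour of the per-term blended queries. It accepts:

0.0            Take the single best score out of (for example) first_name:will and last_name:will (default)

1.0            Add together the scores for (for example) first_name:will and last_name:will

0.0 < n < 1.0  Take the single best score plus tie_breaker multiplied by each of the scores from the other matching fields.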
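The three settings can be compared with a small Python sketch. The function names and scores are hypothetical; the sketch only reproduces the arithmetic described above for one group of fields.

```python
def blended_term_score(field_scores, tie_breaker=0.0):
    """Score one term: best field score plus tie_breaker times the others."""
    best = max(field_scores)
    return best + tie_breaker * (sum(field_scores) - best)


def cross_fields_score(per_term_field_scores, tie_breaker=0.0):
    """Add up the per-term blended scores to get the final score."""
    return sum(
        blended_term_score(scores, tie_breaker)
        for scores in per_term_field_scores
    )


# Hypothetical scores as [first_name, last_name], for "will" then "smith".
scores = [[1.5, 0.5], [0.0, 2.0]]
print(cross_fields_score(scores, tie_breaker=0.0))  # best field per term
print(cross_fields_score(scores, tie_breaker=1.0))  # all field scores added
print(cross_fields_score(scores, tie_breaker=0.5))  # somewhere in between
```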

Important: cross_fields and fuzziness

The fuzziness parameter cannot be used with the cross_fields type.

Original source: https://www.elastic.co/guide/en/elasticsearch/reference/5.0/query-dsl-multi-match-query.html
