Structured Search and Full-Text Search in Elasticsearch

Date: 2019-11-07


1. Structured Search

1.1 Finding Exact Values

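Filters matter because they are very fast: they do not calculate relevance (the whole scoring phase is skipped) and they are easy to cache. Use filtering queries wherever you can.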

The term query finds the exact value we specify. On its own, a term query is simple: it takes a field name and the value we want to find:
{
    "term" : {
        "price" : 20
    }
}

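When looking up an exact value we usually do not want the query to be scored; we only want documents to be included or excluded. So we wrap the term query in a constant_score query, which runs it in non-scoring mode and gives every match a uniform score of 1. Because no score has to be computed at query time, searching through constant_score is faster.
The combined result is a constant_score query that contains a term query: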

GET /my_store/products/_search
{
  "query" : {
      "constant_score" : { 
          "filter" : {
              "term" : { 
                  "price" : 20
              }
          }
      }
  }
}
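
When requests like this are built from application code, the nesting can be assembled by a small helper. The following is a minimal sketch in Python that only constructs the request body; the helper name exact_value_query is hypothetical, and sending the body to a cluster is left out on purpose.

```python
import json


def exact_value_query(field, value):
    """Wrap a term query in constant_score so matches are filtered, not scored."""
    return {
        "query": {
            "constant_score": {
                "filter": {
                    "term": {field: value}
                }
            }
        }
    }


# Same body as the price example above.
body = exact_value_query("price", 20)
print(json.dumps(body, indent=2))
```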

1.2 Combining Filters

1.2.1 Bool Filter

{
   "bool" : {
      "must" :     [],
      "should" :   [],
      "must_not" : []
   }
}

must
All clauses must match; equivalent to AND.
must_not
No clause may match; equivalent to NOT.
should
At least one clause must match; equivalent to OR.


GET /my_store/products/_search
{
   "query" : {
      "constant_score" : {
         "filter" : {
            "bool" : {
              "should" : [
                 { "term" : {"price" : 20}}, 
                 { "term" : {"productID" : "XHDK-A-1293-#fJ3"}} 
              ],
              "must_not" : {
                 "term" : {"price" : 30} 
              }
           }
         }
      }
   }
}

1.2.2 Nested Bool Filters

SELECT document
FROM   products
WHERE  productID      = "KDKE-B-9947-#kL5"
  OR (     productID = "JODL-X-1937-#pV7"
       AND price     = 30 )

GET /my_store/products/_search
{
   "query" : {
      "constant_score" : {
         "filter" : {
            "bool" : {
              "should" : [
                { "term" : {"productID" : "KDKE-B-9947-#kL5"}}, 
                { "bool" : { 
                  "must" : [
                    { "term" : {"productID" : "JODL-X-1937-#pV7"}}, 
                    { "term" : {"price" : 30}} 
                  ]
                }}
              ]
           }
         }
      }
   }
}
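
To check that a nested bool filter really expresses the SQL condition above, it can help to model the semantics on plain dictionaries. The sketch below is a simplified in-memory model written in Python, not how Elasticsearch evaluates filters: it compares raw field values, ignores analysis and scoring, and assumes that a should list requires at least one match only when no must clause is present. The function names are hypothetical.

```python
def _as_list(clauses):
    """Accept a single clause, a list of clauses, or None."""
    if clauses is None:
        return []
    return clauses if isinstance(clauses, list) else [clauses]


def matches(doc, clause):
    """Return True if doc satisfies a term or bool clause (simplified model)."""
    if "term" in clause:
        field, value = next(iter(clause["term"].items()))
        return doc.get(field) == value
    if "bool" in clause:
        spec = clause["bool"]
        must = _as_list(spec.get("must"))
        should = _as_list(spec.get("should"))
        must_not = _as_list(spec.get("must_not"))
        if not all(matches(doc, c) for c in must):
            return False
        if any(matches(doc, c) for c in must_not):
            return False
        # Assumption of this model: should is mandatory only without must.
        if should and not must:
            return any(matches(doc, c) for c in should)
        return True
    raise ValueError("unsupported clause: %r" % (clause,))


# The nested filter from the request above.
nested = {"bool": {"should": [
    {"term": {"productID": "KDKE-B-9947-#kL5"}},
    {"bool": {"must": [
        {"term": {"productID": "JODL-X-1937-#pV7"}},
        {"term": {"price": 30}},
    ]}},
]}}

print(matches({"productID": "JODL-X-1937-#pV7", "price": 30}, nested))
```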

1.3 Finding Multiple Exact Values

What if we want to find documents whose price field is $20 or $30?
GET /my_store/products/_search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "terms" : { 
                    "price" : [20, 30]
                }
            }
        }
    }
}
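
In terms of which documents match, a terms filter behaves like a bool filter with one term clause per value under should. The following Python sketch spells out that expansion; the function name terms_as_should is hypothetical, and the sketch only builds the clause as a dictionary.

```python
def terms_as_should(field, values):
    """Expand a multi-value lookup into one term clause per value under should."""
    return {"bool": {"should": [{"term": {field: value}} for value in values]}}


expanded = terms_as_should("price", [20, 30])
print(expanded)
```

Writing the terms form directly is shorter and is the form used in the request above; the expansion is shown only to make the OR semantics explicit.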

1.4 Ranges

gt: > (greater than)
lt: < (less than)
gte: >= (greater than or equal to)
lte: <= (less than or equal to)

GET /my_store/products/_search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "range" : {
                    "price" : {
                        "gte" : 20,
                        "lt"  : 40
                    }
                }
            }
        }
    }
}

To leave the range unbounded on one side (say, >20), simply omit one of the limits:
"range" : {
    "price" : {
        "gt" : 20
    }
}
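
The four range operators map directly onto ordinary comparisons, and every bound that is present must hold. A minimal Python sketch of that reading, using the hypothetical helper in_range on plain numbers rather than on an index:

```python
import operator

# Range keyword -> comparison applied as op(value, limit).
_OPS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def in_range(value, bounds):
    """True if value satisfies every bound given; omitted bounds are unbounded."""
    return all(_OPS[name](value, limit) for name, limit in bounds.items())


print(in_range(20, {"gte": 20, "lt": 40}))
print(in_range(40, {"gte": 20, "lt": 40}))
```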

Date ranges:
"range" : {
    "timestamp" : {
        "gt" : "2014-01-01 00:00:00",
        "lt" : "2014-01-07 00:00:00"
    }
}

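When applied to date fields, the range query supports date math. For example, to find all documents whose timestamp falls within the last hour:
"range" : {
    "timestamp" : {
        "gt" : "now-1h"
    }
}

Date math can also be applied to an explicit date by appending a double pipe (||) followed by the expression. The following range covers one month starting at 2014-01-01:
"range" : {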
    "timestamp" : {
        "gt" : "2014-01-01 00:00:00",
        "lt" : "2014-01-01 00:00:00||+1M" 
    }
}
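
Elasticsearch resolves date math on the server, so nothing has to be computed by the client. To make the two expressions concrete, the Python sketch below computes what now-1h and ||+1M resolve to for a given instant. The helper names are hypothetical, and clamping to the last day of a shorter month is an assumption of this sketch.

```python
import calendar
from datetime import datetime, timedelta


def minus_hours(moment, hours):
    """Equivalent of '<moment>-<hours>h'."""
    return moment - timedelta(hours=hours)


def plus_months(moment, months):
    """Equivalent of '<moment>||+<months>M', clamping the day if needed."""
    index = moment.month - 1 + months
    year = moment.year + index // 12
    month = index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return moment.replace(year=year, month=month, day=min(moment.day, last_day))


start = datetime(2014, 1, 1, 0, 0, 0)
print(plus_months(start, 1))   # upper bound of the one-month range
print(minus_hours(start, 1))   # what "now-1h" would be if now were `start`
```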

1.5 Dealing with Null Values

1.5.1 Exists Query

SELECT tags
FROM   posts
WHERE  tags IS NOT NULL

GET /my_index/posts/_search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "exists" : { "field" : "tags" }
            }
        }
    }
}

1.5.2 Missing Query

SELECT tags
FROM   posts
WHERE  tags IS NULL

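The missing query was removed in Elasticsearch 5.0; on later versions the same condition is expressed by negating exists inside a bool query:
GET /my_index/posts/_search
{
    "query" : {
        "constant_score" : {
            "filter": {
                "bool" : {
                    "must_not" : {
                        "exists" : { "field" : "tags" }
                    }
                }
            }
        }
    }
}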
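
What counts as "null" here is slightly broader than in SQL: a field exists only if it holds at least one non-null value, so null, an empty array, and an array containing only null all count as missing. A simplified Python model of that rule on plain dictionaries, with hypothetical function names:

```python
def has_value(doc, field):
    """Model of exists: at least one non-null value is stored for the field."""
    value = doc.get(field)
    values = value if isinstance(value, list) else [value]
    return any(item is not None for item in values)


def is_missing(doc, field):
    """Model of the missing condition: the negation of exists."""
    return not has_value(doc, field)


posts = [
    {"tags": ["search"]},
    {"tags": ["search", "open_source"]},
    {"other_field": "some data"},
    {"tags": None},
    {"tags": ["search", None]},
]
print([has_value(post, "tags") for post in posts])
```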

2. Full-Text Search

2.1 The match Query

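match is the core query. Whatever field you need to query, match should be the first choice. It is a high-level full-text query, which means it can handle both full-text fields and exact-value fields. Its main use case is full-text search.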

2.1.1 Single-Word Query

GET /my_index/my_type/_search
{
    "query": {
        "match": {
            "title": "QUICK!"
        }
    }
}

2.1.2 Multi-Word Query

GET /my_index/my_type/_search
{
    "query": {
        "match": {
            "title": "BROWN DOG!"
        }
    }
}

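Matching documents on any of the query terms can produce a long tail of irrelevant results. It is a shotgun approach to search. Often we only want documents that contain all of the terms; in other words, we want to match brown AND dog rather than brown OR dog.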

The match query also accepts an operator parameter, which defaults to or. Changing it to and requires every specified term to match:

GET /my_index/my_type/_search
{
    "query": {
        "match": {
            "title": {      
                "query":    "BROWN DOG!",
                "operator": "and"
            }
        }
    }
}
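
The difference between the two operators can be modeled in a few lines. In the Python sketch below, tokenize is a crude stand-in for the real analyzer (lowercasing and splitting on non-alphanumeric characters, which is an assumption of this sketch), and match_title is a hypothetical function that only answers whether a title would match, without any scoring.

```python
import re


def tokenize(text):
    """Crude analyzer stand-in: lowercase, keep alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_title(title, query_text, operator="or"):
    """True if the title matches the query terms under the given operator."""
    title_terms = set(tokenize(title))
    hits = [term in title_terms for term in tokenize(query_text)]
    return all(hits) if operator == "and" else any(hits)


print(match_title("The quick brown fox", "BROWN DOG!"))
print(match_title("The quick brown fox", "BROWN DOG!", operator="and"))
```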

2.2 Combining Queries

GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "must":     { "match": { "title": "quick" }},
      "must_not": { "match": { "title": "lazy"  }},
      "should": [
                  { "match": { "title": "brown" }},
                  { "match": { "title": "dog"   }}
      ]
    }
  }
}

2.3 Controlling Precision

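Just as we can control the precision of a match query, we can use the minimum_should_match parameter to control how many should clauses must match. It can be either an absolute number or a percentage: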

GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "brown" }},
        { "match": { "title": "fox"   }},
        { "match": { "title": "dog"   }}
      ],
      "minimum_should_match": 2 
    }
  }
}

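This query returns every document whose title field contains "brown" AND "fox", "brown" AND "dog", or "fox" AND "dog". A document that satisfies all three conditions is considered more relevant than one that satisfies only two.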
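
The counting rule behind minimum_should_match can be sketched in Python as follows. Both function names are hypothetical; the sketch handles only a non-negative integer or a positive percentage (rounded down), and it matches on whitespace-separated lowercase words instead of analyzed terms.

```python
def resolve_minimum(spec, total):
    """Turn 2 or '75%' into the number of should clauses that must match."""
    if isinstance(spec, str) and spec.endswith("%"):
        return total * int(spec[:-1]) // 100  # percentage is rounded down
    return int(spec)


def satisfies_minimum(title, words, spec):
    """True if enough of the words appear in the title."""
    terms = set(title.lower().split())
    matched = sum(1 for word in words if word in terms)
    return matched >= resolve_minimum(spec, len(words))


words = ["brown", "fox", "dog"]
print(satisfies_minimum("the quick brown fox", words, 2))
print(satisfies_minimum("the lazy dog", words, 2))
```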

2.4 Boosting Query Clauses

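Suppose we want to search for documents about "full-text search" but give more weight to documents that mention "Elasticsearch" or "Lucene". More weight here means that documents containing "Elasticsearch" or "Lucene" receive a higher relevance _score than documents that do not, so they appear higher in the result set.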

GET /_search
{
    "query": {
        "bool": {
            "must": {
                "match": {
                    "content": { 
                        "query":    "full text search",
                        "operator": "and"
                    }
                }
            },
            "should": [ 
                { "match": { "content": "Elasticsearch" }},
                { "match": { "content": "Lucene"        }}
            ]
        }
    }
}


The more should clauses that match, the more relevant the document. So far, so good.

But what if we want documents that contain Lucene to carry more weight, and documents that contain Elasticsearch to carry even more weight than those with Lucene?

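We can control the relative weight of any query clause by specifying a boost value. boost defaults to 1, and a value greater than 1 increases the relative weight of that clause. So the previous query can be rewritten as follows: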
GET /_search
{
    "query": {
        "bool": {
            "must": {
                "match": {  
                    "content": {
                        "query":    "full text search",
                        "operator": "and"
                    }
                }
            },
            "should": [
                { "match": {
                    "content": {
                        "query": "Elasticsearch",
                        "boost": 3 
                    }
                }},
                { "match": {
                    "content": {
                        "query": "Lucene",
                        "boost": 2 
                    }
                }}
            ]
        }
    }
}
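
When several boosted clauses are needed, writing each one out by hand becomes repetitive. A minimal Python sketch of a builder for the should list, assuming a hypothetical helper named boosted_match that only assembles dictionaries and does not talk to a cluster:

```python
def boosted_match(field, text, boost=1):
    """Build a match clause in its expanded form with an explicit boost."""
    return {"match": {field: {"query": text, "boost": boost}}}


# Term -> relative weight, as in the request above.
weights = {"Elasticsearch": 3, "Lucene": 2}
should = [boosted_match("content", term, boost) for term, boost in weights.items()]
print(should)
```

Keeping the weights in one mapping makes it easier to tune them together, since boost values are only meaningful relative to each other.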

3. Outlook

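While working with ES I found some third-party wrapper libraries online, but no PHP class that wraps such a library further to make it friendlier to use inside a project. I therefore plan to write my own ES service class, which is currently under development. Getting familiar with the documentation and summarizing it is the first step, and I will publish an update once the class is finished. The current idea is as follows: common query operations are wrapped in a single method that accepts the query fields, sorting, and pagination as parameters. LIKE-style search gets its own method and can highlight the search terms. Complex cases can fall back to the native query format directly.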
