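A Roundup of Commonly Used Elasticsearch Search Techniques

    This post summarizes some of the Elasticsearch pitfalls I ran into in earlier work, written down here for easy reference later.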

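0. Preface

To illustrate the different types of ES searches, we will search a collection of book documents with the following fields:

1. title: the book title; 2. authors: the authors; 3. summary: the summary; 4. publish_date: the release date; 5. num_reviews: the number of reviews.

First, let's create a new index and use the bulk API to load the data in one batch.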

PUT /bookdb_index
{ "settings": { "number_of_shards": 1 }}

POST /bookdb_index/book/_bulk
{ "index": { "_id": 1 }}
{ "title": "Elasticsearch: The Definitive Guide", "authors": ["clinton gormley", "zachary tong"], "summary" : "A distibuted real-time search and analytics engine", "publish_date" : "2015-02-07", "num_reviews": 20, "publisher": "oreilly" }
{ "index": { "_id": 2 }}
{ "title": "Taming Text: How to Find, Organize, and Manipulate It", "authors": ["grant ingersoll", "thomas morton", "drew farris"], "summary" : "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization", "publish_date" : "2013-01-24", "num_reviews": 12, "publisher": "manning" }
{ "index": { "_id": 3 }}
{ "title": "Elasticsearch in Action", "authors": ["radu gheorge", "matthew lee hinman", "roy russo"], "summary" : "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms", "publish_date" : "2015-12-03", "num_reviews": 18, "publisher": "manning" }
{ "index": { "_id": 4 }}
{ "title": "Solr in Action", "authors": ["trey grainger", "timothy potter"], "summary" : "Comprehensive guide to implementing a scalable search engine using Apache Solr", "publish_date" : "2014-04-05", "num_reviews": 23, "publisher": "manning" }
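The bulk body is newline-delimited JSON: one action line followed by one source line per document, with a trailing newline at the end. Below is a minimal Python sketch of how such a body could be assembled. The helper name `build_bulk_body` and the shortened sample documents are assumptions made for illustration; nothing is sent to a cluster.

```python
import json


def build_bulk_body(docs):
    """Return an NDJSON bulk body: one action line plus one source line per document."""
    lines = []
    for doc_id, source in docs:
        # Action line: tells the bulk API to index the next line under this _id.
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        # Source line: the document itself, serialized on a single line.
        lines.append(json.dumps(source))
    # The bulk API expects the body to end with a newline character.
    return "\n".join(lines) + "\n"


# Shortened sample documents (hypothetical subset of the fields used in this post).
books = [
    (1, {"title": "Elasticsearch: The Definitive Guide", "num_reviews": 20}),
    (2, {"title": "Taming Text: How to Find, Organize, and Manipulate It", "num_reviews": 12}),
]

bulk_body = build_bulk_body(books)
print(bulk_body)
```

Keeping every JSON object on a single line matters here, because the bulk endpoint uses the newline as the record separator.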

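1. Basic Match Query

1.1 Full-text search

There are two ways to run a full-text search:
1) Use the search API with parameters, passing them as part of the URL.

Example: the following runs a full-text search for "guide".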

GET /bookdb_index/book/_search?q=guide

[Results]
"hits": [
  {
    "_index": "bookdb_index", "_type": "book", "_id": "1", "_score": 0.28168046,
    "_source": {
      "title": "Elasticsearch: The Definitive Guide",
      "authors": ["clinton gormley", "zachary tong"],
      "summary": "A distibuted real-time search and analytics engine",
      "publish_date": "2015-02-07",
      "num_reviews": 20,
      "publisher": "oreilly"
    }
  },
  {
    "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 0.24144039,
    "_source": {
      "title": "Solr in Action",
      "authors": ["trey grainger", "timothy potter"],
      "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
      "publish_date": "2014-04-05",
      "num_reviews": 23,
      "publisher": "manning"
    }
  }
]

2) Use the full ES DSL, with a JSON body as the request body.
The result is the same as with method 1).

POST /bookdb_index/book/_search
{ "query": { "multi_match" : { "query" : "guide", "fields" : ["_all"] } } }

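Explanation: the multi_match keyword is used in place of match as a convenient shorthand for running the same query against multiple fields. The fields property specifies which fields to query; in this case we want to query all fields of the document.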
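To compare the two request styles side by side, here is a small Python sketch that only builds the request pieces and sends nothing. The base URL `http://localhost:9200` and the helper names `uri_search` and `dsl_search` are assumptions for illustration. It also shows that a multi-word query string has to be URL-encoded in the first style, while the DSL body needs no escaping.

```python
import json
from urllib.parse import urlencode

BASE = "http://localhost:9200"  # assumption: a local single-node cluster


def uri_search(index, doc_type, query_string):
    """Style 1: the query travels in the URL as the q parameter."""
    params = urlencode({"q": query_string})  # encodes spaces and reserved characters
    return f"{BASE}/{index}/{doc_type}/_search?{params}"


def dsl_search(query_string):
    """Style 2: the query travels in a JSON request body."""
    return json.dumps(
        {"query": {"multi_match": {"query": query_string, "fields": ["_all"]}}}
    )


single_word_url = uri_search("bookdb_index", "book", "guide")
multi_word_url = uri_search("bookdb_index", "book", "in action")
body = dsl_search("guide")

print(single_word_url)
print(multi_word_url)
print(body)
```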

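1.2 Searching a specific field

Both APIs also let you specify which fields to search. For example, to search for books with "in action" in the title field:
1) URL search
As shown below: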

GET /bookdb_index/book/_search?q=title:in action

[Results]
"hits": [
  {
    "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 0.6259885,
    "_source": {
      "title": "Solr in Action",
      "authors": ["trey grainger", "timothy potter"],
      "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
      "publish_date": "2014-04-05",
      "num_reviews": 23,
      "publisher": "manning"
    }
  },
  {
    "_index": "bookdb_index", "_type": "book", "_id": "3", "_score": 0.5975345,
    "_source": {
      "title": "Elasticsearch in Action",
      "authors": ["radu gheorge", "matthew lee hinman", "roy russo"],
      "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
      "publish_date": "2015-12-03",
      "num_reviews": 18,
      "publisher": "manning"
    }
  }
]

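2) DSL search
The full-body DSL, however, gives you more flexibility for building complex queries (as we will see later) and for specifying how results are returned. In the example below we specify the number of results to return, the offset (useful for pagination), the document fields to return, and term highlighting.
Number of results: size;
Offset: from;
Fields to return: _source;
Highlighting: highlight.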

POST /bookdb_index/book/_search
{
  "query": { "match": { "title": "in action" } },
  "size": 2,
  "from": 0,
  "_source": ["title", "summary", "publish_date"],
  "highlight": { "fields": { "title": {} } }
}

[Results]
"hits": {
  "total": 2,
  "max_score": 0.9105287,
  "hits": [
    {
      "_index": "bookdb_index", "_type": "book", "_id": "3", "_score": 0.9105287,
      "_source": {
        "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
        "title": "Elasticsearch in Action",
        "publish_date": "2015-12-03"
      },
      "highlight": { "title": ["Elasticsearch <em>in</em> <em>Action</em>"] }
    },
    {
      "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 0.9105287,
      "_source": {
        "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
        "title": "Solr in Action",
        "publish_date": "2014-04-05"
      },
      "highlight": { "title": ["Solr <em>in</em> <em>Action</em>"] }
    }
  ]
}
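Since from is an offset rather than a page number, pagination code usually derives it from the page index. The Python sketch below builds the same kind of request body for an arbitrary page. The helper name `paged_search_body` and the 1-based page convention are assumptions for illustration only.

```python
def paged_search_body(field, text, page, page_size, source_fields):
    """Build a match query body for a 1-based page number (assumed convention)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    return {
        "query": {"match": {field: text}},
        "size": page_size,                 # number of hits to return
        "from": (page - 1) * page_size,    # number of hits to skip
        "_source": source_fields,          # fields to include in each hit
        "highlight": {"fields": {field: {}}},
    }


fields = ["title", "summary", "publish_date"]
first_page = paged_search_body("title", "in action", page=1, page_size=2, source_fields=fields)
third_page = paged_search_body("title", "in action", page=3, page_size=2, source_fields=fields)

print(first_page["from"], third_page["from"])
```

With a page size of 2, the first page skips nothing and the third page skips the four hits shown on pages one and two.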

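Note: for multi-word searches, the match query lets you specify the 'and' operator instead of the default 'or' operator.
You can also set the minimum_should_match option to tune the relevance of the returned results.
Details can be found in the Elasticsearch guide.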
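As a rough illustration of the two options mentioned in the note, the sketch below builds match query bodies in Python using the expanded form of the match clause, where the search text moves under a query key so that options can sit next to it. The helper name `match_query` and the sample values are assumptions; the bodies are only printed, not sent.

```python
import json


def match_query(field, text, operator="or", minimum_should_match=None):
    """Build a match query body with an explicit operator and optional threshold."""
    options = {"query": text, "operator": operator}
    if minimum_should_match is not None:
        options["minimum_should_match"] = minimum_should_match
    return {"query": {"match": {field: options}}}


# Every term must appear in the title.
all_terms = match_query("title", "in action", operator="and")

# At least 75% of the terms must appear in the summary.
most_terms = match_query(
    "summary", "scalable search engine guide", minimum_should_match="75%"
)

print(json.dumps(all_terms, indent=2))
print(json.dumps(most_terms, indent=2))
```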

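2. Multi-field Search

As we have already seen, to query more than one document field in a search (for example, the same query string in both the title and the summary), use the multi_match query.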

POST /bookdb_index/book/_search
{
  "query": {
    "multi_match": {
      "query": "elasticsearch guide",
      "fields": ["title", "summary"]
    }
  }
}

[Results]
"hits": {
  "total": 3,
  "max_score": 0.9448582,
  "hits": [
    {
      "_index": "bookdb_index", "_type": "book", "_id": "1", "_score": 0.9448582,
      "_source": {
        "title": "Elasticsearch: The Definitive Guide",
        "authors": ["clinton gormley", "zachary tong"],
        "summary": "A distibuted real-time search and analytics engine",
        "publish_date": "2015-02-07",
        "num_reviews": 20,
        "publisher": "oreilly"
      }
    },
    {
      "_index": "bookdb_index", "_type": "book", "_id": "3", "_score": 0.17312013,
      "_source": {
        "title": "Elasticsearch in Action",
        "authors": ["radu gheorge", "matthew lee hinman", "roy russo"],
        "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
        "publish_date": "2015-12-03",
        "num_reviews": 18,
        "publisher": "manning"
      }
    },
    {
      "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 0.14965448,
      "_source": {
        "title": "Solr in Action",
        "authors": ["trey grainger", "timothy potter"],
        "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
        "publish_date": "2014-04-05",
        "num_reviews": 23,
        "publisher": "manning"
      }
    }
  ]
}

Note: the third hit above (document 4) matches because the word "guide" appears in its summary.

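3. Boosting

Since we are searching across multiple fields, we may want to boost the score contributed by a particular field. In the example below, we boost the summary field by a factor of 3 to increase its importance, which raises the relevance of document 4.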

POST /bookdb_index/book/_search
{
  "query": {
    "multi_match": {
      "query": "elasticsearch guide",
      "fields": ["title", "summary^3"]
    }
  },
  "_source": ["title", "summary", "publish_date"]
}

[Results]
"hits": [
  {
    "_index": "bookdb_index", "_type": "book", "_id": "1", "_score": 0.31495273,
    "_source": {
      "summary": "A distibuted real-time search and analytics engine",
      "title": "Elasticsearch: The Definitive Guide",
      "publish_date": "2015-02-07"
    }
  },
  {
    "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 0.14965448,
    "_source": {
      "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
      "title": "Solr in Action",
      "publish_date": "2014-04-05"
    }
  },
  {
    "_index": "bookdb_index", "_type": "book", "_id": "3", "_score": 0.13094766,
    "_source": {
      "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
      "title": "Elasticsearch in Action",
      "publish_date": "2015-12-03"
    }
  }
]

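Note: boosting does not simply mean that the computed score is multiplied by the boost factor. The boost value that is actually applied goes through normalization and some internal optimization. See the Elasticsearch guide for more details.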
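The caret syntax is easy to generate from a mapping of field names to boost factors. The Python sketch below is one way to do it; the helper names `boosted_fields` and `multi_match_body` are assumptions for illustration, and the code only produces the request body without computing any scores.

```python
def boosted_fields(boosts):
    """Turn {"summary": 3} style boosts into multi_match field strings."""
    fields = []
    for name, boost in boosts.items():
        # A boost of 1 is the default, so the plain field name is enough.
        fields.append(name if boost == 1 else f"{name}^{boost}")
    return fields


def multi_match_body(text, boosts):
    """Build a multi_match request body with per-field boosts."""
    return {"query": {"multi_match": {"query": text, "fields": boosted_fields(boosts)}}}


body = multi_match_body("elasticsearch guide", {"title": 1, "summary": 3})
print(body)
```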

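4. Bool Query

The AND / OR / NOT operators can be used to fine-tune our search queries so that they return more relevant or more specific results.

In the search API this is implemented with the bool query.
The bool query accepts a must parameter (equivalent to AND), a must_not parameter (equivalent to NOT), and a should parameter (equivalent to OR).

For example, if I want to search for a book with "Elasticsearch" or "Solr" in the title, AND written by "clinton gormley", but NOT written by "radu gheorge":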

POST /bookdb_index/book/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              { "match": { "title": "Elasticsearch" }},
              { "match": { "title": "Solr" }}
            ]
          }
        },
        { "match": { "authors": "clinton gormley" }}
      ],
      "must_not": { "match": { "authors": "radu gheorge" }}
    }
  }
}

[Results]
"hits": [
  {
    "_index": "bookdb_index", "_type": "book", "_id": "1", "_score": 0.3672021,
    "_source": {
      "title": "Elasticsearch: The Definitive Guide",
      "authors": ["clinton gormley", "zachary tong"],
      "summary": "A distibuted real-time search and analytics engine",
      "publish_date": "2015-02-07",
      "num_reviews": 20,
      "publisher": "oreilly"
    }
  }
]

Note: as you can see, a bool query can wrap any other query type, including other bool queries, to build arbitrarily complex or deeply nested queries.
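When a bool query needs several must clauses, they belong in one array under a single must key, because a JSON object should not repeat a key. The Python sketch below composes the nested query from small pieces; the helper name `bool_query` is an assumption for illustration. The last lines show what Python's own json module does with a repeated key, which is one reason to avoid writing must twice in the same object.

```python
import json


def bool_query(must=(), should=(), must_not=()):
    """Build a bool clause, keeping each clause type as a single array."""
    clauses = {}
    if must:
        clauses["must"] = list(must)
    if should:
        clauses["should"] = list(should)
    if must_not:
        clauses["must_not"] = list(must_not)
    return {"bool": clauses}


# Title contains "Elasticsearch" OR "Solr".
title_either = bool_query(
    should=[{"match": {"title": "Elasticsearch"}}, {"match": {"title": "Solr"}}]
)

# (title clause) AND author clause, NOT the excluded author.
query = {
    "query": bool_query(
        must=[title_either, {"match": {"authors": "clinton gormley"}}],
        must_not=[{"match": {"authors": "radu gheorge"}}],
    )
}
print(json.dumps(query, indent=2))

# Python's json module silently keeps only the last value of a repeated key.
duplicated = json.loads('{"must": "first", "must": "second"}')
print(duplicated)
```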

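5. Fuzzy Queries

Fuzzy matching can be enabled on match and multi_match queries to catch spelling mistakes. The degree of fuzziness is specified as a Levenshtein distance from the original term.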

POST /bookdb_index/book/_search
{
  "query": {
    "multi_match": {
      "query": "comprihensiv guide",
      "fields": ["title", "summary"],
      "fuzziness": "AUTO"
    }
  },
  "_source": ["title", "summary", "publish_date"],
  "size": 1
}

[Results]
"hits": [
  {
    "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 0.5961596,
    "_source": {
      "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
      "title": "Solr in Action",
      "publish_date": "2014-04-05"
    }
  }
]

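A fuzziness value of "AUTO" is equivalent to specifying a value of 2 when the term length is greater than 5. However, roughly 80% of human misspellings have an edit distance of 1, so setting fuzziness to 1 may improve overall search performance. For more information, see the "Typos and Misspellings" chapter of Elasticsearch: The Definitive Guide.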
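To make the edit-distance idea concrete, here is a small Python sketch of the plain Levenshtein distance together with the length thresholds that AUTO uses (0 edits for terms of 1 or 2 characters, 1 edit for 3 to 5 characters, 2 edits for longer terms). The function names are assumptions for illustration, and the code is a simplified model rather than Elasticsearch's internal implementation.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions, or substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute, or keep if equal
                )
            )
        previous = current
    return previous[-1]


def auto_fuzziness(term):
    """Maximum edits allowed for a term under the AUTO length thresholds."""
    length = len(term)
    if length <= 2:
        return 0
    if length <= 5:
        return 1
    return 2


typo, target = "comprihensiv", "comprehensive"
distance = levenshtein(typo, target)
allowed = auto_fuzziness(typo)
print(distance, allowed, distance <= allowed)
```

The misspelled term from the example differs from "comprehensive" by one substitution and one insertion, which is within the two edits that AUTO allows for a term of that length.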

https://blog.csdn.net/laoyang360/article/details/76769208 (continues from section 6)
