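This post collects commonly used Elasticsearch CRUD and query statements. The original article was published around 2017 and translated from English; here the examples are re-run against Elasticsearch 7.x, and any features or parameters that have since been deprecated are updated along the way. Let's learn together.
To demonstrate the different types of Elasticsearch queries, we will use a collection of book documents with the following fields: title, authors, summary, publish_date, and num_reviews (number of reviews). The sample documents also carry a publisher field.
Before that, we first need to create a new index and bulk-load some documents:
Create the index: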
PUT /bookdb_index { "settings": { "number_of_shards": 1 }}
Bulk-upload the documents:
Note: mapping types are deprecated in 7.x, so the corresponding statements need adjusting. Replace POST /bookdb_index/book/_bulk with POST /bookdb_index/_bulk before running the request.
POST /bookdb_index/_bulk
{ "index": { "_id": 1 }}
{ "title": "Elasticsearch: The Definitive Guide", "authors": ["clinton gormley", "zachary tong"], "summary" : "A distibuted real-time search and analytics engine", "publish_date" : "2015-02-07", "num_reviews": 20, "publisher": "oreilly" }
{ "index": { "_id": 2 }}
{ "title": "Taming Text: How to Find, Organize, and Manipulate It", "authors": ["grant ingersoll", "thomas morton", "drew farris"], "summary" : "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization", "publish_date" : "2013-01-24", "num_reviews": 12, "publisher": "manning" }
{ "index": { "_id": 3 }}
{ "title": "Elasticsearch in Action", "authors": ["radu gheorge", "matthew lee hinman", "roy russo"], "summary" : "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms", "publish_date" : "2015-12-03", "num_reviews": 18, "publisher": "manning" }
{ "index": { "_id": 4 }}
{ "title": "Solr in Action", "authors": ["trey grainger", "timothy potter"], "summary" : "Comprehensive guide to implementing a scalable search engine using Apache Solr", "publish_date" : "2014-04-05", "num_reviews": 23, "publisher": "manning" }
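The bulk body is newline-delimited JSON: one action line followed by one source line per document, with a trailing newline at the end. As a rough sketch of how such a payload could be assembled in Python (the helper name `build_bulk_payload` and the trimmed-down sample documents are only illustrative, and nothing is sent to a cluster here):

```python
import json

def build_bulk_payload(docs):
    """Return an NDJSON string: an action line plus a source line per document."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_id": doc_id}}))  # action line
        lines.append(json.dumps(source))                      # document line
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"

# Shortened sample documents, for illustration only.
books = [
    (1, {"title": "Elasticsearch: The Definitive Guide", "publisher": "oreilly"}),
    (4, {"title": "Solr in Action", "publisher": "manning"}),
]
payload = build_bulk_payload(books)
print(payload)
```

The resulting string would be posted to /bookdb_index/_bulk with the Content-Type set to application/x-ndjson.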
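There are two ways to run a basic full-text (match) query: the Search Lite API, which reads all of the query parameters from the URL, or the full JSON request body, which gives you the complete Elasticsearch DSL. Below is a basic match query that searches every field for records containing "guide":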
GET /bookdb_index/_search?q=guide [Results] "hits": [ { "_index": "bookdb_index", "_type": "book", "_id": "1", "_score": 0.28168046, "_source": { "title": "Elasticsearch: The Definitive Guide", "authors": ["clinton gormley", "zachary tong"], "summary": "A distibuted real-time search and analytics engine", "publish_date": "2015-02-07", "num_reviews": 20, "publisher": "manning" } }, { "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 0.24144039, "_source": { "title": "Solr in Action", "authors": ["trey grainger", "timothy potter"], "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr", "publish_date": "2014-04-05", "num_reviews": 23, "publisher": "manning" } } ]
Here is the full-body version of the same query (as entered in Search Profiler), which produces the same results:
{ "query": { "multi_match" : { "query" : "guide", "fields" : [ "*" ] } } }
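multi_match is shorthand for running the same match query against multiple fields. The fields property specifies which fields the query targets; * stands for all fields, and you can also list specific fields, separated by commas.
In this example we want to match against every field of the document. Both APIs also let you specify which fields to search. For example, to find books whose title field contains "in Action":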
GET /bookdb_index/_search?q=title:in action [Results] "hits": [ { "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 0.6259885, "_source": { "title": "Solr in Action", "authors": [ "trey grainger", "timothy potter" ], "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr", "publish_date": "2014-04-05", "num_reviews": 23, "publisher": "manning" } }, { "_index": "bookdb_index", "_type": "book", "_id": "3", "_score": 0.5975345, "_source": { "title": "Elasticsearch in Action", "authors": [ "radu gheorge", "matthew lee hinman", "roy russo" ], "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms", "publish_date": "2015-12-03", "num_reviews": 18, "publisher": "manning" } } ]
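When the URL form is called from code rather than typed into a console, the q parameter contains a colon and a space, so it should be percent-encoded. A small sketch using only the Python standard library (the helper name `lite_search_url` is made up for this example):

```python
from urllib.parse import quote, urlencode

def lite_search_url(index, query):
    """Build the path and query string for a URL-based search."""
    # quote_via=quote encodes spaces as %20 instead of '+'.
    return f"/{index}/_search?" + urlencode({"q": query}, quote_via=quote)

url = lite_search_url("bookdb_index", "title:in action")
print(url)
```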
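However, the full DSL gives you the flexibility to build more complex queries and to control how results are returned (we will cover these one by one below). In the following example, size limits the number of results returned, from sets the starting offset, _source selects the fields to return, and highlight turns on hit highlighting: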
POST /bookdb_index/_search { "query": { "match" : { "title" : "in action" } }, "size": 2, "from": 0, "_source": [ "title", "summary", "publish_date" ], "highlight": { "fields" : { "title" : {} } } } [Results] "hits": { "total": 2, "max_score": 0.9105287, "hits": [ { "_index": "bookdb_index", "_type": "book", "_id": "3", "_score": 0.9105287, "_source": { "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms", "title": "Elasticsearch in Action", "publish_date": "2015-12-03" }, "highlight": { "title": [ "Elasticsearch <em>in</em> <em>Action</em>" ] } }, { "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 0.9105287, "_source": { "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr", "title": "Solr in Action", "publish_date": "2014-04-05" }, "highlight": { "title": [ "Solr <em>in</em> <em>Action</em>" ] } } ] }
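Since from is an offset and not a page number, paging code usually converts between the two. A minimal sketch, assuming 1-based page numbers (the helper `paged_search` is hypothetical and only builds the request body):

```python
def paged_search(query, page, page_size, source_fields):
    """Build a search body for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {
        "query": query,
        "from": (page - 1) * page_size,  # offset of the first hit on this page
        "size": page_size,
        "_source": source_fields,
        "highlight": {"fields": {"title": {}}},
    }

body = paged_search({"match": {"title": "in action"}}, page=2, page_size=2,
                    source_fields=["title", "summary", "publish_date"])
print(body["from"], body["size"])
```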
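Note: for multi-word queries, match lets you specify the and operator in place of the default or operator. You can also set the minimum_should_match option to tune the relevance of the returned results. See the later examples for details.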
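Both options live inside the per-field object of the match query. A short sketch of the two request shapes in Python (the helper names are illustrative only):

```python
def match_all_terms(field, text):
    """Require every term of the query text to be present (operator: and)."""
    return {"match": {field: {"query": text, "operator": "and"}}}

def match_min_terms(field, text, minimum):
    """Require at least `minimum` of the terms (a count or a percentage string)."""
    return {"match": {field: {"query": text, "minimum_should_match": minimum}}}

strict = match_all_terms("title", "in action")
relaxed = match_min_terms("summary", "scalable search engine", "75%")
print(strict)
print(relaxed)
```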
As we have already seen, to search across more than one field (e.g. the same query string against both title and summary), you can use the multi_match query:
POST /bookdb_index/_search { "query": { "multi_match" : { "query" : "elasticsearch guide", "fields": ["title", "summary"] } } } [Results] "hits": { "total": 3, "max_score": 0.9448582, "hits": [ { "_index": "bookdb_index", "_type": "book", "_id": "1", "_score": 0.9448582, "_source": { "title": "Elasticsearch: The Definitive Guide", "authors": [ "clinton gormley", "zachary tong" ], "summary": "A distibuted real-time search and analytics engine", "publish_date": "2015-02-07", "num_reviews": 20, "publisher": "manning" } }, { "_index": "bookdb_index", "_type": "book", "_id": "3", "_score": 0.17312013, "_source": { "title": "Elasticsearch in Action", "authors": [ "radu gheorge", "matthew lee hinman", "roy russo" ], "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms", "publish_date": "2015-12-03", "num_reviews": 18, "publisher": "manning" } }, { "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 0.14965448, "_source": { "title": "Solr in Action", "authors": [ "trey grainger", "timothy potter" ], "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr", "publish_date": "2014-04-05", "num_reviews": 23, "publisher": "manning" } } ] }
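Note: the third hit matches because guide is found in its summary field.
Since we are querying multiple fields, we may want to boost the score of one of them. In the example below, the summary field is boosted by a factor of 3 to increase its importance, which in turn raises the relevance of document 4.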
POST /bookdb_index/_search { "query": { "multi_match" : { "query" : "elasticsearch guide", "fields": ["title", "summary^3"] } }, "_source": ["title", "summary", "publish_date"] } [Results] "hits": [ { "_index": "bookdb_index", "_type": "book", "_id": "1", "_score": 0.31495273, "_source": { "summary": "A distibuted real-time search and analytics engine", "title": "Elasticsearch: The Definitive Guide", "publish_date": "2015-02-07" } }, { "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 0.14965448, "_source": { "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr", "title": "Solr in Action", "publish_date": "2014-04-05" } }, { "_index": "bookdb_index", "_type": "book", "_id": "3", "_score": 0.13094766, "_source": { "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms", "title": "Elasticsearch in Action", "publish_date": "2015-12-03" } } ]
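Note: boosting does not simply multiply the computed score by the boost factor. The boost value that is actually applied goes through normalization and some internal optimization. See the Elasticsearch guide for details.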
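The field^boost notation is just a string convention, so it is easy to generate from a mapping of field names to boosts. A small sketch (the helper `boosted_fields` is hypothetical):

```python
def boosted_fields(boosts):
    """Turn {"summary": 3} style boosts into the field^boost notation."""
    return [name if boost == 1 else f"{name}^{boost}"
            for name, boost in boosts.items()]

fields = boosted_fields({"title": 1, "summary": 3})
body = {"query": {"multi_match": {"query": "elasticsearch guide", "fields": fields}}}
print(fields)
```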
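To get more relevant or more specific results, the AND/OR/NOT operators can be used to fine-tune a query. This is implemented as a bool query, which accepts the following parameters:
must: equivalent to AND
must_not: equivalent to NOT
should: equivalent to OR
Each of these keywords may appear only once in a single bool query; to supply several clauses under one keyword, pass them as an array.
For example, suppose I want to find books whose title contains Elasticsearch or (OR) Solr, and (AND) whose author is Clinton Gormley but not (NOT) Radu Gheorge: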
POST /bookdb_index/_search
{ "query": { "bool": { "must": { "match": { "authors": "clinton gormley" } }, "must_not": { "match": { "authors": "radu gheorge" } }, "should": [ { "match": { "title": "Elasticsearch" } }, { "match": { "title": "Solr" } } ] } } }
[Results]
"hits": [ { "_index": "bookdb_index", "_type": "book", "_id": "1", "_score": 0.3672021, "_source": { "title": "Elasticsearch: The Definitive Guide", "authors": [ "clinton gormley", "zachary tong" ], "summary": "A distibuted real-time search and analytics engine", "publish_date": "2015-02-07", "num_reviews": 20, "publisher": "oreilly" } } ]
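Note: as you can see, a bool query can wrap any other query type, including other bool queries, to build arbitrarily complex or deeply nested queries.
Fuzzy matching can be enabled on match and multi_match queries to catch spelling mistakes. The degree of fuzziness is specified as an edit distance from the original word.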
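One detail worth keeping in mind: when a bool query contains a must or filter clause, its should clauses are optional by default and only influence the score. If the title condition is meant to be mandatory, minimum_should_match can be set to 1. A sketch in Python (the helper name and the `require_title` flag are assumptions made for this example):

```python
def author_title_query(author, excluded_author, titles, require_title=True):
    """Build a bool query: must author, must_not excluded author, should titles."""
    bool_query = {
        "must": [{"match": {"authors": author}}],
        "must_not": [{"match": {"authors": excluded_author}}],
        "should": [{"match": {"title": t}} for t in titles],
    }
    if require_title:
        # Without this, should clauses next to a must clause only affect scoring.
        bool_query["minimum_should_match"] = 1
    return {"query": {"bool": bool_query}}

body = author_title_query("clinton gormley", "radu gheorge",
                          ["Elasticsearch", "Solr"])
print(body)
```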
POST /bookdb_index/_search { "query": { "multi_match" : { "query" : "comprihensiv guide", "fields": ["title", "summary"], "fuzziness": "AUTO" } }, "_source": ["title", "summary", "publish_date"], "size": 1 } [Results] "hits": [ { "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 0.5961596, "_source": { "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr", "title": "Solr in Action", "publish_date": "2014-04-05" } } ]
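Note: for terms longer than 5 characters, a fuzziness of AUTO is equivalent to specifying 2. However, 80% of misspellings have an edit distance of 1, so setting fuzziness to 1 may improve your overall search performance. For more detail, see "Typos and Misspellings" in the Elasticsearch guide.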
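To make the edit-distance idea concrete, here is a small Python sketch that computes a plain Levenshtein distance and mirrors the default AUTO thresholds (0 edits for terms of 1 to 2 characters, 1 edit for 3 to 5, 2 edits beyond that). It is a simplification: Elasticsearch by default also counts a transposition of two adjacent characters as a single edit, which this version does not.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def auto_fuzziness(term):
    """Maximum edits allowed by fuzziness AUTO with its default thresholds."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2

typo, target = "comprihensiv", "comprehensive"
print(edit_distance(typo, target), auto_fuzziness(typo))
```

The misspelled query term is two edits away from the indexed term, which is exactly what AUTO allows for a term of that length.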
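A wildcard query lets you specify a pattern to match rather than a whole term. ? matches any single character, and * matches zero or more characters. For example, to find all records with an author whose name starts with the letter 't':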
POST /bookdb_index/_search { "query": { "wildcard" : { "authors" : "t*" } }, "_source": ["title", "authors"], "highlight": { "fields" : { "authors" : {} } } } [Results] "hits": [ { "_index": "bookdb_index", "_type": "book", "_id": "1", "_score": 1, "_source": { "title": "Elasticsearch: The Definitive Guide", "authors": [ "clinton gormley", "zachary tong" ] }, "highlight": { "authors": [ "zachary <em>tong</em>" ] } }, { "_index": "bookdb_index", "_type": "book", "_id": "2", "_score": 1, "_source": { "title": "Taming Text: How to Find, Organize, and Manipulate It", "authors": [ "grant ingersoll", "thomas morton", "drew farris" ] }, "highlight": { "authors": [ "<em>thomas</em> morton" ] } }, { "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 1, "_source": { "title": "Solr in Action", "authors": [ "trey grainger", "timothy potter" ] }, "highlight": { "authors": [ "<em>trey</em> grainger", "<em>timothy</em> potter" ] } } ]
A regexp query lets you search with more complex patterns than a wildcard query:
POST /bookdb_index/_search { "query": { "regexp" : { "authors" : "t[a-z]*y" } }, "_source": ["title", "authors"], "highlight": { "fields" : { "authors" : {} } } } [Results] "hits": [ { "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 1, "_source": { "title": "Solr in Action", "authors": [ "trey grainger", "timothy potter" ] }, "highlight": { "authors": [ "<em>trey</em> grainger", "<em>timothy</em> potter" ] } } ]
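Both query types are evaluated against individual indexed terms and must match the whole term, which is why the highlights above mark single words and not full author names. The Python sketch below imitates that behaviour on a hand-written token list; fnmatch and re are stand-ins for Lucene's own pattern syntax, which differs in some details, so treat it as an approximation:

```python
import re
from fnmatch import fnmatchcase

def wildcard_hits(tokens, pattern):
    """Tokens matching a ?/* wildcard pattern over the whole token."""
    return [t for t in tokens if fnmatchcase(t, pattern)]

def regexp_hits(tokens, pattern):
    """Tokens matching a regular expression anchored at both ends."""
    return [t for t in tokens if re.fullmatch(pattern, t)]

# Tokens an analyzer would roughly produce for the authors of documents 1 and 4.
doc1 = ["clinton", "gormley", "zachary", "tong"]
doc4 = ["trey", "grainger", "timothy", "potter"]

print(wildcard_hits(doc1, "t*"))
print(regexp_hits(doc4, "t[a-z]*y"))
print(regexp_hits(doc1, "t[a-z]*y"))
```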
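A match phrase query requires that all the terms in the query string be present in the document, in the same order as in the query string, and next to each other. By default the terms must be directly adjacent, but you can set the slop value to specify how far apart the terms may be while the document still counts as a match.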
POST /bookdb_index/_search { "query": { "multi_match" : { "query": "search engine", "fields": ["title", "summary"], "type": "phrase", "slop": 3 } }, "_source": [ "title", "summary", "publish_date" ] } [Results] "hits": [ { "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 0.22327082, "_source": { "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr", "title": "Solr in Action", "publish_date": "2014-04-05" } }, { "_index": "bookdb_index", "_type": "book", "_id": "1", "_score": 0.16113183, "_source": { "summary": "A distibuted real-time search and analytics engine", "title": "Elasticsearch: The Definitive Guide", "publish_date": "2015-02-07" } } ]
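Note: in the example above, for a non-phrase query the document with _id 1 would normally score higher and rank ahead of the document with _id 4, because its field is shorter. For a phrase query, however, the proximity of the terms is factored into the score, so the document with _id 4 scores higher.
A match phrase prefix query provides search-as-you-type matching, or a basic form of query-time autocomplete, without preparing your data in any way. Like the match_phrase query, it accepts a slop parameter (to relax word order and relative positions) and a max_expansions parameter (to limit the number of terms the prefix can expand to, which reduces resource usage).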
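For a two-word phrase whose words appear in query order, the slop a document needs is simply the number of words sitting between them. The sketch below checks this for the two summaries that matched; the tokenizer is a crude stand-in for the standard analyzer, and the helper only handles the in-order, two-term case:

```python
import re

def tokenize(text):
    """Lowercase and split on non-alphanumerics (rough analyzer stand-in)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def gap_between(text, first, second):
    """Number of tokens between `first` and the next `second` after it."""
    tokens = tokenize(text)
    i = tokens.index(first)
    j = tokens.index(second, i + 1)
    return j - i - 1

doc4 = "Comprehensive guide to implementing a scalable search engine using Apache Solr"
doc1 = "A distibuted real-time search and analytics engine"

print(gap_between(doc4, "search", "engine"))
print(gap_between(doc1, "search", "engine"))
```

Document 4 needs no slop at all, while document 1 needs a slop of 2, which the value of 3 used in the query permits.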
POST /bookdb_index/_search { "query": { "match_phrase_prefix" : { "summary": { "query": "search en", "slop": 3, "max_expansions": 10 } } }, "_source": [ "title", "summary", "publish_date" ] } [Results] "hits": [ { "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 0.5161346, "_source": { "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr", "title": "Solr in Action", "publish_date": "2014-04-05" } }, { "_index": "bookdb_index", "_type": "book", "_id": "1", "_score": 0.37248808, "_source": { "summary": "A distibuted real-time search and analytics engine", "title": "Elasticsearch: The Definitive Guide", "publish_date": "2015-02-07" } } ]
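Note: query-time search-as-you-type carries a significant performance cost. A better solution is index-time search-as-you-type. For more information, see the Completion Suggester API or the use of Edge-Ngram filters.
The query_string query provides a concise shorthand syntax for running multi_match, bool, boosted, fuzzy, wildcard, regexp, and range queries. In the example below, we run a fuzzy search for the terms "search algorithm" among books whose author is "grant ingersoll" or "tom morton", searching all fields but boosting summary by a factor of 2.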
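The idea behind the index-time approach is that every prefix of a word is stored as its own term, so a partially typed word can be looked up with an ordinary exact match. A tiny Python sketch of what an edge n-gram filter emits for a single token (the parameter names mirror the filter's min_gram and max_gram settings; the default values chosen here are arbitrary):

```python
def edge_ngrams(token, min_gram=1, max_gram=10):
    """Prefixes of `token` between min_gram and max_gram characters long."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]

grams = edge_ngrams("engine", min_gram=2, max_gram=4)
print(grams)
```

With these grams indexed, the partial input "en" matches directly, at the price of a larger index.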
POST /bookdb_index/_search { "query": { "query_string": { "query": "(saerch~1 algorithm~1) AND (grant ingersoll) OR (tom morton)", "fields": [ "*", "summary^2" ] } }, "_source": [ "title", "summary", "authors" ], "highlight": { "fields": { "summary": {} } } } [Results] "hits": [ { "_index": "bookdb_index", "_type": "book", "_id": "2", "_score": 0.14558059, "_source": { "summary": "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization", "title": "Taming Text: How to Find, Organize, and Manipulate It", "authors": [ "grant ingersoll", "thomas morton", "drew farris" ] }, "highlight": { "summary": [ "organize text using approaches such as full-text <em>search</em>, proper name recognition, clustering, tagging, information extraction, and summarization" ] } } ]
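The simple_query_string query is a version of the query_string query that is better suited to a single search box exposed to users: it replaces AND/OR/NOT with + / | / - respectively, and it discards the invalid parts of a query instead of throwing an exception when the user makes a mistake.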
POST /bookdb_index/_search { "query": { "simple_query_string" : { "query": "(saerch~1 algorithm~1) + (grant ingersoll) | (tom morton)", "fields": ["*", "summary^2"] } }, "_source": [ "title", "summary", "authors" ], "highlight": { "fields" : { "summary" : {} } } } [Results] "hits" : { "total" : { "value" : 1, "relation" : "eq" }, "max_score" : 3.5710216, "hits" : [ { "_index" : "bookdb_index", "_type" : "book", "_id" : "2", "_score" : 3.5710216, "_source" : { "summary" : "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization", "title" : "Taming Text: How to Find, Organize, and Manipulate It", "authors" : [ "grant ingersoll", "thomas morton", "drew farris" ] }, "highlight" : { "summary" : [ "organize text using approaches such as full-text <em>search</em>, proper name recognition, clustering, tagging" ] } } ] }
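The examples above are all full-text searches. Sometimes we are more interested in structured search, where we want exact matches returned. The term and terms queries help here. In the example below, we look up all books in the index published by Manning.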
POST /bookdb_index/_search { "query": { "term" : { "publisher": "manning" } }, "_source" : ["title","publish_date","publisher"] } [Results] "hits": [ { "_index": "bookdb_index", "_type": "book", "_id": "2", "_score": 1.2231436, "_source": { "publisher": "manning", "title": "Taming Text: How to Find, Organize, and Manipulate It", "publish_date": "2013-01-24" } }, { "_index": "bookdb_index", "_type": "book", "_id": "3", "_score": 1.2231436, "_source": { "publisher": "manning", "title": "Elasticsearch in Action", "publish_date": "2015-12-03" } }, { "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 1.2231436, "_source": { "publisher": "manning", "title": "Solr in Action", "publish_date": "2014-04-05" } } ]
Multiple terms can be specified with the terms keyword, passing the search terms as an array.
{ "query": { "terms" : { "publisher": ["oreilly", "packt"] } } }
The results of a term query (like any other query results) can easily be sorted, and multi-level sorting is also allowed:
POST /bookdb_index/_search { "query": { "term": { "publisher": "manning" } }, "_source": [ "publish_date", "publisher" ], "sort": [ { "publish_date": { "order": "desc" } } ] } [Results] "hits" : { "total" : { "value" : 3, "relation" : "eq" }, "max_score" : null, "hits" : [ { "_index" : "bookdb_index", "_type" : "book", "_id" : "3", "_score" : null, "_source" : { "publisher" : "manning", "publish_date" : "2015-12-03" }, "sort" : [ 1449100800000 ] }, { "_index" : "bookdb_index", "_type" : "book", "_id" : "4", "_score" : null, "_source" : { "publisher" : "manning", "publish_date" : "2014-04-05" }, "sort" : [ 1396656000000 ] }, { "_index" : "bookdb_index", "_type" : "book", "_id" : "2", "_score" : null, "_source" : { "publisher" : "manning", "publish_date" : "2013-01-24" }, "sort" : [ 1358985600000 ] } ] }
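The request above sorts on a single key. For multi-level sorting, the sort array simply takes several entries that are applied in order. A sketch of such a body in Python, using num_reviews as a hypothetical tie-breaker (the helper `sort_clause` is made up for this example):

```python
def sort_clause(*keys):
    """Build a sort array from (field, order) pairs, applied in the given order."""
    return [{field: {"order": order}} for field, order in keys]

body = {
    "query": {"term": {"publisher": "manning"}},
    "_source": ["publish_date", "publisher", "num_reviews"],
    # Newest first; ties broken by the number of reviews.
    "sort": sort_clause(("publish_date", "desc"), ("num_reviews", "desc")),
}
print(body["sort"])
```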
Another structured query example is the range query. In this example, we search for books published in 2015.
POST /bookdb_index/_search { "query": { "range": { "publish_date": { "gte": "2015-01-01", "lte": "2015-12-31" } } }, "_source": [ "title", "publish_date", "publisher" ] } [Results] "hits": [ { "_index": "bookdb_index", "_type": "book", "_id": "1", "_score": 1, "_source": { "publisher": "oreilly", "title": "Elasticsearch: The Definitive Guide", "publish_date": "2015-02-07" } }, { "_index": "bookdb_index", "_type": "book", "_id": "3", "_score": 1, "_source": { "publisher": "manning", "title": "Elasticsearch in Action", "publish_date": "2015-12-03" } } ]
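Note: range queries work on date, number, and string fields.
A filtered query lets you filter the results of a query. In our example, we search for books with the term Elasticsearch in the title or summary, but want to keep only those with 20 or more reviews.
Note: current versions no longer support the filtered query and the keyword has been removed, so the following request is shown for reference only.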
POST /bookdb_index/_search { "query": { "filtered": { "query" : { "multi_match": { "query": "elasticsearch", "fields": ["title","summary"] } }, "filter": { "range" : { "num_reviews": { "gte": 20 } } } } }, "_source" : ["title","summary","publisher", "num_reviews"] } [Results] "hits": [ { "_index": "bookdb_index", "_type": "book", "_id": "1", "_score": 0.5955761, "_source": { "summary": "A distibuted real-time search and analytics engine", "publisher": "oreilly", "num_reviews": 20, "title": "Elasticsearch: The Definitive Guide" } } ]
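Note: a filtered query does not require a query to be present. If no query is specified, match_all is used, which basically returns all documents in the index. In practice, the filter is run first to reduce the area that needs to be queried, and the filter can be cached after its first use, which improves performance considerably.
Update: the filtered query was removed in Elasticsearch 5.0 in favor of the bool query. Here is the example above rewritten with a bool query: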
POST /bookdb_index/_search { "query": { "bool": { "must" : { "multi_match": { "query": "elasticsearch", "fields": ["title","summary"] } }, "filter": { "range" : { "num_reviews": { "gte": 20 } } } } }, "_source" : ["title","summary","publisher", "num_reviews"] }
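In later examples we will use it for multiple filters.
Multiple filters can be combined through a bool query. In the next example, the filter clauses restrict the results to books published by O'Reilly with a publish date on or after 2014-12-31, while the must clause still requires Elasticsearch in the title.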
POST /bookdb_index/_search { "query": { "bool": { "must": [ { "match": { "title": "Elasticsearch" } } ], "filter": [ { "term": { "publisher": "oreilly" } }, { "range": { "publish_date": { "gte": "2014-12-31" } } } ] } }, "_source": [ "title", "publisher", "publish_date" ] } [Results] "hits": [ { "_index": "bookdb_index", "_type": "book", "_id": "1", "_score": 0.5955761, "_source": { "summary": "A distibuted real-time search and analytics engine", "publisher": "oreilly", "num_reviews": 20, "title": "Elasticsearch: The Definitive Guide", "publish_date": "2015-02-07" } } ]
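The filter array can hold any number of clauses, and all of them must match. As a sketch, the Python helper below (its name and parameters are assumptions for this example) builds a request of the same shape with one more filter added, a minimum number of reviews:

```python
def filtered_title_search(title, publisher, published_from, min_reviews):
    """bool query: scored match on title plus three non-scoring filters."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"title": title}}],
                "filter": [
                    {"term": {"publisher": publisher}},
                    {"range": {"publish_date": {"gte": published_from}}},
                    {"range": {"num_reviews": {"gte": min_reviews}}},
                ],
            }
        },
        "_source": ["title", "publisher", "publish_date", "num_reviews"],
    }

body = filtered_title_search("Elasticsearch", "oreilly", "2014-12-31", 20)
print(len(body["query"]["bool"]["filter"]))
```

Filter clauses do not contribute to the score, so the ranking still comes from the must clause alone.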
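In some cases you may want to factor the value of a specific field in the document into the relevance score. A typical scenario is boosting a document's relevance based on its popularity. In our example, we want to boost the more popular books (judged by the number of reviews), which can be done with the field_value_factor function score.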
POST /bookdb_index/_search { "query": { "function_score": { "query": { "multi_match" : { "query" : "search engine", "fields": ["title", "summary"] } }, "field_value_factor": { "field" : "num_reviews", "modifier": "log1p", "factor" : 2 } } }, "_source": ["title", "summary", "publish_date", "num_reviews"] } [Results] "hits": [ { "_index": "bookdb_index", "_type": "book", "_id": "1", "_score": 0.44831306, "_source": { "summary": "A distibuted real-time search and analytics engine", "num_reviews": 20, "title": "Elasticsearch: The Definitive Guide", "publish_date": "2015-02-07" } }, { "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 0.3718407, "_source": { "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr", "num_reviews": 23, "title": "Solr in Action", "publish_date": "2014-04-05" } }, { "_index": "bookdb_index", "_type": "book", "_id": "3", "_score": 0.046479136, "_source": { "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms", "num_reviews": 18, "title": "Elasticsearch in Action", "publish_date": "2015-12-03" } }, { "_index": "bookdb_index", "_type": "book", "_id": "2", "_score": 0.041432835, "_source": { "summary": "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization", "num_reviews": 12, "title": "Taming Text: How to Find, Organize, and Manipulate It", "publish_date": "2013-01-24" } } ]
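Note 1: we could have just run a regular multi_match query and sorted on the num_reviews field, but then we would lose the benefits of relevance scoring.
Note 2: a number of additional parameters tune how strongly the boost affects the original relevance score, such as modifier, factor, boost_mode, and so on. These are explored in detail in the Elasticsearch guide.
Suppose that instead of boosting incrementally by the value of a field, you have an ideal target value and want the boost to decay the further a document lies from that value. This is typically useful for boosts based on lat/long, price, or dates. In the example below, we search for books on "search engines" that were ideally published around June 2014.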
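To get a feel for what the function contributes, the sketch below recomputes the field_value_factor value in Python, assuming the documented behaviour that the field value is first multiplied by factor and that log1p then takes the base-10 logarithm of one plus that product. With the default boost_mode, this value is multiplied with the query score.

```python
import math

def field_value_factor(value, factor=1.0, modifier="none"):
    """Value produced by field_value_factor for the modifiers covered here."""
    scaled = factor * value
    if modifier == "none":
        return scaled
    if modifier == "log1p":
        return math.log10(1 + scaled)
    raise ValueError(f"modifier not covered by this sketch: {modifier}")

for reviews in (12, 18, 20, 23):
    print(reviews, round(field_value_factor(reviews, factor=2, modifier="log1p"), 4))
```

Because of the logarithm, going from 12 to 23 reviews raises the multiplier only modestly, which keeps popularity from overwhelming text relevance.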
POST /bookdb_index/_search { "query": { "function_score": { "query": { "multi_match" : { "query" : "search engine", "fields": ["title", "summary"] } }, "functions": [ { "exp": { "publish_date" : { "origin": "2014-06-15", "offset": "7d", "scale" : "30d" } } } ], "boost_mode" : "replace" } }, "_source": ["title", "summary", "publish_date", "num_reviews"] } [Results] "hits": [ { "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 0.27420625, "_source": { "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr", "num_reviews": 23, "title": "Solr in Action", "publish_date": "2014-04-05" } }, { "_index": "bookdb_index", "_type": "book", "_id": "1", "_score": 0.005920768, "_source": { "summary": "A distibuted real-time search and analytics engine", "num_reviews": 20, "title": "Elasticsearch: The Definitive Guide", "publish_date": "2015-02-07" } }, { "_index": "bookdb_index", "_type": "book", "_id": "2", "_score": 0.000011564, "_source": { "summary": "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization", "num_reviews": 12, "title": "Taming Text: How to Find, Organize, and Manipulate It", "publish_date": "2013-01-24" } }, { "_index": "bookdb_index", "_type": "book", "_id": "3", "_score": 0.0000059171475, "_source": { "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms", "num_reviews": 18, "title": "Elasticsearch in Action", "publish_date": "2015-12-03" } } ]
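The exp decay function follows the formula given in the Elasticsearch documentation: exp(λ · max(0, |value − origin| − offset)) with λ = ln(decay) / scale, where decay defaults to 0.5. The Python sketch below evaluates it for dates, measuring distance in whole days; it is meant to illustrate the shape of the curve and not to reproduce the scores listed above:

```python
import math
from datetime import date

def exp_decay(value, origin, offset_days, scale_days, decay=0.5):
    """Exponential decay score for a date, with distances in whole days."""
    distance = abs((value - origin).days)
    effective = max(0, distance - offset_days)  # no penalty inside the offset
    lam = math.log(decay) / scale_days
    return math.exp(lam * effective)

origin = date(2014, 6, 15)
print(exp_decay(date(2014, 6, 20), origin, offset_days=7, scale_days=30))
print(exp_decay(date(2014, 7, 22), origin, offset_days=7, scale_days=30))
```

Any date within 7 days of the origin scores 1.0, and a date that lies offset plus scale (37 days) away scores the decay value of 0.5.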
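When the built-in scoring functions do not meet your needs, you can also specify a script. The original article used a Groovy script; Groovy scripting was removed in Elasticsearch 6.0, so on 7.x the script below has to be ported to Painless before it will run. In our example, we want a script that takes publish_date into account before deciding how much weight to give num_reviews: very new books may not have any reviews yet and should not be penalized for that.
The scoring script looks like this: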
publish_date = doc['publish_date'].value num_reviews = doc['num_reviews'].value if (publish_date > Date.parse('yyyy-MM-dd', threshold).getTime()) { my_score = Math.log(2.5 + num_reviews) } else { my_score = Math.log(1 + num_reviews) } return my_score
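Invoke the scoring script dynamically through the script_score parameter (on 7.x, drop the /book type segment from the request path):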
POST /bookdb_index/book/_search { "query": { "function_score": { "query": { "multi_match" : { "query" : "search engine", "fields": ["title", "summary"] } }, "functions": [ { "script_score": { "params" : { "threshold": "2015-07-30" }, "script": "publish_date = doc['publish_date'].value; num_reviews = doc['num_reviews'].value; if (publish_date > Date.parse('yyyy-MM-dd', threshold).getTime()) { return log(2.5 + num_reviews) }; return log(1 + num_reviews);" } } ] } }, "_source": ["title", "summary", "publish_date", "num_reviews"] } [Results] "hits": { "total": 4, "max_score": 0.8463001, "hits": [ { "_index": "bookdb_index", "_type": "book", "_id": "1", "_score": 0.8463001, "_source": { "summary": "A distibuted real-time search and analytics engine", "num_reviews": 20, "title": "Elasticsearch: The Definitive Guide", "publish_date": "2015-02-07" } }, { "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 0.7067348, "_source": { "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr", "num_reviews": 23, "title": "Solr in Action", "publish_date": "2014-04-05" } }, { "_index": "bookdb_index", "_type": "book", "_id": "3", "_score": 0.08952084, "_source": { "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms", "num_reviews": 18, "title": "Elasticsearch in Action", "publish_date": "2015-12-03" } }, { "_index": "bookdb_index", "_type": "book", "_id": "2", "_score": 0.07602123, "_source": { "summary": "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization", "num_reviews": 12, "title": "Taming Text: How to Find, Organize, and Manipulate It", "publish_date": "2013-01-24" } } ] }
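Note 1: to use dynamic scripting, it must be enabled for your Elasticsearch instance in the config/elasticsearch.yml file; scripts stored on the Elasticsearch server can also be used. See the Elasticsearch guide for more information.
Note 2: JSON cannot contain embedded newline characters, so use semicolons to separate statements.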
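For 7.x, the same idea could be expressed in Painless. The Python sketch below builds such a request body and mirrors the scoring rule locally so the logic can be checked without a cluster. The Painless source string is an untested assumption on my part, as is the choice to pass the threshold as epoch milliseconds in a parameter named threshold_millis; verify it against your own cluster before relying on it.

```python
import math
from datetime import datetime, timezone

# Assumed Painless port of the scoring rule (not verified against a cluster).
PAINLESS_SOURCE = (
    "long published = doc['publish_date'].value.toInstant().toEpochMilli(); "
    "double reviews = doc['num_reviews'].value; "
    "if (published > params.threshold_millis) { return Math.log(2.5 + reviews); } "
    "return Math.log(1 + reviews);"
)

def to_millis(day):
    """Epoch milliseconds for midnight UTC of a yyyy-MM-dd date string."""
    dt = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)

def script_score_body(threshold_day):
    """function_score request using the assumed Painless script."""
    return {
        "query": {"function_score": {
            "query": {"multi_match": {"query": "search engine",
                                      "fields": ["title", "summary"]}},
            "functions": [{"script_score": {"script": {
                "lang": "painless",
                "params": {"threshold_millis": to_millis(threshold_day)},
                "source": PAINLESS_SOURCE,
            }}}],
        }},
        "_source": ["title", "summary", "publish_date", "num_reviews"],
    }

def local_score(published_day, reviews, threshold_day):
    """Python mirror of the rule: newer books get a small head start."""
    if to_millis(published_day) > to_millis(threshold_day):
        return math.log(2.5 + reviews)
    return math.log(1 + reviews)

body = script_score_body("2015-07-30")
print(local_score("2015-12-03", 18, "2015-07-30"))
print(local_score("2015-02-07", 20, "2015-07-30"))
```

Passing the threshold as a number avoids date parsing inside the script, and keeping it in params means the compiled script can be reused when the threshold changes.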