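This article walks through 23 of the most useful Elasticsearch search techniques, with detailed query examples and matching Java API implementations. It is meant as a practical resource for learning and working with Elasticsearch.
To illustrate the different types of ES searches, we will query a collection of documents with the following fields:
title: the book title
authors: the authors
summary: a summary of the book
publish_date: the publication date
num_reviews: the number of reviews
publisher: the publisher
First, we create a new index and load the data in one batch with the bulk API:
# Index settings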
PUT /bookdb_index
{ "settings": { "number_of_shards": 1 }}
# Submit the data in bulk
POST /bookdb_index/book/_bulk
{"index":{"_id":1}}
{"title":"Elasticsearch: The Definitive Guide","authors":["clinton gormley","zachary tong"],"summary":"A distibuted real-time search and analytics engine","publish_date":"2015-02-07","num_reviews":20,"publisher":"oreilly"}
{"index":{"_id":2}}
{"title":"Taming Text: How to Find, Organize, and Manipulate It","authors":["grant ingersoll","thomas morton","drew farris"],"summary":"organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization","publish_date":"2013-01-24","num_reviews":12,"publisher":"manning"}
{"index":{"_id":3}}
{"title":"Elasticsearch in Action","authors":["radu gheorge","matthew lee hinman","roy russo"],"summary":"build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms","publish_date":"2015-12-03","num_reviews":18,"publisher":"manning"}
{"index":{"_id":4}}
{"title":"Solr in Action","authors":["trey grainger","timothy potter"],"summary":"Comprehensive guide to implementing a scalable search engine using Apache Solr","publish_date":"2014-04-05","num_reviews":23,"publisher":"manning"}
Note: the examples in this article were run on ES 6.3.0.
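The bulk request above is newline-delimited JSON: one action line followed by one document line per book, with a trailing newline at the end of the body. As a rough sketch, the same body could be assembled in Python as shown below. The `BOOKS` list and the `build_bulk_body` helper are names made up for this illustration, and only two trimmed-down books are included.

```python
import json

# Two of the sample books, trimmed to a few fields for brevity (illustrative data).
BOOKS = [
    {"_id": 1, "title": "Elasticsearch: The Definitive Guide",
     "publisher": "oreilly", "num_reviews": 20},
    {"_id": 4, "title": "Solr in Action",
     "publisher": "manning", "num_reviews": 23},
]


def build_bulk_body(docs):
    """Return a newline-delimited JSON body in the shape the _bulk endpoint expects."""
    lines = []
    for doc in docs:
        source = {key: value for key, value in doc.items() if key != "_id"}
        lines.append(json.dumps({"index": {"_id": doc["_id"]}}))  # action line
        lines.append(json.dumps(source))                           # document line
    # The bulk body has to end with a newline character.
    return "\n".join(lines) + "\n"


bulk_body = build_bulk_body(BOOKS)
print(bulk_body)
```

The resulting string would be sent as the body of `POST /bookdb_index/book/_bulk`; the HTTP call itself is left out so that the sketch runs without a cluster.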
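There are two ways to run a full-text search:
1) Use the search API with the query passed as a parameter in the URL.
Example: the following runs a full-text search for "guide":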
GET bookdb_index/book/_search?q=guide
[Results]
"hits": {
"total": 2,
"max_score": 1.3278645,
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "4",
"_score": 1.3278645,
"_source": {
"title": "Solr in Action",
"authors": [
"trey grainger",
"timothy potter"
],
"summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
"publish_date": "2014-04-05",
"num_reviews": 23,
"publisher": "manning"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "1",
"_score": 1.2871116,
"_source": {
"title": "Elasticsearch: The Definitive Guide",
"authors": [
"clinton gormley",
"zachary tong"
],
"summary": "A distibuted real-time search and analytics engine",
"publish_date": "2015-02-07",
"num_reviews": 20,
"publisher": "oreilly"
}
}
]
}
2) Use the full ES DSL, with a JSON body as the request payload. The results are the same as with approach 1).
GET bookdb_index/book/_search
{
"query": {
"multi_match": {
"query": "guide",
"fields" : ["_all"]
}
}
}
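Explanation: the multi_match keyword is used in place of match as a convenient shorthand for running the same query against several fields. The fields property specifies which fields to query; in this case we want to query every field in the document.
Note: in ES 6.x the _all field is disabled by default. If fields is not specified, the search runs against all fields.
Both APIs also let you specify which fields to search.
For example, to search for books with "in action" in the title field:
1) URL search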
GET bookdb_index/book/_search?q=title:in action
[Results]
"hits": {
"total": 2,
"max_score": 1.6323128,
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "3",
"_score": 1.6323128,
"_source": {
"title": "Elasticsearch in Action",
"authors": [
"radu gheorge",
"matthew lee hinman",
"roy russo"
],
"summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
"publish_date": "2015-12-03",
"num_reviews": 18,
"publisher": "manning"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "4",
"_score": 1.6323128,
"_source": {
"title": "Solr in Action",
"authors": [
"trey grainger",
"timothy potter"
],
"summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
"publish_date": "2014-04-05",
"num_reviews": 23,
"publisher": "manning"
}
}
]
}
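2) DSL search. The full-body DSL gives you more flexibility for building complex queries (as we will see later) and for specifying how the results should be returned. In the example below we specify the number of results, the offset (useful for pagination), the document fields to return, and highlighting.
Number of results: size
Offset: from
Fields to return: _source
Highlighting: highlight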
GET bookdb_index/book/_search
{
"query": {
"match": {
"title": "in action"
}
},
"size": 2,
"from": 0,
"_source": ["title", "summary", "publish_date"],
"highlight": {
"fields": {
"title": {}
}
}
}
[Results]
"hits": {
"total": 2,
"max_score": 1.6323128,
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "3",
"_score": 1.6323128,
"_source": {
"summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
"title": "Elasticsearch in Action",
"publish_date": "2015-12-03"
},
"highlight": {
"title": [
"Elasticsearch <em>in</em> <em>Action</em>"
]
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "4",
"_score": 1.6323128,
"_source": {
"summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
"title": "Solr in Action",
"publish_date": "2014-04-05"
},
"highlight": {
"title": [
"Solr <em>in</em> <em>Action</em>"
]
}
}
]
}
Note:
- For multi-word searches, the match query lets you use the and operator instead of the default or operator: "operator" : "and"
- You can also set the minimum_should_match option to tune the relevance of the returned results. See the Elasticsearch guide for details.
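The request options discussed above (size, from, _source, highlight, and the match operator) are easy to get wrong by hand, especially the offset. The sketch below builds such a request body in Python. `build_match_request` and its parameters are hypothetical names chosen for this example and are not part of any Elasticsearch client.

```python
import json


def build_match_request(field, text, page=1, page_size=2, operator="or",
                        source_fields=None, highlight=False):
    """Build a search body for a match query with simple page-based pagination."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    if operator not in ("or", "and"):
        raise ValueError("operator must be 'or' or 'and'")
    body = {
        "query": {"match": {field: {"query": text, "operator": operator}}},
        "size": page_size,
        # "from" is a zero-based offset, so page 1 starts at 0.
        "from": (page - 1) * page_size,
    }
    if source_fields:
        body["_source"] = list(source_fields)
    if highlight:
        body["highlight"] = {"fields": {field: {}}}
    return body


second_page = build_match_request(
    "title", "in action", page=2, operator="and",
    source_fields=["title", "summary", "publish_date"], highlight=True,
)
print(json.dumps(second_page, indent=2))
```

Computing `from` from a page number keeps the offset arithmetic in one place. The printed JSON can be pasted as the body of `GET bookdb_index/book/_search`.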
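As we have already seen, to query more than one document field (for example, the same query string in both the title and the summary), use the multi_match query: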
GET bookdb_index/book/_search
{
"query": {
"multi_match": {
"query": "elasticsearch guide",
"fields": ["title", "summary"]
}
}
}
[Results]
"hits": {
"total": 3,
"max_score": 2.0281231,
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "1",
"_score": 2.0281231,
"_source": {
"title": "Elasticsearch: The Definitive Guide",
"authors": [
"clinton gormley",
"zachary tong"
],
"summary": "A distibuted real-time search and analytics engine",
"publish_date": "2015-02-07",
"num_reviews": 20,
"publisher": "oreilly"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "4",
"_score": 1.3278645,
"_source": {
"title": "Solr in Action",
"authors": [
"trey grainger",
"timothy potter"
],
"summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
"publish_date": "2014-04-05",
"num_reviews": 23,
"publisher": "manning"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "3",
"_score": 1.0333893,
"_source": {
"title": "Elasticsearch in Action",
"authors": [
"radu gheorge",
"matthew lee hinman",
"roy russo"
],
"summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
"publish_date": "2015-12-03",
"num_reviews": 18,
"publisher": "manning"
}
}
]
}
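Note: document 4 (_id=4) matches in the results above because "guide" appears in its summary.
Because we are searching across several fields, we may want to boost the score of one of them. In the example below we boost the summary field by a factor of 3 to increase its importance, which raises the relevance of document 4.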
GET bookdb_index/book/_search
{
"query": {
"multi_match": {
"query": "elasticsearch guide",
"fields": ["title", "summary^3"]
}
},
"_source": ["title", "summary", "publish_date"]
}
[Results]
"hits": {
"total": 3,
"max_score": 3.9835935,
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "4",
"_score": 3.9835935,
"_source": {
"summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
"title": "Solr in Action",
"publish_date": "2014-04-05"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "3",
"_score": 3.1001682,
"_source": {
"summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
"title": "Elasticsearch in Action",
"publish_date": "2015-12-03"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "1",
"_score": 2.0281231,
"_source": {
"summary": "A distibuted real-time search and analytics engine",
"title": "Elasticsearch: The Definitive Guide",
"publish_date": "2015-02-07"
}
}
]
}
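Note: boosting does not simply multiply the computed score by the boost factor. The boost value actually applied goes through normalization and some internal optimizations. See the Elasticsearch guide for more.
The AND / OR / NOT operators can be used to fine-tune a search query and return more relevant or more specific results.
In the search API this is done with the bool query. The bool query accepts a must parameter (equivalent to AND), a must_not parameter (equivalent to NOT), and a should parameter (equivalent to OR).
For example, to search for a book with "Elasticsearch" OR "Solr" in the title, AND written by "clinton gormley", but NOT written by "radu gheorge":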
GET bookdb_index/book/_search
{
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{"match": {"title": "Elasticsearch"}},
{"match": {"title": "Solr"}}
]
}
},
{
"match": {"authors": "clinton gormely"}
}
],
"must_not": [
{
"match": {"authors": "radu gheorge"}
}
]
}
}
}
[Results]
"hits": {
"total": 1,
"max_score": 2.0749094,
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "1",
"_score": 2.0749094,
"_source": {
"title": "Elasticsearch: The Definitive Guide",
"authors": [
"clinton gormley",
"zachary tong"
],
"summary": "A distibuted real-time search and analytics engine",
"publish_date": "2015-02-07",
"num_reviews": 20,
"publisher": "oreilly"
}
}
]
}
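The should clause of a bool query behaves in one of two ways:
- If the bool query has no must or filter clause, at least one should clause has to match.
- If it does have a must or filter clause, the should clauses are optional and only affect the score.
Note: as you can see, a bool query can wrap any other query type, including other bool queries, to build arbitrarily complex or deeply nested queries.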
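To make the AND / OR / NOT semantics of the bool query concrete, here is a toy model in Python that evaluates the same nested query against the four sample books in memory. It only mimics the matching logic and ignores scoring and analysis details, so it is not how Elasticsearch evaluates queries. `match` and `bool_query` are simplified stand-ins written for this sketch.

```python
import re

BOOKS = {
    1: {"title": "Elasticsearch: The Definitive Guide",
        "authors": ["clinton gormley", "zachary tong"]},
    2: {"title": "Taming Text: How to Find, Organize, and Manipulate It",
        "authors": ["grant ingersoll", "thomas morton", "drew farris"]},
    3: {"title": "Elasticsearch in Action",
        "authors": ["radu gheorge", "matthew lee hinman", "roy russo"]},
    4: {"title": "Solr in Action",
        "authors": ["trey grainger", "timothy potter"]},
}


def tokens(value):
    """Lowercase a string (or list of strings) and split it into word tokens."""
    text = " ".join(value) if isinstance(value, list) else value
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def match(field, text):
    """Toy match query: true if any query token occurs in the field (OR semantics)."""
    wanted = tokens(text)
    return lambda doc: bool(wanted & tokens(doc[field]))


def bool_query(must=(), should=(), must_not=()):
    """Toy bool query combining predicate functions."""
    def matches(doc):
        if not all(clause(doc) for clause in must):
            return False
        if any(clause(doc) for clause in must_not):
            return False
        # Without a must clause, at least one should clause has to match.
        if should and not must:
            return any(clause(doc) for clause in should)
        return True
    return matches


query = bool_query(
    must=[
        bool_query(should=[match("title", "Elasticsearch"), match("title", "Solr")]),
        match("authors", "clinton gormley"),
    ],
    must_not=[match("authors", "radu gheorge")],
)
matching_ids = [doc_id for doc_id, doc in BOOKS.items() if query(doc)]
print(matching_ids)
```

Only book 1 passes all three conditions, which agrees with the single hit in the results above.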
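Fuzzy matching can be enabled on match and multi_match queries to catch spelling errors. The degree of fuzziness is specified as a Levenshtein distance from the original term.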
GET bookdb_index/book/_search
{
"query": {
"multi_match": {
"query": "comprihensiv guide",
"fields": ["title","summary"],
"fuzziness": "AUTO"
}
},
"_source": ["title","summary","publish_date"],
"size": 2
}
[Results]
"hits": {
"total": 2,
"max_score": 2.4344182,
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "4",
"_score": 2.4344182,
"_source": {
"summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
"title": "Solr in Action",
"publish_date": "2014-04-05"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "1",
"_score": 1.2871116,
"_source": {
"summary": "A distibuted real-time search and analytics engine",
"title": "Elasticsearch: The Definitive Guide",
"publish_date": "2015-02-07"
}
}
]
}
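The fuzziness used in the query above is an edit distance. With "AUTO", Elasticsearch allows 0 edits for terms of 1 to 2 characters, 1 edit for terms of 3 to 5 characters, and 2 edits for longer terms. The sketch below computes a plain Levenshtein distance in Python to show why the misspelled "comprihensiv" can still reach "comprehensive". It is a simplified illustration: by default Elasticsearch also counts a transposition of two adjacent characters as a single edit, which this function does not model. `levenshtein` and `auto_fuzziness` are helper names invented for this example.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def auto_fuzziness(term):
    """Maximum edits allowed for a term under fuzziness "AUTO" (default thresholds)."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


typo, target = "comprihensiv", "comprehensive"
distance = levenshtein(typo, target)
within_auto = distance <= auto_fuzziness(typo)
print(distance, within_auto)
```

Here one substitution (i to e) plus one insertion (the trailing e) are needed, so the term is only matched because "AUTO" permits two edits for a term of this length. A fixed fuzziness of 1 would miss it.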
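A fuzziness value of "AUTO" is equivalent to specifying 2 when the term is longer than 5 characters. However, since 80% of human misspellings have an edit distance of 1, setting fuzziness to 1 may improve overall search performance. For more information, see Typos and Misspellings.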
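A wildcard query lets you specify a pattern to match instead of a whole term.
For example, to find all records that have an author whose name begins with the letter "t":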
GET bookdb_index/book/_search
{
"query": {
"wildcard": {
"authors": {
"value": "t*"
}
}
},
"_source": ["title", "authors"],
"highlight": {
"fields": {
"authors": {}
}
}
}
[Results]
"hits": {
"total": 3,
"max_score": 1,
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "1",
"_score": 1,
"_source": {
"title": "Elasticsearch: The Definitive Guide",
"authors": [
"clinton gormley",
"zachary tong"
]
},
"highlight": {
"authors": [
"zachary <em>tong</em>"
]
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "2",
"_score": 1,
"_source": {
"title": "Taming Text: How to Find, Organize, and Manipulate It",
"authors": [
"grant ingersoll",
"thomas morton",
"drew farris"
]
},
"highlight": {
"authors": [
"<em>thomas</em> morton"
]
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "4",
"_score": 1,
"_source": {
"title": "Solr in Action",
"authors": [
"trey grainger",
"timothy potter"
]
},
"highlight": {
"authors": [
"<em>trey</em> grainger",
"<em>timothy</em> potter"
]
}
}
]
}
Regular expressions can express more complex search patterns than wildcard queries. For example:
POST bookdb_index/book/_search
{
"query": {
"regexp": {
"authors": "t[a-z]*y"
}
},
"_source": ["title", "authors"],
"highlight": {
"fields": {
"authors": {}
}
}
}
[Results]
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "4",
"_score": 1,
"_source": {
"title": "Solr in Action",
"authors": [
"trey grainger",
"timothy potter"
]
},
"highlight": {
"authors": [
"<em>trey</em> grainger",
"<em>timothy</em> potter"
]
}
}
]
}
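Both the wildcard and the regexp query are matched against individual indexed terms (here, the single words produced by analyzing the authors field), and the pattern has to cover the whole term. The Python sketch below imitates that behaviour with `fnmatch` and `re.fullmatch`. It is an approximation for illustration only: whitespace splitting stands in for the analyzer, and Lucene's regular expression syntax is not identical to Python's.

```python
import re
from fnmatch import fnmatchcase

AUTHORS = {
    1: ["clinton gormley", "zachary tong"],
    2: ["grant ingersoll", "thomas morton", "drew farris"],
    3: ["radu gheorge", "matthew lee hinman", "roy russo"],
    4: ["trey grainger", "timothy potter"],
}


def terms(names):
    """Split author names into lowercase single-word terms."""
    return [term for name in names for term in name.lower().split()]


def wildcard_hits(pattern):
    """Map doc id -> terms matching a shell-style wildcard pattern."""
    hits = {}
    for doc_id, names in AUTHORS.items():
        matched = [t for t in terms(names) if fnmatchcase(t, pattern)]
        if matched:
            hits[doc_id] = matched
    return hits


def regexp_hits(pattern):
    """Map doc id -> terms fully matched by a regular expression."""
    compiled = re.compile(pattern)
    hits = {}
    for doc_id, names in AUTHORS.items():
        matched = [t for t in terms(names) if compiled.fullmatch(t)]
        if matched:
            hits[doc_id] = matched
    return hits


print(wildcard_hits("t*"))
print(regexp_hits("t[a-z]*y"))
```

The wildcard pattern picks up "tong", "thomas", "trey", and "timothy", while the regexp keeps only the terms that also end in "y". This mirrors the highlighted fragments in the two result sets.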
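The match phrase query requires that all the terms in the query string be present in the document, in the order given in the query string, and close to each other.
By default the terms must be exactly adjacent, but you can specify a slop value that says how far apart the terms may be while the document is still considered a match.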
GET bookdb_index/book/_search
{
"query": {
"multi_match": {
"query": "search engine",
"fields": ["title", "summary"],
"type": "phrase",
"slop": 3
}
},
"_source": [ "title", "summary", "publish_date" ]
}
[Results]
"hits": {
"total": 2,
"max_score": 0.88067603,
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "4",
"_score": 0.88067603,
"_source": {
"summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
"title": "Solr in Action",
"publish_date": "2014-04-05"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "1",
"_score": 0.51429313,
"_source": {
"summary": "A distibuted real-time search and analytics engine",
"title": "Elasticsearch: The Definitive Guide",
"publish_date": "2015-02-07"
}
}
]
}
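Note: in the example above, for a non-phrase query, document _id 1 would normally score higher and appear ahead of document _id 4, because its field length is shorter.
As a phrase query, however, the proximity of the terms is taken into account, so document _id 4 scores better.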
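For a two-word phrase whose words appear in the queried order, the slop needed is simply the number of positions between them. The sketch below works this out for the summaries of the sample books. It is a deliberately narrow illustration: `required_slop` is a made-up helper, the tokenizer is a rough stand-in for the standard analyzer, and out-of-order matches (which Elasticsearch also supports at a higher slop cost) are not handled.

```python
import re

SUMMARIES = {
    1: "A distibuted real-time search and analytics engine",
    4: "Comprehensive guide to implementing a scalable search engine using Apache Solr",
}


def required_slop(text, first, second):
    """Smallest slop that lets `first ... second` match in order, or None if absent."""
    toks = re.findall(r"[a-z0-9]+", text.lower())
    gaps = [
        j - i - 1
        for i, tok_i in enumerate(toks) if tok_i == first
        for j, tok_j in enumerate(toks) if tok_j == second and j > i
    ]
    return min(gaps) if gaps else None


slops = {doc_id: required_slop(text, "search", "engine")
         for doc_id, text in SUMMARIES.items()}
print(slops)
```

Document 4 contains the exact phrase, while document 1 has two words between "search" and "engine". It therefore matches only because the query allows a slop of 3, and its looser proximity is why it scores lower.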
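The match phrase prefix query provides search-as-you-type, or a "poor man's" version of autocomplete, at query time without any need to prepare the data.
Like the match_phrase query, it accepts a slop parameter to make the word order and relative positions somewhat less rigid. It also accepts a max_expansions parameter that limits the number of terms matched, to reduce resource usage.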
GET bookdb_index/book/_search
{
"query": {
"match_phrase_prefix": {
"summary": {
"query": "search en",
"slop": 3,
"max_expansions": 10
}
}
},
"_source": ["title","summary","publish_date"]
}
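Note: query-time search-as-you-type has a performance cost. A better solution is index-time search-as-you-type. For more, see the Completion Suggester API or Edge-Ngram filters.
The query_string query provides a concise shorthand syntax for running multi_match queries, bool queries, boosting, fuzzy matching, wildcards, regexp, and range queries.
In the example below we run a fuzzy search for the terms "search algorithm" where one of the book's authors is "grant ingersoll" or "tom morton". We search several fields but apply a boost of 2 to the summary field.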
GET bookdb_index/book/_search
{
"query": {
"query_string": {
"query": "(saerch~1 algorithm~1) AND (grant ingersoll) OR (tom morton)",
"fields": ["summary^2","title","authors","publisher"]
}
},
"_source": ["title","summary","authors"],
"highlight": {
"fields": {
"summary": {}
}
}
}
[Results]
"hits": {
"total": 1,
"max_score": 3.571021,
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "2",
"_score": 3.571021,
"_source": {
"summary": "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization",
"title": "Taming Text: How to Find, Organize, and Manipulate It",
"authors": [
"grant ingersoll",
"thomas morton",
"drew farris"
]
},
"highlight": {
"summary": [
"organize text using approaches such as full-text <em>search</em>, proper name recognition, clustering, tagging"
]
}
}
]
}
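The simple_query_string query is a version of the query_string query that is better suited to a single search box exposed to users, because it replaces AND / OR / NOT with + / | / - respectively, and it discards invalid parts of a query instead of throwing an exception when the user makes a mistake.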
GET bookdb_index/book/_search
{
"query": {
"simple_query_string": {
"query": "(saerch~1 algorithm~1) + (grant ingersoll) | (tom morton)",
"fields": ["summary^2","title","authors","publisher"]
}
},
"_source": ["title","summary","authors"],
"highlight": {
"fields": {
"summary": {}
}
}
}
[Results]
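# Same results as above
The examples so far (sections 1-11) have been full-text searches. Sometimes we are more interested in structured search, where we want to find exact matches and return the results.
In the example below we search for all books in the index published by Manning Publications, using the term and terms queries.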
GET bookdb_index/book/_search
{
"query": {
"term": {
"publisher": {
"value": "manning"
}
}
},
"_source" : ["title","publish_date","publisher"]
}
[Results]
"hits": {
"total": 3,
"max_score": 0.35667494,
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "2",
"_score": 0.35667494,
"_source": {
"publisher": "manning",
"title": "Taming Text: How to Find, Organize, and Manipulate It",
"publish_date": "2013-01-24"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "3",
"_score": 0.35667494,
"_source": {
"publisher": "manning",
"title": "Elasticsearch in Action",
"publish_date": "2015-12-03"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "4",
"_score": 0.35667494,
"_source": {
"publisher": "manning",
"title": "Solr in Action",
"publish_date": "2014-04-05"
}
}
]
}
The terms query accepts several terms to search for:
GET bookdb_index/book/_search
{
"query": {
"terms": {
"publisher": ["oreilly", "manning"]
}
}
}
Term queries, like any other query, can easily be sorted. Multi-level sorting is also allowed:
GET bookdb_index/book/_search
{
"query": {
"term": {
"publisher": {
"value": "manning"
}
}
},
"_source" : ["title","publish_date","publisher"],
"sort": [{"publisher.keyword": { "order": "desc"}},
{"title.keyword": {"order": "asc"}}]
}
[Results]
"hits": {
"total": 3,
"max_score": null,
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "3",
"_score": null,
"_source": {
"publisher": "manning",
"title": "Elasticsearch in Action",
"publish_date": "2015-12-03"
},
"sort": [
"manning",
"Elasticsearch in Action"
]
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "4",
"_score": null,
"_source": {
"publisher": "manning",
"title": "Solr in Action",
"publish_date": "2014-04-05"
},
"sort": [
"manning",
"Solr in Action"
]
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "2",
"_score": null,
"_source": {
"publisher": "manning",
"title": "Taming Text: How to Find, Organize, and Manipulate It",
"publish_date": "2013-01-24"
},
"sort": [
"manning",
"Taming Text: How to Find, Organize, and Manipulate It"
]
}
]
}
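Note: in Elasticsearch 6.x, full-text search uses text fields, while sorting uses non-text fields (such as the keyword sub-fields in the request above).
Another example of structured search is the range query. In the example below we search for books published in 2015.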
GET bookdb_index/book/_search
{
"query": {
"range": {
"publish_date": {
"gte": "2015-01-01",
"lte": "2015-12-31"
}
}
},
"_source" : ["title","publish_date","publisher"]
}
[Results]
"hits": {
"total": 2,
"max_score": 1,
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "1",
"_score": 1,
"_source": {
"publisher": "oreilly",
"title": "Elasticsearch: The Definitive Guide",
"publish_date": "2015-02-07"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "3",
"_score": 1,
"_source": {
"publisher": "manning",
"title": "Elasticsearch in Action",
"publish_date": "2015-12-03"
}
}
]
}
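Note: range queries work on date, number, and string fields.
(The filtered query no longer exists as of version 5.0; this section can be skipped.)
The filtered query lets you filter the results of a query. In the example below we query for books with "Elasticsearch" in the title or summary, but we want to restrict the results to those with 20 or more reviews.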
POST /bookdb_index/book/_search
{
"query": {
"filtered": {
"query" : {
"multi_match": {
"query": "elasticsearch",
"fields": ["title","summary"]
}
},
"filter": {
"range" : {
"num_reviews": {
"gte": 20
}
}
}
}
},
"_source" : ["title","summary","publisher", "num_reviews"]
}
[Results]
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "1",
"_score": 0.5955761,
"_source": {
"summary": "A distibuted real-time search and analytics engine",
"publisher": "oreilly",
"num_reviews": 20,
"title": "Elasticsearch: The Definitive Guide"
}
}
]
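Note: a filtered query does not require a query to filter on. If no query is specified, a match_all query is run, which essentially returns every document in the index, and the filter is then applied to it. In practice the filter runs first, reducing the surface area that needs to be queried. In addition, the filter is cached after its first use, which makes it very efficient.
Update: filtered queries were removed in Elasticsearch 5.x in favor of the bool query. Here is the same example rewritten to use a bool query. The results returned are exactly the same.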
GET bookdb_index/book/_search
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "elasticsearch",
"fields": ["title","summary"]
}
}
],
"filter": {
"range": {
"num_reviews": {
"gte": 20
}
}
}
}
},
"_source" : ["title","summary","publisher", "num_reviews"]
}
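The rewrite from a filtered query to a bool query is mechanical: the inner query moves into must and the filter moves into the bool query's filter clause. A small Python sketch of that transformation is shown below. `filtered_to_bool` is a hypothetical helper written for this illustration and only covers the simple shape used in this article.

```python
def filtered_to_bool(body):
    """Rewrite a request body that uses the removed filtered query into a bool query."""
    filtered = body["query"]["filtered"]
    bool_query = {}
    if "query" in filtered:
        bool_query["must"] = [filtered["query"]]
    if "filter" in filtered:
        bool_query["filter"] = filtered["filter"]
    converted = dict(body)  # shallow copy keeps _source and other top-level keys
    converted["query"] = {"bool": bool_query}
    return converted


old_body = {
    "query": {
        "filtered": {
            "query": {"multi_match": {"query": "elasticsearch",
                                      "fields": ["title", "summary"]}},
            "filter": {"range": {"num_reviews": {"gte": 20}}},
        }
    },
    "_source": ["title", "summary", "publisher", "num_reviews"],
}
new_body = filtered_to_bool(old_body)
print(new_body)
```

Clauses placed under filter do not contribute to the score, which preserves the behaviour of the original filtered query.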
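(No longer supported in 5.x; this section can be skipped.) Multiple filters can be combined with a bool filter.
In the next example, the filter specifies that the returned results must have at least 20 reviews, must not have been published before 2015, and should be published by oreilly.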
POST /bookdb_index/book/_search
{
"query": {
"filtered": {
"query" : {
"multi_match": {
"query": "elasticsearch",
"fields": ["title","summary"]
}
},
"filter": {
"bool": {
"must": {
"range" : { "num_reviews": { "gte": 20 } }
},
"must_not": {
"range" : { "publish_date": { "lte": "2014-12-31" } }
},
"should": {
"term": { "publisher": "oreilly" }
}
}
}
}
},
"_source" : ["title","summary","publisher", "num_reviews", "publish_date"]
}
[Results]
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "1",
"_score": 0.5955761,
"_source": {
"summary": "A distibuted real-time search and analytics engine",
"publisher": "oreilly",
"num_reviews": 20,
"title": "Elasticsearch: The Definitive Guide",
"publish_date": "2015-02-07"
}
}
]
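There may be cases where you want to factor the value of a specific field in the document into the relevance score. This is typical when you want to boost a document's relevance based on its popularity.
In our example we would like to boost the more popular books, judged by the number of reviews. This can be done with the field_value_factor function score.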
GET bookdb_index/book/_search
{
"query": {
"function_score": {
"query": {
"multi_match": {
"query": "search engine",
"fields": ["title","summary"]
}
},
"field_value_factor": {
"field": "num_reviews",
"modifier": "log1p",
"factor": 2
}
}
},
"_source": ["title", "summary", "publish_date", "num_reviews"]
}
[Results]
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "1",
"_score": 1.5694137,
"_source": {
"summary": "A distibuted real-time search and analytics engine",
"num_reviews": 20,
"title": "Elasticsearch: The Definitive Guide",
"publish_date": "2015-02-07"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "4",
"_score": 1.4725765,
"_source": {
"summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
"num_reviews": 23,
"title": "Solr in Action",
"publish_date": "2014-04-05"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "3",
"_score": 0.14181662,
"_source": {
"summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
"num_reviews": 18,
"title": "Elasticsearch in Action",
"publish_date": "2015-12-03"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "2",
"_score": 0.13297246,
"_source": {
"summary": "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization",
"num_reviews": 12,
"title": "Taming Text: How to Find, Organize, and Manipulate It",
"publish_date": "2013-01-24"
}
}
]
}
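Note 1: we could have run a regular multi_match query and sorted by the num_reviews field, but then we would lose the benefit of relevance scoring.
Note 2: a number of additional parameters (such as "modifier", "factor", and "boost_mode") adjust how strongly the boost affects the original relevance score.
See the Elasticsearch guide for details.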
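To get a feel for what the factor and modifier settings do, the sketch below reproduces the arithmetic of field_value_factor for a few modifiers in Python. The field value is multiplied by factor first, then the modifier is applied, and log1p uses the base-10 logarithm. The function name and the subset of modifiers are choices made for this illustration. With the default boost_mode, the result is multiplied by the query's own score.

```python
import math


def field_value_factor(value, factor=1.0, modifier="none"):
    """Compute the function value for a field, mirroring a few common modifiers."""
    scaled = factor * value
    if modifier == "none":
        return scaled
    if modifier == "log1p":
        return math.log10(1 + scaled)
    if modifier == "sqrt":
        return math.sqrt(scaled)
    raise ValueError("unsupported modifier: " + modifier)


# num_reviews of the four sample books.
multipliers = {reviews: field_value_factor(reviews, factor=2, modifier="log1p")
               for reviews in (12, 18, 20, 23)}
for reviews, multiplier in multipliers.items():
    print(reviews, round(multiplier, 4))
```

Because of the logarithm, going from 12 to 23 reviews raises the multiplier only modestly. Popularity therefore nudges the ranking without overwhelming the text relevance.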
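Suppose that, instead of boosting incrementally by the value of a field, you have an ideal value in mind and want the boost to decay the further a document's value is from it. This is typically useful for price ranges, numeric fields, and dates. In our example we search for books on "search engines" ideally published around June 2014.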
GET bookdb_index/book/_search
{
"query": {
"function_score": {
"query": {
"multi_match": {
"query": "search engine",
"fields": ["title", "summary"]
}
},
"functions": [
{
"exp": {
"publish_date": {
"origin": "2014-06-15",
"scale": "30d",
"offset": "7d"
}
}
}
],
"boost_mode": "replace"
}
},
"_source": ["title", "summary", "publish_date", "num_reviews"]
}
[Results]
"hits": {
"total": 4,
"max_score": 0.22793062,
"hits": [
{
"_index": "bookdb_index",
"_type": "book",
"_id": "4",
"_score": 0.22793062,
"_source": {
"summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
"num_reviews": 23,
"title": "Solr in Action",
"publish_date": "2014-04-05"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "1",
"_score": 0.0049215667,
"_source": {
"summary": "A distibuted real-time search and analytics engine",
"num_reviews": 20,
"title": "Elasticsearch: The Definitive Guide",
"publish_date": "2015-02-07"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "2",
"_score": 0.000009612435,
"_source": {
"summary": "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization",
"num_reviews": 12,
"title": "Taming Text: How to Find, Organize, and Manipulate It",
"publish_date": "2013-01-24"
}
},
{
"_index": "bookdb_index",
"_type": "book",
"_id": "3",
"_score": 0.0000049185574,
"_source": {
"summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
"num_reviews": 18,
"title": "Elasticsearch in Action",
"publish_date": "2015-12-03"
}
}
]
}
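Because the request above uses "boost_mode": "replace", each score in the results is just the value of the exp decay function. The sketch below recomputes it in Python using the documented formula exp(λ · max(0, |value − origin| − offset)) with λ = ln(decay) / scale and the default decay of 0.5. `exp_decay` is a helper name chosen for this example.

```python
import math
from datetime import date


def exp_decay(value, origin, scale_days, offset_days=0, decay=0.5):
    """Exponential decay score for a date field, with distances measured in days."""
    distance = max(0, abs((value - origin).days) - offset_days)
    lam = math.log(decay) / scale_days
    return math.exp(lam * distance)


origin = date(2014, 6, 15)
publish_dates = {
    4: date(2014, 4, 5),
    1: date(2015, 2, 7),
    2: date(2013, 1, 24),
    3: date(2015, 12, 3),
}
decay_scores = {doc_id: exp_decay(published, origin, scale_days=30, offset_days=7)
                for doc_id, published in publish_dates.items()}
for doc_id, score in decay_scores.items():
    print(doc_id, score)
```

Document 4 was published 71 days before the origin. After subtracting the 7-day offset, that leaves 64 days, a little over two scale lengths, so its score has halved slightly more than twice. That comes to roughly 0.228, in line with the first hit above.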
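When the built-in scoring functions do not meet your needs, you can specify a Groovy script to use for scoring.
In our example we want a script that takes publish_date into account before deciding how much weight to give the number of reviews. Newer books may not have as many reviews yet, so they should not be penalized for that.
The scoring script looks like this: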
publish_date = doc['publish_date'].value
num_reviews = doc['num_reviews'].value
if (publish_date > Date.parse('yyyy-MM-dd', threshold).getTime()) {
my_score = Math.log(2.5 + num_reviews)
} else {
my_score = Math.log(1 + num_reviews)
}
return my_score
To use the scoring script dynamically, we pass it in the script_score parameter:
GET /bookdb_index/book/_search
{
"query": {
"function_score": {
"query": {
"multi_match": {
"query": "search engine",
"fields": ["title","summary"]
}
},
"functions": [
{
"script_score": {
"script": {
"params": {
"threshold": "2015-07-30"
},
"lang": "groovy",
"source": "publish_date = doc['publish_date'].value; num_reviews = doc['num_reviews'].value; if (publish_date > Date.parse('yyyy-MM-dd', threshold).getTime()) { return log(2.5 + num_reviews) }; return log(1 + num_reviews);"
}
}
}
]
}
},
"_source": ["title","summary","publish_date", "num_reviews"]
}
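Note 1: to use dynamic scripting, it must be enabled for your Elasticsearch instance in the config/elasticsearch.yml file. You can also use scripts that are already stored on the Elasticsearch server. See the Elasticsearch reference docs for more information.
Note 2: JSON cannot contain embedded newline characters, so semicolons are used to separate statements.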
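The logic of the scoring script is easier to check outside the cluster. Below is a Python transcription of the same rule, given as a sketch for reasoning about the numbers rather than something Elasticsearch can execute. `review_score` is a name chosen for this illustration, and the threshold is the one passed in params above.

```python
import math
from datetime import date


def review_score(publish_date, num_reviews, threshold):
    """Give books published after the threshold a small head start on review count."""
    if publish_date > threshold:
        return math.log(2.5 + num_reviews)
    return math.log(1 + num_reviews)


threshold = date(2015, 7, 30)
books = {
    1: (date(2015, 2, 7), 20),
    3: (date(2015, 12, 3), 18),
    4: (date(2014, 4, 5), 23),
}
script_scores = {doc_id: review_score(published, reviews, threshold)
                 for doc_id, (published, reviews) in books.items()}
for doc_id, score in script_scores.items():
    print(doc_id, round(score, 4))
```

Only document 3 was published after the threshold, so it is the only one that receives the larger constant inside the logarithm.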
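Original author: Tim Ojo, Aug. 05, 16 · Big Data Zone
Original article: dzone.com/articles/23…
Note: how do you enable Groovy scripts in ES 6.3? The following configuration did not work, most likely because Groovy scripting was removed in Elasticsearch 6.0 in favor of Painless:
script.allowed_types: inline & script.allowed_contexts: search, update
The Java API implementation of the queries above is available at github.com/whirlys/ela…
References:
铭毅天下: [Translation] 23 of the Most Useful Elasticsearch Search Techniques You Must Know
English original: 23 Useful Elasticsearch Example Queries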