The 23 Most Useful Elasticsearch Search Techniques (Part 2)

Preface

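This article presents 23 of the most useful Elasticsearch search techniques. It includes detailed query examples and the corresponding Java API implementations, which makes it a valuable resource for learning Elasticsearch and applying it in practice.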

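Note: A WeChat official account post is limited to 50,000 characters, so the article was split into two posts. This is the second one.
The Elasticsearch version used for testing is 6.3.2.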

12. Term/Terms Query (Searching a Specified Field)

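The examples in sections 1-11 (in part 1) were full-text searches. Sometimes we are more interested in structured search, where we want to find exact matches and return them.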

In the following example, we search the index for all books published by Manning Publications (using the term and terms queries).

GET bookdb_index/book/_search
{
  "query": {
    "term": {
      "publisher": {
        "value": "manning"
      }
    }
  },
  "_source" : ["title","publish_date","publisher"]
}

[Results]
  "hits": {
    "total": 3,
    "max_score": 0.35667494,
    "hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "2",
        "_score": 0.35667494,
        "_source": {
          "publisher": "manning",
          "title": "Taming Text: How to Find, Organize, and Manipulate It",
          "publish_date": "2013-01-24"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "3",
        "_score": 0.35667494,
        "_source": {
          "publisher": "manning",
          "title": "Elasticsearch in Action",
          "publish_date": "2015-12-03"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "4",
        "_score": 0.35667494,
        "_source": {
          "publisher": "manning",
          "title": "Solr in Action",
          "publish_date": "2014-04-05"
        }
      }
    ]
  }

Multiple terms: the terms query accepts several values and matches documents that contain any of them.

GET bookdb_index/book/_search
{
  "query": {
    "terms": {
      "publisher": ["oreilly", "manning"]
    }
  }
}
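To make the matching rule concrete, here is a minimal in-memory sketch in Java. It is not Elasticsearch's implementation and not the linked Java API code; it assumes a simplified analyzer (split on non-alphanumerics, then lowercase) as a stand-in for the standard analyzer, and uses the publisher values from the results above. What it shows: term/terms values are not analyzed, so they must equal the indexed token exactly, and terms matches a document if any value does.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Minimal in-memory sketch of term / terms matching (not Elasticsearch's implementation).
class TermQuerySketch {
    // {_id, publisher} pairs taken from the sample results above.
    static final String[][] BOOKS = {
        {"1", "oreilly"}, {"2", "manning"}, {"3", "manning"}, {"4", "manning"}
    };

    // Simplified stand-in for the standard analyzer: split on non-alphanumerics, lowercase.
    static Set<String> analyze(String text) {
        Set<String> tokens = new HashSet<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // terms: a document matches if its indexed tokens contain ANY query value verbatim.
    // The query values themselves are NOT analyzed.
    static List<String> termsQuery(List<String> values) {
        List<String> ids = new ArrayList<>();
        for (String[] book : BOOKS) {
            Set<String> tokens = analyze(book[1]);
            for (String value : values) {
                if (tokens.contains(value)) {
                    ids.add(book[0]);
                    break;
                }
            }
        }
        return ids;
    }

    // term matches the same documents as terms with a single value (scoring aside).
    static List<String> termQuery(String value) {
        return termsQuery(List.of(value));
    }

    public static void main(String[] args) {
        System.out.println(termQuery("manning"));                      // [2, 3, 4]
        System.out.println(termQuery("Manning"));                      // [] because the value is not lowercased
        System.out.println(termsQuery(List.of("oreilly", "manning"))); // [1, 2, 3, 4]
    }
}
```

This is why the examples above pass the lowercase value "manning": a term query with "Manning" would find nothing in a field whose tokens were lowercased at index time.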

13. Term Query - Sorted

Like other queries, a term query can easily be combined with sorting. Multi-level sorting is also supported.

GET bookdb_index/book/_search
{
  "query": {
    "term": {
      "publisher": {
        "value": "manning"
      }
    }
  },
  "_source" : ["title","publish_date","publisher"],
  "sort": [{"publisher.keyword": { "order": "desc"}},
    {"title.keyword": {"order": "asc"}}]
}

[Results]
  "hits": {
    "total": 3,
    "max_score": null,
    "hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "3",
        "_score": null,
        "_source": {
          "publisher": "manning",
          "title": "Elasticsearch in Action",
          "publish_date": "2015-12-03"
        },
        "sort": [
          "manning",
          "Elasticsearch in Action"
        ]
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "4",
        "_score": null,
        "_source": {
          "publisher": "manning",
          "title": "Solr in Action",
          "publish_date": "2014-04-05"
        },
        "sort": [
          "manning",
          "Solr in Action"
        ]
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "2",
        "_score": null,
        "_source": {
          "publisher": "manning",
          "title": "Taming Text: How to Find, Organize, and Manipulate It",
          "publish_date": "2013-01-24"
        },
        "sort": [
          "manning",
          "Taming Text: How to Find, Organize, and Manipulate It"
        ]
      }
    ]
  }

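Note: In Elasticsearch 6.x, full-text search uses text fields, while sorting uses fields of types such as number, date, or keyword (hence publisher.keyword and title.keyword in the sort above).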
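The two-level sort can be reproduced in memory with a chained Comparator. A minimal Java sketch, assuming the three manning hits from the results above; String.compareTo stands in for keyword ordering, which it agrees with for plain ASCII values like these.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// In-memory sketch of: sort by publisher.keyword desc, then title.keyword asc.
class MultiSortSketch {
    static final class Book {
        final String id, publisher, title;
        Book(String id, String publisher, String title) {
            this.id = id; this.publisher = publisher; this.title = title;
        }
    }

    // The three hits from the sorted results above.
    static final List<Book> MANNING_HITS = List.of(
        new Book("2", "manning", "Taming Text: How to Find, Organize, and Manipulate It"),
        new Book("3", "manning", "Elasticsearch in Action"),
        new Book("4", "manning", "Solr in Action"));

    static List<String> sortedIds(List<Book> hits) {
        Comparator<Book> order = Comparator
            .comparing((Book b) -> b.publisher, Comparator.reverseOrder()) // first key: desc
            .thenComparing(b -> b.title);                                   // tie-breaker: asc
        List<String> ids = new ArrayList<>();
        hits.stream().sorted(order).forEach(b -> ids.add(b.id));
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(sortedIds(MANNING_HITS)); // [3, 4, 2]
    }
}
```

Since every hit has the same publisher, the second key decides the order, which is why the response lists Elasticsearch in Action, Solr in Action, then Taming Text.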

14. Range Query

Another example of structured search is the range query. In the example below, we search for books published in 2015.

GET bookdb_index/book/_search
{
  "query": {
    "range": {
      "publish_date": {
        "gte": "2015-01-01",
        "lte": "2015-12-31"
      }
    }
  },
  "_source" : ["title","publish_date","publisher"]
}

[Results]
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "1",
        "_score": 1,
        "_source": {
          "publisher": "oreilly",
          "title": "Elasticsearch: The Definitive Guide",
          "publish_date": "2015-02-07"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "3",
        "_score": 1,
        "_source": {
          "publisher": "manning",
          "title": "Elasticsearch in Action",
          "publish_date": "2015-12-03"
        }
      }
    ]
  }

Note: Range queries work on date, numeric, and string fields.
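The bound semantics are easy to mix up, so here is a minimal in-memory Java sketch: gte/lte include the boundary value, gt/lt exclude it, and a missing bound leaves that side open. It assumes publish_date values are plain dates without a time part (as in the sample data), which lets java.time.LocalDate stand in for the date field.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// In-memory sketch of a date range query; not Elasticsearch's implementation.
class RangeQuerySketch {
    // {_id, publish_date} pairs from the sample data in this article.
    static final String[][] BOOKS = {
        {"1", "2015-02-07"}, {"2", "2013-01-24"}, {"3", "2015-12-03"}, {"4", "2014-04-05"}
    };

    // A null bound is open. inclusiveFrom picks gte (true) or gt (false);
    // inclusiveTo picks lte (true) or lt (false).
    static List<String> range(LocalDate from, boolean inclusiveFrom,
                              LocalDate to, boolean inclusiveTo) {
        List<String> ids = new ArrayList<>();
        for (String[] book : BOOKS) {
            LocalDate d = LocalDate.parse(book[1]); // ISO yyyy-MM-dd
            boolean aboveLower = from == null
                || (inclusiveFrom ? !d.isBefore(from) : d.isAfter(from));
            boolean belowUpper = to == null
                || (inclusiveTo ? !d.isAfter(to) : d.isBefore(to));
            if (aboveLower && belowUpper) {
                ids.add(book[0]);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Same bounds as the request above: "gte": "2015-01-01", "lte": "2015-12-31"
        System.out.println(range(LocalDate.parse("2015-01-01"), true,
                                 LocalDate.parse("2015-12-31"), true)); // [1, 3]
    }
}
```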

15. Filtered Query

(No longer exists as of version 5.0; this section can be skipped.)

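A filtered query lets you filter the results of a query. In the example below, we search for books with "Elasticsearch" in the title or summary, but restrict the results to books with 20 or more reviews.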

POST /bookdb_index/book/_search
{
    "query": {
        "filtered": {
            "query" : {
                "multi_match": {
                    "query": "elasticsearch",
                    "fields": ["title","summary"]
                }
            },
            "filter": {
                "range" : {
                    "num_reviews": {
                        "gte": 20
                    }
                }
            }
        }
    },
    "_source" : ["title","summary","publisher", "num_reviews"]
}


[Results]
"hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "1",
        "_score": 0.5955761,
        "_source": {
          "summary": "A distibuted real-time search and analytics engine",
          "publisher": "oreilly",
          "num_reviews": 20,
          "title": "Elasticsearch: The Definitive Guide"
        }
      }
    ]

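Note: A filtered query does not require a query to filter. If no query is specified, a match_all query is run, which essentially returns every document in the index and then filters them.
In practice, the filter runs first, reducing the set of documents that need to be queried. In addition, filters are cached after their first use, which makes them very efficient.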

Update: The filtered query was removed in Elasticsearch 5.x in favor of the bool query. Here is the same example rewritten as a bool query; the results are exactly the same.

GET bookdb_index/book/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "elasticsearch",
            "fields": ["title","summary"]
          }
        }
      ],
      "filter": {
        "range": {
          "num_reviews": {
            "gte": 20
          }
        }
      }
    }
  },
  "_source" : ["title","summary","publisher", "num_reviews"]
}
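In the bool version, the must clause contributes to the relevance score, while the filter clause only includes or excludes documents (filter clauses run in filter context, so they do not affect scoring and can be cached). Below is a minimal in-memory Java sketch of the matching side only. Assumptions: the multi_match is approximated by a hypothetical case-insensitive token check over title and summary, scoring is not modelled, and the documents are copied from the sample results in this article.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Matching-only sketch of bool { must: multi_match, filter: range } (no scoring).
class BoolFilterSketch {
    static final class Book {
        final String id, title, summary;
        final int numReviews;
        Book(String id, String title, String summary, int numReviews) {
            this.id = id; this.title = title; this.summary = summary; this.numReviews = numReviews;
        }
    }

    // Sample documents copied from the results shown in this article.
    static final List<Book> BOOKS = List.of(
        new Book("1", "Elasticsearch: The Definitive Guide",
            "A distibuted real-time search and analytics engine", 20),
        new Book("2", "Taming Text: How to Find, Organize, and Manipulate It",
            "organize text using approaches such as full-text search, proper name recognition, "
                + "clustering, tagging, information extraction, and summarization", 12),
        new Book("3", "Elasticsearch in Action",
            "build scalable search applications using Elasticsearch without having to do complex "
                + "low-level programming or understand advanced data science algorithms", 18),
        new Book("4", "Solr in Action",
            "Comprehensive guide to implementing a scalable search engine using Apache Solr", 23));

    // Hypothetical stand-in for multi_match: does any lowercased token of title/summary equal the word?
    static boolean textMatches(Book b, String word) {
        String text = (b.title + " " + b.summary).toLowerCase(Locale.ROOT);
        return Arrays.asList(text.split("[^\\p{L}\\p{N}]+")).contains(word.toLowerCase(Locale.ROOT));
    }

    static List<String> search(String word, int minReviews) {
        List<String> ids = new ArrayList<>();
        for (Book b : BOOKS) {
            boolean must = textMatches(b, word);          // must: in Elasticsearch this also scores
            boolean filter = b.numReviews >= minReviews;  // filter: yes/no only, no score
            if (must && filter) {
                ids.add(b.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(search("elasticsearch", 20)); // [1]
    }
}
```

Both Elasticsearch books match the text clause, but only Elasticsearch: The Definitive Guide (20 reviews) survives the filter, which agrees with the single hit shown above.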

16. Multiple Filters

(No longer supported in 5.x; this section can be skipped.)
Multiple filters can be combined with a bool filter.

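In the next example, the filters specify that the returned results must have at least 20 reviews, must not have been published before 2015, and should be published by oreilly.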

POST /bookdb_index/book/_search
{
    "query": {
        "filtered": {
            "query" : {
                "multi_match": {
                    "query": "elasticsearch",
                    "fields": ["title","summary"]
                }
            },
            "filter": {
                "bool": {
                    "must": {
                        "range" : { "num_reviews": { "gte": 20 } }
                    },
                    "must_not": {
                        "range" : { "publish_date": { "lte": "2014-12-31" } }
                    },
                    "should": {
                        "term": { "publisher": "oreilly" }
                    }
                }
            }
        }
    },
    "_source" : ["title","summary","publisher", "num_reviews", "publish_date"]
}


[Results]
"hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "1",
        "_score": 0.5955761,
        "_source": {
          "summary": "A distibuted real-time search and analytics engine",
          "publisher": "oreilly",
          "num_reviews": 20,
          "title": "Elasticsearch: The Definitive Guide",
          "publish_date": "2015-02-07"
        }
      }
    ]
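A minimal in-memory Java sketch of how the three bool clauses combine. Assumptions: the multi_match is approximated by a hypothetical lowercase token check on the title only, and minimum_should_match is passed in explicitly, because whether a should clause is required inside a filter-context bool has depended on context and Elasticsearch version. With this sample data the result is the same either way.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Matching-only sketch of a bool filter with must, must_not and should clauses.
class MultiFilterSketch {
    static final class Book {
        final String id, publisher, title;
        final LocalDate publishDate;
        final int numReviews;
        Book(String id, String publisher, String title, String publishDate, int numReviews) {
            this.id = id; this.publisher = publisher; this.title = title;
            this.publishDate = LocalDate.parse(publishDate); this.numReviews = numReviews;
        }
    }

    // Sample documents from this article (summaries omitted to keep the sketch short).
    static final List<Book> BOOKS = List.of(
        new Book("1", "oreilly", "Elasticsearch: The Definitive Guide", "2015-02-07", 20),
        new Book("2", "manning", "Taming Text: How to Find, Organize, and Manipulate It", "2013-01-24", 12),
        new Book("3", "manning", "Elasticsearch in Action", "2015-12-03", 18),
        new Book("4", "manning", "Solr in Action", "2014-04-05", 23));

    static List<String> search(String word, int minimumShouldMatch) {
        LocalDate cutoff = LocalDate.parse("2014-12-31");
        List<String> ids = new ArrayList<>();
        for (Book b : BOOKS) {
            List<String> tokens = Arrays.asList(b.title.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"));
            if (!tokens.contains(word)) {
                continue;                                            // query part (simplified)
            }
            boolean must = b.numReviews >= 20;                       // range num_reviews gte 20
            boolean mustNot = !b.publishDate.isAfter(cutoff);        // range publish_date lte 2014-12-31
            int shouldHits = b.publisher.equals("oreilly") ? 1 : 0;  // term publisher: oreilly
            if (must && !mustNot && shouldHits >= minimumShouldMatch) {
                ids.add(b.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(search("elasticsearch", 0)); // [1]
        System.out.println(search("elasticsearch", 1)); // [1]
    }
}
```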

17. Function Score: Field Value Factor

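There may be cases where you want to factor the value of a particular field in the document into the relevance score. A typical scenario is boosting a document's relevance based on its popularity.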

In our example, we want to boost more popular books (judged by the number of reviews). This can be done with the field_value_factor function score.

GET bookdb_index/book/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "search engine",
          "fields": ["title","summary"]
        }
      },
      "field_value_factor": {
        "field": "num_reviews",
        "modifier": "log1p",
        "factor": 2
      }
    }
  },
  "_source": ["title", "summary", "publish_date", "num_reviews"]
}

[Results]
    "hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "1",
        "_score": 1.5694137,
        "_source": {
          "summary": "A distibuted real-time search and analytics engine",
          "num_reviews": 20,
          "title": "Elasticsearch: The Definitive Guide",
          "publish_date": "2015-02-07"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "4",
        "_score": 1.4725765,
        "_source": {
          "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
          "num_reviews": 23,
          "title": "Solr in Action",
          "publish_date": "2014-04-05"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "3",
        "_score": 0.14181662,
        "_source": {
          "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
          "num_reviews": 18,
          "title": "Elasticsearch in Action",
          "publish_date": "2015-12-03"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "2",
        "_score": 0.13297246,
        "_source": {
          "summary": "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization",
          "num_reviews": 12,
          "title": "Taming Text: How to Find, Organize, and Manipulate It",
          "publish_date": "2013-01-24"
        }
      }
    ]
  }

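Note 1: We could instead run a regular multi_match query and sort by the num_reviews field, but then we would lose the benefit of relevance scoring.
Note 2: Many additional parameters (such as "modifier", "factor", and "boost_mode") adjust how strongly the original relevance score is boosted.
See the Elasticsearch guide for details.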
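The boost itself is simple arithmetic. According to the function_score documentation, the log1p modifier takes the common (base-10) logarithm of 1 + factor × field value, and with the default boost_mode of multiply that result is multiplied into the query score. A minimal Java sketch of the formula; the query score is an input here, not something the sketch computes.

```java
// Sketch of field_value_factor with modifier "log1p" and the default boost_mode "multiply".
class FieldValueFactorSketch {
    // modifier log1p: log10(1 + factor * fieldValue)
    static double fieldValueFactor(double fieldValue, double factor) {
        return Math.log10(1 + factor * fieldValue);
    }

    // boost_mode multiply: final score = query score * function score
    static double boostedScore(double queryScore, double fieldValue, double factor) {
        return queryScore * fieldValueFactor(fieldValue, factor);
    }

    public static void main(String[] args) {
        // num_reviews values from the sample data, factor = 2 as in the request above
        int[] numReviews = {20, 23, 18, 12};
        for (int n : numReviews) {
            System.out.printf("num_reviews=%d -> factor %.4f%n", n, fieldValueFactor(n, 2));
        }
        // Caveat: num_reviews = 0 gives log10(1) = 0, which zeroes the document's score.
        System.out.println(boostedScore(1.0, 0, 2)); // 0.0
    }
}
```

With factor 2, 20 reviews give log10(41) ≈ 1.61 and 23 reviews give log10(47) ≈ 1.67, so the review count adjusts the text score by a modest multiplier rather than replacing it.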

18. Function Score: Decay Functions

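Suppose that, rather than boosting the score incrementally by a field's value, we have an ideal target value and want the score to decay the further a document's value is from it. Typical uses are price ranges, other numeric ranges, and date ranges. In our example, we search for books about "search engines" published ideally around June 2014.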

GET bookdb_index/book/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "search engine",
          "fields": ["title", "summary"]
        }
      },
      "functions": [
        {
          "exp": {
            "publish_date": {
              "origin": "2014-06-15",
              "scale": "30d",
              "offset": "7d"
            }
          }
        }
      ],
      "boost_mode": "replace"
    }
  },
  "_source": ["title", "summary", "publish_date", "num_reviews"]
}

[Results]
  "hits": {
    "total": 4,
    "max_score": 0.22793062,
    "hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "4",
        "_score": 0.22793062,
        "_source": {
          "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
          "num_reviews": 23,
          "title": "Solr in Action",
          "publish_date": "2014-04-05"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "1",
        "_score": 0.0049215667,
        "_source": {
          "summary": "A distibuted real-time search and analytics engine",
          "num_reviews": 20,
          "title": "Elasticsearch: The Definitive Guide",
          "publish_date": "2015-02-07"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "2",
        "_score": 0.000009612435,
        "_source": {
          "summary": "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization",
          "num_reviews": 12,
          "title": "Taming Text: How to Find, Organize, and Manipulate It",
          "publish_date": "2013-01-24"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "3",
        "_score": 0.0000049185574,
        "_source": {
          "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
          "num_reviews": 18,
          "title": "Elasticsearch in Action",
          "publish_date": "2015-12-03"
        }
      }
    ]
  }
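The scores above follow the exp decay formula from the function_score documentation: within offset of origin the function returns 1, and beyond that it decays exponentially so that at offset + scale it equals decay (default 0.5). A minimal Java sketch, assuming whole-day dates; Elasticsearch works in milliseconds, which gives the same ratio for whole days.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Sketch of the "exp" decay function on a date field, measured in whole days.
class ExpDecaySketch {
    // score = exp(lambda * max(0, |value - origin| - offset)), lambda = ln(decay) / scale
    static double expDecay(LocalDate value, LocalDate origin,
                           double scaleDays, double offsetDays, double decay) {
        double distance = Math.abs(ChronoUnit.DAYS.between(origin, value));
        double beyondOffset = Math.max(0.0, distance - offsetDays);
        double lambda = Math.log(decay) / scaleDays;
        return Math.exp(lambda * beyondOffset);
    }

    public static void main(String[] args) {
        LocalDate origin = LocalDate.parse("2014-06-15");
        String[] dates = {"2014-04-05", "2015-02-07", "2013-01-24", "2015-12-03"};
        for (String d : dates) {
            // scale 30d, offset 7d, default decay 0.5, as in the request above
            System.out.println(d + " -> " + expDecay(LocalDate.parse(d), origin, 30, 7, 0.5));
        }
    }
}
```

For Solr in Action (2014-04-05, 71 days before the origin) this gives 0.5^(64/30) ≈ 0.2279, which matches its _score above because boost_mode is replace, so the decay value becomes the final score.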

19. Function Score: Script Scoring

When the built-in scoring functions do not meet your needs, you can specify a Groovy script to use for scoring.

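In our example, we want a script that takes publish_date into account before deciding how much the number of reviews should count. Newer books may not have as many reviews yet, so they should not be "penalized" for that.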

The scoring script looks like this:

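// threshold is a script parameter (see "params" in the request below);
// the date comparison is done in epoch milliseconds via getTime()
publish_date = doc['publish_date'].value
num_reviews = doc['num_reviews'].value

// books newer than the threshold get a larger constant inside the log,
// so a small number of reviews counts against them less
if (publish_date > Date.parse('yyyy-MM-dd', threshold).getTime()) {
  my_score = Math.log(2.5 + num_reviews)
} else {
  my_score = Math.log(1 + num_reviews)
}
return my_score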

To use a scoring script dynamically, we use the script_score parameter:

GET /bookdb_index/book/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "search engine",
          "fields": ["title","summary"]
        }
      },
      "functions": [
        {
          "script_score": {
            "script": {
              "params": {
                "threshold": "2015-07-30"
              },  
              "lang": "groovy", 
              "source": "publish_date = doc['publish_date'].value; num_reviews = doc['num_reviews'].value; if (publish_date > Date.parse('yyyy-MM-dd', threshold).getTime()) { return log(2.5 + num_reviews) }; return log(1 + num_reviews);"
            }
          }
        }
      ]
    }
  },
  "_source": ["title","summary","publish_date", "num_reviews"]
}

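Note 1: To use dynamic scripts, they must be enabled for the Elasticsearch instance in the config/elasticsearch.yml file. You can also use scripts that are already stored on the Elasticsearch server. See the Elasticsearch reference docs for more information.
Note 2: JSON cannot contain embedded newlines, so semicolons are used to separate the statements.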
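Groovy scripting is not available in Elasticsearch 6.x, so here is the same scoring logic as a plain Java sketch for checking the arithmetic. It is not a Painless script and has not been run inside Elasticsearch; Math.log is the natural logarithm, as in the Groovy original. With function_score's default boost_mode of multiply, each document's final score would be its multi_match score times this value.

```java
import java.time.LocalDate;

// Plain-Java sketch of the scoring logic in the Groovy script above.
class ScriptScoreSketch {
    static double scriptScore(LocalDate publishDate, int numReviews, LocalDate threshold) {
        if (publishDate.isAfter(threshold)) {
            // newer books: larger constant, so a low review count is penalized less
            return Math.log(2.5 + numReviews);
        }
        return Math.log(1 + numReviews);
    }

    public static void main(String[] args) {
        LocalDate threshold = LocalDate.parse("2015-07-30"); // params.threshold in the request
        System.out.println(scriptScore(LocalDate.parse("2015-12-03"), 18, threshold)); // ln(20.5)
        System.out.println(scriptScore(LocalDate.parse("2015-02-07"), 20, threshold)); // ln(21)
    }
}
```

Under this rule, Elasticsearch in Action (2015-12-03, 18 reviews) gets ln(20.5) ≈ 3.02, close to Elasticsearch: The Definitive Guide (2015-02-07, 20 reviews) at ln(21) ≈ 3.04, even though it has fewer reviews.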
Original author: Tim Ojo, Aug. 05, 16 · Big Data Zone
Original article: https://dzone.com/articles/23-useful-elasticsearch-example-queries

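Note: How can Groovy scripts be enabled on ES 6.3? The following configuration did not work:
script.allowed_types: inline
script.allowed_contexts: search, update
(Groovy scripting was removed in Elasticsearch 6.0, where Painless is the default scripting language, so the script above has to be rewritten in Painless to run on 6.3.)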

Java API Implementation

Java API implementations of the queries above are available at https://github.com/whirlys/elastic-example/tree/master/UsefullESSearchSkill

Note: The Java API implementation is still being tested and will be uploaded soon.

For more content, please visit my personal blog: http://laijianfeng.org
References:
铭毅天下: [Translation] The 23 Most Useful Elasticsearch Search Techniques You Must Know
English original: 23 Useful Elasticsearch Example Queries

This article was shared from the WeChat official account 小旋锋 (whirlysBigData).