Elasticsearch: Structured Search, keyword, and Term Queries

Date: 2021-03-17

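Preface

Structured search in Elasticsearch means searching data such as numbers, dates, times, and booleans. These types have a precise format, so they are usually queried with term-level queries: exact matching with term, or prefix matching with prefix. This article also covers the “text” and “keyword” field types introduced in newer versions, as well as the Term query.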

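Structured search

Structured search means searching structured data. Dates, times, and numbers are all structured: they have a precise format, and we can apply logical operations to them. Common operations include comparing ranges of numbers or times, determining which of two values is larger, and prefix matching.

Text can be structured too. A box of colored pens has a discrete set of colors: red, green, blue. A blog post may be tagged with the keywords distributed and search. Products on an e-commerce site have UPCs (Universal Product Codes) or some other unique identifier, all of which must follow a strictly defined, structured format.

In structured search, the answer for each document is simply yes or no: the document either matches or it does not. Whether to score the results depends on the use case, but usually we do not need scoring.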

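Finding exact values

Let's start with the following example, which creates and indexes some documents representing products. Each document has the fields price, productID, show, createdAt, and tags (price, product ID, whether to display the product, creation time, and tag information). Only the first document carries a tags value.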

POST products/_bulk
{ "index": { "_id": 1 }}
{ "price" : 10, "productID" : "XHDK-A-1293-#fJ3", "show":true, "createdAt":"2021-03-03", "tags":"abc" }
{ "index": { "_id": 2 }}
{ "price" : 20, "productID" : "KDKE-B-9947-#kL5", "show":true, "createdAt":"2021-03-04" }
{ "index": { "_id": 3 }}
{ "price" : 30, "productID" : "JODL-X-1937-#pV7", "show":false, "createdAt":"2021-03-05"}
{ "index": { "_id": 4 }}
{ "price" : 30, "productID" : "QQPX-R-3956-#aD8", "show":true, "createdAt":"2021-03-06"}
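
If the same data has to be loaded from a script rather than the console, the bulk body can be assembled programmatically. The following is a minimal Python sketch that only builds the newline-delimited body as a string. The helper name to_bulk_ndjson is our own invention, and actually sending the body (with whichever client you use) is deliberately left out.

```python
import json

# The same four sample products as in the console request above.
PRODUCTS = [
    (1, {"price": 10, "productID": "XHDK-A-1293-#fJ3", "show": True,
         "createdAt": "2021-03-03", "tags": "abc"}),
    (2, {"price": 20, "productID": "KDKE-B-9947-#kL5", "show": True,
         "createdAt": "2021-03-04"}),
    (3, {"price": 30, "productID": "JODL-X-1937-#pV7", "show": False,
         "createdAt": "2021-03-05"}),
    (4, {"price": 30, "productID": "QQPX-R-3956-#aD8", "show": True,
         "createdAt": "2021-03-06"}),
]


def to_bulk_ndjson(docs):
    """Return a bulk body: one action line plus one source line per document."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


bulk_body = to_bulk_ndjson(PRODUCTS)
print(bulk_body)
```

Each document contributes exactly two lines, so the four products yield an eight-line body.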

Numbers

Suppose we want to find all products with a given price, say 20 yuan. We can use a term query, as follows:

GET products/_search
{
  "query": {
    "term": {
      "price": 20
    }
  }
}

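When looking up an exact value, we usually do not want the query to compute a score; we only want to include or exclude documents. So we wrap the term query in a constant_score query, which runs it in non-scoring (filter) mode and assigns every matching document a uniform score of 1.0.

The result is a constant_score query that contains a term query: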

GET products/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "price": 20
        }
      }
    }
  }
}
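
When these queries are issued from application code, the wrapping step can be factored into a small helper. Here is a minimal Python sketch; constant_score_term is a hypothetical helper name, not part of any Elasticsearch client, and it only builds the request body as a dict.

```python
import json


def constant_score_term(field, value):
    """Build a search body that wraps a term filter in constant_score.

    Hypothetical helper for illustration; it does not talk to a cluster.
    """
    return {
        "query": {
            "constant_score": {
                "filter": {
                    "term": {field: value}
                }
            }
        }
    }


body = constant_score_term("price", 20)
print(json.dumps(body, indent=2))
```

The printed JSON has the same shape as the request body above and can be passed to whatever search call your client offers.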

Numeric fields also commonly use range queries:

GET products/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "range": {
          "price": {
            "gte": 10,
            "lte": 20
          }
        }
      }
    }
  }
}

Options supported by range

gt: > greater than
lt: < less than
gte: >= greater than or equal to
lte: <= less than or equal to
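
To make the inclusive and exclusive bounds concrete, here is a small Python sketch that applies the four options to the prices of the sample products. It is a local simulation of the comparison semantics only, and range_filter is a name we made up for the illustration.

```python
import operator

# Prices of the four sample products, keyed by document _id.
PRICES = {1: 10, 2: 20, 3: 30, 4: 30}

# Each range option maps to an ordinary comparison.
OPS = {
    "gt": operator.gt,
    "lt": operator.lt,
    "gte": operator.ge,
    "lte": operator.le,
}


def range_filter(prices, **bounds):
    """Return the ids whose price satisfies every given bound."""
    return sorted(
        doc_id
        for doc_id, price in prices.items()
        if all(OPS[name](price, bound) for name, bound in bounds.items())
    )


print(range_filter(PRICES, gte=10, lte=20))  # [1, 2]
print(range_filter(PRICES, gt=10, lt=30))    # [2]
```

With gte/lte both endpoints are included, so the query above returns products 1 and 2; switching to gt/lt drops the endpoints.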

Booleans

GET products/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "show": true
        }
      }
    }
  }
}

Dates

Search for documents within a given time range:

POST products/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "range": {
          "createdAt": {
            "gte": "now-9d"
          }
        }
      }
    }
  }
}

POST products/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "range": {
          "createdAt": {
            "gte": "2021-01-05"
          }
        }
      }
    }
  }
}

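Date math expressions

In an expression such as now-9d, the suffix subtracts nine days from the current time. The supported units are:

y years
M months
w weeks
d days
H/h hours
m minutes
s seconds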
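
As an illustration of how an expression like now-9d resolves, here is a deliberately simplified Python sketch. It handles only a single now plus or minus N units step for w, d, h/H, m, and s. Real Elasticsearch date math also supports y and M, rounding (for example now/d), and chained operations, none of which are modelled here. The function name resolve and the fixed clock are assumptions made for the example.

```python
import re
from datetime import datetime, timedelta, timezone

# Units with a fixed length; y and M are intentionally not handled.
UNIT_TO_KWARG = {
    "w": "weeks",
    "d": "days",
    "h": "hours",
    "H": "hours",
    "m": "minutes",
    "s": "seconds",
}
PATTERN = re.compile(r"^now(?:([+-])(\d+)([wdhHms]))?$")


def resolve(expr, now):
    """Resolve a simple 'now', 'now-9d', or 'now+2h' style expression."""
    match = PATTERN.match(expr)
    if match is None:
        raise ValueError(f"unsupported expression: {expr!r}")
    sign, amount, unit = match.groups()
    if sign is None:
        return now
    delta = timedelta(**{UNIT_TO_KWARG[unit]: int(amount)})
    return now - delta if sign == "-" else now + delta


# A fixed "current time" so the result is reproducible.
fixed_now = datetime(2021, 3, 17, 12, 0, tzinfo=timezone.utc)
print(resolve("now-9d", fixed_now))  # 2021-03-08 12:00:00+00:00
```

Against the sample data, a lower bound of 2021-03-08 would match all four products, since they were created between 2021-03-03 and 2021-03-06 only if the clock is earlier; with the fixed clock above none of them fall inside the last nine days.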

Text

POST products/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "terms": {
          "productID.keyword": [
            "XHDK-A-1293-#fJ3",
            "KDKE-B-9947-#kL5"
          ]
        }
      }
    }
  }
}

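The “keyword” in “productID.keyword” is not a reserved word. It is the name of a sub-field that Elasticsearch, under its default dynamic mapping, generates automatically for “productID” when the document is indexed.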

Handling null values

To test that a field exists, use “exists”; to test that it does not exist, combine “must_not” with “exists”.

// The “tags” field exists
POST products/_search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "exists": {
                    "field":"tags"
                }
            }
        }
    }
}

// The “tags” field does not exist; older versions used the “missing” query, which has since been removed
POST products/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must_not": {
            "exists": {
              "field": "tags"
            }
          }
        }
      }
    }
  }
}
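
The effect of the two queries on the sample data can be simulated locally. The Python sketch below is only a model of the semantics, with made-up function names; it treats a missing field, a null value, and an empty array as "does not exist", which is how the exists query behaves.

```python
# Only the fields relevant to the example are kept.
DOCS = {
    1: {"price": 10, "tags": "abc"},
    2: {"price": 20},
    3: {"price": 30},
    4: {"price": 30},
}


def exists(docs, field):
    """Ids of documents that have a usable value for the field."""
    return sorted(
        doc_id
        for doc_id, source in docs.items()
        if source.get(field) is not None and source.get(field) != []
    )


def not_exists(docs, field):
    """Ids of documents excluded by exists (must_not + exists)."""
    return sorted(set(docs) - set(exists(docs, field)))


print(exists(DOCS, "tags"))      # [1]
print(not_exists(DOCS, "tags"))  # [2, 3, 4]
```

Only product 1 was indexed with a tags value, so it is the only hit for exists, and the other three are returned by the negated form.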

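Note that newer versions no longer support the “missing” query: it was deprecated in 2.x and removed in 5.0. Use “must_not” with “exists” to negate instead.
Using “missing” produces the following error: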

"reason": "no [query] registered for [missing]"

keyword

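In 2.x, text was stored in fields of type string.
Starting with 5.0, string was deprecated and replaced by two types, text and keyword, both of which store strings.

“text” is for full-text search and “keyword” is for structured search. A “keyword” field is somewhat like an enum in Java. In newer versions, if we do not define a mapping ourselves, a string value is dynamically mapped to “text”, and a sub-field named “keyword”, of type “keyword”, is generated alongside it.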
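
For reference, the mapping that default dynamic mapping typically generates for a string field looks like the following, written here as a Python dict. The ignore_above value of 256 is the usual default; treat the whole structure as something to verify against the output of GET products/_mapping on your own cluster rather than as a guarantee.

```python
import json

# What dynamic mapping typically produces for a string value:
# a text field plus a keyword sub-field named "keyword".
DEFAULT_STRING_MAPPING = {
    "type": "text",
    "fields": {
        "keyword": {
            "type": "keyword",
            # Values longer than this are not indexed in the sub-field.
            "ignore_above": 256,
        }
    },
}

products_properties = {"productID": DEFAULT_STRING_MAPPING}
print(json.dumps(products_properties, indent=2))
```

This is why both productID (analyzed text) and productID.keyword (exact value) can be queried without any mapping having been written by hand.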

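In terms of storage, a “text” value is split into terms by an analyzer, while a “keyword” value is kept as is. Take “Rabbit is jumping”: as “text” with the default standard analyzer it is indexed as “rabbit”, “is”, “jumping” (an analyzer with stop-word removal and stemming, such as english, would instead produce “rabbit”, “jump”), whereas as “keyword” it is indexed as the single term “Rabbit is jumping”.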
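
The difference can be sketched in a few lines of Python. The function approx_standard_analyze below is only a rough stand-in for the standard analyzer (split on non-alphanumeric characters, then lowercase); the real analyzer follows Unicode text segmentation rules and will differ on less simple input.

```python
import re


def approx_standard_analyze(text):
    """Rough approximation of the standard analyzer for plain ASCII text."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def keyword_analyze(text):
    """A keyword field indexes the whole value as one unmodified term."""
    return [text]


sentence = "Rabbit is jumping"
print(approx_standard_analyze(sentence))  # ['rabbit', 'is', 'jumping']
print(keyword_analyze(sentence))          # ['Rabbit is jumping']
```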

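Term queries

In Elasticsearch, a term query does not analyze its input. It treats the input as a whole, looks up that exact term in the inverted index, and uses the relevance scoring formula to score every document that contains the term.

Take "productID": "QQPX-R-3956-#aD8" from above: in the “productID” text field it is analyzed into the terms “qqpx”, “r”, “3956”, “ad8”.

“productID.keyword” is of type keyword, so query text sent to it is not split or lowercased either, and even a match query against it ends up as a Term query.

// Only "productID.keyword": "qqpx-r-3956-#ad8" returns no data; all the other variants return the document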
GET products/_search
{
  "query": {
    "match": {
      //"productID": "QQPX-R-3956-#aD8"
      //"productID": "qqpx"
      //"productID": "qqpx-r-3956-#ad8"
      //"productID.keyword": "QQPX-R-3956-#aD8"
      "productID.keyword": "qqpx-r-3956-#ad8"
    }
  }
}

// "productID": "qqpx" and "productID.keyword": "QQPX-R-3956-#aD8" return data; the other variants do not
GET products/_search
{
  "query": {
    "term": {
      "productID": "QQPX-R-3956-#aD8"
      //"productID": "qqpx"
      //"productID": "qqpx-r-3956-#ad8"
      //"productID.keyword": "QQPX-R-3956-#aD8"
      //"productID.keyword": "qqpx-r-3956-#ad8"
    }
  }
}
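
The ten variants above can be reproduced with a toy inverted index. The Python sketch below is a simulation under simplifying assumptions, not how Elasticsearch is implemented: analyze_text only approximates the standard analyzer, match_query uses plain OR semantics without scoring, and all function names are our own.

```python
import re

PRODUCT_IDS = {
    1: "XHDK-A-1293-#fJ3",
    2: "KDKE-B-9947-#kL5",
    3: "JODL-X-1937-#pV7",
    4: "QQPX-R-3956-#aD8",
}


def analyze_text(value):
    """Approximate the standard analyzer: split and lowercase."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", value)]


def analyze_keyword(value):
    """keyword fields keep the value as one exact term."""
    return [value]


ANALYZERS = {"productID": analyze_text, "productID.keyword": analyze_keyword}

# Inverted index: field -> term -> set of document ids.
INDEX = {field: {} for field in ANALYZERS}
for doc_id, product_id in PRODUCT_IDS.items():
    for field, analyze in ANALYZERS.items():
        for term in analyze(product_id):
            INDEX[field].setdefault(term, set()).add(doc_id)


def term_query(field, value):
    """Look the input up as-is, with no analysis."""
    return sorted(INDEX[field].get(value, set()))


def match_query(field, value):
    """Analyze the input with the field's analyzer, then OR the terms."""
    hits = set()
    for term in ANALYZERS[field](value):
        hits |= INDEX[field].get(term, set())
    return sorted(hits)


CASES = [
    ("productID", "QQPX-R-3956-#aD8"),
    ("productID", "qqpx"),
    ("productID", "qqpx-r-3956-#ad8"),
    ("productID.keyword", "QQPX-R-3956-#aD8"),
    ("productID.keyword", "qqpx-r-3956-#ad8"),
]
for field, value in CASES:
    print(field, value, "match:", match_query(field, value),
          "term:", term_query(field, value))
```

In this model, match finds product 4 in every case except the lowercased value on productID.keyword, while term finds it only for "qqpx" on productID and for the exact original value on productID.keyword, which agrees with the results noted in the comments above.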