Elasticsearch: Understanding the store attribute in mappings

Date: 2021-08-13

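By default, field values are indexed to make them searchable, but they are not stored. This means the field can be queried, but its original value cannot be retrieved on its own. The key point to understand is this: if a field's mapping sets store to true, Elasticsearch keeps a separate copy of that field's value, independent of what is kept in _source. Storing the field takes additional disk space. It does not make queries or aggregations faster: searching uses the inverted index, while sorting, aggregations and field access in scripts normally rely on doc_values. What store gives you is the ability to fetch that field's value without loading _source. The possible values of this option are false (the default) and true.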

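Usually this doesn't matter. The field value is already part of the _source field, which is stored by default. If you only want to retrieve the value of a single field or a few fields, rather than the whole _source, you can do that with source filtering.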

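In some situations it can make sense to store a field. For example, if you have a document with a title, a date and a very large content field, you may want to retrieve only the title and the date without having to extract them from a large _source field.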
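To make this trade-off concrete, here is a toy Python model of the two retrieval paths. It is only a sketch: the names source_blob, stored, fetch_via_source_filtering and fetch_via_stored_fields are invented for illustration, and the real on-disk layout in Elasticsearch and Lucene is more involved than a JSON string and a dict.

```python
import json

# Toy model only: this is NOT how Elasticsearch/Lucene lay out data on disk.
doc = {
    "title": "Some short title",
    "date": "2015-01-01",
    "content": "A very long content field..." * 1000,
}

# _source is modelled as one serialized JSON blob per document.
source_blob = json.dumps(doc)

# Fields mapped with "store": true are modelled as individually kept values.
stored = {name: [doc[name]] for name in ("title", "date")}


def fetch_via_source_filtering(blob, wanted):
    """Parse the whole _source blob, then keep only the wanted fields."""
    parsed = json.loads(blob)
    return {name: parsed[name] for name in wanted}


def fetch_via_stored_fields(stored_values, wanted):
    """Read only the individually stored values; the blob is never parsed."""
    return {name: stored_values[name] for name in wanted}


print(len(source_blob))  # size of the blob that source filtering has to parse
print(fetch_via_source_filtering(source_blob, ["title", "date"]))
print(fetch_via_stored_fields(stored, ["title", "date"]))
```

Both paths return the title and the date, but only the first one has to read and parse the large content value along the way.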

The description above is a bit abstract, so let's walk through a concrete example.

First, we create an index called my_index:

PUT my_index
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "store": true 
      },
      "date": {
        "type": "date",
        "store": true 
      },
      "content": {
        "type": "text"
      }
    }
  }
}

In the mapping above, we set store to true for the title and date fields, which means their values are stored separately from _source. Now let's index a document into my_index:

PUT my_index/_doc/1
{
  "title": "Some short title",
  "date": "2015-01-01",
  "content": "A very long content field..."
}

Next, we run a search:

GET my_index/_search

The result is:

"hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "title" : "Some short title",
          "date" : "2015-01-01",
          "content" : "A very long content field..."
        }
      }
    ]
  }

Above, we can see the document's title, date and content fields in _source.

We can use source filtering to extract only the fields we want:

GET my_index/_search
{
  "_source": ["title", "date"]
}

The result is:

"hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "date" : "2015-01-01",
          "title" : "Some short title"
        }
      }
    ]
  }

Clearly, the fields we want, date and title, can be obtained from _source.

We can also get the values of these two fields in the following way:

GET my_index/_search
{
  "stored_fields": [
    "title",
    "date"
  ]
}

The returned result is:

"hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "fields" : {
          "date" : [
            "2015-01-01T00:00:00.000Z"
          ],
          "title" : [
            "Some short title"
          ]
        }
      }
    ]
  }

Above, we can see that fields contains the date and title values, each returned as an array.
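Stored fields always come back as arrays, even for a single value, while _source returns values as they were originally sent. Client code that has to handle both response shapes can normalize them. The following is a minimal sketch, assuming hits shaped like the ones shown above; extract_fields is a hypothetical helper, not part of any Elasticsearch client library.

```python
def extract_fields(hit, names):
    """Return {name: value} for one search hit.

    Values under "fields" (stored fields) are lists; single-element lists
    are unwrapped. Values under "_source" are returned unchanged.
    """
    result = {}
    stored = hit.get("fields", {})
    source = hit.get("_source", {})
    for name in names:
        if name in stored:
            values = stored[name]
            result[name] = values[0] if len(values) == 1 else values
        elif name in source:
            result[name] = source[name]
    return result


# Hits copied from the stored_fields response and the source filtering response.
stored_hit = {
    "_index": "my_index",
    "_id": "1",
    "fields": {
        "date": ["2015-01-01T00:00:00.000Z"],
        "title": ["Some short title"],
    },
}
source_hit = {
    "_index": "my_index",
    "_id": "1",
    "_source": {"date": "2015-01-01", "title": "Some short title"},
}

print(extract_fields(stored_hit, ["title", "date"]))
print(extract_fields(source_hit, ["title", "date"]))
```

Note that the two paths do not return identical values for date: the stored field comes back in the formatted form shown in the response, whereas _source preserves the string exactly as it was indexed.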

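Many of us may wonder what store is actually good for, if field values can all be obtained from _source anyway.

One case is the one described at the beginning: sometimes we don't want to keep every field in _source because a field's content is very large, or we don't want to store _source at all, but we still want to retrieve the contents of certain fields. In that case, we can use store.

Again, let's use an example to illustrate. First, create an index called my_index1: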

PUT my_index1
{
  "mappings": {
    "_source": {
      "enabled": false
    },
    "properties": {
      "title": {
        "type": "text",
        "store": true
      },
      "date": {
        "type": "date",
        "store": true
      },
      "content": {
        "type": "text",
        "store": false
      }
    }
  }
}

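Because we expect the content field to be very large, we don't want to store it. Above, we also set the enabled switch of _source to false, which means that _source is not stored at all. Next, index a document into my_index1: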

PUT my_index1/_doc/1
{
  "title": "Some short title",
  "date": "2015-01-01",
  "content": "A very long content field..."
}

As before, we run a search:

GET my_index1/_search
{
  "query": {
    "match": {
      "content": "content"
    }
  }
}

We can see the search result:

"hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "my_index1",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.2876821
      }
    ]
  }

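This time there is no _source field in the output, because we have disabled it. The content field is still searchable, since it is indexed. We can retrieve the stored fields in the following way: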

GET my_index1/_search
{
  "stored_fields": [
    "title",
    "date"
  ],
  "query": {
    "match": {
      "content": "content"
    }
  }
}

The returned result is:

"hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "my_index1",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "fields" : {
          "date" : [
            "2015-01-01T00:00:00.000Z"
          ],
          "title" : [
            "Some short title"
          ]
        }
      }
    ]
  }

We can see the values of date and title in the returned result.
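For the other case mentioned earlier, where _source is kept but one large field should be left out of it, the mapping-level _source excludes setting is an alternative to disabling _source entirely. The sketch below is not from the original example: the index name my_index2 is hypothetical, and the Python code only builds and prints the request body so that it can be pasted into the console.

```python
import json

# Assumption: "my_index2" is a made-up index name used only for this sketch.
index_name = "my_index2"

mapping_body = {
    "mappings": {
        # content is still indexed and searchable, but is removed from the
        # stored _source; title and date remain available through _source.
        "_source": {"excludes": ["content"]},
        "properties": {
            "title": {"type": "text"},
            "date": {"type": "date"},
            "content": {"type": "text"},
        },
    }
}

# Print the request in the same console form used throughout this article.
print(f"PUT {index_name}")
print(json.dumps(mapping_body, indent=2))
```

With a mapping like this, title and date do not need store set to true, because they can still be retrieved from _source with source filtering.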

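Another situation where storing a field makes sense is for fields that do not appear in the _source field (such as copy_to fields). See my other article "How to use copy_to in Elasticsearch to improve search efficiency".

If you want to learn more about storage in Elasticsearch, read the article "Elasticsearch: inverted index, doc_values and source".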

References: