ES9: Mapping Parameters

Date: 2019-12-19

Tags: es9, mapping, parameters

1.Overview

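Elasticsearch provides a rich set of mapping parameters for defining document fields, such as a field's analyzer, boost, date format, and similarity model. The official reference describes each parameter and how to use it: https://www.elastic.co/guide/en/elasticsearch/reference/6.1/mapping-params.html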

2.analyzer

The analyzer applies at both index time and search time: https://www.elastic.co/guide/en/elasticsearch/reference/6.1/analyzer.html

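To try out the analyzer parameter, first install an analysis plugin. Download the release that matches your Elasticsearch version from https://github.com/medcl/elasticsearch-analysis-ik/releases (this walkthrough uses elasticsearch-analysis-ik-6.2.3.zip), copy it into the plugins folder under the Elasticsearch installation directory, unzip it, delete the zip file, and restart Elasticsearch. The plugin only takes effect after a restart.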

Create the index:

DELETE my_index

PUT my_index

Analyze a piece of text with ik_smart:

GET my_index/_analyze
{
  "analyzer": "ik_smart",
  "text": "安徽省长江流域"
}

Result:

{
  "tokens": [
    {
      "token": "安徽省",
      "start_offset": 0,
      "end_offset": 3,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "长江流域",
      "start_offset": 3,
      "end_offset": 7,
      "type": "CN_WORD",
      "position": 1
    }
  ]
}

Define the mapping and set the analyzer for the field:

PUT my_index/fulltext/_mapping
{
  "properties": {
    "content":{
      "type": "text",
      "analyzer": "ik_max_word",
      "search_analyzer": "ik_max_word"
    }
  }
}

Add documents:

PUT my_index/fulltext/1
{
  "content":"软件测试是非常复杂的工作"
}

PUT my_index/fulltext/2
{
  "content":"发改委表示，上半年审核批准固定资产项目102个"
}

PUT my_index/fulltext/3
{
  "content":"全球最大资产管理公司贝莱德成立区块链研究组"
}

PUT my_index/fulltext/4
{
  "content":"资本投资疯狂，工业产能过剩"
}

Search by keyword. The query below matches on the term 资产 ("assets"):

GET my_index/fulltext/_search
{
  "query": {
    "match": {
      "content": "资产"
    }
  }
}

Query result:

{
  "took": 11,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.5897495,
    "hits": [
      {
        "_index": "my_index",
        "_type": "fulltext",
        "_id": "2",
        "_score": 0.5897495,
        "_source": {
          "content": "发改委表示，上半年审核批准固定资产项目102个"
        }
      },
      {
        "_index": "my_index",
        "_type": "fulltext",
        "_id": "3",
        "_score": 0.2876821,
        "_source": {
          "content": "全球最大资产管理公司贝莱德成立区块链研究组"
        }
      }
    ]
  }
}

3.normalizer

A normalizer standardizes keyword values before they are indexed, for example by converting all characters to lowercase.

https://www.elastic.co/guide/en/elasticsearch/reference/6.1/normalizer.html

Define the mapping:

DELETE my_index

PUT my_index
{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "foo": {
          "type": "keyword",
          "normalizer": "my_normalizer"
        }
      }
    }
  }
}

Index the documents, refresh, and search:

PUT my_index/my_type/1
{
  "foo": "BÀR"
}

PUT my_index/my_type/2
{
  "foo": "bar"
}

PUT my_index/my_type/3
{
  "foo": "baz"
}

POST my_index/_refresh

GET my_index/_search
{
  "query": {
    "match": {
      "foo": "BAR"
    }
  }
}

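Because the foo field is normalized at index time, the value "BÀR" is converted to "bar" before it is indexed. At search time the query string "BAR" is also converted to "bar" before matching, so documents 1 and 2 are both returned.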

{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "2",
        "_score": 0.2876821,
        "_source": {
          "foo": "bar"
        }
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "foo": "BÀR"
        }
      }
    ]
  }
}

A terms aggregation shows how many documents were indexed under each normalized value of the foo field:

GET my_index/_search
{
  "size": 0,
  "aggs": {
    "foo_terms": {
      "terms": {
        "field": "foo"
      }
    }
  }
}

The term "bar" is counted for 2 documents and "baz" for 1:

{
  "took": 14,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "foo_terms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "bar",
          "doc_count": 2
        },
        {
          "key": "baz",
          "doc_count": 1
        }
      ]
    }
  }
}
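
To make the effect of the two filters more concrete, the following Python sketch approximates lowercase plus asciifolding for the three values indexed above. It is only an illustration and not Elasticsearch's implementation: the helper name normalize_keyword is hypothetical, and stripping Unicode combining marks approximates asciifolding only for simple accented Latin letters.

```python
import unicodedata
from collections import Counter


def normalize_keyword(value: str) -> str:
    """Rough stand-in for a normalizer with the lowercase and asciifolding filters."""
    lowered = value.lower()
    # Split accented letters into base letter + combining mark, then drop the marks.
    decomposed = unicodedata.normalize("NFKD", lowered)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


values = ["BÀR", "bar", "baz"]

# Group the values by their normalized form, like the terms aggregation does.
buckets = Counter(normalize_keyword(v) for v in values)
print(dict(buckets))
```

Under these assumptions, "BÀR" and "bar" fall into the same bucket, which mirrors the doc_count of 2 for "bar" in the aggregation response.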

4.boost

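You can control the relative weight of each query clause by specifying a boost value, which defaults to 1. A boost greater than 1 increases the relative weight of that clause.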

https://www.elastic.co/guide/en/elasticsearch/reference/6.1/mapping-boost.html#mapping-boost

DELETE my_index

PUT my_index

PUT my_index/my_type/1
{
  "title":"quick brown fox"
}

GET my_index/_search
{
    "query": {
        "match" : {
            "title": {
                "query": "quick brown fox",
                "boost":2
            }
        }
    }
}

The boost is set to 2 on the match clause here (the default is 1):

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.7260926,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 1.7260926,
        "_source": {
          "title": "quick brown fox"
        }
      }
    ]
  }
}
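
As a rough sanity check on the score above, the arithmetic can be worked backwards in Python. This sketch assumes that boost acts as a plain multiplier on the clause's score and that the three query terms contribute equally; both are assumptions about the scoring, not something the response itself states.

```python
reported_score = 1.7260926  # _score from the response above, obtained with boost = 2
boost = 2
query_terms = ["quick", "brown", "fox"]

# Undo the boost, then split the remainder evenly across the query terms.
unboosted_score = reported_score / boost
per_term_score = unboosted_score / len(query_terms)

print(round(unboosted_score, 7), round(per_term_score, 7))
```

Under these assumptions the per-term value comes out close to 0.2876821, the same figure that appears as the _score of single-term matches in the earlier responses in this article.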

5.coerce

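Data is not always clean. The type of a property value in JSON is not necessarily the type declared in the mapping; for example, the JSON string "5" may well be meant as the number 5. coerce defaults to true, in which case Elasticsearch automatically converts "5" to 5 when indexing the field.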

https://www.elastic.co/guide/en/elasticsearch/reference/6.1/coerce.html#coerce

Create the index and define the document structure. The document has two fields, both of type integer, and one of them disables coerce:

DELETE my_index

PUT my_index
{
  "mappings": {
    "my_type":{
      "properties": {
        "number_one":{
          "type": "integer"
        },
        "number_tow":{
          "type": "integer",
          "coerce":false
        }
      }
    }
  }
}

Index the data:

PUT my_index/my_type/1
{
  "number_one":"5"
}

PUT my_index/my_type/2
{
  "number_tow":"5"
}

The first request succeeds and the second fails:

{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "failed to parse [number_tow]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "failed to parse [number_tow]",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Integer value passed as String"
    }
  },
  "status": 400
}
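
The difference between the two fields can be mimicked with a small Python toy model. It covers only the string-to-integer case shown above; the function name parse_integer_field is hypothetical, and the error message is a simplified stand-in for the real exception.

```python
def parse_integer_field(value, coerce=True):
    """Toy model of an integer field receiving a JSON value (string case only)."""
    if isinstance(value, int) and not isinstance(value, bool):
        return value  # already an integer, accepted either way
    if isinstance(value, str) and coerce:
        return int(value)  # coerce=true: the string "5" becomes the number 5
    raise ValueError(f"failed to parse: {value!r} is not an integer")


print(parse_integer_field("5"))  # behaves like number_one

try:
    parse_integer_field("5", coerce=False)  # behaves like number_tow
except ValueError as err:
    print("rejected:", err)
```

The model deliberately ignores other inputs such as floating-point numbers, whose handling is described in the coerce reference page linked above.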

6.copy_to

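The copy_to parameter lets you build a custom _all-style field. In other words, several fields can be merged into one combined field; for example, first_name and last_name can be copied into a full_name field.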

https://www.elastic.co/guide/en/elasticsearch/reference/6.1/copy-to.html

Create the index and define the document structure with three fields, first_name, last_name, and full_name, copying the values of first_name and last_name into full_name.

DELETE my_index

PUT my_index
{
  "mappings": {
    "my_type":{
      "properties": {
        "first_name":{
          "type": "text",
          "copy_to": "full_name"
        },
        "last_name":{
          "type": "text",
          "copy_to": "full_name"
        },
        "full_name":{
          "type": "text"
        }
      }
    }
  }
}

Index the data, then search on full_name:

PUT my_index/my_type/1
{
  "first_name":"John",
  "last_name":"Smith"
}

GET my_index/my_type/_search
{
  "query": {
    "match": {
      "full_name": "John Smith"
    }
  }
}

You can still query first_name or last_name individually, and you can also query full_name to match on the values of both first_name and last_name at once.

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 0.5753642,
        "_source": {
          "first_name": "John",
          "last_name": "Smith"
        }
      }
    ]
  }
}
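
The response above returns only first_name and last_name in _source, even though the query ran against full_name. A small Python toy model illustrates the idea: the copied values end up in the indexed terms of the target field while the original document stays unchanged. The function build_indexed_fields and the whitespace tokenizer are hypothetical simplifications, not how Elasticsearch analyzes text.

```python
def build_indexed_fields(source: dict, copy_to: dict) -> dict:
    """Toy model: map each field name to the list of terms indexed for it."""
    indexed = {}
    for field, value in source.items():
        tokens = value.lower().split()  # crude stand-in for a text analyzer
        indexed.setdefault(field, []).extend(tokens)
        if field in copy_to:
            # The same terms are also indexed under the copy_to target field.
            indexed.setdefault(copy_to[field], []).extend(tokens)
    return indexed


source = {"first_name": "John", "last_name": "Smith"}
copy_to = {"first_name": "full_name", "last_name": "full_name"}

indexed = build_indexed_fields(source, copy_to)
print(indexed["full_name"])    # terms searchable through full_name
print("full_name" in source)   # the stored document itself is not modified
```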

7.doc_values

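doc_values speed up sorting and aggregations. Alongside the inverted index, Elasticsearch builds an additional column-oriented store, trading disk space for time. doc_values are enabled by default on field types that support them, and can be disabled for fields that you are sure will never need sorting or aggregation.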

https://www.elastic.co/guide/en/elasticsearch/reference/6.1/doc-values.html#doc-values

DELETE my_index

PUT my_index
{
  "mappings": {
    "my_type":{
      "properties": {
        "status_code":{
          "type": "keyword"
        },
        "session_id":{
          "type": "keyword",
          "doc_values":false
        }
      }
    }
  }
}

8.dynamic

https://www.elastic.co/guide/en/elasticsearch/reference/6.1/dynamic.html

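The dynamic parameter controls how newly detected fields are handled. It accepts three values:

true: newly detected fields are added to the mapping (default).

false: newly detected fields are ignored; new fields must be added explicitly.

strict: if a new field is detected, an exception is thrown and the document is rejected.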

Define the index:

DELETE my_index

PUT my_index
{
  "mappings": {
    "my_type": {
      "dynamic":"strict",
      "properties": {
        "title":{
          "type": "text"
        }
      }
    }
  }
}

Index a document:

PUT my_index/my_type/2
{
  "title":"this is a test",
  "content":"上半年上海市货币信贷运行平稳 个人住房贷款增速回落"
}

Because the content field is not defined in the mapping and dynamic is set to strict, indexing fails with an exception:

{
  "error": {
    "root_cause": [
      {
        "type": "strict_dynamic_mapping_exception",
        "reason": "mapping set to strict, dynamic introduction of [content] within [my_type] is not allowed"
      }
    ],
    "type": "strict_dynamic_mapping_exception",
    "reason": "mapping set to strict, dynamic introduction of [content] within [my_type] is not allowed"
  },
  "status": 400
}
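
The three dynamic settings can be compared side by side with a Python toy model. This is a sketch of the behavior described above, not Elasticsearch code: the function index_with_dynamic is hypothetical, and the "guessed" type is a placeholder because real type detection is not modeled.

```python
def index_with_dynamic(mapping: dict, doc: dict, dynamic: str = "true") -> dict:
    """Toy model: return the fields that get indexed; may add fields to mapping."""
    indexed = {}
    for field, value in doc.items():
        if field in mapping:
            indexed[field] = value
        elif dynamic == "true":
            mapping[field] = {"type": "guessed"}  # new field joins the mapping
            indexed[field] = value
        elif dynamic == "false":
            continue  # new field is ignored: not indexed, mapping unchanged
        elif dynamic == "strict":
            raise ValueError(f"dynamic introduction of [{field}] is not allowed")
        else:
            raise ValueError(f"unknown dynamic setting: {dynamic}")
    return indexed


doc = {"title": "this is a test", "content": "an unmapped field"}

for setting in ("true", "false"):
    mapping = {"title": {"type": "text"}}
    indexed = index_with_dynamic(mapping, doc, setting)
    print(setting, sorted(indexed), sorted(mapping))

try:
    index_with_dynamic({"title": {"type": "text"}}, doc, "strict")
except ValueError as err:
    print("strict:", err)
```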

9.enabled

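Elasticsearch indexes all fields by default. For a field with enabled set to false, Elasticsearch skips parsing the field's content; the value can still be retrieved from _source, but it is not searchable.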

Create the index and insert data as follows:

DELETE my_index

PUT my_index
{
  "mappings": {
    "my_type":{
      "properties": {
        "name":{
          "enabled":false
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "name":"sean",
  "title":"this is a test"
}

Search on name:

GET /my_index/_search
{
  "query": {
    "match": {
      "name": "sean"
    }
  }
}

Because enabled is set to false for the name field, it cannot be used as a search condition:

{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}