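Elasticsearch provides a rich set of mapping parameters for defining document fields, such as the analyzer, field boost, date format, and similarity model. The official reference documents the definition and usage of each parameter: https://www.elastic.co/guide/en/elasticsearch/reference/6.1/mapping-params.html
The analyzer parameter applies to both indexing and searching: https://www.elastic.co/guide/en/elasticsearch/reference/6.1/analyzer.html
To test the analyzer parameter, first install an analyzer plugin. Download the plugin version that matches your Elasticsearch version from https://github.com/medcl/elasticsearch-analysis-ik/releases (this walkthrough uses elasticsearch-analysis-ik-6.2.3.zip), copy it into the plugins folder of the Elasticsearch installation directory, unzip it, delete the zip file, and restart Elasticsearch. The plugin only takes effect after a restart.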
Create the index:
DELETE my_index
PUT my_index
Tokenize a sample text with ik_smart:
GET my_index/_analyze { "analyzer": "ik_smart", "text": "安徽省长江流域" }
Result:
{ "tokens": [ { "token": "安徽省", "start_offset": 0, "end_offset": 3, "type": "CN_WORD", "position": 0 }, { "token": "长江流域", "start_offset": 3, "end_offset": 7, "type": "CN_WORD", "position": 1 } ] }
Define the mapping and specify the analyzer for the field:
PUT my_index/fulltext/_mapping { "properties": { "content":{ "type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_max_word" } } }
Add documents:
PUT my_index/fulltext/1
{ "content": "软件测试是非常复杂的工作" }
PUT my_index/fulltext/2
{ "content": "发改委表示,上半年审核批准固定资产项目102个" }
PUT my_index/fulltext/3
{ "content": "全球最大资产管理公司贝莱德成立区块链研究组" }
PUT my_index/fulltext/4
{ "content": "资本投资疯狂,工业产能过剩" }
Search by keyword:
GET my_index/fulltext/_search { "query": { "match": { "content": "资产" } } }
Query result:
{ "took": 11, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 2, "max_score": 0.5897495, "hits": [ { "_index": "my_index", "_type": "fulltext", "_id": "2", "_score": 0.5897495, "_source": { "content": "发改委表示,上半年审核批准固定资产项目102个" } }, { "_index": "my_index", "_type": "fulltext", "_id": "3", "_score": 0.2876821, "_source": { "content": "全球最大资产管理公司贝莱德成立区块链研究组" } } ] } }
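The normalizer parameter applies to keyword fields and standardizes a value before it is indexed, for example by converting all characters to lowercase.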
https://www.elastic.co/guide/en/elasticsearch/reference/6.1/normalizer.html
Define the mapping:
DELETE my_index
PUT my_index
{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "foo": { "type": "keyword", "normalizer": "my_normalizer" }
      }
    }
  }
}
Index documents:
PUT my_index/my_type/1
{ "foo": "BÀR" }
PUT my_index/my_type/2
{ "foo": "bar" }
PUT my_index/my_type/3
{ "foo": "baz" }
POST my_index/_refresh
GET my_index/_search { "query": { "match": { "foo": "BAR" } } }
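Because the foo field is normalized at index time, "BÀR" is indexed as "bar" (the original value is still returned in _source). At search time the query string "BAR" is also normalized to "bar" before matching, so documents 1 and 2 both match: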
{ "took": 7, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 2, "max_score": 0.2876821, "hits": [ { "_index": "my_index", "_type": "my_type", "_id": "2", "_score": 0.2876821, "_source": { "foo": "bar" } }, { "_index": "my_index", "_type": "my_type", "_id": "1", "_score": 0.2876821, "_source": { "foo": "BÀR" } } ] } }
A terms aggregation shows how many documents are indexed under each normalized term of the foo field:
GET my_index/_search { "size": 0, "aggs": { "foo_terms": { "terms": { "field": "foo" } } } }
The result shows that "bar" is indexed for 2 documents and "baz" for 1:
{ "took": 14, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 3, "max_score": 0, "hits": [] }, "aggregations": { "foo_terms": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "bar", "doc_count": 2 }, { "key": "baz", "doc_count": 1 } ] } } }
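The effect of the normalizer can be approximated outside Elasticsearch. The following is a minimal Python sketch, not the actual Lucene filters: the function my_normalizer is a hypothetical stand-in that mimics lowercase plus asciifolding for accented Latin letters only, using Unicode decomposition.

```python
import unicodedata
from collections import Counter


def my_normalizer(value: str) -> str:
    """Roughly mimic the lowercase + asciifolding filters for Latin text."""
    # Split accented characters into base letter + combining mark,
    # then drop the combining marks (a rough stand-in for asciifolding).
    decomposed = unicodedata.normalize("NFKD", value)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return folded.lower()


# The same three values that were indexed above.
docs = {"1": "BÀR", "2": "bar", "3": "baz"}

# Terms as they would be indexed; the original values stay in _source.
indexed_terms = {doc_id: my_normalizer(value) for doc_id, value in docs.items()}

# The query string goes through the same normalizer before matching.
query_term = my_normalizer("BAR")
matches = sorted(d for d, term in indexed_terms.items() if term == query_term)

# Rough equivalent of the terms aggregation on "foo".
buckets = Counter(indexed_terms.values())

print(matches)
print(dict(buckets))
```

The sketch only illustrates why documents 1 and 2 end up under the same term; the real asciifolding filter covers many more characters than simple accent stripping.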
A boost value controls the relative weight of each query clause; it defaults to 1. A boost greater than 1 increases the relative weight of that clause.
https://www.elastic.co/guide/en/elasticsearch/reference/6.1/mapping-boost.html#mapping-boost
DELETE my_index
PUT my_index
PUT my_index/my_type/1
{ "title": "quick brown fox" }
GET my_index/_search
{ "query": { "match": { "title": { "query": "quick brown fox", "boost": 2 } } } }
With the boost set to 2 at query time (the default is 1), the result is:
{ "took": 5, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 1, "max_score": 1.7260926, "hits": [ { "_index": "my_index", "_type": "my_type", "_id": "1", "_score": 1.7260926, "_source": { "title": "quick brown fox" } } ] } }
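The score above can be reproduced by hand, which makes the role of boost visible. The sketch below is a rough reconstruction under stated assumptions: the index uses BM25 with the default k1 = 1.2 and b = 0.75, statistics are computed on the single shard that holds the one document, and boost acts as a plain multiplier on the clause score. The function name bm25_term_score is made up for illustration.

```python
import math


def bm25_term_score(freq, doc_len, avg_doc_len, doc_count, doc_freq,
                    k1=1.2, b=0.75):
    """BM25 score of one term in one document (assumed default parameters)."""
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    tf_norm = (freq * (k1 + 1)) / (
        freq + k1 * (1 - b + b * doc_len / avg_doc_len)
    )
    return idf * tf_norm


# One document with three terms; each query term occurs once in it.
per_term = bm25_term_score(freq=1, doc_len=3, avg_doc_len=3,
                           doc_count=1, doc_freq=1)

unboosted = 3 * per_term   # "quick", "brown" and "fox" each contribute once
boosted = 2 * unboosted    # boost = 2 doubles the clause score

print(per_term, unboosted, boosted)
```

Under these assumptions the boosted value agrees with the reported _score of 1.7260926 up to single-precision rounding, and the unboosted value is half of it.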
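Data is not always clean. A value in a JSON document does not always have the type that the field declares; for example, the number 5 may arrive as the string "5". coerce defaults to true, in which case Elasticsearch automatically converts "5" to the integer 5 when indexing it.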
https://www.elastic.co/guide/en/elasticsearch/reference/6.1/coerce.html#coerce
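Create the index and define the document structure. The document has two fields, both of type integer, and one of them has coerce disabled:
DELETE my_index
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "number_one": { "type": "integer" },
        "number_tow": { "type": "integer", "coerce": false }
      }
    }
  }
}
Index the data:
PUT my_index/my_type/1
{ "number_one": "5" }
PUT my_index/my_type/2
{ "number_tow": "5" }
The first request succeeds; the second fails with the following error: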
{ "error": { "root_cause": [ { "type": "mapper_parsing_exception", "reason": "failed to parse [number_tow]" } ], "type": "mapper_parsing_exception", "reason": "failed to parse [number_tow]", "caused_by": { "type": "illegal_argument_exception", "reason": "Integer value passed as String" } }, "status": 400 }
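The difference between the two fields can be sketched in a few lines of Python. This is only an illustration of the behaviour shown above, not Elasticsearch's parsing code; index_integer is a hypothetical helper, and it only models string input.

```python
def index_integer(value, coerce=True):
    """Mimic how an integer field might treat an incoming JSON value."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    if isinstance(value, str):
        if not coerce:
            # Same message as in the error response above.
            raise ValueError("Integer value passed as String")
        return int(value)
    raise ValueError(f"cannot index {value!r} as an integer")


# number_one: coerce left at its default (true).
print(index_integer("5"))

# number_tow: coerce disabled, so the string is rejected.
try:
    index_integer("5", coerce=False)
except ValueError as err:
    print("rejected:", err)
```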
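The copy_to parameter lets you build a custom _all-style field. In other words, the values of several fields can be copied into one combined field; for example, first_name and last_name can be merged into full_name.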
https://www.elastic.co/guide/en/elasticsearch/reference/6.1/copy-to.html
Create the index and define the document structure with three fields, "first_name", "last_name", and "full_name"; the values of first_name and last_name are copied to full_name:
DELETE my_index
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "first_name": { "type": "text", "copy_to": "full_name" },
        "last_name": { "type": "text", "copy_to": "full_name" },
        "full_name": { "type": "text" }
      }
    }
  }
}
Index a document and search on full_name:
PUT my_index/my_type/1
{ "first_name": "John", "last_name": "Smith" }
GET my_index/my_type/_search
{ "query": { "match": { "full_name": "John Smith" } } }
You can still search on first_name or last_name individually, or search on full_name to match the values of both. Note that full_name is not part of _source in the result:
{ "took": 4, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 1, "max_score": 0.5753642, "hits": [ { "_index": "my_index", "_type": "my_type", "_id": "1", "_score": 0.5753642, "_source": { "first_name": "John", "last_name": "Smith" } } ] } }
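As a rough mental model, copy_to changes what gets indexed but not what is stored in _source. The Python sketch below illustrates that idea only; MAPPING and indexed_view are hypothetical names, and real indexing also involves analysis, which is left out here.

```python
# Simplified stand-in for the mapping above (only copy_to is modelled).
MAPPING = {
    "first_name": {"copy_to": "full_name"},
    "last_name": {"copy_to": "full_name"},
    "full_name": {},
}


def indexed_view(source, mapping):
    """Return the values indexed per field, applying copy_to."""
    indexed = {}
    for field, value in source.items():
        indexed.setdefault(field, []).append(value)
        target = mapping.get(field, {}).get("copy_to")
        if target:
            # The value is also indexed under the target field.
            indexed.setdefault(target, []).append(value)
    return indexed


source = {"first_name": "John", "last_name": "Smith"}
indexed = indexed_view(source, MAPPING)

print(indexed["full_name"])    # values searchable through full_name
print("full_name" in source)   # the stored document itself is unchanged
```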
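doc_values speeds up sorting and aggregations. Alongside the inverted index, Elasticsearch builds an additional column-oriented structure at index time, trading disk space for query speed. It is enabled by default and can be disabled for fields that will definitely never be sorted or aggregated on.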
https://www.elastic.co/guide/en/elasticsearch/reference/6.1/doc-values.html#doc-values
DELETE my_index
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "status_code": { "type": "keyword" },
        "session_id": { "type": "keyword", "doc_values": false }
      }
    }
  }
}
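The two structures answer different questions: the inverted index maps a term to the documents containing it, while doc values map a document to its value. The Python sketch below is a conceptual illustration only; the sample status_code values are invented for the example and the dictionaries are not how Lucene stores data on disk.

```python
from collections import defaultdict

# Hypothetical status_code values for three documents.
docs = {1: "200", 2: "404", 3: "200"}

# Inverted index: term -> document ids. Good for "which docs contain 200?".
inverted_index = defaultdict(list)
for doc_id, status in docs.items():
    inverted_index[status].append(doc_id)

# Doc values: document id -> value, laid out as a column.
# Good for "what is the value of doc 3?", as needed by sorting and aggregations.
doc_values = dict(docs)

matching = inverted_index["200"]
sorted_ids = sorted(docs, key=lambda doc_id: doc_values[doc_id])

print(matching)
print(sorted_ids)
```

A field with doc_values disabled keeps only the first structure, so it remains searchable but cannot be used for sorting or aggregations.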
https://www.elastic.co/guide/en/elasticsearch/reference/6.1/dynamic.html
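The dynamic parameter controls how newly detected fields are handled. It accepts three values:
true: newly detected fields are added to the mapping (default).
false: newly detected fields are ignored; new fields must be added to the mapping explicitly.
strict: detecting a new field raises an exception and the document is rejected.
Create the index:
DELETE my_index
PUT my_index
{
  "mappings": {
    "my_type": {
      "dynamic": "strict",
      "properties": {
        "title": { "type": "text" }
      }
    }
  }
}
Index a document:
PUT my_index/my_type/2
{ "title": "this is a test", "content": "上半年上海市货币信贷运行平稳 个人住房贷款增速回落" }
Because the content field is not defined in the mapping and dynamic is set to strict, the request fails with an exception: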
{ "error": { "root_cause": [ { "type": "strict_dynamic_mapping_exception", "reason": "mapping set to strict, dynamic introduction of [content] within [my_type] is not allowed" } ], "type": "strict_dynamic_mapping_exception", "reason": "mapping set to strict, dynamic introduction of [content] within [my_type] is not allowed" }, "status": 400 }
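The three modes can be summarised with a small Python sketch. It is a simplified model written for illustration, not Elasticsearch code: apply_dynamic and StrictDynamicMappingError are hypothetical names, and the "guessed" type is a placeholder for real type detection.

```python
class StrictDynamicMappingError(Exception):
    """Raised when an unmapped field arrives and dynamic is "strict"."""


def apply_dynamic(mapping, doc, dynamic="true"):
    """Return the fields that would be indexed; may extend `mapping`."""
    indexed = {}
    for field, value in doc.items():
        if field in mapping:
            indexed[field] = value
        elif dynamic == "true":
            # New field: add it to the mapping, then index it.
            mapping[field] = {"type": "guessed"}
            indexed[field] = value
        elif dynamic == "false":
            # New field: ignored for indexing, mapping unchanged.
            continue
        elif dynamic == "strict":
            raise StrictDynamicMappingError(
                f"dynamic introduction of [{field}] is not allowed"
            )
        else:
            raise ValueError(f"unknown dynamic setting: {dynamic!r}")
    return indexed


doc = {"title": "this is a test", "content": "some text"}

print(apply_dynamic({"title": {"type": "text"}}, doc, "true"))
print(apply_dynamic({"title": {"type": "text"}}, doc, "false"))
try:
    apply_dynamic({"title": {"type": "text"}}, doc, "strict")
except StrictDynamicMappingError as err:
    print("rejected:", err)
```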
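By default Elasticsearch indexes every field. For a field with enabled set to false, Elasticsearch skips the field's content entirely: the value can still be retrieved from _source, but it cannot be searched.
Create the index and insert a document as follows:
DELETE my_index
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "name": { "enabled": false }
      }
    }
  }
}
PUT my_index/my_type/1
{ "name": "sean", "title": "this is a test" }
Search on name:
GET /my_index/_search
{ "query": { "match": { "name": "sean" } } }
Because the name field has enabled set to false, it cannot be used as a search condition, and the query returns no hits: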
{ "took": 8, "timed_out": false, "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }, "hits": { "total": 0, "max_score": null, "hits": [] } }