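First, a brief description of what a mapping is.
A mapping is the data structure and related configuration created, automatically or manually, for a type in an index.
Dynamic mapping: ES automatically creates the index, the type, and the type's mapping for us. The mapping records each field's data type, as well as settings such as how the field is analyzed (tokenized).
Let's insert a few documents and let ES create the index for us automatically: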
PUT /website/article/1
{
  "post_date": "2019-08-21",
  "title": "my first article",
  "content": "this is my first article in this website",
  "author_id": 11400
}

PUT /website/article/2
{
  "post_date": "2019-08-22",
  "title": "my second article",
  "content": "this is my second article in this website",
  "author_id": 11400
}

PUT /website/article/3
{
  "post_date": "2019-08-23",
  "title": "my third article",
  "content": "this is my third article in this website",
  "author_id": 11400
}
View the mapping:
GET /website/_mapping

{
  "website": {
    "mappings": {
      "article": {
        "properties": {
          "author_id": { "type": "long" },
          "content": {
            "type": "text",
            "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
          },
          "post_date": { "type": "date" },
          "title": {
            "type": "text",
            "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
          }
        }
      }
    }
  }
}
The mapping above was generated automatically when the documents were inserted; a mapping can also be created manually.
Now try a few searches:
GET /website/article/_search?q=2019                   // 3 results
GET /website/article/_search?q=2019-08-21             // 3 results
GET /website/article/_search?q=post_date:2019-08-21   // 1 result
GET /website/article/_search?q=post_date:2019         // 0 results
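Why are the results inconsistent? When ES created the mapping automatically, it gave different fields different data types, and each data type is analyzed and searched differently. That is why searching the _all field (a query without a field name) and searching the post_date field behave completely differently. (The _all field applies to older versions: it was deprecated in ES 6.0 and removed in 7.0.)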
Below is a manually created mapping:
PUT /test_mapping
{
  "mappings": {
    "properties": {
      "author_id": { "type": "long" },
      "content": {
        "type": "text",
        "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
      },
      "post_date": { "type": "date" },
      "title": {
        "type": "text",
        "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
      }
    }
  }
}
Exact value: the field's whole value must match for the corresponding document to be returned.
Example:
GET /website/article/_search?q=post_date:2019-08-21   // 1 result
GET /website/article/_search?q=post_date:2019         // 0 results
With exact value matching, you must search for the full value 2019-08-21 to find the document.
If you enter just 21, nothing is found.
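Full text differs from exact value: it does not simply match one complete value. Instead, the value is split into words (tokenized) before matching, and matches can also be made across abbreviations, tense, case, synonyms, and so on.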
Example:
GET /website/article/_search?q=2019         // 3 results
GET /website/article/_search?q=2019-08-21   // 3 results
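To make the contrast concrete, here is a minimal Python sketch of the two matching styles. It assumes a deliberately simplified full-text analyzer that only lowercases and splits on non-alphanumeric characters; the function names are hypothetical and real ES analyzers do considerably more.

```python
import re

def exact_match(field_value: str, query: str) -> bool:
    # exact value: the query must equal the whole stored value
    return field_value == query

def analyze(text: str) -> set:
    # simplified full-text analysis: lowercase, then split on non-alphanumerics
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def full_text_match(field_value: str, query: str) -> bool:
    # full text: a hit if the query and the value share at least one token
    return bool(analyze(field_value) & analyze(query))

print(exact_match("2019-08-21", "2019-08-21"))          # True
print(exact_match("2019-08-21", "21"))                  # False
print(full_text_match("2019-08-21", "21"))              # True
print(full_text_match("My First Article", "ARTICLE"))   # True
```

The same input "21" fails as an exact value but succeeds as full text, because full-text matching compares tokens rather than whole values.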
Below is a simplified demonstration of how an inverted index is built; in practice the process is far more complex.
doc1: I really liked my small dogs, and I think my mom also liked them.
doc2: He never liked any dogs, so I hope that my mom will not expect me to liked him.
Tokenize the documents and build an initial inverted index:
word     doc1  doc2
I        *     *
really   *
liked    *     *
my       *     *
small    *
dogs     *     *
and      *
think    *
mom      *     *
also     *
them     *
He             *
never          *
any            *
so             *
hope           *
that           *
will           *
not            *
expect         *
me             *
to             *
him            *
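Now search for "mother like little dog". The query is split into the terms below, and none of them appears in the index, so there are no results at all:
mother
like
little
dog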
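The steps above can be sketched in a few lines of Python. This is a toy model, not how Lucene stores postings: the "tokenizer" here just extracts runs of letters (no lowercasing or normalization), and `build_index` / `search` are hypothetical helpers with OR semantics across query terms.

```python
import re
from collections import defaultdict

docs = {
    "doc1": "I really liked my small dogs, and I think my mom also liked them.",
    "doc2": "He never liked any dogs, so I hope that my mom will not expect me to liked him.",
}

def tokenize(text):
    # split on anything that is not a letter; no normalization yet
    return re.findall(r"[A-Za-z]+", text)

def build_index(docs, analyzer):
    # inverted index: term -> set of ids of the documents containing it
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyzer(text):
            index[term].add(doc_id)
    return index

def search(index, query, analyzer):
    # OR semantics: union of the postings of every query term
    hits = set()
    for term in analyzer(query):
        hits |= index.get(term, set())
    return hits

index = build_index(docs, tokenize)
print(sorted(index["dogs"]))                              # ['doc1', 'doc2']
print(search(index, "mother like little dog", tokenize))  # set()
```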
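This is clearly not what we want: mother and mom mean essentially the same thing, yet the search finds nothing. In practice, with a suitably configured analyzer (stemming plus a synonym rule such as mother -> mom), ES can find these documents. That is because when building the inverted index, ES performs an additional step: it processes each token produced by splitting, to raise the chance that later searches will find related documents. This includes tense conversion, singular/plural conversion, synonym conversion, and case conversion. This process is called normalization. For example:
mother -> mom
liked -> like
small -> little
dogs -> dog
Rebuilding the inverted index with the normalized terms: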
word     doc1  doc2
I        *     *
really   *
like     *     *
my       *     *
little   *
dog      *     *
and      *
think    *
mom      *     *
also     *
them     *
He             *
never          *
any            *
so             *
hope           *
that           *
will           *
not            *
expect         *
me             *
to             *
him            *
Query: mother like little dog, after tokenization and normalization:
mother -> mom
like -> like
little -> little
dog -> dog
Both doc1 and doc2 are returned.
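Extending the toy index with a normalization step shows why both documents now match. The `NORMALIZE` table below is a hypothetical, hand-written stand-in for the stemming and synonym filters a real analyzer would apply; the key point is that documents and query go through the same `analyze` function.

```python
import re
from collections import defaultdict

docs = {
    "doc1": "I really liked my small dogs, and I think my mom also liked them.",
    "doc2": "He never liked any dogs, so I hope that my mom will not expect me to liked him.",
}

# Hypothetical normalization table standing in for stemming / synonym filters
NORMALIZE = {"mother": "mom", "liked": "like", "small": "little", "dogs": "dog"}

def analyze(text):
    # tokenize, then replace each token with its normalized form (if any)
    return [NORMALIZE.get(t, t) for t in re.findall(r"[A-Za-z]+", text)]

index = defaultdict(set)
for doc_id, text in docs.items():
    for term in analyze(text):
        index[term].add(doc_id)

def search(query):
    # the query is analyzed exactly like the documents were
    hits = set()
    for term in analyze(query):
        hits |= index.get(term, set())
    return hits

print(sorted(search("mother like little dog")))  # ['doc1', 'doc2']
```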
doc1: I really liked my small dogs, and I think my mom also liked them.
doc2: He never liked any dogs, so I hope that my mom will not expect me to liked him.
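Splitting into words plus normalization (to improve recall)
Given a sentence, an analyzer splits it into individual words and normalizes each one (tense conversion, singular/plural conversion, and so on).
Recall: how many of the relevant results a search manages to find; normalization increases the number of results a search can return.
The analyzer matters a great deal: it runs the text through these processing steps, and only the final processed result is used to build the inverted index.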
Built-in analyzers:
Text to analyze: Set the shape to semi-transparent by calling set_trans(5)
standard analyzer (the default): set, the, shape, to, semi, transparent, by, calling, set_trans, 5
simple analyzer: set, the, shape, to, semi, transparent, by, calling, set, trans
whitespace analyzer: Set, the, shape, to, semi-transparent, by, calling, set_trans(5)
language analyzer (for a specific language, e.g. english): set, shape, semi, transpar, call, set_tran, 5
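The first three outputs can be approximated with regular expressions. These are rough imitations, not the real implementations: the actual standard tokenizer follows Unicode text segmentation (UAX #29) and the simple analyzer's letter tokenizer handles all Unicode letters, whereas these regexes only happen to agree on this ASCII input. The english analyzer's stemming is not reproduced.

```python
import re

TEXT = "Set the shape to semi-transparent by calling set_trans(5)"

def standard_like(text):
    # lowercase word characters; \w includes "_", so set_trans stays one token
    return re.findall(r"\w+", text.lower())

def simple_like(text):
    # lowercase, then split on anything that is not a letter (digits are dropped)
    return re.findall(r"[a-z]+", text.lower())

def whitespace_like(text):
    # split on whitespace only, keeping case and punctuation
    return text.split()

for name, fn in [("standard", standard_like), ("simple", simple_like),
                 ("whitespace", whitespace_like)]:
    print(f"{name}: {', '.join(fn(TEXT))}")
```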
Solving the puzzle left over from the introductory mapping example
GET /_search?q=2019
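Without a field name, the query searches the _all field: all fields of a document are concatenated into one large string, which is then analyzed. For doc2, for example: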
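2019-08-22 my second article this is my second article in this website 11400
Part of the resulting inverted index (date terms only):
word  doc1  doc2  doc3
2019  *     *     *
08    *     *     *
21    *
22          *
23                *
Searching _all for 2019 therefore naturally finds all 3 documents. Likewise, q=2019-08-21 is analyzed into 2019, 08 and 21; since 2019 and 08 occur in every document, all 3 are returned.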
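GET /_search?q=post_date:2019-08-21
A date field is indexed as an exact value:
word        doc1  doc2  doc3
2019-08-21  *
2019-08-22        *
2019-08-23              *
Only doc1 matches 2019-08-21, and post_date:2019 matches none of the indexed values, so that query returns nothing.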
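The four result counts from the introductory searches can be reproduced with a small simulation. Assumptions: `_all` is modelled as the concatenation of all field values run through a simplified lowercase/word-split analyzer, query terms are OR-ed, and `post_date` is compared as a plain string. Real ES parses the query into a date value before comparing, so the string comparison is only an illustration of exact-value behaviour.

```python
import re

# The three sample documents indexed earlier
docs = {
    "1": {"post_date": "2019-08-21", "title": "my first article",
          "content": "this is my first article in this website", "author_id": 11400},
    "2": {"post_date": "2019-08-22", "title": "my second article",
          "content": "this is my second article in this website", "author_id": 11400},
    "3": {"post_date": "2019-08-23", "title": "my third article",
          "content": "this is my third article in this website", "author_id": 11400},
}

def analyze(text):
    # simplified analyzer: lowercase, split on non-word characters
    return set(re.findall(r"\w+", text.lower()))

def search_all(query):
    # _all: every field value concatenated into one string, then analyzed;
    # query terms are OR-ed, so a single shared term is enough for a hit
    terms = analyze(query)
    return [doc_id for doc_id, doc in docs.items()
            if terms & analyze(" ".join(str(v) for v in doc.values()))]

def search_post_date(query):
    # post_date is an exact value: the whole value must match
    return [doc_id for doc_id, doc in docs.items() if doc["post_date"] == query]

for q in ("2019", "2019-08-21"):
    print(f"q={q}: {len(search_all(q))} results")
for q in ("2019-08-21", "2019"):
    print(f"q=post_date:{q}: {len(search_post_date(q))} results")
```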
Syntax for testing how an analyzer tokenizes text:
GET /_analyze
{
  "analyzer": "standard",
  "text": "Text to analyze"
}

{
  "tokens": [
    { "token": "text", "start_offset": 0, "end_offset": 4, "type": "<ALPHANUM>", "position": 0 },
    { "token": "to", "start_offset": 5, "end_offset": 7, "type": "<ALPHANUM>", "position": 1 },
    { "token": "analyze", "start_offset": 8, "end_offset": 15, "type": "<ALPHANUM>", "position": 2 }
  ]
}
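A minimal Python sketch that produces the same structure as the `_analyze` response above, to show what the offset and position fields mean. The tokenizing regex and the `<NUM>`/`<ALPHANUM>` type choice are simplifying assumptions; `analyze_with_offsets` is a hypothetical helper, not an ES client call.

```python
import re

def analyze_with_offsets(text):
    # rough imitation of the standard analyzer's _analyze output
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text)):
        token = match.group().lower()
        tokens.append({
            "token": token,
            "start_offset": match.start(),  # character offsets into the original text
            "end_offset": match.end(),
            "type": "<NUM>" if token.isdigit() else "<ALPHANUM>",
            "position": position,           # order of the token in the stream
        })
    return {"tokens": tokens}

print(analyze_with_offsets("Text to analyze"))
```

The offsets point back into the original (un-lowercased) text, which is what allows features such as highlighting to locate the match.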
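A mapping is essentially the metadata for an index's type: it determines each field's data type, how the inverted index is built for it, and how searches on it behave.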
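Core data types:
text (string): string
byte: byte
short: short integer
integer: integer
long: long integer
float: floating-point number
boolean: Boolean
date: date
There are also complex types such as arrays and objects. These are built on the core types above: an array holds values of a single core type, and an object's fields are flattened into dotted field names, as shown below.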
Types inferred by dynamic mapping:
true or false -> boolean
123 -> long
123.45 -> float
2017-01-01 -> date
"hello world" -> string (text)
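The inference rules above can be sketched as a function from JSON values to mapping entries. This is a simplified model under stated assumptions: date detection here only accepts `yyyy-MM-dd` (ES recognises more formats), strings get the `text` + `keyword` sub-field layout seen in the mappings earlier, and `infer_type` / `infer_mapping` are hypothetical names.

```python
from datetime import datetime

KEYWORD_SUBFIELD = {"keyword": {"type": "keyword", "ignore_above": 256}}

def infer_type(value):
    # check bool before int: in Python, bool is a subclass of int
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        try:
            # simplified date detection: yyyy-MM-dd only
            datetime.strptime(value, "%Y-%m-%d")
            return {"type": "date"}
        except ValueError:
            return {"type": "text", "fields": KEYWORD_SUBFIELD}
    raise TypeError(f"not handled in this sketch: {value!r}")

def infer_mapping(doc):
    return {"properties": {field: infer_type(v) for field, v in doc.items()}}

doc = {"post_date": "2019-08-21", "title": "my first article",
       "content": "this is my first article in this website", "author_id": 11400}
for field, spec in infer_mapping(doc)["properties"].items():
    print(field, spec["type"])
```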
Viewing a mapping
Syntax: GET /{index}/_mapping
GET /{index}/_mapping/{type}
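Note: you can define a mapping manually only when creating an index, or add mappings for new fields later; an existing field's mapping cannot be updated.
"analyzer": "standard": the field is analyzed (tokenized) with the standard analyzer
date: date
keyword: not analyzed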
# Create the index (if the website index auto-created earlier still exists, delete it first with DELETE /website)
PUT /website
{
  "mappings": {
    "properties": {
      "author_id": { "type": "long" },
      "title": { "type": "text", "analyzer": "standard" },
      "content": { "type": "text" },
      "post_date": { "type": "date" },
      "publisher_id": { "type": "keyword" }
    }
  }
}

# Try to change a field's mapping: this fails because the index already exists
PUT /website
{
  "mappings": {
    "properties": {
      "author_id": { "type": "text" }
    }
  }
}

{
  "error": {
    "root_cause": [
      {
        "type": "resource_already_exists_exception",
        "reason": "index [website/5xLohnJITHqCwRYInmBFmA] already exists",
        "index_uuid": "5xLohnJITHqCwRYInmBFmA",
        "index": "website"
      }
    ],
    "type": "resource_already_exists_exception",
    "reason": "index [website/5xLohnJITHqCwRYInmBFmA] already exists",
    "index_uuid": "5xLohnJITHqCwRYInmBFmA",
    "index": "website"
  },
  "status": 400
}

# Add a new field to the mapping
PUT /website/_mapping
{
  "properties": {
    "new_field": { "type": "text" }
  }
}

{ "acknowledged" : true }
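The rule "new fields may be added, existing fields may not be changed" can be modelled as a merge function. This is a hypothetical sketch of the rule only, not ES's actual mapping-merge code; the error message text is made up for illustration.

```python
def put_mapping(existing, new_props):
    # hypothetical helper: merge new field mappings into an existing mapping.
    # New fields are added; resubmitting an identical definition is harmless;
    # redefining an existing field differently is rejected.
    merged = dict(existing)
    for field, spec in new_props.items():
        if field in merged and merged[field] != spec:
            raise ValueError(f"cannot change mapping of existing field [{field}]")
        merged[field] = spec
    return merged

mapping = {"author_id": {"type": "long"}, "title": {"type": "text"}}
mapping = put_mapping(mapping, {"new_field": {"type": "text"}})  # OK: a new field
try:
    put_mapping(mapping, {"author_id": {"type": "text"}})       # changes an existing field
except ValueError as err:
    print(err)
```

Once data has been indexed under one type, changing that type would invalidate the existing inverted index, which is why changing a field mapping in practice means creating a new index and reindexing.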
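Multi-value field (array):
{ "tags": ["tag1", "tag2"] }
It is indexed the same way as a single string; the values in one array cannot mix data types.
Empty fields: null, [], [null]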
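A minimal sketch of those two rules, assuming a simplified model: nulls are ignored, the remaining values must share one Python type, and an empty field yields no mapping. Real ES can coerce some mixed values (for example numeric strings into a numeric field), which this hypothetical `infer_array_type` does not attempt.

```python
def infer_array_type(values):
    # hypothetical sketch: nulls are ignored, remaining values must share one type,
    # and null / [] / [null] produce no mapping at all
    if values is None:
        values = []
    non_null = [v for v in values if v is not None]
    if not non_null:
        return None  # empty field: nothing to index
    kinds = {type(v) for v in non_null}
    if len(kinds) > 1:
        names = sorted(k.__name__ for k in kinds)
        raise ValueError(f"mixed types in one field: {names}")
    es_types = {str: "text", bool: "boolean", int: "long", float: "float"}
    return es_types[kinds.pop()]

print(infer_array_type(["tag1", "tag2"]))  # text
print(infer_array_type([None]))            # None
```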
Object field:
PUT /company/employee/1
{
  "address": {
    "country": "china",
    "province": "guangdong",
    "city": "guangzhou"
  },
  "name": "jack",
  "age": 27,
  "join_date": "2017-01-01"
}
View the mapping:
GET /company/_mapping/employee
{
  "company": {
    "mappings": {
      "employee": {
        "properties": {
          "address": {
            "properties": {
              "city": {
                "type": "text",
                "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
              },
              "country": {
                "type": "text",
                "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
              },
              "province": {
                "type": "text",
                "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
              }
            }
          },
          "age": { "type": "long" },
          "join_date": { "type": "date" },
          "name": {
            "type": "text",
            "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
          }
        }
      }
    }
  }
}
How an object field is stored internally:
{
  "address": {
    "country": "china",
    "province": "guangdong",
    "city": "guangzhou"
  },
  "name": "jack",
  "age": 27,
  "join_date": "2017-01-01"
}
↓↓↓↓
{
  "name":             [jack],
  "age":              [27],
  "join_date":        [2017-01-01],
  "address.country":  [china],
  "address.province": [guangdong],
  "address.city":     [guangzhou]
}
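The flattening of an object into dotted field names can be sketched as a short recursive function. `flatten` is a hypothetical helper that models only the field-naming step; it does not analyze values or handle arrays.

```python
def flatten(doc, prefix=""):
    # turn nested objects into dotted field names; every leaf value becomes a list
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = [value]
    return flat

doc = {
    "address": {"country": "china", "province": "guangdong", "city": "guangzhou"},
    "name": "jack",
    "age": 27,
    "join_date": "2017-01-01",
}
for field, values in flatten(doc).items():
    print(field, values)
```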
Array of objects:
{
  "authors": [
    { "age": 26, "name": "Jack White" },
    { "age": 55, "name": "Tom Jones" },
    { "age": 39, "name": "Kitty Smith" }
  ]
}
↓↓↓↓
{
  "authors.age":  [26, 55, 39],
  "authors.name": [jack, white, tom, jones, kitty, smith]
}
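Extending the previous idea to arrays: every element of an array shares the same dotted field name, and text values are analyzed into tokens. The sketch below assumes the same simplified lowercase/word-split analysis used earlier; `flatten_values` is a hypothetical helper.

```python
import re

def flatten_values(value, name="", out=None):
    # recursively collect leaf values under dotted field names
    if out is None:
        out = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flatten_values(child, f"{name}.{key}" if name else key, out)
    elif isinstance(value, list):
        for item in value:
            flatten_values(item, name, out)  # array elements share one field name
    elif isinstance(value, str):
        # text values are analyzed: lowercase word tokens (simplified)
        out.setdefault(name, []).extend(re.findall(r"\w+", value.lower()))
    else:
        out.setdefault(name, []).append(value)
    return out

doc = {"authors": [{"age": 26, "name": "Jack White"},
                   {"age": 55, "name": "Tom Jones"},
                   {"age": 39, "name": "Kitty Smith"}]}
print(flatten_values(doc))
```

In the flattened form, the association between each name and its age is no longer recorded; only the two independent value lists remain.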