Delete the previous experimental index
curl -XDELETE http://127.0.0.1:9200/synctest/article
output:
{"acknowledged":true}
Create a new mapping
curl -XPUT 'http://127.0.0.1:9200/servcie/_mapping/massage' -d '
{
  "massage": {
    "properties": {
      "location": { "type": "geo_point" },
      "name":     { "type": "string" },
      "age":      { "type": "integer" },
      "address":  { "type": "string" },
      "price":    { "type": "double", "index": "not_analyzed" },
      "is_open":  { "type": "boolean" }
    }
  }
}'
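To sanity-check the mapping, you can index a document shaped like it. This is only a sketch; the field values below are made up for illustration:

curl -XPOST 'http://127.0.0.1:9200/servcie/massage/1' -d '
{
  "location": { "lat": 31.23, "lon": 121.47 },
  "name": "demo",
  "age": 3,
  "address": "Shanghai",
  "price": 99.5,
  "is_open": true
}'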
View the newly created mapping
curl -XGET http://127.0.0.1:9200/servcie/massage/_mapping?pretty
{
  "servcie" : {
    "mappings" : {
      "massage" : {
        "properties" : {
          "address" : {
            "type" : "string"
          },
          "age" : {
            "type" : "integer"
          },
          "is_open" : {
            "type" : "boolean"
          },
          "location" : {
            "type" : "geo_point"
          },
          "name" : {
            "type" : "string"
          },
          "price" : {
            "type" : "double"
          }
        }
      }
    }
  }
}
curl -XPOST 'http://127.0.0.1:9200/_analyze?pretty' -d '{"text":"波多菠萝蜜"}'
{
  "tokens" : [ {
    "token" : "波",
    "start_offset" : 0,
    "end_offset" : 1,
    "type" : "<IDEOGRAPHIC>",
    "position" : 0
  }, {
    "token" : "多",
    "start_offset" : 1,
    "end_offset" : 2,
    "type" : "<IDEOGRAPHIC>",
    "position" : 1
  }, {
    "token" : "菠",
    "start_offset" : 2,
    "end_offset" : 3,
    "type" : "<IDEOGRAPHIC>",
    "position" : 2
  }, {
    "token" : "萝",
    "start_offset" : 3,
    "end_offset" : 4,
    "type" : "<IDEOGRAPHIC>",
    "position" : 3
  }, {
    "token" : "蜜",
    "start_offset" : 4,
    "end_offset" : 5,
    "type" : "<IDEOGRAPHIC>",
    "position" : 4
  } ]
}
An analyzer is composed of a single tokenizer and zero or more token filters.
curl -XPOST 'http://127.0.0.1:9200/_analyze?pretty' -d '{"text":"abc dsf,sdsf"}'
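To make that composition concrete, here is a minimal sketch of a custom analyzer that chains the standard tokenizer with the lowercase and stop token filters. The index name analyzer_demo and analyzer name my_analyzer are made up for this example:

curl -XPUT 'http://127.0.0.1:9200/analyzer_demo' -d '
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "lowercase", "stop" ]
        }
      }
    }
  }
}'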
For Chinese retrieval you also need a Chinese analyzer, and the most widely used one is probably the IK analyzer.
./bin/plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v1.9.3/elasticsearch-analysis-ik-1.9.3.zip
After restarting, check whether the plugin loaded successfully
curl -XGET http://localhost:9200/_cat/plugins
Marrow analysis-ik 1.9.3 j
Test analysis with the IK analyzer
curl -XPOST 'http://127.0.0.1:9200/_analyze?pretty' -d '{"analyzer":"ik","text":"波多菠萝蜜"}'
{
  "tokens" : [ {
    "token" : "波",
    "start_offset" : 0,
    "end_offset" : 1,
    "type" : "CN_WORD",
    "position" : 0
  }, {
    "token" : "多",
    "start_offset" : 1,
    "end_offset" : 2,
    "type" : "CN_CHAR",
    "position" : 1
  }, {
    "token" : "菠萝蜜",
    "start_offset" : 2,
    "end_offset" : 5,
    "type" : "CN_WORD",
    "position" : 2
  }, {
    "token" : "菠萝",
    "start_offset" : 2,
    "end_offset" : 4,
    "type" : "CN_WORD",
    "position" : 3
  }, {
    "token" : "菠",
    "start_offset" : 2,
    "end_offset" : 3,
    "type" : "CN_WORD",
    "position" : 4
  }, {
    "token" : "萝",
    "start_offset" : 3,
    "end_offset" : 4,
    "type" : "CN_WORD",
    "position" : 5
  }, {
    "token" : "蜜",
    "start_offset" : 4,
    "end_offset" : 5,
    "type" : "CN_WORD",
    "position" : 6
  } ]
}
You can see that IK has segmented 菠萝 and 菠萝蜜 out as words.
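For comparison, the IK plugin also registers a coarse-grained analyzer, ik_smart, which emits only the single most likely segmentation instead of every candidate word (this analyzer name should be available in this plugin version, but verify against your install):

curl -XPOST 'http://127.0.0.1:9200/_analyze?pretty' -d '{"analyzer":"ik_smart","text":"波多菠萝蜜"}'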
As society evolves and business terminology changes, new words appear that are not in the IK dictionary, so even queries like match_phrase can fail to retrieve matching data. What can we do about that?
For example, suppose we want to be able to retrieve the word "吊炸天" (slang, roughly "mind-blowingly awesome"), which is not included in IK 1.9.3.
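To see the failure mode, here is a sketch of such a query; the index servcie and an ik-analyzed field named content are assumptions for illustration. Because the analyzer does not know 吊炸天, both indexed documents and the query text get split into fragments, and phrase matching becomes unreliable:

curl -XPOST 'http://127.0.0.1:9200/servcie/massage/_search?pretty' -d '
{
  "query": {
    "match_phrase": {
      "content": "吊炸天"
    }
  }
}'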
curl -XPOST 'http://127.0.0.1:9200/_analyze?pretty' -d '{"analyzer":"ik","text":"吊炸每天不容"}'
{
  "tokens" : [ {
    "token" : "吊",
    "start_offset" : 0,
    "end_offset" : 1,
    "type" : "CN_WORD",
    "position" : 0
  }, {
    "token" : "炸",
    "start_offset" : 1,
    "end_offset" : 2,
    "type" : "CN_CHAR",
    "position" : 1
  }, {
    "token" : "天天",
    "start_offset" : 2,
    "end_offset" : 4,
    "type" : "CN_WORD",
    "position" : 2
  }, {
    "token" : "不容",
    "start_offset" : 4,
    "end_offset" : 6,
    "type" : "CN_WORD",
    "position" : 3
  } ]
}
If the word really is required, this is when we need to modify IK's dictionary.
Edit the mydict.dic file under analysis-ik/config/ik/custom; this file exists precisely for extending the vocabulary. Append the new word at the end (as sketched below), save, and restart ES.
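A minimal sketch of that edit from the shell; the path is relative to your Elasticsearch installation and may differ depending on how the plugin was installed, and the file must stay UTF-8 encoded:

echo "吊炸天" >> plugins/analysis-ik/config/ik/custom/mydict.dic
# restart the node so IK reloads its dictionaries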
curl -XPOST 'http://127.0.0.1:9200/_analyze?pretty' -d '{"analyzer":"ik","text":"吊炸每天不容"}'
{
  "tokens" : [ {
    "token" : "吊炸天",
    "start_offset" : 0,
    "end_offset" : 3,
    "type" : "CN_WORD",
    "position" : 0
  }, {
    "token" : "吊",
    "start_offset" : 0,
    "end_offset" : 1,
    "type" : "CN_WORD",
    "position" : 1
  }, {
    "token" : "炸",
    "start_offset" : 1,
    "end_offset" : 2,
    "type" : "CN_CHAR",
    "position" : 2
  }, {
    "token" : "天天",
    "start_offset" : 2,
    "end_offset" : 4,
    "type" : "CN_WORD",
    "position" : 3
  }, {
    "token" : "不容",
    "start_offset" : 4,
    "end_offset" : 6,
    "type" : "CN_WORD",
    "position" : 4
  } ]
}
We can see that 吊炸天 is now segmented as a token of its own.
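To actually use IK at index and search time, point a string field's analyzer at it in the mapping. A sketch against the servcie index from earlier; the field name description is hypothetical:

curl -XPUT 'http://127.0.0.1:9200/servcie/_mapping/massage' -d '
{
  "massage": {
    "properties": {
      "description": { "type": "string", "analyzer": "ik" }
    }
  }
}'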