Elasticsearch's built-in analyzers do not handle Chinese text well, so we install the open-source IK analyzer to solve this. The steps are: enter the plugins directory, download the analyzer, unzip it, and restart Elasticsearch.

Note: the Elasticsearch version and the IK analyzer version must match, otherwise Elasticsearch will fail to restart. You can browse all releases at https://github.com/medcl/elasticsearch-analysis-ik/releases, find the one matching your version, and right-click to copy its link address.
```bash
docker exec -it elasticsearch /bin/bash
cd /usr/share/elasticsearch/plugins/
elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.5.1/elasticsearch-analysis-ik-7.5.1.zip
exit
docker restart elasticsearch
```
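After the restart it is worth confirming that the plugin was actually picked up. A quick check, assuming the container is named elasticsearch as above:

```bash
# "analysis-ik" should appear in the list of installed plugins
docker exec elasticsearch elasticsearch-plugin list

# watch the tail of the startup log for plugin-loading errors
docker logs --tail 50 elasticsearch
```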
Because the method above may fail due to network problems, you can install the plugin offline instead.
Download the package matching your version from https://github.com/medcl/elasticsearch-analysis-ik/releases, then create an ik folder under the Elasticsearch plugins directory (/usr/share/elasticsearch/plugins/):

```bash
cd /usr/share/elasticsearch/plugins/
mkdir ik
```

Copy the downloaded package into this folder and unzip it there; see the sketch below for the full sequence.
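Put together, the offline install might look like the following, run from the host. This is a sketch assuming the container is named elasticsearch and you need v7.5.1 (adjust the version to match your cluster); if unzip is not available inside the image, unzip on the host and docker cp the extracted files into plugins/ik instead:

```bash
# download the release zip on the host (version must match your Elasticsearch)
wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.5.1/elasticsearch-analysis-ik-7.5.1.zip

# create the ik folder and copy the zip into the container
docker exec elasticsearch mkdir -p /usr/share/elasticsearch/plugins/ik
docker cp elasticsearch-analysis-ik-7.5.1.zip elasticsearch:/usr/share/elasticsearch/plugins/ik/

# unzip inside the container, remove the zip, then restart
docker exec elasticsearch bash -c 'cd /usr/share/elasticsearch/plugins/ik && unzip elasticsearch-analysis-ik-7.5.1.zip && rm elasticsearch-analysis-ik-7.5.1.zip'
docker restart elasticsearch
```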
Note: installing the IK analyzer for Elasticsearch requires a JDK to be installed.
Test:
```
POST http://localhost:9200/_analyze?pretty=true
{
  "analyzer": "ik_max_word",
  "text": "中国人民的儿子"
}
```
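If you are testing from a shell rather than an HTTP client, the same request can be sent with curl; note that the _analyze API requires a JSON Content-Type header:

```bash
curl -X POST "http://localhost:9200/_analyze?pretty=true" \
  -H "Content-Type: application/json" \
  -d '{"analyzer": "ik_max_word", "text": "中国人民的儿子"}'
```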
Result:
{ "tokens" : [ { "token" : "中国人民", "start_offset" : 0, "end_offset" : 4, "type" : "CN_WORD", "position" : 0 }, { "token" : "中国人", "start_offset" : 0, "end_offset" : 3, "type" : "CN_WORD", "position" : 1 }, { "token" : "中国", "start_offset" : 0, "end_offset" : 2, "type" : "CN_WORD", "position" : 2 }, { "token" : "国人", "start_offset" : 1, "end_offset" : 3, "type" : "CN_WORD", "position" : 3 }, { "token" : "人民", "start_offset" : 2, "end_offset" : 4, "type" : "CN_WORD", "position" : 4 }, { "token" : "的", "start_offset" : 4, "end_offset" : 5, "type" : "CN_CHAR", "position" : 5 }, { "token" : "儿子", "start_offset" : 5, "end_offset" : 7, "type" : "CN_WORD", "position" : 6 } ] }