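Elasticsearch's built-in analyzers handle Chinese poorly: for full-text search they split Chinese text into individual characters, which does not produce the desired results.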
IK Analysis for Elasticsearch: https://github.com/medcl/elasticsearch-analysis-ik
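ik ships with two tokenizers: ik_max_word, which splits text at the finest granularity, and ik_smart, which splits it at the coarsest granularity.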
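One way to see how the two tokenizers differ is to run the same text through Elasticsearch's _analyze API. The following is a minimal sketch rather than part of the original walkthrough: the ES_URL variable, the helper names analyze_body and ik_analyze, and the sample text are assumptions made for illustration, and the IK plugin is assumed to be installed on the node.

```shell
#!/bin/sh
# Base URL of the cluster (an assumption; override it via the environment).
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the JSON body for the _analyze API.
# $1 = analyzer name, $2 = text (must not contain double quotes or backslashes).
analyze_body() {
  printf '{"analyzer":"%s","text":"%s"}' "$1" "$2"
}

# Run the text through the named analyzer and print the tokens it produces.
ik_analyze() {
  curl -s -H "Content-Type: application/json" \
       -XPOST "$ES_URL/_analyze?pretty" \
       -d "$(analyze_body "$1" "$2")"
}

# Usage (needs a running cluster with the IK plugin):
#   ik_analyze ik_max_word "村上春树获安徒生奖"
#   ik_analyze ik_smart    "村上春树获安徒生奖"
```

Comparing the two responses side by side should show ik_max_word emitting more, partly overlapping tokens than ik_smart for the same input.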
Create an index named iktest, define an analyzer named ik that uses the ik_max_word tokenizer, and create a type named article containing a subject field that is analyzed with ik_max_word.
[root@k8s-0001 bin]# curl -H "Content-Type: application/json" -XPUT 'http://114.116.97.49:9200/iktest?pretty' -d '{
>     "settings" : {
>         "analysis" : {
>             "analyzer" : {
>                 "ik" : {
>                     "tokenizer" : "ik_max_word"
>                 }
>             }
>         }
>     },
>     "mappings" : {
>         "article" : {
>             "dynamic" : true,
>             "properties" : {
>                 "subject" : {
>                     "type" : "text",
>                     "analyzer" : "ik_max_word"
>                 }
>             }
>         }
>     }
> }'
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "iktest"
}
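The request above names a mapping type (article), which matches the Elasticsearch release this walkthrough was run on. Mapping types were deprecated in Elasticsearch 7.0 and removed in 8.0, so on newer clusters the same index would be defined without the type level. A hedged sketch of that variant follows; ES_URL, index_body and create_index are hypothetical names introduced here, not part of the original post.

```shell
#!/bin/sh
# Base URL of the cluster (an assumption; override it via the environment).
ES_URL="${ES_URL:-http://localhost:9200}"

# Print a typeless index definition: "properties" sits directly under "mappings".
index_body() {
  cat <<'EOF'
{
  "settings": {
    "analysis": {
      "analyzer": {
        "ik": { "tokenizer": "ik_max_word" }
      }
    }
  },
  "mappings": {
    "dynamic": true,
    "properties": {
      "subject": { "type": "text", "analyzer": "ik_max_word" }
    }
  }
}
EOF
}

# Create the index named by $1, sending the definition on stdin.
create_index() {
  index_body | curl -s -H "Content-Type: application/json" \
                    -XPUT "$ES_URL/$1?pretty" --data-binary @-
}

# Usage (needs a running cluster with the IK plugin):
#   create_index iktest
```

With a typeless index, the later bulk and search URLs would use /iktest/_bulk and /iktest/_search instead of going through /iktest/article/.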
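Bulk-index a few documents. The _id metadata field is set explicitly to make the results easier to inspect, and each subject is a news headline picked at random.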
[root@k8s-0001 bin]# curl -H "Content-Type: application/json" -XPOST http://114.116.97.49:9200/iktest/article/_bulk?pretty -d '
> { "index" : { "_id" : "1" } }
> {"subject" : "“闺蜜”崔顺实被韩检方传唤 韩总统府促彻查真相" }
> { "index" : { "_id" : "2" } }
> {"subject" : "韩举行“护国训练” 青瓦台:决不准国家安全出问题" }
> { "index" : { "_id" : "3" } }
> {"subject" : "媒体称FBI已经取得搜查令 检视希拉里电邮" }
> { "index" : { "_id" : "4" } }
> {"subject" : "村上春树获安徒生奖 演讲中谈及欧洲排外问题" }
> { "index" : { "_id" : "5" } }
> {"subject" : "希拉里团队炮轰FBI 参院民主党领袖批其“违法”" }
> '
{
  "took" : 10,
  "errors" : false,
  "items" : [
    {
      "index" : {
        "_index" : "iktest", "_type" : "article", "_id" : "1", "_version" : 1, "result" : "created",
        "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 },
        "_seq_no" : 0, "_primary_term" : 1, "status" : 201
      }
    },
    {
      "index" : {
        "_index" : "iktest", "_type" : "article", "_id" : "2", "_version" : 1, "result" : "created",
        "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 },
        "_seq_no" : 0, "_primary_term" : 1, "status" : 201
      }
    },
    {
      "index" : {
        "_index" : "iktest", "_type" : "article", "_id" : "3", "_version" : 1, "result" : "created",
        "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 },
        "_seq_no" : 0, "_primary_term" : 1, "status" : 201
      }
    },
    {
      "index" : {
        "_index" : "iktest", "_type" : "article", "_id" : "4", "_version" : 1, "result" : "created",
        "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 },
        "_seq_no" : 1, "_primary_term" : 1, "status" : 201
      }
    },
    {
      "index" : {
        "_index" : "iktest", "_type" : "article", "_id" : "5", "_version" : 1, "result" : "created",
        "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 },
        "_seq_no" : 0, "_primary_term" : 1, "status" : 201
      }
    }
  ]
}
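The bulk body is newline-delimited JSON: each document takes one action line followed by one source line, and the last line must also end with a newline. Typing that by hand gets error-prone once titles contain double quotes, so the body can be generated instead. This is a minimal sketch under stated assumptions: bulk_body is a hypothetical helper, titles.txt is a hypothetical input file with one title per line, and the line number is used as the _id.

```shell
#!/bin/sh
# Base URL of the cluster (an assumption; override it via the environment).
ES_URL="${ES_URL:-http://localhost:9200}"

# Read titles from stdin, one per line, and print a bulk body:
# an "index" action line followed by the document source line.
bulk_body() {
  id=0
  while IFS= read -r title; do
    id=$((id + 1))
    # Escape backslashes and double quotes so the title is a valid JSON string.
    escaped=$(printf '%s' "$title" | sed 's/\\/\\\\/g; s/"/\\"/g')
    printf '{"index":{"_id":"%s"}}\n' "$id"
    printf '{"subject":"%s"}\n' "$escaped"
  done
}

# Usage (needs a running cluster; titles.txt is a hypothetical file):
#   bulk_body < titles.txt | curl -s -H "Content-Type: application/x-ndjson" \
#       -XPOST "$ES_URL/iktest/article/_bulk?pretty" --data-binary @-
```

The usage line passes --data-binary rather than -d because curl strips newlines from data it reads from a file or stdin with -d, which would break the line-oriented bulk format.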
Search for “希拉里和韩国” ("Hillary and South Korea"):
[root@k8s-0001 bin]# curl -H "Content-Type: application/json" -XPOST http://114.116.97.49:9200/iktest/article/_search?pretty -d'
> {
>     "query" : { "match" : { "subject" : "希拉里和韩国" }},
>     "highlight" : {
>         "pre_tags" : ["<font color='red'>"],
>         "post_tags" : ["</font>"],
>         "fields" : {
>             "subject" : {}
>         }
>     }
> }
> '
{
  "took" : 5,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : 2,
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "iktest",
        "_type" : "article",
        "_id" : "5",
        "_score" : 0.2876821,
        "_source" : {
          "subject" : "希拉里团队炮轰FBI 参院民主党领袖批其“违法”"
        },
        "highlight" : {
          "subject" : [
            "<font color=red>希拉里</font>团队炮轰FBI 参院民主党领袖批其“违法”"
          ]
        }
      },
      {
        "_index" : "iktest",
        "_type" : "article",
        "_id" : "3",
        "_score" : 0.2876821,
        "_source" : {
          "subject" : "媒体称FBI已经取得搜查令 检视希拉里电邮"
        },
        "highlight" : {
          "subject" : [
            "媒体称FBI已经取得搜查令 检视<font color=red>希拉里</font>电邮"
          ]
        }
      }
    ]
  }
}
Internet slang evolves quickly. How can newly coined buzzwords (or other specific terms) be picked up by our search in real time?
First, run a test with ik:
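A quick check of this kind could count how many tokens a candidate term is split into. The sketch below is an illustration only, not the original author's test: count_tokens and check_term are hypothetical helpers, ES_URL is an assumed variable, and ik_smart is chosen here because its coarse splitting makes a count of 1 a reasonable hint that the dictionary already knows the term.

```shell
#!/bin/sh
# Base URL of the cluster (an assumption; override it via the environment).
ES_URL="${ES_URL:-http://localhost:9200}"

# Count the "token" entries in an _analyze response read from stdin.
# The pattern includes the closing quote, so the "tokens" array key is not counted.
count_tokens() {
  grep -o '"token"' | wc -l | tr -d ' '
}

# Analyze $1 with ik_smart and print the number of tokens produced.
# The term must not contain double quotes or backslashes.
check_term() {
  curl -s -H "Content-Type: application/json" \
       -XPOST "$ES_URL/_analyze" \
       -d "{\"analyzer\":\"ik_smart\",\"text\":\"$1\"}" | count_tokens
}

# Usage (needs a running cluster with the IK plugin):
#   check_term "希拉里"
```

A term that comes back as several single-character tokens is a candidate for a custom dictionary entry.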
Configuring and using the IK Chinese analyzer for Elasticsearch