Implementing LIKE queries in Elasticsearch

Problem

An Elasticsearch query needs to behave like a MySQL LIKE query. For example, a record whose value is hello中国233 should be found by searching for 中国, and also by searching for llo.

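However, Elasticsearch queries match against analyzed tokens. By default, hello中国233 is tokenized into hello, 中, 国, 233. Searching for hello matches the record, but searching for llo does not, because llo is not one of the indexed tokens.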
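
The mismatch can be illustrated with a small Python sketch. It is not Elasticsearch code: approx_standard_tokens is a hypothetical helper that only roughly imitates the standard analyzer (alphanumeric runs stay whole, each CJK ideograph becomes its own token, everything is lowercased). That approximation is an assumption, but it is enough to reproduce the tokens for this example.

```python
import re

# Rough stand-in for the standard analyzer. This is an assumption for
# illustration only, not the real Unicode segmentation rules it uses.
_TOKEN_RE = re.compile(r"[A-Za-z0-9]+|[\u4e00-\u9fff]")


def approx_standard_tokens(text):
    """Split text into lowercase alphanumeric runs and single CJK ideographs."""
    return [m.group(0).lower() for m in _TOKEN_RE.finditer(text)]


tokens = approx_standard_tokens("hello中国233")
print(tokens)

# A term lookup only succeeds when the query equals a whole indexed token.
print("hello" in tokens)  # the whole token is indexed
print("llo" in tokens)    # a fragment of a token is not
```

Because lookups compare whole tokens, a fragment such as llo can never match while hello is indexed as a single token.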

Solution

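The tokens produced from the record are too coarse, so a token-based query cannot match an arbitrary substring. The fix is to tokenize the content one character at a time, that is, to split hello中国233 into h, e, l, l, o, 中, 国, 2, 3, 3.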

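Elasticsearch does not tokenize this way by default, but a custom analyzer can: a character filter appends a space to every character of the string, and the whitespace tokenizer then splits the string into single characters.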
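
The two steps can be mimicked in plain Python to see what the analyzer is expected to produce. This is only a sketch: char_filter and whitespace_tokenize are hypothetical helper names, and Python's re module stands in for the Java regular expressions that Elasticsearch uses, which behave the same way for this simple pattern.

```python
import re


def char_filter(text):
    """Mimic pattern_replace with pattern "(.+?)" and replacement "$1 ".

    The lazy group matches exactly one character at a time, so a space
    is appended after every character.
    """
    return re.sub(r"(.+?)", r"\1 ", text)


def whitespace_tokenize(text):
    """Mimic the whitespace tokenizer: split on runs of whitespace."""
    return text.split()


filtered = char_filter("hello中国233")
single_chars = whitespace_tokenize(filtered)

print(repr(filtered))
print(single_chars)
```

Each character of the input ends up as its own token, which is the granularity a substring search needs.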

Results

Default tokenization

PUT /like_search
{
  "mappings": {
    "like_search_type": {
      "properties": {
        "name": {
          "type": "text"
        }
      }
    }
  }
}

PUT /like_search/like_search_type/1
{
  "name": "hello中国233"
}

Tokenization result

GET /like_search/_analyze
{
  "text": [
    "hello中国233"
  ]
}

{
  "tokens": [
    {
      "token": "hello",
      "start_offset": 0,
      "end_offset": 5,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "中",
      "start_offset": 5,
      "end_offset": 6,
      "type": "<IDEOGRAPHIC>",
      "position": 1
    },
    {
      "token": "国",
      "start_offset": 6,
      "end_offset": 7,
      "type": "<IDEOGRAPHIC>",
      "position": 2
    },
    {
      "token": "233",
      "start_offset": 7,
      "end_offset": 10,
      "type": "<NUM>",
      "position": 3
    }
  ]
}

Elasticsearch uses the standard analyzer by default, so the following query for llo does not find the hello中国233 record.

GET /like_search/_search
{
  "query": {
    "match_phrase": {
      "name": "llo"
    }
  }
}

Custom tokenization

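# Delete the index from the default example first; creating an index that already exists fails
DELETE /like_search

PUT /like_search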
{
  "settings": {
    "analysis": {
      "analyzer": {
        "char_analyzer": {
          "char_filter": [
            "split_by_whitespace_filter"
          ],
          "tokenizer": "whitespace"
        }
      },
      "char_filter": {
        "split_by_whitespace_filter": {
          "type": "pattern_replace",
          "pattern": "(.+?)",
          "replacement": "$1 "
        }
      }
    }
  },
  "mappings": {
    "like_search_type": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "char_analyzer"
        }
      }
    }
  }
}

PUT /like_search/like_search_type/1
{
  "name": "hello中国233"
}

Tokenization result

GET /like_search/_analyze
{
  "analyzer": "char_analyzer", 
  "text": [
    "hello中国233"
  ]
}

{
  "tokens": [
    {
      "token": "h",
      "start_offset": 0,
      "end_offset": 0,
      "type": "word",
      "position": 0
    },
    {
      "token": "e",
      "start_offset": 1,
      "end_offset": 1,
      "type": "word",
      "position": 1
    },
    {
      "token": "l",
      "start_offset": 2,
      "end_offset": 2,
      "type": "word",
      "position": 2
    },
    {
      "token": "l",
      "start_offset": 3,
      "end_offset": 3,
      "type": "word",
      "position": 3
    },
    {
      "token": "o",
      "start_offset": 4,
      "end_offset": 4,
      "type": "word",
      "position": 4
    },
    {
      "token": "中",
      "start_offset": 5,
      "end_offset": 5,
      "type": "word",
      "position": 5
    },
    {
      "token": "国",
      "start_offset": 6,
      "end_offset": 6,
      "type": "word",
      "position": 6
    },
    {
      "token": "2",
      "start_offset": 7,
      "end_offset": 7,
      "type": "word",
      "position": 7
    },
    {
      "token": "3",
      "start_offset": 8,
      "end_offset": 8,
      "type": "word",
      "position": 8
    },
    {
      "token": "3",
      "start_offset": 9,
      "end_offset": 9,
      "type": "word",
      "position": 9
    }
  ]
}

With the custom analyzer, the following query for llo does find the hello中国233 record.

GET /like_search/_search
{
  "query": {
    "match_phrase": {
      "name": "llo"
    }
  }
}
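
Why match_phrase now behaves like a substring search can be sketched in Python. A phrase query requires the query tokens to appear in the document at consecutive positions and in the same order. With one token per character, that is the same as asking whether the query is a contiguous substring. The helpers to_char_tokens and phrase_match below are hypothetical names for illustration, not Elasticsearch APIs.

```python
def to_char_tokens(text):
    """One token per non-whitespace character, as char_analyzer produces."""
    return [ch for ch in text if not ch.isspace()]


def phrase_match(doc_tokens, query_tokens):
    """True if query_tokens occur in doc_tokens at consecutive positions."""
    n = len(query_tokens)
    if n == 0:
        return False
    return any(
        doc_tokens[i:i + n] == query_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


doc = to_char_tokens("hello中国233")

print(phrase_match(doc, to_char_tokens("llo")))   # contiguous in the document
print(phrase_match(doc, to_char_tokens("中国")))  # contiguous in the document
print(phrase_match(doc, to_char_tokens("lo2")))   # characters exist but are not adjacent
```

The last case shows why match_phrase is used rather than match: a plain match query would accept any document containing the individual characters, regardless of their order or adjacency.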