Implementing LIKE queries in elasticsearch

Date 2019-11-08

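Problem

An elasticsearch query needs to behave like a MySQL LIKE query. For example, a record whose value is hello中国233 should be found by searching for 中国, and also by searching for llo.

However, elasticsearch queries operate on tokens. By default, hello中国233 is tokenized into hello, 中, 国 and 233. Searching for hello matches the record, but searching for llo does not, because llo is not one of the indexed tokens.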
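The mismatch can be illustrated outside elasticsearch with a small Python sketch. The regular expression below is only a rough stand-in that reproduces the token list shown in this post (runs of ASCII letters, runs of digits, single CJK ideographs); it is an assumption for illustration, not the real standard tokenizer, and the function names are hypothetical.

```python
import re

# Rough approximation (assumption): runs of ASCII letters, runs of digits,
# and each CJK ideograph become one token. The real standard tokenizer
# follows Unicode text segmentation rules and differs on other inputs.
TOKEN_PATTERN = re.compile(r"[A-Za-z]+|[0-9]+|[\u4e00-\u9fff]")

def approx_standard_tokens(text):
    """Return lowercased tokens, roughly as the default analyzer would."""
    return [token.lower() for token in TOKEN_PATTERN.findall(text)]

def term_matches(doc, query):
    """True if every query token is exactly equal to some indexed token."""
    indexed = set(approx_standard_tokens(doc))
    query_tokens = approx_standard_tokens(query)
    return bool(query_tokens) and all(t in indexed for t in query_tokens)

doc = "hello中国233"
print(approx_standard_tokens(doc))  # ['hello', '中', '国', '233']
print(term_matches(doc, "hello"))   # True
print(term_matches(doc, "llo"))     # False
```

The query llo fails because token comparison is all-or-nothing: llo is a substring of the token hello, but it is not equal to it.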

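Solution

The tokens are too coarse for the query to match, so the solution is to tokenize the content into individual characters. That is, hello中国233 becomes h, e, l, l, o, 中, 国, 2, 3, 3.

None of elasticsearch's built-in analyzers splits text this way out of the box, but a custom analyzer can: a character filter inserts a space after every character, and the whitespace tokenizer then splits the string into single characters.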
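The two stages can be sketched in Python to see what each one produces. This is a minimal sketch, assuming that Python's `re` module treats the pattern `(.+?)` the same way the Java regex engine behind the pattern_replace char filter does for this input; the function names are hypothetical.

```python
import re

def char_filter(text):
    # Mirrors pattern "(.+?)" with replacement "$1 ":
    # the lazy group matches one character at a time and a space is appended.
    return re.sub(r"(.+?)", r"\1 ", text)

def whitespace_tokenize(text):
    # Mirrors the whitespace tokenizer: split on whitespace, drop empty pieces.
    return text.split()

def char_analyze(text):
    return whitespace_tokenize(char_filter(text))

print(repr(char_filter("hello中国233")))  # 'h e l l o 中 国 2 3 3 '
print(char_analyze("hello中国233"))
# ['h', 'e', 'l', 'l', 'o', '中', '国', '2', '3', '3']
```

One consequence of this design is that whitespace already present in the text is discarded, so spaces in the original value never become tokens.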

Result

Default analyzer

PUT /like_search
{
  "mappings": {
    "like_search_type": {
      "properties": {
        "name": {
          "type": "text"
        }
      }
    }
  }
}

PUT /like_search/like_search_type/1
{
  "name": "hello中国233"
}

Analysis result

GET /like_search/_analyze
{
  "text": [
    "hello中国233"
  ]
}

{
  "tokens": [
    {
      "token": "hello",
      "start_offset": 0,
      "end_offset": 5,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "中",
      "start_offset": 5,
      "end_offset": 6,
      "type": "<IDEOGRAPHIC>",
      "position": 1
    },
    {
      "token": "国",
      "start_offset": 6,
      "end_offset": 7,
      "type": "<IDEOGRAPHIC>",
      "position": 2
    },
    {
      "token": "233",
      "start_offset": 7,
      "end_offset": 10,
      "type": "<NUM>",
      "position": 3
    }
  ]
}

elasticsearch uses the standard analyzer by default, so the following query for llo does not find the hello中国233 record.

GET /like_search/_search
{
  "query": {
    "match_phrase": {
      "name": "llo"
    }
  }
}

Custom analyzer

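# Delete the index created in the previous example first;
# creating an index that already exists fails.
DELETE /like_search

PUT /like_search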
{
  "settings": {
    "analysis": {
      "analyzer": {
        "char_analyzer": {
          "char_filter": [
            "split_by_whitespace_filter"
          ],
          "tokenizer": "whitespace"
        }
      },
      "char_filter": {
        "split_by_whitespace_filter": {
          "type": "pattern_replace",
          "pattern": "(.+?)",
          "replacement": "$1 "
        }
      }
    }
  },
  "mappings": {
    "like_search_type": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "char_analyzer"
        }
      }
    }
  }
}

PUT /like_search/like_search_type/1
{
  "name": "hello中国233"
}

Analysis result

GET /like_search/_analyze
{
  "analyzer": "char_analyzer",
  "text": [
    "hello中国233"
  ]
}

{
  "tokens": [
    {
      "token": "h",
      "start_offset": 0,
      "end_offset": 0,
      "type": "word",
      "position": 0
    },
    {
      "token": "e",
      "start_offset": 1,
      "end_offset": 1,
      "type": "word",
      "position": 1
    },
    {
      "token": "l",
      "start_offset": 2,
      "end_offset": 2,
      "type": "word",
      "position": 2
    },
    {
      "token": "l",
      "start_offset": 3,
      "end_offset": 3,
      "type": "word",
      "position": 3
    },
    {
      "token": "o",
      "start_offset": 4,
      "end_offset": 4,
      "type": "word",
      "position": 4
    },
    {
      "token": "中",
      "start_offset": 5,
      "end_offset": 5,
      "type": "word",
      "position": 5
    },
    {
      "token": "国",
      "start_offset": 6,
      "end_offset": 6,
      "type": "word",
      "position": 6
    },
    {
      "token": "2",
      "start_offset": 7,
      "end_offset": 7,
      "type": "word",
      "position": 7
    },
    {
      "token": "3",
      "start_offset": 8,
      "end_offset": 8,
      "type": "word",
      "position": 8
    },
    {
      "token": "3",
      "start_offset": 9,
      "end_offset": 9,
      "type": "word",
      "position": 9
    }
  ]
}

With the custom analyzer, the following query for llo finds the hello中国233 record.

GET /like_search/_search
{
  "query": {
    "match_phrase": {
      "name": "llo"
    }
  }
}
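
The query uses match_phrase rather than match because per-character tokens only behave like LIKE when they must appear consecutively and in order. The following Python sketch models that difference. It is a simplified illustration with hypothetical function names, not how elasticsearch evaluates queries internally.

```python
def char_tokens(text):
    # One token per non-whitespace character, as char_analyzer produces.
    return [ch for ch in text if not ch.isspace()]

def phrase_match(doc, query):
    """Model of match_phrase: query tokens must occur at consecutive positions."""
    doc_tokens = char_tokens(doc)
    query_tokens = char_tokens(query)
    if not query_tokens:
        return False
    n = len(query_tokens)
    return any(doc_tokens[i:i + n] == query_tokens
               for i in range(len(doc_tokens) - n + 1))

def any_token_match(doc, query):
    """Model of a match query with the default OR operator."""
    doc_tokens = set(char_tokens(doc))
    return any(token in doc_tokens for token in char_tokens(query))

doc = "hello中国233"
print(phrase_match(doc, "llo"))       # True
print(phrase_match(doc, "中国"))       # True
print(phrase_match(doc, "olleh"))     # False
print(any_token_match(doc, "olleh"))  # True
```

A plain match query would return the record for olleh, since each character appears somewhere in the value, which is not what LIKE does. Note also that char_analyzer as defined has no lowercase token filter, so matching is case-sensitive.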