[ES] match_phrase and regexp

Date: 2019-11-13

Tags: match phrase regexp


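When I first started using ES, I couldn't tell match_phrase and regexp apart, and as a result many of my queries returned results I didn't expect. Here are my notes.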

regexp: operates on individual terms.

match_phrase: operates on multiple terms and their positions relative to each other.

What both queries return depends heavily on how the analyzer splits the text into terms.

For example, suppose I have two strings, "HELLO-world" and "hello.WORLD", stored in a field named title.

For "HELLO-world", consider the two queries below. The second one matches; the first one does not.

{ "regexp": { "title": "hello-w.*" }} 
{ "match_phrase": { "title": "hello world" }}
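Both clauses are fragments of a search request body. As a minimal sketch, assuming a local cluster at http://localhost:9200 and a hypothetical index named my_index (neither comes from the original post), this is one way to wrap them in complete _search requests using only Python's standard library. The requests are built but not sent:

```python
import json
import urllib.request

# Hypothetical placeholders (assumptions, not from the article).
ES_URL = "http://localhost:9200"
INDEX = "my_index"


def search_request(query: dict) -> urllib.request.Request:
    """Build (but do not send) a _search request wrapping one query clause."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{ES_URL}/{INDEX}/_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# The two query clauses from the post, unchanged.
regexp_req = search_request({"regexp": {"title": "hello-w.*"}})
phrase_req = search_request({"match_phrase": {"title": "hello world"}})

# Against a live cluster you would then do, for example:
#   with urllib.request.urlopen(regexp_req) as resp:
#       print(json.load(resp)["hits"]["total"])
```

POST is used here because some HTTP clients will not send a body with GET; Elasticsearch accepts the same _search body with either method.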

Run the text through the analyzer and you can see that HELLO-world has been split into two terms, hello and world:

GET <index>/_analyze
{
    "field": "title",
    "text": "HELLO-world"
}
---------------------------
{
  "tokens" : [
    {
      "token" : "hello",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "world",
      "start_offset" : 6,
      "end_offset" : 11,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}
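To make the analyzer's behavior concrete, here is a toy Python approximation. It is a hypothetical simplification for illustration only: the real standard tokenizer implements the Unicode word-boundary rules (UAX #29) and covers far more cases. For the two strings in this post, though, it reproduces the tokens, offsets and positions that _analyze reports:

```python
import re

# Hypothetical simplification of the standard analyzer, NOT Lucene's real
# UAX #29 tokenizer: a token is a run of ASCII letters/digits, and a single "."
# with a letter on each side does not end the token ("-" always does).
# Tokens are lowercased, mimicking the lowercase filter.
TOKEN_RE = re.compile(
    r"[a-z0-9]+(?:(?<=[a-z])\.(?=[a-z])[a-z0-9]+)*", re.IGNORECASE
)


def analyze(text: str) -> list[dict]:
    """Return tokens shaped like the _analyze response (the "type" field is omitted)."""
    return [
        {
            "token": m.group().lower(),
            "start_offset": m.start(),
            "end_offset": m.end(),
            "position": pos,
        }
        for pos, m in enumerate(TOKEN_RE.finditer(text))
    ]


for token in analyze("HELLO-world"):
    print(token)
```

Calling analyze("hello.WORLD") with the same rule keeps the period and yields the single token hello.world.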

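First, the analyzer has lowercased every character, so the indexed terms contain no uppercase letters. Second, the "-" character is gone: the tokenizer treated it as a word boundary.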

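regexp works on individual terms. Neither hello nor world matches the pattern hello-w.* (the pattern has to match a whole term), so there is no match.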

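match_phrase works on multiple terms. The query text "hello world" is itself analyzed into the two terms hello and world. Both terms occur among the title field's terms, and at consecutive positions (0 and 1), so the query matches.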
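The two matching rules can be sketched in a few lines of Python operating on the term lists that _analyze returns. This is a simplified model, not how Lucene evaluates these queries internally. Python's re stands in for Lucene's regex syntax (the two agree on these simple patterns), and, like the regexp query, the sketch requires the pattern to match an entire term, hence fullmatch:

```python
import re

# Indexed terms per title, copied from the _analyze output in this post.
DOC_TERMS = {
    "HELLO-world": ["hello", "world"],
    "hello.WORLD": ["hello.world"],
}

# match_phrase analyzes its query text too: "hello world" -> ["hello", "world"].
PHRASE = ["hello", "world"]


def regexp_matches(terms: list[str], pattern: str) -> bool:
    """Simplified regexp: the pattern is not analyzed and must match one whole term."""
    rx = re.compile(pattern)
    return any(rx.fullmatch(t) for t in terms)


def match_phrase_matches(terms: list[str], query_terms: list[str]) -> bool:
    """Simplified match_phrase (slop 0): query terms must appear in order at consecutive positions."""
    n = len(query_terms)
    return any(terms[i:i + n] == query_terms for i in range(len(terms) - n + 1))


terms = DOC_TERMS["HELLO-world"]
print(regexp_matches(terms, r"hello-w.*"))  # False
print(match_phrase_matches(terms, PHRASE))  # True
```

Swapping in DOC_TERMS["hello.WORLD"] with the pattern r"hello\.w.*" flips both results, which is exactly the second case below.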

Now look at "hello.WORLD":

{ "regexp": { "title": "hello\\.w.*" }} 
{ "match_phrase": { "title": "hello world" }}
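Note the doubled backslash in the regexp query. JSON itself uses \ as an escape character, so "hello\\.w.*" reaches the regex engine as hello\.w.*, where \. means a literal period; an unescaped . would match any single character. A small check of that decoding, again using Python's re only as a stand-in for Lucene's regex engine:

```python
import json
import re

# The regexp value exactly as written in the JSON query: "hello\\.w.*"
json_value = r'"hello\\.w.*"'
pattern = json.loads(json_value)  # after JSON decoding: hello\.w.*
print(pattern)

escaped = re.compile(pattern)        # "\." matches only a literal period
unescaped = re.compile("hello.w.*")  # "." matches any single character

for term in ["hello.world", "hello-world", "helloxworld"]:
    print(term, bool(escaped.fullmatch(term)), bool(unescaped.fullmatch(term)))
```

Only hello.world satisfies the escaped pattern, while the unescaped one also accepts the other two strings.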

This time the first query matches and the second one does not.

The reason is again in the analysis output:

GET <index>/_analyze
{
    "field": "title",
    "text": "hello.WORLD"
}
-------------------------------
{
  "tokens" : [
    {
      "token" : "hello.world",
      "start_offset" : 0,
      "end_offset" : 11,
      "type" : "<ALPHANUM>",
      "position" : 0
    }
  ]
}

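Here is the nasty surprise: "." is not a split point, so the entire "hello.world" is treated as a single term. (The standard tokenizer, which the ALPHANUM token type suggests is in use here, follows the Unicode word-boundary rules, under which a period between two letters does not end a word, while a hyphen does.)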

match_phrase looks for the terms hello and world, finds neither, and therefore does not match.

regexp, on the other hand, finds a term (hello.world) that satisfies the regular expression, so it matches.

How ES analyzes text matters a great deal: it largely determines what your queries return!
