Elasticsearch: A First Look at Queries

Date: 2019-12-12

Tags: elasticsearch, query. Category: log analysis


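This section covers several common Elasticsearch queries. I hope that when I come back to this post while actually using them, it will help me quickly recall what each one means.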

All examples in this post were run against Elasticsearch 2.3.3.

Let's start today's topic with a query for the user 小苍苍:

1. Method one (search across all fields)

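Since we already know we want to query the name field, this approach is not recommended, and its results are not precise: without a field name, a URI search in ES 2.x matches the text against the _all field, which combines every field of the document.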

curl 'http://127.0.0.1:9200/synctest/article/_search?q=小苍苍'
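To see how the all-fields search differs from one that targets name, here is a minimal Python sketch that only builds the two request URLs and sends nothing. The host, index and type are copied from the curl example above; `name:小苍苍` is the Lucene query-string syntax for restricting `q` to one field, and the helper `search_url` is hypothetical.

```python
from urllib.parse import urlencode

# Host, index and type come from the curl example above.
BASE = "http://127.0.0.1:9200/synctest/article/_search"


def search_url(q: str) -> str:
    """Build a URI-search URL; urlencode percent-encodes q as UTF-8."""
    return BASE + "?" + urlencode({"q": q})


# No field prefix: ES 2.x matches the text against the _all field.
all_fields_url = search_url("小苍苍")
# Query-string field prefix: only the name field is searched.
name_only_url = search_url("name:小苍苍")

print(all_fields_url)
print(name_only_url)
```

Percent-encoding also sidesteps shells and HTTP clients that mishandle raw non-ASCII characters in a URL.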

2. Method two (term: contains an exact value)

curl 'http://127.0.0.1:9200/synctest/article/_search?pretty' \
  -d '{ "filter":{ "term":{ "name":"小苍苍" } } }'

The general rule: use query clauses for full-text search and for any other search that should influence the relevance score; use filters for everything else.

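The recommended pattern is query + filter: the filter part can be cached, and scoring is then done only on the documents that pass it. We will run into this combination below.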

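Note that term here means "contains this exact value": a document with "name":["小苍苍","小衣衣"] also satisfies the condition. Because term does not analyze its input, this also assumes name is indexed as an exact value (for example mapped as not_analyzed); otherwise the indexed tokens will not equal 小苍苍.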
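The "contains" behaviour can be modelled in a few lines. This is a simplified in-memory sketch, not Elasticsearch code: it assumes name is not_analyzed (stored values are the exact strings) and ignores scoring; `term_matches` and the sample documents are hypothetical.

```python
from typing import Any


def term_matches(doc: dict[str, Any], field: str, value: Any) -> bool:
    """Simplified term clause on a not_analyzed field: exact equality, no analysis."""
    stored = doc.get(field)
    if stored is None:
        return False
    # An array field matches if any one of its elements equals the value.
    values = stored if isinstance(stored, list) else [stored]
    return value in values


docs = [
    {"_id": "1", "name": "小苍苍"},
    {"_id": "2", "name": ["小苍苍", "小衣衣"]},
    {"_id": "3", "name": "小衣衣"},
]
hits = [d["_id"] for d in docs if term_matches(d, "name", "小苍苍")]
print(hits)  # ['1', '2']
```

Document 2 matches even though its name also holds 小衣衣, which is exactly the "contains" semantics described above.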

3. Method three (term inside query)

curl 'http://127.0.0.1:9200/synctest/article/_search?pretty' \
  -d '{ "query":{ "term":{ "name":"小苍苍" } } }'

{
  "hits" : {
    "total" : 1,
    "max_score" : 0.30685282,
    "hits" : [ {
      "_index" : "synctest",
      "_type" : "article",
      "_id" : "1",
      "_score" : 0.30685282,
      "_source" : {
        "name" : "小苍苍"
      }
    } ]
  }
}

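By default, a term inside query also computes a relevance score. If you don't need scoring you can drop it, which improves performance and makes the results cacheable.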
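Where does the max_score of 0.30685282 in method three come from? ES 2.x scores with Lucene's classic TF/IDF similarity (ES 5.0 later switched the default to BM25), whose idf is 1 + ln(numDocs / (docFreq + 1)). The sketch below is a plausible reconstruction, not output from ES: it assumes the matching document is the only document in its shard, that name holds a single exact token (tf = 1, field norm 1), and a single-term query with boost 1, in which case queryNorm = 1/idf and the score collapses to idf itself.

```python
import math


def classic_idf(num_docs: int, doc_freq: int) -> float:
    """Lucene classic (TF/IDF) idf: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def single_term_score(num_docs: int, doc_freq: int,
                      tf: int = 1, field_norm: float = 1.0) -> float:
    """Classic TF/IDF score of a one-term query (boost 1, coord 1).

    score = queryNorm * sqrt(tf) * idf**2 * fieldNorm, with queryNorm = 1/sqrt(idf**2).
    """
    idf = classic_idf(num_docs, doc_freq)
    query_norm = 1.0 / math.sqrt(idf ** 2)
    return query_norm * math.sqrt(tf) * idf ** 2 * field_norm


# Assumption: document 1 is alone in its shard, so numDocs = docFreq = 1.
score = single_term_score(num_docs=1, doc_freq=1)
print(round(score, 8))  # 0.30685282, i.e. 1 - ln 2
```

The filter-based methods below skip this computation entirely, which is why their hits all come back with a score of 1.0.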

4. Method four (filtered + filter, scoring disabled)

curl 'http://127.0.0.1:9200/synctest/article/_search?pretty' \
  -d '{ "query":{ "filtered":{ "filter":{ "term":{ "name":"小苍苍" } } } } }'

{
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "synctest",
      "_type" : "article",
      "_id" : "1",
      "_score" : 1.0,
      "_source" : {
        "name" : "小苍苍"
      }
    } ]
  }
}


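A filter does not compute scores and its results can be cached, so when you don't need scoring, querying for 小苍苍 this way improves search performance in most cases. Every hit simply gets a score of 1.0, as the max_score above shows.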

You can also use constant_score to disable scoring:

curl 'http://127.0.0.1:9200/synctest/article/_search?pretty' \
  -d '{ "query":{ "constant_score":{ "filter":{ "term":{ "name":"小苍苍" } } } } }'
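If you build these bodies in code rather than typing them into curl, a small Python sketch produces the same JSON as the constant_score request above. The helper names are hypothetical; `filtered_body` reproduces the ES 2.x syntax of method four (the filtered query was deprecated in 2.0 in favour of bool with a filter clause, and removed in 5.0).

```python
import json


def term(field: str, value) -> dict:
    """A term clause: exact value, no analysis."""
    return {"term": {field: value}}


def filtered_body(filter_clause: dict) -> dict:
    """ES 2.x filtered query with only a filter (method four above)."""
    return {"query": {"filtered": {"filter": filter_clause}}}


def constant_score_body(filter_clause: dict) -> dict:
    """constant_score query: every hit gets the same score."""
    return {"query": {"constant_score": {"filter": filter_clause}}}


body = constant_score_body(term("name", "小苍苍"))
# ensure_ascii=False keeps the Chinese value readable instead of escape sequences.
payload = json.dumps(body, ensure_ascii=False)
print(payload)
```

The printed string can be passed verbatim to curl's `-d` option.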

Combining multiple conditions

1. select * from article where name in ("小苍苍","小衣衣");

curl 'http://127.0.0.1:9200/synctest/article/_search?pretty' \
  -d '{ "query":{ "constant_score":{ "filter":{ "terms":{ "name":[ "小苍苍", "小衣衣" ] } } } } }'
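The SQL IN above maps to terms, which matches when a document holds any one of the listed exact values. A simplified in-memory sketch (hypothetical helper and documents, name assumed not_analyzed, scoring ignored):

```python
def terms_matches(doc: dict, field: str, values: list) -> bool:
    """Simplified terms clause (SQL IN): any stored value equals one of `values`."""
    stored = doc.get(field)
    if stored is None:
        return False
    stored_values = stored if isinstance(stored, list) else [stored]
    wanted = set(values)
    return any(v in wanted for v in stored_values)


docs = [
    {"_id": "1", "name": "小苍苍"},
    {"_id": "2", "name": "小衣衣"},
    {"_id": "3", "name": "麒麟臂"},
]
hits = [d["_id"] for d in docs if terms_matches(d, "name", ["小苍苍", "小衣衣"])]
print(hits)  # ['1', '2']
```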

What if we want a particular user from 2002? (And, more generally, how do we express different OR and AND conditions?)

For that we need a more complex query: the combining (bool) filter.

{
   "bool" : {
      "must" :     [],  #AND
      "should" :   [],  #OR
      "must_not" : []   #NOT
   }
}

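must: all clauses must match; equivalent to AND.
must_not: none of the clauses may match; equivalent to NOT.
should: at least one clause must match; equivalent to OR. (In query context this holds when there is no must or filter clause; otherwise should clauses are optional and only add to the score.)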

select * from article where year=2002 and name like '%苍天空%'

curl 'http://127.0.0.1:9200/synctest/article/_search?pretty' \
  -d '{ "query":{ "bool":{ "must":[ { "term":{ "year":2002 } }, { "match":{ "user_name":"苍天空" } } ] } } }'

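Saying that match is equivalent to LIKE is not accurate: what it matches depends on the analyzer configured for the field. To disable scoring, replace query with filter.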
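Here is why match is not LIKE. Assume user_name uses ES 2.x's default standard analyzer, which splits Chinese text into single-character tokens, and recall that match combines the query's tokens with OR by default. The tokenizer below is a crude stand-in written for illustration, not Lucene's StandardTokenizer.

```python
import re


def toy_standard_tokens(text: str) -> list[str]:
    """Rough stand-in for the standard analyzer: each CJK ideograph is its own
    token; runs of other letters/digits become lowercase tokens."""
    return [t.lower() for t in re.findall(r"[\u4e00-\u9fff]|[^\W\u4e00-\u9fff]+", text)]


def match_or(field_text: str, query_text: str) -> bool:
    """match with the default OR operator: one shared token is enough."""
    return bool(set(toy_standard_tokens(field_text)) & set(toy_standard_tokens(query_text)))


def sql_like_contains(field_text: str, needle: str) -> bool:
    """SQL LIKE '%needle%': a plain substring test."""
    return needle in field_text


print(match_or("小苍苍", "苍天空"))           # True: both contain the token 苍
print(sql_like_contains("小苍苍", "苍天空"))  # False
```

Under these assumptions a match for 苍天空 would also return 小苍苍, something LIKE '%苍天空%' never would; a Chinese-aware analyzer changes the tokens and therefore the results.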

select * from article where (year=2002 or name='麒麟臂') and name not like '%苍天空%'

curl 'http://127.0.0.1:9200/synctest/article/_search?pretty' -d ' { "query":{ "bool":{ "should":[ { "term":{ "year":2002 } }, { "term":{ "name":"麒麟臂" } } ], "must_not":{ "match":{ "user_name":"苍天空" } } } } }'

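Notice that must_not is not an array here, because there is only one condition. When there are several conditions, turn must_not (or must) into an array.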

For example (focus on the syntax only):

{
    "query":{
        "bool":{
            "should":[
                {
                    "term":{
                        "year":2002
                    }
                },
                {
                    "term":{
                        "name":"麒麟臂"
                    }
                }
            ],
            "must_not":[
                {
                    "match":{
                        "user_name":"苍天空"
                    }
                },
                {
                    "term":{
                        "job":"teacher"
                    }
                }
            ]
        }
    }
}

A more flexible should

curl 'http://127.0.0.1:9200/synctest/article/_search?pretty' -d ' { "query":{ "bool":{ "should":[ { "term":{ "id":1 } }, { "match":{ "user_name":"苍天空" } }, { "match":{ "nick_name":"小苍苍" } } ], "minimum_should_match":2 } } } '

minimum_should_match: 2 means at least two of the should clauses must match. If you don't need scoring, you can simply replace the outermost query with filter.
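minimum_should_match is a count: a document passes only if at least that many should clauses match. A simplified sketch of the counting, where the three predicates are hypothetical stand-ins for the clauses in the request above (exact comparisons instead of analyzed match queries) and the sample documents are made up:

```python
from typing import Callable, Sequence


def passes_minimum_should_match(doc: dict,
                                should: Sequence[Callable[[dict], bool]],
                                minimum_should_match: int) -> bool:
    """Count the should clauses that match and compare with the minimum."""
    matched = sum(1 for clause in should if clause(doc))
    return matched >= minimum_should_match


def id_is_1(doc: dict) -> bool:
    return doc.get("id") == 1


def user_name_is_ctk(doc: dict) -> bool:
    return doc.get("user_name") == "苍天空"


def nick_name_is_xcc(doc: dict) -> bool:
    return doc.get("nick_name") == "小苍苍"


clauses = [id_is_1, user_name_is_ctk, nick_name_is_xcc]
docs = [
    {"id": 1, "user_name": "苍天空", "nick_name": "路人甲"},  # 2 of 3 clauses match
    {"id": 2, "user_name": "苍天空", "nick_name": "路人乙"},  # 1 of 3 clauses match
]
results = [passes_minimum_should_match(d, clauses, 2) for d in docs]
print(results)  # [True, False]
```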

There is another pattern that is very useful in practice: combining a query with a filter brings real advantages. Let's look at this query:

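Sometimes we need an analyzed (full-text) query and an exact term query together. What we want is for the term clause not to take part in scoring (so its result can be cached), while match ranks documents by how well they match.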

{
    "query":{
        "bool":{
            "must":[
                {
                    "match":{
                        "user_name":"小仓鼠"
                    }
                },
                {
                    "term":{
                        "id":1
                    }
                }
            ]
        }
    }
}

The query above is not optimal: the term clause takes part in scoring. Let's optimize it:

curl 'http://127.0.0.1:9200/synctest/article/_search?pretty' -d ' { "query":{ "bool":{ "must":[ { "match":{ "user_name":"小苍苍" } } ], "filter":{ "term":{ "id":1 } } } } } '

Looking at max_score, we can see that only user_name now contributes to the score. This matters because ES can run the filter first and cache its results.
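A toy model of the difference between the two requests above: a must clause both gates the document and adds to its score, while a filter clause only gates it. The per-clause scores (0.5 and 0.25) are made-up stand-ins for real relevance scores, and `evaluate` and the predicates are hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple

Predicate = Callable[[dict], bool]
# Hypothetical: a scoring clause is (predicate, made-up score it contributes).
ScoringClause = Tuple[Predicate, float]


def evaluate(doc: dict,
             must: Sequence[ScoringClause] = (),
             filters: Sequence[Predicate] = ()) -> Optional[float]:
    """Toy bool query: return the score, or None if the document is excluded."""
    if not all(pred(doc) for pred in filters):   # filter: yes/no only, never scored
        return None
    if not all(pred(doc) for pred, _ in must):   # must: has to match...
        return None
    return sum(score for _, score in must)       # ...and adds to the score


def user_name_has_cang(doc: dict) -> bool:
    """Stand-in for the match clause on user_name."""
    return "苍" in doc["user_name"]


def id_is_1(doc: dict) -> bool:
    """Stand-in for the term clause on id."""
    return doc["id"] == 1


doc = {"id": 1, "user_name": "小苍苍"}
match_clause = (user_name_has_cang, 0.5)

both_scored = evaluate(doc, must=[match_clause, (id_is_1, 0.25)])
term_as_filter = evaluate(doc, must=[match_clause], filters=[id_is_1])
print(both_scored)     # 0.75: the term clause inflates the score
print(term_as_filter)  # 0.5: only the match clause is scored
```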

Range queries

curl 'http://127.0.0.1:9200/synctest/article/_search?pretty' -d ' { "query":{ "constant_score":{ "filter":{ "range":{ "id":{ "gte":1, "lte":4 } } } } } } '
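In a range clause, gte and lte are inclusive bounds while gt and lt are exclusive. A simplified sketch of the bound checks (the helper is hypothetical; None means "no bound"):

```python
from typing import Optional


def in_range(value: float,
             gte: Optional[float] = None, gt: Optional[float] = None,
             lte: Optional[float] = None, lt: Optional[float] = None) -> bool:
    """Simplified range clause: gte/lte inclusive, gt/lt exclusive."""
    if gte is not None and value < gte:
        return False
    if gt is not None and value <= gt:
        return False
    if lte is not None and value > lte:
        return False
    if lt is not None and value >= lt:
        return False
    return True


ids = [0, 1, 2, 3, 4, 5]
inclusive = [i for i in ids if in_range(i, gte=1, lte=4)]
exclusive = [i for i in ids if in_range(i, gt=1, lt=4)]
print(inclusive)  # [1, 2, 3, 4]
print(exclusive)  # [2, 3]
```

So the request above, with gte 1 and lte 4, is the SQL `id between 1 and 4`.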

Finally: pagination and returning selected fields

curl 'http://127.0.0.1:9200/synctest/article/_search?pretty' -d ' { "from":1, "size":1, "query":{ "terms":{ "id":[ 1, 2, 6, 9, 15 ] } }, "sort":{ "id":{ "order":"desc" } } } '

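Here we used from + size pagination; note that from + size paging in ES has limitations, which we will cover later. from is a zero-based offset, so from: 1, size: 1 returns the second hit. We also used sort to order by id in descending order.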
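What from + size does with the sorted hits can be sketched as a slice. This is a simplified model that assumes all five ids in the terms list exist in the index; `paginate` is a hypothetical helper whose defaults mirror ES's (from 0, size 10).

```python
def paginate(values: list, from_: int = 0, size: int = 10,
             descending: bool = False) -> list:
    """Sort, skip `from_` hits, then return at most `size` of them."""
    ordered = sorted(values, reverse=descending)
    return ordered[from_:from_ + size]


# Assumption: every id in the terms list exists in the index.
matching_ids = [1, 2, 6, 9, 15]
print(sorted(matching_ids, reverse=True))                        # [15, 9, 6, 2, 1]
print(paginate(matching_ids, from_=1, size=1, descending=True))  # [9]
```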

In database work we also routinely return only certain fields, so try to give up select * here too:

{
    "from":1,
    "size":1,
    "_source":[
        "id",
        "name"
    ],
    "query":{
        "terms":{
            "id":[
                1,
                2,
                6,
                9,
                15
            ]
        }
    },
    "sort":{
        "id":{
            "order":"desc"
        }
    }
}

Just use _source. For an embedded object you can also write userinfo.* to return every field under the userinfo object.
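A rough model of _source include patterns: flatten the document to dotted paths, keep the leaves whose path matches a pattern, and rebuild the nesting. This is a simplified sketch, not ES's implementation (for example, a bare object name without `.*` is not handled here); the userinfo contents are hypothetical.

```python
from fnmatch import fnmatchcase


def flatten(obj: dict, prefix: str = "") -> dict:
    """Map nested objects to dotted leaf paths, e.g. userinfo.age."""
    out = {}
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            out.update(flatten(value, path + "."))
        else:
            out[path] = value
    return out


def filter_source(source: dict, includes: list[str]) -> dict:
    """Keep leaf paths matching any include pattern, rebuilding the nesting."""
    result: dict = {}
    for path, value in flatten(source).items():
        if any(fnmatchcase(path, pattern) for pattern in includes):
            node = result
            *parents, leaf = path.split(".")
            for parent in parents:
                node = node.setdefault(parent, {})
            node[leaf] = value
    return result


doc = {"id": 1, "name": "小苍苍", "year": 2002,
       "userinfo": {"age": 18, "city": "Beijing"}}
print(filter_source(doc, ["id", "userinfo.*"]))
# {'id': 1, 'userinfo': {'age': 18, 'city': 'Beijing'}}
```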

That's all for this post. Next, we'll look in detail at how Elasticsearch scoring works and how to control it more precisely for more customized recommendations.

Follow me: 呆呆熊一点通