Elasticsearch In-Depth Search: Structured Search and Using the Java API

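1. Creating an index in Elasticsearch

1) Create the index:

An earlier post on installing and using Elasticsearch plugins covered creating an index with a custom analyzer and creating a type as two separate steps. Both can be done in a single request: the type can be created, and its analyzer assigned, at the same time as the index.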

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "ik_smart_pinyin": {
          "type": "custom",
          "tokenizer": "ik_smart",
          "filter": ["my_pinyin", "word_delimiter"]
        },
        "ik_max_word_pinyin": {
          "type": "custom",
          "tokenizer": "ik_max_word",
          "filter": ["my_pinyin", "word_delimiter"]
        }
      },
      "filter": {
        "my_pinyin": {
          "type": "pinyin",
          "keep_separate_first_letter": true,
          "keep_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "lowercase": true,
          "remove_duplicated_term": true
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "id": { "type": "integer" },
        "name": { "type": "text", "analyzer": "ik_max_word_pinyin" },
        "age": { "type": "integer" }
      }
    }
  }
}

2) Add data

POST /my_index/my_type/_bulk
{ "index": { "_id":1}}
{ "id":1,"name": "张三","age":20}
{ "index": { "_id": 2}}
{ "id":2,"name": "张四","age":22}
{ "index": { "_id": 3}}
{ "id":3,"name": "张三李四王五","age":20}

3) View the mapping

GET /my_index/my_type/_mapping

Result:

{
  "my_index": {
    "mappings": {
      "my_type": {
        "properties": {
          "age": { "type": "integer" },
          "id": { "type": "integer" },
          "name": { "type": "text", "analyzer": "ik_max_word_pinyin" }
        }
      }
    }
  }
}

 

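2. Using it from Java (Elasticsearch must already be configured in the project; there are plenty of examples online)

1) Create the Elasticsearch entity class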

package com.example.es_query_list.entity.es;

import lombok.Getter;
import lombok.Setter;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;

@Setter
@Getter
@Document(indexName = "my_index",type = "my_type")
public class User {
@Id
private Integer id;
private String name;
private Integer age;
}
 

2) Create the DAO layer

package com.example.es_query_list.repository.es;

import com.example.es_query_list.entity.es.User;
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

public interface EsUserRepository extends ElasticsearchRepository<User, Integer> {
}

 

3. With the setup done, start querying

1) Exact-value queries

Querying non-text data

GET /my_index/my_type/_search
{
  "query": {
    "term": { "age": { "value": "20" } }
  }
}

Result:

{
  "took": 0,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      { "_index": "my_index", "_type": "my_type", "_id": "1", "_score": 1, "_source": { "name": "张三", "age": 20 } },
      { "_index": "my_index", "_type": "my_type", "_id": "3", "_score": 1, "_source": { "name": "李四", "age": 20 } }
    ]
  }
}

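2) Querying text fields

A term query on the name field for the full value 张三李四王五 returns no hits:

{
  "took": 0,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": { "total": 0, "max_score": null, "hits": [] }
}

The result is empty. Why does an exact-match query fail to find the exact value that was indexed? As mentioned earlier, the field was given an analyzer when the type was created, and a query for a term that the analyzer never produced finds nothing. Let's verify that.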

First, check which tokens the queried string is analyzed into:

GET my_index/_analyze
{
  "text": "张三李四王五",
  "analyzer": "ik_max_word"
}

Result:

{
  "tokens": [
    { "token": "张三李四", "start_offset": 0, "end_offset": 4, "type": "CN_WORD", "position": 0 },
    { "token": "张三", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 1 },
    { "token": "三", "start_offset": 1, "end_offset": 2, "type": "TYPE_CNUM", "position": 2 },
    { "token": "李四", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 3 },
    { "token": "四", "start_offset": 3, "end_offset": 4, "type": "TYPE_CNUM", "position": 4 },
    { "token": "王", "start_offset": 4, "end_offset": 5, "type": "CN_CHAR", "position": 5 },
    { "token": "五", "start_offset": 5, "end_offset": 6, "type": "TYPE_CNUM", "position": 6 }
  ]
}

The output shows that 张三李四王五 is never emitted as a single token, so the inverted index contains no such term and the term query matches nothing.

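Fix: update the field mapping and add a sub-field of type keyword (the sub-field is named keyword here; the name is up to you). A keyword field is not run through an analyzer, so the whole string is indexed as one term. It is as exact as a value can get.

POST /my_index/_mapping/my_type
{
  "properties": {
    "name": {
      "type": "text",
      "analyzer": "ik_max_word_pinyin",
      "fields": {
        "keyword": { "type": "keyword" }
      }
    }
  }
}

After the mapping is updated, delete the existing documents and index them again. Documents indexed before the change are not reindexed automatically, so the name.keyword sub-field is empty for them. Once the data is re-added, a term query on name.keyword finds the document.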

public List<User> termQuery() {
    // Exact match on a numeric field.
    QueryBuilder queryBuilder = QueryBuilders.termQuery("age", 20);
    // Exact match on the whole name through the keyword sub-field:
    // QueryBuilder queryBuilder = QueryBuilders.termQuery("name.keyword", "张三李四王五");
    SearchQuery searchQuery = new NativeSearchQueryBuilder()
            .withIndices("my_index")
            .withTypes("my_type")
            .withQuery(queryBuilder)
            .build();
    // template is assumed to be an injected ElasticsearchTemplate.
    List<User> list = template.queryForList(searchQuery, User.class);
    return list;
}
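The reason the term query misses on the text field but hits on the keyword sub-field can be modelled in a few lines of plain Java. This is only a toy sketch, not Elasticsearch code: the class and method names are made up for illustration, and the token list is copied from the _analyze output above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the terms indexed for one field; not Elasticsearch code. */
final class TermLookupSketch {

    // Tokens copied from the _analyze output for "张三李四王五".
    static final List<String> ANALYZED_TOKENS =
            Arrays.asList("张三李四", "张三", "三", "李四", "四", "王", "五");

    /** A text field indexes every token the analyzer produced. */
    static Set<String> indexAsText(List<String> tokens) {
        return new HashSet<>(tokens);
    }

    /** A keyword field indexes the whole value as one term. */
    static Set<String> indexAsKeyword(String value) {
        Set<String> terms = new HashSet<>();
        terms.add(value);
        return terms;
    }

    /** A term query is an exact lookup; the query string is not analyzed. */
    static boolean termQuery(Set<String> indexedTerms, String term) {
        return indexedTerms.contains(term);
    }

    public static void main(String[] args) {
        Set<String> text = indexAsText(ANALYZED_TOKENS);
        Set<String> keyword = indexAsKeyword("张三李四王五");
        System.out.println(termQuery(text, "张三李四王五"));    // false
        System.out.println(termQuery(text, "张三"));            // true
        System.out.println(termQuery(keyword, "张三李四王五")); // true
    }
}
```

The same lookup succeeds or fails purely depending on which terms were written to the index, which is why the data has to be indexed again after the keyword sub-field is added.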

 

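4. Combining filters

Bool filter

Note: the official documentation is out of date on this point. Since 5.x, filtered has been replaced by bool: "The filtered query is replaced by the bool query."

A bool filter is made up of three parts:

{
  "bool": {
    "must": [],
    "should": [],
    "must_not": []
  }
}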

 

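must: all of these clauses must match; equivalent to AND.

must_not: none of these clauses may match; equivalent to NOT.

should: at least one of these clauses must match; equivalent to OR. Note that when a bool in query context also has a must or filter clause, its should clauses are optional by default and only influence scoring; set minimum_should_match to 1 to require at least one of them.

It's that simple! When several filters are needed, just place them in the appropriate sections of the bool filter.

GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "should": [
        { "term": { "age": 20 } },
        { "term": { "age": 30 } }
      ],
      "must": {
        "term": { "name.keyword": "张三" }
      }
    }
  }
}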

 

public List<User> boolQuery() {
    BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
    boolQueryBuilder.should(QueryBuilders.termQuery("age", 20));
    boolQueryBuilder.should(QueryBuilders.termQuery("age", 30));
    boolQueryBuilder.must(QueryBuilders.termQuery("name.keyword", "张三"));
    SearchQuery searchQuery = new NativeSearchQueryBuilder()
            .withIndices("my_index")
            .withTypes("my_type")
            .withQuery(boolQueryBuilder)
            .build();
    List<User> list = template.queryForList(searchQuery, User.class);
    return list;
}
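The three parts of a bool filter map directly onto AND, OR and NOT. The sketch below expresses that with plain java.util.function.Predicate objects. It is an illustration only: Doc, bool, ageIs and nameIs are hypothetical names, and should is modelled as "at least one must match", which is how the text above describes it (Elasticsearch's own default for should differs when must is present in query context).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

/** Boolean logic of must / should / must_not, modelled with predicates. */
final class BoolFilterSketch {

    /** Minimal stand-in for an indexed document (hypothetical type). */
    static final class Doc {
        final String name;
        final int age;
        Doc(String name, int age) { this.name = name; this.age = age; }
    }

    static Predicate<Doc> ageIs(int age) { return d -> d.age == age; }

    static Predicate<Doc> nameIs(String name) { return d -> d.name.equals(name); }

    /**
     * All must clauses match (AND), no must_not clause matches (NOT), and,
     * if any should clauses are given, at least one of them matches (OR).
     */
    static Predicate<Doc> bool(List<Predicate<Doc>> must,
                               List<Predicate<Doc>> should,
                               List<Predicate<Doc>> mustNot) {
        return doc -> must.stream().allMatch(p -> p.test(doc))
                && mustNot.stream().noneMatch(p -> p.test(doc))
                && (should.isEmpty() || should.stream().anyMatch(p -> p.test(doc)));
    }

    /** name is 张三 AND (age is 20 OR age is 30), mirroring the request above. */
    static Predicate<Doc> exampleQuery() {
        List<Predicate<Doc>> must = Collections.singletonList(nameIs("张三"));
        List<Predicate<Doc>> should = Arrays.asList(ageIs(20), ageIs(30));
        List<Predicate<Doc>> mustNot = Collections.emptyList();
        return bool(must, should, mustNot);
    }

    public static void main(String[] args) {
        Predicate<Doc> query = exampleQuery();
        System.out.println(query.test(new Doc("张三", 20))); // true
        System.out.println(query.test(new Doc("张三", 25))); // false
        System.out.println(query.test(new Doc("张四", 20))); // false
    }
}
```

Because bool itself returns a Predicate, the result of one bool call can be passed as a clause to another, which is the same idea as nesting bool filters.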

 

Nested bool filters

Although bool is a compound filter that accepts multiple sub-filters, it is still just a filter itself. That means a bool filter can be placed inside another bool filter, which makes it possible to express arbitrarily complex Boolean logic.

GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "should": [
        { "term": { "age": 20 } },
        { "bool": {
            "must": [
              { "term": { "name.keyword": { "value": "李四" } } }
            ]
        } }
      ]
    }
  }
}

Result:

{
  "took": 0,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      { "_index": "my_index", "_type": "my_type", "_id": "1", "_score": 1, "_source": { "id": 1, "name": "张三", "age": 20 } },
      { "_index": "my_index", "_type": "my_type", "_id": "3", "_score": 1, "_source": { "id": 3, "name": "张三李四王五", "age": 20 } }
    ]
  }
}

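Because the term filter and the inner bool filter are siblings inside the outer should, a hit only needs to match at least one of them.

Clauses that sit together inside a must are siblings as well, and a hit has to match every one of them. In this example the inner must holds a single term clause on name.keyword.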

 

5. Finding multiple exact values

GET my_index/my_type/_search
{
  "query": {
    "terms": { "age": [ 20, 22 ] }
  }
}

Result:

{
  "took": 0,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      { "_index": "my_index", "_type": "my_type", "_id": "2", "_score": 1, "_source": { "id": 2, "name": "张四", "age": 22 } },
      { "_index": "my_index", "_type": "my_type", "_id": "1", "_score": 1, "_source": { "id": 1, "name": "张三", "age": 20 } },
      { "_index": "my_index", "_type": "my_type", "_id": "3", "_score": 1, "_source": { "id": 3, "name": "张三李四王五", "age": 20 } }
    ]
  }
}

Keep in mind that term and terms are contains operations, not equals checks.

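// list holds the values to match, mirroring the request above.
List<Integer> list = Arrays.asList(20, 22);
TermsQueryBuilder termsQueryBuilder = QueryBuilders.termsQuery("age", list);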
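The difference between contains and equals only shows up on fields that hold more than one value. The following plain-Java sketch contrasts the two; the method names and the multi-valued tags field are hypothetical and serve only to illustrate the point.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Contrasts terms-style "contains" matching with a strict "equals" check. */
final class ContainsVsEqualsSketch {

    /** contains: true if the field holds at least one of the wanted values. */
    static boolean termsMatch(List<String> fieldValues, Set<String> wanted) {
        for (String value : fieldValues) {
            if (wanted.contains(value)) {
                return true;
            }
        }
        return false;
    }

    /** equals: true only if the field's values are exactly the wanted set. */
    static boolean equalsMatch(List<String> fieldValues, Set<String> wanted) {
        return new HashSet<>(fieldValues).equals(wanted);
    }

    public static void main(String[] args) {
        List<String> tags = Arrays.asList("search", "open_source");
        Set<String> wanted = Collections.singleton("search");
        System.out.println(termsMatch(tags, wanted));  // true: the field contains "search"
        System.out.println(equalsMatch(tags, wanted)); // false: the field holds more than "search"
    }
}
```

A term or terms query behaves like termsMatch: a document whose field holds extra values still matches.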

 

6. Range queries

1) Numeric ranges

GET my_index/my_type/_search
{
  "query": {
    "range": {
      "age": { "gte": 10, "lte": 20 }
    }
  }
}

Result:

{
  "took": 0,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      { "_index": "my_index", "_type": "my_type", "_id": "1", "_score": 1, "_source": { "id": 1, "name": "张三", "age": 20 } },
      { "_index": "my_index", "_type": "my_type", "_id": "3", "_score": 1, "_source": { "id": 3, "name": "张三李四王五", "age": 20 } }
    ]
  }
}

Note: gt (greater than), gte (greater than or equal to), lt (less than), lte (less than or equal to).

RangeQueryBuilder rangeQueryBuilder = QueryBuilders.rangeQuery("age").gte(10).lte(20);

2) Date ranges

Update the type mapping to add a date field:

POST /my_index/_mapping/my_type
{
"properties": {
"date":{
"type":"date",
"format":"yyyy-MM-dd"
}
}
}

Add data:

POST /my_index/my_type/_bulk
{ "index": { "_id":4}}
{ "id":4,"name": "赵六","age":20,"date":"2018-10-1"}
{ "index": { "_id": 5}}
{ "id":5,"name": "对七","age":22,"date":"2018-11-20"}
{ "index": { "_id": 6}}
{ "id":6,"name": "王八","age":20,"date":"2018-7-28"}

Query:

GET my_index/my_type/_search
{
  "query": {
    "range": {
      "date": { "gte": "2018-10-20", "lte": "2018-11-29" }
    }
  }
}

Result:

{
  "took": 0,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      { "_index": "my_index", "_type": "my_type", "_id": "5", "_score": 1, "_source": { "id": 5, "name": "对七", "age": 22, "date": "2018-11-20" } }
    ]
  }
}

 

RangeQueryBuilder rangeQueryBuilder = QueryBuilders.rangeQuery("date").gte("2018-10-20").lte("2018-11-29");
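To make the inclusive gte/lte bounds concrete, here is a small java.time sketch that applies the same range to the three dates indexed above. It is only an illustration of the comparison, not how Elasticsearch evaluates the query; the class name and the lenient "uuuu-M-d" pattern (chosen so that values such as 2018-10-1 parse) are assumptions of the sketch.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Inclusive date range check, mirroring a range query with gte and lte. */
final class DateRangeSketch {

    // Lenient pattern: accepts one- or two-digit months and days, e.g. "2018-10-1".
    static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("uuuu-M-d");

    /** True if gte <= value <= lte when all three are compared as dates. */
    static boolean inRange(String value, String gte, String lte) {
        LocalDate date = LocalDate.parse(value, FORMAT);
        LocalDate from = LocalDate.parse(gte, FORMAT);
        LocalDate to = LocalDate.parse(lte, FORMAT);
        return !date.isBefore(from) && !date.isAfter(to);
    }

    public static void main(String[] args) {
        String gte = "2018-10-20";
        String lte = "2018-11-29";
        System.out.println(inRange("2018-10-1", gte, lte));  // false
        System.out.println(inRange("2018-11-20", gte, lte)); // true
        System.out.println(inRange("2018-7-28", gte, lte));  // false
    }
}
```

Replacing gte with gt would correspond to date.isAfter(from), and lte with lt to date.isBefore(to).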

 

7. Handling null values

1) Add data

POST /my_index/posts/_bulk
{ "index": { "_id": "1" }}
{ "tags" : ["search"] }
{ "index": { "_id": "2" }}
{ "tags" : ["search", "open_source"] }
{ "index": { "_id": "3" }}
{ "other_field" : "some data" }
{ "index": { "_id": "4" }}
{ "tags" : null }
{ "index": { "_id": "5" }}
{ "tags" : ["search", null] }

2) Find documents where a field exists

The constant_score wrapper skips relevance scoring, so every hit gets the default score of 1.

GET /my_index/posts/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "exists": { "field": "tags" }
      }
    }
  }
}

Result:

{
  "took": 3,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      { "_index": "my_index", "_type": "posts", "_id": "5", "_score": 1, "_source": { "tags": [ "search", null ] } },
      { "_index": "my_index", "_type": "posts", "_id": "2", "_score": 1, "_source": { "tags": [ "search", "open_source" ] } },
      { "_index": "my_index", "_type": "posts", "_id": "1", "_score": 1, "_source": { "tags": [ "search" ] } }
    ]
  }
}

 

BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
boolQueryBuilder.filter(QueryBuilders.constantScoreQuery(QueryBuilders.existsQuery("tags")));

 

3) Find documents where a field is missing

Note: the missing query was removed in Elasticsearch 5; use an exists query inside must_not instead.

GET /my_index/posts/_search
{
  "query": {
    "bool": {
      "must_not": [
        { "constant_score": {
            "filter": {
              "exists": { "field": "tags" }
            }
        } }
      ]
    }
  }
}

Result:

{
  "took": 1,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      { "_index": "my_index", "_type": "posts", "_id": "4", "_score": 1, "_source": { "tags": null } },
      { "_index": "my_index", "_type": "posts", "_id": "3", "_score": 1, "_source": { "other_field": "some data" } }
    ]
  }
}

Note on null handling: a field whose content is empty is treated as null, that is, as if the field were missing. This is why document 4, with "tags": null, shows up in the result above.

boolQueryBuilder.mustNot(QueryBuilders.boolQuery().filter(QueryBuilders.constantScoreQuery(QueryBuilders.existsQuery("tags"))));
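Which documents count as "having" the field can be summarised as: the field is present and holds at least one non-null value. The sketch below applies that rule to the five sample documents. It is a plain-Java model with hypothetical names (exists, missing, doc), not Elasticsearch code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Models exists / missing on a multi-valued field that may contain nulls. */
final class ExistsSketch {

    /** exists: the field is present and holds at least one non-null value. */
    static boolean exists(Map<String, List<String>> doc, String field) {
        List<String> values = doc.get(field);
        return values != null && values.stream().anyMatch(Objects::nonNull);
    }

    /** missing is simply the negation of exists (must_not + exists). */
    static boolean missing(Map<String, List<String>> doc, String field) {
        return !exists(doc, field);
    }

    /** Builds a one-field document; values may be null to model "field": null. */
    static Map<String, List<String>> doc(String field, List<String> values) {
        Map<String, List<String>> d = new HashMap<>();
        d.put(field, values);
        return d;
    }

    public static void main(String[] args) {
        System.out.println(exists(doc("tags", Arrays.asList("search")), "tags"));          // true  (doc 1)
        System.out.println(exists(doc("other_field", Arrays.asList("some data")), "tags")); // false (doc 3)
        System.out.println(exists(doc("tags", null), "tags"));                              // false (doc 4)
        System.out.println(exists(doc("tags", Arrays.asList("search", null)), "tags"));     // true  (doc 5)
    }
}
```

The outcome matches the two result sets above: documents 1, 2 and 5 exist for tags, while documents 3 and 4 are missing.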

 

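8. Caching

1) Core idea

At its core, a filter's result is recorded as a bitset that marks which documents match. Elasticsearch aggressively caches these bitsets for later use. Once cached, a bitset can be reused wherever the same filter appears, without evaluating the whole filter again.

These cached bitsets are "smart": they are updated incrementally. As new documents are indexed, only those documents need to be added to the existing bitsets, rather than recomputing the whole cache over and over. Like the rest of the system, filters are real-time, so there is no need to worry about cache expiry.

2) Independent filter caching

The bitset belonging to a query component is independent of the rest of the search request it appears in. Once cached, a query can therefore be reused across multiple search requests; it does not depend on the context of the surrounding query. This lets caching speed up the most frequently used parts of a query without wasting overhead on the less frequent, more volatile parts.

Similarly, if a single search request reuses the same non-scoring query, its cached bitset can be reused for every instance within that request.

Consider the following example query, which looks for emails that satisfy either of these conditions: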

 

Conditions (example): (1) in the inbox and unread; (2) not in the inbox but marked as important.

 

GET /inbox/emails/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "should": [
            { "bool": {                                      (1)
                "must": [
                  { "term": { "folder": "inbox" } },
                  { "term": { "read": false } }
                ]
            } },
            { "bool": {                                      (2)
                "must_not": { "term": { "folder": "inbox" } },
                "must": { "term": { "important": true } }
            } }
          ]
        }
      }
    }
  }
}

 

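The bool clauses marked 1 and 2 contain the same filter on folder, so they use the same bitset.

Although one inbox clause is a must and the other is a must_not, the two filters themselves are identical. The bitset is therefore computed once, when the first clause runs, and cached for the other. When the query is run again, the inbox filter is already cached, so both clauses use the cached bitset.

This ties in nicely with the composability of the query DSL. A filter is easy to move anywhere in the query, or to reuse in several places within the same query. That is convenient for the developer and also has direct performance benefits.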
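The reuse described here can be sketched with java.util.BitSet: each filter is computed once per cache key, and the inbox bitset serves both the must clause and the must_not clause. This is a simplified model under stated assumptions, not Elasticsearch's actual cache: the class name, the string cache keys and the boolean arrays standing in for indexed fields are all hypothetical.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntPredicate;

/** Caches one bitset per filter key and counts how often a filter is computed. */
final class BitsetCacheSketch {

    private final int docCount;
    private final Map<String, BitSet> cache = new HashMap<>();
    private int computations = 0;

    BitsetCacheSketch(int docCount) { this.docCount = docCount; }

    /** Returns the cached bitset for a filter key, computing it at most once. */
    BitSet filter(String key, IntPredicate matches) {
        return cache.computeIfAbsent(key, k -> {
            computations++;
            BitSet bits = new BitSet(docCount);
            for (int doc = 0; doc < docCount; doc++) {
                if (matches.test(doc)) {
                    bits.set(doc);
                }
            }
            return bits;
        });
    }

    int computations() { return computations; }

    /** (inbox AND unread) OR (important AND NOT inbox), as in the request above. */
    static BitSet query(BitsetCacheSketch c, boolean[] inbox, boolean[] read, boolean[] important) {
        BitSet inboxBits = c.filter("folder:inbox", d -> inbox[d]);           // clause 1, must
        BitSet unreadBits = c.filter("read:false", d -> !read[d]);
        BitSet importantBits = c.filter("important:true", d -> important[d]);
        BitSet inboxAgain = c.filter("folder:inbox", d -> inbox[d]);          // clause 2, must_not: cache hit

        // Clone before combining so the cached bitsets are never modified.
        BitSet first = (BitSet) inboxBits.clone();
        first.and(unreadBits);
        BitSet second = (BitSet) importantBits.clone();
        second.andNot(inboxAgain);
        first.or(second);
        return first;
    }

    public static void main(String[] args) {
        boolean[] inbox = { true, true, false, false };
        boolean[] read = { false, true, true, true };
        boolean[] important = { false, false, true, false };
        BitsetCacheSketch c = new BitsetCacheSketch(4);
        System.out.println(query(c, inbox, read, important)); // {0, 2}
        System.out.println(c.computations());                 // 3
    }
}
```

Four filter lookups are made but only three bitsets are computed, because the second inbox lookup is answered from the cache.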

 

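3) Automatic caching behavior

In older versions of Elasticsearch, the default behavior was to cache everything that was cacheable. This often meant the system cached bitsets too aggressively, and churning the cache hurt performance. In addition, although many filters are very fast to evaluate, they are substantially slower to cache (and to reuse from cache). Caching such filters makes little sense, because they can simply be executed again.

Checking the inverted index is very fast, and most query components are rarely used. Take a term filter on a "user_id" field: with millions of users, any particular user ID occurs rarely. Caching a bitset for this filter is not worthwhile, because the cached result will most likely be evicted before it is reused.

This cache churn can have serious effects on performance. Worse, it makes it hard for developers to tell which caches perform well and which are useless.

To address this, Elasticsearch caches queries automatically based on usage frequency. If a non-scoring query has been used a few times (how many depends on the query type) in the last 256 queries, it becomes a candidate for caching. Not every segment is guaranteed to cache the bitset, however: only segments holding more than 10,000 documents (or 3% of the total documents, whichever is larger) do so. Small segments are fast to search and are merged away quickly, so caching bitsets for them makes little sense.

Once cached, a non-scoring bitset stays in the cache until it is evicted. Eviction is LRU-based: when the cache is full, the least recently used filter is evicted.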
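LRU eviction itself is easy to demonstrate with the JDK's LinkedHashMap in access order. The sketch below is a generic illustration of the policy, not Elasticsearch's cache implementation; the class name and the tiny capacity are chosen only for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Fixed-capacity cache that evicts the least recently used entry when full. */
final class LruFilterCacheSketch<K, V> extends LinkedHashMap<K, V> {

    private static final long serialVersionUID = 1L;

    private final int capacity;

    LruFilterCacheSketch(int capacity) {
        // accessOrder = true: iteration order runs from least to most recently used.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    /** Called by LinkedHashMap after each put; returning true evicts the eldest entry. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruFilterCacheSketch<String, String> cache = new LruFilterCacheSketch<>(2);
        cache.put("folder:inbox", "bitset A");
        cache.put("read:false", "bitset B");
        cache.get("folder:inbox");               // touch: inbox is now the most recently used
        cache.put("important:true", "bitset C"); // cache is full: read:false is evicted
        System.out.println(cache.keySet());      // [folder:inbox, important:true]
    }
}
```

A filter that keeps being used stays at the recently used end and survives, while a rarely used one drifts to the other end and is dropped first.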
