search(7) - elastic4s - search - filter mode

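We can now turn to the core of ES: search. Search comes in two modes, filter and query. Filter mode simply selects the records that satisfy the filter conditions and returns them as the result. Query mode takes two steps: it first selects the matching records and then computes a relevance score for each of them, so the only difference is the extra scoring pass. If the first goal is to reproduce the query features of a traditional database, filter mode is sufficient. Filter mode can still rely on the search engine's text analysis (tokenization) to produce high-quality results, and filters can be cached, so they run more efficiently. A traditional database management system does not offer these capabilities. In ES, filter mode is implemented inside the bool query, as follows: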

  GET /_search
  {
    "query": {
      "bool": {
        "filter": [
          { "term":  { "status": "published" }},
          { "range": { "publish_date": { "gte": "2015-01-01" }}}
        ]
      }
    }
  }

Here is the simplest possible example:

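  import com.sksamuel.elastic4s.ElasticDsl._   // brings search, boolQuery, termQuery, ... into scope

  // A bool query whose only clause is a filter: an exact term match on city.keyword
  val filterTerm = search("bank")
    .query(
      boolQuery().filter(termQuery("city.keyword", "Brogan"))
    )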

The request JSON it generates is:

  POST /bank/_search
  {
    "query": {
      "bool": {
        "filter": [
          { "term": { "city.keyword": { "value": "Brogan" } } }
        ]
      }
    }
  }

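A note on this request: it is a term query (termQuery), which requires an exact match, including letter case. It will therefore generally not match a field whose values have been processed by an analyzer, which is why the query targets city.keyword.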
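The reason for choosing city.keyword can be illustrated with a toy model. This is only a minimal sketch and not how Elasticsearch is implemented: `analyze` is a hypothetical stand-in for an analyzer (assumed here to split on non-alphanumeric characters and lowercase each token), `keyword` models a keyword sub-field that stores the whole value as a single unmodified term, and `termMatches` mimics a term query by looking the search term up verbatim.

```scala
object TermQuerySketch {
  // Hypothetical stand-in for an analyzer (an assumption, not the ES code):
  // split on non-alphanumeric characters and lowercase every token.
  def analyze(text: String): Set[String] =
    text.split("[^\\p{Alnum}]+").filter(_.nonEmpty).map(_.toLowerCase).toSet

  // A keyword sub-field keeps the whole value as one unmodified term.
  def keyword(text: String): Set[String] = Set(text)

  // A term query does not analyze its input: the term is compared verbatim.
  def termMatches(indexedTerms: Set[String], term: String): Boolean =
    indexedTerms.contains(term)

  def main(args: Array[String]): Unit = {
    val city = "Brogan"
    // The analyzed field holds the lowercased token, so the lookup misses.
    println(termMatches(analyze(city), "Brogan"))
    // The keyword field holds the original value, so the lookup hits.
    println(termMatches(keyword(city), "Brogan"))
  }
}
```

In this model the term "Brogan" is found only in the keyword terms, because the analyzed terms contain the lowercased token instead.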

The JSON returned for the query:

  {
    "took" : 1,
    "timed_out" : false,
    "_shards" : {
      "total" : 1,
      "successful" : 1,
      "skipped" : 0,
      "failed" : 0
    },
    "hits" : {
      "total" : {
        "value" : 1,
        "relation" : "eq"
      },
      "max_score" : 0.0,
      "hits" : [
        {
          "_index" : "bank",
          "_type" : "_doc",
          "_id" : "1",
          "_score" : 0.0,
          "_source" : {
            "account_number" : 1,
            "balance" : 39225,
            "firstname" : "Amber",
            "lastname" : "Duke",
            "age" : 32,
            "gender" : "M",
            "address" : "880 Holmes Lane",
            "employer" : "Pyrami",
            "email" : "amberduke@pyrami.com",
            "city" : "Brogan",
            "state" : "IL"
          }
        }
      ]
    }
  }

Let's see how elastic4s represents the JSON result above. First, the return type is Response[SearchResponse]. The Response type is defined as follows:

  sealed trait Response[+U] {
    def status: Int                  // the http status code of the response
    def body: Option[String]         // the http response body if the response included one
    def headers: Map[String, String] // any http headers included in the response
    def result: U                    // returns the marshalled response U or throws an exception
    def error: ElasticError          // returns the error or throw an exception
    def isError: Boolean             // returns true if this is an error response
    final def isSuccess: Boolean = !isError // returns true if this is a success

    def map[V](f: U => V): Response[V]
    def flatMap[V](f: U => Response[V]): Response[V]

    final def fold[V](ifError: => V)(f: U => V): V = if (isError) ifError else f(result)
    final def fold[V](onError: RequestFailure => V, onSuccess: U => V): V = this match {
      case failure: RequestFailure         => onError(failure)
      case RequestSuccess(_, _, _, result) => onSuccess(result)
    }
    final def foreach[V](f: U => V): Unit = if (!isError) f(result)
    final def toOption: Option[U]         = if (isError) None else Some(result)
  }

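Response[+U] is a generic type, covariant in U. With U replaced by SearchResponse, the returned value is available through def result: SearchResponse. status is the standard HTTP status code, isError and isSuccess tell whether the request succeeded, and error carries the details of the failure. The response headers are in headers. Next, the definition of the SearchResponse class: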

  case class SearchResponse(took: Long,
                            @JsonProperty("timed_out") isTimedOut: Boolean,
                            @JsonProperty("terminated_early") isTerminatedEarly: Boolean,
                            private val suggest: Map[String, Seq[SuggestionResult]],
                            @JsonProperty("_shards") private val _shards: Shards,
                            @JsonProperty("_scroll_id") scrollId: Option[String],
                            @JsonProperty("aggregations") private val _aggregationsAsMap: Map[String, Any],
                            hits: SearchHits) {...}

  case class SearchHits(total: Total,
                        @JsonProperty("max_score") maxScore: Double,
                        hits: Array[SearchHit]) {
    def size: Long = hits.length
    def isEmpty: Boolean = hits.isEmpty
    def nonEmpty: Boolean = hits.nonEmpty
  }

  case class SearchHit(@JsonProperty("_id") id: String,
                       @JsonProperty("_index") index: String,
                       @JsonProperty("_type") `type`: String,
                       @JsonProperty("_version") version: Long,
                       @JsonProperty("_seq_no") seqNo: Long,
                       @JsonProperty("_primary_term") primaryTerm: Long,
                       @JsonProperty("_score") score: Float,
                       @JsonProperty("_parent") parent: Option[String],
                       @JsonProperty("_shard") shard: Option[String],
                       @JsonProperty("_node") node: Option[String],
                       @JsonProperty("_routing") routing: Option[String],
                       @JsonProperty("_explanation") explanation: Option[Explanation],
                       @JsonProperty("sort") sort: Option[Seq[AnyRef]],
                       private val _source: Map[String, AnyRef],
                       fields: Map[String, AnyRef],
                       @JsonProperty("highlight") private val _highlight: Option[Map[String, Seq[String]]],
                       private val inner_hits: Map[String, Map[String, Any]],
                       @JsonProperty("matched_queries") matchedQueries: Option[Set[String]])
    extends Hit {...}

The important parts of the result, such as _score, _source and fields, all live in SearchHit. A complete example of handling the returned result:

  val filterTerm = client.execute(
    search("bank")
      .query(
        boolQuery().filter(termQuery("city.keyword", "Brogan"))
      )
  ).await

  if (filterTerm.isSuccess) {
    if (filterTerm.result.nonEmpty)
      filterTerm.result.hits.hits.foreach { hit => println(hit.sourceAsMap) }
  } else println(s"Error: ${filterTerm.error.reason}")

Among the traditional query styles, prefix queries are used quite often:

  POST /bank/_search
  {
    "query": {
      "bool": {
        "filter": [
          { "prefix": { "city.keyword": { "value": "Bro" } } }
        ]
      }
    }
  }

  val filterPrefix = client.execute(
    search("bank")
      .query(
        boolQuery().filter(prefixQuery("city.keyword", "Bro"))
      )
      .sourceInclude("address", "city", "state")   // return only these _source fields
  ).await

  if (filterPrefix.isSuccess) {
    if (filterPrefix.result.nonEmpty)
      filterPrefix.result.hits.hits.foreach { hit => println(hit.sourceAsMap) }
  } else println(s"Error: ${filterPrefix.error.reason}")

  ....

  Map(address -> 880 Holmes Lane, city -> Brogan, state -> IL)
  Map(address -> 810 Nostrand Avenue, city -> Brooktrails, state -> GA)
  Map(address -> 295 Whitty Lane, city -> Broadlands, state -> VT)
  Map(address -> 511 Heath Place, city -> Brookfield, state -> OK)
  Map(address -> 918 Bridge Street, city -> Brownlee, state -> HI)
  Map(address -> 806 Pierrepont Place, city -> Brownsville, state -> MI)
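Each hit above is printed as a raw Map. As a minimal sketch of turning such a map into a typed value: `AddressInfo` and `fromSource` are hypothetical names introduced here for illustration, and the input is assumed to have the shape of `hit.sourceAsMap`, that is `Map[String, AnyRef]`. The sample map is built by hand from the first output line so the snippet runs without a cluster.

```scala
object SourceDecodingSketch {
  // Hypothetical target type for the three fields requested via sourceInclude.
  final case class AddressInfo(address: String, city: String, state: String)

  // Read a key only if it is present and actually holds a String.
  private def str(source: Map[String, AnyRef], key: String): Option[String] =
    source.get(key).collect { case s: String => s }

  // Yields None when any of the three fields is missing or not a String.
  def fromSource(source: Map[String, AnyRef]): Option[AddressInfo] =
    for {
      address <- str(source, "address")
      city    <- str(source, "city")
      state   <- str(source, "state")
    } yield AddressInfo(address, city, state)

  def main(args: Array[String]): Unit = {
    // Hand-built stand-in for hit.sourceAsMap
    val sample: Map[String, AnyRef] =
      Map("address" -> "880 Holmes Lane", "city" -> "Brogan", "state" -> "IL")
    println(fromSource(sample))
  }
}
```

Against a real response, the same function could presumably be applied to every hit, for example by mapping `hit => fromSource(hit.sourceAsMap)` over the hits array and dropping the None values.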

Regular-expression queries are available as well:

  POST /bank/_search
  {
    "query": {
      "bool": {
        "filter": [
          { "regexp": { "address.keyword": { "value": ".*bridge.*" } } }
        ]
      }
    }
  }

  val filterRegex = client.execute(
    search("bank")
      .query(
        boolQuery().filter(regexQuery("address.keyword", ".*bridge.*"))
      )
      .sourceInclude("address", "city", "state")
  ).await

  if (filterRegex.isSuccess) {
    if (filterRegex.result.nonEmpty)
      filterRegex.result.hits.hits.foreach { hit => println(hit.sourceAsMap) }
  } else println(s"Error: ${filterRegex.error.reason}")

  ....

  Map(address -> 384 Bainbridge Street, city -> Elizaville, state -> MS)
  Map(address -> 721 Cambridge Place, city -> Efland, state -> ID)
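Two details of these results are worth spelling out. The address "918 Bridge Street" was returned by the prefix query earlier but does not appear here, which is consistent with matching on a keyword field being case-sensitive. In addition, an ES regexp pattern has to match the entire term, which is why the pattern carries `.*` on both sides. The following is a toy model only, not Elasticsearch code: `prefixMatches` and `regexpMatches` are hypothetical helpers, and Java's `String.matches` is used because it is anchored to the whole string in the same way. Java and Lucene regular-expression syntax differ in general; the simple pattern used here is assumed to mean the same thing in both.

```scala
object KeywordMatchSketch {
  // Prefix query against a keyword field, modeled as a case-sensitive startsWith.
  def prefixMatches(term: String, prefix: String): Boolean =
    term.startsWith(prefix)

  // Regexp query, modeled with String.matches: the pattern must cover the whole term.
  def regexpMatches(term: String, pattern: String): Boolean =
    term.matches(pattern)

  def main(args: Array[String]): Unit = {
    val addresses = Seq("384 Bainbridge Street", "721 Cambridge Place", "918 Bridge Street")
    // Print, for each address, whether the anchored pattern accepts it.
    addresses.foreach { a =>
      println(s"$a -> ${regexpMatches(a, ".*bridge.*")}")
    }
  }
}
```

In this model the bare pattern `bridge` accepts none of the addresses, since it would have to equal the whole term.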

Of course, ES uses the bool query to build compound queries, so a bool query can itself be nested inside the filter clause, as follows:

  POST /bank/_search
  {
    "query": {
      "bool": {
        "filter": [
          { "regexp": { "address.keyword": { "value": ".*bridge.*" } } },
          {
            "bool": {
              "must": [
                { "match": { "lastname": "lane" } }
              ]
            }
          }
        ]
      }
    }
  }

The elastic4s QueryDSL statement and the returned result are as follows:

  val filterBool = client.execute(
    search("bank")
      .query(
        boolQuery().filter(
          regexQuery("address.keyword", ".*bridge.*"),
          boolQuery().must(matchQuery("lastname", "lane"))
        )
      )
      .sourceInclude("lastname", "address", "city", "state")
  ).await

  if (filterBool.isSuccess) {
    if (filterBool.result.nonEmpty)
      filterBool.result.hits.hits.foreach { hit => println(s"score: ${hit.score}, ${hit.sourceAsMap}") }
  } else println(s"Error: ${filterBool.error.reason}")

  ...

  score: 0.0, Map(address -> 384 Bainbridge Street, city -> Elizaville, state -> MS, lastname -> Lane)

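score: 0.0 shows that filter performs no scoring, even for the match clause nested inside the filter. Skipping the scoring step is presumably part of why filter mode runs more efficiently.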
