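We can now turn to the core of ES: search. A search clause runs in one of two modes (contexts), filter or query. In filter mode, ES simply selects the records that satisfy the conditions. Query mode does the same selection and then computes a relevance score for every matching record, so the only difference is the extra scoring step. If the first goal is to reproduce traditional database queries, filter mode is sufficient. Filters can still take advantage of the search engine's text analysis (tokenization) to produce high-quality results, and because filter results can be cached they usually run faster. A traditional database management system does not offer this combination. In ES, filter mode is expressed inside the bool query, as follows: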

    GET /_search
    {
      "query": {
        "bool": {
          "filter": [
            { "term": { "status": "published" }},
            { "range": { "publish_date": { "gte": "2015-01-01" }}}
          ]
        }
      }
    }

Here is the simplest possible example in elastic4s:

    val filterTerm = search("bank")
      .query(
        boolQuery().filter(termQuery("city.keyword", "Brogan")))

It produces the following request JSON:

    POST /bank/_search
    {
      "query": {
        "bool": {
          "filter": [
            {
              "term": { "city.keyword": { "value": "Brogan" } }
            }
          ]
        }
      }
    }

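A note on this request: it is a term query (termQuery), which requires an exact match on the indexed term, including case. The search term itself is not analyzed, so it generally will not match a field whose values went through an analyzer (which typically tokenizes and lowercases them). That is why the query targets city.keyword rather than city.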
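To make the point about analyzed fields concrete, here is a small stand-alone sketch in plain Scala. It does not call ES; it only models the idea, under the assumption that the text field's analyzer splits on non-alphanumeric characters and lowercases, roughly like the standard analyzer. The object and function names are made up for illustration.

```scala
// Simplified model of what a term query sees in a text field versus its keyword sub-field.
// Assumption: the analyzer splits on non-alphanumerics and lowercases; the real
// Elasticsearch analyzers do more than this.
object TermMatchSketch {
  // Terms indexed for an analyzed `text` field.
  def analyzedTerms(value: String): Set[String] =
    value.split("[^\\p{Alnum}]+").filter(_.nonEmpty).map(_.toLowerCase).toSet

  // A `keyword` field indexes the whole value as a single unmodified term.
  def keywordTerms(value: String): Set[String] = Set(value)

  // A term query looks up the search term as-is, without analysis.
  def termMatches(indexedTerms: Set[String], searchTerm: String): Boolean =
    indexedTerms.contains(searchTerm)

  def main(args: Array[String]): Unit = {
    val city = "Brogan"
    println(termMatches(analyzedTerms(city), "Brogan")) // false: the indexed term is "brogan"
    println(termMatches(keywordTerms(city), "Brogan"))  // true: the exact value is indexed
  }
}
```

In this model the term "Brogan" only matches the keyword variant, which mirrors why the request above uses city.keyword.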

The response JSON:

    {
      "took" : 1,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 1,
          "relation" : "eq"
        },
        "max_score" : 0.0,
        "hits" : [
          {
            "_index" : "bank",
            "_type" : "_doc",
            "_id" : "1",
            "_score" : 0.0,
            "_source" : {
              "account_number" : 1,
              "balance" : 39225,
              "firstname" : "Amber",
              "lastname" : "Duke",
              "age" : 32,
              "gender" : "M",
              "address" : "880 Holmes Lane",
              "employer" : "Pyrami",
              "email" : "amberduke@pyrami.com",
              "city" : "Brogan",
              "state" : "IL"
            }
          }
        ]
      }
    }

Let's see how elastic4s represents the JSON result above. The return type is Response[SearchResponse], and Response is defined as follows:

    sealed trait Response[+U] {
      def status: Int                  // the http status code of the response
      def body: Option[String]         // the http response body if the response included one
      def headers: Map[String, String] // any http headers included in the response
      def result: U                    // returns the marshalled response U or throws an exception
      def error: ElasticError          // returns the error or throw an exception
      def isError: Boolean             // returns true if this is an error response
      final def isSuccess: Boolean = !isError // returns true if this is a success
      def map[V](f: U => V): Response[V]
      def flatMap[V](f: U => Response[V]): Response[V]
      final def fold[V](ifError: => V)(f: U => V): V = if (isError) ifError else f(result)
      final def fold[V](onError: RequestFailure => V, onSuccess: U => V): V = this match {
        case failure: RequestFailure => onError(failure)
        case RequestSuccess(_, _, _, result) => onSuccess(result)
      }
      final def foreach[V](f: U => V): Unit = if (!isError) f(result)
      final def toOption: Option[U] = if (isError) None else Some(result)
    }

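Response[+U] is a generic trait, covariant in U. With U replaced by SearchResponse, the result value is obtained through def result: SearchResponse. status is the standard HTTP status code, isError and isSuccess report whether the request failed or succeeded, and error carries the error details (an ElasticError). The response headers are in headers. Next, let's look at the definition of SearchResponse: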
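The fold, toOption and isSuccess members are easier to appreciate with a runnable toy. The sketch below is not elastic4s code: Resp, Ok and Failed are hypothetical, trimmed-down stand-ins for Response, RequestSuccess and RequestFailure, kept only to show how the curried fold avoids touching result on a failed response.

```scala
object ResponseSketch {
  // Hypothetical stand-in for elastic4s's Response, reduced to a few members.
  sealed trait Resp[+U] {
    def status: Int
    def isError: Boolean
    def result: U // throws on a failed response, like the original
    final def isSuccess: Boolean = !isError
    final def toOption: Option[U] = if (isError) None else Some(result)
    final def fold[V](ifError: => V)(f: U => V): V = if (isError) ifError else f(result)
  }

  // Stand-in for RequestSuccess.
  final case class Ok[U](status: Int, result: U) extends Resp[U] {
    val isError = false
  }

  // Stand-in for RequestFailure; `reason` is an assumed field for this sketch.
  final case class Failed(status: Int, reason: String) extends Resp[Nothing] {
    val isError = true
    def result: Nothing = throw new NoSuchElementException(s"request failed: $reason")
  }

  // fold picks the fallback on error and only reads `result` on success.
  def describe(r: Resp[Seq[String]]): String =
    r.fold("no result")(hits => s"${hits.size} hit(s)")

  def main(args: Array[String]): Unit = {
    println(describe(Ok(200, Seq("Brogan"))))                   // 1 hit(s)
    println(describe(Failed(404, "index_not_found_exception"))) // no result
  }
}
```

Using fold or toOption this way keeps the success and failure paths in one expression instead of an if/else on isSuccess.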

    case class SearchResponse(took: Long,
                              @JsonProperty("timed_out") isTimedOut: Boolean,
                              @JsonProperty("terminated_early") isTerminatedEarly: Boolean,
                              private val suggest: Map[String, Seq[SuggestionResult]],
                              @JsonProperty("_shards") private val _shards: Shards,
                              @JsonProperty("_scroll_id") scrollId: Option[String],
                              @JsonProperty("aggregations") private val _aggregationsAsMap: Map[String, Any],
                              hits: SearchHits) {...}

    case class SearchHits(total: Total,
                          @JsonProperty("max_score") maxScore: Double,
                          hits: Array[SearchHit]) {
      def size: Long = hits.length
      def isEmpty: Boolean = hits.isEmpty
      def nonEmpty: Boolean = hits.nonEmpty
    }

    case class SearchHit(@JsonProperty("_id") id: String,
                         @JsonProperty("_index") index: String,
                         @JsonProperty("_type") `type`: String,
                         @JsonProperty("_version") version: Long,
                         @JsonProperty("_seq_no") seqNo: Long,
                         @JsonProperty("_primary_term") primaryTerm: Long,
                         @JsonProperty("_score") score: Float,
                         @JsonProperty("_parent") parent: Option[String],
                         @JsonProperty("_shard") shard: Option[String],
                         @JsonProperty("_node") node: Option[String],
                         @JsonProperty("_routing") routing: Option[String],
                         @JsonProperty("_explanation") explanation: Option[Explanation],
                         @JsonProperty("sort") sort: Option[Seq[AnyRef]],
                         private val _source: Map[String, AnyRef],
                         fields: Map[String, AnyRef],
                         @JsonProperty("highlight") private val _highlight: Option[Map[String, Seq[String]]],
                         private val inner_hits: Map[String, Map[String, Any]],
                         @JsonProperty("matched_queries") matchedQueries: Option[Set[String]])
        extends Hit {...}

The important parts of the result, such as _score, _source and fields, all live in SearchHit. A complete example of handling the result:

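    // Assumes import com.sksamuel.elastic4s.ElasticDsl._ and an ElasticClient named `client` in scope.
    // Run the search and block until the response arrives.
    val filterTerm = client.execute(search("bank")
      .query(
        boolQuery().filter(termQuery("city.keyword", "Brogan")))).await

    if (filterTerm.isSuccess) {
      // Print the _source of every hit as a Map.
      if (filterTerm.result.nonEmpty)
        filterTerm.result.hits.hits.foreach { hit => println(hit.sourceAsMap) }
    } else println(s"Error: ${filterTerm.error.reason}")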
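Printing the map is fine for a demo, but real code usually wants typed values. The following plain-Scala sketch shows one way to turn such a source map into a case class. It assumes the map has the shape Map[String, AnyRef], as the _source field of SearchHit suggests; Account and toAccount are hypothetical names introduced here, not part of elastic4s.

```scala
object SourceMappingSketch {
  // Hypothetical target type; field names follow the `_source` shown earlier.
  final case class Account(firstname: String, lastname: String, city: String, balance: Long)

  // Convert one source map into an Account.
  // Returns None when a field is missing or has an unexpected type.
  def toAccount(source: Map[String, AnyRef]): Option[Account] =
    for {
      first   <- source.get("firstname").collect { case s: String => s }
      last    <- source.get("lastname").collect { case s: String => s }
      city    <- source.get("city").collect { case s: String => s }
      balance <- source.get("balance").collect { case n: java.lang.Number => n.longValue }
    } yield Account(first, last, city, balance)

  def main(args: Array[String]): Unit = {
    // Values copied from the sample hit above; numbers arrive boxed in a Map[String, AnyRef].
    val source: Map[String, AnyRef] = Map(
      "firstname" -> "Amber", "lastname" -> "Duke",
      "city" -> "Brogan", "balance" -> Int.box(39225))
    println(toAccount(source))          // Some(Account(Amber,Duke,Brogan,39225))
    println(toAccount(source - "city")) // None
  }
}
```

With the response from the example above, this would be applied as hit => toAccount(hit.sourceAsMap) inside the foreach.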

Among traditional query styles, prefix queries are used quite often:

    POST /bank/_search
    {
      "query": {
        "bool": {
          "filter": [
            {
              "prefix": { "city.keyword": { "value": "Bro" } }
            }
          ]
        }
      }
    }

The elastic4s version, followed by its output:

    val filterPrefix = client.execute(search("bank")
      .query(
        boolQuery().filter(prefixQuery("city.keyword", "Bro")))
      .sourceInclude("address", "city", "state") // return only these _source fields
    ).await

    if (filterPrefix.isSuccess) {
      if (filterPrefix.result.nonEmpty)
        filterPrefix.result.hits.hits.foreach { hit => println(hit.sourceAsMap) }
    } else println(s"Error: ${filterPrefix.error.reason}")

    ....
    Map(address -> 880 Holmes Lane, city -> Brogan, state -> IL)
    Map(address -> 810 Nostrand Avenue, city -> Brooktrails, state -> GA)
    Map(address -> 295 Whitty Lane, city -> Broadlands, state -> VT)
    Map(address -> 511 Heath Place, city -> Brookfield, state -> OK)
    Map(address -> 918 Bridge Street, city -> Brownlee, state -> HI)
    Map(address -> 806 Pierrepont Place, city -> Brownsville, state -> MI)

Regular-expression queries are available as well:

    POST /bank/_search
    {
      "query": {
        "bool": {
          "filter": [
            {
              "regexp": { "address.keyword": { "value": ".*bridge.*" } }
            }
          ]
        }
      }
    }

    val filterRegex = client.execute(search("bank")
      .query(
        boolQuery().filter(regexQuery("address.keyword", ".*bridge.*")))
      .sourceInclude("address", "city", "state")
    ).await

    if (filterRegex.isSuccess) {
      if (filterRegex.result.nonEmpty)
        filterRegex.result.hits.hits.foreach { hit => println(hit.sourceAsMap) }
    } else println(s"Error: ${filterRegex.error.reason}")

    ....
    Map(address -> 384 Bainbridge Street, city -> Elizaville, state -> MS)
    Map(address -> 721 Cambridge Place, city -> Efland, state -> ID)
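Two details of this result are worth a quick check: the pattern needs .* on both sides because a regexp query has to match the whole term, and on a keyword field the match is case-sensitive, which is why "918 Bridge Street" from the prefix example does not show up. The sketch below reproduces both effects with java.lang.String.matches, which also requires a whole-string match. It is only an approximation: Lucene's regexp syntax differs from Java's in general, although for this simple pattern the behaviour is the same. The object and function names are illustrative.

```scala
object RegexpSketch {
  // String.matches succeeds only if the pattern covers the entire string,
  // mirroring how a regexp query is matched against the whole keyword term.
  def regexpMatches(term: String, pattern: String): Boolean = term.matches(pattern)

  def main(args: Array[String]): Unit = {
    // Addresses taken from the outputs shown in this article.
    val addresses = Seq("384 Bainbridge Street", "721 Cambridge Place", "918 Bridge Street")

    // Lowercase "bridge" anywhere in the term: "Bridge" with a capital B is excluded.
    println(addresses.filter(regexpMatches(_, ".*bridge.*")))
    // → List(384 Bainbridge Street, 721 Cambridge Place)

    // Without the surrounding .* nothing matches, because the whole term must match.
    println(addresses.filter(regexpMatches(_, "bridge")))
    // → List()
  }
}
```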

Of course, ES builds compound queries out of bool queries, so we can also nest a bool query inside the filter clause, like this:

    POST /bank/_search
    {
      "query": {
        "bool": {
          "filter": [
            {
              "regexp": { "address.keyword": { "value": ".*bridge.*" } }
            },
            {
              "bool": {
                "must": [
                  { "match" : { "lastname" : "lane" } }
                ]
              }
            }
          ]
        }
      }
    }

The elastic4s query DSL statement and its result:

    val filterBool = client.execute(search("bank")
      .query(
        boolQuery().filter(regexQuery("address.keyword", ".*bridge.*"),
          boolQuery().must(matchQuery("lastname", "lane"))))
      .sourceInclude("lastname", "address", "city", "state")
    ).await

    if (filterBool.isSuccess) {
      if (filterBool.result.nonEmpty)
        filterBool.result.hits.hits.foreach { hit => println(s"score: ${hit.score}, ${hit.sourceAsMap}") }
    } else println(s"Error: ${filterBool.error.reason}")

    ...
    score: 0.0, Map(address -> 384 Bainbridge Street, city -> Elizaville, state -> MS, lastname -> Lane)

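The score of 0.0 shows that filter does not compute scores, not even for the match clause nested in the inner bool. Skipping the scoring step should make execution somewhat more efficient.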
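One way to picture filter context is as a plain conjunction of yes/no predicates with a constant score attached. The toy model below is written in plain Scala and is in no way how ES is implemented; Doc, Filter, regexpOn, matchOn, boolFilter and search are all names invented for this sketch. The second document leaves out lastname because the article's output does not show it.

```scala
object FilterContextSketch {
  type Doc = Map[String, String]
  type Filter = Doc => Boolean

  // Whole-value, case-sensitive regular expression, like regexp on a keyword field.
  def regexpOn(field: String, pattern: String): Filter =
    doc => doc.get(field).exists(_.matches(pattern))

  // Case-insensitive word match, a rough stand-in for match on an analyzed text field.
  def matchOn(field: String, word: String): Filter =
    doc => doc.get(field).exists(_.toLowerCase.split("\\s+").contains(word.toLowerCase))

  // Every clause must hold; there is nothing to score, only true or false.
  def boolFilter(clauses: Filter*): Filter =
    doc => clauses.forall(clause => clause(doc))

  // Hits from a pure filter all carry the same constant score of 0.0.
  def search(docs: Seq[Doc], filter: Filter): Seq[(Double, Doc)] =
    docs.filter(filter).map(doc => (0.0, doc))

  val docs: Seq[Doc] = Seq(
    Map("lastname" -> "Lane", "address" -> "384 Bainbridge Street"),
    Map("address" -> "721 Cambridge Place") // lastname not shown in the article, so omitted
  )

  // Same shape as the request above: a regexp clause plus a nested bool with one match clause.
  val combined: Filter =
    boolFilter(regexpOn("address", ".*bridge.*"), boolFilter(matchOn("lastname", "lane")))

  def main(args: Array[String]): Unit =
    search(docs, combined).foreach { case (score, doc) => println(s"score: $score, $doc") }
}
```

Only the Bainbridge Street document passes both clauses, and it comes back with score 0.0, just as in the real result.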

 

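Copyright notice: this is an original article by tiger-xc, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.
Article link: https://www.cnblogs.com/tiger-xc/p/12782333.html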