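search(14) - elastic4s - aggregation scope: global, filter, and post-filter buckets
Aggregations normally operate within the scope of the query. An aggregation request that carries no query is actually computed within the scope of a match_all {} query: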
GET /cartxns/_search
{
"aggs": {
"all_colors": {
"terms": {"field" : "color.keyword"}
}
}
}
GET /cartxns/_search
{
"query": {
"match_all": {}
},
"aggs": {
"all_colors": {
"terms": {"field" : "color.keyword"}
}
}
}
The two requests above return the same result:
"aggregations" : {
"all_colors" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "red",
"doc_count" : 4
},
{
"key" : "blue",
"doc_count" : 2
},
{
"key" : "green",
"doc_count" : 2
}
]
}
}
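The equivalence can be pictured with a small in-memory sketch in plain Scala (no Elasticsearch or elastic4s involved). Everything here is hypothetical and for illustration only: the `CarTxn` case class, the `termsByColor` helper, and the data, which contains only the five documents visible in the responses of this article rather than the full index, so the counts differ from the output above.

```scala
object DefaultScopeSketch {
  // Hypothetical in-memory stand-in for a document of the cartxns index.
  final case class CarTxn(price: Int, color: String, make: String, sold: String)

  // Assumption: only the five documents shown in this article, not the whole index.
  val docs: List[CarTxn] = List(
    CarTxn(30000, "green", "ford",  "2014-05-18"),
    CarTxn(25000, "blue",  "ford",  "2014-02-12"),
    CarTxn(10000, "red",   "honda", "2014-10-28"),
    CarTxn(20000, "red",   "honda", "2014-11-05"),
    CarTxn(20000, "red",   "honda", "2014-11-05")
  )

  // A terms-style count per color over whatever the query selects.
  // The default predicate accepts everything and plays the role of match_all.
  def termsByColor(query: CarTxn => Boolean = _ => true): Map[String, Int] =
    docs.filter(query).groupBy(_.color).map { case (color, ds) => color -> ds.size }

  val withoutQuery: Map[String, Int] = termsByColor()          // no query given
  val withMatchAll: Map[String, Int] = termsByColor(_ => true) // explicit match_all

  def main(args: Array[String]): Unit =
    println(withoutQuery == withMatchAll) // prints "true"
}
```

Leaving the query out and passing an accept-everything predicate select the same documents, so the aggregation cannot tell the two requests apart.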
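Most of the time we want aggregations computed within the query scope, but sometimes we also need a figure that ignores every query condition. For example, while computing the average selling price of one make, we may also want to know the average selling price across all makes. That overall average is a global bucket aggregation: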
GET /cartxns/_search
{
"query" : {
"match" : {"make.keyword": "ford"}
},
"aggs": {
"avg_ford": {
"avg": {
"field": "price"
}
},
"avg_all" : {
"global": {},
"aggs": {
"avg_price": {
"avg": {"field": "price"}
}
}
}
}
}
The search hits and the aggregation results are:
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.2809337,
"hits" : [
{
"_index" : "cartxns",
"_type" : "_doc",
"_id" : "NGVXAnIBSDa1Wo5UqLc3",
"_score" : 1.2809337,
"_source" : {
"price" : 30000,
"color" : "green",
"make" : "ford",
"sold" : "2014-05-18"
}
},
{
"_index" : "cartxns",
"_type" : "_doc",
"_id" : "OWVYAnIBSDa1Wo5UTrf8",
"_score" : 1.2809337,
"_source" : {
"price" : 25000,
"color" : "blue",
"make" : "ford",
"sold" : "2014-02-12"
}
}
]
},
"aggregations" : {
"avg_all" : {
"doc_count" : 8,
"avg_price" : {
"value" : 26500.0
}
},
"avg_ford" : {
"value" : 27500.0
}
}
Expressed in elastic4s:
// Assumes `import com.sksamuel.elastic4s.ElasticDsl._` and an existing ElasticClient named `client`.
val aggGlob = search("cartxns").query(
  matchQuery("make.keyword","ford")
).aggregations(
  // computed within the query scope (ford only)
  avgAggregation("single_avg").field("price"),
  // global bucket: ignores the query and covers every document in the index
  globalAggregation("all_avg").subaggs(
    avgAggregation("avg_price").field("price")
  )
)
println(aggGlob.show)   // print the generated request
val globResult = client.execute(aggGlob).await
if (globResult.isSuccess) {
  val gavg = globResult.result.aggregations.global("all_avg").avg("avg_price")
  val savg = globResult.result.aggregations.avg("single_avg")
  println(s"${savg.value},${gavg.value}")
  globResult.result.hits.hits.foreach(h => println(s"${h.sourceAsMap}"))
} else println(s"error: ${globResult.error.causedBy.getOrElse("unknown")}")
...
POST:/cartxns/_search?
StringEntity({"query":{"match":{"make.keyword":{"query":"ford"}}},"aggs":{"single_avg":{"avg":{"field":"price"}},"all_avg":{"global":{},"aggs":{"avg_price":{"avg":{"field":"price"}}}}}},Some(application/json))
27500.0,26500.0
Map(price -> 30000, color -> green, make -> ford, sold -> 2014-05-18)
Map(price -> 25000, color -> blue, make -> ford, sold -> 2014-02-12)
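To make the two scopes concrete, here is a minimal in-memory sketch in plain Scala that mimics the request above without touching Elasticsearch. The `CarTxn` case class, the `avgPrice` helper, and the object name are hypothetical. The data is an assumption too: it holds only the five documents visible in this article, while the real index holds eight, so the global average computed here is not the 26500.0 returned by the cluster.

```scala
object GlobalScopeSketch {
  // Hypothetical in-memory stand-in for a document of the cartxns index.
  final case class CarTxn(price: Int, color: String, make: String, sold: String)

  // Assumption: only the five documents shown in this article, not the whole index.
  val docs: List[CarTxn] = List(
    CarTxn(30000, "green", "ford",  "2014-05-18"),
    CarTxn(25000, "blue",  "ford",  "2014-02-12"),
    CarTxn(10000, "red",   "honda", "2014-10-28"),
    CarTxn(20000, "red",   "honda", "2014-11-05"),
    CarTxn(20000, "red",   "honda", "2014-11-05")
  )

  def avgPrice(ds: List[CarTxn]): Double =
    if (ds.isEmpty) Double.NaN else ds.map(_.price.toDouble).sum / ds.size

  // "query": keep only the ford transactions
  val hits: List[CarTxn] = docs.filter(_.make == "ford")

  // ordinary aggregation: sees the query hits only
  val avgFord: Double = avgPrice(hits)

  // global bucket: steps outside the query and sees every document
  val avgAll: Double = avgPrice(docs)

  def main(args: Array[String]): Unit =
    println(s"$avgFord,$avgAll") // prints "27500.0,21000.0" for this five-document sample
}
```

The ford average matches the response above because both ford documents are in the sample; the global figure differs only because the sample is smaller than the index.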
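A filter bucket narrows the query results further before aggregating. For example: query all honda transactions, but aggregate only the honda sales of one particular month: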
GET /cartxns/_search
{
"query": {
"match": {
"make.keyword": "honda"
}
},
"aggs": {
"sales_this_month": {
"filter": {
"range" : {"sold" : { "from" : "2014-10-01", "to" : "2014-11-01" }}
},
"aggs": {
"month_total": {
"sum": {"field": "price"}
}
}
}
}
}
First, the query hits should be unaffected. In addition, we get the sales total for that month among the matched transactions:
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 0.9444616,
"hits" : [
{
"_index" : "cartxns",
"_type" : "_doc",
"_id" : "MmVXAnIBSDa1Wo5UqLc3",
"_score" : 0.9444616,
"_source" : {
"price" : 10000,
"color" : "red",
"make" : "honda",
"sold" : "2014-10-28"
}
},
{
"_index" : "cartxns",
"_type" : "_doc",
"_id" : "M2VXAnIBSDa1Wo5UqLc3",
"_score" : 0.9444616,
"_source" : {
"price" : 20000,
"color" : "red",
"make" : "honda",
"sold" : "2014-11-05"
}
},
{
"_index" : "cartxns",
"_type" : "_doc",
"_id" : "N2VXAnIBSDa1Wo5UqLc3",
"_score" : 0.9444616,
"_source" : {
"price" : 20000,
"color" : "red",
"make" : "honda",
"sold" : "2014-11-05"
}
}
]
},
"aggregations" : {
"sales_this_month" : {
"doc_count" : 1,
"month_total" : {
"value" : 10000.0
}
}
}
The elastic4s version:
// Assumes `import com.sksamuel.elastic4s.ElasticDsl._` and an existing ElasticClient named `client`.
val aggfilter = search("cartxns").query(
  matchQuery("make.keyword","honda")
).aggregations(
  // filter bucket: only query hits sold within the date range reach the sub-aggregation
  filterAgg("sales_the_month",rangeQuery("sold").gte("2014-10-01").lte("2014-11-01"))
    .subaggs(sumAggregation("monthly_sales").field("price"))
)
println(aggfilter.show)   // print the generated request
val filterResult = client.execute(aggfilter).await
if (filterResult.isSuccess) {
  val ms = filterResult.result.aggregations.filter("sales_the_month")
    .sum("monthly_sales").value
  println(s"${ms}")
  filterResult.result.hits.hits.foreach(h => println(s"${h.sourceAsMap}"))
} else println(s"error: ${filterResult.error.causedBy.getOrElse("unknown")}")
...
POST:/cartxns/_search?
StringEntity({"query":{"match":{"make.keyword":{"query":"honda"}}},"aggs":{"sales_the_month":{"filter":{"range":{"sold":{"gte":"2014-10-01","lte":"2014-11-01"}}},"aggs":{"monthly_sales":{"sum":{"field":"price"}}}}}},Some(application/json))
10000.0
Map(price -> 10000, color -> red, make -> honda, sold -> 2014-10-28)
Map(price -> 20000, color -> red, make -> honda, sold -> 2014-11-05)
Map(price -> 20000, color -> red, make -> honda, sold -> 2014-11-05)
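The behaviour of the filter bucket can be sketched in plain Scala over an in-memory list, again without Elasticsearch. The `CarTxn` case class and the object name are hypothetical, and the data is limited to the five documents visible in this article. Both date bounds are treated as inclusive, mirroring the gte/lte of the elastic4s request.

```scala
import java.time.LocalDate

object FilterBucketSketch {
  // Hypothetical in-memory stand-in for a document of the cartxns index.
  final case class CarTxn(price: Int, color: String, make: String, sold: LocalDate)

  // Assumption: only the five documents shown in this article, not the whole index.
  val docs: List[CarTxn] = List(
    CarTxn(30000, "green", "ford",  LocalDate.parse("2014-05-18")),
    CarTxn(25000, "blue",  "ford",  LocalDate.parse("2014-02-12")),
    CarTxn(10000, "red",   "honda", LocalDate.parse("2014-10-28")),
    CarTxn(20000, "red",   "honda", LocalDate.parse("2014-11-05")),
    CarTxn(20000, "red",   "honda", LocalDate.parse("2014-11-05"))
  )

  private val from = LocalDate.parse("2014-10-01")
  private val to   = LocalDate.parse("2014-11-01")

  // "query": all honda transactions; these are the hits returned to the caller
  val hits: List[CarTxn] = docs.filter(_.make == "honda")

  // filter bucket: a further restriction that applies to the aggregation only
  val bucket: List[CarTxn] =
    hits.filter(d => !d.sold.isBefore(from) && !d.sold.isAfter(to))

  // sub-aggregation: sum of price inside the bucket
  val monthTotal: Double = bucket.map(_.price.toDouble).sum

  def main(args: Array[String]): Unit =
    println(s"hits=${hits.size}, bucket=${bucket.size}, total=$monthTotal")
    // prints "hits=3, bucket=1, total=10000.0"
}
```

The hit list keeps all three honda documents, while the sum sees only the one sold in the chosen month, which is the same split between hits and doc_count shown in the response above.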
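The last one is post-filter. A post-filter also filters the query results, but it is applied only after the whole query has completed. In other words, if the request also contains aggregations, the aggregations are not affected by the filter: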
GET /cartxns/_search
{
"query": {
"match": {
"make.keyword": "ford"
}
},
"post_filter": {
"match" : {
"color.keyword" : "blue"
}
},
"aggs": {
"colors": {
"terms": {
"field": "color.keyword",
"size": 10
}
}
}
}
The query and aggregation results are:
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.2809337,
"hits" : [
{
"_index" : "cartxns",
"_type" : "_doc",
"_id" : "OWVYAnIBSDa1Wo5UTrf8",
"_score" : 1.2809337,
"_source" : {
"price" : 25000,
"color" : "blue",
"make" : "ford",
"sold" : "2014-02-12"
}
}
]
},
"aggregations" : {
"colors" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "blue",
"doc_count" : 1
},
{
"key" : "green",
"doc_count" : 1
}
]
}
}
As shown, the hits reflect the post-filter, while the aggregation is not affected by it.
elastic4s sample code:
// Assumes `import com.sksamuel.elastic4s.ElasticDsl._` and an existing ElasticClient named `client`.
val aggPost = search("cartxns").query(
  matchQuery("make.keyword","ford")
).postFilter(matchQuery("color.keyword","blue"))   // narrows the hits only
 .aggregations(
  // still computed over every document matched by the query
  termsAgg("colors","color.keyword")
)
println(aggPost.show)   // print the generated request
val postResult = client.execute(aggPost).await
if (postResult.isSuccess) {
  postResult.result.hits.hits.foreach(h => println(s"${h.sourceAsMap}"))
  postResult.result.aggregations.terms("colors").buckets
    .foreach(b => println(s"${b.key},${b.docCount}"))
} else println(s"error: ${postResult.error.causedBy.getOrElse("unknown")}")
...
POST:/cartxns/_search?
StringEntity({"query":{"match":{"make.keyword":{"query":"ford"}}},"post_filter":{"match":{"color.keyword":{"query":"blue"}}},"aggs":{"colors":{"terms":{"field":"color.keyword"}}}},Some(application/json))
Map(price -> 25000, color -> blue, make -> ford, sold -> 2014-02-12)
blue,1
green,1
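The ordering that makes post-filter useful can be sketched in plain Scala over an in-memory list. This is only an illustration of the semantics, not of how Elasticsearch implements them; the `CarTxn` case class and the object name are hypothetical, and the data is limited to the five documents visible in this article.

```scala
object PostFilterSketch {
  // Hypothetical in-memory stand-in for a document of the cartxns index.
  final case class CarTxn(price: Int, color: String, make: String, sold: String)

  // Assumption: only the five documents shown in this article, not the whole index.
  val docs: List[CarTxn] = List(
    CarTxn(30000, "green", "ford",  "2014-05-18"),
    CarTxn(25000, "blue",  "ford",  "2014-02-12"),
    CarTxn(10000, "red",   "honda", "2014-10-28"),
    CarTxn(20000, "red",   "honda", "2014-11-05"),
    CarTxn(20000, "red",   "honda", "2014-11-05")
  )

  // "query": all ford transactions
  val matched: List[CarTxn] = docs.filter(_.make == "ford")

  // aggregation: counted over everything the query matched, before post_filter
  val colorCounts: Map[String, Int] =
    matched.groupBy(_.color).map { case (color, ds) => color -> ds.size }

  // post_filter: applied last, and to the returned hits only
  val hits: List[CarTxn] = matched.filter(_.color == "blue")

  def main(args: Array[String]): Unit = {
    hits.foreach(println)
    colorCounts.foreach { case (color, n) => println(s"$color,$n") }
  }
}
```

Because the color counts are taken before the last filtering step, green still shows up in the aggregation even though the only returned hit is the blue ford, matching the response above.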