elasticsearch - introduction to aggregations
Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited
Introduction to Aggregations
facets (elasticsearch < 1.0)
facets
out-of-the-box facets (elasticsearch < 1.0)
• terms
• range
• histogram / date histogram
• filter/query
• statistical
• geo distance
terms stats facet
• Divides documents into buckets based on the value of a selected term
• Calculates statistics on some other field of these documents for each bucket
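The two steps above can be sketched in plain Python. This is only an illustration of the idea, not how Elasticsearch implements it: `terms_stats` is a hypothetical helper, and the sample documents are made up rather than taken from the cities index.

```python
from collections import defaultdict


def terms_stats(docs, key_field, value_field):
    """Group docs by key_field, then compute simple stats of value_field per group."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[key_field]].append(float(doc[value_field]))

    entries = []
    for term, values in groups.items():
        entries.append({
            "term": term,
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "total": sum(values),
            "mean": sum(values) / len(values),
        })
    # Largest buckets first (an assumption made for this sketch).
    entries.sort(key=lambda entry: entry["count"], reverse=True)
    return entries


# Made-up sample documents, not the real city data.
docs = [
    {"state": "CA", "density": 4000},
    {"state": "CA", "density": 6000},
    {"state": "TX", "density": 2500},
]

for entry in terms_stats(docs, "state", "density"):
    print(entry)
```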
index of large US cities
{
  "rank": "21",
  "city": "Boston",
  "state": "MA",
  "population2012": "636479",
  "population2010": "617594",
  "land_area": "48.277",
  "density": "12793",
  "ansi": "619463",
  "location": { "lat": "42.332", "lon": "71.0202" }
}
example: terms stats facet request
$ curl -XGET "localhost:9200/test-data/cities/_search?pretty" -d '{
  "facets": {
    "stat1": {
      "terms_stats": {
        "key_field": "state",
        "value_field": "density"
      }
    }
  }
}'
key_field: group by this field
value_field: calculate stats for this field
example: terms stats facet response
"facets" : {
  "stat1" : {
    "_type" : "terms_stats",
    "missing" : 0,
    "terms" : [
      { "term" : "CA", "count" : 69, "total_count" : 69, "min" : 1442.0, "max" : 17179.0, "total" : 383545.0, "mean" : 5558.623188405797 },
      { "term" : "TX", "count" : 32, "total_count" : 32, "min" : 1096.0, "max" : 3974.0, "total" : 79892.0, "mean" : 2496.625 },
      { "term" : "FL", "count" : 20, "total_count" : 20, "min" : 1100.0, "max" : 11136.0, "total" : 80132.0, "mean" : 4006.6 },
      {
term: group by field
count, total_count, min, max, total, mean: stats
example: histogram facet request
$ curl -XGET "localhost:9200/test-data/cities/_search?pretty" -d '{
  "facets": {
    "population_ranges": {
      "histogram": {
        "key_field": "population2012",
        "value_field": "density",
        "interval": 500000
      }
    }
  }
}'
key_field: group by this field
value_field: calculate stats for this field
example: histogram facet response
"facets" : {
  "population_ranges" : {
    "_type" : "histogram",
    "entries" : [
      { "key" : 0, "count" : 255, "min" : 171.0, "max" : 17346.0, "total" : 980306.0, "total_count" : 252, "mean" : 3890.1031746031745 },
      { "key" : 500000, "count" : 25, "min" : 956.0, "max" : 17179.0, "total" : 116597.0, "total_count" : 25, "mean" : 4663.88 },
      { "key" : 1000000, "count" : 4, "min" : 2798.0, "max" : 4020.0, "total" : 13216.0, "total_count" : 4, "mean" : 3304.0 },
      {
key: group by field (population)
min, max, total, mean: stats (density)
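The keys 0, 500000 and 1000000 in the response are consistent with each document landing in a fixed-width bucket named after its lower bound. A minimal Python sketch of that rule follows. `histogram_key` is a hypothetical helper, and the populations are made up except for 636479, which is Boston's value from the sample document.

```python
import math
from collections import Counter


def histogram_key(value, interval):
    """Return the lower bound of the fixed-width bucket that contains value."""
    return math.floor(value / interval) * interval


# 636479 comes from the sample document; the other values are made up.
populations = [636479, 120000, 480000, 1200000]

bucket_counts = Counter(histogram_key(p, 500000) for p in populations)
for key, count in sorted(bucket_counts.items()):
    print(key, count)
```

With an interval of 500000, Boston (636479) falls into the bucket keyed 500000.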
But what if I want an average density by population histogram for each state?
aggregations
Buckets Calculators
aggregations = buckets + calculators
[diagram: documents grouped into buckets by state: CA, TX, MA, CO, AZ]
example: density by state aggregation
$ curl -XGET "localhost:9200/test-data/cities/_search?pretty" -d '{
  "aggs" : {
    "mean_density_by_state" : {
      "terms" : { "field" : "state" },
      "aggs": {
        "mean_density": {
          "avg" : { "field" : "density" }
        }
      }
    }
  }
}'
terms: group by this field
avg: calculate stats for this field
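The request has a regular shape: a bucket definition plus an "aggs" object holding whatever should be computed inside each bucket. As a sketch, the same body can be built in Python and serialized to JSON. `with_sub_aggs` is a hypothetical helper introduced for illustration, not part of any Elasticsearch client.

```python
import json


def with_sub_aggs(bucket_type, bucket_params, sub_aggs):
    """Return a bucket aggregation that carries nested sub-aggregations."""
    return {bucket_type: bucket_params, "aggs": sub_aggs}


# Calculator: average of the density field.
mean_density = {"mean_density": {"avg": {"field": "density"}}}

# Bucket by state, then apply the calculator inside each bucket.
body = {
    "aggs": {
        "mean_density_by_state": with_sub_aggs(
            "terms", {"field": "state"}, mean_density
        )
    }
}

print(json.dumps(body, indent=2))
```

The printed JSON could be passed to curl with -d, as in the request above.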
aggregation response
"aggregations" : {
  "mean_density_by_state" : {
    "terms" : [
      { "term" : "CA", "doc_count" : 69, "mean_density" : { "value" : 5558.623188405797 } },
      { "term" : "TX", "doc_count" : 32, "mean_density" : { "value" : 2496.625 } },
      { "term" : "FL", "doc_count" : 20, "mean_density" : { "value" : 4006.6 } },
      { "term" : "CO", "doc_count" : 11, "mean_density" : { "value" : 2944.4 } },
      { "term" : "AZ", "doc_count" : 10, "mean_density" : { "value" : 2604.9 } },
      {
term: group by state
mean_density: density stats
example: density by population aggregation
$ curl -XGET "localhost:9200/test-data/cities/_search?pretty" -d '{
  "aggs" : {
    "mean_density_by_population" : {
      "histogram" : { "field" : "population2012", "interval": 500000 },
      "aggs": {
        "mean_density": {
          "avg" : { "field" : "density" }
        }
      }
    }
  }
}'
histogram: group by population
avg: calculate stats on density
aggregation response
"aggregations" : {
  "mean_density_by_population" : [
    { "key" : 0, "doc_count" : 255, "mean_density" : { "value" : 3890.1031746031745 } },
    { "key" : 500000, "doc_count" : 25, "mean_density" : { "value" : 4663.88 } },
    { "key" : 1000000, "doc_count" : 4, "mean_density" : { "value" : 3304.0 } },
    { "key" : 1500000, "doc_count" : 1, "mean_density" : { "value" : 11379.0 }
key: group by population
mean_density: density stats
example: density by population by state
$ curl -XGET "localhost:9200/test-data/cities/_search?pretty" -d '{
  "aggs" : {
    "mean_density_by_population_by_state": {
      "terms" : { "field" : "state" },
      "aggs": {
        "mean_density_by_population" : {
          "histogram" : { "field" : "population2012", "interval": 500000 },
          "aggs": {
            "mean_density": {
              "avg" : { "field" : "density" }
            }
          }
        }
      }
    }
  }
}'
terms: group by state
histogram: group by population
avg: calculate stats on density
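What the nested request asks for can be mimicked in memory: bucket by state, bucket each state's documents by population interval, then average density in the innermost buckets. The sketch below assumes made-up documents and hypothetical helper names (`bucket_by`, `avg`); it illustrates the composition only and says nothing about how Elasticsearch executes it.

```python
from collections import defaultdict


def bucket_by(docs, key_fn):
    """Bucketizer: split docs into lists keyed by key_fn(doc)."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[key_fn(doc)].append(doc)
    return dict(buckets)


def avg(docs, field):
    """Calculator: mean of a numeric field over the docs in one bucket."""
    return sum(doc[field] for doc in docs) / len(docs)


def mean_density_by_population_by_state(docs, interval=500000):
    result = {}
    for state, state_docs in bucket_by(docs, lambda d: d["state"]).items():
        by_population = bucket_by(
            state_docs, lambda d: d["population2012"] // interval * interval
        )
        result[state] = {
            key: avg(bucket, "density")
            for key, bucket in sorted(by_population.items())
        }
    return result


# Made-up sample documents, not the real city data.
docs = [
    {"state": "CA", "population2012": 100000, "density": 3000},
    {"state": "CA", "population2012": 200000, "density": 5000},
    {"state": "CA", "population2012": 800000, "density": 9000},
    {"state": "TX", "population2012": 300000, "density": 2000},
]

print(mean_density_by_population_by_state(docs))
```

Because buckets only hold documents, any bucketizer can be nested inside any other, and a calculator can run at whichever level is needed.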
aggregation response
"aggregations" : {
  "mean_density_by_population_by_state" : {
    "terms" : [
      {
        "term" : "CA",
        "doc_count" : 69,
        "mean_density_by_population" : [
          { "key" : 0, "doc_count" : 64, "mean_density" : { "value" : 5382.453125 } },
          { "key" : 500000, "doc_count" : 3, "mean_density" : { "value" : 8985.333333333334 } },
          { "key" : 1000000, "doc_count" : 1, "mean_density" : { "value" : 4020.0 } },
          { "key" : 3500000, "doc_count" : 1, "mean_density" : { "value" : 8092.0 } }
        ]
      },
      {
        "term" : "TX",
        "doc_count" : 32,
        "mean_density_by_population" : [
          { "key" : 0, "doc_count" : 26, "mean_density" : { "value" : 2408.3076923076924
term: group by state
key: group by population
mean_density: stats on density
out-of-the-box aggregation calculators (elasticsearch >= 1.0)
• avg
• min
• max
• sum
• count
• stats
• extended stats
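As a rough illustration of what these calculators compute over the values in one bucket, here is a plain-Python sketch. The function names and result keys are chosen for this example, and the variance is the population variance, which is an assumption of the sketch rather than a statement about Elasticsearch internals.

```python
import math


def stats(values):
    """count, min, max, sum and avg in a single pass-friendly form."""
    values = list(values)
    count = len(values)
    total = sum(values)
    return {
        "count": count,
        "min": min(values),
        "max": max(values),
        "sum": total,
        "avg": total / count,
    }


def extended_stats(values):
    """stats plus sum of squares, variance and standard deviation."""
    values = list(values)
    result = stats(values)
    sum_of_squares = sum(v * v for v in values)
    # Population variance: E[x^2] - (E[x])^2 (assumed for this sketch).
    variance = sum_of_squares / result["count"] - result["avg"] ** 2
    result.update({
        "sum_of_squares": sum_of_squares,
        "variance": variance,
        "std_deviation": math.sqrt(variance),
    })
    return result


# Made-up values standing in for one bucket's field values.
print(extended_stats([2, 4, 4, 4, 5, 5, 7, 9]))
```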
out-of-the-box aggregation bucketizers (elasticsearch >= 1.0)
• global
• filter
• missing
• terms
• range
• date range
• ip range
• histogram
• date histogram
• geo distance
• nested
aggregations 2.0 (aka bucket reducers) (elasticsearch 2.0)
apply arbitrary functions on buckets
• first derivative
• second derivative
• exponentially weighted moving average
• outlier detection
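A bucket reducer takes the sequence of values produced by a bucketizer (one value per bucket, in order) and computes something across buckets. The sketch below shows three of the listed functions in plain Python. The function names, the choice of smoothing factor and the input values are all assumptions made for illustration.

```python
def first_derivative(values):
    """Difference between each bucket value and the previous one."""
    return [b - a for a, b in zip(values, values[1:])]


def second_derivative(values):
    """Difference of the differences."""
    return first_derivative(first_derivative(values))


def ewma(values, alpha):
    """Exponentially weighted moving average, seeded with the first value."""
    smoothed = []
    for value in values:
        if not smoothed:
            smoothed.append(float(value))
        else:
            smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Made-up per-bucket values, e.g. one metric per histogram bucket.
bucket_values = [1, 4, 9, 16]

print(first_derivative(bucket_values))
print(second_derivative(bucket_values))
print(ewma([10, 20, 30], alpha=0.5))
```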