elasticsearch - introduction to aggregations

Copyright Elasticsearch 2013. Copying, publishing and/or distributing without written permission is strictly prohibited

Upload: enterprisesearchmeetup

Post on 16-Aug-2015

Page 1: ElasticSearch - Introduction to Aggregations

Introduction to Aggregations

Page 2: ElasticSearch - Introduction to Aggregations

facets (elasticsearch < 1.0)

facets

Page 3: ElasticSearch - Introduction to Aggregations

out-of-the-box facets (elasticsearch < 1.0)

• terms

• range

• histogram / date histogram

• filter/query

• statistical

• geo distance

Page 4: ElasticSearch - Introduction to Aggregations

terms facet

• Divides documents into buckets based on the value of a selected term

• Calculates statistics on some other field of these documents for each bucket
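The two steps above (bucket by one field, then compute statistics on another) can be sketched in plain Python. This is only an illustration of the idea, not how Elasticsearch implements it; the `terms_stats` function and the `CITIES` sample documents are made up for this example.

```python
from collections import defaultdict

# Hypothetical sample documents, made up for illustration (not the real index).
CITIES = [
    {"city": "A", "state": "CA", "density": 1000.0},
    {"city": "B", "state": "CA", "density": 3000.0},
    {"city": "C", "state": "TX", "density": 2000.0},
]

def terms_stats(docs, key_field, value_field):
    """Group docs by key_field, then compute stats on value_field per group."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[key_field]].append(float(doc[value_field]))
    result = []
    for term, values in buckets.items():
        total = sum(values)
        result.append({
            "term": term,
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "total": total,
            "mean": total / len(values),
        })
    # Largest buckets first (an assumption about ordering, chosen for readability).
    result.sort(key=lambda entry: entry["count"], reverse=True)
    return result

if __name__ == "__main__":
    for entry in terms_stats(CITIES, "state", "density"):
        print(entry)
```

Here `key_field` plays the role of the field to group by and `value_field` the field the statistics are computed on.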

Page 5: ElasticSearch - Introduction to Aggregations

index of large US cities

{
  "rank": "21",
  "city": "Boston",
  "state": "MA",
  "population2012": "636479",
  "population2010": "617594",
  "land_area": "48.277",
  "density": "12793",
  "ansi": "619463",
  "location": { "lat": "42.332", "lon": "71.0202" }
}

Page 6: ElasticSearch - Introduction to Aggregations

example: terms facet request

$ curl -XGET "localhost:9200/test-data/cities/_search?pretty" -d '{
  "facets": {
    "stat1": {
      "terms_stats": {
        "key_field": "state",
        "value_field": "density"
      }
    }
  }
}'

key_field: group by this field

value_field: calculate stats for this field

Page 7: ElasticSearch - Introduction to Aggregations

example: terms facet response

"facets" : {
  "stat1" : {
    "_type" : "terms_stats",
    "missing" : 0,
    "terms" : [
      { "term" : "CA", "count" : 69, "total_count" : 69, "min" : 1442.0, "max" : 17179.0, "total" : 383545.0, "mean" : 5558.623188405797 },
      { "term" : "TX", "count" : 32, "total_count" : 32, "min" : 1096.0, "max" : 3974.0, "total" : 79892.0, "mean" : 2496.625 },
      { "term" : "FL", "count" : 20, "total_count" : 20, "min" : 1100.0, "max" : 11136.0, "total" : 80132.0, "mean" : 4006.6 },
      {

term: group by field

count, total_count, min, max, total, mean: stats

Page 8: ElasticSearch - Introduction to Aggregations

example: histogram facet request

$ curl -XGET "localhost:9200/test-data/cities/_search?pretty" -d '{
  "facets": {
    "population_ranges": {
      "histogram": {
        "key_field": "population2012",
        "value_field": "density",
        "interval": 500000
      }
    }
  }
}'

key_field: group by this field

value_field: calculate stats on this field
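A histogram groups documents into fixed-width buckets. The sketch below assumes the bucket key is the value rounded down to a multiple of `interval`, which is consistent with keys such as 0 and 500000 in the response on the next page. The function names are made up for this example.

```python
import math

def histogram_key(value, interval):
    """Return the lower bound of the fixed-width bucket containing value."""
    return math.floor(value / interval) * interval

def histogram_counts(values, interval):
    """Count how many values fall into each bucket, ordered by bucket key."""
    counts = {}
    for v in values:
        key = histogram_key(v, interval)
        counts[key] = counts.get(key, 0) + 1
    return dict(sorted(counts.items()))

if __name__ == "__main__":
    # 636479 is Boston's population2012 from the sample document.
    print(histogram_key(636479, 500000))
```

With an interval of 500000, Boston's 2012 population of 636479 lands in the bucket keyed 500000.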

Page 9: ElasticSearch - Introduction to Aggregations

example: histogram facet response

"facets" : {
  "population_ranges" : {
    "_type" : "histogram",
    "entries" : [
      { "key" : 0, "count" : 255, "min" : 171.0, "max" : 17346.0, "total" : 980306.0, "total_count" : 252, "mean" : 3890.1031746031745 },
      { "key" : 500000, "count" : 25, "min" : 956.0, "max" : 17179.0, "total" : 116597.0, "total_count" : 25, "mean" : 4663.88 },
      { "key" : 1000000, "count" : 4, "min" : 2798.0, "max" : 4020.0, "total" : 13216.0, "total_count" : 4, "mean" : 3304.0 },
      {

key: group by field (population)

count, min, max, total, total_count, mean: stats (density)

Page 10: ElasticSearch - Introduction to Aggregations

But what if I want an average density by population histogram for each state?

Page 11: ElasticSearch - Introduction to Aggregations

aggregations

Buckets Calculators

Page 12: ElasticSearch - Introduction to Aggregations

aggregations = buckets + calculators

[Figure: bucket labels CA, TX, MA, CO, AZ; repeats the histogram facet response from page 9]
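The "buckets + calculators" split can be illustrated with two small building blocks in Python: a bucketizer that groups documents and a calculator that reduces a group to a number. This is a conceptual sketch only; `bucket_by`, `avg`, `aggregate`, and the `DOCS` sample data are hypothetical names and values, not part of Elasticsearch.

```python
from collections import defaultdict

# Hypothetical documents, made up for illustration.
DOCS = [
    {"state": "CA", "size": "small", "density": 1000.0},
    {"state": "CA", "size": "large", "density": 3000.0},
    {"state": "TX", "size": "small", "density": 2000.0},
]

def bucket_by(field):
    """Bucketizer: split docs into groups keyed by the value of field."""
    def bucketize(docs):
        groups = defaultdict(list)
        for doc in docs:
            groups[doc[field]].append(doc)
        return dict(groups)
    return bucketize

def avg(field):
    """Calculator: reduce a group of docs to the mean of field."""
    def calculate(docs):
        return sum(doc[field] for doc in docs) / len(docs)
    return calculate

def aggregate(docs, bucketizer, sub):
    """Apply sub (a calculator, or another aggregation) to every bucket."""
    return {key: sub(group) for key, group in bucketizer(docs).items()}

if __name__ == "__main__":
    print(aggregate(DOCS, bucket_by("state"), avg("density")))
```

Because `sub` can itself call `aggregate`, buckets can be nested inside buckets, which is what a single facet cannot express.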

Page 14: ElasticSearch - Introduction to Aggregations

example: density by state aggregation

$ curl -XGET "localhost:9200/test-data/cities/_search?pretty" -d '{
  "aggs" : {
    "mean_density_by_state" : {
      "terms" : { "field" : "state" },
      "aggs": {
        "mean_density": {
          "avg" : { "field" : "density" }
        }
      }
    }
  }
}'

terms: group by this field

avg: calculate stats for this field

Page 15: ElasticSearch - Introduction to Aggregations

aggregation response

"aggregations" : {
  "mean_density_by_state" : {
    "terms" : [
      { "term" : "CA", "doc_count" : 69, "mean_density" : { "value" : 5558.623188405797 } },
      { "term" : "TX", "doc_count" : 32, "mean_density" : { "value" : 2496.625 } },
      { "term" : "FL", "doc_count" : 20, "mean_density" : { "value" : 4006.6 } },
      { "term" : "CO", "doc_count" : 11, "mean_density" : { "value" : 2944.4 } },
      { "term" : "AZ", "doc_count" : 10, "mean_density" : { "value" : 2604.9 } },
      {

term: group by state

mean_density: density stats

Page 16: ElasticSearch - Introduction to Aggregations

example: density by population aggregation

$ curl -XGET "localhost:9200/test-data/cities/_search?pretty" -d '{
  "aggs" : {
    "mean_density_by_population" : {
      "histogram" : { "field" : "population2012", "interval": 500000 },
      "aggs": {
        "mean_density": {
          "avg" : { "field" : "density" }
        }
      }
    }
  }
}'

histogram: group by population

avg: calculate stats on density

Page 17: ElasticSearch - Introduction to Aggregations

aggregation response

"aggregations" : {
  "mean_density_by_population" : [
    { "key" : 0, "doc_count" : 255, "mean_density" : { "value" : 3890.1031746031745 } },
    { "key" : 500000, "doc_count" : 25, "mean_density" : { "value" : 4663.88 } },
    { "key" : 1000000, "doc_count" : 4, "mean_density" : { "value" : 3304.0 } },
    { "key" : 1500000, "doc_count" : 1, "mean_density" : { "value" : 11379.0 }

key: group by population

mean_density: density stats

Page 18: ElasticSearch - Introduction to Aggregations

example: density by population by state

$ curl -XGET "localhost:9200/test-data/cities/_search?pretty" -d '{
  "aggs" : {
    "mean_density_by_population_by_state": {
      "terms" : { "field" : "state" },
      "aggs": {
        "mean_density_by_population" : {
          "histogram" : { "field" : "population2012", "interval": 500000 },
          "aggs": {
            "mean_density": {
              "avg" : { "field" : "density" }
            }
          }
        }
      }
    }
  }
}'

terms: group by state

histogram: group by population

avg: calculate stats on density
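Because each aggregation can carry its own "aggs" block, a nested request like the one above can be assembled programmatically from the inside out. The sketch below builds the same request body as a Python dict; the `with_sub_aggs` helper is hypothetical and exists only for this example.

```python
import json

def with_sub_aggs(name, agg_type, params, sub_aggs=None):
    """Return {name: {agg_type: params, "aggs": sub_aggs}} (hypothetical helper)."""
    body = {agg_type: params}
    if sub_aggs:
        body["aggs"] = sub_aggs
    return {name: body}

# Innermost calculator first, then wrap it in each bucketizer.
mean_density = with_sub_aggs("mean_density", "avg", {"field": "density"})
by_population = with_sub_aggs(
    "mean_density_by_population", "histogram",
    {"field": "population2012", "interval": 500000}, mean_density)
by_state = with_sub_aggs(
    "mean_density_by_population_by_state", "terms",
    {"field": "state"}, by_population)
request_body = {"aggs": by_state}

if __name__ == "__main__":
    # The printed JSON could be passed to curl with -d.
    print(json.dumps(request_body, indent=2))
```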

Page 19: ElasticSearch - Introduction to Aggregations

aggregation response

"aggregations" : {
  "mean_density_by_population_by_state" : {
    "terms" : [
      {
        "term" : "CA",
        "doc_count" : 69,
        "mean_density_by_population" : [
          { "key" : 0, "doc_count" : 64, "mean_density" : { "value" : 5382.453125 } },
          { "key" : 500000, "doc_count" : 3, "mean_density" : { "value" : 8985.333333333334 } },
          { "key" : 1000000, "doc_count" : 1, "mean_density" : { "value" : 4020.0 } },
          { "key" : 3500000, "doc_count" : 1, "mean_density" : { "value" : 8092.0 } }
        ]
      },
      {
        "term" : "TX",
        "doc_count" : 32,
        "mean_density_by_population" : [
          { "key" : 0, "doc_count" : 26, "mean_density" : { "value" : 2408.3076923076924

term: group by state

key: group by population

mean_density: stats on density
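A nested response like this is often easier to consume as flat rows. The sketch below walks the CA portion of the response, using the key names exactly as they appear on the slide; other Elasticsearch versions may name these keys differently. The `flatten` function is made up for this example.

```python
# CA portion of the response, copied from the slide.
response = {
    "mean_density_by_population_by_state": {
        "terms": [
            {"term": "CA", "doc_count": 69, "mean_density_by_population": [
                {"key": 0, "doc_count": 64, "mean_density": {"value": 5382.453125}},
                {"key": 500000, "doc_count": 3, "mean_density": {"value": 8985.333333333334}},
                {"key": 1000000, "doc_count": 1, "mean_density": {"value": 4020.0}},
                {"key": 3500000, "doc_count": 1, "mean_density": {"value": 8092.0}},
            ]},
        ]
    }
}

def flatten(resp):
    """Return (state, population bucket key, doc count, mean density) rows."""
    rows = []
    for state in resp["mean_density_by_population_by_state"]["terms"]:
        for bucket in state["mean_density_by_population"]:
            rows.append((state["term"], bucket["key"], bucket["doc_count"],
                         bucket["mean_density"]["value"]))
    return rows

if __name__ == "__main__":
    for row in flatten(response):
        print(row)
```

The per-bucket document counts for CA (64, 3, 1, 1) add up to the state's doc_count of 69.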

Page 20: ElasticSearch - Introduction to Aggregations

out-of-the-box aggregation calculators (elasticsearch >= 1.0)

• avg

• min

• max

• sum

• count

• stats

• extended stats
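To make the difference between the stats and extended stats calculators concrete, here is a plain-Python sketch of what each one computes over a bucket's values. It assumes extended stats adds sum of squares, variance, and standard deviation on top of the basic stats, and it uses the population variance; the function names and result keys are chosen for this example.

```python
import math

def stats(values):
    """Basic stats: count, min, max, sum, and average."""
    total = sum(values)
    return {"count": len(values), "min": min(values), "max": max(values),
            "sum": total, "avg": total / len(values)}

def extended_stats(values):
    """Basic stats plus sum of squares, variance, and standard deviation."""
    result = stats(values)
    sum_of_squares = sum(v * v for v in values)
    # Population variance: mean of squares minus square of the mean.
    variance = sum_of_squares / len(values) - result["avg"] ** 2
    result.update({
        "sum_of_squares": sum_of_squares,
        "variance": variance,
        # Guard against tiny negative values caused by rounding.
        "std_deviation": math.sqrt(max(variance, 0.0)),
    })
    return result

if __name__ == "__main__":
    print(extended_stats([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
```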

Page 21: ElasticSearch - Introduction to Aggregations

out-of-the-box aggregation bucketizers (elasticsearch >= 1.0)

• global

• filter

• missing

• terms

• range

• date range

• ip range

• histogram

• date histogram

• geo distance

• nested
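Unlike a histogram, a range bucketizer lets the caller choose the bucket boundaries. The sketch below illustrates the idea in plain Python, assuming each range includes its "from" value and excludes its "to" value; the `range_buckets` function and the sample values are made up for this example.

```python
def range_buckets(values, ranges):
    """Count values per range; 'from' is inclusive, 'to' is exclusive."""
    counts = []
    for r in ranges:
        lo = r.get("from", float("-inf"))  # open-ended when "from" is absent
        hi = r.get("to", float("inf"))     # open-ended when "to" is absent
        doc_count = sum(1 for v in values if lo <= v < hi)
        counts.append({**r, "doc_count": doc_count})
    return counts

if __name__ == "__main__":
    # Hypothetical population values, made up for illustration.
    populations = [100, 500000, 636479, 1000000]
    ranges = [{"to": 500000}, {"from": 500000, "to": 1000000}, {"from": 1000000}]
    for bucket in range_buckets(populations, ranges):
        print(bucket)
```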

Page 22: ElasticSearch - Introduction to Aggregations

aggregations 2.0 (aka bucket reducers) (elasticsearch 2.0)

apply arbitrary functions on buckets

• first derivative

• second derivative

• exponential weighted moving average

• outlier detection
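Bucket reducers operate on the sequence of values produced by the buckets rather than on documents. As a rough illustration, the functions below compute a derivative and an exponentially weighted moving average over a list of per-bucket values. The function names, the `alpha` parameter, and the seeding of the average with the first value are assumptions of this sketch, not the Elasticsearch implementation.

```python
def derivative(values):
    """First derivative: difference between consecutive bucket values."""
    return [b - a for a, b in zip(values, values[1:])]

def ewma(values, alpha):
    """Exponentially weighted moving average with smoothing factor alpha."""
    smoothed = []
    for v in values:
        # Seed the average with the first value (an assumption of this sketch).
        prev = smoothed[-1] if smoothed else v
        smoothed.append(alpha * v + (1 - alpha) * prev)
    return smoothed

if __name__ == "__main__":
    # Hypothetical per-bucket values, made up for illustration.
    series = [1.0, 4.0, 9.0, 16.0]
    print(derivative(series))               # first derivative
    print(derivative(derivative(series)))   # second derivative
    print(ewma(series, 0.5))
```

Applying `derivative` twice gives the second derivative, which shows how reducers can be chained.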