elasticsearch meetup final_2014_04

Elasticsearch for reporting analytics on communities Elasticsearch Meetup - 23 April 2014 Marc Harrison

Uploaded by marcharrison, posted on 26-Jan-2015

Page 1

Elasticsearch for reporting analytics on communities

Elasticsearch Meetup - 23 April 2014

Marc Harrison

Page 2

Lithium makes software that helps brands better connect with their customers

Our social software helps companies respond on social networks and build trusted content on a community they own.

Page 3

Empower brands to distill terabytes of daily data into an understanding of participation

▪ fast

▪ flexible

▪ scalable

What products/services are generating the most conversations?

Who is authoring content that generates the most kudos/likes?

Are customer posts getting timely replies?

What types of content does this audience segment look for?

Page 4

Lithium Social Intelligence (LSI)

Page 5

Cluster specs

▪ One of our clusters: Elasticsearch 1.0
• 7+ billion documents / 4.4+ TB and growing fast!
• 21 nodes (3 master, 2 client, 16 data)

Page 6

Lessons learned

▪ Bulk loading

▪ Faceting

Page 7

Bulk initial load / rebuild of data

[Pipeline diagram: MySQL stream, Hadoop, transform/route step, JSON, Elasticsearch]

Page 8

Bulk loading

▪ Make sure ingest logic is robust
• Idempotent for bulk replay: use a stable '_id'
• Include a revision based on processor/time
• Check cluster/index status to make sure it is ready to ingest

▪ Know the cache and thread pool sizes
• Bulk thread pool: fixed, size = # of processors, queue size 50
• Handle back-off and retry

▪ How many docs? As with capacity planning, test with your own data
• Number of shards
• index.refresh_interval: 30s
• indices.memory.index_buffer_size: 5%
• indices.memory.*
• index.translog.*
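A minimal Python sketch of the robustness and back-off points above, not Lithium's actual loader. It assumes each document carries a stable `id` and a numeric `revision` (hypothetical field names), uses the Elasticsearch 1.x bulk metadata (`_id`, `_version`, `_version_type: external`), and takes a hypothetical `send` callable that posts the body to `_bulk` and returns the per-item status codes. Items rejected with 429, assumed here to be how a full bulk queue surfaces, are resent with exponential back-off.

```python
import json
import time
from typing import Callable, Dict, List, Sequence

Doc = Dict[str, object]


def build_bulk_body(index: str, doc_type: str, docs: Sequence[Doc]) -> str:
    """Build a newline-delimited bulk body.

    Each doc carries a stable "id" and a "revision" (hypothetical field
    names). Reusing the same _id means a replayed batch overwrites rather
    than duplicates, and external versioning makes Elasticsearch reject a
    revision that is not newer than the stored one (a 409 conflict).
    """
    lines: List[str] = []
    for doc in docs:
        action = {
            "index": {
                "_index": index,
                "_type": doc_type,  # mapping types still exist in 1.x
                "_id": doc["id"],
                "_version": doc["revision"],
                "_version_type": "external",
            }
        }
        source = {k: v for k, v in doc.items() if k not in ("id", "revision")}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline


def bulk_with_backoff(
    send: Callable[[str], List[int]],
    index: str,
    doc_type: str,
    docs: Sequence[Doc],
    max_retries: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Doc]:
    """Send docs as one bulk request and retry only the rejected items.

    `send` (hypothetical) returns per-item status codes in request order.
    Returns the docs still rejected after the last retry.
    """
    pending = list(docs)
    for attempt in range(max_retries + 1):
        statuses = send(build_bulk_body(index, doc_type, pending))
        # Keep only items rejected because the bulk queue was full.
        pending = [d for d, s in zip(pending, statuses) if s == 429]
        if not pending:
            return []
        if attempt < max_retries:
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    return pending
```

Because the `_id` is stable and only 429s are retried, replaying a batch is safe: a document whose revision is already stored comes back as a 409 version conflict and is not resent.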

Page 9

Search: time-series pattern for scale
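The slide gives only the title. Assuming "time-series pattern" refers to the common Elasticsearch practice of writing one index per time period, a search can target just the indices that overlap the query window. The sketch below assumes daily indices named `prefix-YYYY.MM.DD` (a hypothetical naming scheme) and builds the multi-index search path.

```python
from datetime import date, timedelta
from typing import List


def daily_indices(prefix: str, start: date, end: date) -> List[str]:
    """Names of the hypothetical per-day indices covering [start, end]."""
    if end < start:
        raise ValueError("end must not be before start")
    names = []
    for i in range((end - start).days + 1):
        day = start + timedelta(days=i)
        names.append(f"{prefix}-{day:%Y.%m.%d}")
    return names


def search_path(prefix: str, start: date, end: date) -> str:
    """Multi-index search path, so the query touches only the days it needs."""
    return "/" + ",".join(daily_indices(prefix, start, end)) + "/_search"
```

For long ranges the comma-separated list grows quickly; coarser (for example monthly) indices or index aliases are common ways to keep it short.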

Page 10

Faceting

▪ Don't forget about memory!
• Strings: not_analyzed
• Numbers: long vs int, double vs float, etc.
• Do you need seconds/minutes when faceting?
• Fielddata format: doc_values (1.0)
• Admin APIs allow checking fielddata size and evictions

• indices.cache.filter.size: 15%
• indices.fielddata.cache.size: 45%
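Two pieces of arithmetic behind the memory bullets, as a hedged sketch. Elasticsearch stores dates as epoch milliseconds, so rounding a timestamp to the granularity you actually facet on cuts the number of distinct values the field holds, which generally means smaller fielddata and fewer facet buckets. The percentage cache settings are fractions of the JVM heap; the 30 GB heap used below is a hypothetical figure, not one from the talk.

```python
from typing import Iterable

# Dates are stored as epoch milliseconds.
MS_PER_UNIT = {"second": 1_000, "minute": 60_000, "hour": 3_600_000, "day": 86_400_000}


def truncate_ms(epoch_ms: int, unit: str) -> int:
    """Round an epoch-millisecond timestamp down to the start of its unit (UTC)."""
    step = MS_PER_UNIT[unit]
    return epoch_ms - epoch_ms % step


def distinct_values(timestamps: Iterable[int], unit: str) -> int:
    """How many unique values a date field would hold at this granularity."""
    return len({truncate_ms(t, unit) for t in timestamps})


def heap_share_gb(heap_gb: float, percent: float) -> float:
    """Heap reserved by a percentage setting such as indices.fielddata.cache.size."""
    return heap_gb * percent / 100
```

With a hypothetical 30 GB heap, `heap_share_gb(30, 45)` gives 13.5 GB for fielddata and `heap_share_gb(30, 15)` gives 4.5 GB for the filter cache. One hour of per-second timestamps holds 3,600 distinct values, but only one once rounded to the hour.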

Page 11

Faceting II

▪ Accuracy
• shard_size
• Number of shards
• Cardinality
• Routing

▪ Great custom plugin framework
• Uniques
• Array faceting
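Why `shard_size` and the number of shards affect accuracy: each shard returns only its own top terms, and the coordinating node sums those partial lists. The sketch below simulates that merge with plain dictionaries; it is a model of the behaviour, not Elasticsearch code. A term that is second on every shard can be the true overall winner yet be cut before the merge.

```python
from collections import Counter
from typing import Dict, List, Tuple


def _top(counts: Dict[str, int], n: int) -> List[Tuple[str, int]]:
    # Highest count first; ties broken alphabetically for determinism.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def merged_top_terms(
    shards: List[Dict[str, int]], size: int, shard_size: int
) -> List[Tuple[str, int]]:
    """Each shard reports only its top `shard_size` terms; the coordinator
    sums those partial lists and keeps the top `size`."""
    merged: Counter = Counter()
    for counts in shards:
        merged.update(dict(_top(counts, shard_size)))  # update() adds counts
    return _top(dict(merged), size)
```

With two shards `{"a": 10, "c": 9}` and `{"b": 10, "c": 9}`, `size=1, shard_size=1` returns `[("a", 10)]`, while `shard_size=2` returns the correct `[("c", 18)]`. Raising `shard_size` trades extra per-shard work for accuracy; high-cardinality fields and more shards make the error more likely.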

Page 12

Impact

▪ Order of magnitude improvement

▪ Developers able to focus on improving insights

▪ Community + Elasticsearch + Hadoop + Hortonworks = exciting

Page 13

Select settings (data center)

• bootstrap.mlockall: true
• cluster.routing.allocation.disk.threshold_enabled: true
• http.compression: true
• transport.tcp.compress: true
• gateway.recover_after_data_nodes: 13
• gateway.recover_after_master_nodes: 2
• gateway.recover_after_time: 3m
• gateway.expected_nodes: 17
• indices.memory.index_buffer_size: 5%
• indices.cache.filter.size: 15%
• indices.fielddata.cache.size: 45%
• index.store.type: mmapfs
• index.translog.flush_threshold_ops: 10000
• action.auto_create_index: false
• action.disable_delete_all_indices: true
• cluster.routing.allocation.node_initial_primaries_recoveries: 4
• cluster.routing.allocation.node_concurrent_recoveries: 15
• indices.recovery.max_bytes_per_sec: 100mb
• indices.recovery.concurrent_streams: 5
• discovery.zen.minimum_master_nodes: 2
• index.search.slowlog.threshold.query.warn: 5s
• index.search.slowlog.threshold.query.info: 1s
• index.indexing.slowlog.threshold.index.warn: 5s
• plugin.mandatory: lithium-unique-facets
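A small sanity check for a few of the settings above, sketched in Python. It assumes these settings belong to the cluster described earlier (3 dedicated masters, 16 data nodes); the `check_settings` helper is hypothetical. `discovery.zen.minimum_master_nodes` should be a majority of master-eligible nodes, `(masters // 2) + 1`, which for 3 masters is 2, matching the slide; the `gateway.recover_after_*` thresholds should not exceed the nodes that can actually join.

```python
from typing import Dict, List


def quorum(master_eligible: int) -> int:
    """Majority of master-eligible nodes; requiring it guards against split brain."""
    return master_eligible // 2 + 1


def check_settings(settings: Dict[str, int], masters: int, data_nodes: int) -> List[str]:
    """Return human-readable problems; an empty list means the checks pass."""
    problems: List[str] = []
    mmn = settings.get("discovery.zen.minimum_master_nodes")
    if mmn != quorum(masters):
        problems.append(f"minimum_master_nodes is {mmn}, expected {quorum(masters)}")
    if settings.get("gateway.recover_after_master_nodes", 0) > masters:
        problems.append("recover_after_master_nodes exceeds master-eligible nodes")
    if settings.get("gateway.recover_after_data_nodes", 0) > data_nodes:
        problems.append("recover_after_data_nodes exceeds data nodes")
    return problems


# Values copied from the slide.
slide_settings = {
    "discovery.zen.minimum_master_nodes": 2,
    "gateway.recover_after_master_nodes": 2,
    "gateway.recover_after_data_nodes": 13,
}
```

`check_settings(slide_settings, masters=3, data_nodes=16)` returns an empty list.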

Page 14

Questions?