Building a Lambda Architecture with Elasticsearch at Yieldbot

May 06, 2014

Richard Shea, CTO (@shearic) and David White, Platform Architect (@dtabwhite)

DESCRIPTION

2014-05-06 Presentation to Boston Elasticsearch Meetup on Yieldbot's use of Elasticsearch in a Lambda Architecture

TRANSCRIPT

Page 1: Building a Lambda Architecture with Elasticsearch at Yieldbot

May 06, 2014

Building a Lambda Architecture with Elasticsearch at Yieldbot

Richard Shea, CTO

@shearic

David White, Platform Architect

@dtabwhite

Slide 2

Lambda Architecture Summary

Batch computation layer (canonical example: Hadoop -> HBase)

Real-time computation layer (canonical example: Storm -> Cassandra)

Serving layer (query HBase, query Cassandra, merge the results, and return)

Slide 3

Our Use Case

Clickstreams of events (pageviews, impressions, clicks, etc.)

Events contain attributes

Aggregating counts and performance metrics

Breakdowns by several dimensions
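To make the aggregation described on this slide concrete, here is a minimal sketch in Python. The event fields (`type`, `site`, `device`) and the choice of click-through rate as the "performance" metric are assumptions made for illustration; the slides do not specify either.

```python
from collections import Counter

# Hypothetical clickstream events; the field names are illustrative only.
events = [
    {"type": "pageview", "site": "a.example", "device": "mobile"},
    {"type": "impression", "site": "a.example", "device": "mobile"},
    {"type": "impression", "site": "a.example", "device": "desktop"},
    {"type": "click", "site": "a.example", "device": "mobile"},
    {"type": "impression", "site": "b.example", "device": "mobile"},
]


def breakdown(events, dimensions):
    """Count events of each type, grouped by the given attribute names."""
    counts = Counter()
    for event in events:
        key = tuple(event[d] for d in dimensions)
        counts[key + (event["type"],)] += 1
    return counts


def click_through_rate(counts, key):
    """Clicks divided by impressions for one dimension key (0.0 if none)."""
    impressions = counts[key + ("impression",)]
    clicks = counts[key + ("click",)]
    return clicks / impressions if impressions else 0.0


by_site = breakdown(events, ["site"])
by_site_device = breakdown(events, ["site", "device"])
```

The same events can be broken down by any combination of attributes by changing the `dimensions` argument, which is the flexibility the later slides attribute to query-time calculation.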

Slide 4

Our Prior Approach

Two different types of systems: batch (HBase) and real-time (Redis)

Two different access patterns

Limited query ability

Slide 5

Kafka

Persisted event queue

Consumers keep track of offset

Horizontally scalable, topics can be partitioned, etc.
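The point that consumers track their own offsets can be illustrated with a toy in-memory model. This is not a Kafka client; `PartitionLog` and `Consumer` are hypothetical classes that only mimic the idea of a persisted log read by independent consumers.

```python
class PartitionLog:
    """Toy stand-in for one partition of a persisted topic (not a Kafka client)."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        self._messages.append(message)
        return len(self._messages) - 1  # offset of the appended message

    def read(self, offset, max_messages):
        return self._messages[offset:offset + max_messages]


class Consumer:
    """Each consumer remembers its own position; the log keeps no per-consumer state."""

    def __init__(self, log, offset=0):
        self.log = log
        self.offset = offset

    def poll(self, max_messages=10):
        batch = self.log.read(self.offset, max_messages)
        self.offset += len(batch)
        return batch


log = PartitionLog()
for i in range(5):
    log.append({"event_id": i})

indexer = Consumer(log)   # e.g. a consumer feeding the real-time index
archiver = Consumer(log)  # e.g. a consumer writing the event archive
first_batch = indexer.poll(max_messages=3)
```

Because the messages stay in the log, a consumer that falls behind or restarts can resume from its last offset, and a second consumer can read the same events independently.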

Slide 6

Real-time Layer of Lambda with ES

Daily Index of “raw” events – each event is a document

Elasticsearch Kafka River indexes the events

Real-time processing is trivial: just index the events

Aggregation of real-time data is deferred to query time
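As a rough sketch of what "one document per event in a daily index" could look like, the Python below derives a daily index name from an event timestamp and builds a newline-delimited payload in the shape the Elasticsearch bulk API accepts. It does not reproduce the Kafka River plugin. The `events-YYYY.MM.DD` naming and the `id` and `ts` fields are assumptions, and the action metadata is a minimal form (the exact fields expected vary by Elasticsearch version).

```python
import json
from datetime import datetime, timezone


def daily_index_name(epoch_seconds, prefix="events"):
    """Name of the daily index an event belongs to, using its UTC date."""
    day = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return f"{prefix}-{day:%Y.%m.%d}"


def bulk_lines(events):
    """Build newline-delimited JSON: an action line, then the document itself."""
    lines = []
    for event in events:
        action = {"index": {"_index": daily_index_name(event["ts"]),
                            "_id": event["id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(event))
    return "\n".join(lines) + "\n"  # the bulk format ends with a newline


ts = datetime(2014, 5, 6, 23, 59, tzinfo=timezone.utc).timestamp()
payload = bulk_lines([{"id": "e1", "ts": ts, "type": "click"}])
```

Using the event id as the document id means that indexing the same event twice overwrites the document rather than duplicating it.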

Slide 7

Batch Layer of Lambda with ES

Monthly Index of Aggregated Data Documents

Hourly, re-index events from the archive; this corrects for any real-time issues

Aggregate desired breakdowns into documents

When done, note most recent hour completed
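The hourly batch step might be sketched as follows. Everything named here is hypothetical: the `aggregates-YYYY.MM` monthly index name, the document id scheme, and the `advance_watermark` helper are illustrations of the idea, not Yieldbot's implementation.

```python
from collections import Counter
from datetime import datetime, timezone

HOUR_SECONDS = 3600


def hour_bucket(epoch_seconds):
    """Truncate a timestamp to the start of its hour."""
    return int(epoch_seconds) - int(epoch_seconds) % HOUR_SECONDS


def aggregate_hour(events, hour_start, dimensions):
    """Roll one hour of archived events up into aggregate documents.

    Ids are derived from the hour and dimension values, so re-running the
    job for the same hour overwrites documents instead of duplicating them.
    """
    counts = Counter()
    for event in events:
        if hour_bucket(event["ts"]) == hour_start:
            key = tuple(event[d] for d in dimensions)
            counts[key + (event["type"],)] += 1

    hour = datetime.fromtimestamp(hour_start, tz=timezone.utc)
    index = f"aggregates-{hour:%Y.%m}"  # hypothetical monthly index name
    docs = {}
    for (*dim_values, event_type), count in counts.items():
        doc_id = "|".join([f"{hour:%Y%m%d%H}", *dim_values, event_type])
        source = dict(zip(dimensions, dim_values))
        source.update({"hour": hour_start, "type": event_type, "count": count})
        docs[doc_id] = {"index": index, "source": source}
    return docs


def advance_watermark(current, completed_hour):
    """Record the most recent fully aggregated hour; never move backwards."""
    return completed_hour if current is None else max(current, completed_hour)


HOUR = hour_bucket(datetime(2014, 5, 6, 10, tzinfo=timezone.utc).timestamp())
archived = [
    {"ts": HOUR + 5, "type": "impression", "site": "a.example"},
    {"ts": HOUR + 90, "type": "impression", "site": "a.example"},
    {"ts": HOUR + 120, "type": "click", "site": "a.example"},
    {"ts": HOUR + HOUR_SECONDS, "type": "impression", "site": "a.example"},
]
docs = aggregate_hour(archived, HOUR, ["site"])
watermark = advance_watermark(None, HOUR)
```

The watermark is what lets a serving layer know where aggregated data ends and raw events must take over.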

Slide 8

Serving Layer of Lambda with ES

Query Aggregated Data Documents as much as possible

Query raw events from the last aggregated hour available up to the present

Combine aggregated and raw query results and return

We use Node.js, a natural fit
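The serving logic on this slide can be sketched as a time-range split followed by a merge. The slide says the actual serving layer uses Node.js; the Python below is purely illustrative, and `split_range` and `combine` are hypothetical helper names. It assumes timestamps in epoch seconds and a marker holding the start of the last hour the batch layer completed.

```python
from collections import Counter

HOUR_SECONDS = 3600


def split_range(start, end, last_completed_hour):
    """Split [start, end) at the end of the last hour the batch layer finished.

    Returns (aggregated_range, raw_range); either one is None when empty.
    """
    if last_completed_hour is None:
        boundary = start  # nothing aggregated yet, so everything is raw
    else:
        boundary = last_completed_hour + HOUR_SECONDS
    boundary = max(start, min(end, boundary))
    aggregated = (start, boundary) if boundary > start else None
    raw = (boundary, end) if end > boundary else None
    return aggregated, raw


def combine(aggregated_counts, raw_counts):
    """Sum per-key counts from the two layers into one result."""
    total = Counter(aggregated_counts)
    total.update(raw_counts)
    return dict(total)


aggregated_range, raw_range = split_range(0, 10 * HOUR_SECONDS, 6 * HOUR_SECONDS)
merged = combine({"impression": 5, "click": 1}, {"impression": 2, "pageview": 4})
```

Since the two ranges never overlap, no event is counted twice when the results are summed.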

Slide 9

Why Elasticsearch?

Real-time

- calculations at query time and flexible

- real-time is simple

Batch

- some pre-calculation

- query-time ties it together

Serving

- queries are flexible

- batch and real-time query access patterns similar

Slide 10

More Elasticsearch Goodies

Kibana

- Mostly real-time events

- Aggregated documents useful too

Snapshotting for backups

Real-time data daily indexes are optimized

Slide 11

Future

ES Aggregations

Split cluster with Tribe Nodes

Aggregation via Spark
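For the first item, a query-time breakdown using Elasticsearch aggregations might have a request body along these lines. This is only a sketch: the field names (`ts`, `site`) are assumptions, and the helper simply builds the JSON body for a search request with a `range` query and a `terms` aggregation.

```python
import json


def breakdown_query(start_ms, end_ms, dimension, time_field="ts"):
    """Search body: filter to a time range, then bucket by one dimension."""
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "query": {"range": {time_field: {"gte": start_ms, "lt": end_ms}}},
        "aggs": {
            f"by_{dimension}": {"terms": {"field": dimension}},
        },
    }


body = breakdown_query(1399334400000, 1399338000000, "site")
request_json = json.dumps(body)
```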

Slide 12

Good Lessons

Use index aliases

Build in operational plan to re-index

doc_values for raw events and high-cardinality query results
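The first two lessons fit together: if clients query an alias rather than a concrete index, a re-index can build a new index and then repoint the alias. A minimal sketch of the request body for the Elasticsearch index aliases API (`POST /_aliases`) follows; the alias and index names are hypothetical.

```python
def alias_swap_actions(alias, old_index, new_index):
    """Body for the index aliases API: move an alias in a single request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


swap = alias_swap_actions("aggregates", "aggregates-2014.05-v1", "aggregates-2014.05-v2")
```

Both actions are sent in one request, so readers of the alias do not see a moment where it points at neither index.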

Thank You