fifth elephant - 2014: live analytical dashboards at scale

Live analytical dashboards at scale - SQL style Shashwat Agarwal

Upload: agshashwat

Post on 15-Jan-2015

Category:

Data & Analytics


DESCRIPTION

https://funnel.hasgeek.com/fifthel2014/1152-live-analytical-dashboards-at-scale-sql-style

TRANSCRIPT

Page 1: fifth elephant - 2014: Live analytical dashboards at scale

Live analytical dashboards at scale - SQL style

Shashwat Agarwal

Page 2: fifth elephant - 2014: Live analytical dashboards at scale

Live

Analytical

Page 3: fifth elephant - 2014: Live analytical dashboards at scale

Live

Analytical

Page 4: fifth elephant - 2014: Live analytical dashboards at scale

What we have

Services (A lot of them)

Events (millions of updates)

Information

Page 5: fifth elephant - 2014: Live analytical dashboards at scale

Challenges

• Metric Definition
• Scale
• Reliability

Page 6: fifth elephant - 2014: Live analytical dashboards at scale

Metric Definition

• Not just a count of events, but a function of
  • fields from one or more related events/entities
  • on each event or a batch of events (for statistical analysis)
  • for a set of dimensions
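
The idea of a metric as a function over event fields, evaluated per batch and per set of dimensions, could be sketched roughly as below. This is only an illustration: the function name `compute_metric`, the field names (`city`, `category`, `amount`) and the sample events are all hypothetical and do not come from the talk.

```python
from collections import defaultdict
from statistics import mean


def compute_metric(events, dimensions, field, func):
    """Group a batch of events by the given dimensions and apply func per group."""
    groups = defaultdict(list)
    for event in events:
        # The dimension values of an event form the key of its group.
        key = tuple(event[d] for d in dimensions)
        groups[key].append(event[field])
    return {key: func(values) for key, values in groups.items()}


# Hypothetical batch of events; every field name here is an assumption.
events = [
    {"city": "BLR", "category": "books", "amount": 200},
    {"city": "BLR", "category": "books", "amount": 400},
    {"city": "DEL", "category": "books", "amount": 100},
]

# A metric that is not a plain count: average amount per (city, category).
avg_amount = compute_metric(events, ["city", "category"], "amount", mean)
```

Swapping `mean` for `len`, `sum` or a percentile function gives other metrics over the same batch without changing the grouping logic.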

Page 7: fifth elephant - 2014: Live analytical dashboards at scale

Scale Challenges

• Dimensional Lookup
• High throughput (write)
• Low Latency (query)
• MultiDimensional Store

Page 8: fifth elephant - 2014: Live analytical dashboards at scale

Reliability Challenges

• Accuracy
• Consistency
• Fault tolerance

Page 9: fifth elephant - 2014: Live analytical dashboards at scale

Solution?

Real time + Scale == Stream Processing

Kafka + Storm

Page 10: fifth elephant - 2014: Live analytical dashboards at scale

Storage

• MultiDimensional support
• Optimized for Time series query
• Low query response times
• High write throughput
• Scalable

TSD*

* OpenTSDB does not support Kerberos

Page 11: fifth elephant - 2014: Live analytical dashboards at scale

Metric Definition

• Not scalable to write a Storm topology for each metric

• Requires a DSL for non-tech folks

Introducing... Esper

Page 12: fifth elephant - 2014: Live analytical dashboards at scale

Storm Topology - 1

Kafka Spouts -> Enricher Bolts -> Kafka Bolts

Enricher Bolts: Dim Lookup against the Dim Store

Event: { id: a123-234, time: 1234, entityId: OD12 …}

Enriched Event: { id: a123-234, time: 1234, entityId: OD12 …}
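
The enrichment step in this topology (look up dimensions for an event's entity and attach them) might look roughly like the sketch below. It is not the speaker's bolt code: the in-memory `dim_store` dictionary stands in for the real Dim Store, and the dimension names are hypothetical.

```python
def enrich(event, dim_store):
    """Return a copy of the event with its entity's dimension attributes merged in."""
    enriched = dict(event)  # leave the original event untouched
    dims = dim_store.get(event["entityId"], {})  # unknown entity: no dimensions
    for name, value in dims.items():
        # Do not overwrite fields the event already carries.
        enriched.setdefault(name, value)
    return enriched


# Hypothetical dimension data keyed by entityId (an assumption, not from the slides).
dim_store = {"OD12": {"city": "BLR", "category": "books"}}

event = {"id": "a123-234", "time": 1234, "entityId": "OD12"}
enriched = enrich(event, dim_store)
```

In a real deployment the lookup would hit an external store, so caching and a policy for missing dimensions would need to be decided separately.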

Page 13: fifth elephant - 2014: Live analytical dashboards at scale

Storm Topology - 3’

Kafka Spouts -> Esper Bolts -> TSD Bolts -> TSD

{ id: a123-234, time: 1234, entityId: OD12 …}

Enriched Event

( metric name, [dim name-value-pairs]*, value, ts )
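
As an illustration of the output tuple, the sketch below turns (metric name, dimension pairs, value, ts) into a line in the OpenTSDB telnet-style `put` format (`put <metric> <timestamp> <value> <tagk=tagv> ...`). The slides do not say which wire format the TSD Bolts used, so the choice of format, the function name `to_put_line` and the sample metric are assumptions.

```python
def to_put_line(metric, dims, value, ts):
    """Format one data point as an OpenTSDB-style 'put' line."""
    if not dims:
        # OpenTSDB requires at least one tag on every data point.
        raise ValueError("at least one dimension (tag) is required")
    # Sort tags so the same data point always serializes identically.
    tags = " ".join(f"{name}={val}" for name, val in sorted(dims.items()))
    return f"put {metric} {ts} {value} {tags}"


# Hypothetical metric and dimensions, for illustration only.
line = to_put_line("orders.avg_amount", {"city": "BLR", "category": "books"}, 300, 1234)
```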

Page 14: fifth elephant - 2014: Live analytical dashboards at scale

Time Batching

• Event time
• Enables
  • calculate statistics
  • windowed join
  • out of order events
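
A minimal sketch of batching by event time follows. Because the batch is derived from the timestamp inside the event rather than from arrival time, out-of-order events still land in the right batch. The 60-unit batch width and the function names are assumptions; the slides do not state the batch size or the unit of `time`.

```python
from collections import defaultdict

BATCH_WIDTH = 60  # assumed batch width, in the same unit as the event's "time" field


def batch_start(event_time, width=BATCH_WIDTH):
    """Return the start of the time batch that contains event_time."""
    return event_time - (event_time % width)


def batch_by_event_time(events, width=BATCH_WIDTH):
    """Group events into batches keyed by batch start, regardless of arrival order."""
    batches = defaultdict(list)
    for event in events:
        batches[batch_start(event["time"], width)].append(event)
    return dict(batches)


# Events arrive out of order; batching by event time puts them back together.
arrivals = [{"time": 130}, {"time": 61}, {"time": 125}, {"time": 59}]
batches = batch_by_event_time(arrivals)
```

Once a batch is complete, statistics or joins can be computed over all events whose event time falls in that window.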

Page 15: fifth elephant - 2014: Live analytical dashboards at scale

Reliability

Faults

Upgrades

Metric definition changes

Last good Checkpoint

Reset Checkpoint

Replay

Transactional Storm
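
The checkpoint-and-replay idea on this slide can be illustrated with the toy sketch below. It is not Transactional Storm: a Python list stands in for a replayable source such as a Kafka partition, and the class and method names are hypothetical. The checkpoint advances only after a batch is handled successfully, so a fault leaves it at the last good position, and resetting it forces a replay (for example after a metric definition change).

```python
class ReplayableStream:
    """Toy replayable source with a checkpoint (all names are illustrative)."""

    def __init__(self, log):
        self.log = log        # ordered records that can be re-read at any offset
        self.checkpoint = 0   # offset of the next unprocessed record

    def process(self, handler, batch_size=2):
        while self.checkpoint < len(self.log):
            batch = self.log[self.checkpoint:self.checkpoint + batch_size]
            handler(batch)                 # if this raises, the checkpoint does not move
            self.checkpoint += len(batch)  # commit only after the batch succeeded

    def reset(self, offset):
        """Move the checkpoint back so later processing replays from offset."""
        self.checkpoint = offset


seen = []
faults = []


def handler(batch):
    # Simulate a single fault the first time record 3 is processed.
    if 3 in batch and not faults:
        faults.append("simulated fault")
        raise RuntimeError("simulated fault")
    seen.extend(batch)


stream = ReplayableStream([1, 2, 3, 4, 5])
try:
    stream.process(handler)
except RuntimeError:
    pass
checkpoint_after_fault = stream.checkpoint  # still at the last good checkpoint
stream.process(handler)                     # resumes from there; nothing is lost
```

This sketch assumes the handler applies a batch atomically; without that, a replayed batch could be partially applied twice.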

Page 16: fifth elephant - 2014: Live analytical dashboards at scale

Storm Topology - 2

Kafka Spouts -> Time Batch Bolt -> HBase Bolt

{ id: a123-234, time: 1234, entityId: OD12 …}

Enriched Event

Page 17: fifth elephant - 2014: Live analytical dashboards at scale

HBase Time Batch Schema

Table 1 - Event Queue

• Key: <event_ns>_slot_<batchId>
  batchId is constructed from the event timestamp
• Value: each column holds an event JSON
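
A row key of this shape could be built as in the sketch below. The slide says only that batchId is constructed from the event timestamp; the integer-division scheme, the 60,000 ms batch width and the treatment of `slot` as a literal separator are assumptions made for illustration.

```python
BATCH_MILLIS = 60_000  # assumed batch width; not stated in the slides


def batch_id_for(event_ts_millis, batch_millis=BATCH_MILLIS):
    """Derive a batch id from the event timestamp (assumed scheme)."""
    return event_ts_millis // batch_millis


def event_queue_key(event_ns, event_ts_millis):
    """Build an Event Queue row key of the form <event_ns>_slot_<batchId>."""
    return f"{event_ns}_slot_{batch_id_for(event_ts_millis)}"


# Two events from the same minute share a row; a later event gets a new row.
key_a = event_queue_key("orders", 125_000)
key_b = event_queue_key("orders", 179_999)
key_c = event_queue_key("orders", 180_000)
```

Events in the same batch share a row, with each event stored as its own column.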

Page 18: fifth elephant - 2014: Live analytical dashboards at scale

HBase Time Batch Schema

Table 2 - Event Queue Update Log

• Key: <event_ns>_log_<batchId>_<version>
  batchId is constructed from the event timestamp
  version is the timestamp at which the batch was updated
• Value: Version
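
One plausible use of the update log is to find which batches changed after a given checkpoint so that only those are re-read. The sketch below shows that idea over plain strings. The slides give only the key layout, so the helper names, the integer batchId and version, and the scan logic are all assumptions.

```python
def update_log_key(event_ns, batch_id, version):
    """Build an update-log row key of the form <event_ns>_log_<batchId>_<version>."""
    return f"{event_ns}_log_{batch_id}_{version}"


def batches_updated_since(log_keys, event_ns, checkpoint_version):
    """Return the sorted batch ids with at least one update newer than the checkpoint."""
    prefix = f"{event_ns}_log_"
    updated = set()
    for key in log_keys:
        if not key.startswith(prefix):
            continue  # key belongs to a different event namespace
        batch_id, version = key[len(prefix):].rsplit("_", 1)
        if int(version) > checkpoint_version:
            updated.add(int(batch_id))
    return sorted(updated)


# Hypothetical log contents: batch 2 was updated twice, batch 3 once.
log_keys = [
    update_log_key("orders", 2, 100),
    update_log_key("orders", 2, 250),
    update_log_key("orders", 3, 90),
    update_log_key("clicks", 9, 999),
]
changed = batches_updated_since(log_keys, "orders", 100)
```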

Page 19: fifth elephant - 2014: Live analytical dashboards at scale

Storm Topology - 3

Time Batch Spout -> Esper Bolts -> TSD Bolts -> TSD

( metric name, [dim name-value-pairs]*, value, ts )

Page 20: fifth elephant - 2014: Live analytical dashboards at scale

Learnings

• Replayability
• Event and Entity Schema
• Checkpointing
• Bootstrapping
• Sidelining
• Fault Tolerance

Page 21: fifth elephant - 2014: Live analytical dashboards at scale

Questions?

sb.lk/hasgeek