fifth elephant - 2014: live analytical dashboards at scale
https://funnel.hasgeek.com/fifthel2014/1152-live-analytical-dashboards-at-scale-sql-style

Live analytical dashboards at scale - SQL style
Shashwat Agarwal

Live
Analytical

What we have
Services (a lot of them)
Events (millions of updates)
Information

Challenges
• Metric Definition
• Scale
• Reliability

Metric Definition
• Not just a count of events, but
• a function of
  • fields from one or more related events/entities
  • on each event or a batch of events (for statistical analysis)
  • for a set of dimensions
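As a rough illustration of a metric defined this way, the sketch below groups a batch of events by a set of dimensions and applies a function to one field. The field names (`city`, `category`, `amount`) and the helper `compute_metric` are hypothetical and not taken from the talk.

```python
from collections import defaultdict
from statistics import mean


def compute_metric(events, dimensions, field, func):
    """Group a batch of events by the given dimensions and apply func to one field."""
    groups = defaultdict(list)
    for event in events:
        # The dimension values of an event form the grouping key.
        key = tuple(event[d] for d in dimensions)
        groups[key].append(event[field])
    return {key: func(values) for key, values in groups.items()}


# Hypothetical batch of already-enriched events.
batch = [
    {"city": "BLR", "category": "books", "amount": 100.0},
    {"city": "BLR", "category": "books", "amount": 300.0},
    {"city": "DEL", "category": "books", "amount": 50.0},
]
avg_amount = compute_metric(batch, ["city", "category"], "amount", mean)
```

Passing `len` instead of `mean` would give a plain event count, the simplest case the slide contrasts against.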

Scale Challenges
• Dimensional Lookup
• High throughput (write)
• Low Latency (query)
• MultiDimensional Store

Reliability Challenges
• Accuracy
• Consistency
• Fault tolerance

Solution?
Real time + Scale == Stream Processing
Kafka, Storm

Storage
• MultiDimensional support
• Optimized for Time series query
• Low query response times
• High write throughput
• Scalable
TSD*
* OpenTSDB does not support Kerberos

Metric Definition
• Writing a Storm topology for each metric does not scale
• Need a DSL for non-technical users
Introducing... Esper

Storm Topology - 1
Kafka Spouts → Enricher Bolts → Kafka Bolts
Enricher Bolts: Dim Lookup against the Dim Store
Event: { id: a123-234, time: 1234, entityId: OD12 …}
Enriched Event: { id: a123-234, time: 1234, entityId: OD12 …}
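The enrichment step in this topology can be sketched in plain Python, outside Storm. The in-memory dict standing in for the Dim Store, the `enrich` helper, and the dimension names are assumptions for illustration; the slides do not say what backs the Dim Store.

```python
def enrich(event, dim_store):
    """Return a copy of the event with its entity's dimension attributes merged in."""
    # Look up dimensions by entityId; unknown entities pass through unchanged.
    dims = dim_store.get(event["entityId"], {})
    enriched = dict(event)  # copy, so the raw event is left untouched
    enriched.update(dims)
    return enriched


# Hypothetical dimension store keyed by entityId.
dim_store = {"OD12": {"city": "BLR", "category": "books"}}
event = {"id": "a123-234", "time": 1234, "entityId": "OD12"}
enriched = enrich(event, dim_store)
```

In a real topology this lookup would sit inside a bolt and would likely need caching to keep up with the write throughput mentioned earlier.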

Storm Topology - 3’
Kafka Spouts → Esper Bolts → TSD Bolts → TSD
Enriched Event: { id: a123-234, time: 1234, entityId: OD12 …}
Metric: ( metric name, [dim name-value-pairs]*, value, ts )
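One way to picture the metric tuple is as a data point for the time series store. The sketch below assumes the TSD is OpenTSDB and that points are written as JSON in the shape its HTTP `/api/put` endpoint accepts (`metric`, `timestamp`, `value`, `tags`). The metric name and dimension values are hypothetical, and the talk does not say how its TSD Bolts actually write.

```python
import json


def to_opentsdb_point(metric_name, dims, value, ts):
    """Map (metric name, [dim name-value pairs], value, ts) to an OpenTSDB-style data point."""
    return {
        "metric": metric_name,
        "timestamp": ts,
        "value": value,
        "tags": dict(dims),  # each dimension becomes a tag
    }


# Hypothetical metric emitted by an Esper bolt.
point = to_opentsdb_point(
    "orders.avg_amount", [("city", "BLR"), ("category", "books")], 200.0, 1234
)
payload = json.dumps(point, sort_keys=True)
```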

Time Batching
• Event time
• Enables
  • calculating statistics
  • windowed joins
  • handling out-of-order events
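A minimal sketch of batching by event time, assuming fixed-width batches of 60 seconds (the width and the `batch_id` helper are assumptions, not from the slides). Because the batch is derived from the event's own timestamp, an event that arrives late still lands in the batch it belongs to.

```python
from collections import defaultdict

BATCH_SECONDS = 60  # assumed batch width


def batch_id(event_time, width=BATCH_SECONDS):
    """Start of the fixed-width window that contains event_time."""
    return event_time - (event_time % width)


def assign_batches(events):
    """Group events by event-time batch, regardless of arrival order."""
    batches = defaultdict(list)
    for event in events:
        batches[batch_id(event["time"])].append(event)
    return dict(batches)


# Event "c" arrives last but has an earlier event time than "b".
events = [
    {"id": "a", "time": 1234},
    {"id": "b", "time": 1300},
    {"id": "c", "time": 1210},
]
batches = assign_batches(events)
```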

Reliability
Faults
Upgrades
Metrics Def changes
Last good Checkpoint
Reset Checkpoint
Replay
Transactional Storm
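The checkpoint, reset, and replay cycle can be sketched independently of Storm. This is not Transactional Storm's API; it is a toy model in which a list stands in for a replayable log such as a Kafka partition, and the class and method names are hypothetical. Outputs are keyed by offset, so replaying a range overwrites earlier results instead of duplicating them.

```python
class CheckpointedProcessor:
    """Toy model of processing a replayable log with a resettable checkpoint."""

    def __init__(self, log):
        self.log = log        # append-only event log (stand-in for a Kafka partition)
        self.checkpoint = 0   # offset of the next event to process
        self.results = {}     # offset -> output; keyed writes make replay idempotent

    def run(self, transform):
        # Process everything from the checkpoint onward, advancing it per event.
        for offset in range(self.checkpoint, len(self.log)):
            self.results[offset] = transform(self.log[offset])
            self.checkpoint = offset + 1

    def reset(self, last_good_offset):
        # Roll back to the last good checkpoint so later events are replayed.
        self.checkpoint = last_good_offset


log = [1, 2, 3, 4]
processor = CheckpointedProcessor(log)
processor.run(lambda x: x * 10)    # original metric definition
processor.reset(2)                 # e.g. after a metric definition change
processor.run(lambda x: x * 100)   # replay from offset 2 with the new definition
```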

Storm Topology - 2
Kafka Spouts → Time Batch Bolt → HBase Bolt
Enriched Event: { id: a123-234, time: 1234, entityId: OD12 …}

HBase Time Batch Schema
Table 1 - Event Queue
• Key: <event_ns>_slot_<batchId>
  batchId is constructed from the event timestamp
• Value: one column per event (Event JSON)

HBase Time Batch Schema
Table 2 - Event Queue Update Log
• Key: <event_ns>_log_<batchId>_<version>
  batchId is constructed from the event timestamp
  version is the timestamp at which the batch was updated
• Value: Version
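The two row-key layouts can be sketched as simple string builders. The slides say only that batchId is constructed from the event timestamp; flooring the timestamp to a 60-second window is an assumption made here, as are the helper names and the `orders` namespace.

```python
def batch_id(event_ts, width=60):
    """Assumed batchId: event timestamp floored to a fixed-width window."""
    return event_ts - (event_ts % width)


def slot_key(event_ns, event_ts):
    """Row key for Table 1 (Event Queue): <event_ns>_slot_<batchId>."""
    return f"{event_ns}_slot_{batch_id(event_ts)}"


def log_key(event_ns, event_ts, version):
    """Row key for Table 2 (Update Log): <event_ns>_log_<batchId>_<version>."""
    return f"{event_ns}_log_{batch_id(event_ts)}_{version}"


# Hypothetical namespace; version is the time at which the batch was updated.
k1 = slot_key("orders", 1234)
k2 = log_key("orders", 1234, 1500)
```

With keys like these, all events of one batch share a row in the Event Queue table, while each update to that batch adds a new row in the Update Log.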

Storm Topology - 3
Time Batch Spout → Esper Bolts → TSD Bolts → TSD
Metric: ( metric name, [dim name-value-pairs]*, value, ts )

Learnings
• Replayability
• Event and Entity Schema
• Checkpointing
• Bootstrapping
• Sidelining
• Fault Tolerance

Questions?
sb.lk/hasgeek