The Fifth Elephant 2014: Live analytical dashboards at scale
https://funnel.hasgeek.com/fifthel2014/1152-live-analytical-dashboards-at-scale-sql-style
Live analytical dashboards at scale - SQL style
Shashwat Agarwal
Live
Analytical
What we have
Services (a lot of them)
Events (millions of updates)
Information
Challenges
• Metric Definition
• Scale
• Reliability
Metric Definition
• Not just a count of events, but a function of
  • fields from one or more related events/entities
  • evaluated on each event or on a batch of events (for statistical analysis)
  • for a set of dimensions
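A metric in this sense is a function over a group of events, keyed by a set of dimensions. The sketch below assumes events are plain dicts and that `compute_metric`, `value_fn`, and the field names `city`/`amount` are hypothetical; it only illustrates the shape of a metric definition, not the talk's implementation.

```python
from collections import defaultdict
from typing import Callable, Iterable

Event = dict  # hypothetical: an event is a dict of field name -> value


def compute_metric(
    events: Iterable[Event],
    dims: tuple,
    value_fn: Callable[[list], float],
) -> dict:
    """Group events by the chosen dimension fields, then apply value_fn to each group."""
    groups = defaultdict(list)
    for e in events:
        key = tuple(e[d] for d in dims)  # dimension values identify the group
        groups[key].append(e)
    # value_fn sees the whole batch, so it can compute statistics, not just counts
    return {key: value_fn(batch) for key, batch in groups.items()}
```

For example, average order value per city is `compute_metric(events, ("city",), lambda b: sum(e["amount"] for e in b) / len(b))`.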
Scale Challenges
• Dimensional lookup
• High throughput (writes)
• Low latency (queries)
• Multidimensional store
Reliability Challenges
• Accuracy
• Consistency
• Fault tolerance
Solution?
Real time + Scale == Stream Processing
Kafka Storm
Storage
• Multidimensional support
• Optimized for time-series queries
• Low query response times
• High write throughput
• Scalable
TSD*
* OpenTSDB does not support Kerberos
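To make "multidimensional, optimized for time-series queries" concrete, here is a toy in-memory store using an OpenTSDB-like data model (a metric name plus a set of tag pairs per series). `ToyTimeSeriesStore` and its methods are hypothetical and stand in for TSD only to show the query shape: filter by any subset of dimensions over a time range, then aggregate.

```python
from collections import defaultdict


class ToyTimeSeriesStore:
    """In-memory stand-in for a multidimensional time-series store (illustrative only)."""

    def __init__(self):
        # (metric, frozenset of tag pairs) -> list of (ts, value)
        self._series = defaultdict(list)

    def put(self, metric: str, ts: int, value: float, **tags: str) -> None:
        self._series[(metric, frozenset(tags.items()))].append((ts, value))

    def query(self, metric: str, start: int, end: int, agg=sum, **tag_filter: str) -> float:
        """Aggregate points of `metric` in [start, end) whose tags include every tag_filter pair."""
        wanted = set(tag_filter.items())
        points = [
            v
            for (m, tags), series in self._series.items()
            if m == metric and wanted <= tags
            for ts, v in series
            if start <= ts < end
        ]
        return agg(points)
```

A query that filters only on `city` aggregates across every other dimension, which is what a dashboard drill-down needs.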
Metric Definition
• Not scalable to write a Storm topology for each metric
• Requires a DSL for non-technical folks
Introducing... Esper
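The point of a DSL is that a new metric becomes a query string rather than new topology code. Esper's EPL is SQL-like but runs continuously over streams; the sketch below does not use Esper at all. It swaps in Python's built-in `sqlite3` over one batch of events purely to illustrate the declarative, "SQL style" idea. `METRIC_SQL`, `run_metric`, and the event columns are hypothetical.

```python
import sqlite3

# Hypothetical metric definition an analyst could write without touching Storm code.
METRIC_SQL = """
    SELECT city, COUNT(*) AS orders, SUM(amount) AS gmv
    FROM events
    GROUP BY city
    ORDER BY city
"""


def run_metric(sql: str, events: list) -> list:
    """Load one batch of events into an in-memory table and evaluate a SQL metric over it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE events (id TEXT, city TEXT, amount REAL)")
        # Named placeholders let each event dict be inserted directly
        conn.executemany("INSERT INTO events VALUES (:id, :city, :amount)", events)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()
```

Adding a metric then means adding another SQL string; the execution engine stays the same.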
Storm Topology - 1
Kafka Spouts → Enricher Bolts (Dim Lookup against Dim Store) → Kafka Bolts
Event: { id: a123-234, time: 1234, entityId: OD12 …}
Enriched Event: { id: a123-234, time: 1234, entityId: OD12 …}
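What an enricher bolt does per event can be sketched as a lookup-and-merge. This assumes a hypothetical dimension store keyed by `entityId` (here a dict, `DIM_STORE`) and JSON-encoded events; the real Dim Store and the enriched fields are not specified in the talk.

```python
import json

# Hypothetical dimension store: entityId -> dimension attributes.
DIM_STORE = {"OD12": {"city": "BLR", "category": "books"}}


def enrich(raw: str, dim_store: dict) -> str:
    """Attach dimension attributes for the event's entityId; unknown ids pass through unchanged."""
    event = json.loads(raw)
    dims = dim_store.get(event.get("entityId"), {})
    # Fields already on the event take precedence over looked-up dimensions
    enriched = {**event, **{k: v for k, v in dims.items() if k not in event}}
    return json.dumps(enriched)
```

Doing the lookup once, upstream, means every downstream metric can group by `city` or `category` without repeating the join.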
Storm Topology - 3’
Kafka Spouts → Esper Bolts → TSD Bolts → TSD
Input (Enriched Event): { id: a123-234, time: 1234, entityId: OD12 …}
Output: ( metric name, [dim name-value-pairs]*, value, ts )
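The output tuple maps naturally onto OpenTSDB's telnet-style `put <metric> <timestamp> <value> <tagk=tagv ...>` line, with dimensions becoming tags. The helper name `to_tsd_put` is hypothetical, and the sketch does no validation of the characters OpenTSDB accepts in metric and tag names.

```python
def to_tsd_put(metric: str, dims: list, value: float, ts: int) -> str:
    """Format (metric name, [dim name-value-pairs]*, value, ts) as an OpenTSDB 'put' line."""
    tags = " ".join(f"{k}={v}" for k, v in dims)
    # rstrip drops the trailing space when there are no dimensions
    return f"put {metric} {ts} {value} {tags}".rstrip()
```

For example, `to_tsd_put("orders.count", [("city", "BLR"), ("channel", "app")], 42, 1234)` yields `put orders.count 1234 42 city=BLR channel=app`.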
Time Batching
• Based on event time
• Enables
  • calculating statistics
  • windowed joins
  • handling out-of-order events
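Batching on event time means an event's batch is decided by its own timestamp, not by when it arrives, so a late event still lands in the right window. The sketch assumes fixed-width windows; `BATCH_SECONDS`, `batch_id`, and `time_batch` are hypothetical names.

```python
from collections import defaultdict

BATCH_SECONDS = 60  # hypothetical batch width


def batch_id(event_ts: int, width: int = BATCH_SECONDS) -> int:
    """Map an event timestamp (seconds) to the start of its fixed-width batch."""
    return event_ts - event_ts % width


def time_batch(events, width: int = BATCH_SECONDS) -> dict:
    """Group events by event-time batch, regardless of arrival order."""
    batches = defaultdict(list)
    for e in events:
        batches[batch_id(e["time"], width)].append(e)
    return dict(batches)
```

If events arrive as times 125, 61, 130, the event at 61 arrives out of order but is still grouped into batch 60, while the other two share batch 120.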
Reliability
Faults / Upgrades / Metric definition changes
→ Reset checkpoint to the last good checkpoint
→ Replay
Transactional Storm
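The checkpoint-and-replay loop can be sketched with a toy consumer over an append-only log standing in for a Kafka partition. This is not the Kafka client API or Transactional Storm; `ReplayableConsumer`, `commit`, and `reset_to_checkpoint` are hypothetical. Replay is only safe if downstream writes are idempotent per batch, which is what transactional processing provides.

```python
from typing import Optional


class ReplayableConsumer:
    """Toy consumer over an append-only log with a 'last good' checkpoint."""

    def __init__(self, log: list):
        self.log = log
        self.offset = 0      # next offset to read
        self.last_good = 0   # last committed checkpoint

    def poll(self, n: int) -> list:
        batch = self.log[self.offset:self.offset + n]
        self.offset += len(batch)
        return batch

    def commit(self) -> None:
        """Record the current offset as the last good checkpoint."""
        self.last_good = self.offset

    def reset_to_checkpoint(self, checkpoint: Optional[int] = None) -> None:
        """Rewind to the last good checkpoint, or an explicit earlier one, so events are replayed."""
        self.offset = self.last_good if checkpoint is None else checkpoint
```

After a fault, `reset_to_checkpoint()` re-reads everything since the last commit; after a metric definition change, passing an older offset replays history through the new definition.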
Storm Topology - 2
Kafka Spouts → Time Batch Bolt → HBase Bolt
Input (Enriched Event): { id: a123-234, time: 1234, entityId: OD12 …}
HBase Time Batch Schema
Table 1 - Event Queue
• Key: <event_ns>_slot_<batchId> (batchId is constructed from the event timestamp)
• Value: each column holds one event JSON
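A sketch of the Event Queue layout, simulating an HBase table as a dict of row key to columns. Several details are assumptions not stated on the slide: batchId is the start of a 60-second window, it is zero-padded so row keys sort by time, and the column qualifier is the event id. `slot_key` and `add_to_queue` are hypothetical.

```python
import json

BATCH_SECONDS = 60  # hypothetical batch width


def slot_key(event_ns: str, event_ts: int, width: int = BATCH_SECONDS) -> str:
    """Row key <event_ns>_slot_<batchId>, with batchId derived from the event timestamp."""
    batch = event_ts - event_ts % width
    return f"{event_ns}_slot_{batch:012d}"  # zero-padded so keys sort lexicographically by time


def add_to_queue(table: dict, event_ns: str, event: dict) -> str:
    """Simulate an HBase put: one column per event (qualifier = event id, value = event JSON)."""
    key = slot_key(event_ns, event["time"])
    table.setdefault(key, {})[event["id"]] = json.dumps(event)
    return key
```

Because every event in a window goes to the same row, reading one row returns the complete batch, including events that arrived late.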
HBase Time Batch Schema
Table 2 - Event Queue Update Log
• Key: <event_ns>_log_<batchId>_<version> (batchId is constructed from the event timestamp; version is the timestamp at which the batch was updated)
• Value: version
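The update log records a new row each time a batch changes, so a reader can tell which version of a batch it has already processed. The sketch simulates the table as a dict; the zero-padding widths and the names `log_key`, `record_update`, and `latest_version` are assumptions, with version taken as an update time in milliseconds.

```python
from typing import Optional


def log_key(event_ns: str, batch_id: int, version: int) -> str:
    """Row key <event_ns>_log_<batchId>_<version>; version = time the batch was updated."""
    return f"{event_ns}_log_{batch_id:012d}_{version:013d}"


def record_update(log_table: dict, event_ns: str, batch_id: int, updated_at_ms: int) -> None:
    """Append a log row whenever the batch in the Event Queue is modified."""
    log_table[log_key(event_ns, batch_id, updated_at_ms)] = updated_at_ms


def latest_version(log_table: dict, event_ns: str, batch_id: int) -> Optional[int]:
    """Newest recorded version for one batch (a prefix scan in HBase terms)."""
    prefix = f"{event_ns}_log_{batch_id:012d}_"
    versions = [v for k, v in log_table.items() if k.startswith(prefix)]
    return max(versions, default=None)
```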
Storm Topology - 3
Time Batch Spout → Esper Bolts → TSD Bolts → TSD
Output: ( metric name, [dim name-value-pairs]*, value, ts )
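One way a time-batch spout could use the two tables: find batches whose update-log version is newer than what was last processed, re-read each whole batch from the Event Queue, and emit it downstream for recomputation. This is a hypothetical sketch over dict-simulated tables that reuses the key formats assumed for the schema (zero-padded batchId); `changed_batches` and `poll_batches` are illustrative names.

```python
import json
from typing import Iterator


def changed_batches(log_table: dict, event_ns: str, since_version: int) -> list:
    """Batch ids with an update-log entry newer than since_version."""
    prefix = f"{event_ns}_log_"
    changed = set()
    for key, version in log_table.items():
        if key.startswith(prefix) and version > since_version:
            # key suffix is <batchId>_<version>; keep only the batchId part
            changed.add(int(key[len(prefix):].rsplit("_", 1)[0]))
    return sorted(changed)


def poll_batches(queue_table: dict, log_table: dict, event_ns: str, since_version: int) -> Iterator:
    """Yield (batch_id, events) for every updated batch, re-reading the full batch."""
    for b in changed_batches(log_table, event_ns, since_version):
        row = queue_table.get(f"{event_ns}_slot_{b:012d}", {})
        yield b, [json.loads(v) for v in row.values()]
```

Re-emitting the whole batch, rather than only the new event, lets the metric be recomputed from scratch, so late events correct the earlier value instead of being added to it twice.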
Learnings
• Replayability
• Event and Entity Schema
• Checkpointing
• Bootstrapping
• Sidelining
• Fault Tolerance
Questions?
sb.lk/hasgeek