from rivulets to rivers: elastic stream processing in heron
Post on 05-Apr-2017
49 Views
Preview:
TRANSCRIPT
From Rivulets to Rivers:Elastic Stream Processing in Heron
Bill Graham , Twitter - @billgrahamAshvin Agrawal, MicrosoftAvrilia Floratou, Microsoft
Prediction is very difficult, especially if it’s about the future.- Nils Bohr
We cannot direct the wind, but we can adjust the sails.- Dolly Parton
Outline
Heron Overview
Elastic Scaling Opportunities
Current Implementation
Work in Progress – Auto-scaling
Heron
A realtime, distributed, fault-tolerant stream processing engine.
About Heron Developed by Twitter in 2014 Open sourced in May 2016 Storm API compatible Isolation at all levels:
- Topology- Container- Task (process-based)
At least once, at most once semantics Backpressure Low resource overhead (< 10%)
Logical Topology
Spout 1
Spout 2
Bolt 1
Bolt 2
Bolt 3
Bolt 4
Bolt 5
Physical Execution
Spout 1
Spout 2
Bolt 1
Bolt 2
Bolt 3
Bolt 4
Bolt 5
Packing Plan How to distribute instances onto containers? IPacking.pack()
Topology SubmissionContainer 2
Container 0
Stream Manager
S1 S2 B3
B4 B5 B6
TopologyMaster
HeronClient
HeronScheduler
Container 3
Stream Manager
S1 B2 B3
B4 B5 B6
Container 1
Stream Manager
S1 S2 B3
B4 B5 B6
heron submit
PackingPlan
Instances RegisterStream Manager RegistersData Flows
Containers AllocatedProcesses Initialize
Data Rate Variations
Parallelism Challenges
Anticipating component parallelism is difficult
Changing parallelism is costly - O(hour)
- code change, review, merge, build, kill, submit
Tuning for load spikes or valleys is manual - O(day)
Under-provisioning leads to back pressure leads to support costs
Over-provisioning is the norm
Over-provisioning
25%
40%
CPU Used
CPU Requested
Elastic Scaling Opportunity
Reduce administration cost
Reduce support cost
Reduce hardware cost
Provide better SLA
User Tasks Heron System Tasks
Ordinary Topology Management Process
Kill Topology
Submit Topology
Create Packing
AcquireResource
s
Monitor / Estimate
Build State
Start Topology
Install Topology
Time Consuming Tasks
ReleasesResources
Low-cost Topology “update”
32 2 34 4
User Tasks Heron System Tasks
Optimized Topology Scale-up Process
KillTopology
Submit Topology
Create Packing
AcquireResources
Monitor / Estimate
Build State
Start Topology
Install Topology
Update Topology
PauseTopology
Un-PauseTopology
Add / Reduce
ResourcesPrepare
Components
heron “update” …
Minimizes Disruption
Aggressively Prunes Containers
Aims to Maintain Uniform Component Distribution
$ heron update my_cluster/user/dev MyTopology \--component-parallelism=bolt1:20 \--component-parallelism=bolt2:40
Available in 0.14.5
Execution Time O(mins)
Customizable Through IRepacking.repack()
Current Limitations
Automated state transition not yet supported
- Component scaling event notification : IUpdatable.update()
- Example: KafkaSpout queue partition mappings
Fields group routing might change
- Workaround: pause topology > cache flush interval before scaling
Algorithmic Auto-Scaling
User Tasks Heron System TasksUser Tasks Heron System Tasks
Algorithmic Auto-Scaling …
Submit Topology
Create Packing
AcquireResources
Monitor / Estimate
Build State
Start Topology
Install Topology
Update Topology
PauseTopology
Un-PauseTopology
Add / Reduce
ResourcesPrepare
Components
Auto-Scaling
Heron should automatically identify
variations in the incoming load and
react to them.
Dhalion periodically observes the state of the topology and determines
whether resources should be scaled up or
down.
Heron uses Dhalion to adjust to external shocks.
Dhalion is a framework that provides
self-regulating capabilities to Heron and will be
open-sourced in the near future.
Using Dhalion to Auto-Scale
Pending Packets Detector
Backpressure Detector
Processing Rate Skew Detector
Resource Underprovisioning
Diagnoser
Data Skew Diagnoser
Slow Instances Diagnoser
Resolver Invocation
Diagnosis
Symptom Detection Diagnosis Generation
Bolt Scale Down
Resolver
Bolt Scale Up Resolver
Restart Instances Resolver
Resource Overprovisioning
Diagnoser
Data Skew Resolver
Sym
ptom
s
Resolution
Metrics
Dhalion’s scales up and down the topology resources as needed while still keeping the topology in a steady state where backpressure is not observed
Initial Results
0 7 14 21 28 35 42 49 56 63 70 77 84 91 98 105 112 119
-0.20
0.00
0.20
0.40
0.60
0.80
1.00
1.20
1.40Spout Splitter Bolt
Time (in minutes)
Nor
mal
ized
Thr
ough
put Scale
Down
Scale Up
S1
S2
S3Dhalion is able to adjust the
topology resources on-the-fly when workload spikes occur.
Our policy eventually reaches a healthy state where
backpressure is not observed and the overall throughput is
maximized.
Future Plans
Use Dhalion to enforce throughput and latency
SLOsand to auto-tune Heron
topologies.
Open-source Dhalion and the auto-scaling
policy as part of Heron.
Combine scaling with stateful stream
processing.
Get Involved
http://github.com/twitter/heron
http://heronstreaming.io
@heronstreaming
Up Next
Anomaly detection in real-time data streams using Heron
Arun Kejariwal, Machine ZoneKarthik Ramasamy, Twitter
Questions?
top related