large-scale stream processing in the hadoop ecosystem · large-scale stream processing in the...
TRANSCRIPT
Large-Scale Stream Processing in the Hadoop Ecosystem
Gyula Fó[email protected]
Márton [email protected]
This talk
§ Stream processing by example
§Open source stream processors
§ Runtime architecture and programming model
§Counting words…
§ Fault tolerance and stateful processing
§Closing
2Apache: Big Data Europe2015-‐‑09-‐‑28
Stream processing by example
2015-‐‑09-‐‑28 Apache: Big Data Europe 3
Streaming applications
ETL style operations• Filter incoming data,
Log analysis• High throughput, connectors,
at-least-once processing
Window aggregations• Trending tweets,
User sessions, Stream joins• Window abstractions
2015-‐‑09-‐‑28 Apache: Big Data Europe 4
Input
Input
InputInput
Process/Enrich
Streaming applications
Machine learning• Fitting trends to the evolving
stream, Stream clustering• Model state, cyclic flows
Pattern recognition• Fraud detection, Triggering
signals based on activity• Exactly-once processing
5Apache: Big Data Europe2015-‐‑09-‐‑28
Open source stream processors
2015-‐‑09-‐‑28 Apache: Big Data Europe 6
Apache Streaming landscape
72015-‐‑09-‐‑28 Apache: Big Data Europe
Apache Storm
§ Started in 2010, development driven by BackType, then Twitter
§ Pioneer in large scale stream processing§Distributed dataflow abstraction (spouts &
bolts)
82015-‐‑09-‐‑28 Apache: Big Data Europe
Apache Flink§ Started in 2008 as a research project
(Stratosphere) at European universities§Unique combination of low latency streaming
and high throughput batch analysis§ Flexible operator states and windowing
9
Batch data
Kafka, RabbitMQ, ...
HDFS, JDBC, ...
Stream Data
2015-‐‑09-‐‑28 Apache: Big Data Europe
Apache Spark§ Started in 2009 at UC Berkley, Apache since 2013§ Very strong community, wide adoption§ Unified batch and stream processing over a
batch runtime§ Good integration with batch programs
102015-‐‑09-‐‑28 Apache: Big Data Europe
Apache Samza
§Developed at LinkedIn, open sourced in 2013§ Builds heavily on Kafka’s log based philosophy§ Pluggable messaging system and execution
backend
112015-‐‑09-‐‑28 Apache: Big Data Europe
System comparison
12
Streamingmodel Native Micro-batching Native Native
API Compositional Declarative Compositional Declarative
Fault tolerance Record ACKs RDD-based Log-based Checkpoints
Guarantee At-least-once Exactly-once At-least-once Exactly-once
State Only in Trident State as DStream
Statefuloperators
Statefuloperators
Windowing Not built-in Time based Not built-in Policy based
Latency Very-Low Medium Low LowThroughput Medium High High High
2015-‐‑09-‐‑28 Apache: Big Data Europe
Runtime and programming model
2015-‐‑09-‐‑28 Apache: Big Data Europe 13
Native Streaming
2015-‐‑09-‐‑28 Apache: Big Data Europe 14
Distributed dataflow runtime
§ Storm, Samza and Flink§General properties
• Long standing operators• Pipelined execution• Usually possible to create
cyclic flows
2015-‐‑09-‐‑28 Apache: Big Data Europe 15
Pros• Full expressivity• Low-latency execution• Stateful operators
Cons• Fault-tolerance is hard• Throughput may suffer• Load balancing is an
issue
Distributed dataflow runtime
§ Storm• Dynamic typing + Kryo• Dynamic topology rebalancing
§ Samza• Almost every component pluggable• Full task isolation, no backpressure (buffering
handled by the messaging layer)§ Flink• Strongly typed streams + custom serializers• Flow control mechanism• Memory management
2015-‐‑09-‐‑28 Apache: Big Data Europe 16
Micro-batching
2015-‐‑09-‐‑28 Apache: Big Data Europe 17
Micro-batch runtime§ Implemented by Apache Spark§General properties• Computation broken down
to time intervals• Load aware scheduling• Easy interaction with batch
2015-‐‑09-‐‑28 Apache: Big Data Europe 18
Pros• Easy to reason about• High-throughput• FT comes for “free”• Dynamic load balancing
Cons• Latency depends on
batch size• Limited expressivity• Stateless by nature
Programming model
2015-‐‑09-‐‑28 Apache: Big Data Europe 19
Declarative
§ Expose a high-level API§ Operators are higher order
functions on abstract data stream types
§ Advanced behavior such as windowing is supported
§ Query optimization
Compositional
§ Offer basic building blocks for composing custom operators and topologies
§ Advanced behavior such as windowing is often missing
§ Topology needs to be hand-optimized
Programming model
2015-‐‑09-‐‑28 Apache: Big Data Europe 20
DStream, DataStream§ Transformations abstract
operator details§ Suitable for engineers and data
analysts
Spout, Consumer, Bolt, Task, Topology
§ Direct access to the execution graph / topology
• Suitable for engineers
Counting words…
2015-‐‑09-‐‑28 Apache: Big Data Europe 21
WordCount
2015-‐‑09-‐‑28 Apache: Big Data Europe 22
storm budapest flinkapache storm sparkstreaming samza stormflink apache flinkbigdata stormflink streaming
(storm, 4)(budapest, 1)(flink, 4)(apache, 2)(spark, 1)(streaming, 2)(samza, 1)(bigdata, 1)
StormAssembling the topology
232015-‐‑09-‐‑28 Apache: Big Data Europe
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new SentenceSpout(), 5);builder.setBolt("split", new Splitter(), 8).shuffleGrouping("spout");builder.setBolt("count", new Counter(), 12)
.fieldsGrouping("split", new Fields("word"));
public class Counter extends BaseBasicBolt {Map<String, Integer> counts = new HashMap<String, Integer>();
public void execute(Tuple tuple, BasicOutputCollector collector) {String word = tuple.getString(0);Integer count = counts.containsKey(word) ? counts.get(word) + 1 : 1;counts.put(word, count);collector.emit(new Values(word, count));
}}
Rolling word count bolt
Samza
2015-‐‑09-‐‑28 Apache: Big Data Europe 24
public class WordCountTask implements StreamTask {private KeyValueStore<String, Integer> store;
public void process( IncomingMessageEnvelope envelope, MessageCollector collector, TaskCoordinator coordinator) {
String word = envelope.getMessage();Integer count = store.get(word);if(count == null){count = 0;}store.put(word, count + 1);collector.send(new OutgoingMessageEnvelope(new
SystemStream("kafka", ”wc"), Tuple2.of(word, count))); }
}
Rolling word count task
Flink
val lines: DataStream[String] = env.fromSocketStream(...)
lines.flatMap {line => line.split(" ").map(word => Word(word,1))}
.groupBy("word").sum("frequency")
.print()
case class Word (word: String, frequency: Int)
val lines: DataStream[String] = env.fromSocketStream(...)
lines.flatMap {line => line.split(" ").map(word => Word(word,1))}
.window(Time.of(5,SECONDS)).every(Time.of(1,SECONDS))
.groupBy("word").sum("frequency")
.print()
Rolling word count
Window word count
252015-‐‑09-‐‑28 Apache: Big Data Europe
SparkWindow word count
Rolling word count (kind of)
262015-‐‑09-‐‑28 Apache: Big Data Europe
Fault tolerance and stateful processing
2015-‐‑09-‐‑28 Apache: Big Data Europe 27
Fault tolerance intro
§ Fault-tolerance in streaming systems is inherently harder than in batch• Can’t just restart computation• State is a problem• Fast recovery is crucial• Streaming topologies run 24/7 for a long period
§ Fault-tolerance is a complex issue• No single point of failure is allowed• Guaranteeing input processing• Consistent operator state• Fast recovery • At-least-once vs Exactly-once semantics
2015-‐‑09-‐‑28 Apache: Big Data Europe 28
Storm record acknowledgements
§ Track the lineage of tuples as they are processed (anchors and acks)
§ Special “acker” bolts track each lineage DAG (efficient xor based algorithm)
§ Replay the root of failed (or timed out) tuples
2015-‐‑09-‐‑28 Apache: Big Data Europe 29
Samza offset tracking§ Exploits the properties of a durable, offset
based messaging layer§ Each task maintains its current offset, which
moves forward as it processes elements§ The offset is checkpointed and restored on
failure (some messages might be repeated)
2015-‐‑09-‐‑28 Apache: Big Data Europe 30
Flink checkpointing
§ Based on consistent global snapshots§Algorithm designed for stateful dataflows
(minimal runtime overhead) § Exactly-once semantics
31Apache: Big Data Europe2015-‐‑09-‐‑28
Spark RDD recomputation
§ Immutable data model with repeatable computation
§ Failed RDDs are recomputed using their lineage
§ Checkpoint RDDs to reduce lineage length
§ Parallel recovery of failedRDDs
§ Exactly-once semantics
2015-‐‑09-‐‑28 Apache: Big Data Europe 32
State in streaming programs
§ Almost all non-trivial streaming programs are stateful
§ Stateful operators (in essence):𝒇: 𝒊𝒏, 𝒔𝒕𝒂𝒕𝒆 ⟶ 𝒐𝒖𝒕, 𝒔𝒕𝒂𝒕𝒆.
§ State hangs around and can be read and modified as the stream evolves
§ Goal: Get as close as possible while maintaining scalability and fault-tolerance
33Apache: Big Data Europe2015-‐‑09-‐‑28
§ States available only in Trident API§Dedicated operators for state updates and
queries§ State access methods• stateQuery(…)• partitionPersist(…)• persistentAggregate(…)
§ It’s very difficult toimplement transactionalstates
Exactly-‐‑once guarantee
34Apache: Big Data Europe2015-‐‑09-‐‑28
§ Stateless runtime by design• No continuous operators• UDFs are assumed to be stateless
§ State can be generated as a separate stream of RDDs: updateStateByKey(…)
𝒇: 𝑺𝒆𝒒[𝒊𝒏𝒌], 𝒔𝒕𝒂𝒕𝒆𝒌 ⟶ 𝒔𝒕𝒂𝒕𝒆.𝒌§ 𝒇 is scoped to a specific key
§ Exactly-once semantics
35Apache: Big Data Europe2015-‐‑09-‐‑28
§ Stateful dataflow operators(Any task can hold state)
§ State changes are storedas a log by Kafka
§Custom storage engines canbe plugged in to the log
§ 𝒇 is scoped to a specific task§At-least-once processing
semantics
36Apache: Big Data Europe2015-‐‑09-‐‑28
§ Stateful dataflow operators (conceptually similar to Samza)
§ Two state access patterns• Local (Task) state• Partitioned (Key) state
§ Proper API integration• Java: OperatorState interface• Scala: mapWithState, flatMapWithState…
§ Exactly-once semantics by checkpointing
37Apache: Big Data Europe2015-‐‑09-‐‑28
Performance
§ Throughput/Latency• A cost of a network hop is 25+ msecs• 1 million records/sec/core is nice
§ Size of Network Buffers/Batching§ Buffer Timeout§Cost of Fault Tolerance§Operator chaining/Stages§ Serialization/Types
2015-‐‑09-‐‑28 Apache: Big Data Europe 38
Closing
2015-‐‑09-‐‑28 Apache: Big Data Europe 39
Comparison revisited
40
Streamingmodel Native Micro-batching Native Native
API Compositional Declarative Compositional Declarative
Fault tolerance Record ACKs RDD-based Log-based Checkpoints
Guarantee At-least-once Exactly-once At-least-once Exactly-once
State Only in Trident State as DStream
Statefuloperators
Statefuloperators
Windowing Not built-in Time based Not built-in Policy based
Latency Very-Low Medium Low LowThroughput Medium High High High
2015-‐‑09-‐‑28 Apache: Big Data Europe
Summary
§ Streaming applications and stream processors are very diverse
§ 2 main runtime designs• Dataflow based (Storm, Samza, Flink)• Micro-batch based (Spark)
§ The best framework varies based on application specific needs
§ But high-level APIs are nice J
2015-‐‑09-‐‑28 Apache: Big Data Europe 41
Thank you!
List of Figures (in order of usage)
§ https://upload.wikimedia.org/wikipedia/commons/thumb/2/2a/CPT-FSM-abcd.svg/326px-CPT-FSM-abcd.svg.png
§ https://storm.apache.org/images/topology.png
§ https://databricks.com/wp-content/uploads/2015/07/image11-1024x655.png
§ https://databricks.com/wp-content/uploads/2015/07/image21-1024x734.png
§ https://people.csail.mit.edu/matei/papers/2012/hotcloud_spark_streaming.pdf, page 2.
§ http://www.slideshare.net/ptgoetz/storm-hadoop-summit2014, page 69-71.
§ http://samza.apache.org/img/0.9/learn/documentation/container/checkpointing.svg
§ https://databricks.com/wp-content/uploads/2015/07/image41-1024x602.png
§ https://storm.apache.org/documentation/images/spout-vs-state.png
§ http://samza.apache.org/img/0.9/learn/documentation/container/stateful_job.png
2015-‐‑09-‐‑28 Apache: Big Data Europe 43