Flink 0.10 @ Bay Area Meetup (October 2015)
![Page 1: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/1.jpg)
Stephan Ewen, Kostas Tzoumas
Flink committers, co-founders @ data Artisans
@StephanEwen, @kostas_tzoumas
Flink-0.10
![Page 2: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/2.jpg)
What is Flink
[Figure: the Flink stack]
Libraries and compatibility layers: Gelly, Table, ML, SAMOA, Hadoop M/R, Dataflow, Dataflow (WiP), MRQL, Table, Cascading (WiP)
APIs: DataSet (Java/Scala), DataStream (Java/Scala)
Runtime: Streaming dataflow runtime
Deployment: Local, Remote, Yarn, Tez, Embedded
![Page 3: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/3.jpg)
Flink 0.10 Summary
Focus on operational readiness
• high availability
• monitoring
• integration with other systems
First-class support for event time
Refined DataStream API: easy and powerful
![Page 4: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/4.jpg)
Improved DataStream API
Stream data analysis differs from batch data analysis by introducing time
Streams are unbounded and produce data over time
As simple as the batch API if you handle time in a simple way
Powerful if you want to handle time in an advanced way (out-of-order records, preliminary results, etc.)
![Page 5: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/5.jpg)
Improved DataStream API
case class Event(location: Location, numVehicles: Long)
val stream: DataStream[Event] = …;
stream
  .filter { evt => isIntersection(evt.location) }
![Page 6: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/6.jpg)
Improved DataStream API
case class Event(location: Location, numVehicles: Long)
val stream: DataStream[Event] = …;
stream
  .filter { evt => isIntersection(evt.location) }
  .keyBy("location")
  .timeWindow(Time.of(15, MINUTES), Time.of(5, MINUTES))
  .sum("numVehicles")
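The `timeWindow(Time.of(15, MINUTES), Time.of(5, MINUTES))` call above asks for 15-minute windows that slide every 5 minutes, so each record contributes to three overlapping windows. The following is a minimal plain-Java sketch of that assignment logic, not Flink's implementation. The class and method names are hypothetical, timestamps are plain non-negative minute counts, and windows are assumed to be aligned to multiples of the slide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java illustration of sliding-window assignment; hypothetical names, not Flink code. */
final class SlidingWindowSketch {

    /** Start times of all windows [start, start + size) that contain ts (ts must be >= 0). */
    static List<Long> windowStarts(long ts, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        long lastStart = ts - (ts % slide);   // latest window starting at or before ts
        for (long start = lastStart; start > ts - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    /** Adds values[i] to every window containing timestamps[i]; result is keyed by window start. */
    static Map<Long, Long> sumPerWindow(long[] timestamps, long[] values, long size, long slide) {
        Map<Long, Long> sums = new TreeMap<>();
        for (int i = 0; i < timestamps.length; i++) {
            for (long start : windowStarts(timestamps[i], size, slide)) {
                sums.merge(start, values[i], Long::sum);
            }
        }
        return sums;
    }
}
```

With a size of 15 and a slide of 5, a record at minute 17 falls into the windows starting at minutes 15, 10, and 5.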
![Page 7: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/7.jpg)
Improved DataStream API
case class Event(location: Location, numVehicles: Long)
val stream: DataStream[Event] = …;
stream
  .filter { evt => isIntersection(evt.location) }
  .keyBy("location")
  .timeWindow(Time.of(15, MINUTES), Time.of(5, MINUTES))
  .trigger(new Threshold(200))
  .sum("numVehicles")
![Page 8: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/8.jpg)
Improved DataStream API
case class Event(location: Location, numVehicles: Long)
val stream: DataStream[Event] = …;
stream
  .filter { evt => isIntersection(evt.location) }
  .keyBy("location")
  .timeWindow(Time.of(15, MINUTES), Time.of(5, MINUTES))
  .trigger(new Threshold(200))
  .sum("numVehicles")
  .keyBy( evt => evt.location.grid )
  .mapWithState { (evt, state: Option[Model]) => {
    val model = state.getOrElse(new Model())
    (model.classify(evt), Some(model.update(evt)))
  }}
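The `mapWithState` step above keeps one piece of state per key and hands it to the user function together with each record. A rough plain-Java sketch of that pattern follows. It is an illustration only: `KeyedStateSketch` and `Result` are hypothetical names, the state lives in an in-memory `HashMap`, and nothing here is checkpointed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.BiFunction;
import java.util.function.Function;

/** Plain-Java stand-in for a keyed, stateful map operator; hypothetical names, not Flink code. */
final class KeyedStateSketch<K, IN, S, OUT> {

    /** What the user function returns: the value to emit and the state to keep for the key. */
    static final class Result<O, T> {
        final O output;
        final T newState;

        Result(O output, T newState) {
            this.output = output;
            this.newState = newState;
        }
    }

    private final Map<K, S> stateByKey = new HashMap<>();
    private final Function<IN, K> keySelector;
    private final BiFunction<IN, Optional<S>, Result<OUT, S>> userFunction;

    KeyedStateSketch(Function<IN, K> keySelector,
                     BiFunction<IN, Optional<S>, Result<OUT, S>> userFunction) {
        this.keySelector = keySelector;
        this.userFunction = userFunction;
    }

    /** Looks up the state for the event's key, runs the user function, and stores the new state. */
    OUT process(IN event) {
        K key = keySelector.apply(event);
        Optional<S> current = Optional.ofNullable(stateByKey.get(key));
        Result<OUT, S> result = userFunction.apply(event, current);
        stateByKey.put(key, result.newState);
        return result.output;
    }
}
```

A running count per key, for example, would pass a user function that reads `state.orElse(0)`, adds one, and returns the new count as both output and state.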
![Page 9: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/9.jpg)
IoT / Mobile Applications
Events occur on devices
Queue / Log: events stored in a log
Stream Analysis: events analyzed in a data streaming system
![Page 10: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/10.jpg)
IoT / Mobile Applications
![Page 11: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/11.jpg)
IoT / Mobile Applications
![Page 12: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/12.jpg)
IoT / Mobile Applications
![Page 13: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/13.jpg)
IoT / Mobile Applications
Out of order!
First burst of events
Second burst of events
![Page 14: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/14.jpg)
IoT / Mobile Applications
Event time windows
Arrival time windows
Instant event-at-a-time
Flink supports out-of-order (event time) windows, arrival time windows (and mixtures), plus low-latency processing.
First burst of events
Second burst of events
![Page 15: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/15.jpg)
We need a Global Clock that runs on event time instead of processing time.
![Page 16: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/16.jpg)
[Figure: event-time clock with watermarks]
This is a source
This is our window operator
This is the current event time
This is a watermark.
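One way to read the figure: each source periodically emits a watermark, and an operator's event-time clock advances only as far as the slowest input allows. The sketch below illustrates that idea in plain Java. Taking the minimum over all inputs is an assumption about the mechanism rather than something the slide spells out, and the class name is hypothetical.

```java
import java.util.Arrays;

/** Tracks an operator's event-time clock from per-input watermarks; hypothetical, not Flink code. */
final class WatermarkClockSketch {
    private final long[] latestPerInput;
    private long currentEventTime = Long.MIN_VALUE;

    /** numInputs must be at least 1. */
    WatermarkClockSketch(int numInputs) {
        latestPerInput = new long[numInputs];
        Arrays.fill(latestPerInput, Long.MIN_VALUE);   // no watermark seen yet
    }

    /** Records a watermark from one input; returns true if the operator's clock advanced. */
    boolean onWatermark(int input, long watermark) {
        latestPerInput[input] = Math.max(latestPerInput[input], watermark);
        long slowest = Arrays.stream(latestPerInput).min().getAsLong();
        if (slowest > currentEventTime) {
            currentEventTime = slowest;
            return true;
        }
        return false;
    }

    long currentEventTime() {
        return currentEventTime;
    }
}
```

With two inputs, a watermark of 2 on the first input does not move the clock until the second input has reported a watermark as well.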
![Page 17: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/17.jpg)
Processing Time
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(ProcessingTime);
DataStream<Tweet> text = env.addSource(new TwitterSrc());
DataStream<Tuple2<String, Integer>> counts = text
    .flatMap(new ExtractHashtags())
    .keyBy("name")
    .timeWindow(Time.of(5, MINUTES))
    .apply(new HashtagCounter());
![Page 18: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/18.jpg)
Event Time
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(EventTime);
DataStream<Tweet> text = env.addSource(new TwitterSrc());
text = text.assignTimestamps(new MyTimestampExtractor());
DataStream<Tuple2<String, Integer>> counts = text
    .flatMap(new ExtractHashtags())
    .keyBy("name")
    .timeWindow(Time.of(5, MINUTES))
    .apply(new HashtagCounter());
![Page 19: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/19.jpg)
Tumbling Windows of 4 Seconds
[Figure: a stream of event timestamps grouped into the tumbling windows 0-3, 4-7, 8-11, 20-23, 24-27, and 32-35]
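The window boundaries in this figure follow from simple arithmetic: with a window size of 4, a timestamp `t` belongs to the window starting at `t - (t % 4)`. The plain-Java sketch below adds watermark-driven firing on top of that. It assumes a window fires once the watermark reaches its last timestamp, counts elements rather than keeping them, and omits late-data handling. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Event-time tumbling windows fired by watermarks; hypothetical, not Flink code. */
final class TumblingWindowSketch {
    private final long size;
    private final TreeMap<Long, Integer> countByWindowStart = new TreeMap<>();

    TumblingWindowSketch(long size) {
        this.size = size;
    }

    /** Start of the window containing the timestamp (timestamp must be >= 0). */
    long windowStart(long timestamp) {
        return timestamp - (timestamp % size);
    }

    /** Elements may arrive out of order; they are counted in the window their timestamp belongs to. */
    void onElement(long timestamp) {
        countByWindowStart.merge(windowStart(timestamp), 1, Integer::sum);
    }

    /** Fires, in order, every window whose last timestamp is <= watermark. */
    List<String> onWatermark(long watermark) {
        List<String> fired = new ArrayList<>();
        while (!countByWindowStart.isEmpty()
                && countByWindowStart.firstKey() + size - 1 <= watermark) {
            Map.Entry<Long, Integer> window = countByWindowStart.pollFirstEntry();
            long start = window.getKey();
            fired.add(start + "-" + (start + size - 1) + ": " + window.getValue());
        }
        return fired;
    }
}
```

Elements with timestamps 1, 5, 2 arriving in that order still end up in the right windows, because assignment depends only on the timestamp and firing only on the watermark.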
![Page 20: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/20.jpg)
![Page 21: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/21.jpg)
High Availability and Consistency
No single point of failure any more
Exactly-once processing semantics across the pipeline
Checkpoints/fault tolerance is decoupled from windows
Allows for highly flexible window implementations
[Figure: ZooKeeper ensemble, multiple masters, failover]
![Page 22: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/22.jpg)
Operator State
Stateless operators
ds.filter(_ != 0)
System state
ds.keyBy("id").timeWindow(Time.of(5, SECONDS)).reduce(…)
User defined state
public class CounterSum extends RichReduceFunction<Long> {
    private OperatorState<Long> counter;

    @Override
    public Long reduce(Long v1, Long v2) throws Exception {
        counter.update(counter.value() + 1);
        return v1 + v2;
    }

    @Override
    public void open(Configuration config) throws Exception {
        counter = getRuntimeContext().getOperatorState("counter", 0L, false);
    }
}
![Page 23: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/23.jpg)
Checkpoints
Consistent snapshots of distributed data stream and operator state
![Page 24: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/24.jpg)
Barriers
Markers for checkpoints
Injected in the data flow
![Page 25: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/25.jpg)
Alignment for multi-input operators
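To make the alignment step concrete, here is a small plain-Java simulation of an operator with several inputs. It is a sketch under simplifying assumptions: only one checkpoint is in flight, the operator state is a single running sum, and the class name is hypothetical. Once an input has delivered its barrier, its later records are held back until the barrier has arrived on every input, so the snapshot reflects exactly the pre-barrier records.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Barrier alignment for a multi-input operator; hypothetical, not Flink code. */
final class BarrierAlignmentSketch {
    private final boolean[] blocked;                           // inputs whose barrier already arrived
    private final Deque<Long> heldBack = new ArrayDeque<>();   // post-barrier records, not yet applied
    private final List<Long> snapshots = new ArrayList<>();
    private int barriersSeen = 0;
    private long sum = 0;                                      // the operator state

    BarrierAlignmentSketch(int numInputs) {
        blocked = new boolean[numInputs];
    }

    void onRecord(int input, long value) {
        if (blocked[input]) {
            heldBack.add(value);   // belongs to the next checkpoint, so it must not touch the state yet
        } else {
            sum += value;
        }
    }

    void onBarrier(int input) {
        if (blocked[input]) {
            throw new IllegalStateException("second barrier before alignment finished");
        }
        blocked[input] = true;
        barriersSeen++;
        if (barriersSeen == blocked.length) {
            snapshots.add(sum);            // consistent snapshot: all pre-barrier records, nothing else
            Arrays.fill(blocked, false);
            barriersSeen = 0;
            while (!heldBack.isEmpty()) {
                sum += heldBack.poll();    // release the records that were held back
            }
        }
    }

    long sum() {
        return sum;
    }

    List<Long> snapshots() {
        return snapshots;
    }
}
```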
![Page 26: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/26.jpg)
High Availability and Consistency
No single point of failure any more
Exactly-once processing semantics across the pipeline
Checkpoints/fault tolerance is decoupled from windows
Allows for highly flexible window implementations
[Figure: ZooKeeper ensemble, multiple masters, failover]
![Page 27: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/27.jpg)
Without high availability
JobManager
TaskManager
![Page 28: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/28.jpg)
With high availability
JobManager
TaskManager
Stand-by JobManager
Apache Zookeeper™
KEEP GOING
![Page 29: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/29.jpg)
Persisting jobs
JobManager
Client
TaskManagers
Apache Zookeeper™
Job
1. Submit job
![Page 30: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/30.jpg)
Persisting jobs
JobManager
Client
TaskManagers
Apache Zookeeper™
1. Submit job
2. Persist execution graph
![Page 31: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/31.jpg)
Persisting jobs
JobManager
Client
TaskManagers
Apache Zookeeper™
1. Submit job
2. Persist execution graph
3. Write handle to ZooKeeper
![Page 32: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/32.jpg)
Persisting jobs
JobManager
Client
TaskManagers
Apache Zookeeper™
1. Submit job
2. Persist execution graph
3. Write handle to ZooKeeper
4. Deploy tasks
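The four steps can be mimicked with two in-memory maps, one standing in for durable storage and one for ZooKeeper. This is a loose sketch rather than Flink's actual code: the class name, the handle format, and the path layout are all invented for illustration, and the "execution graph" is reduced to a list of task names. The point is that ZooKeeper holds only a small handle, which a stand-by JobManager can follow to recover the job.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Job submission with a persisted execution graph; hypothetical names and paths, not Flink code. */
final class JobPersistenceSketch {
    private final Map<String, List<String>> durableStorage = new HashMap<>();   // handle -> graph
    private final Map<String, String> zooKeeper = new HashMap<>();              // path -> handle
    private final List<String> deployedTasks = new ArrayList<>();

    /** Step 1 is the call itself; steps 2 to 4 happen inside. */
    void submit(String jobId, List<String> executionGraph) {
        String handle = "graphs/" + jobId;                            // invented handle format
        durableStorage.put(handle, new ArrayList<>(executionGraph));  // 2. persist execution graph
        zooKeeper.put("/jobs/" + jobId, handle);                      // 3. write handle to ZooKeeper
        deployedTasks.addAll(executionGraph);                         // 4. deploy tasks
    }

    /** What a stand-by JobManager could do after taking over: follow the handle to the graph. */
    List<String> recover(String jobId) {
        String handle = zooKeeper.get("/jobs/" + jobId);
        if (handle == null) {
            return List.of();
        }
        return List.copyOf(durableStorage.get(handle));
    }

    List<String> deployedTasks() {
        return deployedTasks;
    }
}
```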
![Page 33: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/33.jpg)
Handling checkpoints
JobManager
Client
TaskManagers
Apache Zookeeper™
1. Take snapshots
![Page 34: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/34.jpg)
Handling checkpoints
JobManager
Client
TaskManagers
Apache Zookeeper™
1. Take snapshots
2. Persist snapshots
3. Send handles to JM
![Page 35: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/35.jpg)
Handling checkpoints
JobManager
Client
TaskManagers
Apache Zookeeper™
1. Take snapshots
2. Persist snapshots
3. Send handles to JM
4. Create global checkpoint
![Page 36: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/36.jpg)
Handling checkpoints
JobManager
Client
TaskManagers
Apache Zookeeper™
1. Take snapshots
2. Persist snapshots
3. Send handles to JM
4. Create global checkpoint
5. Persist global checkpoint
![Page 37: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/37.jpg)
Handling checkpoints
JobManager
Client
TaskManagers
Apache Zookeeper™
1. Take snapshots
2. Persist snapshots
3. Send handles to JM
4. Create global checkpoint
5. Persist global checkpoint
6. Write handle to ZooKeeper
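The six steps can be traced in a few lines of plain Java. In this sketch the task state is a single `long`, durable storage and ZooKeeper are in-memory maps, and every name, handle format, and path is an assumption made for illustration. TaskManagers carry out steps 1 to 3 and the JobManager carries out steps 4 to 6.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** The checkpoint hand-off between TaskManagers, JobManager, and ZooKeeper; not Flink code. */
final class CheckpointFlowSketch {
    private final Map<String, Long> taskStateStore = new HashMap<>();                    // durable storage
    private final Map<String, Map<String, String>> checkpointStore = new HashMap<>();   // durable storage
    private final Map<String, String> zooKeeper = new HashMap<>();

    /** TaskManager side: take the snapshot (1), persist it (2), return the handle for the JM (3). */
    String snapshotTask(long checkpointId, String taskId, long state) {
        String handle = "state/" + checkpointId + "/" + taskId;   // invented handle format
        taskStateStore.put(handle, state);
        return handle;
    }

    /** JobManager side: build the global checkpoint (4), persist it (5), publish its handle (6). */
    String completeCheckpoint(long checkpointId, Map<String, String> handleByTask) {
        String handle = "checkpoint/" + checkpointId;
        checkpointStore.put(handle, new TreeMap<>(handleByTask));
        zooKeeper.put("/latest-checkpoint", handle);              // invented path
        return handle;
    }

    /** Recovery: follow the ZooKeeper handle to the global checkpoint, then to each task's state. */
    Map<String, Long> restoreLatest() {
        Map<String, Long> stateByTask = new TreeMap<>();
        String handle = zooKeeper.get("/latest-checkpoint");
        if (handle == null) {
            return stateByTask;                                   // no checkpoint completed yet
        }
        for (Map.Entry<String, String> entry : checkpointStore.get(handle).entrySet()) {
            stateByTask.put(entry.getKey(), taskStateStore.get(entry.getValue()));
        }
        return stateByTask;
    }
}
```

Keeping only handles in ZooKeeper means the coordination service stores a few bytes per checkpoint, while the potentially large state lives in durable storage.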
![Page 38: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/38.jpg)
Performance
Continuous streaming
Latency-bound buffering
Distributed Snapshots
High Throughput & Low Latency
With configurable throughput/latency tradeoff
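Latency-bound buffering can be pictured as a network buffer that is shipped either when it is full (good for throughput) or when its oldest record has waited longer than a timeout (which bounds latency). The plain-Java sketch below illustrates that tradeoff. It is not Flink's network stack: the class name is hypothetical, and time is passed in explicitly so the behaviour is deterministic.

```java
import java.util.ArrayList;
import java.util.List;

/** A buffer flushed when full or when its oldest record is too old; hypothetical, not Flink code. */
final class LatencyBoundBufferSketch {
    private final int capacity;
    private final long timeoutMillis;
    private final List<String> buffer = new ArrayList<>();
    private final List<List<String>> shipped = new ArrayList<>();
    private long oldestRecordTime;

    LatencyBoundBufferSketch(int capacity, long timeoutMillis) {
        this.capacity = capacity;
        this.timeoutMillis = timeoutMillis;
    }

    /** Adds a record; a full buffer is shipped immediately. */
    void add(String record, long nowMillis) {
        if (buffer.isEmpty()) {
            oldestRecordTime = nowMillis;
        }
        buffer.add(record);
        if (buffer.size() >= capacity) {
            flush();
        }
    }

    /** Called periodically; ships a partly filled buffer once the timeout has elapsed. */
    void onTimer(long nowMillis) {
        if (!buffer.isEmpty() && nowMillis - oldestRecordTime >= timeoutMillis) {
            flush();
        }
    }

    private void flush() {
        shipped.add(new ArrayList<>(buffer));
        buffer.clear();
    }

    List<List<String>> shipped() {
        return shipped;
    }
}
```

A larger capacity with a long timeout favours throughput; a short timeout favours latency at the cost of shipping smaller batches.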
![Page 39: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/39.jpg)
Batch and Streaming
Batch Word Count in the DataStream API
case class WordCount(word: String, count: Int)
val text: DataStream[String] = …;
text
  .flatMap { line => line.split(" ") }
  .map { word => new WordCount(word, 1) }
  .keyBy("word")
  .window(GlobalWindows.create())
  .trigger(new EOFTrigger())
  .sum("count")
![Page 40: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/40.jpg)
Batch and Streaming
case class WordCount(word: String, count: Int)
val text: DataStream[String] = …;
text
  .flatMap { line => line.split(" ") }
  .map { word => new WordCount(word, 1) }
  .keyBy("word")
  .window(GlobalWindows.create())
  .trigger(new EOFTrigger())
  .sum("count")
Batch Word Count in the DataSet API
val text: DataSet[String] = …;
text
  .flatMap { line => line.split(" ") }
  .map { word => new WordCount(word, 1) }
  .groupBy("word")
  .sum("count")
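The two programs above compute the same result: a global window that fires only at end of input behaves like a batch group-and-sum. The plain-Java sketch below shows that equivalence without using Flink. The names are hypothetical, and the "stream" is simply a sequence of method calls followed by an explicit end-of-input signal.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Word count done batch-style and stream-style; hypothetical names, not Flink code. */
final class WordCountSketch {

    /** Batch style: the whole input is available, so group and sum in one pass. */
    static Map<String, Integer> batchCount(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.groupingBy(word -> word, TreeMap::new,
                        Collectors.summingInt(word -> 1)));
    }

    /** Stream style: update counts per record, emit only when end of input is signalled. */
    static final class StreamingCounter {
        private final Map<String, Integer> counts = new TreeMap<>();

        void onLine(String line) {
            for (String word : line.split(" ")) {
                counts.merge(word, 1, Integer::sum);
            }
        }

        /** Plays the role of the end-of-input trigger on a global window. */
        Map<String, Integer> onEndOfInput() {
            return counts;
        }
    }
}
```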
![Page 41: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/41.jpg)
Batch and Streaming
[Figure: DataSet and DataStream on the shared Streaming Dataflow Runtime]
DataSet: Relational Optimizer; Pipelined and blocking operators
DataStream: Window Optimization; Pipelined and windowed operators
Streaming Dataflow Runtime
Batch Parameters: Schedule lazily; Recompute whole operators
Streaming Parameters: Schedule eagerly; Periodic checkpoints
Further labels: Streaming data movement; Stateful operations; DAG recovery; Fully buffered streams; DAG resource management
![Page 42: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/42.jpg)
Batch and Streaming
A full-fledged batch processor as well
![Page 43: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/43.jpg)
Batch and Streaming
A full-fledged batch processor as well
More details at Dongwon Kim's talk "A comparative performance evaluation of Flink"
![Page 44: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/44.jpg)
Integration (picture not complete)
POSIX
Java/Scala Collections
POSIX
![Page 45: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/45.jpg)
Monitoring
Live system metrics and user-defined accumulators/statistics
Monitoring REST API for custom monitoring tools
GET http://flink-m:8081/jobs/7684be6004e4e955c2a558a9bc463f65/accumulators
{
  "id": "dceafe2df1f57a1206fcb907cb38ad97",
  "user-accumulators": [
    { "name": "avglen", "type": "DoubleCounter", "value": "123.03259440000001" },
    { "name": "genwords", "type": "LongCounter", "value": "75000000" }
  ]
}
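A custom monitoring tool could fetch this JSON with any HTTP client. The sketch below uses the JDK's `java.net.http` client (Java 11 or later). The class and method names are hypothetical, and the URL layout is taken from the request shown above. The response is returned as a raw string, since the JDK ships no JSON parser.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Minimal client for the accumulators endpoint shown above; hypothetical helper class. */
final class AccumulatorClientSketch {

    /** Builds the endpoint URI from the JobManager base URL and a job id. */
    static URI accumulatorsUri(String baseUrl, String jobId) {
        return URI.create(baseUrl + "/jobs/" + jobId + "/accumulators");
    }

    /** Sends the GET request and returns the JSON body unparsed. */
    static String fetchAccumulators(String baseUrl, String jobId) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(accumulatorsUri(baseUrl, jobId))
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

Calling `fetchAccumulators("http://flink-m:8081", "7684be6004e4e955c2a558a9bc463f65")` would issue the request from the slide, provided a JobManager is reachable at that address.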
![Page 46: Flink 0.10 @ Bay Area Meetup (October 2015)](https://reader035.vdocuments.us/reader035/viewer/2022070516/587137861a28abf0568b60e5/html5/thumbnails/46.jpg)
Flink 0.10 Summary
Focus on operational readiness
• high availability
• monitoring
• integration with other systems
First-class support for event time
Refined DataStream API: easy and powerful