flink. pure streaming
TRANSCRIPT
![Page 1: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/1.jpg)
Flink Pure Streaming
Paco GuerreroBig Data & Solutions Architect 9/21/16
![Page 2: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/2.jpg)
Not for Geeks
![Page 3: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/3.jpg)
![Page 4: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/4.jpg)
Life as Time
4
![Page 5: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/5.jpg)
Anything as Time
![Page 6: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/6.jpg)
Flink as Time
![Page 7: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/7.jpg)
Streaming vs Batch
7
![Page 8: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/8.jpg)
“Abstraction of reality used to facilitate information processing”
MicroBatch Batch
![Page 9: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/9.jpg)
Batch
![Page 10: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/10.jpg)
Batch
![Page 11: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/11.jpg)
Batch
All Input
![Page 12: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/12.jpg)
Batch
Batch Job
All Input
![Page 13: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/13.jpg)
Batch
Batch Job
All Input
All Output
Nothing about timeTimestamps used as trick to keep real time fingerprint
![Page 14: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/14.jpg)
Streaming
“Continuous processing of data that is continuously produced”
![Page 15: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/15.jpg)
Streaming
“Streaming is the next programming paradigm for data applications, and you need to start thinking in terms of streams”
“Continuous processing of data that is continuously produced”
![Page 16: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/16.jpg)
Streaming
“Streaming is the next programming paradigm for data applications, and you need to start thinking in terms of streams”
“Continuous processing of data that is continuously produced”
Data Stream: Infinite sequence of data arriving in a continuous fashion.
![Page 17: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/17.jpg)
Streaming
“Streaming is the next programming paradigm for data applications, and you need to start thinking in terms of streams”
“Continuous processing of data that is continuously produced”
Data Stream: Infinite sequence of data arriving in a continuous fashion.
Stream processing is the backbone of the new data infrastructure.
![Page 18: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/18.jpg)
Streaming
“Streaming is the next programming paradigm for data applications, and you need to start thinking in terms of streams”
“Continuous processing of data that is continuously produced”
Data Stream: Infinite sequence of data arriving in a continuous fashion.
Stream processing is the backbone of the new data infrastructure.
“The world beyond batch” A high-level tour of modern data-processing concepts. By Tyler Akidau
August 5, 2015 https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
![Page 19: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/19.jpg)
Streaming
![Page 20: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/20.jpg)
Streaming
Streaming Job
![Page 21: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/21.jpg)
Streaming
Streaming Job
![Page 22: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/22.jpg)
Streaming
Streaming Job
Real Life Time !!
![Page 23: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/23.jpg)
Streaming is the biggest change in
data infraestructure since Hadoop
Streaming
![Page 24: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/24.jpg)
The biggest change is moving from
batch to streaming is handling time explicitly
Streaming
![Page 25: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/25.jpg)
Micro Batch
![Page 26: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/26.jpg)
Micro Batch
![Page 27: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/27.jpg)
Micro Batch
Batch Job 1
![Page 28: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/28.jpg)
Batch JobBatch Job 2
All Output
Batch Job 1
Micro Batch
![Page 29: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/29.jpg)
Batch JobBatch Job 2
All Input
All OutputBatch Job 3All Output
All Output
Batch Job 1
Batch Frequency ?Timestamps keeps real time fingerprint
Micro Batch
![Page 30: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/30.jpg)
Streaming Technologies
Batch StreamingMicro Batch
StateLess – Record acknowledgementsCPU bounded performanceNot expressive declarative functional API – Low Level APINot auto scalingLow level programmatic topology Poor Streaming Windows funcionalitiesNot compatible with Hadoop APIs
Streams
![Page 31: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/31.jpg)
Streaming Technologies
Batch StreamingMicro Batch
StateLess – Record acknowledgementsCPU bounded performanceNot expressive declarative functional API – Low Level APINot auto scalingLow level programmatic topology Poor Streaming Windows funcionalitiesNot compatible with Hadoop APIs
Streams
![Page 32: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/32.jpg)
Streaming Technologies
Batch StreamingMicro Batch
StateLess – Record acknowledgementsCPU bounded performanceNot expressive declarative functional API – Low Level APINot auto scalingLow level programmatic topology Poor Streaming Windows funcionalitiesNot compatible with Hadoop APIs
Streams
![Page 33: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/33.jpg)
Streaming Technologies
Batch StreamingMicro Batch
StateLess – Record acknowledgementsCPU bounded performanceNot expressive declarative functional API – Low Level APINot auto scalingLow level programmatic topology Poor Streaming Windows funcionalitiesNot compatible with Hadoop APIs
Streams
![Page 34: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/34.jpg)
Streaming Technologies
![Page 35: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/35.jpg)
Streaming Technologies
Batch StreamingMicro Batch
![Page 36: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/36.jpg)
Streaming Technologies
Batch StreamingMicro Batch
![Page 37: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/37.jpg)
Streaming Technologies
Batch StreamingMicro Batch
![Page 38: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/38.jpg)
Apache Flink
38
![Page 39: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/39.jpg)
FlinkOpen Source Stream Processing Framework. Last available Release 1.1.1
Top Level Apache Project since Dec '14
![Page 40: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/40.jpg)
FlinkOpen Source Stream Processing Framework. Last available Release 1.1.1
Top Level Apache Project since Dec '14
Main FeaturesNative Stream Low LatencyHigh throughputStatefulExactly-one guaranteesDistributedExpressive APIsAnd more ….
![Page 41: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/41.jpg)
FlinkOpen Source Stream Processing Framework. Last available Release 1.1.1
Top Level Apache Project since Dec '14
Main FeaturesNative Stream Low LatencyHigh throughputStatefulExactly-one guaranteesDistributedExpressive APIsAnd more ….
![Page 42: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/42.jpg)
Flink Flink
![Page 43: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/43.jpg)
Flink
![Page 44: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/44.jpg)
Flink Integration
![Page 45: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/45.jpg)
YARN upcoming...
Flink Integration
![Page 46: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/46.jpg)
Flink Integration
YARN upcoming...
![Page 47: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/47.jpg)
Flink Integration
YARN upcoming...
![Page 48: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/48.jpg)
upcoming...
Flink Integration
YARN upcoming...
![Page 49: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/49.jpg)
Flink Stack
![Page 50: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/50.jpg)
Flink Runtime Engine
![Page 51: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/51.jpg)
Flink Runtime Engine
![Page 52: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/52.jpg)
Distributed pipelined processing
Execute everything as Stream
Iterative ( cyclic ) dataflows
Mutable state in operations
Operate on managed memory (*)
Also works on batch !!
Job Manager
Client
Optimizer
Dataflow Graph
Flink Runtime Engine
![Page 53: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/53.jpg)
Distributed pipelined processing
Execute everything as Stream
Iterative ( cyclic ) dataflows
Mutable state in operations
Operate on managed memory (*)
Also works on batch !!Workers ( Task Managers )
Job Manager
Client
Optimizer
Dataflow Graph
Execution Graph
Flink Runtime Engine
![Page 54: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/54.jpg)
Stream Job
Batch Job
ML Job
Flink Runtime Engine
Graph Job
optimizer
optimizer
optimizer
optimizer
![Page 55: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/55.jpg)
Stream Job
Batch Job
ML Job
Flink Runtime Engine
Graph Job
optimizer
optimizer
optimizer
optimizer
![Page 56: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/56.jpg)
Tasks scheduled and executed in workers ( slots )
Tasks as chain of operators
Run operator logic in a pipelined fashion
State is kept in operators
Stream Job
Batch Job
ML Job
Flink Runtime Engine
Graph Job
optimizer
optimizer
optimizer
optimizer
![Page 57: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/57.jpg)
Tasks scheduled and executed in workers ( slots )
Tasks as chain of operators
Run operator logic in a pipelined fashion
State is kept in operators
Stream Job
Batch Job
ML Job
Flink Runtime Engine
Graph Job
optimizer
optimizer
optimizer
optimizer
![Page 58: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/58.jpg)
Tasks scheduled and executed in workers ( slots )
Tasks as chain of operators
Run operator logic in a pipelined fashion
State is kept in operators
Stream Job
Batch Job
ML Job
Flink Runtime Engine
Graph Job
optimizer
optimizer
optimizer
optimizer
![Page 59: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/59.jpg)
If you want to know one thing about Flink is that you don't need to know
the internals of Flink
![Page 60: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/60.jpg)
Events Time &
Windows
Fault Tolerance &
CorrectnessState Handling
Low Latency &
High ThroughputAPI Libraries SQL
Building Blocks
![Page 61: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/61.jpg)
Events Time &
Windows
Fault Tolerance &
CorrectnessState Handling
Low Latency &
High ThroughputAPI Libraries SQL
Building Blocks
lTime references
lOut of order events
lPowerful Windowing
![Page 62: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/62.jpg)
Event Times & Windowing
![Page 63: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/63.jpg)
Event Times & Windowing
EventTime
EventTime
![Page 64: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/64.jpg)
Event Times & Windowing
Flink Data Source
EventTime
EventTime
Ingestion Time
![Page 65: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/65.jpg)
Event Times & Windowing
Flink Data Source
Flink Window Operator
EventTime
EventTime
Ingestion Time
Processing Time
![Page 66: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/66.jpg)
Event Time: when data is generated
Ingestion time: when data is loaded from source
Processing time: when data is processed
Event time help to process out- of-order events and replay elements as the ocurred ( deterministic results )
Explicit handling of time. 3 choices:
Event Times & Windowing
![Page 67: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/67.jpg)
Event Times & Windowing
![Page 68: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/68.jpg)
Event time. Out or Order
![Page 69: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/69.jpg)
1 2 3 5 7
4 6 8 9 10
Event time. Out or Order
![Page 70: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/70.jpg)
1 2 3 5 7
4 6 8 9 10
Event time. Out or Order
Out or Order
1 2 3 5 74 6 8 9 10
![Page 71: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/71.jpg)
1 2 3 5 7
4 6 8 9 10 1 2 3 5 74 6 8 9 104
Event time. Out or Order
Ingestion Time Windows
Out or Order
1 2 3 5 74 6 8 9 10
![Page 72: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/72.jpg)
1 2 3 5 7
4 6 8 9 10 1 2 3 5 74 6 8 9 10
1 2 3
4
4 5
Event time. Out or Order
6 7 8 9 10
Event Time Windows
Ingestion Time Windows
Out or Order
1 2 3 5 74 6 8 9 10
![Page 73: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/73.jpg)
Event time. Watermarks
![Page 74: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/74.jpg)
1 2 3 5 7
4 6 8 9 10
Event time. Watermarks
![Page 75: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/75.jpg)
1 2 3 5 7
4 6 8 9 10
1 2 3 54 6 8
1 2 3 54 6 8
1 2 3
4
4 5
Event time. Watermarks
6 8
Event Time Windows
Ingestion Time Windows
Out or Order
![Page 76: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/76.jpg)
1 2 3 5 7
4 6 8 9 10
1 2 3 54 6 8 910
1 2 3 54 6 8 910
1 2 3
4
4 5
Event time. Watermarks
6 8 9 10
Event Time Windows
Ingestion Time Windows
Out or Order
Not event time before 5 will come
Late Time of 2
5
![Page 77: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/77.jpg)
1 2 3 5 7
4 6 8 9 10
1 2 3 5 74 6 8 910
1 2 3 5 74 6 8 910
1 2 3
4
4 5
Event time. Watermarks
6 7 8 9 10
Event Time Windows
Ingestion Time Windows
Out or Order
Not event time before 10 will come
Late Time of 2
10
![Page 78: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/78.jpg)
Windowing
Windows: grouping of events according to time, session*, count
![Page 79: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/79.jpg)
Windowing
Windows: grouping of events according to time, session*, count
Powerful built-in windows:
![Page 80: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/80.jpg)
Windowing
Windows: grouping of events according to time, session*, count
Powerful built-in windows: Count: number of events to trigger the window. Process X last events each Y events.
![Page 81: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/81.jpg)
Windowing
Windows: grouping of events according to time, session*, count
Powerful built-in windows: Count: number of events to trigger the window. Process X last events each Y events.
Time: l Tumbling: trigger every X time with received events
l Sliding: trigger every X time with received events in last Y time
![Page 82: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/82.jpg)
Windowing
Windows: grouping of events according to time, session*, count
Powerful built-in windows: Count: number of events to trigger the window. Process X last events each Y events.
Time: l Tumbling: trigger every X time with received events
l Sliding: trigger every X time with received events in last Y time
Session: all events from session/user X until session time expired ( Gap )
![Page 83: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/83.jpg)
Windowing
Windows: grouping of events according to time, session*, count
Powerful built-in windows: Count: number of events to trigger the window. Process X last events each Y events.
Time: l Tumbling: trigger every X time with received events
l Sliding: trigger every X time with received events in last Y time
Session: all events from session/user X until session time expired ( Gap )
High level API for user windows: Window Assigner, Trigger, Evictor
![Page 84: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/84.jpg)
Events Time &
Windows
Fault Tolerance &
CorrectnessState Handling
Low Latency &
High ThroughputAPI Libraries SQL
Building Blocks
lManaged operator state for backup/recovery
lSavepoints
![Page 85: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/85.jpg)
Stateful Streaming
Op
Stateless StreamProcessing
![Page 86: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/86.jpg)
Stateful Streaming
Op Op
State
Stateless StreamProcessing
Stateful StreamProcessing
lBuilt-in internal state in each operator for exactly-once semantics
lUser state can be declared in each operator to be saved locally in memory ( API, key/value pars )
lSnapshots: periodically local states in memory are persisted in lightweight distributed snapshots. No global pause !!
lCheckpoint as global consistent point-in-time snapshot build by set of distributed snapshots.
lPluggable state backend for snapshots:JobManager, HDFS, RocksDB
lSavepoints: user-triggered retained checkpoint
![Page 87: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/87.jpg)
Events Time &
Windows
Fault Tolerance &
CorrectnessState Handling
Low Latency &
High ThroughputAPI Libraries SQL
Building Blocks
lExactly-once semantics with managed operator state
lDistributed Snapshotting Algorithm
![Page 88: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/88.jpg)
Periodically
Chandy-Lamport Snapshots
“The global-state-detection algorithm is to be superimposed on the underlying computation: It must run concurrently with, but no alter, this underlying computation”
. Triggers snapshots asynchronously
. Embedded snapshots algorithm in stream of data ( barriers )
. No global pause, lightweight impact in performance
Handling Checkpoints
![Page 89: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/89.jpg)
Periodically
Chandy-Lamport Snapshots
“The global-state-detection algorithm is to be superimposed on the underlying computation: It must run concurrently with, but no alter, this underlying computation”
. Triggers snapshots asynchronously
. Embedded snapshots algorithm in stream of data ( barriers )
. No global pause, lightweight impact in performance
Handling Checkpoints
![Page 90: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/90.jpg)
Periodically
Chandy-Lamport Snapshots
“The global-state-detection algorithm is to be superimposed on the underlying computation: It must run concurrently with, but no alter, this underlying computation”
. Triggers snapshots asynchronously
. Embedded snapshots algorithm in stream of data ( barriers )
. No global pause, lightweight impact in performance
Handling Checkpoints
![Page 91: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/91.jpg)
snapshot
Job Manager
Periodically pushes barriers for new state
New state X+1
Ack for Snapshot state X from Task N
Handling Checkpoints
![Page 92: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/92.jpg)
snapshot
Job Manager
Handling Checkpoints
![Page 93: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/93.jpg)
snapshot
Job Manager
Handling Checkpoints
![Page 94: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/94.jpg)
snapshot
Job Manager
Handling Checkpoints
All Acks received
Register Checkpoint for restore in case of fail
![Page 95: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/95.jpg)
Streaming Fault Tolerance
In case of fail, last global checkpoint is recovered ( recovery from partial checkpoint / individual snapshots is coming )
Need of stateful source like kafka to ensure end-to-end exactly-once semantic in case of fail.
Kafka sink doesn't guarantee end-to-end exactly-once ( multiple writes in topic ) ( at least-once )
Semantics in Flink:
At Least Once: never loses events, events might be reprocessed
Exactly once: neither reprocessed nor lost events.
Exactly once by default, with low impact in performance and development effort, unlike another tools like Storm or Spark.
![Page 96: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/96.jpg)
If you want to know one thing about Flink is that you don't need to know
the internals of Flink
![Page 97: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/97.jpg)
Events Time &
Windows
Fault Tolerance &
CorrectnessState Handling
Low Latency &
High ThroughputAPI Libraries SQL
Building Blocks
lPipelined runtime
lLatency vs throughput tunning
![Page 98: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/98.jpg)
Exactly-once semantic with low impact in performance
Controllable checkpointing overhead
Higher throughput using processing time
Performance improvements thanks to:
. operator chaining during optimization phase
. own optimized serialization stack with code generation
Performance Tunning
![Page 99: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/99.jpg)
Benchmark for “Streaming Computation” published by Yahoo. Dec 18, 2015https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
Production use-case
l counting ad impressions group by campaign
l aggregations over a 10 second window
l save current aggregate value to Redis every second
Streaming Benchmark
![Page 100: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/100.jpg)
Throughput vs Latency Graph
Throughput ( 1000 events / sec )
99 PercentileLatency ( ms )
Not Operator combinig in Storm, more complicate topology, more steps for events and more overhead
![Page 101: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/101.jpg)
Apache Storm Without Tridentl At least once / Double counting after fail / Lost state after Failuresl CPU bounded
Apache Sparkl Latency increase with throughput
Apache Flinkl Exactly once / No double counting / No state lossl Limited by bandwidth between Kafka and Flink cluster l (1 GigE).
l kafka brokers within Kafka Cluster ( 10 GigE ) l Achieved 15 million messages /sec l ( before 3 million m/sec) with exactly once semantic
10,000,000 20,000,000
1 GigE
10 GigE
Performance Tunning
![Page 102: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/102.jpg)
Events Time &
Windows
Fault Tolerance &
CorrectnessState Handling
Low Latency &
High ThroughputAPI Libraries SQL
Building Blocks
lHigh Level API
lWide range of basic and advanced operators
lJava , Scala. Python soon !!
![Page 103: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/103.jpg)
API
![Page 104: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/104.jpg)
API
Working on data streams ( bounded ? )
![Page 105: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/105.jpg)
API
Working on data streams ( bounded ? )
Stream Processing: Explicit Handling of Time
![Page 106: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/106.jpg)
API
Working on data streams ( bounded ? )
Stream Processing: Explicit Handling of Time
Java & Scala. Python coming. Java: Bean type classes vs Tuples with position addresses. Scala: case classes.
![Page 107: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/107.jpg)
API
Working on data streams ( bounded ? )
Stream Processing: Explicit Handling of Time
Java & Scala. Python coming. Java: Bean type classes vs Tuples with position addresses. Scala: case classes.
Operators:
Sources: kafka, FileSystem, Cassandra …
Sinks: Kafka, HDFS, Cassandra ….
Transformations: Basic: map, flatmap, filter, grouping, iterate, project, join, cross, … Streaming: Windowing + Aggregations, Temporal Binary Iterative Stream operators
![Page 108: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/108.jpg)
API
Working on data streams ( bounded ? )
Stream Processing: Explicit Handling of Time
Java & Scala. Python coming. Java: Bean type classes vs Tuples with position addresses. Scala: case classes.
Operators:
Sources: kafka, FileSystem, Cassandra …
Sinks: Kafka, HDFS, Cassandra ….
Transformations: Basic: map, flatmap, filter, grouping, iterate, project, join, cross, … Streaming: Windowing + Aggregations, Temporal Binary Iterative Stream operators
DataStream<?> DataSet<?>
Core API
1 implementation*, 2 interfaces
![Page 109: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/109.jpg)
Source Map Reduce
Fliter
Join Sum Sink
Map
Source
Operators
![Page 110: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/110.jpg)
Source Map Reduce
Fliter
Join Sum Sink
Map
Source
Operators
![Page 111: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/111.jpg)
Source Map Reduce
Fliter
Join Sum Sink
Source
Filter
Operators
![Page 112: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/112.jpg)
Source Map Reduce
Fliter
Join Sum Sink
Source
Filter
Operators
![Page 113: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/113.jpg)
Source Map Reduce
Fliter
Join Sum Sink
Source
Reduce
Operators
![Page 114: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/114.jpg)
Source Map Reduce
Fliter
Join Sum Sink
Source
Reduce
Operators
![Page 115: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/115.jpg)
Source Map Reduce
Fliter
Join Sum Sink
Source
Join
Operators
![Page 116: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/116.jpg)
Source Map Reduce
Fliter
Join Sum Sink
Source
Join
Operators
![Page 117: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/117.jpg)
Source Map Reduce
Fliter
Join Sum Sink
Source
Operators
![Page 118: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/118.jpg)
Events Time &
Windows
Fault Tolerance &
CorrectnessState Handling
Low Latency &
High ThroughputAPI Libraries SQL
Building Blocks
lEasy to use. SQL !!
lBased on Apache Calcite
![Page 119: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/119.jpg)
API extension for DataSets y DataStreams
Based on relational Table abstraction
Table <=> Source / DataSet / DataStream
Operators like: where, select, as, groupBy, join, union, minus, distinct, orderBy, ...
Table API
![Page 120: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/120.jpg)
Execute SQL-Like sentences on DataSets and Datastreams
Resuts returned as Table ( Table API ), convertible to DataStream or DataSets
SQL and Table API can be seamlessly mixed over DataStream/DataSets
Flink’s SQL support is not feature complete, yet.
Queries that include unsupported SQL will fail !!
SQL Support
SQL
![Page 121: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/121.jpg)
Parsing and Logical plan for Table operators and SQL are optimized using Apache Calcite
Only supported a Subset of the comprehensive SQL standard
Apache Calcite provides with:
SQL Parsing
API for building expressions in relational algebra
Query planning engine
Provides SQL for Streaming Queries with windows aggregations
SELECT STREAM TUMBLE_END(rowtime, INTERVAL '1' HOUR) AS rowtime, productId, COUNT(*) AS c, SUM(units) AS units FROM Orders GROUP BY TUMBLE(rowtime, INTERVAL '1' HOUR), productId;
Apache Calcite
SQL Sentence
Apache Calcite: SQL to Logical
Plan as Relational Algebra
Flink Optimizer: Logical Plan to Execution Plan
![Page 122: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/122.jpg)
If you want to know
one thing about Flink
is that you don't need
to know the internals of Flink
So … Batch
![Page 123: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/123.jpg)
Batch on Stream
![Page 124: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/124.jpg)
Stream: Unbounded Data Stream
Unbounded Data Stream
Batch on Stream
![Page 125: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/125.jpg)
Stream: Unbounded Data Stream
Batch: Bounded stream ( dataset ) on a stream processor
Global window over the entire dataset
Optimization in operators for joins and grouping, with blocking data exchange if needed
Unbounded Data Stream
Bounded Data Set
Batch on Stream
![Page 126: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/126.jpg)
Stream: Unbounded Data Stream
Batch: Bounded stream ( dataset ) on a stream processor
Global window over the entire dataset
Optimization in operators for joins and grouping, with blocking data exchange if needed
Unbounded Data Stream
Bounded Data Set
Batch on Stream
![Page 127: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/127.jpg)
Stream: Unbounded Data Stream
Batch: Bounded stream ( dataset ) on a stream processor
Global window over the entire dataset
Optimization in operators for joins and grouping, with blocking data exchange if needed
Batch specific optimizations:
Cost-based optimizer: dataset size known before hand
Manage memory on / off-heap for join, sort, …
Optimization serialization stack for user-types
Bounded Data Set
Batch on Stream
Unbounded Data Stream
![Page 128: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/128.jpg)
Conclusions
![Page 129: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/129.jpg)
Conclusions
![Page 130: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/130.jpg)
Conclusions
Flink Pure streaming engine matches real life. No Abstraction
![Page 131: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/131.jpg)
Conclusions
Flink Pure streaming engine matches real life. No Abstraction
Batch on streaming
![Page 132: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/132.jpg)
Conclusions
Flink Pure streaming engine matches real life. No Abstraction
Batch on streaming
Flexible Windowing Semantics with Explicit Time handling
![Page 133: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/133.jpg)
Conclusions
Flink Pure streaming engine matches real life. No Abstraction
Batch on streaming
Flexible Windowing Semantics with Explicit Time handling
Competitive Performance, low latency and hight throughput
![Page 134: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/134.jpg)
Conclusions
Flink Pure streaming engine matches real life. No Abstraction
Batch on streaming
Flexible Windowing Semantics with Explicit Time handling
Competitive Performance, low latency and hight throughput
Apache Beam, open sourced by Google, uses Flink as its first order runner forBatch and Streaming processing in partnership with Data Artisans.
100% Compliance of data processing model “what, where, when, how “
![Page 135: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/135.jpg)
![Page 136: Flink. Pure Streaming](https://reader036.vdocuments.us/reader036/viewer/2022081514/58ef68981a28ab590b8b456b/html5/thumbnails/136.jpg)
Indizen Technologies, S.L
Paseo de la Castellana, 130 - 4ª planta
28046 Madrid, Spain
Tel. 91 535 85 68
www.indizen.com
@indizen_corp¡gracia
s!
136
Francisco José Guerrero
Big Data & Solutions Architect
Tel. 91 535 85 68
Mov. XXX YYY ZZZ