From Apache Flink® 1.3 to 1.4
![Page 2: From Apache Flink® 1.3 to 1.4](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6476e47f8b9afc4d8b4683/html5/thumbnails/2.jpg)
Original creators of Apache Flink®
Providers of dA Platform 2, including open source Apache Flink + dA Application Manager
![Page 3: From Apache Flink® 1.3 to 1.4](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6476e47f8b9afc4d8b4683/html5/thumbnails/3.jpg)
Overview
Apache Flink 1.3 – Previously on Apache Flink
Apache Flink 1.4 – What’s happening now?
Apache Flink 1.5+ – Next on Apache Flink
![Page 4: From Apache Flink® 1.3 to 1.4](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6476e47f8b9afc4d8b4683/html5/thumbnails/4.jpg)
Previously on Apache Flink
Apache Flink 1.3
![Page 5: From Apache Flink® 1.3 to 1.4](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6476e47f8b9afc4d8b4683/html5/thumbnails/5.jpg)
Apache Flink 1.3 in Numbers
141 contributors (no deduplication)
1400 commits
>= 680 resolved JIRA issues
+261813 / -65646 LOC
![Page 6: From Apache Flink® 1.3 to 1.4](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6476e47f8b9afc4d8b4683/html5/thumbnails/6.jpg)
Evolution of Flink’s API
Flink 1.0.0
State API (ValueState, ReducingState, ListState)
Flink 1.1.0
Session Windows
Late arriving events
Flink 1.2.0
ProcessFunction (access to state, timers, events)
Flink 1.3.0
Side outputs
Access to per-window state
![Page 7: From Apache Flink® 1.3 to 1.4](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6476e47f8b9afc4d8b4683/html5/thumbnails/7.jpg)
Side Outputs
Additional outputs for a stream
Late events
Corrupted input data
More expressive APIs
FLINK-4460
[Figure: a ProcessFunction emitting a main output and a side output]
![Page 8: From Apache Flink® 1.3 to 1.4](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6476e47f8b9afc4d8b4683/html5/thumbnails/8.jpg)
Side Outputs: Example
DataStream<Integer> input = ...;
final OutputTag<String> outputTag = new OutputTag<String>("side-output"){};
SingleOutputStreamOperator<Integer> mainDataStream = input.process(new ProcessFunction<Integer, Integer>() {
    @Override
    public void processElement(Integer value, Context ctx, Collector<Integer> out) throws Exception {
        // emit data to regular output
        out.collect(value);
        // emit data to side output
        ctx.output(outputTag, "sideout-" + String.valueOf(value));
    }
});
DataStream<String> sideOutputStream = mainDataStream.getSideOutput(outputTag);
![Page 9: From Apache Flink® 1.3 to 1.4](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6476e47f8b9afc4d8b4683/html5/thumbnails/9.jpg)
Evolution of Large State Handling
Flink 1.0.0
RocksDB for out-of-core state support
Flink 1.1.0
Fully async RocksDB snapshots
Flink 1.2.0
Rescalable keyed and non-partitioned state
Flink 1.3.0
Incremental checkpoints
Fine-grained recovery
![Page 10: From Apache Flink® 1.3 to 1.4](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6476e47f8b9afc4d8b4683/html5/thumbnails/10.jpg)
Full Checkpoints
[Figure: state is ABCD @t1, AFCDE @t2 and GHCDIE @t3; Checkpoint 1, Checkpoint 2 and Checkpoint 3 each store the complete state (ABCD, AFCDE, GHCDIE)]
![Page 11: From Apache Flink® 1.3 to 1.4](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6476e47f8b9afc4d8b4683/html5/thumbnails/11.jpg)
Incremental Checkpoints
[Figure: state is ABCD @t1, AFCDE @t2 and GHCDIE @t3; Checkpoint 1 stores ABCD, Checkpoint 2 stores only the new entries EF, Checkpoint 3 stores only the new entries GHI]
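The difference between the two figures can be sketched in a few lines of plain Java. This is only an illustration using the letters from the slides, not Flink's implementation: the class name `IncrementalCheckpointSketch` and the method `delta` are hypothetical, and state is modelled as a simple set of entries.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration only: state is modelled as a set of entries (the letters on the slide).
class IncrementalCheckpointSketch {

    /** Returns the entries that are in `current` but were not in `previous`. */
    static Set<String> delta(Set<String> previous, Set<String> current) {
        Set<String> added = new LinkedHashSet<>(current);
        added.removeAll(previous);
        return added;
    }

    public static void main(String[] args) {
        Set<String> t1 = new LinkedHashSet<>(Arrays.asList("A", "B", "C", "D"));
        Set<String> t2 = new LinkedHashSet<>(Arrays.asList("A", "F", "C", "D", "E"));
        Set<String> t3 = new LinkedHashSet<>(Arrays.asList("G", "H", "C", "D", "I", "E"));

        // A full checkpoint would store t1, t2 and t3 completely.
        // An incremental checkpoint stores only what is new since the previous checkpoint.
        System.out.println(delta(Collections.<String>emptySet(), t1)); // [A, B, C, D]
        System.out.println(delta(t1, t2));                             // [F, E]
        System.out.println(delta(t2, t3));                             // [G, H, I]
    }
}
```

Entries that disappeared between two checkpoints are not tracked in this sketch; a real backend also has to record which older chunks are still referenced.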
![Page 12: From Apache Flink® 1.3 to 1.4](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6476e47f8b9afc4d8b4683/html5/thumbnails/12.jpg)
Incremental Checkpoints
[Figure: Checkpoints 1 to 4 referencing shared chunks C1 to C4 (Chunk 1 to Chunk 4) in storage]
![Page 13: From Apache Flink® 1.3 to 1.4](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6476e47f8b9afc4d8b4683/html5/thumbnails/13.jpg)
Incremental Checkpointing Contd.
Currently supported for RocksDB state backend
FLINK-5053
Faster and smaller checkpoints
| | Full checkpoint | Incremental checkpoint |
|---|---|---|
| Size | 60 GB | 1 – 30 GB |
| Time | 180 s | 3 – 30 s |
“A Look at Flink’s Internal Data Structures and Algorithms for Efficient Checkpointing” by Stefan Richter, Tomorrow @ 12:20 pm Maschinenhaus
![Page 14: From Apache Flink® 1.3 to 1.4](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6476e47f8b9afc4d8b4683/html5/thumbnails/14.jpg)
Evolution of High Level APIs
Flink 1.0.0
CEP library added
Table API v1
Flink 1.1.0
Table API overhaul
Integration with Apache Calcite
Flink 1.2.0
Tumbling, sliding and session group-windows for Table API
Flink 1.3.0
Rescalable CEP operators
Retractions in Table API/SQL
![Page 15: From Apache Flink® 1.3 to 1.4](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6476e47f8b9afc4d8b4683/html5/thumbnails/15.jpg)
Enriched CEP Language
Support for quantifiers (+, *, ?)
FLINK-3318
Iterative conditions
FLINK-6197
Not operator
FLINK-3320
“Complex Event Processing With Flink: The State of FlinkCEP” by Kostas Kloudas, Today @ 2:30 pm Maschinenhaus
![Page 16: From Apache Flink® 1.3 to 1.4](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6476e47f8b9afc4d8b4683/html5/thumbnails/16.jpg)
CEP: Detect Dipping Stocks
DataStream<Stock> stocks = …;
Pattern<Stock, ?> pattern = Pattern.<Stock>begin("rising").where(new IterativeCondition<Stock>() {
    @Override
    public boolean filter(Stock stock, Context<Stock> ctx) throws Exception {
        // calculate the average price of the events matched so far
        double sum = 0.0;
        int count = 0;
        for (Stock previousStock : ctx.getEventsForPattern("rising")) {
            sum += previousStock.getPrice();
            count++;
        }
        // accept the first event; afterwards only accept a price greater than or equal to the average price
        return count == 0 || stock.getPrice() >= sum / count;
    }
}).oneOrMore().next("falling");
// PatternStream is created through the CEP entry point rather than its constructor
PatternStream<Stock> dippingStocks = CEP.pattern(stocks.keyBy("name"), pattern);
DataStream<String> namesOfDippingStocks = dippingStocks.select(…);
![Page 17: From Apache Flink® 1.3 to 1.4](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6476e47f8b9afc4d8b4683/html5/thumbnails/17.jpg)
What’s Happening Now?
Apache Flink 1.4
![Page 18: From Apache Flink® 1.3 to 1.4](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6476e47f8b9afc4d8b4683/html5/thumbnails/18.jpg)
Event Driven I/O
Rework of Flink’s network stack
Event driven network I/O
Use full available capacity
Near perfect latency behaviour
[Figure: TCP buffer; labels: capacity left, flush]
![Page 19: From Apache Flink® 1.3 to 1.4](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6476e47f8b9afc4d8b4683/html5/thumbnails/19.jpg)
Flow Control
Flow control for TaskManager communication
Single channel no longer stalls other multiplexed channels
Fine-grained backpressure control
Improves checkpoint alignments
“Building a Network Stack for Optimal Throughput / Low-Latency Trade-Offs” by Nico Kruber, Today @ 2:00 pm Palais Atelier
[Figure: the receiver gives credit to Sender #1 and Sender #2; the senders send credited data]
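The "give credit, send credited data" idea from the figure can be sketched in plain Java. This is a toy model under stated assumptions, not Flink's network stack: the class `CreditFlowControlSketch`, its nested `Sender`, and all method names are hypothetical. Each sender keeps a backlog and may only send as many buffers as the receiver has granted it credit for, so a sender without credit does not block the others.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical toy model of credit-based flow control; not Flink's actual classes.
class CreditFlowControlSketch {

    /** Sender side of one logical channel: buffers wait in a backlog until credit arrives. */
    static final class Sender {
        private final Queue<String> backlog = new ArrayDeque<>();
        private int credit = 0;

        void enqueue(String buffer) {
            backlog.add(buffer);
        }

        /** Called when the receiver announces that it has `n` more free buffers for this channel. */
        void addCredit(int n) {
            credit += n;
        }

        /** Sends as many backlog buffers as the current credit allows and returns them. */
        List<String> sendCredited() {
            List<String> sent = new ArrayList<>();
            while (credit > 0 && !backlog.isEmpty()) {
                sent.add(backlog.poll());
                credit--;
            }
            return sent;
        }

        int backlogSize() {
            return backlog.size();
        }
    }

    public static void main(String[] args) {
        Sender sender1 = new Sender();
        Sender sender2 = new Sender();
        for (int i = 0; i < 3; i++) {
            sender1.enqueue("s1-" + i);
            sender2.enqueue("s2-" + i);
        }

        // The receiver has two free buffers for sender 1 and none for sender 2.
        sender1.addCredit(2);

        System.out.println(sender1.sendCredited()); // [s1-0, s1-1]
        System.out.println(sender2.sendCredited()); // []
        System.out.println(sender2.backlogSize());  // 3
    }
}
```

Because credit is tracked per channel, the backpressured sender 2 keeps its data in its own backlog while sender 1 continues to make progress over the same connection.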
![Page 20: From Apache Flink® 1.3 to 1.4](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6476e47f8b9afc4d8b4683/html5/thumbnails/20.jpg)
New Deployment Model
Rework of Flink’s distributed architecture
Ready for multitude of deployment scenarios
Support for dynamic scaling
“Flink in Containerland” by Patrick Lucas, Tomorrow @ 3:20 pm Maschinenhaus
![Page 21: From Apache Flink® 1.3 to 1.4](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6476e47f8b9afc4d8b4683/html5/thumbnails/21.jpg)
Producing Exactly Once with Kafka 0.11
Support for Kafka 0.11
First Kafka producer with exactly once processing guarantees
“Hit Me, Baby, Just One Time – Building End-to-End Exactly Once Applications With Flink” by Piotr Nowojski, Today @ 3:20 pm Palais Atelier
[Figure: consuming and producing with end-to-end exactly once processing]
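One way to think about exactly once producing is a sink that writes into a transaction and only commits it once the corresponding checkpoint has completed. The following plain-Java sketch illustrates that idea under simplifying assumptions; it is not the Kafka producer's code, and the class `TransactionalSinkSketch` and all of its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a checkpoint-aligned transactional sink; not the actual Kafka producer.
class TransactionalSinkSketch {
    private final List<String> committed = new ArrayList<>();
    private final TreeMap<Long, List<String>> preCommitted = new TreeMap<>();
    private List<String> open = new ArrayList<>();

    /** Records are written into the currently open transaction and are not yet visible. */
    void write(String record) {
        open.add(record);
    }

    /** On a checkpoint, the open transaction is closed for writing and a new one is started. */
    void preCommit(long checkpointId) {
        preCommitted.put(checkpointId, open);
        open = new ArrayList<>();
    }

    /** Once the checkpoint is complete, all transactions up to it become visible. */
    void checkpointCompleted(long checkpointId) {
        Map<Long, List<String>> done = preCommitted.headMap(checkpointId, true);
        for (List<String> transaction : done.values()) {
            committed.addAll(transaction);
        }
        done.clear(); // also removes the entries from preCommitted
    }

    /** Simplification: on failure, everything that was not committed is aborted and replayed later. */
    void failAndRestore() {
        open = new ArrayList<>();
        preCommitted.clear();
    }

    List<String> visibleRecords() {
        return new ArrayList<>(committed);
    }

    public static void main(String[] args) {
        TransactionalSinkSketch sink = new TransactionalSinkSketch();
        sink.write("a");
        sink.write("b");
        sink.preCommit(1L);
        sink.write("c");
        System.out.println(sink.visibleRecords()); // []
        sink.checkpointCompleted(1L);
        System.out.println(sink.visibleRecords()); // [a, b]

        sink.failAndRestore(); // "c" was never committed, so it is dropped here
        sink.write("c");       // and written again when the input is replayed
        sink.preCommit(2L);
        sink.checkpointCompleted(2L);
        System.out.println(sink.visibleRecords()); // [a, b, c]
    }
}
```

The replayed record "c" shows up exactly once because the first attempt was aborted before it ever became visible.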
![Page 22: From Apache Flink® 1.3 to 1.4](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6476e47f8b9afc4d8b4683/html5/thumbnails/22.jpg)
StreamSQL and Table API
Support for retractions
Extended aggregation support
Support for external table catalogs
Window joins
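To make "retractions" more concrete: when a continuously updated result changes, the previously emitted row is withdrawn and the new row is emitted. The plain-Java sketch below models this for a grouped count. It is an illustration only, with hypothetical names (`RetractionSketch`, `onRow`) and a made-up message format, and it does not use the Table API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of retractions for a query like "count rows per key"; not the Table API.
class RetractionSketch {
    private final Map<String, Long> counts = new HashMap<>();

    /** Processes one input row and returns the change messages for the result. */
    List<String> onRow(String key) {
        List<String> out = new ArrayList<>();
        Long old = counts.get(key);
        if (old != null) {
            // withdraw the result that was emitted earlier for this key
            out.add("retract(" + key + ", " + old + ")");
        }
        long updated = (old == null ? 0L : old) + 1;
        counts.put(key, updated);
        out.add("add(" + key + ", " + updated + ")");
        return out;
    }

    public static void main(String[] args) {
        RetractionSketch query = new RetractionSketch();
        System.out.println(query.onRow("flink")); // [add(flink, 1)]
        System.out.println(query.onRow("flink")); // [retract(flink, 1), add(flink, 2)]
        System.out.println(query.onRow("kafka")); // [add(kafka, 1)]
    }
}
```

A downstream operator that consumes these messages can keep its own result correct by removing the retracted row before applying the new one.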
“Unified Stream and Batch Processing With Apache Flink’s Relational APIs” by Fabian Hüske, Tomorrow @ 11:00 am Kesselhaus
“From Streams to Tables and Back Again: A Demo of Flink’s Table & SQL API” by Timo Walther, Tomorrow @ 11:50 am Kesselhaus
![Page 23: From Apache Flink® 1.3 to 1.4](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6476e47f8b9afc4d8b4683/html5/thumbnails/23.jpg)
Operational Robustness
Drop Java 7
Support Scala 2.12
Avoid dependency hell
Child first class loading
Relocation of dependencies
De-Hadoopification
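Child first class loading means that classes are looked up in the user code's own jars before the parent class loader is asked, so a job can bring its own version of a library. A minimal sketch of such a loader in plain Java follows; the class name `ChildFirstClassLoaderSketch` is hypothetical and this is not Flink's implementation, which handles more cases.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical sketch of a child-first class loader; not Flink's implementation.
class ChildFirstClassLoaderSketch extends URLClassLoader {

    ChildFirstClassLoaderSketch(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                if (name.startsWith("java.")) {
                    // JDK classes always come from the parent
                    c = super.loadClass(name, false);
                } else {
                    try {
                        // child first: look in this loader's own URLs
                        c = findClass(name);
                    } catch (ClassNotFoundException e) {
                        // not found locally: fall back to the usual parent-first lookup
                        c = super.loadClass(name, false);
                    }
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    public static void main(String[] args) throws Exception {
        // With no URLs, every lookup falls through to the parent.
        ClassLoader loader = new ChildFirstClassLoaderSketch(new URL[0], ClassLoader.getSystemClassLoader());
        System.out.println(loader.loadClass("java.lang.String") == String.class); // true
    }
}
```

With real jar URLs passed to the constructor, a class present in both the jar and the parent's classpath would be taken from the jar.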
![Page 24: From Apache Flink® 1.3 to 1.4](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6476e47f8b9afc4d8b4683/html5/thumbnails/24.jpg)
Next on Apache Flink
Apache Flink 1.5+
![Page 25: From Apache Flink® 1.3 to 1.4](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6476e47f8b9afc4d8b4683/html5/thumbnails/25.jpg)
Side Inputs
Additional input for operator
Join with static data set
Feeding of externally trained ML model
Window joins
FLIP-17 design document: https://goo.gl/W4yMEu
[Figure: a ProcessFunction receiving a main input and a side input]
![Page 26: From Apache Flink® 1.3 to 1.4](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6476e47f8b9afc4d8b4683/html5/thumbnails/26.jpg)
State Management & Evolution
Eager state declaration
State type, serializer and name known at pre-flight time
FLIP-22 design document: https://goo.gl/trFiSi
Evolving existing state
Schema updates
Serializer upgrades
“Managing State in Apache Flink” by Tzu-Li Tai, Today @ 4:30 pm Kesselhaus
![Page 27: From Apache Flink® 1.3 to 1.4](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6476e47f8b9afc4d8b4683/html5/thumbnails/27.jpg)
State Replication
Replicate state between TaskManagers
Faster recovery in case of failures
High throughput queryable state
[Figure: two TaskManagers; labels: Input, State, Change log stream]
![Page 28: From Apache Flink® 1.3 to 1.4](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6476e47f8b9afc4d8b4683/html5/thumbnails/28.jpg)
Programmatic Job Control
Improve client to give better job control
Run concurrent jobs from the same program
Trigger savepoints programmatically
Better testing facilities
![Page 29: From Apache Flink® 1.3 to 1.4](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6476e47f8b9afc4d8b4683/html5/thumbnails/29.jpg)
JobClient & ClusterClient
StreamExecutionEnvironment env = ...;
// define program
JobClient jobClient = env.execute();
CompletableFuture<Acknowledge> savepointFuture = jobClient.takeSavepoint(savepointPath);
// wait for the savepoint completion
savepointFuture.get();
CompletableFuture<JobExecutionResult> resultFuture = jobClient.getResultFuture();
// cancel the job
jobClient.cancelJob();
// get the execution result --> should be canceled
JobExecutionResult result = resultFuture.get();
// get list of all still running jobs on the cluster
ClusterClient clusterClient = jobClient.getClusterClient();
CompletableFuture<List<JobInfo>> jobInfosFuture = clusterClient.getJobInfos();
List<JobInfo> jobInfos = jobInfosFuture.get();
![Page 30: From Apache Flink® 1.3 to 1.4](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6476e47f8b9afc4d8b4683/html5/thumbnails/30.jpg)
TL;DL
Apache Flink is one of the most innovative open source stream processing platforms
Stay tuned for what’s happening next
Visit the in-depth talks to learn more about Flink’s internals
![Page 31: From Apache Flink® 1.3 to 1.4](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6476e47f8b9afc4d8b4683/html5/thumbnails/31.jpg)
Thank you!
@stsffap
@ApacheFlink
@dataArtisans
![Page 32: From Apache Flink® 1.3 to 1.4](https://reader031.vdocuments.us/reader031/viewer/2022022415/5a6476e47f8b9afc4d8b4683/html5/thumbnails/32.jpg)
We are hiring!
data-artisans.com/careers