apache flink® and iot: how statefulevent-time processing ... · apachecon na 2017 - apache flink®...

Post on 20-May-2020

11 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

1

Aljoscha Krettek@aljoscha

ApacheCon North AmericaMay, 2017

Apache Flink® and IoT: How Stateful Event-Time Processing Enables Accurate Analytics

What I’d Like to Talk About

2

§ IoT and event-time stream processing

§ Stateful stream processing

§ Streaming architecture and Flink

3

Original creators of Apache Flink®

Providers of the dA Platform, a supported

Flink distribution

IoT and Event-time Stream Processing

4

Example Event Sources

5

A Simple Definition

6

IoT use cases from the system’s perspective:

A large number of (distributed) things continuously generating a large amount of data.

IoT: Some Insights

7

§ Data is continuously produced → Stream Processing

§ Events have a timestamp→ Event-time based processing

§ Data/Events can arrive with huge delays/out-of-order

§ Most analyses happen on time windows

What Is Event-Time Processing

8

1977 1980 1983 1999 2002 2005 2015

Processing Time

EpisodeIV

EpisodeV

EpisodeVI

EpisodeI

EpisodeII

EpisodeIII

EpisodeVII

Event Time

What Is Event-Time Processing

9

1312735961112

1234567891011121314Processing Time

Event timestamp

Message Queue

What’s The Problem?

10

13

12

735961112

1234567891011121314Processing Time

Processing-Time Windows 137356

12 137 356Event-Time Windows

12

1112

Mismatch between event time and processing time.

Sources of Time Mismatch§ Big Mismatch• Network disconnects• Slow network

§ Small Mismatch• The nature of distributed systems• Differing system clock time

11

Small Event-Time Mismatch

12

Robust Stream Processing with Apache Flink®:A Simple Walkthroughhttp://data-artisans.com/robust-stream-processing-flink-walkthrough/

13

14

15

Recap: Event-Time§ IoT use cases need event-time

processing§ Even small mismatch of event

time/processing time will lead to wrong results

16

(Stateful) Stream Processing

17

Stream Processing

18

Computation

Computations on never-ending “streams” of data records (“events”)

Distributed Stream Processing

19

Computation

Computation spread across many machines

Computation Computation

Stateful Stream Processing

20

Computation

State

State is usually partitioned by some key in the data

Stateful Stream Processing II

21

§ Result depends on history of stream§ A stateful stream processor should

gives the tools to manage state• Recover, roll back, version upgrade, etc.

22

app state

app state

app state

event log

Queryservice

Recap: Stateful Streams§ Continuous processing of data that is

continuously generated§ I.e., pretty much all “big” data§ It’s all about state and time§ Flink does all of that

23

Operational Issues

24

Operational Questions§ What happens in case of failures?§ What if I need to update my code/Flink?§ Can I re-process my data?§ How can I execute my programs?

25

Failure Handling§ JobManager High-Availability using

ZooKeeper§ Periodic checkpoints of state to

persistent storage (HDFS, S3, …)§ In case of failure: rollback to previous

consistent state

26

Savepoints§ A persistent snapshot of all state§ When starting an application, state can

be initialized from a savepoint§ In-between savepoint and restore we can

update Flink version or user code

27

Closing

28

TL;DR§ Stateful stream processing is nice 😎§ IoT use cases require proper time

management§ Apache Flink is a stateful stream

processor with plenty of nifty features

29

30

Thank you!

@aljoscha@ApacheFlink@dataArtisans

Backup Slides

31

Event-Time Processing

32

What Is Event-Time Processing

33

1977 1980 1983 1999 2002 2005 2015

Processing Time

EpisodeIV

EpisodeV

EpisodeVI

EpisodeI

EpisodeII

EpisodeIII

EpisodeVII

Event Time

What Is Event-Time Processing

34

1312735961112

1234567891011121314Processing Time

Event timestamp

Message Queue

What is Event-Time Streaming§ Events have timestamps

§ Processing depends on timestamps

§ An event-time stream processor should give you the tools to reason about time• Handle streams that are out of

order35

Your code

state

t3 t1 t2t4 t1-t2 t3-t4

Recap: Event-Time§ IoT use cases need event-time

processing§ Even small mismatch of event

time/processing time will lead to wrong results

36

History of Flink

37

A brief History of Flink

38

January ‘10 December ‘14

v0.5 v0.6 v0.7

March ‘16

Flink ProjectIncubation

Top LevelProject

v0.8 v0.10

Release1.0

ProjectStratosphere

(Flink precursor)

v0.9

April ‘14

A brief History of Flink

39

January ‘10 December ‘14

v0.5 v0.6 v0.7

March ‘16

Flink ProjectIncubation

Top LevelProject

v0.8 v0.10

Release1.0

ProjectStratosphere

(Flink precursor)

v0.9

April ‘14

The academia gap:Reading/writing papers, teaching, worrying about

thesis

Realizing this might be interesting to people

beyond academia(even more so,

actually)

top related