K. Tzoumas & S. Ewen – Flink Forward Keynote


Upload: flink-forward

Post on 18-Feb-2017


TRANSCRIPT

Page 1: K. Tzoumas & S. Ewen – Flink Forward Keynote

Welcome to

The first conference on Apache Flink

Sponsored by

Page 2: K. Tzoumas & S. Ewen – Flink Forward Keynote

Some practical info

§  Registration, cloakroom, and meals are in Palais

§  Information point always staffed

§  WiFi is FlinkForward

§  Twitter hashtag is #ff15

§  Follow @FlinkForward

Page 3: K. Tzoumas & S. Ewen – Flink Forward Keynote

Some practical info

§  Need help? Look for a volunteer (pink badges)

§  All sessions are recorded and will be made available online

§  This includes the training sessions

3

Page 4: K. Tzoumas & S. Ewen – Flink Forward Keynote

Getting around

4

Please go around while talks are in progress

Page 5: K. Tzoumas & S. Ewen – Flink Forward Keynote

Our speaker organizations

5

Page 6: K. Tzoumas & S. Ewen – Flink Forward Keynote

Kostas Tzoumas and Stephan Ewen @kostas_tzoumas | @StephanEwen

Apache Flink™: From Incubation to Flink 1.0

Page 7: K. Tzoumas & S. Ewen – Flink Forward Keynote

7

1.  A bit of history

2.  The streaming era and Flink

3.  Inside Flink 0.10

4.  Towards Flink 1.0 and beyond

Page 8: K. Tzoumas & S. Ewen – Flink Forward Keynote

A bit of history

From incubation until now

8

Page 9: K. Tzoumas & S. Ewen – Flink Forward Keynote

9

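[Figure: Flink release timeline and software stack, early versus current]

Timeline labels: Apr 2014, Dec 2014, Jun 2015, Oct 2015; releases 0.5, 0.6, 0.7, 0.8, 0.9-m1, 0.9, 0.10; milestone "Top level"

Early stack (top to bottom):

DataSet API (Java/Scala)

Flink core

Local, Remote, Yarn

Current stack (top to bottom):

Gelly, Table, FlinkML, SAMOA, Dataflow, Hadoop M/R, Cascading, Storm (Table and Dataflow appear twice in the diagram)

DataSet (Java/Scala/Python), DataStream (Java/Scala)

Flink dataflow engine

Local, Remote, Yarn, Tez, Embedded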

Page 10: K. Tzoumas & S. Ewen – Flink Forward Keynote

Community growth

Flink is one of the largest and most active Apache big data projects with well over 120 contributors

10

Page 11: K. Tzoumas & S. Ewen – Flink Forward Keynote

Flink meetups around the globe

11

Page 12: K. Tzoumas & S. Ewen – Flink Forward Keynote

Featured in

12

Page 13: K. Tzoumas & S. Ewen – Flink Forward Keynote

Welcome to the streaming era

13

Page 14: K. Tzoumas & S. Ewen – Flink Forward Keynote

14

batch

event based

need new systems

well served

Page 15: K. Tzoumas & S. Ewen – Flink Forward Keynote

15

Streaming is the biggest change in data infrastructure since Hadoop

Page 16: K. Tzoumas & S. Ewen – Flink Forward Keynote

16

1.  Radically simplified infrastructure

2.  Internet of Things, on-demand services

3.  Can completely subsume batch

Page 17: K. Tzoumas & S. Ewen – Flink Forward Keynote

17

In a world of events and isolated apps, the stream processor is the backbone of the data infrastructure

[Figure: several apps, each with its own local view, arranged around a stream processor. Labels: Consistent movement, analytics; Global view; Consistent store]

Page 18: K. Tzoumas & S. Ewen – Flink Forward Keynote

18

§  Until now, stream processors were less mature than batch processors

§  This led to

•  in-house solutions

•  abuse of batch processors

•  Lambda architectures

§  This is no longer the case

Page 19: K. Tzoumas & S. Ewen – Flink Forward Keynote

19

Flink 0.10

With the upcoming 0.10 release, Flink significantly surpasses the state of the art in open source stream processing systems.

And, we are heading to Flink 1.0 after that.

Page 20: K. Tzoumas & S. Ewen – Flink Forward Keynote

20

§  Streaming technology has matured

•  e.g., Flink, Kafka, Dataflow

§  Flink and Dataflow duality

•  a Google technology

•  an open source Apache project

Page 21: K. Tzoumas & S. Ewen – Flink Forward Keynote

21

§  Streaming is happening

§  Better adapt now

§  Flink 0.10: a ready to use open source stream processor

Page 22: K. Tzoumas & S. Ewen – Flink Forward Keynote

Flink 0.10

Flink for the streaming era

22

Page 23: K. Tzoumas & S. Ewen – Flink Forward Keynote

Improved DataStream API

§  Stream data analysis differs from batch data analysis by introducing time

§  Streams are unbounded and produce data over time

§  As simple as the batch API if you handle time in a simple way

§  Powerful if you want to handle time in an advanced way (out-of-order records, preliminary results, etc.)

23

Page 24: K. Tzoumas & S. Ewen – Flink Forward Keynote

Improved DataStream API

24

case class Event(location: Location, numVehicles: Long)

val stream: DataStream[Event] = …

stream
  .filter { evt => isIntersection(evt.location) }

Page 25: K. Tzoumas & S. Ewen – Flink Forward Keynote

Improved DataStream API

25

case class Event(location: Location, numVehicles: Long)

val stream: DataStream[Event] = …

stream
  .filter { evt => isIntersection(evt.location) }
  .keyBy("location")
  .timeWindow(Time.of(15, MINUTES), Time.of(5, MINUTES))
  .sum("numVehicles")
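To make the window semantics of the pipeline above concrete, here is a minimal sketch in plain Scala collections, not the Flink API. It assumes a simplified `Event` with a string location and a timestamp in minutes (both hypothetical), and it treats windows as half-open intervals `[start, start + size)` aligned to multiples of the slide. With a 15-minute window sliding every 5 minutes, each event is counted in three overlapping windows.

```scala
object SlidingWindowSketch {
  // Hypothetical, simplified event: the slide's Location type is replaced by a String,
  // and an explicit timestamp (in minutes) is added for illustration.
  final case class Event(location: String, timestampMin: Long, numVehicles: Long)

  // Start times of all windows [start, start + size) that contain `ts`,
  // assuming window starts are aligned to multiples of `slide`.
  def windowStarts(ts: Long, size: Long, slide: Long): List[Long] = {
    val lastStart = ts - java.lang.Math.floorMod(ts, slide)
    (lastStart until (ts - size) by -slide).toList
  }

  // Sum of numVehicles per (location, windowStart), i.e. keyBy + sliding window + sum.
  def sumPerWindow(events: Seq[Event], size: Long, slide: Long): Map[(String, Long), Long] =
    events
      .flatMap(e => windowStarts(e.timestampMin, size, slide).map(s => ((e.location, s), e.numVehicles)))
      .groupBy(_._1)
      .map { case (key, values) => key -> values.map(_._2).sum }
}
```

For example, `SlidingWindowSketch.windowStarts(12L, 15L, 5L)` yields the window starts 10, 5 and 0, so an event at minute 12 contributes to the windows starting at each of those minutes.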

Page 26: K. Tzoumas & S. Ewen – Flink Forward Keynote

Improved DataStream API

26

case class Event(location: Location, numVehicles: Long)

val stream: DataStream[Event] = …

stream
  .filter { evt => isIntersection(evt.location) }
  .keyBy("location")
  .timeWindow(Time.of(15, MINUTES), Time.of(5, MINUTES))
  .trigger(new Threshold(200))
  .sum("numVehicles")

Page 27: K. Tzoumas & S. Ewen – Flink Forward Keynote

Improved DataStream API

27

case class Event(location: Location, numVehicles: Long)

val stream: DataStream[Event] = …

stream
  .filter { evt => isIntersection(evt.location) }
  .keyBy("location")
  .timeWindow(Time.of(15, MINUTES), Time.of(5, MINUTES))
  .trigger(new Threshold(200))
  .sum("numVehicles")
  .keyBy( evt => evt.location.grid )
  .mapWithState { (evt, state: Option[Model]) => {
    // start from a fresh model if this key has no state yet
    val model = state.getOrElse(new Model())
    // emit the classification and store the updated model as the new state
    (model.classify(evt), Some(model.update(evt)))
  }}
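The `mapWithState` step above pairs each event with the state previously stored for its key and returns both an output and the next state. The following is a minimal sketch of that contract over an in-memory sequence, using plain Scala rather than the Flink API. The helper name and its signature are assumptions made for illustration, and returning `None` is assumed to clear the state for that key.

```scala
object KeyedStateSketch {
  // Applies `f` to each event together with the state currently stored for the event's key.
  // `f` returns the output element and the new state (None removes the state for that key).
  def mapWithState[In, K, S, Out](events: Seq[In], key: In => K)(
      f: (In, Option[S]) => (Out, Option[S])): Seq[Out] = {
    val (_, outputs) = events.foldLeft((Map.empty[K, S], Vector.empty[Out])) {
      case ((state, acc), evt) =>
        val k = key(evt)
        val (result, newState) = f(evt, state.get(k))
        val updated = newState match {
          case Some(s) => state.updated(k, s)
          case None    => state - k
        }
        (updated, acc :+ result)
    }
    outputs
  }
}
```

A running total per key can then be written as:

```scala
val totals = KeyedStateSketch.mapWithState[(String, Int), String, Int, Int](
  Seq(("a", 1), ("b", 2), ("a", 3)), _._1
) { (evt, state) =>
  val total = state.getOrElse(0) + evt._2
  (total, Some(total))
}
```

Because the state is kept per key, the second `"a"` event sees the total left behind by the first one, while `"b"` starts from an empty state.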

Page 28: K. Tzoumas & S. Ewen – Flink Forward Keynote

IoT / Mobile Applications

28

Events occur on devices

Queue / Log

Events analyzed in a data streaming system

Stream Analysis

Events stored in a log

Page 29: K. Tzoumas & S. Ewen – Flink Forward Keynote

IoT / Mobile Applications

29

Page 30: K. Tzoumas & S. Ewen – Flink Forward Keynote

IoT / Mobile Applications

30

Page 31: K. Tzoumas & S. Ewen – Flink Forward Keynote

IoT / Mobile Applications

31

Page 32: K. Tzoumas & S. Ewen – Flink Forward Keynote

IoT / Mobile Applications

32

Out of order !!!

First burst of events

Second burst of events

Page 33: K. Tzoumas & S. Ewen – Flink Forward Keynote

IoT / Mobile Applications

33

Event time windows

Arrival time windows

Instant event-at-a-time

Flink supports out-of-order (event time) windows, arrival time windows (and mixtures of both), plus low-latency processing.

First burst of events

Second burst of events
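The difference between event time windows and arrival time windows can be illustrated with a small batch simulation in plain Scala. This is only a sketch under assumptions: the `Reading` type, its field names, and the non-negative integer timestamps are hypothetical, and a real stream processor has to decide incrementally when a window is complete, which this sketch ignores.

```scala
object EventTimeSketch {
  // Hypothetical record: when the event happened on the device and when it reached the system.
  final case class Reading(eventTime: Long, arrivalTime: Long)

  // Counts readings per tumbling window of length `size`, keyed by window start.
  // `time` selects which timestamp defines window membership (timestamps assumed >= 0).
  def tumblingCounts(readings: Seq[Reading], size: Long)(time: Reading => Long): Map[Long, Int] =
    readings
      .groupBy(r => time(r) / size * size)
      .map { case (start, rs) => start -> rs.size }
}
```

Suppose a first burst (event times 1, 2, 3) arrives immediately, while a second burst (event times 4, 5) is delayed and arrives at times 12 and 13, after an unrelated event from time 11. Grouping by `_.eventTime` with a window size of 10 places both bursts in the window starting at 0, whereas grouping by `_.arrivalTime` moves the delayed burst into the window starting at 10.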

Page 34: K. Tzoumas & S. Ewen – Flink Forward Keynote

High Availability and Consistency

34

No Single-Point-Of-Failure any more

Exactly-once processing semantics across pipeline

Checkpoints/Fault Tolerance is decoupled from windows → Allows for highly flexible window implementations

ZooKeeper ensemble

Multiple Masters

failover

Page 35: K. Tzoumas & S. Ewen – Flink Forward Keynote

Performance

35

Continuous streaming

Latency-bound buffering

Distributed Snapshots

High Throughput & Low Latency

With configurable throughput/latency tradeoff
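One way to picture latency-bound buffering is an output buffer that ships its contents either when it is full or when its oldest record has waited longer than a timeout. The sketch below is plain Scala, not Flink's implementation: the class name, the explicit `nowMs` argument, and the choice to check the timeout only when a record is added are all assumptions made to keep the example deterministic. A real system would also flush on a timer when no new records arrive.

```scala
object BufferTimeoutSketch {
  final class OutputBuffer[T](capacity: Int, timeoutMs: Long) {
    private var pending = Vector.empty[T]
    private var firstAddedAt = 0L

    // Adds a record at time `nowMs`. Returns the batch to ship if the buffer is full
    // or its oldest record has waited at least `timeoutMs`; otherwise returns None.
    def add(record: T, nowMs: Long): Option[Vector[T]] = {
      if (pending.isEmpty) firstAddedAt = nowMs
      pending = pending :+ record
      if (pending.size >= capacity || nowMs - firstAddedAt >= timeoutMs) Some(flush())
      else None
    }

    // Empties the buffer and returns what was pending.
    def flush(): Vector[T] = {
      val out = pending
      pending = Vector.empty
      out
    }
  }
}
```

In this sketch a timeout of 0 ships every record immediately (lowest latency, most per-record overhead), while a large timeout ships only full buffers (highest throughput), which is the tradeoff the slide refers to.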

Page 36: K. Tzoumas & S. Ewen – Flink Forward Keynote

Batch and Streaming

36

case class WordCount(word: String, count: Int)

val text: DataStream[String] = …

text
  .flatMap { line => line.split(" ") }
  .map { word => new WordCount(word, 1) }
  .keyBy("word")
  .window(GlobalWindows.create())
  .trigger(new EOFTrigger())
  .sum("count")

Batch Word Count in the DataStream API

Page 37: K. Tzoumas & S. Ewen – Flink Forward Keynote

Batch and Streaming

37

Batch Word Count in the DataSet API

case class WordCount(word: String, count: Int)

// DataStream API (from the previous slide)
val text: DataStream[String] = …

text
  .flatMap { line => line.split(" ") }
  .map { word => new WordCount(word, 1) }
  .keyBy("word")
  .window(GlobalWindows.create())
  .trigger(new EOFTrigger())
  .sum("count")

// DataSet API
val text: DataSet[String] = …

text
  .flatMap { line => line.split(" ") }
  .map { word => new WordCount(word, 1) }
  .groupBy("word")
  .sum("count")
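The two programs above express the same computation: a global window that fires once at end of input behaves like a batch group-and-sum. A minimal sketch with plain Scala collections (not the Flink APIs, with hypothetical function names) shows the equivalence. The streaming variant consumes lines one at a time and keeps running counts as state, emitting them only when the input is exhausted.

```scala
object WordCountSketch {
  // Batch style: materialize all words, group them, then count each group.
  def batchCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" ").toList)
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size }

  // Streaming style: update a running count per word as each line arrives,
  // and return the counts once the (finite) input ends.
  def streamingCount(lines: Iterator[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" ").toList)
      .filter(_.nonEmpty)
      .foldLeft(Map.empty[String, Int]) { (counts, word) =>
        counts.updated(word, counts.getOrElse(word, 0) + 1)
      }
}
```

On any finite input, `streamingCount(lines.iterator)` and `batchCount(lines)` return the same map; the difference lies only in how much data has to be held before results can be produced.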

Page 38: K. Tzoumas & S. Ewen – Flink Forward Keynote

Batch and Streaming

38

[Figure: DataSet and DataStream programs on a shared streaming dataflow runtime]

DataSet: Relational Optimizer; Batch Parameters

DataStream: Window Optimization; Streaming Parameters

Streaming Dataflow Runtime labels: Pipelined and blocking operators; Pipelined and windowed operators; Schedule lazily; Schedule eagerly; Recompute whole operators; Periodic checkpoints; Streaming data movement; Stateful operations; DAG recovery; Fully buffered streams; DAG resource management

Page 39: K. Tzoumas & S. Ewen – Flink Forward Keynote

Batch and Streaming

39

A full-fledged batch processor as well

[Figure: Flink software stack, top to bottom]

Gelly, Table, FlinkML, SAMOA, Dataflow, Hadoop M/R, Cascading, Storm (Table and Dataflow appear twice in the diagram)

DataSet (Java/Scala/Python), DataStream (Java/Scala)

Flink dataflow engine

Local, Remote, Yarn, Tez, Embedded

Page 40: K. Tzoumas & S. Ewen – Flink Forward Keynote

Batch and Streaming

40

A full-fledged batch processor as well

[Figure: Flink software stack, as on the previous slide]

More details at Dongwon Kim's Talk "A comparative performance evaluation of Flink"

Page 41: K. Tzoumas & S. Ewen – Flink Forward Keynote

Integration (picture not complete)

41

[Figure: systems Flink integrates with; surviving labels: POSIX, Java/Scala Collections]

Page 42: K. Tzoumas & S. Ewen – Flink Forward Keynote

Monitoring

42

Live system metrics and user-defined accumulators/statistics

GET http://flink-m:8081/jobs/7684be6004e4e955c2a558a9bc463f65/accumulators

Monitoring REST API for custom monitoring tools

{
  "id": "dceafe2df1f57a1206fcb907cb38ad97",
  "user-accumulators": [
    { "name": "avglen", "type": "DoubleCounter", "value": "123.03259440000001" },
    { "name": "genwords", "type": "LongCounter", "value": "75000000" }
  ]
}
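A custom monitoring tool could poll this endpoint and read the accumulator values out of the response. The following is a minimal sketch using only the Scala standard library. It assumes the response has exactly the shape shown on the slide, with `name`, `type` and `value` in that order, and it uses a regular expression instead of a JSON library, so it is not a general JSON parser. The object and method names are hypothetical.

```scala
import scala.io.Source

object AccumulatorSketch {
  final case class Accumulator(name: String, kind: String, value: String)

  // Matches one {"name": ..., "type": ..., "value": ...} entry in the order shown on the slide.
  private val Entry =
    """\{\s*"name"\s*:\s*"([^"]+)"\s*,\s*"type"\s*:\s*"([^"]+)"\s*,\s*"value"\s*:\s*"([^"]+)"\s*\}""".r

  // Extracts all accumulator entries from a response body.
  def parse(json: String): List[Accumulator] =
    Entry.findAllMatchIn(json)
      .map(m => Accumulator(m.group(1), m.group(2), m.group(3)))
      .toList

  // Fetches the response body for a URL such as the accumulators endpoint above.
  def fetch(url: String): String = {
    val source = Source.fromURL(url)
    try source.mkString finally source.close()
  }
}
```

With the host and job ID from the slide, usage would look like `AccumulatorSketch.parse(AccumulatorSketch.fetch("http://flink-m:8081/jobs/7684be6004e4e955c2a558a9bc463f65/accumulators"))`. Values arrive as strings, so a caller would convert them according to the reported type, for example `value.toLong` for a `LongCounter`.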

Page 43: K. Tzoumas & S. Ewen – Flink Forward Keynote

Flink 0.10 Summary

§  Focus on operational readiness

•  high availability

•  monitoring

•  integration with other systems

§  First-class support for event time

§  Refined DataStream API: easy and powerful

43

Page 44: K. Tzoumas & S. Ewen – Flink Forward Keynote

Towards Flink 1.0 and beyond

Where we see the project going

44

Page 45: K. Tzoumas & S. Ewen – Flink Forward Keynote

Towards Flink 1.0

§  Flink 1.0 is around the corner

§  Focus on defining public APIs and automatic API compatibility checks

§  Guarantee backwards compatibility in all Flink 1.X versions

45

Page 46: K. Tzoumas & S. Ewen – Flink Forward Keynote

Beyond Flink 1.0

§  Flink engine has most features in place

§  Focus on usability features on top of DataStream API

•  e.g., SQL, ML, more connectors

§  Continue work on elasticity and memory management

46

Page 47: K. Tzoumas & S. Ewen – Flink Forward Keynote

47  

Enjoy the rest of

The first conference on Apache Flink