Apache Flink Overview at Stockholm Hadoop User Group

Apache Flink: Next-gen data analysis

Stephan Ewen

@StephanEwen

What is Apache Flink?

• Project undergoing incubation in the Apache Software Foundation

• Originating from the Stratosphere research project started at TU Berlin in 2009

• http://flink.incubator.apache.org

• 59 contributors (doubled in ~4 months)

• Has a cool squirrel for a logo


What is Apache Flink?

Diagram: an analytical program is submitted through the Flink client and optimizer to a Flink cluster made up of a master and workers.

This talk

• Introduction to Flink

• Flink from a user perspective

• Tour of Flink internals

• Flink roadmap and closing

Open Source Data Processing Landscape

• Applications: Hive, Mahout, Cascading, Pig, Crunch, …
• Data processing engines: MapReduce, Flink, Spark, Storm, Tez
• App and resource management: Yarn, Mesos
• Storage, streams: HDFS, HBase, Kafka

Diagram of the Flink stack, with these labels:

• APIs: Java API (batch), Scala API (batch), Java API (streaming), Python API, Graph API („Spargel“), Apache MRQL
• Common API
• Batch / Streaming: Flink Optimizer, Streams Builder
• Hybrid Batch/Streaming Runtime; also Apache Tez, Java Collections, Local Execution
• Cluster Manager: YARN, EC2, Native
• Storage / Streams: HDFS, Files, S3, JDBC, Redis, RabbitMQ, Kafka, Azure, …

Flink APIs

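Parallel Collections / Distributed Data Sets

Diagram: in the program, Operator X turns DataSet A into DataSet B and Operator Y turns DataSet B into DataSet C. In parallel execution, each data set is split into partitions (A (1), A (2), B (1), B (2), C (1), C (2)) and each operator runs as parallel instances.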

Flexible Pipelines

Diagram: a data flow built from Source, Map, Reduce, Join, Iterate and Sink operators.

Operators: Map, FlatMap, MapPartition, Filter, Project, Reduce, ReduceGroup, Aggregate, Distinct, Join, CoGroup, Cross, Iterate, Iterate Delta, Iterate-Vertex-Centric

Word Count, Java API

DataSet<String> text = env.readTextFile(input);

DataSet<Tuple2<String, Integer>> result = text
    .flatMap((str, out) -> {
        // emit a (token, 1) pair for every token in the line
        for (String token : str.split("\\W")) {
            out.collect(new Tuple2<>(token, 1));
        }
    })
    .groupBy(0)
    .aggregate(SUM, 1);
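What the word-count pipeline computes can be illustrated without a cluster. The following is a minimal plain-Java sketch, not Flink code: the class and method names (WordCountSketch, countWords) are hypothetical, and the distributed flatMap, groupBy and sum steps are collapsed into a single loop over an in-memory collection.

```java
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java sketch of the word-count data flow: flatMap, then groupBy, then sum. */
class WordCountSketch {

    /** Counts tokens across all lines; a TreeMap keeps the result in key order. */
    static Map<String, Integer> countWords(Iterable<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            // "flatMap": split each line into tokens on runs of non-word characters
            for (String token : line.split("\\W+")) {
                if (token.isEmpty()) {
                    continue; // split can yield a leading empty token
                }
                // "groupBy" + "sum": add 1 to the running total of this token
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

In Flink the same three steps run in parallel on partitions of the input, with a shuffle between the flatMap and the grouped aggregation.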

Word Count, Scala API

val text = env.readTextFile(input)
val words = text flatMap { line => line.split("\\W+") }
val counts = words groupBy { word => word } count()

Beyond Key/Value Pairs

DataSet<Page> pages = ...;
DataSet<Impression> impressions = ...;

DataSet<Impression> aggregated = impressions
    .groupBy("url")
    .sum("count");

pages.join(impressions).where("url").equalTo("url")
    .print();
// outputs pairs of matching pages and impressions

class Impression {
    public String url;
    public long count;
}

class Page {
    public String url;
    public String topic;
}
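The idea of grouping and joining on named fields of ordinary classes can be sketched in plain Java. This is an illustration only, not the Flink API: FieldJoinSketch, sumByUrl and joinOnUrl are hypothetical names, and unlike the slide, the sketch joins pages against the aggregated totals rather than the raw impressions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java sketch of groupBy("url").sum("count") followed by a join on "url". */
class FieldJoinSketch {

    static class Impression {
        public String url;
        public long count;
        Impression(String url, long count) { this.url = url; this.count = count; }
    }

    static class Page {
        public String url;
        public String topic;
        Page(String url, String topic) { this.url = url; this.topic = topic; }
    }

    /** Sums impression counts per url; the grouping key is simply a field of the object. */
    static Map<String, Long> sumByUrl(List<Impression> impressions) {
        Map<String, Long> totals = new HashMap<>();
        for (Impression i : impressions) {
            totals.merge(i.url, i.count, Long::sum);
        }
        return totals;
    }

    /** Inner join of pages with per-url totals; emits "url,topic,total" per match. */
    static List<String> joinOnUrl(List<Page> pages, Map<String, Long> totals) {
        List<String> out = new ArrayList<>();
        for (Page p : pages) {
            Long total = totals.get(p.url);
            if (total != null) { // pages without impressions are dropped
                out.add(p.url + "," + p.topic + "," + total);
            }
        }
        return out;
    }
}
```

Neither class had to be wrapped into a key/value pair: the key is named at the point of use, so the same types can feed a grouping on one field and a join on another.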

val paths = edges.iterate(maxIterations) { prevPaths: DataSet[(Long, Long)] =>
  val nextPaths = prevPaths
    .join(edges)
    .where(1).equalTo(0) {
      (left, right) => (left._1, right._2)
    }
    .union(prevPaths)
    .groupBy(0, 1)
    .reduce((l, r) => l)
  nextPaths
}

Distributed architecture

• Client: optimization and translation to data flow
• Job Manager: scheduling, resource negotiation, …
• Task Managers (three shown), each alongside a data node, with memory and heap

What’s new in Flink

Dependability

• Flink manages its own memory
• Caching and data processing happen in a dedicated memory fraction
• System never breaks the JVM heap, gracefully spills

Diagram: the JVM heap is divided into
• Unmanaged Heap: User Code
• Flink Managed Heap: Hashing/Sorting/Caching
• Network Buffers: Shuffles/Broadcasts
(next version unifies network buffers and managed heap)

Operating on Serialized Data

• MapReduce serializes data every time: highly robust, never gives up on you
• Spark works on objects, RDDs may be stored serialized: serialization is considered slow and used only when needed
• Flink makes serialization really cheap (partial deserialization, operates on serialized form): efficient and robust!

Operating on Serialized Data

Microbenchmark
• Sorting 1GB worth of (long, double) tuples
• 67,108,864 elements
• Simple quicksort
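To make "sorting serialized data" concrete, here is a minimal sketch of a quicksort that works directly on (long, double) tuples stored as 16-byte records in a byte buffer. It is an illustration under stated assumptions, not Flink's sorter or the benchmark code: the class name SerializedSortSketch, the record layout and the pivot choice are all assumptions made for this example.

```java
import java.nio.ByteBuffer;

/** Sketch: quicksort over (long, double) tuples kept in serialized form. */
class SerializedSortSketch {
    static final int RECORD_SIZE = Long.BYTES + Double.BYTES; // 16 bytes per tuple

    /** Writes the tuples into one contiguous buffer: 8-byte key, then 8-byte value. */
    static ByteBuffer serialize(long[] keys, double[] values) {
        ByteBuffer buf = ByteBuffer.allocate(keys.length * RECORD_SIZE);
        for (int i = 0; i < keys.length; i++) {
            buf.putLong(i * RECORD_SIZE, keys[i]);
            buf.putDouble(i * RECORD_SIZE + Long.BYTES, values[i]);
        }
        return buf;
    }

    /** Reads only the key field of record i; the value bytes are never decoded. */
    static long keyAt(ByteBuffer buf, int i) {
        return buf.getLong(i * RECORD_SIZE);
    }

    /** Swaps two records by exchanging their raw bytes, eight at a time. */
    static void swap(ByteBuffer buf, int i, int j) {
        if (i == j) return;
        for (int b = 0; b < RECORD_SIZE; b += Long.BYTES) {
            long tmp = buf.getLong(i * RECORD_SIZE + b);
            buf.putLong(i * RECORD_SIZE + b, buf.getLong(j * RECORD_SIZE + b));
            buf.putLong(j * RECORD_SIZE + b, tmp);
        }
    }

    /** Simple quicksort on records lo..hi (inclusive), ordered by key. */
    static void quicksort(ByteBuffer buf, int lo, int hi) {
        if (lo >= hi) return;
        long pivot = keyAt(buf, lo + (hi - lo) / 2);
        int i = lo, j = hi;
        while (i <= j) {
            while (keyAt(buf, i) < pivot) i++;
            while (keyAt(buf, j) > pivot) j--;
            if (i <= j) {
                swap(buf, i, j);
                i++;
                j--;
            }
        }
        quicksort(buf, lo, j);
        quicksort(buf, i, hi);
    }
}
```

No tuple object is ever created during the sort: comparisons read eight key bytes, and swaps move raw bytes. At 16 bytes per record, 67,108,864 records occupy exactly 1 GiB.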

Memory Management

public class WC {
    public String word;
    public int count;
}

Diagram: a pool of memory pages, including empty pages

• Works on pages of bytes, maps objects transparently
• Full control over memory, out-of-core enabled
• Algorithms work on binary representation
• Address individual fields (not deserialize whole object)
• Move memory between operations
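The points above can be sketched with a fixed pool of byte pages that hold WC records in binary form. This is a simplified illustration, not Flink's memory manager: MemoryPageSketch, PagePool, the 64-byte page size and the record layout [int count][int wordLength][word bytes] are all hypothetical choices for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;

/** Sketch: WC records stored in fixed-size byte pages taken from a pool. */
class MemoryPageSketch {
    static final int PAGE_SIZE = 64; // hypothetical, tiny on purpose

    /** A fixed pool of pages; nothing is allocated after construction. */
    static class PagePool {
        private final Deque<ByteBuffer> free = new ArrayDeque<>();

        PagePool(int pages) {
            for (int i = 0; i < pages; i++) free.push(ByteBuffer.allocate(PAGE_SIZE));
        }

        /** Returns an empty page, or null when the pool is exhausted (time to spill). */
        ByteBuffer request() {
            ByteBuffer page = free.poll();
            if (page != null) page.clear();
            return page;
        }

        /** Hands a page back so another operation can reuse it. */
        void release(ByteBuffer page) { free.push(page); }
    }

    /** Appends a record; returns its offset in the page, or -1 if it does not fit. */
    static int write(ByteBuffer page, String word, int count) {
        byte[] bytes = word.getBytes(StandardCharsets.UTF_8);
        if (page.remaining() < 2 * Integer.BYTES + bytes.length) return -1;
        int offset = page.position();
        page.putInt(count).putInt(bytes.length).put(bytes);
        return offset;
    }

    /** Reads only the count field; the word bytes are not touched. */
    static int readCount(ByteBuffer page, int offset) {
        return page.getInt(offset);
    }

    /** Updates the count in place, directly in the binary representation. */
    static void addToCount(ByteBuffer page, int offset, int delta) {
        page.putInt(offset, page.getInt(offset) + delta);
    }

    /** Full deserialization of the word, for when it is actually needed. */
    static String readWord(ByteBuffer page, int offset) {
        int len = page.getInt(offset + Integer.BYTES);
        byte[] bytes = new byte[len];
        for (int i = 0; i < len; i++) bytes[i] = page.get(offset + 2 * Integer.BYTES + i);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Because the pool never grows, running out of pages is an explicit signal (a null page or a -1 offset) that the caller can answer by spilling to disk, rather than an OutOfMemoryError from the JVM.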



Why not key/value pairs

• Programs are much more readable ;-)

• Functions are self-contained (do not need to set key for successor)

• Much higher reusability of data types and functions
o Within Flink programs, or from other programs

Flink programs run everywhere

• Cluster (Batch)
• Cluster (Streaming)
• Local Debugging
• Flink Runtime or Apache Tez
• As Java Collection Programs
• Embedded (e.g., Web Container)

Upcoming Streaming API


Streaming Throughput

Migrate Easily

Flink supports out of the box:
• Hadoop data types (writables)
• Hadoop Input/Output Formats
• Hadoop functions and object model

Diagram: a Hadoop pipeline (Input, Map, Reduce, Output) next to a Flink data flow of DataSets connected by Map, Reduce and Join operators, with its own Input and Output.

Little tuning or configuration required

• Requires no memory thresholds to configure
o Flink manages its own memory
• Requires no complicated network configs
o Pipelining engine requires much less memory for data exchange
• Requires no serializers to be configured
o Flink handles its own type extraction and data representation
• Programs can be adjusted to data automatically
o Flink’s optimizer can choose execution strategies automatically

Understanding Programs

Visualizes the operations and the data movement of programs

Screenshot from Flink’s plan visualizer

Analyze after execution (times, stragglers, …)

Iterations in other systems

Diagram: a sequence of steps driven by the client, with the loop outside the system

Iterations in Flink

Streaming dataflow with feedback (diagram: map, join, reduce and join operators inside the loop)

System is iteration-aware, performs automatic optimization

Automatic Optimization for Iterative Programs

• Caching loop-invariant data
• Pushing work „out of the loop“
• Maintain state as index

Flink Roadmap

What is the community currently working on?

• Flink has a major release every 3 months, with >=1 bug-fixing releases in-between

• Finer-grained fault tolerance
• Logical (SQL-like) field addressing
• Python API
• Flink Streaming, Lambda architecture support
• Flink on Tez
• ML on Flink (e.g., Mahout DSL)
• Graph DSL on Flink
• … and more

http://flink.incubator.apache.org

github.com/apache/incubator-flink

@ApacheFlink

Engine comparison

             | MapReduce              | Tez                            | Spark                                   | Flink
API          | MapReduce on k/v pairs | k/v pair Readers/Writers       | Transformations on k/v pair collections | Iterative transformations on collections
Paradigm     | MapReduce              | DAG                            | RDD                                     | Cyclic dataflows
Optimization | none                   | none                           | Optimization of SQL queries             | Optimization in all APIs
Execution    | Batch sorting          | Batch sorting and partitioning | Batch with memory pinning               | Stream with out-of-core algorithms

Example: Joins in Flink

DataSet<Order> large = ...
DataSet<Lineitem> medium = ...
DataSet<Customer> small = ...

DataSet<Tuple...> joined1 = large.join(medium)
    .where(3).equalTo(1)
    .with(new JoinFunction() { ... });

DataSet<Tuple...> joined2 = small.join(joined1)
    .where(0).equalTo(2)
    .with(new JoinFunction() { ... });

DataSet<Tuple...> result = joined2.groupBy(3).aggregate(MAX, 2);

Built-in strategies include partitioned join and replicated join with local sort-merge or hybrid-hash algorithms.

Plan diagram: γ applied to small ⋈ (large ⋈ medium)

Automatic Optimization

DataSet<Tuple...> large = env.readCsv(...);
DataSet<Tuple...> medium = env.readCsv(...);
DataSet<Tuple...> small = env.readCsv(...);

DataSet<Tuple...> joined1 = large.join(medium)
    .where(3).equalTo(1)
    .with(new JoinFunction() { ... });

DataSet<Tuple...> joined2 = small.join(joined1)
    .where(0).equalTo(2)
    .with(new JoinFunction() { ... });

DataSet<Tuple...> result = joined2.groupBy(3).aggregate(MAX, 2);

Possible execution:
1) Partitioned hash-join
2) Broadcast hash-join
3) Grouping/Aggregation reuses the partitioning from step (1): no shuffle!

Partitioned ≈ Reduce-side
Broadcast ≈ Map-side
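The difference between the two strategies can be sketched by simulating partitions with in-memory lists. This is a toy model, not Flink's runtime: JoinStrategySketch and its methods are hypothetical names, records are long[]{key, value} pairs, and "workers" are simply list indices.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of partitioned vs. broadcast hash join on simulated partitions. */
class JoinStrategySketch {

    /** Local hash join: build a table on one side, probe it with the other. */
    static List<String> hashJoin(List<long[]> build, List<long[]> probe) {
        Map<Long, List<long[]>> table = new HashMap<>();
        for (long[] b : build) {
            table.computeIfAbsent(b[0], k -> new ArrayList<>()).add(b);
        }
        List<String> out = new ArrayList<>();
        for (long[] p : probe) {
            for (long[] b : table.getOrDefault(p[0], new ArrayList<>())) {
                out.add(p[0] + ":" + b[1] + ":" + p[1]); // key:buildValue:probeValue
            }
        }
        return out;
    }

    static List<List<long[]>> emptyParts(int n) {
        List<List<long[]>> parts = new ArrayList<>();
        for (int i = 0; i < n; i++) parts.add(new ArrayList<>());
        return parts;
    }

    /** Shuffle by key: records with equal keys land in the same partition. */
    static List<List<long[]>> partitionByKey(List<long[]> data, int n) {
        List<List<long[]>> parts = emptyParts(n);
        for (long[] r : data) parts.get((int) Math.floorMod(r[0], (long) n)).add(r);
        return parts;
    }

    /** Arbitrary placement, standing in for data that already sits on the workers. */
    static List<List<long[]>> roundRobin(List<long[]> data, int n) {
        List<List<long[]>> parts = emptyParts(n);
        for (int i = 0; i < data.size(); i++) parts.get(i % n).add(data.get(i));
        return parts;
    }

    /** Partitioned join: both inputs are shuffled by key, then joined per partition. */
    static List<String> partitionedJoin(List<long[]> small, List<long[]> large, int n) {
        List<List<long[]>> s = partitionByKey(small, n);
        List<List<long[]>> l = partitionByKey(large, n);
        List<String> out = new ArrayList<>();
        for (int p = 0; p < n; p++) out.addAll(hashJoin(s.get(p), l.get(p)));
        return out;
    }

    /** Broadcast join: every partition of the large input sees the whole small input. */
    static List<String> broadcastJoin(List<long[]> small, List<long[]> large, int n) {
        List<String> out = new ArrayList<>();
        for (List<long[]> part : roundRobin(large, n)) out.addAll(hashJoin(small, part));
        return out;
    }
}
```

Both methods return the same matches (possibly in a different order). The partitioned variant moves both inputs but leaves the result partitioned by key, which is what lets a later grouping on that key skip its shuffle; the broadcast variant leaves the large input where it is and replicates only the small one.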

Running Programs

• Local Environment: LocalEnvironment.execute() spawns an embedded multi-threaded environment in the JVM
• Remote Environment: RemoteEnvironment.execute() sends the program from the JVM to the master via RPC & serialization
• Packaged Programs: > bin/flink run prg.jar submits the program JAR file to the master

Unifies various kinds of Computations

Pregel/Giraph-style Graph Computation

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

DataSet<Long> vertexIds = ...
DataSet<Tuple2<Long, Long>> edges = ...

DataSet<Tuple2<Long, Long>> vertices = vertexIds.map(new IdAssigner());

DataSet<Tuple2<Long, Long>> result = vertices
    .runOperation(
        VertexCentricIteration.withPlainEdges(
            edges, new CCUpdater(), new CCMessager(), 100));

result.print();
env.execute("Connected Components");

Delta Iterations speed up certain problems by a lot

Chart: computations performed in each iteration for connected communities of a social graph, bulk vs. delta (x-axis: iteration)

Chart: runtime, bulk vs. delta, for the Twitter and Webbase (20) data sets

Cover typical use cases of Pregel-like systems with comparable performance in a generic platform and developer API.
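Why delta iterations do less work can be seen in a small connected-components sketch. It is a single-machine illustration, not Flink's iteration operators: DeltaIterationSketch and its methods are hypothetical names, the graph is an undirected adjacency list, and "work" simply counts how many vertices are evaluated.

```java
import java.util.Set;
import java.util.TreeSet;

/** Sketch: connected components by min-label propagation, bulk vs. delta. */
class DeltaIterationSketch {

    /** Every vertex starts in its own component, labeled with its own id. */
    static int[] initialComponents(int n) {
        int[] c = new int[n];
        for (int i = 0; i < n; i++) c[i] = i;
        return c;
    }

    /** Bulk: every iteration re-evaluates every vertex. Returns vertices evaluated. */
    static long bulk(int[][] neighbors, int[] component) {
        long work = 0;
        boolean changed = true;
        while (changed) {
            changed = false;
            int[] next = component.clone();
            for (int v = 0; v < neighbors.length; v++) {
                work++;
                for (int n : neighbors[v]) {
                    if (component[n] < next[v]) {
                        next[v] = component[n];
                        changed = true;
                    }
                }
            }
            System.arraycopy(next, 0, component, 0, next.length);
        }
        return work;
    }

    /** Delta: only vertices in the workset (recently changed) are evaluated. */
    static long delta(int[][] neighbors, int[] component) {
        long work = 0;
        Set<Integer> workset = new TreeSet<>();
        for (int v = 0; v < neighbors.length; v++) workset.add(v);
        while (!workset.isEmpty()) {
            Set<Integer> nextWorkset = new TreeSet<>();
            for (int v : workset) {
                work++;
                for (int n : neighbors[v]) {
                    if (component[v] < component[n]) {
                        component[n] = component[v]; // update the solution in place
                        nextWorkset.add(n);          // only changed vertices stay active
                    }
                }
            }
            workset = nextWorkset;
        }
        return work;
    }
}
```

Both variants end with the same component ids. The bulk loop pays for the full vertex set in every pass, including the final pass that only confirms convergence, while the workset shrinks as parts of the graph settle.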

What is Automatic Optimization

The optimizer decides: Hash vs. Sort, Partition vs. Broadcast, Caching, Reusing partition/sort

The same program gets a different execution plan (Plan A, Plan B, Plan C) depending on where and on what data it runs:
• Run on a sample on the laptop
• Run on large files on the cluster
• Run a month later after the data evolved
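One of these decisions, partition vs. broadcast, can be sketched as a comparison of estimated network cost. The cost model below is a deliberately crude assumption made for illustration and is not Flink's optimizer: PlanChoiceSketch, its formulas and the byte-size estimates are all hypothetical.

```java
/** Sketch: pick a join strategy from size estimates, as a cost-based optimizer might. */
class PlanChoiceSketch {
    enum Strategy { BROADCAST_HASH_JOIN, PARTITIONED_HASH_JOIN }

    /** Assumed cost of shipping the smaller input to every worker, in bytes. */
    static long broadcastCost(long smallBytes, int workers) {
        return smallBytes * workers;
    }

    /** Assumed cost of shuffling both inputs once, in bytes. */
    static long partitionCost(long leftBytes, long rightBytes) {
        return leftBytes + rightBytes;
    }

    /** Chooses the strategy with the lower estimated network cost. */
    static Strategy choose(long leftBytes, long rightBytes, int workers) {
        long small = Math.min(leftBytes, rightBytes);
        return broadcastCost(small, workers) < partitionCost(leftBytes, rightBytes)
                ? Strategy.BROADCAST_HASH_JOIN
                : Strategy.PARTITIONED_HASH_JOIN;
    }
}
```

With these formulas, a 1 GB input joined with a 1 MB input on 10 workers is broadcast, while the same program is repartitioned once the second input has grown to 800 MB. The program text is unchanged in both cases; only the estimates differ.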
