apache flink - community update january 2015

Apache Flink: Flink Community Update, January 2015. Robert Metzger, rmetzger@apache.org

Upload: fabian-hueske

Post on 16-Jul-2015


Page 1: Apache Flink - Community Update January 2015

Apache Flink

Flink Community Update

January 2015

Robert Metzger

rmetzger@apache.org


Flink Community Updates

• What happened in the Flink community?

– Check out the monthly newsletter

– Subscribe to [email protected]

• This Talk

– Graduation

– Release

– Graph API contribution

– and more…

flink.apache.org


Graduation to Top Level Project


Official Press release:

https://blogs.apache.org/foundation/entry/the_apache_software_foundation_announces69

• Officially acknowledged as a healthy, vibrant and growing open-source community, following the Apache Way.

• Attracted many new contributors and committers during incubation

• More than 75 contributors now.


Apache Flink in the News


Release 0.8.0

• Lots of stability improvements

• Scala Streaming API

• Improved streaming windowing semantics

• Kryo as the new fallback serializer

• Extended FileSystem support (Hadoop compatibility)

• And more... check it out!


In the news: Large-scale Matrix Factorization with Apache Flink


Read more: http://data-artisans.com/computing-recommendations-with-flink.html

• Using the Alternating Least Squares (ALS) algorithm on top of Flink

• Max matrix size:

– 40 million users

– 5 million items

– average of 700 ratings per user (28 billion ratings)

• We ran all experiments with 50 latent factors, for 10 iterations.
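The figures on this slide can be cross-checked with a little arithmetic. The sketch below is not taken from the talk: it recomputes the rating count from the user count and the average ratings per user, then estimates how large the two ALS factor matrices would be with 50 latent factors. The 8-byte value size is an assumption, and the function and constant names are made up for illustration.

```python
# Back-of-the-envelope sizing for the largest matrix quoted on the slide.
USERS = 40_000_000
ITEMS = 5_000_000
AVG_RATINGS_PER_USER = 700
LATENT_FACTORS = 50
BYTES_PER_VALUE = 8  # assumption: factors stored as 64-bit floats


def sizing():
    """Return the derived quantities as a dictionary."""
    ratings = USERS * AVG_RATINGS_PER_USER
    # Fraction of the users x items matrix that actually holds a rating.
    density = ratings / (USERS * ITEMS)
    # ALS keeps one dense factor vector per user and one per item.
    user_factor_bytes = USERS * LATENT_FACTORS * BYTES_PER_VALUE
    item_factor_bytes = ITEMS * LATENT_FACTORS * BYTES_PER_VALUE
    return {
        "ratings": ratings,
        "density": density,
        "user_factor_bytes": user_factor_bytes,
        "item_factor_bytes": item_factor_bytes,
    }


if __name__ == "__main__":
    for name, value in sizing().items():
        print(name, value)
```

Under these assumptions the rating count matches the 28 billion on the slide, and the user factor matrix alone is about 16 GB, which suggests why the factorization is run as a distributed job.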


Graph API “gelly”

• Distributed Graph Processing API for Flink

• Graph Creation: create, fromCollection

• Graph Properties and Metrics: getVertices, getEdges, getVertexIds, getEdgeIds, numberOfVertices, numberOfEdges, getDegrees, inDegrees, outDegrees, isWeaklyConnected

• Graph Mutations: addVertex, addEdge, removeVertex, removeEdge

• Graph Transformations: mapVertices, mapEdges, union, filterOnVertices, filterOnEdges, subgraph, reverse

• And more...
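To give a feel for what a few of the operations listed above do, here is a minimal single-machine sketch in plain Python. It is not the Gelly API and does not use Flink. The class `TinyGraph` and its snake_case method names are hypothetical stand-ins, and the semantics shown (for example, dropping edges whose endpoints were filtered out) are assumptions for illustration.

```python
from collections import Counter


class TinyGraph:
    """Hypothetical in-memory graph; an illustration, not the Gelly API."""

    def __init__(self, vertices, edges):
        self.vertices = dict(vertices)  # vertex id -> vertex value
        self.edges = list(edges)        # (source id, target id, edge value)

    def number_of_vertices(self):
        return len(self.vertices)

    def number_of_edges(self):
        return len(self.edges)

    def out_degrees(self):
        counts = Counter(src for src, _, _ in self.edges)
        return {v: counts.get(v, 0) for v in self.vertices}

    def in_degrees(self):
        counts = Counter(dst for _, dst, _ in self.edges)
        return {v: counts.get(v, 0) for v in self.vertices}

    def map_vertices(self, fn):
        # Apply fn to every vertex value; the edges are unchanged.
        mapped = {v: fn(val) for v, val in self.vertices.items()}
        return TinyGraph(mapped, self.edges)

    def filter_on_vertices(self, predicate):
        # Keep matching vertices and only edges with both endpoints kept.
        kept = {v: val for v, val in self.vertices.items() if predicate(v, val)}
        edges = [(s, d, w) for s, d, w in self.edges if s in kept and d in kept]
        return TinyGraph(kept, edges)

    def reverse(self):
        # Flip the direction of every edge.
        return TinyGraph(self.vertices, [(d, s, w) for s, d, w in self.edges])


graph = TinyGraph(
    {1: "a", 2: "b", 3: "c"},
    [(1, 2, 0.5), (2, 3, 1.0), (1, 3, 2.0)],
)
```

Each transformation returns a new graph rather than modifying the original, so calls can be chained, for example `graph.filter_on_vertices(lambda v, val: v != 3).reverse()`.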


Upcoming 0.9 release

• Many major changes:

– Intermediate datasets and backtracking (improves fault tolerance, allows new paradigms)

– Distributed coordination with Akka

– Reworked YARN client (‘per job’ YARN clusters)

– Semantic Annotations

– Graph API


Flink Roadmap

• Currently being discussed by the Flink community

• Flink has a major release every 3 months, and one or more bug-fixing releases between major releases

• Caveat: rough roadmap, depends on volunteer work, outcome of community discussion, and Apache open source processes


Roadmap for 2015 (highlights)

• APIs: Logical Query integration; Additional operators; Interactive programs; Interactive Scala shell; SQL-on-Flink

• Optimizer: Semantic annotations; HCatalog integration; Optimizer hints

• Runtime: Dual engine (blocking & pipelining); Fine-grained fault tolerance; Dynamic memory allocation

• Streaming: Better memory management; More operators in API; At-least-once processing guarantees; Unify batch and streaming; Exactly-once processing guarantees

• ML library: First version; Additional algorithms; Mahout integration

• Graph library: First version

• Integration: Tez, Samoa; Mahout
