Apache Flink - Community Update January 2015

Post on 16-Jul-2015

Apache Flink

Flink Community Update

January 2015

Robert Metzger

rmetzger@apache.org

Flink Community Updates

• What happened in the Flink community?

Check out the monthly newsletter

Subscribe to news@flink.apache.org

• This Talk

– Graduation

– Release

– Graph API contribution

– and more…

flink.apache.org

Graduation to Top Level Project

Official press release:

https://blogs.apache.org/foundation/entry/the_apache_software_foundation_announces69

• Officially acknowledged as a healthy, vibrant and growing open-source community, following the Apache Way.

• Attracted many new contributors and committers during incubation.

• More than 75 contributors now.

Apache Flink in the News

Release 0.8.0

• Lots of stability improvements

• Scala Streaming API

• Improved streaming windowing semantics

• Kryo as the new fallback serializer

• Extended FileSystem support (Hadoop compat)

• And more... check it out!

In the news: Large-scale Matrix Factorization with Apache Flink

Read more: http://data-artisans.com/computing-recommendations-with-flink.html

• Using the Alternating Least Squares (ALS) algorithm on top of Flink

• Maximum matrix size:

– 40 million users

– 5 million items

– average of 700 ratings per user, 28 billion ratings in total

• We ran all experiments with 50 latent factors, for 10 iterations.
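
To make the scale figures and the ALS idea on this slide concrete, here is a minimal sketch in plain Python. The first part only redoes the arithmetic from the numbers quoted above. The second part is a toy, single-machine ALS with one latent factor (the experiments used 50), an assumed regularization weight `lam`, and made-up ratings; the names `objective`, `als_rank1` and `toy_ratings` are hypothetical and it is not the Flink implementation described in the linked post.

```python
# Scale figures quoted on the slide.
users = 40_000_000
items = 5_000_000
avg_ratings_per_user = 700
latent_factors = 50

total_ratings = users * avg_ratings_per_user   # 28 billion, matching the slide
density = total_ratings / (users * items)      # fraction of matrix cells that hold a rating
user_factor_entries = users * latent_factors   # size of the user factor matrix
item_factor_entries = items * latent_factors   # size of the item factor matrix


def objective(ratings, u, v, lam):
    """Regularized squared error over the observed ratings only."""
    err = sum((r - u[i] * v[j]) ** 2 for (i, j), r in ratings.items())
    reg = lam * (sum(x * x for x in u) + sum(x * x for x in v))
    return err + reg


def als_rank1(ratings, n_users, n_items, lam=0.1, iterations=10):
    """Toy ALS with a single latent factor; lam is an assumed regularization weight."""
    u = [1.0] * n_users
    v = [1.0] * n_items
    for _ in range(iterations):
        # Fix the item factors and solve each user's 1-D least-squares problem exactly.
        for i in range(n_users):
            obs = [(j, r) for (ui, j), r in ratings.items() if ui == i]
            u[i] = sum(r * v[j] for j, r in obs) / (lam + sum(v[j] ** 2 for j, _ in obs))
        # Fix the user factors and solve each item's problem the same way.
        for j in range(n_items):
            obs = [(i, r) for (i, ij), r in ratings.items() if ij == j]
            v[j] = sum(r * u[i] for i, r in obs) / (lam + sum(u[i] ** 2 for i, _ in obs))
    return u, v


# Made-up ratings, keyed by (user, item).
toy_ratings = {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0, (2, 1): 1.0}
u, v = als_rank1(toy_ratings, n_users=3, n_items=2)
```

The alternation is the whole trick: with one side fixed, each user (or item) update is an independent least-squares problem, which is what makes the algorithm easy to distribute.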

Graph API “Gelly”

• Distributed Graph Processing API for Flink

• Graph Creation: create, fromCollection

• Graph Properties and Metrics: getVertices, getEdges, getVertexIds, getEdgeIds, numberOfVertices, numberOfEdges, getDegrees, inDegrees, outDegrees, isWeaklyConnected

• Graph Mutations: addVertex, addEdge, removeVertex, removeEdge

• Graph Transformations: mapVertices, mapEdges, union, filterOnVertices, filterOnEdges, subgraph, reverse

• And more...
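
As a rough illustration of what a few of the operation names listed above mean, the sketch below uses a small in-memory graph in plain Python. It is not the Gelly API, which is a Java API on top of Flink's distributed data sets; the class `ToyGraph` and its snake_case method names are hypothetical stand-ins chosen to mirror numberOfVertices, numberOfEdges, outDegrees, mapVertices, filterOnEdges and reverse.

```python
class ToyGraph:
    """In-memory stand-in for illustration only; not the Gelly API."""

    def __init__(self, vertices, edges):
        self.vertices = dict(vertices)   # vertex id -> vertex value
        self.edges = list(edges)         # (source, target, edge value) triples

    def number_of_vertices(self):
        return len(self.vertices)

    def number_of_edges(self):
        return len(self.edges)

    def out_degrees(self):
        # Count outgoing edges per vertex; vertices without any get 0.
        degrees = {vid: 0 for vid in self.vertices}
        for src, _, _ in self.edges:
            degrees[src] += 1
        return degrees

    def map_vertices(self, fn):
        # New graph with fn applied to every vertex value; edges unchanged.
        return ToyGraph({vid: fn(val) for vid, val in self.vertices.items()}, self.edges)

    def filter_on_edges(self, predicate):
        # New graph keeping only the edges for which predicate(edge) is true.
        return ToyGraph(self.vertices, [e for e in self.edges if predicate(e)])

    def reverse(self):
        # New graph with the direction of every edge flipped.
        return ToyGraph(self.vertices, [(dst, src, val) for src, dst, val in self.edges])


g = ToyGraph({1: "a", 2: "b", 3: "c"}, [(1, 2, 0.5), (1, 3, 2.0), (2, 3, 1.0)])
heavy = g.filter_on_edges(lambda e: e[2] >= 1.0)
```

Each transformation returns a new graph rather than mutating the old one, which is an assumption made here to match the way transformations on immutable data sets are usually chained.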

Upcoming 0.9 release

• Many major changes:

– Intermediate datasets and backtracking (improves fault tolerance, allows new paradigms)

– Distributed coordination with Akka

– Reworked YARN client (‘per job’ YARN clusters)

– Semantic Annotations

– Graph API

Flink Roadmap

• Currently being discussed by the Flink community

• Flink has a major release every 3 months, and one or more bug-fixing releases between major releases

• Caveat: rough roadmap, depends on volunteer work, outcome of community discussion, and Apache open source processes

Roadmap for 2015 (highlights)

• APIs: Logical Query integration, Additional operators, Interactive programs, Interactive Scala shell, SQL-on-Flink

• Optimizer: Semantic annotations, HCatalog integration, Optimizer hints

• Runtime: Dual engine (blocking & pipelining), Fine-grained fault tolerance, Dynamic memory allocation

• Streaming: Better memory management, More operators in API, At-least-once processing guarantees, Unify batch and streaming, Exactly-once processing guarantees

• ML library: First version, Additional algorithms, Mahout integration

• Graph library: First version

• Integration: Tez, Samoa, Mahout
