Apache Flink
Flink Community Update
January 2015
Robert Metzger
Flink Community Updates
• What happened in the Flink community?
– Check out the monthly newsletter
– Subscribe to [email protected]
• This Talk
– Graduation
– Release
– Graph API contribution
– and more…
flink.apache.org
Graduation to Top Level Project
Official press release:
https://blogs.apache.org/foundation/entry/the_apache_software_foundation_announces69
• Officially acknowledged as a healthy, vibrant and growing open-source community that follows the Apache Way
• Attracted many new contributors and committers during incubation
• More than 75 contributors now
Apache Flink in the News
Release 0.8.0
• Lots of stability improvements
• Scala Streaming API
• Improved streaming windowing semantics
• Kryo as the new fallback serializer
• Extended FileSystem support (Hadoop compatibility)
• And more... check it out!
In the news: Large-scale Matrix Factorization with Apache Flink
Read more: http://data-artisans.com/computing-recommendations-with-flink.html
• Uses the Alternating Least Squares (ALS) algorithm on top of Flink
• Maximum matrix size:
– 40 million users
– 5 million items
– an average of 700 ratings per user
– 28 billion ratings in total (40 million × 700)
• All experiments were run with 50 latent factors for 10 iterations.
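ALS factors the user-item rating matrix into a user-factor matrix and an item-factor matrix by alternating two least-squares steps: with the item factors fixed, each user's factors are solved independently, and then the roles are swapped. As a small, single-machine illustration (not the data Artisans Flink implementation), the pure-Python sketch below uses only one latent factor so that each least-squares step has a closed form; the function names `als_rank1` and `rmse` and the `reg` regularization parameter are hypothetical choices for this sketch.

```python
def als_rank1(ratings, num_users, num_items, iterations=10, reg=0.0):
    """Rank-1 alternating least squares on a sparse ratings dict.

    Hypothetical illustration only. ratings maps (user, item) -> rating.
    With a single latent factor, each least-squares subproblem is closed-form:
        u_i = sum_j r_ij * v_j / (sum_j v_j**2 + reg)
    summed over the items j rated by user i (and symmetrically for items).
    """
    u = [1.0] * num_users
    v = [1.0] * num_items
    # Group the observed ratings by user and by item.
    by_user = [[] for _ in range(num_users)]
    by_item = [[] for _ in range(num_items)]
    for (i, j), r in ratings.items():
        by_user[i].append((j, r))
        by_item[j].append((i, r))
    for _ in range(iterations):
        # Step 1: fix item factors, solve for each user factor.
        for i, row in enumerate(by_user):
            den = sum(v[j] ** 2 for j, _ in row) + reg
            if den > 0:
                u[i] = sum(r * v[j] for j, r in row) / den
        # Step 2: fix user factors, solve for each item factor.
        for j, col in enumerate(by_item):
            den = sum(u[i] ** 2 for i, _ in col) + reg
            if den > 0:
                v[j] = sum(r * u[i] for i, r in col) / den
    return u, v


def rmse(ratings, u, v):
    """Root-mean-square error of the prediction u_i * v_j on observed ratings."""
    se = sum((r - u[i] * v[j]) ** 2 for (i, j), r in ratings.items())
    return (se / len(ratings)) ** 0.5


# Toy input: a fully observed rank-1 matrix r_ij = a_i * b_j.
a = [1.0, 2.0, 3.0]
b = [2.0, 1.0, 4.0, 0.5]
ratings = {(i, j): a[i] * b[j] for i in range(3) for j in range(4)}
u, v = als_rank1(ratings, num_users=3, num_items=4, iterations=5)
print(rmse(ratings, u, v))  # effectively zero for this exact rank-1 input
```

With more latent factors (50 in the experiments above), each step instead solves a small k×k linear system per user or item rather than a single division, which is what makes the per-user and per-item updates easy to parallelize.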
Graph API “gelly”
• Distributed Graph Processing API for Flink
• Graph Creation: create, fromCollection
• Graph Properties and Metrics: getVertices, getEdges, getVertexIds, getEdgeIds, numberOfVertices, numberOfEdges, getDegrees, inDegrees, outDegrees, isWeaklyConnected
• Graph Mutations: addVertex, addEdge, removeVertex, removeEdge
• Graph Transformations: mapVertices, mapEdges, union, filterOnVertices, filterOnEdges, subgraph, reverse
And more...
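The operations listed above belong to Gelly's Java API and run as distributed Flink programs. To illustrate what a few of them compute, the local, in-memory Python sketch below models a directed graph as a vertex set plus an edge list. It is not Gelly's API: the `Graph` class and its method names are hypothetical stand-ins for getDegrees-style metrics, reverse, and subgraph.

```python
from collections import Counter


class Graph:
    """Toy directed graph: a set of vertex ids plus a list of (src, dst) edges.

    Hypothetical stand-in used only to illustrate the semantics of a few
    operations named on the slide; this is not Gelly's API.
    """

    def __init__(self, vertices, edges):
        self.vertices = set(vertices)
        self.edges = list(edges)

    def number_of_vertices(self):
        return len(self.vertices)

    def number_of_edges(self):
        return len(self.edges)

    def in_degrees(self):
        # Count incoming edges per vertex; vertices without any get 0.
        counts = Counter(dst for _, dst in self.edges)
        return {v: counts[v] for v in self.vertices}

    def out_degrees(self):
        # Count outgoing edges per vertex; vertices without any get 0.
        counts = Counter(src for src, _ in self.edges)
        return {v: counts[v] for v in self.vertices}

    def reverse(self):
        # Flip the direction of every edge; the vertex set is unchanged.
        return Graph(self.vertices, [(dst, src) for src, dst in self.edges])

    def subgraph(self, vertex_filter, edge_filter):
        # Keep vertices that pass vertex_filter, then keep only edges that
        # pass edge_filter and whose endpoints both survived.
        kept = {v for v in self.vertices if vertex_filter(v)}
        kept_edges = [
            (s, d) for s, d in self.edges
            if s in kept and d in kept and edge_filter((s, d))
        ]
        return Graph(kept, kept_edges)


g = Graph([1, 2, 3, 4], [(1, 2), (1, 3), (2, 3), (3, 4)])
print(sorted(g.in_degrees().items()))  # [(1, 0), (2, 1), (3, 2), (4, 1)]
sub = g.subgraph(lambda v: v != 4, lambda e: True)
print(sub.number_of_edges())  # 3
```

Reversing a graph swaps its in-degrees and out-degrees, which is a quick sanity check on both operations.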
Upcoming 0.9 release
• Many major changes:
– Intermediate datasets and backtracking (improves fault tolerance, allows new paradigms)
– Distributed coordination with Akka
– Reworked YARN client (‘per job’ YARN clusters)
– Semantic annotations
– Graph API
Flink Roadmap
• Currently being discussed by the Flink community
• Flink has a major release every 3 months, and one or more bug-fixing releases between major releases
• Caveat: rough roadmap, depends on volunteer work, outcome of community discussion, and Apache open source processes
Roadmap for 2015 (highlights)
• APIs: logical query integration, additional operators, interactive programs, interactive Scala shell, SQL-on-Flink
• Optimizer: semantic annotations, HCatalog integration, optimizer hints
• Runtime: dual engine (blocking & pipelining), fine-grained fault tolerance, dynamic memory allocation
• Streaming: better memory management, more operators in API, at-least-once processing guarantees, unify batch and streaming, exactly-once processing guarantees
• ML library: first version, additional algorithms, Mahout integration
• Graph library: first version
• Integration: Tez, Samoa, Mahout