Stratosphere: Next Generation Big Data Analytics Platform from Europe
Márton Balassi
Data Mining and Search Group¹
Big Data Business Intelligence Group¹
¹Computer and Automation Research Institute of the Hungarian Academy of Sciences
May 11, 2014
Table of Contents
Motivation
The 10 commandments for Big Data Analytics
Project info

Motivation
The Big Data scene
What’s all the hype for?
• Data acquisition is cheap (Eva Andreasson, Cloudera, 2014)
• Data storage is cheap (Matthew Komorowski, 2014)
• Data Science is the Sexiest Job of the 21st century (Harvard Business Review, 2012)
• It’s a piece of cake . . . Or is it?
[Figure: The Big Data scene. Image courtesy of Matt Turck and Shivon Zilis]

The 10 commandments for Big Data Analytics
Stratosphere in one slide
A European project just accepted to the Apache Incubator
• Declarative programming (native language bindings)
• Schema-on-read (HDFS, databases, . . .)
• Rich programming model (beyond MapReduce)
• User-defined functions as first-class citizens
• Automated parallelization and optimization
• Efficient and scalable execution engine
• Intensive development
Contributors

1. Thou shalt use declarative programming
K-Means Clustering in Stratosphere’s Scala front-end
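The slide's K-Means code is not preserved in this transcript. As a rough stand-in, the sketch below expresses K-Means declaratively with the plain Scala standard library: points are assigned to centroids with `groupBy`, and new centroids are computed with `map` and `reduce`. It is not Stratosphere's Scala front-end API; `KMeansSketch`, `Point`, `nearest`, `step`, and `run` are hypothetical names, and the two-dimensional points and fixed iteration count are assumptions.

```scala
// Declarative-style K-Means sketch on plain Scala collections.
// NOT Stratosphere's API: it only shows the algorithm written as
// transformations (groupBy, map, reduce) over data sets.
object KMeansSketch {
  final case class Point(x: Double, y: Double) {
    def +(o: Point): Point = Point(x + o.x, y + o.y)
    def /(d: Double): Point = Point(x / d, y / d)
    def distSq(o: Point): Double = (x - o.x) * (x - o.x) + (y - o.y) * (y - o.y)
  }

  // Index of the centroid closest to p.
  def nearest(p: Point, centroids: IndexedSeq[Point]): Int =
    centroids.indices.minBy(i => p.distSq(centroids(i)))

  // One step: assign every point to its nearest centroid, then average each cluster.
  def step(points: Seq[Point], centroids: IndexedSeq[Point]): IndexedSeq[Point] = {
    val clusters: Map[Int, Seq[Point]] = points.groupBy(p => nearest(p, centroids))
    centroids.indices.map { i =>
      clusters.get(i) match {
        case Some(ps) => ps.reduce(_ + _) / ps.size
        case None     => centroids(i) // an empty cluster keeps its old centroid
      }
    }
  }

  // Run a fixed number of steps starting from the initial centroids.
  def run(points: Seq[Point], init: IndexedSeq[Point], iterations: Int): IndexedSeq[Point] =
    (1 to iterations).foldLeft(init)((cs, _) => step(points, cs))
}
```

The program states what each step computes, not how to distribute it; on a local collection it simply runs sequentially.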

2. Thou shalt accept external (dynamic) sources
“In situ” data – no load
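“In situ” data means reading sources where they live instead of first loading them into a system-specific store. Below is a minimal schema-on-read sketch, assuming comma-separated text records. The raw lines stay untouched, a hypothetical `Visit` schema is applied only while reading, and malformed lines are skipped. The names (`SchemaOnReadSketch`, `Visit`, `parse`, `read`) and the record layout are illustrative assumptions. Scala 2.13+ is assumed for `toLongOption`/`toIntOption`.

```scala
// Schema-on-read sketch: no load step, the schema is applied while reading.
// The Visit layout (userId,url,durationSec) is a hypothetical example.
object SchemaOnReadSketch {
  final case class Visit(userId: Long, url: String, durationSec: Int)

  // Parse one comma-separated line; malformed input yields None instead of failing.
  def parse(line: String): Option[Visit] =
    line.split(",", -1).map(_.trim) match {
      case Array(u, url, d) =>
        for {
          userId   <- u.toLongOption
          duration <- d.toIntOption
        } yield Visit(userId, url, duration)
      case _ => None
    }

  // Any iterator of raw lines (file, HDFS reader, database cursor) can serve as the source.
  def read(lines: Iterator[String]): Iterator[Visit] = lines.flatMap(parse)
}
```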

3. Thou shalt use rich primitives
Beyond MapReduce
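“Beyond MapReduce” refers to operators richer than Map and Reduce. The sketch below illustrates the semantics of two common binary operators, an equi-join and a coGroup, on local Scala collections. The helper names and signatures are hypothetical; they are not Stratosphere's operator implementations.

```scala
// Semantics of two binary operators beyond Map and Reduce, on local collections.
// Illustrative helpers only, not Stratosphere's implementations.
object RichPrimitivesSketch {
  // Equi-join: pair every left and right element that share a key.
  def join[K, A, B](left: Seq[A], right: Seq[B])(lk: A => K, rk: B => K): Seq[(A, B)] = {
    val rightByKey = right.groupBy(rk)
    for {
      a <- left
      b <- rightByKey.getOrElse(lk(a), Seq.empty)
    } yield (a, b)
  }

  // CoGroup: for every key present in either input, return both groups together.
  def coGroup[K, A, B](left: Seq[A], right: Seq[B])(lk: A => K, rk: B => K): Map[K, (Seq[A], Seq[B])] = {
    val l = left.groupBy(lk)
    val r = right.groupBy(rk)
    (l.keySet ++ r.keySet).iterator
      .map(k => (k, (l.getOrElse(k, Seq.empty), r.getOrElse(k, Seq.empty))))
      .toMap
  }
}
```

With only Map and Reduce, a join is usually emulated by tagging records from both inputs and separating them again inside the reducer; a dedicated operator makes that intent explicit to the system.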

4. Thou shalt deeply embed UDFs
Flexible and transparent
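One way to picture user-defined functions as first-class citizens: the system provides generic operators, and the user passes in ordinary Scala functions, including closures over local values. The word-count sketch below runs on plain collections; `flatMapOp`, `reduceByKeyOp`, `tokenize`, `sum`, and the stop-word list are hypothetical names, not a real Stratosphere API.

```scala
// Generic operators take user code as ordinary function values.
// Hypothetical names; not Stratosphere's API.
object UdfSketch {
  // System-provided, second-order operators.
  def flatMapOp[A, B](data: Seq[A])(udf: A => IterableOnce[B]): Seq[B] = data.flatMap(udf)
  def reduceByKeyOp[K, V](data: Seq[(K, V)])(udf: (V, V) => V): Map[K, V] =
    data.groupMapReduce(_._1)(_._2)(udf)

  // User-defined functions: plain Scala values, here closing over a local set.
  val stopWords: Set[String] = Set("the", "a")
  val tokenize: String => Seq[(String, Int)] =
    line => line.toLowerCase.split("\\W+").toSeq
      .filter(w => w.nonEmpty && !stopWords(w))
      .map(w => (w, 1))
  val sum: (Int, Int) => Int = _ + _

  def wordCount(lines: Seq[String]): Map[String, Int] =
    reduceByKeyOp(flatMapOp(lines)(tokenize))(sum)
}
```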

5. Thou shalt optimize
Auto-parallelization and optimization as in relational databases
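Relational optimizers pick physical strategies from cost estimates. As a deliberately simplified illustration (an assumption, not Stratosphere's actual cost model), the sketch chooses between broadcasting one join input to every worker and hash-repartitioning both inputs. It compares only estimated network bytes.

```scala
// Toy cost-based choice of a parallel join strategy.
// The cost formulas are simplified assumptions (network bytes only).
object JoinStrategySketch {
  sealed trait Strategy
  case object BroadcastLeft extends Strategy
  case object BroadcastRight extends Strategy
  case object Repartition extends Strategy

  // Estimated bytes shipped over the network.
  def cost(s: Strategy, leftBytes: Long, rightBytes: Long, workers: Int): Long = s match {
    case BroadcastLeft  => leftBytes * workers  // copy the left input to every worker
    case BroadcastRight => rightBytes * workers // copy the right input to every worker
    case Repartition    => leftBytes + rightBytes // hash-partition both inputs once
  }

  // Pick the cheapest strategy under the estimate above.
  def choose(leftBytes: Long, rightBytes: Long, workers: Int): Strategy =
    Seq(BroadcastLeft, BroadcastRight, Repartition).minBy(cost(_, leftBytes, rightBytes, workers))
}
```

A small input joined with a large one favors broadcasting the small side; two large inputs favor repartitioning. The user program stays the same either way, which is the point of leaving this choice to the system.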

6. Thou shalt iterate
Needed for most interesting analysis cases
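Many analyses, such as clustering or graph algorithms, repeat a step until a fixed point is reached. The sketch computes connected components by propagating the minimum vertex label. In each round it revisits only vertices whose label changed (a workset), so later rounds touch less data. It is a plain-Scala illustration, not Stratosphere's iteration operators, and it assumes every edge endpoint is listed in `vertices`.

```scala
// Iterative connected components via minimum-label propagation with a workset.
// Plain-Scala illustration; assumes all edge endpoints appear in `vertices`.
object ConnectedComponentsSketch {
  def components(vertices: Set[Int], edges: Seq[(Int, Int)]): Map[Int, Int] = {
    // Undirected adjacency lists.
    val neighbours: Map[Int, Seq[Int]] =
      (edges ++ edges.map(_.swap)).groupMap(_._1)(_._2)

    @annotation.tailrec
    def loop(labels: Map[Int, Int], workset: Set[Int]): Map[Int, Int] =
      if (workset.isEmpty) labels // fixed point: no label changed in the last round
      else {
        // Changed vertices send their label to their neighbours; keep the minimum per target.
        val candidates: Map[Int, Int] =
          workset.toSeq
            .flatMap(v => neighbours.getOrElse(v, Seq.empty).map(n => (n, labels(v))))
            .groupMapReduce(_._1)(_._2)(math.min)
        // Only strict improvements are applied; they form the next workset.
        val updates = candidates.filter { case (v, l) => l < labels(v) }
        loop(labels ++ updates, updates.keySet)
      }

    // Start with every vertex in its own component.
    loop(vertices.map(v => v -> v).toMap, vertices)
  }
}
```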

7. Thou shalt use a scalable and efficient execution engine
Reliable and robust infrastructure

8. Thou shalt tackle streaming
Integration of low latency jobs
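Low-latency jobs process records as they arrive rather than waiting for a complete batch. The sketch keeps per-key counts for the current tumbling time window and emits them as soon as an event from a later window arrives. The event type, the window semantics, and in-order, non-negative timestamps are assumptions for illustration; `TumblingCounter` is a hypothetical name.

```scala
// Tumbling-window counting sketch: one event at a time, results emitted
// when a window closes. Assumes in-order, non-negative timestamps.
object TumblingWindowSketch {
  final case class Event(timeMs: Long, key: String)
  final case class WindowCount(windowStart: Long, key: String, count: Int)

  final class TumblingCounter(windowMs: Long) {
    private var openStart: Option[Long] = None
    // Insertion-ordered so emitted results have a deterministic order.
    private val counts = scala.collection.mutable.LinkedHashMap.empty[String, Int]

    // Snapshot the open window's counts, then reset them.
    private def close(): Seq[WindowCount] = {
      val out = openStart.toSeq.flatMap(s => counts.toList.map { case (k, c) => WindowCount(s, k, c) })
      counts.clear()
      out
    }

    // Handle one event; returns the results of any window this event closed.
    def onEvent(e: Event): Seq[WindowCount] = {
      val start = e.timeMs - e.timeMs % windowMs
      val emitted = if (openStart.exists(_ != start)) close() else Seq.empty
      openStart = Some(start)
      counts.update(e.key, counts.getOrElse(e.key, 0) + 1)
      emitted
    }

    // Emit the last open window, e.g. at the end of input.
    def flush(): Seq[WindowCount] = { val out = close(); openStart = None; out }
  }
}
```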

9. Thou shalt provide a common API through the whole framework
Batch? BSP? Streaming? You just write the same code. . .
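One way to read “you just write the same code”: if the user logic depends only on an abstract sequence of records, it can run unchanged over a bounded batch or an unbounded stream. The sketch uses Scala's lazy `Iterator` as that abstraction. `alerts`, the `sensor,value` record format, and the threshold of 50 are hypothetical.

```scala
// The same user logic over a bounded batch and an unbounded stream.
// Hypothetical record format "sensor,value"; plain Scala iterators.
object SameCodeSketch {
  // User logic: parse records and keep readings above the threshold.
  def alerts(records: Iterator[String], threshold: Double): Iterator[(String, Double)] =
    records
      .map(_.split(","))
      .collect { case Array(id, v) if v.toDoubleOption.exists(_ > threshold) => (id, v.toDouble) }

  // Batch: a finite data set, fully materialized at the end.
  def runBatch(data: Seq[String]): List[(String, Double)] =
    alerts(data.iterator, 50.0).toList

  // Streaming: an unbounded source; results are pulled lazily, one at a time.
  def firstStreamingAlerts(n: Int): List[(String, Double)] = {
    val source = Iterator.from(0).map(i => s"s${i % 3},${i * 10}")
    alerts(source, 50.0).take(n).toList
  }
}
```

This works for stateless, per-record logic; stateful operations such as aggregations need windowing or similar extra semantics on the streaming side.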

10. Thou shalt support the lambda architecture
Combine the reliability of batch and the speed of streaming to enable real-time queries on large datasets
[Figure: input timeline with the labels “First hour of input”, . . ., “1 to 2 hours old input”, “Less than an hour old input” and “Output”; jobs Batch₁ . . . Batchₙ₋₁ and Streaming₁ . . . Streamingₙ₋₁, Streamingₙ]
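The lambda architecture pairs a batch layer, which periodically recomputes views from all historical data, with a speed layer that incrementally covers only the most recent input, and merges both at query time. Below is a minimal sketch with per-key event counts. The `Event` type, the cutoff handling, and the function names are assumptions for illustration.

```scala
// Lambda-architecture sketch: batch view over old data, speed view over
// recent data, merged at query time. Names and Event shape are hypothetical.
object LambdaSketch {
  final case class Event(timeMs: Long, key: String)

  // Batch layer: full recomputation over all events older than the cutoff.
  def batchView(events: Seq[Event], cutoffMs: Long): Map[String, Long] =
    events.filter(_.timeMs < cutoffMs).groupMapReduce(_.key)(_ => 1L)(_ + _)

  // Speed layer: cheap incremental update for each recent event.
  def updateSpeedView(view: Map[String, Long], e: Event): Map[String, Long] =
    view.updated(e.key, view.getOrElse(e.key, 0L) + 1L)

  // Serving layer: answer a query by combining both views.
  def query(batch: Map[String, Long], speed: Map[String, Long], key: String): Long =
    batch.getOrElse(key, 0L) + speed.getOrElse(key, 0L)
}
```

When the next batch run moves the cutoff forward, the events it now covers would be dropped from the speed view so they are not counted twice.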

Project info
Where to look for us
Project homepage
The project can be found at stratosphere.eu. The homepage served as a source for the code and most of the pictures presented on these slides.
Data Mining and Search & Big Data BI Groups
The webpages of the Budapest team members’ research groups can be found at dms.sztaki.hu and at bigdatabi.sztaki.hu.
Márton Balassi
[email protected]