Big Data – Is It a Database Problem?
Volker Markl, http://www.user.tu-berlin.de/marklv
Another “V”: Value. Big Data in the Cloud: The Information Economy
A major new trend in information processing will be the trading of original and enriched data, effectively creating an information economy.
[Figure: Cloud-computing stack, the market situation. Value increases toward the top of the stack.]
Information as a Service / Data as a Service: MIA, Datamarket, Azure IMR, … (analysts, corporations)
SaaS: Salesforce.com, Office 2010 WebApps (end users)
PaaS: Microsoft Azure, Google App Engine (application developers)
IaaS: Amazon Elastic Compute Cloud (system administrators)
“When hardware became commoditized, software was valuable. Now that software is being commoditized, data is valuable.” (Tim O’Reilly)
“The important question isn’t who owns the data. Ultimately, we all do. A better question is, who owns the means of analysis?” (A. Croll, Mashable, 2011)
Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges
Information Marketplaces: Enabling SMEs to capitalize on Big Data
[Figure: an information marketplace over the German Web. A marketplace layer with trust sits on Infrastructure as a Service (distributed data storage, massively parallel infrastructure, index). Participants: users, data providers, algorithms. Flows: data & aggregation, queries, analytical results, revenue sharing, technology licensing. Example applications: social media monitoring, media publisher services, SEO.]
http://www.mia-marktplatz.de
http://www.dopa-project.eu
If I had a hammer! Running in Circles?
[Figure: SQL, NoSQL, MapReduce, and SQL-- arranged in a cycle labeled “Running in circles”]
Re-implement some code paths of parallel databases, or run restricted, non-compatible SQL on Hadoop?
What is Wrong with this Picture?
[Figure annotations: “scripting”, “wrong platform?”, “XQuery?”, “SQL--”, “column store++”, “a query plan”, “scalable parallel sort”]
7 Commandments for Big Data Analytics
1: Thou shalt use declarative languages
All-pairs shortest paths using recursive doubling in Stratosphere’s Scala front-end
Avoid impedance mismatch!
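Recursive doubling computes all-pairs shortest paths by repeatedly joining the current path relation with itself on the shared middle vertex and keeping the minimum distance per vertex pair, so each round can double the number of edges a known path may contain. A minimal Python sketch of that idea, assuming non-negative edge weights; `apsp_recursive_doubling` and the dict-of-edges input are hypothetical and do not reflect Stratosphere's Scala API:

```python
from collections import defaultdict

def apsp_recursive_doubling(edges):
    """All-pairs shortest paths by recursive doubling (hypothetical helper).

    edges: dict mapping (src, dst) -> non-negative weight.
    Returns a dict mapping (src, dst) -> shortest distance for every
    pair connected by a path.
    """
    dist = dict(edges)
    while True:
        # Index the current path relation by start vertex for the self-join.
        by_src = defaultdict(list)
        for (s, d), w in dist.items():
            by_src[s].append((d, w))
        new = dict(dist)
        # Join path (a -> b) with path (b -> c); keep the minimum per (a, c).
        for (a, b), w1 in dist.items():
            for c, w2 in by_src[b]:
                if a == c:
                    continue  # ignore cycles back to the start vertex
                if w1 + w2 < new.get((a, c), float("inf")):
                    new[(a, c)] = w1 + w2
        if new == dist:  # fixed point: no pair improved in this round
            return dist
        dist = new

edges = {("a", "b"): 1, ("b", "c"): 2, ("c", "d"): 3, ("a", "d"): 10}
print(apsp_recursive_doubling(edges)[("a", "d")])  # 6, via a -> b -> c -> d
```

Each round composes the path relation with itself, so after r rounds all shortest paths of up to 2**r edges are known; a shortest path of k edges therefore needs about log2(k) rounds, plus one round that confirms nothing changed.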
2: Thou shalt accept external (dynamic) sources
“In situ” data: no load
3: Thou shalt use rich primitives
Beyond MapReduce
Map
Reduce
Cross
Match
CoGroup
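Each of the rich primitives Map, Reduce, Cross, Match, and CoGroup is a second-order function: it takes a user-defined function (UDF) and fixes how input records are grouped or paired before the UDF is called. The single-process Python sketch below mimics those semantics in the spirit of Stratosphere's PACT model; the `pact_*` names and the (key, value) record layout are hypothetical, and partitioning, parallelism, and optimization are not modeled at all.

```python
from collections import defaultdict
from itertools import product

# Hypothetical single-process sketch of five second-order functions.
# Records are (key, value) tuples; every UDF returns an iterable, so it
# may emit zero, one, or many output records per call.

def _group(data):
    """Group records by their key (first tuple field)."""
    groups = defaultdict(list)
    for rec in data:
        groups[rec[0]].append(rec)
    return groups

def pact_map(udf, data):
    """Map: call the UDF on each record independently."""
    return [out for rec in data for out in udf(rec)]

def pact_reduce(udf, data):
    """Reduce: call the UDF once per key with all records sharing that key."""
    return [out for key, grp in _group(data).items() for out in udf(key, grp)]

def pact_cross(udf, left, right):
    """Cross: call the UDF on every pair of the Cartesian product."""
    return [out for a, b in product(left, right) for out in udf(a, b)]

def pact_match(udf, left, right):
    """Match: call the UDF on every pair of records with equal keys (equi-join)."""
    index = _group(right)
    return [out for a in left for b in index.get(a[0], []) for out in udf(a, b)]

def pact_cogroup(udf, left, right):
    """CoGroup: call the UDF once per key with both groups (either may be empty)."""
    lg, rg = _group(left), _group(right)
    keys = list(lg) + [k for k in rg if k not in lg]
    return [out for k in keys for out in udf(k, lg.get(k, []), rg.get(k, []))]

# Word count with Map + Reduce; the lambdas are the embedded UDFs.
lines = [(0, "big data big"), (1, "data")]
words = pact_map(lambda rec: [(w, 1) for w in rec[1].split()], lines)
print(pact_reduce(lambda k, grp: [(k, sum(v for _, v in grp))], words))
# [('big', 2), ('data', 2)]
```

CoGroup is what makes outer joins and set differences easy to express: unlike Match, its UDF is still called for a key that appears on only one side.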
4: Thou shalt deeply embed UDFs
Concise and flexible
5: Thou shalt optimize
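A declarative program leaves the physical plan open, so the system can pick among alternatives using cost estimates. One classic choice in parallel dataflow engines is how to ship a join's inputs: broadcast the smaller side to every worker, or hash-repartition both sides. The toy cost model below is a hypothetical sketch, assuming network bytes are the only cost and both inputs start evenly spread over the workers; real optimizers weigh many more factors.

```python
def choose_join_strategy(left_bytes, right_bytes, workers):
    """Pick the cheaper way to ship a join's inputs (hypothetical toy cost model).

    Assumes both inputs start evenly spread over `workers` machines and that
    only bytes sent over the network matter.
    """
    # Broadcast: every worker sends its share of the smaller input to the
    # other workers-1 machines, i.e. the whole smaller input times (workers-1).
    broadcast = min(left_bytes, right_bytes) * (workers - 1)
    # Repartition: hash-partition both inputs; about (workers-1)/workers of
    # each record lands on a different machine.
    repartition = (left_bytes + right_bytes) * (workers - 1) / workers
    if broadcast <= repartition:
        return "broadcast", broadcast
    return "repartition", repartition

print(choose_join_strategy(10_000, 100, 10))    # ('broadcast', 900)
print(choose_join_strategy(10_000, 8_000, 10))  # ('repartition', 16200.0)
```

The same logical join gets different physical plans depending on input sizes, which is exactly the kind of decision a user should not have to hard-code.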
6: Thou shalt iterate
Needed for most interesting analysis cases
Pregel as a Stratosphere query plan with comparable performance.
Different physical implementations for iterations exist! (e.g., bulk vs. semi-naive evaluation, others? Think join implementations!)
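To make the bulk vs. semi-naive distinction concrete, here is a hypothetical single-machine Python sketch of connected components by minimum-label propagation, a typical Pregel-style vertex-centric job. The bulk variant recomputes every vertex's label in every round; the semi-naive (delta) variant only propagates from vertices whose label changed in the previous round. The function names and the `messages` counter (neighbor-label reads, used as a rough work measure) are assumptions for illustration, not a Stratosphere API or metric.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) edges."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def cc_bulk(edges):
    """Bulk iteration: each round recomputes the label of every vertex."""
    adj = build_adjacency(edges)
    labels = {v: v for v in adj}
    messages = 0
    while True:
        new = {}
        for v in adj:
            messages += len(adj[v])  # every neighbor label is read again
            new[v] = min([labels[v]] + [labels[n] for n in adj[v]])
        if new == labels:  # fixed point reached
            return labels, messages
        labels = new

def cc_delta(edges):
    """Semi-naive (delta) iteration: only changed vertices propagate labels."""
    adj = build_adjacency(edges)
    labels = {v: v for v in adj}  # solution set
    workset = set(adj)            # vertices changed in the previous round
    messages = 0
    while workset:
        candidates = {}
        for v in workset:
            for n in adj[v]:
                messages += 1
                if labels[v] < candidates.get(n, labels[n]):
                    candidates[n] = labels[v]
        # Every candidate is strictly smaller than the current label.
        for n, lab in candidates.items():
            labels[n] = lab
        workset = set(candidates)
    return labels, messages

edges = [(1, 2), (2, 3), (3, 4), (5, 6)]
print(cc_bulk(edges))   # ({1: 1, 2: 1, 3: 1, 4: 1, 5: 5, 6: 5}, 32)
print(cc_delta(edges))  # ({1: 1, 2: 1, 3: 1, 4: 1, 5: 5, 6: 5}, 18)
```

Both variants reach the same fixed point; the delta variant only does work proportional to what changed. That makes them two physical implementations of one logical iteration, much as hash join and sort-merge join implement one logical join.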
7: Thou shalt use a scalable and efficient execution engine
Pipeline and data parallelism, flexible checkpointing/fault tolerance, optimized network data transfers, use of novel compute and storage models (heterogeneous CPUs, NUMA, etc.), caching