Big Data – is it a database problem? Volker Markl http://www.user.tu-berlin.de/marklv [email protected]

Page 1: Big Data  –  is it  a  database problem ?

Big Data – is it a database problem?

Volker Markl

http://www.user.tu-berlin.de/marklv

[email protected]

Page 2: Big Data  –  is it  a  database problem ?

Another “V” - Value: Big Data in the Cloud - the Information Economy

[Figure axis label: Value increase]

A major new trend in information processing will be the trading of original and enriched data, effectively creating an information economy.

[Figure: Cloud-Computing Stack | The Market Situation]

Layers: Data as a Service; SaaS; PaaS; Information as a Service; IaaS

Example offerings: MIA, Datamarket, …; MIA, Azure IMR, etc. …; Salesforce.com, Office 2010 WebApps; Microsoft Azure, Google App Engine; Amazon Elastic Compute Cloud

Users: End users; System administrators; Application Developers; Analysts; Corporations

“When hardware became commoditized, software was valuable. Now that software is being commoditized, data is valuable.” (TIM O'REILLY)

“The important question isn’t who owns the data. Ultimately, we all do. A better question is, who owns the means of analysis?” (A. CROLL, MASHABLE, 2011)

Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges

Page 3: Big Data  –  is it  a  database problem ?

Information Marketplaces: Enabling SMEs to capitalize on Big Data

[Figure: information marketplace architecture. Labels: Marketplace; Trust; Infrastructure as a Service; Distributed Data Storage; Massively Parallel Infrastructure; Index; Users; Data providers; Algorithms; Data & Aggregation; Revenue Sharing; Technology Licensing; Queries; Analytical results; German Web. Example services: Social Media Monitoring, Media Publisher Services, SEO.]

http://www.mia-marktplatz.de

http://www.dopa-project.eu

Page 4: Big Data  –  is it  a  database problem ?

If I had a hammer! Running in Circles?

SQL

NoMapReduce SQL--

Running in circles

Re-implement some code paths of parallel databases, or run restricted, non-compatible SQL on Hadoop?

NoSQL

Page 5: Big Data  –  is it  a  database problem ?

What is Wrong with this Picture?

[Figure labels: scripting; wrong platform?; XQuery?; SQL--; column store++; a query plan; scalable parallel sort]

Page 6: Big Data  –  is it  a  database problem ?

7 commandments for Big Data Analytics

Page 7: Big Data  –  is it  a  database problem ?

1: Thou shall use declarative languages

All-pairs shortest paths using recursive doubling in Stratosphere’s Scala front-end

Avoid impedance mismatch!
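The Scala program shown on the slide is not part of this transcript. As a rough illustration of the technique it names, here is a minimal Python sketch of all-pairs shortest paths by recursive doubling. It assumes a small dense graph with non-negative edge weights held in memory; the function names and the sample graph are made up for this example and are not Stratosphere's API.

```python
import math

INF = math.inf  # marks "no direct edge"

def min_plus_square(dist):
    """Join the distance matrix with itself: best route i -> k -> j over all k."""
    n = len(dist)
    return [
        [min(dist[i][k] + dist[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]

def all_pairs_shortest_paths(weights):
    """Recursive doubling: each squaring doubles the path length covered."""
    n = len(weights)
    # Distance 0 from every vertex to itself, so shorter paths survive a squaring.
    dist = [[0 if i == j else weights[i][j] for j in range(n)] for i in range(n)]
    hops = 1  # dist currently covers paths of at most `hops` edges
    while hops < n - 1:
        dist = min_plus_square(dist)
        hops *= 2
    return dist

# Hypothetical 4-vertex directed graph, weights[i][j] = cost of edge i -> j.
graph = [
    [0, 3, INF, 7],
    [8, 0, 2, INF],
    [5, INF, 0, 1],
    [2, INF, INF, 0],
]
result = all_pairs_shortest_paths(graph)
```

Only about log2(n) squarings are needed, because every shortest path has at most n - 1 edges. In a declarative dataflow language the squaring step would be written as a self-join of the path relation followed by a minimum aggregation, which is the kind of formulation the slide argues for.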

Page 8: Big Data  –  is it  a  database problem ?

Beyond MapReduce

3: Thou shall use rich primitives

Map

Reduce

Cross

Match

CoGroup
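To make the five primitives concrete, the following toy Python model treats each one as a second-order function that takes a user-defined function (UDF) and decides how records are handed to it. This is an in-memory sketch under the usual reading of these operator names, not Stratosphere's actual interface; all function names, parameters, and sample records are assumptions for illustration.

```python
from collections import defaultdict
from itertools import product

def _group(records, key):
    """Helper: bucket records by key, preserving first-seen key order."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)
    return groups

def map_(records, udf):
    """Map: call the UDF once per record."""
    return [udf(r) for r in records]

def reduce_(records, key, udf):
    """Reduce: call the UDF once per key with all records sharing that key."""
    return [udf(k, group) for k, group in _group(records, key).items()]

def cross(left, right, udf):
    """Cross: call the UDF on every pair of the Cartesian product."""
    return [udf(l, r) for l, r in product(left, right)]

def match(left, right, left_key, right_key, udf):
    """Match: call the UDF on every pair with equal keys (an equi-join)."""
    right_groups = _group(right, right_key)
    return [udf(l, r) for l in left for r in right_groups.get(left_key(l), [])]

def cogroup(left, right, left_key, right_key, udf):
    """CoGroup: call the UDF once per key with both (possibly empty) groups."""
    lg, rg = _group(left, left_key), _group(right, right_key)
    keys = list(lg) + [k for k in rg if k not in lg]
    return [udf(k, lg.get(k, []), rg.get(k, [])) for k in keys]

# Hypothetical sample data: (customer, amount) and (customer, city).
orders = [("alice", 30), ("bob", 20), ("alice", 5)]
cities = [("alice", "Berlin"), ("carol", "Paris")]

first = lambda rec: rec[0]
totals = reduce_(orders, first, lambda k, g: (k, sum(a for _, a in g)))
joined = match(orders, cities, first, first, lambda o, c: (o[0], o[1], c[1]))
counts = cogroup(orders, cities, first, first,
                 lambda k, os, cs: (k, len(os), len(cs)))
pairs = cross([1, 2], ["a", "b"], lambda l, r: (l, r))
doubled = map_([1, 2, 3], lambda x: x * 2)
```

The two-input primitives are what plain MapReduce lacks: a join has to be hand-coded there, whereas Match and CoGroup state the intent and leave the choice of join strategy to the system.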

2: Thou shall accept external (dynamic) sources

“In situ” data - no load

Page 9: Big Data  –  is it  a  database problem ?

4: Thou shall deeply embed UDFs

Concise and flexible

5: Thou shall optimize

Page 10: Big Data  –  is it  a  database problem ?

6: Thou shall iterate

Needed for most interesting analysis cases

Pregel as a Stratosphere query plan with comparable performance.

Different physical implementations for iterations exist (e.g., bulk vs. semi-naive evaluation; others?). Think join implementations!
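The difference between bulk and semi-naive evaluation can be sketched on a small example. The Python code below computes the transitive closure of an edge list both ways and counts how many tuples each strategy feeds into its join. It is an illustrative in-memory sketch, not how Stratosphere executes iterations; the function names, the work counter, and the sample chain graph are assumptions made for this example.

```python
def transitive_closure_bulk(edges):
    """Bulk iteration: every round re-joins the complete intermediate result."""
    edges = set(edges)
    closure = set(edges)
    work = 0  # number of tuples fed into the join across all rounds
    while True:
        work += len(closure)
        derived = {(a, d) for (a, b) in closure for (c, d) in edges if b == c}
        new_closure = closure | derived
        if new_closure == closure:  # fixpoint reached
            return closure, work
        closure = new_closure

def transitive_closure_seminaive(edges):
    """Semi-naive iteration: every round joins only the tuples found last round."""
    edges = set(edges)
    closure = set(edges)
    delta = set(edges)  # tuples that are new since the previous round
    work = 0
    while delta:
        work += len(delta)
        derived = {(a, d) for (a, b) in delta for (c, d) in edges if b == c}
        delta = derived - closure  # keep only genuinely new tuples
        closure |= delta
    return closure, work

# Hypothetical chain graph 1 -> 2 -> 3 -> 4 -> 5.
chain = [(1, 2), (2, 3), (3, 4), (4, 5)]
bulk_result, bulk_work = transitive_closure_bulk(chain)
semi_result, semi_work = transitive_closure_seminaive(chain)
```

Both functions return the same closure, but the semi-naive variant feeds fewer tuples into the join because it never reprocesses pairs it has already derived. This mirrors the slide's analogy to join implementations: the logical result is fixed, and an optimizer can pick the cheaper physical strategy.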

Page 11: Big Data  –  is it  a  database problem ?

7: Thou shall use a scalable and efficient execution engine

Pipeline and data parallelism, flexible checkpointing/fault-tolerance, optimized network data transfers, utilizing novel compute and storage models (heterogeneous CPUs, NUMA, etc.), caching