big data analytics - fakultät iv elektrotechnik und ... · big data analytics chances and...
Post on 06-May-2018
223 Views
Preview:
TRANSCRIPT
Big Data AnalyticsChances and Challenges
Volker Markl | DIMA | BDOD Big Data Chances and Challenges
Volker Markl
Professor and ChairDatabase Systems and Information Management (DIMA), Technische Universität Berlin
www.dima.tu-berlin.de
More and more data is available to science and business!
Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges
Slide 2
Drivers:Cloud ComputingInternet of ServicesInternet of ThingsCyberphysical Systems
Underlying Trends:ConnectivityCollaborationComputer generated data
video streams
web archives
sensor data
audio streams
RFID data
simulation data
Data are becoming increasingly complex!
Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges
Slide 3
Size (volume)Freshness (velocity)Format/Media Type (variability)Uncertainty/Quality (veracity)etc.
Data
Data and analyses are becoming increasingly complex!
Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges
Slide 4
Size (volume)Freshness (velocity)Format/Media Type (variability)Uncertainty/Quality (veracity)etc.
Data
Selection/Grouping (map/reduce)Relational Operators (Join/Correlation)Extraction & Integration, (map/reduce or dataflow systems)Data Mining (R, S+, Matlab )Predictive Models (R, S+, Matlab )etc.
Analysis
Data-driven applications …
… will revolutionize decision making in business and the sciences!
… have great economic potential!Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges
Slide 5
lifecycle management
home automation health
water management
market research traffic management
energy management
information marketplaces
Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges
Slide 6
General consensus seems to be that people (would like to) have lots of data, which they would like to analyze. How do we go about that?
Running in Circles?
Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges
Slide 7
SQL
NoMapReduceSQL--
NoSQL
Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges
Slide 8
scripting wrongplatform?
XQuery?SQL--
columnstore++
a queryplan
scalable parallel sort
What is Wrong with this Picture?
Big Data Technologies Worldwide
■ MapReduce und Tools: Hadoop, Cloudera Impala (USA), InfoSphereBigInsights (IBM)
■ Data Mining und Informationsextraktion: SystemT (IBM),
■ Statistical Packages: R (Universität Auckland), Matlab (Mathworks;USA), SAS, SPSS
■ DBMS: Aster (Teradata), Greenplum (EMC), DB2 (IBM), SQLServer (Microsoft)Oracle Exadata (Oracle)
■ Business Analytics/Intelligence: SAS Analytics (SAS), Vertica (HP), SPSS IBM)
■ Script Sprachen: Pig, Hive, JAQL
■ Graph Databases: Neo4j (USA), AllegroGraph (Franz Inc.; Giraph (Apache)
■ Visualization: Tableau (Tableau Software; USA),
■ Suche / Indexing: Lucene (Apache), Solr (Apache)
■ Other: UIMA (IBM)
Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges
Slide 9
Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges
Slide 10
System Type Stakeholder
ParStream DBMS (in-memory) ParStream
SAP Hana DBMS(in-memory) SAP
Hyper DBMS(in-memory) TU München
BigMemory DBMS(in-memory) Terracotta Inc./Software AG
Scalaris Storage (parallel) Konrad-Zuse-Zentrum fürInformationstechnik
RapidMiner Data mining toolkit Rapid-I
ImageMaster Enterprise Content Management T-Systems
SalesIntelligence Market research Implisense
Predictive-Analytics-Suite Business Analytics Blue Yonder
Stratosphere Scalable Data Analytics System TU Berlin
Big Data Technologies
3 Big Trends in Big Data
■ Solving complex data analysis problems□ Predictive analytics□ Infinite data streams□ Distributed data□ Dealing with uncertainty□ Bringing the human in the loop: Visual analytics
■ Producing results in near-real time□ First results fast□ Low latency□ Exploiting new Hardware
(manycore, OpenGL/CUDA, FPGA, heterogeneous processors, NUMA, remote memory)
■ Hiding complexity□ Declarative data anaylsis programs► aggregation► relational algebra► iterative algorithms► distributed state
□ Fault tolerance (pessimistic/optimistic)
Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges
Slide 11
Valu
e in
crease
Another “V“ - Value:Big Data in the Cloud - the Information Economy
A major new trend in information processing will be the trading of original and enriched data, effectively creating an information economy.
Cloud-Computing Stack The Market situation
MIA, Datamarket, …
MIA, Azure IMR, etc …
Salesforce.com,Office 2010 WebApps
Microsoft Azure,Google App Engine
Amazon Elastic Compute Cloud
Data as a Service
SaaS
PaaS
Information as a Service
IaaS
End users
System administrators
Application Developers
Analysts
Corporations
„When hardware became commoditized, software was valuable.Now that software is being commoditized, data is valuable.“ (TIM O‘REILLY )
„The important question isn’t who owns the data. Ultimately, we all do. A better question is, who owns the means of analysis?“ (A. CROLL, MASHABLE, 2011)
Slide 12
Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges
Marketplace
Tru
st
Infrastructure as a Service
e.g., Social Media Monitoring
e.g., Social Media Monitoring
e.g., Media Publisher Services
e.g., Media Publisher Services
e.g., SEOe.g., SEO
Distributed Data StorageDistributed Data Storage
Massively Parallel InfrastructureMassively Parallel Infrastructure
IndexIndex
UsersDataproviders Algorithms
Data &Aggregation
RevenueSharing
Technology Licensing QueriesAnalytical
results
German Web
http://www.mia-marktplatz.dehttp://www.dopa-project.eu
Slide 13
Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges
Information Marketplaces: Enabling SMEs to capitalize on Big Data
Data analysis program
Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges
Slide 14
Programcompiler
Dataflow
Runtime operatorsHash- and sort-based out-of-core
operator implementations,memory management
Stratosphere optimizerAutomatic selection of parallelization, shipping and local strategies, operator
order and placementExecution plan
Parallel execution engineTask scheduling, network data transfers,
resource allocation, checkpointing
Job graph Execution graph
Big Data looks Tiny from the Stratosphere
Stratosphere is a European declarative, massively-pa rallel Big Data Analytics System, funded by DFG and EIT, available open-source with Apache License
“
http://www. stratosphere.eu
Germany is on the Map!
Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges
Slide 15
“Scalable Machine Learning for Big Data” Tutorial by Microsoft at the
IEEE International Conference on Data Engeneering2013
5 Mistakes Data Scientists Make When Putting Big Data To Work
Selecting a platform that scales
□ Performance (runtime, first-results fast)□ People
Selecting the right data
□ Information Quality□ Timeliness
Choosing the right model□ Assumptions about the data
Reproducability
□ Numerical stability
Trusting your results
□ Confidence and statistical significance□ Lineage
Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges
Slide 16
Legal Dimension
SocialDimension
EconomicDimension
Technology Dimension
ApplicationDimension
Business ModelsBenchmarking
Impact of Open SourceDeploymentPricing
Scalable Data ProcessingSignal Processing
Statistics/MLLinguistics
HCI/Visualization
„Big Data“ is Interdisciplinary
Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges
Slide 17
CopyrightPrivacyLiability
User BehaviourSocietal ImpactCollaboration
Digital Preservation Decision MakingVerticals
Call to Action
Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges
Seite 18
■ Educate Data Scientists to create the required tale nt□ T-shaped students□ Information „literacy“□ Data Analytics Curriculum
■ Research Big Data Analytics Technologies□ Data management (uncertainty, query processing under near real-time constraints,
information extraction)□ Programming models□ Machine learning and statistical methods□ Systems architectures□ Information Visualization
■ Innovate to maintain competetiveness□ Demonstrate flagship use-cases to raise awareness□ Promote startups in the area of data analytics□ Transfer technologies to enterprises, in particular SMEs□ Determine legal frameworks and business models
We need to ensure a European technological leadership role in „Big Data“
Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges
Seite 19
for Big Data Analytics
commandments
1: Thou shall use declarative languages
Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges
Seite 20
All-pairs shortest paths using recursive doubling in Stratosphere’s Scala front-end
Beyond MapReduce
3: Thou shall use rich primitives
Map
Reduce
Cross
Match
CoGroup
2: Thou shall acceptexternal (dynamic) sources
Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges
Seite 21
“In situ” data - no load
4: Thou shall deeplyembed UDFs
Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges
Seite 22
Concise and flexible
5: Thou shall optimize
6: Thou shall iterate
Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges
Seite 23
Needed for most interesting analysis cases
Pregel as a Stratosphere query plan with comparable performance.
7: Thou shall use a scalable and efficient execution engine
Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges
Seite 24
Pipeline and data parallelism, flexible checkpointing, optimized network data transfers
An Overview of Stratosphere
Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges
Seite 25
Meteor program
Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges
Seite 26
Meteor compiler
Pact program
RuntimeHash- and sort-based out-of-core
operator implementations, memory management
Stratosphere optimizerPicks data shipping and local
strategies, operator orderExecution plan
Nephele execution engineTask scheduling, network data transfers, resource allocation, checkpointing
Job graph Execution graph
Stratosphere bulk iterations
Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges
Seite 27
commandments
Currently making the optimizer iteration-aware and exposing PACT-level abstraction
PACT4S: Embedding of PACTs in Scala
Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges
Seite 28
Master thesis developed language and Scala compiler plug-in, ongoing Master thesis will integrate in query optimizer, still in (very) prototypical mode
Ingredients of a Big Data Project
Big Data Analytics | Volker Markl | BDOD Big Data – Chances and Challenges
Seite 29
database systems,scalable platforms
machine learning/statistics/optimization
real-worldanalysis problem
visualizationNLP
signal processing…
technology(data preparation,
scalable processing)
mathematicalanalysis methods
application
top related