Post on 06-Jul-2020
© 2014 Alexandros Labrinidis, University of Pittsburgh October 8, 2014
USER-CENTRIC DATA MANAGEMENT IN THE ERA OF BIG DATA
Alexandros Labrinidis
Advanced Data Management Technologies Lab
Department of Computer Science
University of Pittsburgh
http://labrinidis.cs.pitt.edu
Data-Intensive Science
[ Diagram labels: Data-intensive science; Observational; Simulation; One (virtual) instrument; Multiple instruments ]
✤ SDSS: Sloan Digital Sky Survey (2000 - ): 200 GB/night
✤ LSST: Large Synoptic Survey Telescope (2015 - ): 30 TB/night; 1.28 PB/year
✤ LHC: Large Hadron Collider: 15 PB/year
✤ SKA: Square Kilometer Array (2019 - ): 10 PB/hour
Data-Intensive Science
[ Diagram labels: Data-intensive science; Observational; Simulation; One (virtual) instrument; Multiple instruments ]
✤ Gene Sequencing
✤ Personalized Medicine
Data-Intensive Science
[ Diagram labels: Data-intensive science; Observational; Simulation; One (virtual) instrument; Multiple instruments ]
✤ Climate Modeling
✤ Turbulent Combustion Flow
What’s the Big Deal with Big Data?
• Featured on the cover of Nature and the Economist!
• And even has a Dilbert Cartoon!
Big Data Definition - The three Vs
• Volume - size does matter!
• Velocity - data at speed, i.e., the data “fire-hose”
• Variety - heterogeneity is the rule
Five more Vs
• Variability - rapid change of data characteristics over time
• Veracity - ability to handle uncertainty, inconsistency, etc.
• Visibility - protect privacy and provide security
• Value - usefulness & ability to find the right hay-colored needle in the haystack
• Voracity - strong appetite for data!
Enter Moore’s Law
[ Wikipedia Image ]
Moore's law is the observation that, over the history of computing hardware, the number of transistors in a dense integrated circuit doubles approximately every two years. The law is named after Gordon E. Moore, co-founder of Intel Corporation, who described the trend in his 1965 paper.
Source: http://en.wikipedia.org/wiki/Moore's_law
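As a rough illustration of what a two-year doubling period implies, the following Python sketch computes the growth factor after a given number of years. The function name `growth_factor` and its parameters are assumptions made for this example, and the idealized exponential ignores any deviation from the trend.

```python
def growth_factor(years: float, doubling_period_years: float = 2.0) -> float:
    """Multiplicative growth after `years`, assuming one doubling per period."""
    return 2.0 ** (years / doubling_period_years)


# Transistor-count multiplier after 2, 10, and 20 years of idealized doubling.
for years in (2, 10, 20):
    print(f"{years:>2} years -> x{growth_factor(years):,.0f}")
```

Under this idealized model, ten years of doubling every two years gives a factor of 32, and twenty years gives a factor of 1,024.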
Enter Bezos’ Law
Photo: http://www.slashgear.com/google-data-center-hd-photos-hit-where-the-internet-lives-gallery-17252451/
Bezos' law is the observation that, over the history of the cloud, the price of a unit of computing power falls by 50% approximately every 3 years.
Source: http://blog.appzero.com/blog/futureofcloud
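To get a feel for a three-year halving period, one can ask how long it takes for the unit price to fall to a given fraction of today's price. The sketch below is a minimal illustration that assumes a perfectly smooth exponential decline; the function name `years_until_price_fraction` and its parameters are hypothetical, not taken from the source.

```python
import math


def years_until_price_fraction(fraction: float, halving_period_years: float = 3.0) -> float:
    """Years until the unit price drops to `fraction` of its current value."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    # price(t) = 0.5 ** (t / period)  =>  t = period * log2(1 / fraction)
    return halving_period_years * math.log2(1.0 / fraction)


# How long until the price is half, a quarter, and a tenth of today's price?
for fraction in (0.5, 0.25, 0.1):
    print(f"{fraction:>5.0%} of today's price after "
          f"{years_until_price_fraction(fraction):.1f} years")
```

With the 3-year halving period quoted above, a quarter of today's price is reached after 6 years, and a tenth after roughly 10 years.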
Storage capacity increase
[ Chart: HDD Capacity (GB), axis from 0 to 7000; Wikipedia Data ]
Insert other exponentially increasing graphs here (e.g., data generation rates, world-wide smartphone access rates, Internet of Things, …)
But
• Human processing capacity remains roughly the same!
We refer to this as the:
Big Data – Same Humans Problem
About the ADMT Lab
• Directed by:
  • Panos K. Chrysanthis
  • Alexandros Labrinidis
• Established in 1995
• Currently: 5 PhD students
• Our “slogan”: User-centric data management for network-centric applications
Look at the entire data lifecycle
AQSIOS - A DSMS Architecture
[ Architecture diagram labels: Data stream sources; Query optimizer; Query networks; Cont. queries (Q1, Q2, Q3); Stream applications; Scheduler; Statistics collector; Load Manager ]
Administrator: Set the delay targets and priorities for queries
AQSIOS is the DSMS prototype developed at our ADMT Lab. It is built on top of the STREAM prototype from Stanford.
DILoS evaluation – QoS and QoD
                          Average response time (ms)       Average data loss (%)
                          Class 1   Class 2    Class 3     Class 1   Class 2   Class 3
No load manager              3.40      3.53   56541.69        0         0         0
Common load manager          3.00      3.13     517.07       11.42     11.43     11.60
Per-class load manager       3.55      3.75     492.84        0         0        35.95
DILoS                        4.28      4.38      42.95        0         0         0
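To make the comparison concrete, this sketch takes the average response times from the table and computes, for each class, how many times faster each scheme is than running with no load manager. The dictionary layout and the helper name `speedup_over_baseline` are illustrative assumptions; only the numbers come from the slide.

```python
# Average response time (ms) per query class (Class 1, Class 2, Class 3),
# copied from the DILoS evaluation table.
RESPONSE_MS = {
    "No load manager": (3.40, 3.53, 56541.69),
    "Common load manager": (3.00, 3.13, 517.07),
    "Per-class load manager": (3.55, 3.75, 492.84),
    "DILoS": (4.28, 4.38, 42.95),
}


def speedup_over_baseline(scheme: str, baseline: str = "No load manager") -> tuple:
    """Per-class ratio baseline_time / scheme_time (values above 1 mean faster)."""
    return tuple(b / s for b, s in zip(RESPONSE_MS[baseline], RESPONSE_MS[scheme]))


# Print the per-class speedup of every scheme relative to the baseline.
for scheme in RESPONSE_MS:
    ratios = speedup_over_baseline(scheme)
    print(f"{scheme:<24}", [round(r, 2) for r in ratios])
```

On these numbers, DILoS is slightly slower than the baseline for Classes 1 and 2 but more than a thousand times faster for Class 3, and the table reports no data loss for DILoS in any class.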
Style of research
• Emphasis on systems and algorithms
• Building real systems
  • Often based on academic prototypes (e.g., STREAM from Stanford) or on top of well-known open-source software (e.g., Storm)
• Experimenting using real systems and simulation
• Comparing alternatives
  • Should we do grouping of queries in way A or way B?
  • If we do 4 different optimizations, what is the relative benefit of each one?
  • In which cases would a certain algorithm be better than another?
Types of projects for undergrads
• Upcoming:
  • web-based user interface to visualize run-time behavior of a real system
• Past:
  • clustering of tweets
  • web-based interfaces to different database back-ends
  • REST APIs for remote data access
  • application to coordinate supernovae observations
  • monitoring application for transient astronomical events
More info
• [1] The Beckman Report on Database Research, by Abadi et al., October 2013. http://beckman.cs.wisc.edu
• [2] Big Data and Its Technical Challenges, by Jagadish, Gehrke, Labrinidis, Papakonstantinou, Patel, Ramakrishnan, and Shahabi. Communications of the ACM, July 2014. http://bit.ly/bigdatachallenges (over 4,500 downloads)
• [3] Contact me: http://labrinidis.cs.pitt.edu/contact