Big Data - Umesh Bellur
![Page 1: Big Data - Umesh Bellur](https://reader030.vdocuments.us/reader030/viewer/2022021422/58ee000c1a28ab95108b45b7/html5/thumbnails/1.jpg)
Not Only Big Data, But FAST
Prof. Umesh Bellur, Department of Computer Science
The Indian Institute of Technology (IIT) Bombay, India
![Page 2: Big Data - Umesh Bellur](https://reader030.vdocuments.us/reader030/viewer/2022021422/58ee000c1a28ab95108b45b7/html5/thumbnails/2.jpg)
What’s Big Data?
No single definition; here is one from Wikipedia:
• “…difficult to process using on-hand database management tools or traditional data processing applications.”
• The trend toward larger data sets is due to the additional information derivable from analysis of a single large set of related data, as compared to separate smaller sets with the same total amount of data, allowing correlations to be found to “spot business trends, determine quality of research, prevent diseases, link legal citations, combat crime, and determine real-time roadway traffic conditions.”
![Page 3: Big Data - Umesh Bellur](https://reader030.vdocuments.us/reader030/viewer/2022021422/58ee000c1a28ab95108b45b7/html5/thumbnails/3.jpg)
The Vs of Big Data
![Page 4: Big Data - Umesh Bellur](https://reader030.vdocuments.us/reader030/viewer/2022021422/58ee000c1a28ab95108b45b7/html5/thumbnails/4.jpg)
Volume
• 12+ TBs of tweet data every day
• 25+ TBs of log data every day
• ? TBs of data every day
• 2+ billion people on the Web by end 2011
• 30 billion RFID tags today (1.3B in 2005)
• 4.6 billion camera phones worldwide
• 100s of millions of GPS-enabled devices sold annually
• 76 million smart meters in 2009… 200M by 2014
![Page 5: Big Data - Umesh Bellur](https://reader030.vdocuments.us/reader030/viewer/2022021422/58ee000c1a28ab95108b45b7/html5/thumbnails/5.jpg)
Variety - A Single perspective of the Digital Universe
[Diagram labels: Customer; Social Media; Gaming; Entertain; Banking Finance; Our Known History; Purchase]
![Page 6: Big Data - Umesh Bellur](https://reader030.vdocuments.us/reader030/viewer/2022021422/58ee000c1a28ab95108b45b7/html5/thumbnails/6.jpg)
Velocity (Speed)
• Data is being generated fast and needs to be processed fast
• Online data analytics
• Late decisions mean missed opportunities
• Examples
  – E-promotions: based on your current location, your purchase history, and what you like, send promotions right now for the store next to you
  – Healthcare monitoring: sensors monitor your activities and body; any abnormal measurement requires immediate reaction
![Page 7: Big Data - Umesh Bellur](https://reader030.vdocuments.us/reader030/viewer/2022021422/58ee000c1a28ab95108b45b7/html5/thumbnails/7.jpg)
Motivational Use Cases
Customer
Influence Behavior
• Product recommendations that are relevant and compelling
• Friend invitations to join a game or activity that expands business
• Preventing fraud as it is occurring, and preventing more proactively
• Learning why customers switch to competitors and their offers, in time to counter
• Improving the marketing effectiveness of a promotion while it is still in play
![Page 8: Big Data - Umesh Bellur](https://reader030.vdocuments.us/reader030/viewer/2022021422/58ee000c1a28ab95108b45b7/html5/thumbnails/8.jpg)
“Fast” in Smart Grids
A smart grid is an electricity network that can intelligently integrate the actions of all users connected to it (generators, consumers, and those that do both) in order to efficiently deliver sustainable, economic, and secure electricity supplies.
![Page 9: Big Data - Umesh Bellur](https://reader030.vdocuments.us/reader030/viewer/2022021422/58ee000c1a28ab95108b45b7/html5/thumbnails/9.jpg)
No longer just an experiment!
Estimated investments of ~ 60-75 Billion Euro by 2020
![Page 10: Big Data - Umesh Bellur](https://reader030.vdocuments.us/reader030/viewer/2022021422/58ee000c1a28ab95108b45b7/html5/thumbnails/10.jpg)
Hinges on
• Real-time decision making to route energy from producers to consumers
• Based on fine-grained energy demand predictions
• Millions of events a second have to be processed “on the fly”
  – A billion events per day (10,000 smart plugs, per-second readings)
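The per-day figure above can be checked with a little arithmetic. The sketch below assumes exactly one reading per plug per second; the function name `events_per_day` is illustrative and does not come from the slides.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def events_per_day(num_plugs: int, readings_per_plug_per_second: float = 1.0) -> int:
    """Total readings produced in one day by a fleet of smart plugs."""
    return int(num_plugs * readings_per_plug_per_second * SECONDS_PER_DAY)


daily = events_per_day(10_000)
print(f"{daily:,} events per day")  # 864,000,000, i.e. roughly a billion
```

Under this assumption, 10,000 plugs produce about 864 million events per day, which is a sustained average of 10,000 events per second for this one deployment.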
![Page 11: Big Data - Umesh Bellur](https://reader030.vdocuments.us/reader030/viewer/2022021422/58ee000c1a28ab95108b45b7/html5/thumbnails/11.jpg)
Another Motivational Angle for “Fast”
Performance of disks:

| | 1987 | 2004 | Increase |
|---|---|---|---|
| CPU Performance | 1 MIPS | 2,000,000 MIPS | 2,000,000 x |
| Memory Size | 16 Kbytes | 32 Gbytes | 2,000,000 x |
| Memory Performance | 100 usec | 2 nsec | 50,000 x |
| Disc Drive Capacity | 20 Mbytes | 300 Gbytes | 15,000 x |
| Disc Drive Performance | 60 msec | 5.3 msec | 11 x |

Source: Seagate Technology Paper: “Economies of Capacity and Speed: Choosing the most cost-effective disc drive size and RPM to meet IT requirements”
Memory I/O is much faster than disk I/O!
![Page 12: Big Data - Umesh Bellur](https://reader030.vdocuments.us/reader030/viewer/2022021422/58ee000c1a28ab95108b45b7/html5/thumbnails/12.jpg)
Processing Fast Data
• Streams of data that must be processed in one pass in real time:
  – No random access allowed
  – Continuous
  – Massive
  – Unbounded
  – May be dense or sparse
  – Events arrive faster than they can be “mined”
  – Uncertainty: missing values
Lack of a real-time response may either be life-threatening or result in large revenue losses.
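As a minimal sketch of what one-pass processing can look like, the example below keeps a running mean and variance over a stream using Welford's online update, which is a standard technique and not something the slides prescribe. Each event is seen once, memory use is constant, and missing values (represented here as `None`, an assumption) are counted and skipped. The names `RunningStats` and `summarize` are illustrative.

```python
from typing import Iterable, Optional


class RunningStats:
    """Constant-memory mean and variance, updated one event at a time."""

    def __init__(self) -> None:
        self.count = 0      # number of valid readings seen so far
        self.missing = 0    # number of missing readings skipped
        self.mean = 0.0
        self._m2 = 0.0      # running sum of squared deviations from the mean

    def update(self, value: Optional[float]) -> None:
        if value is None:   # missing value: record it and move on
            self.missing += 1
            return
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self) -> float:
        """Population variance of the valid readings seen so far."""
        return self._m2 / self.count if self.count else 0.0


def summarize(stream: Iterable[Optional[float]]) -> RunningStats:
    """Consume the stream in a single pass; no event is stored or revisited."""
    stats = RunningStats()
    for reading in stream:
        stats.update(reading)
    return stats


stats = summarize([2.0, None, 4.0, 6.0])
print(stats.count, stats.missing, stats.mean)
```

Because the state is only four numbers, the same object can be updated indefinitely on an unbounded stream and queried at any point.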
![Page 13: Big Data - Umesh Bellur](https://reader030.vdocuments.us/reader030/viewer/2022021422/58ee000c1a28ab95108b45b7/html5/thumbnails/13.jpg)
Challenges
• Time/space constrained
  – Not enough memory
  – Can’t afford storing/revisiting the data
• Single-pass computation
  – External memory algorithms for handling data sets larger than main memory cannot be used: they do not support continuous queries and are too slow for real-time response
• Noise
  – Missing data is a common feature
  – Outliers
  – Aged (stale) data
![Page 14: Big Data - Umesh Bellur](https://reader030.vdocuments.us/reader030/viewer/2022021422/58ee000c1a28ab95108b45b7/html5/thumbnails/14.jpg)
So…..
• No time to stop and smell the roses
• Only one chance to look at the data…
![Page 15: Big Data - Umesh Bellur](https://reader030.vdocuments.us/reader030/viewer/2022021422/58ee000c1a28ab95108b45b7/html5/thumbnails/15.jpg)
Harnessing Big Data – the Evolution
• OLTP: Online Transaction Processing (DBMSs)
• OLAP: Online Analytical Processing (Data Warehousing)
• RTAP: Real-Time Analytics Processing (Big Data Architecture & technology)
![Page 16: Big Data - Umesh Bellur](https://reader030.vdocuments.us/reader030/viewer/2022021422/58ee000c1a28ab95108b45b7/html5/thumbnails/16.jpg)
DBMS vs. DSMS
DBMS (diagram labels: SQL Query, Query Processing, Result; Main Memory, Disk)
• Persistent relations (relatively static, stored)
• Random access
• “Unbounded” disk store
• Only current state matters
• No real-time services

DSMS (diagram labels: Continuous Query (CQ), Query Processing, Result; Data Stream(s); Main Memory)
• Transient
• Continuous queries
• Bounded memory
• Real-time requirements
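To make the DSMS side concrete, here is a hedged sketch of a continuous query with bounded memory: a moving average that emits an updated result every time a new event arrives. The count-based window and the name `sliding_average` are assumptions made for illustration; a real DSMS would typically express this in its own query language.

```python
from collections import deque
from typing import Deque, Iterable, Iterator


def sliding_average(stream: Iterable[float], window_size: int) -> Iterator[float]:
    """Yield the average of the most recent `window_size` values after each arrival."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    window: Deque[float] = deque()
    total = 0.0
    for value in stream:
        window.append(value)
        total += value
        if len(window) > window_size:
            total -= window.popleft()   # age out the oldest event
        yield total / len(window)       # result is refreshed on every event


for avg in sliding_average([1.0, 2.0, 3.0, 4.0], window_size=2):
    print(avg)
```

Unlike a one-time SQL query over stored relations, the generator never terminates on its own if the input is unbounded, and it holds at most `window_size` values in memory at any time.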
![Page 17: Big Data - Umesh Bellur](https://reader030.vdocuments.us/reader030/viewer/2022021422/58ee000c1a28ab95108b45b7/html5/thumbnails/17.jpg)
Technical Aspects of DSMS
• Synopsis: random sampling, histograms, wavelets
• Aging: sliding window techniques
• Stream processing: temporal and spatial operators; distributed complex event processing
• Approximations: deterministic bounds, probabilistic bounds
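Random sampling, listed under Synopsis above, can be illustrated with reservoir sampling (the classic "Algorithm R"), which keeps a uniform sample of fixed size from a stream of unknown length in a single pass. This is one common way to build a sampling synopsis, chosen here for illustration; the slides do not name a specific algorithm, and `reservoir_sample` is a hypothetical helper.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Return a uniform random sample of up to k items, using O(k) memory."""
    if k <= 0:
        raise ValueError("k must be positive")
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)      # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)       # uniform over 0..i, both ends inclusive
            if j < k:
                reservoir[j] = item     # keep the new item with probability k/(i+1)
    return reservoir


sample = reservoir_sample(range(1_000_000), k=5, rng=random.Random(42))
print(sample)
```

Queries can then be answered approximately from the sample instead of the full stream, which is the trade-off that the probabilistic bounds under Approximations refer to.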
![Page 18: Big Data - Umesh Bellur](https://reader030.vdocuments.us/reader030/viewer/2022021422/58ee000c1a28ab95108b45b7/html5/thumbnails/18.jpg)
Maturity Model
Monitoring
Insights
Process Optimization
Data Monetization
Metamorphosis
![Page 19: Big Data - Umesh Bellur](https://reader030.vdocuments.us/reader030/viewer/2022021422/58ee000c1a28ab95108b45b7/html5/thumbnails/19.jpg)
(Role of) Standards in Big Data Adoption
• OGC standards
  – SOS: Sensor Observation Service
• IEEE Big Data Initiative (BDI)
  – Metadata standards for Big Data management
  – Verticals: healthcare, energy, etc.
• ISO/IEC CD 20546: Big Data Vocabulary
• NIST Public Working Group on Big Data
• ITU-T Technology Watch report on Big Data
• …
![Page 20: Big Data - Umesh Bellur](https://reader030.vdocuments.us/reader030/viewer/2022021422/58ee000c1a28ab95108b45b7/html5/thumbnails/20.jpg)
Summary
• Fast data processing is fundamentally different from Big Data processing
  – DSMS vs. Hadoop/data warehousing, etc.
• More and more applications have real-time needs.
• While there are some solutions, there is a wide-open space for research and technological innovation.
  – The role of standards cannot be emphasized enough
![Page 22: Big Data - Umesh Bellur](https://reader030.vdocuments.us/reader030/viewer/2022021422/58ee000c1a28ab95108b45b7/html5/thumbnails/22.jpg)
NIST Reference Architecture for Big Data