big data challenges & opportunities search feeling lucky lei chen internetpictures clips maps...
TRANSCRIPT
![Page 1: BIG DATA Challenges & Opportunities Search Feeling Lucky Lei Chen InternetPictures Clips Maps News Shop Email more 1](https://reader035.vdocuments.us/reader035/viewer/2022062404/5516fcaf550346f5558b4e81/html5/thumbnails/1.jpg)
BIG DATA Challenges & Opportunities
Search Feeling Lucky
Lei Chen
Internet Pictures Clips Maps News Shop Email more
![Page 2: BIG DATA Challenges & Opportunities Search Feeling Lucky Lei Chen InternetPictures Clips Maps News Shop Email more 1](https://reader035.vdocuments.us/reader035/viewer/2022062404/5516fcaf550346f5558b4e81/html5/thumbnails/2.jpg)
Outline
Background: “Big data” is a term acknowledging the exponential growth, availability and use of …
Challenges: “Big data” poses fundamental challenges for data capture, storage, analysis …
Opportunities: Many applications can benefit from “Big data” …
![Page 3: BIG DATA Challenges & Opportunities Search Feeling Lucky Lei Chen InternetPictures Clips Maps News Shop Email more 1](https://reader035.vdocuments.us/reader035/viewer/2022062404/5516fcaf550346f5558b4e81/html5/thumbnails/3.jpg)
Background
We are capturing more data
Satellite imagery, mobile stations, distributed sensor networks, geographical plotting …
Super-exponential growth in data volume
Copyright belongs to “Data Analysis Challenges”, JSR-08-142, Dec
![Page 4: BIG DATA Challenges & Opportunities Search Feeling Lucky Lei Chen InternetPictures Clips Maps News Shop Email more 1](https://reader035.vdocuments.us/reader035/viewer/2022062404/5516fcaf550346f5558b4e81/html5/thumbnails/4.jpg)
We are using more data
Intelligent transportation
Digital health care
![Page 5: BIG DATA Challenges & Opportunities Search Feeling Lucky Lei Chen InternetPictures Clips Maps News Shop Email more 1](https://reader035.vdocuments.us/reader035/viewer/2022062404/5516fcaf550346f5558b4e81/html5/thumbnails/5.jpg)
We need quick processing of the data
Volcano monitoring
Hurricane path prediction
![Page 6: BIG DATA Challenges & Opportunities Search Feeling Lucky Lei Chen InternetPictures Clips Maps News Shop Email more 1](https://reader035.vdocuments.us/reader035/viewer/2022062404/5516fcaf550346f5558b4e81/html5/thumbnails/6.jpg)
We are exploring the unknown with different means of data measurement
Ocean science
Exploring the universe
![Page 7: BIG DATA Challenges & Opportunities Search Feeling Lucky Lei Chen InternetPictures Clips Maps News Shop Email more 1](https://reader035.vdocuments.us/reader035/viewer/2022062404/5516fcaf550346f5558b4e81/html5/thumbnails/7.jpg)
We are discovering new rules from data
The well-formed.eigenfactor project visualizes information flow in science.
This diagram shows the citation links of the journal Nature.
Copyright belongs to http://well-formed.eigenfactor.org
![Page 8: BIG DATA Challenges & Opportunities Search Feeling Lucky Lei Chen InternetPictures Clips Maps News Shop Email more 1](https://reader035.vdocuments.us/reader035/viewer/2022062404/5516fcaf550346f5558b4e81/html5/thumbnails/8.jpg)
Defining Big Data
Wiki: Big data are datasets that grow so large that they become awkward to work with using on-hand database management tools. Difficulties include capture, storage, search, sharing, analytics and visualizing.
Gartner (2011): Big data is a popular term used to acknowledge the exponential growth, availability and use of information in the data-rich landscape of tomorrow.
![Page 9: BIG DATA Challenges & Opportunities Search Feeling Lucky Lei Chen InternetPictures Clips Maps News Shop Email more 1](https://reader035.vdocuments.us/reader035/viewer/2022062404/5516fcaf550346f5558b4e81/html5/thumbnails/9.jpg)
Features of Big Data
3V: Variety, Velocity and Volume
![Page 10: BIG DATA Challenges & Opportunities Search Feeling Lucky Lei Chen InternetPictures Clips Maps News Shop Email more 1](https://reader035.vdocuments.us/reader035/viewer/2022062404/5516fcaf550346f5558b4e81/html5/thumbnails/10.jpg)
Challenges
Network Topology
<key,vals> Object E-R Hierarchical
Applications
Storage (Reliability, Scalability, Availability)
Data Model (Interpretation, Representation)
Data Processing (Processing language, Optimization, Visualization)
Data Extraction (Acquisition, Integration, Representation)
![Page 11: BIG DATA Challenges & Opportunities Search Feeling Lucky Lei Chen InternetPictures Clips Maps News Shop Email more 1](https://reader035.vdocuments.us/reader035/viewer/2022062404/5516fcaf550346f5558b4e81/html5/thumbnails/11.jpg)
Data model challenges
<key,vals> Object E-R Hierarchical
• Volume: scale up, scale out, and scale in
• Velocity: “interactive” properties to facilitate processing
• Variety: simple but unified, to adapt to heterogeneity
Existing data models are not satisfactory: functionality vs. simplicity
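To make the data models named above more concrete, here is a minimal sketch of one record in three shapes: a hierarchical (nested) document, flat <key, value> pairs, and relational-style rows. The sensor record, its field names, and the helper functions are invented for illustration and do not come from the slides.

```python
# Hypothetical sensor record, used only for illustration.
hierarchical = {
    "sensor": {
        "id": "s1",
        "location": {"lat": 22.3, "lon": 114.2},
        "readings": [{"t": 0, "value": 1.5}, {"t": 1, "value": 1.7}],
    }
}


def to_key_values(node, prefix=""):
    """Flatten a nested document into <key, value> pairs with path-like keys."""
    pairs = {}
    if isinstance(node, dict):
        for key, child in node.items():
            pairs.update(to_key_values(child, f"{prefix}/{key}"))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            pairs.update(to_key_values(child, f"{prefix}/{index}"))
    else:
        pairs[prefix] = node  # leaf value: store it under its full path
    return pairs


def to_rows(doc):
    """Project the readings into flat rows of (sensor_id, t, value)."""
    sensor = doc["sensor"]
    return [(sensor["id"], r["t"], r["value"]) for r in sensor["readings"]]


key_values = to_key_values(hierarchical)
rows = to_rows(hierarchical)
```

The key-value form is the simplest to partition but hides the structure inside the key strings, while the nested form keeps the structure at the price of less uniform access. This is one way to read the "functionality vs. simplicity" tension.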
![Page 12: BIG DATA Challenges & Opportunities Search Feeling Lucky Lei Chen InternetPictures Clips Maps News Shop Email more 1](https://reader035.vdocuments.us/reader035/viewer/2022062404/5516fcaf550346f5558b4e81/html5/thumbnails/12.jpg)
Storage challenges
Storage concerns:
• Reliability: data is safe and trustworthy
• Availability: data is accessible
• Scalability: data operation performance does not degrade as data size grows
However, the CAP theorem is the limiting constraint: no one-size-fits-all solution exists
![Page 13: BIG DATA Challenges & Opportunities Search Feeling Lucky Lei Chen InternetPictures Clips Maps News Shop Email more 1](https://reader035.vdocuments.us/reader035/viewer/2022062404/5516fcaf550346f5558b4e81/html5/thumbnails/13.jpg)
Storage challenges
CAP Theorem
• Consistency
• Availability
• Partition tolerance
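The trade-off behind the three CAP properties can be sketched with a toy two-replica store. This is an illustrative model only, not how any particular system works: the class names and the "CP" and "AP" mode labels are assumptions made for this sketch. During a partition, the "CP" mode refuses writes to stay consistent, while the "AP" mode accepts them and lets the replicas diverge.

```python
class Replica:
    def __init__(self):
        self.data = {}


class TwoNodeStore:
    """Toy two-replica store; `mode` is "CP" or "AP" (labels assumed here)."""

    def __init__(self, mode):
        self.mode = mode
        self.replicas = [Replica(), Replica()]
        self.partitioned = False  # True when the replicas cannot reach each other

    def write(self, node, key, value):
        if not self.partitioned:
            for replica in self.replicas:  # synchronous replication to both
                replica.data[key] = value
            return True
        if self.mode == "CP":
            return False  # refuse the write: consistent but unavailable
        self.replicas[node].data[key] = value  # AP: accept locally, diverge
        return True

    def read(self, node, key):
        return self.replicas[node].data.get(key)


def demo(mode):
    store = TwoNodeStore(mode)
    store.write(0, "x", 1)             # replicated to both nodes
    store.partitioned = True           # the network splits
    accepted = store.write(0, "x", 2)  # write arrives at node 0 only
    return accepted, store.read(0, "x"), store.read(1, "x")
```

Calling `demo("CP")` returns `(False, 1, 1)`: the write is rejected and both replicas agree. Calling `demo("AP")` returns `(True, 2, 1)`: the write succeeds but the replicas now disagree.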
![Page 14: BIG DATA Challenges & Opportunities Search Feeling Lucky Lei Chen InternetPictures Clips Maps News Shop Email more 1](https://reader035.vdocuments.us/reader035/viewer/2022062404/5516fcaf550346f5558b4e81/html5/thumbnails/14.jpg)
Storage challenges
ACID vs. BASE
RDBMS (ACID)
• Atomic
• Consistent
• Isolated
• Durable
NoSQL (BASE)
• Basically Available
• Soft-state
• Eventually consistent
CAP triangle (C, A, P), with systems grouped as on the slide:
• BigTable, HyperTable, HBase, MongoDB, Redis, Scalaris, etc.
• RDBMS
• Dynamo, CouchDB, Cassandra, SimpleDB, Tokyo Cabinet, Riak, Voldemort, etc.
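As a rough illustration of "soft state" and "eventually consistent", the sketch below uses last-write-wins merging between two replicas. Last-write-wins is only one of several conflict-resolution policies and is not claimed to be what any system listed above does. The class name, the caller-supplied logical timestamps, and the sample data are assumptions of this sketch.

```python
class LwwReplica:
    """Replica holding key -> (timestamp, value); the newest timestamp wins."""

    def __init__(self):
        self.store = {}

    def put(self, key, value, timestamp):
        current = self.store.get(key)
        # Keep the incoming value only if it is strictly newer.
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def get(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]

    def merge_from(self, other):
        """Anti-entropy step: pull every entry held by another replica."""
        for key, (timestamp, value) in other.store.items():
            self.put(key, value, timestamp)


a, b = LwwReplica(), LwwReplica()
a.put("cart", ["book"], timestamp=1)
b.put("cart", ["book", "pen"], timestamp=2)
before = (a.get("cart"), b.get("cart"))  # replicas disagree (soft state)
a.merge_from(b)
b.merge_from(a)
after = (a.get("cart"), b.get("cart"))   # replicas have converged
```

Between the two writes and the merge, a reader may see either value depending on which replica it contacts. Only after the replicas exchange state do they agree. An ACID system would instead prevent the disagreement from being visible at all.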
![Page 15: BIG DATA Challenges & Opportunities Search Feeling Lucky Lei Chen InternetPictures Clips Maps News Shop Email more 1](https://reader035.vdocuments.us/reader035/viewer/2022062404/5516fcaf550346f5558b4e81/html5/thumbnails/15.jpg)
Management challenges
“Solving 'Big Data' Challenge Involves More Than Just Managing Volumes of Data” Gartner(2011)
Big data management
• Functionality: indexing & partitioning
• Flexibility: adaptation to new requirements and new components
![Page 16: BIG DATA Challenges & Opportunities Search Feeling Lucky Lei Chen InternetPictures Clips Maps News Shop Email more 1](https://reader035.vdocuments.us/reader035/viewer/2022062404/5516fcaf550346f5558b4e81/html5/thumbnails/16.jpg)
Management challenges
E.g., Indexing over big data
• Volume: large volume of data captured every time unit
  o Requires: distributed adaptive index
  o Leads to: significant cost of metadata exchange
• Variety: data captured from different sources
  o Requires: distributed adaptive index
  o Leads to: ambiguity in indexing the same object
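A minimal sketch of a distributed index follows, assuming simple hash partitioning of index terms across nodes. The slides do not say which scheme is meant, so the class, the helper `moved_terms`, and the sample terms are all hypothetical. The helper counts how many terms would change node if the cluster were resized, which is one rough proxy for the metadata that has to be exchanged when the index adapts.

```python
import zlib
from collections import defaultdict


class PartitionedIndex:
    """Toy inverted index whose terms are hash-partitioned across nodes."""

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        self.nodes = [defaultdict(set) for _ in range(num_nodes)]

    def node_for(self, term):
        # crc32 is stable across runs, unlike Python's built-in hash() for str.
        return zlib.crc32(term.encode("utf-8")) % self.num_nodes

    def add(self, term, object_id):
        self.nodes[self.node_for(term)][term].add(object_id)

    def lookup(self, term):
        # Only the single node that owns the term is consulted.
        return set(self.nodes[self.node_for(term)].get(term, set()))

    def all_terms(self):
        return [term for node in self.nodes for term in node]


def moved_terms(index, new_num_nodes):
    """Count terms that would change node if the cluster were resized."""
    return sum(
        1
        for term in index.all_terms()
        if zlib.crc32(term.encode("utf-8")) % new_num_nodes != index.node_for(term)
    )


index = PartitionedIndex(num_nodes=4)
for object_id, term in enumerate(["flood", "earthquake", "storm", "flood"]):
    index.add(term, object_id)
```

With plain modulo hashing, changing the node count can relocate many terms at once. Schemes such as consistent hashing exist to reduce that movement, but they are outside what the slides cover.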
![Page 17: BIG DATA Challenges & Opportunities Search Feeling Lucky Lei Chen InternetPictures Clips Maps News Shop Email more 1](https://reader035.vdocuments.us/reader035/viewer/2022062404/5516fcaf550346f5558b4e81/html5/thumbnails/17.jpg)
Challenges on processing
• New query language (algebra)
| Desired | Sacrifices & Overhead |
| --- | --- |
| Flexibility | Complexity in data modeling |
| “Relational” support | Poor scalability |
| “Uncertain” support | Poor scalability and significant computing overhead |
| Scalability | Less functionality |
| Efficiency & effectiveness | Poor scalability |
![Page 18: BIG DATA Challenges & Opportunities Search Feeling Lucky Lei Chen InternetPictures Clips Maps News Shop Email more 1](https://reader035.vdocuments.us/reader035/viewer/2022062404/5516fcaf550346f5558b4e81/html5/thumbnails/18.jpg)
Challenges on processing
• New computing paradigm for processing
| Distributed Computing Paradigm | Limitations |
| --- | --- |
| Message Passing | Poor scalability and fault tolerance |
| Unified Access | Invalidated efficiency over large computing nodes |
| MapReduce | Poor functionality |
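For readers unfamiliar with the MapReduce paradigm listed above, the following single-process sketch shows its three steps (map, shuffle, reduce) on the classic word-count task. It is a teaching model under simplifying assumptions: everything runs in one process, and the function names are chosen for this example rather than taken from any framework.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data", "Big challenges", "data data"])
```

Here `counts` is `{"big": 2, "data": 3, "challenges": 1}`. The model is easy to parallelize because each map call and each reduce call is independent, but computations that do not fit the map-then-reduce shape (for example, iterative algorithms or multi-way joins) need extra passes or workarounds, which is one way to read the "poor functionality" limitation.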
![Page 19: BIG DATA Challenges & Opportunities Search Feeling Lucky Lei Chen InternetPictures Clips Maps News Shop Email more 1](https://reader035.vdocuments.us/reader035/viewer/2022062404/5516fcaf550346f5558b4e81/html5/thumbnails/19.jpg)
Challenges on processing
• New optimization methodology
Load Balance Data Locality
High Parallelism Merging Cost
Less Network I/O Replicated Computing
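One common way to pursue the "less network I/O" goal is local pre-aggregation before data is shuffled between nodes, often called a combiner. The sketch below is an interpretation offered for illustration, not a technique the slides name. The partitions and function names are hypothetical, and the "network" is simulated by counting the pairs each approach would send.

```python
from collections import Counter


def shuffle_without_combiner(partitions):
    """Every (word, 1) pair is sent over the network to the reducers."""
    return [(word, 1) for words in partitions for word in words]


def shuffle_with_combiner(partitions):
    """Each partition pre-aggregates locally, sending one pair per distinct word."""
    sent = []
    for words in partitions:
        sent.extend(Counter(words).items())
    return sent


def reduce_counts(pairs):
    """Sum the counts per word, whichever shuffle produced the pairs."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


partitions = [["a", "a", "b"], ["a", "b", "b", "b"]]
plain = shuffle_without_combiner(partitions)
combined = shuffle_with_combiner(partitions)
```

Both approaches reduce to the same totals, but `plain` carries 7 pairs while `combined` carries 4. The saving is paid for with extra computation on each node, so the optimization goals trade off against each other rather than being achievable all at once.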
![Page 20: BIG DATA Challenges & Opportunities Search Feeling Lucky Lei Chen InternetPictures Clips Maps News Shop Email more 1](https://reader035.vdocuments.us/reader035/viewer/2022062404/5516fcaf550346f5558b4e81/html5/thumbnails/20.jpg)
Opportunities
Why “Big Data”?
• We are empowered to learn knowledge and process information more accurately, effectively and efficiently.
Natural Science Study Fundamental Scientific Research
Social Civilization Daily Life
Big Data
![Page 21: BIG DATA Challenges & Opportunities Search Feeling Lucky Lei Chen InternetPictures Clips Maps News Shop Email more 1](https://reader035.vdocuments.us/reader035/viewer/2022062404/5516fcaf550346f5558b4e81/html5/thumbnails/21.jpg)
Big Data for natural science study
• E.g., natural disaster forecasting and management
Flood, Earthquake, Extreme Weather
Forecasting
Management
Meteorological data
Geographic data
Population, transportation, urban design data
Economic data
![Page 22: BIG DATA Challenges & Opportunities Search Feeling Lucky Lei Chen InternetPictures Clips Maps News Shop Email more 1](https://reader035.vdocuments.us/reader035/viewer/2022062404/5516fcaf550346f5558b4e81/html5/thumbnails/22.jpg)
Big Data for fundamental scientific research
• E.g., bioinformatics and medicine
Gene technology and clinical medicine mutually promote each other
![Page 23: BIG DATA Challenges & Opportunities Search Feeling Lucky Lei Chen InternetPictures Clips Maps News Shop Email more 1](https://reader035.vdocuments.us/reader035/viewer/2022062404/5516fcaf550346f5558b4e81/html5/thumbnails/23.jpg)
Big Data for social civilization
• Light-speed information spreading & enormous knowledge
Quick event detection
Easy collaboration
Wondering where to get a really good cup of coffee?
Just tweet your question!
![Page 24: BIG DATA Challenges & Opportunities Search Feeling Lucky Lei Chen InternetPictures Clips Maps News Shop Email more 1](https://reader035.vdocuments.us/reader035/viewer/2022062404/5516fcaf550346f5558b4e81/html5/thumbnails/24.jpg)
Big Data for daily life
• Our lives can be much easier with more data… E.g., trip planning
Request: travel to Beijing
• 3-day stay
• Budget < $1000
• Forbidden City
• 10am meeting every day
Real-world incidents
• Traffic jam
• Luggage delay
• Bad weather
Adaptive agenda: predefined, then updated
![Page 25: BIG DATA Challenges & Opportunities Search Feeling Lucky Lei Chen InternetPictures Clips Maps News Shop Email more 1](https://reader035.vdocuments.us/reader035/viewer/2022062404/5516fcaf550346f5558b4e81/html5/thumbnails/25.jpg)
Opportunity highlights
• Volume
  o Capturing, storing and analyzing data helps us better understand the world
• Velocity
  o Guaranteed effective & efficient data processing
• Variety
  o Handling heterogeneous sources of data
Considering all the challenges and constraints, perhaps there is no one-size-fits-all solution.
However, application-dependent “Big Data” solutions are promising.
![Page 26: BIG DATA Challenges & Opportunities Search Feeling Lucky Lei Chen InternetPictures Clips Maps News Shop Email more 1](https://reader035.vdocuments.us/reader035/viewer/2022062404/5516fcaf550346f5558b4e81/html5/thumbnails/26.jpg)
Applications
Heterogeneous data management
• Search doctors
• Search universities (in progress)
…
Search Doctors (pipeline diagram)
• Sources: web pages on the Internet, hospital databases, search results from general-purpose search engines, news / rumors
• Data Extraction
• Data Integration
• Integrated Database: ~500,000 doctors & ~30,000 hospitals from 50+ GB of source data
• OLAP Query Processing
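The integration and OLAP steps of such a pipeline can be sketched in a few lines. Everything below is hypothetical: the two sources, the doctor records, the matching rule (same name and hospital after normalizing case and whitespace), and the function names are invented for illustration and say nothing about how the actual Search Doctors system works.

```python
from collections import Counter

# Hypothetical records from two sources; names and fields are invented.
web_pages = [
    {"name": "Dr. Li Wei", "hospital": "City Hospital", "dept": "Cardiology"},
    {"name": "Dr. Ann Wu", "hospital": "City Hospital", "dept": "Neurology"},
]
hospital_db = [
    {"name": "dr. li wei ", "hospital": "city hospital", "dept": "Cardiology"},
    {"name": "Dr. Bo Chen", "hospital": "East Clinic", "dept": "Cardiology"},
]


def normalize(record):
    """Integration key: case- and whitespace-insensitive (name, hospital)."""
    return (record["name"].strip().lower(), record["hospital"].strip().lower())


def integrate(*sources):
    """Merge sources, keeping the first record seen for each normalized key."""
    merged = {}
    for source in sources:
        for record in source:
            merged.setdefault(normalize(record), record)
    return list(merged.values())


def count_by(records, dimension):
    """OLAP-style roll-up: number of doctors per value of one dimension."""
    return dict(Counter(r[dimension].strip().lower() for r in records))


doctors = integrate(web_pages, hospital_db)
per_hospital = count_by(doctors, "hospital")
per_dept = count_by(doctors, "dept")
```

The duplicate entry for the same doctor collapses into one record, leaving three doctors. Real heterogeneous sources rarely match this cleanly, so the matching rule is usually the hard part of integration.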