NoSQL & Big Data Analytics: History, Hype, Opportunities
DESCRIPTION
Looking at NoSQL and Big Data Analytics as an evolution from relational databases, and going behind the hype. You can find more on this topic in my blog at: http://innovation-edge.blogspot.com/

Thanks to Gregory Piatetsky-Shapiro for the second half of the slides.

TRANSCRIPT
1
analyze(NoSQL, BigData); /* history, hype, opportunities */
// By: Vishy Poosala
// Head of Bell Labs, India
// @vishyp
2
The dark ages of COBOL
3
...then Codd said: let there be tables
Rows & Columns
Normal Forms
ACID
SQL
4
www.data-for-humans.com
WHAT COLUMNS?
SET-VALUED ATTRIBUTES
Schema Evolution
XML
5
Billions of Keys & Values
Cassandra
Dynamo
Hadoop
Big Table
GFS
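Dynamo and Cassandra spread their keys across many machines using consistent hashing. A minimal sketch of the idea follows, in Python with only the standard library, under simplifying assumptions: MD5 as the ring hash, a fixed number of virtual nodes per server, and no replication. The class and node names are hypothetical and do not reflect any system's actual API.

```python
import bisect
import hashlib
from typing import Iterable, List


class ConsistentHashRing:
    """Illustrative consistent-hash ring (hypothetical; no replication)."""

    def __init__(self, nodes: Iterable[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes            # assumed number of ring points per node
        self._hashes: List[int] = []    # sorted ring positions
        self._nodes: List[str] = []     # owner of each ring position
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value: str) -> int:
        # Stable 64-bit hash taken from an MD5 digest
        # (Python's built-in hash() is salted per process).
        digest = hashlib.md5(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add_node(self, node: str) -> None:
        # Place `vnodes` points for this node on the ring, keeping it sorted.
        for i in range(self.vnodes):
            h = self._hash(f"{node}#{i}")
            idx = bisect.bisect_left(self._hashes, h)
            self._hashes.insert(idx, h)
            self._nodes.insert(idx, node)

    def remove_node(self, node: str) -> None:
        # Drop every ring point owned by this node.
        kept = [(h, n) for h, n in zip(self._hashes, self._nodes) if n != node]
        self._hashes = [h for h, _ in kept]
        self._nodes = [n for _, n in kept]

    def node_for(self, key: str) -> str:
        # A key belongs to the first ring point clockwise from its hash,
        # wrapping around to the start of the ring.
        if not self._hashes:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._hashes, self._hash(key))
        return self._nodes[idx % len(self._hashes)]


if __name__ == "__main__":
    ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
    keys = [f"user:{i}" for i in range(10_000)]
    before = {k: ring.node_for(k) for k in keys}
    ring.add_node("node-d")
    moved = sum(before[k] != ring.node_for(k) for k in keys)
    print(f"keys moved after adding node-d: {moved / len(keys):.1%}")
```

Adding a fourth node moves only the keys whose ring position now falls on one of the new node's points (roughly a quarter of them), whereas a naive `hash(key) % n` scheme would remap most keys when n goes from 3 to 4.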
6
How would you build a super-fast, FB-scale chat service, in 2012?
(for example)
7
I want my own DB!
• Main memory: Memcached, Redis
• Distributed K-V: MongoDB
• Versions: CouchDB
• Social graphs: Neo4j
8
Era     | BIG | Data             | Analytics | Language
'60s    | KB  | Files            | Stats     | COBOL
'80-'96 | GB  | Tables           | OLAP Cube | SQL
'96-'07 | TB  | Semi-Structured  | Apps      | XML
'07-    | PB  | Variety, Dynamic | Mahout    | NoSQL
9
The following *AMAZING* slides are courtesy of Gregory Piatetsky-Shapiro, kdnuggets.com
You can find all the slides from his talk at:
http://www.slideshare.net/gpiatetskyshapiro/analytics-and-data-mining-industry-overview
Analyzing Analytics, Job Trends
10
Data Tsunami
• In 2010, enterprises stored 7 exabytes (7,000,000,000 GB) of new data (McKinsey)
• 90 percent of the world's data has been generated in the past two years (IBM)
Image with apologies to KDD-2011
11
Pre-history
From Google Ngram Viewer: English-language books
Note: Our analysis uses only English-language data. Other languages, especially Chinese, need to be considered for a full picture.
Statistics is the biggest term in the 20th century, but data mining and analytics appear in the late 1990s.
12
Recent History: Analytics, Data Mining, Knowledge Discovery
• Analytics has been used since 1800, but started to rise in 2005.
• Data Mining jumps around 1996 (soon after the first KDD conference) but declines after 2003 (the TIA controversy, associated with government invasion of privacy).
• Knowledge Discovery appears in 1989, jumps in 1996, and plateaus after 2000.
13
Google Trends: After 2006, Data Mining < Analytics
14
Google Insights: searches for "data mining" and "analytics -google" are most popular in India and the US
15
Analytics > Data Mining > Data Science
16
Data Science, Big Data
17
Data Types Analyzed/Mined
www.KDnuggets.com/polls/2011/data-types-analyzed-mined.html
18
Largest Dataset Analyzed?
2011 median dataset size: ~10-20 GB, vs. 8-10 GB in 2010.
Increase in the 10 GB to 1 PB range.
www.KDnuggets.com/polls/2011/largest-dataset-analyzed-data-mined.html
19
Which methods/algorithms did you use for data analysis in 2011?
Decision Trees
Regression
Clustering
Statistics
Visualization
Time series/Sequence analysis
Support Vector Machines (SVM)
Association rules
Ensemble methods
Text Mining
Neural Nets
Boosting
Bayesian
Bagging
Factor Analysis
Anomaly/Deviation detection
Social Network Analysis
Survival Analysis
Genetic algorithms
Uplift modeling
[Chart: % of analysts who used each method, scale 0% to 70%]
www.KDnuggets.com/polls/2011/algorithms-analytics-data-mining.html
20
Cloud Analytics is not common (yet)
www.KDnuggets.com/polls/2011/algorithms-analytics-data-mining.html
21
Shortage of Skills
• McKinsey: by 2018, the US faces a shortage of
– 140,000-190,000 people with deep analytical skills
– 1.5 million managers/analysts with the know-how to use the analysis of big data to make effective decisions
Source: www.mckinsey.com/mgi/publications/big_data/
22
Job data: Data Scientist
23
Jobs: Data Mining >> Data Scientist
24
“Ground” Analytics (LinkedIn Skills)
~75,000 with the Data Mining skill
~7,000 with Predictive Modeling
Also ~20,000 with Predictive Analytics (not related to Predictive Modeling??)
25
Analytics LinkedIn Skills
Machine Learning, Predictive Analytics
Text Mining, MapReduce
26
Big Data Bubble?
[Chart: Gartner Hype Cycle, with Big Data marked]
27
@vishyp
http://innovation-edge.blogspot.com