1 © Cloudera, Inc. All rights reserved.
Data Tools and the Data Scientist Shortage
Wes McKinney @wesmckinn
Data Summit @ Web Summit 2015-11-04
https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/
http://www.bloomberg.com/news/articles/2015-06-04/help-wanted-black-belts-in-data
“The United States alone faces a shortage of 140,000 to 190,000 people with analytical expertise and 1.5 million managers and analysts with the skills to understand and make decisions based on the analysis of big data.”
McKinsey & Co
http://www.mckinsey.com/features/big_data
Source: Drew Conway, “The Data Science Venn Diagram”
Traditional view of Data Science
Analyzing the Analyzers, Harris, Murphy, Vaisman
Many Kinds of “Data People”
The “Great Decoupling” for Industry Analytics
UI
Compute
Storage
The “Great Decoupling” for Industry Analytics
UI
Compute
Storage
Accumulation of user time
Legacy technology: vertically-integrated solutions
Ubiquitous Real-Time Storage and Compute: A view from 2040
Data analysis hierarchy of needs
Data Storage / Access
Clean Data
Analysis and Visualization
Productivity tools / UI
Executing data science languages in the compute layer
UI: Ibis, SQL, Spark API, …
Compute: Analytic SQL, Spark, MapReduce
Storage: HDFS, Kudu, HBase
Python, R, Julia, …?
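
The three layers above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Ibis or any other tool named on the slide is implemented: `GroupedMean` and `execute` are hypothetical names, and an in-memory SQLite database stands in for both the compute engine and the storage system. The point it shows is that the UI layer only describes the query, while the engine that holds the data does the work.

```python
import sqlite3
from dataclasses import dataclass


# UI layer (hypothetical): an object that describes a query without running it.
@dataclass(frozen=True)
class GroupedMean:
    table: str
    key: str
    value: str

    def to_sql(self) -> str:
        # Identifiers are trusted in this sketch; a real tool would validate them.
        return (
            f"SELECT {self.key}, AVG({self.value}) AS mean_{self.value} "
            f"FROM {self.table} GROUP BY {self.key} ORDER BY {self.key}"
        )


def execute(expr: GroupedMean, conn: sqlite3.Connection) -> list:
    """Compute layer: the engine runs the compiled SQL next to the data."""
    return conn.execute(expr.to_sql()).fetchall()


# Storage layer stand-in (assumption): an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (city TEXT, fare REAL)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [("Dublin", 10.0), ("Dublin", 14.0), ("Lisbon", 8.0)],
)

expr = GroupedMean(table="trips", key="city", value="fare")
result = execute(expr, conn)
print(result)  # [('Dublin', 12.0), ('Lisbon', 8.0)]
```

Because the expression object knows nothing about SQLite, the same description could in principle be compiled for a different engine without changing the code the analyst writes.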