Data Tools and the Data Scientist Shortage


Upload: wes-mckinney

Post on 15-Apr-2017


TRANSCRIPT

1 © Cloudera, Inc. All rights reserved.

Data Tools and the Data Scientist Shortage
Wes McKinney @wesmckinn
Data Summit @ Web Summit 2015-11-04

2

Me

3

Career theme: Serial creator of data tools

4

https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/

5

http://www.bloomberg.com/news/articles/2015-06-04/help-wanted-black-belts-in-data

6

“The United States alone faces a shortage of 140,000 to 190,000 people with analytical expertise and 1.5 million managers and analysts with the skills to understand and make decisions based on the analysis of big data.”

McKinsey & Co

http://www.mckinsey.com/features/big_data

7

Traditional view of Data Science

Source: Drew Conway, “The Data Science Venn Diagram”

8

Many Kinds of “Data People”

Source: Analyzing the Analyzers, Harris, Murphy, Vaisman

9

Many Kinds of “Data People”

Source: Analyzing the Analyzers, Harris, Murphy, Vaisman

10

Addressing the analytical shortage

Education, Culture, Tools

11

Data process

12

The “Great Decoupling” for Industry Analytics

UI

Compute

Storage

13

The “Great Decoupling” for Industry Analytics

UI

Compute

Storage

Accumulation of user time

Legacy technology: vertically-integrated solutions

14

Ubiquitous Real-Time Storage and Compute: A view from 2040

15

Data analysis hierarchy of needs

Data Storage / Access

Clean Data

Analysis and Visualization

Productivity tools / UI

16

Some data tooling UI innovations

17

Rejecting the “Highlander Fallacy”

18

SQL Programming: the “mainframe punch cards” of our time

19

Many SQL engines

… and more

20

Executing data science languages in the compute layer

UI: Ibis, SQL, Spark API, …

Compute: Analytic SQL, Spark, MapReduce

Storage: HDFS, Kudu, HBase

Python, R, Julia, …?


22

Thank you
Wes McKinney @wesmckinn
Views are my own