data science for the internet of things (ibm analytics) presentation at the chief data scientist,...

Data Science for the Internet of Things: Creating Explosive Disruption
Sam Lightstone, Distinguished Engineer, IBM Analytics

Upload: corinium-coriniumglobal

Post on 13-Apr-2017


TRANSCRIPT


Agenda
• A new IBM and the era of data
• Watson Data Platform
• Data Science Experience
• dashDB Cloud Data Warehouse
• DataConfluence: Data Science for Internet of Things

©2016 IBM Corporation

Data is the basis of competitive advantage

The world's largest taxi company owns NO vehicles.

The world's largest accommodations provider owns no real estate.

Finding Innovative Cancer Cures with Genomic Medicine
• 800 billion base pairs of DNA to analyze one brain tumor
• 23 million medical research articles with relevant findings
• 14.1 million cancer patients each year, 8.2 million deaths

Visit: https://youtu.be/aWShHDhF8Yo

The future belongs to data scientists

Data is the new basis of competitive value

Retail Banking

Oil & Gas

Healthcare

Watson Data Platform & Data Science Experience

Visit: https://youtu.be/QzxAgwzx7P8

IBM dashDB Cloud data warehouse

1. Fast data analytics – Extreme speed
2. Load-and-go simplicity. A fully managed cloud service
3. In-database analytics for R, Spatial, Predictive
4. Cutting edge technology. Columnar, Vectorized, In-memory optimized, analytics on compressed data

IBM dashDB cloud data warehouse

Scale from megabytes to petabytes

MPP Scale-out of dashDB with CPU-optimized column store

[Diagram: Server #1, Server #2, and Server #3, each with CPUs, Columnar Acceleration, and Dynamic In-Memory Processing over columnar storage; twelve data shards are spread across the servers.]

dashDB on AWS vs Competitor: Identical hardware

A little Spark in every dashDB: Data Science and Machine Learning

VISIT:  dashDB.com  

Data Science & The Internet of Things

Cisco believes the IoT market could generate $14.4 trillion

IDC predicts that IoT will generate nearly $9 trillion in annual sales by 2020

Opportunities for IoT Data Science: Trucking Systems Example

Leading provider of enterprise software primarily to transportation and logistics operations.
Need to run analytics on the fleet per region in a given week or month.

Examples of analytics that they cannot easily obtain today
• Average idle time of the fleet
• Average miles and fuel usage
• Most problematic metrics by region

Devices
• In cab devices reading sensors
• Cell phone apps
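The fleet analytics listed above (average idle time, average miles and fuel usage, grouped per region and week) can be sketched in a few lines of Python. This is only an illustration: the record layout, field names, and sample values are hypothetical and do not come from the presentation or from any real trucking system.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical telemetry: one record per truck per day.
# All field names and values are assumptions made for illustration.
records = [
    {"region": "East", "week": "2017-W01", "idle_minutes": 40.0, "miles": 300.0, "fuel_gallons": 50.0},
    {"region": "East", "week": "2017-W01", "idle_minutes": 60.0, "miles": 500.0, "fuel_gallons": 50.0},
    {"region": "West", "week": "2017-W01", "idle_minutes": 30.0, "miles": 200.0, "fuel_gallons": 25.0},
]


def fleet_summary(rows):
    """Group records by (region, week) and compute per-group averages."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["region"], row["week"])].append(row)

    summary = {}
    for key, members in groups.items():
        total_miles = sum(r["miles"] for r in members)
        total_fuel = sum(r["fuel_gallons"] for r in members)
        summary[key] = {
            "avg_idle_minutes": mean(r["idle_minutes"] for r in members),
            "avg_miles": mean(r["miles"] for r in members),
            "avg_fuel_gallons": mean(r["fuel_gallons"] for r in members),
            # Fuel efficiency over the whole group; None if no fuel was recorded.
            "miles_per_gallon": total_miles / total_fuel if total_fuel else None,
        }
    return summary


summary = fleet_summary(records)
for (region, week), stats in sorted(summary.items()):
    print(region, week, stats)
```

The hard part in the scenario on the slide is not this calculation but that the records live on many in-cab devices and phones rather than in one list.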


Introducing…

DATACONFLUENCE: The Extreme Distributed Processing Service For Data Science & Analytics

DataConfluence Early Performance Study
Electrical solar panel data on 24 Raspberry Pi, with MySQL databases.

In this early experiment we study the performance of an aggregation query over a constellation of 24 devices, holding real-world data from electrical solar panels in MySQL.

[Chart: Query 2: Six aggregates and grouping on 2.5 years of data. Execution Time (s), axis from 0 to 3,400, for Analytics at the Edge, Hive & MapReduce, and Data Confluence. Speedup labels on the chart: 83x and 21x.]
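One common way to run an aggregation query over many small databases is to compute partial aggregates on each device and merge them centrally. The sketch below illustrates that general technique only; the presentation does not describe how DataConfluence executes queries, so this should not be read as its implementation. In-memory `sqlite3` databases stand in for the MySQL databases on the Raspberry Pis, and the `readings` table and its columns are hypothetical.

```python
import sqlite3


def make_device(readings):
    """Create an in-memory database standing in for one device's local store."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per reading from a solar panel.
    conn.execute("CREATE TABLE readings (panel_id INTEGER, watts REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", readings)
    return conn


# The partial aggregate each device computes locally.
PARTIAL_SQL = """
    SELECT panel_id, COUNT(*), SUM(watts), MIN(watts), MAX(watts)
    FROM readings
    GROUP BY panel_id
"""


def distributed_aggregate(devices):
    """Run the partial query on every device and merge the results per panel."""
    merged = {}
    for conn in devices:
        for panel_id, count, total, low, high in conn.execute(PARTIAL_SQL):
            if panel_id not in merged:
                merged[panel_id] = {"count": count, "sum": total, "min": low, "max": high}
            else:
                m = merged[panel_id]
                m["count"] += count
                m["sum"] += total
                m["min"] = min(m["min"], low)
                m["max"] = max(m["max"], high)
    # The average is derived from the merged sum and count,
    # never by averaging the per-device averages.
    for m in merged.values():
        m["avg"] = m["sum"] / m["count"]
    return merged


devices = [
    make_device([(1, 100.0), (1, 200.0), (2, 50.0)]),
    make_device([(1, 300.0), (2, 150.0)]),
]
result = distributed_aggregate(devices)
for panel_id in sorted(result):
    print(panel_id, result[panel_id])
```

Only one small row per group leaves each device, which is why this pattern tends to suit constrained hardware and slow links.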

The Power of Many Together

• Video of constellation growing to 349 Nodes.
• Network stays compact.
• 2 and 10 links between nodes
• No manual configuration.
• Actual system test performed by Emerging Technology Services, IBM Hursley, United Kingdom

Real world demonstration …

• Worldwide firsts
  – True Bluemix service for Analytics over distributed data
  – R Studio query over distributed IoT data
  – Spark and Jupyter notebooks on IoT data
• The setup
  – 24 Raspberry Pis with real-world data from the electrical output of multiple solar panels (at Bob's house).
  – Format: MySQL database
• The data
  – 41 solar panels
  – 2 ½ years' worth of data
  – 1500 data points per panel per day
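The figures above imply a data set of roughly 56 million data points. The short calculation below shows the arithmetic; it assumes 365-day years, that every panel reported on every day, and (for the per-device figure) that the data was spread evenly across the 24 Raspberry Pis. None of these assumptions is stated in the presentation.

```python
PANELS = 41
POINTS_PER_PANEL_PER_DAY = 1500
YEARS = 2.5
DAYS_PER_YEAR = 365   # assumption: leap days ignored
DEVICES = 24          # Raspberry Pis in the setup

points_per_day = PANELS * POINTS_PER_PANEL_PER_DAY
total_points = int(points_per_day * DAYS_PER_YEAR * YEARS)
# Assumption: data is spread evenly across the devices.
points_per_device = total_points / DEVICES

print(f"Data points per day:    {points_per_day:,}")
print(f"Data points in total:   {total_points:,}")
print(f"Data points per device: {points_per_device:,.0f}")
```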


Simplify.
Use DataConfluence whenever you want to obtain data analytics on multiple data sources.

Click. Deploy. Query. Visualize.

Jan 2017: Ready for trials!

• Scale to 10,000 data sources
• Query paradigms: Spark, SQL, R, Python
• Data sources: Supports JDBC sources, Text, Excel
• Operating systems: Linux, Windows, Android, iOS

Interested? Email us!
info@data-confluence.com

Make the most of your data

Watson Data Platform – a ubiquitous data platform that fuels the Cognitive Era

1. dashDB – Scale your Data Science to terabytes and petabytes
2. Data Science Experience – Collaborate, and leverage leading open source technologies for Data Science
3. DataConfluence – Run Data Science analytics on distributed data, including massively distributed IoT data
