Data Science for the Internet of Things (IBM Analytics): presentation at the Chief Data Scientist,...
TRANSCRIPT
1
Data Science for the Internet of Things: Creating Explosive Disruption
Sam Lightstone, Distinguished Engineer, IBM Analytics
Agenda
• A new IBM and the era of data
• Watson Data Platform
• Data Science Experience
• dashDB Cloud Data Warehouse
• DataConfluence: Data Science for the Internet of Things
2
Finding Innovative Cancer Cures with Genomic Medicine
• 800 billion base pairs of DNA to analyze one brain tumor
• 23 million medical research articles with relevant findings
• 14.1 million cancer patients each year, 8.2 million deaths
6
IBM dashDB Cloud data warehouse
10
1. Fast data analytics: extreme speed
2. Load-and-go simplicity: a fully managed cloud service
3. In-database analytics: R, spatial, predictive
4. Cutting-edge technology: columnar, vectorized, in-memory optimized, analytics on compressed data
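The slide lists "analytics on compressed data" without saying how dashDB does it. As a generic illustration only (run-length encoding is an assumption here, not dashDB's documented encoding), the sketch below compresses a column into (value, run_length) pairs and computes a SUM and a filtered COUNT directly on those pairs, without expanding the column back into rows.

```python
from itertools import groupby


def rle_encode(column):
    """Run-length encode a column into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def rle_sum(runs):
    """SUM over the column: each run contributes value * run_length."""
    return sum(value * length for value, length in runs)


def rle_count_where(runs, predicate):
    """COUNT(*) WHERE predicate(value): the predicate is evaluated once per run, not once per row."""
    return sum(length for value, length in runs if predicate(value))


if __name__ == "__main__":
    # Hypothetical column of panel output readings (watts).
    column = [0, 0, 0, 0, 120, 120, 120, 300, 300, 0, 0]
    runs = rle_encode(column)
    print(runs)
    print(rle_sum(runs), sum(column))
    print(rle_count_where(runs, lambda v: v > 100))
```

The work done per query scales with the number of runs rather than the number of rows, which is the general reason columnar engines favor operating on encoded data.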
IBM dashDB cloud data warehouse
11
Scale from megabytes to petabytes
[Figure: MPP scale-out of dashDB with a CPU-optimized column store. Three servers (#1, #2, #3), each with CPUs, columnar acceleration and dynamic in-memory processing, hold twelve data shards between them.]
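The diagram shows data split into shards spread across servers but does not say how rows are assigned to shards. Hash partitioning on a distribution key is one common scheme in MPP systems; the sketch below assumes it, and all names (`NUM_SHARDS`, `SERVERS`, `shard_for`, `shard_map`, `server_for`) are hypothetical.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 12  # hypothetical: matches the twelve shards drawn in the diagram
SERVERS = ["server1", "server2", "server3"]


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a distribution key to a shard with a stable hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shard_map(servers, num_shards: int = NUM_SHARDS) -> dict:
    """Assign shards to servers round-robin, so each server owns num_shards / len(servers) shards."""
    return {shard: servers[shard % len(servers)] for shard in range(num_shards)}


def server_for(key: str, servers=SERVERS) -> str:
    """Route a row to the server that owns its shard."""
    return shard_map(servers)[shard_for(key)]


if __name__ == "__main__":
    print(Counter(shard_map(SERVERS).values()))  # shards owned per server
    rows = [f"device-{i}" for i in range(1000)]
    print(Counter(server_for(r) for r in rows))  # rows routed per server
```

Keeping the shard count fixed and mapping shards to servers in a separate table means scaling out changes only that table, not the hash computed for each row.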
18
Cisco believes the IoT market could generate $14.4 trillion.
IDC predicts that IoT will generate nearly $9 trillion in annual sales by 2020.
Opportunities for IoT Data Science: Trucking Systems Example
Leading provider of enterprise software, primarily to transportation and logistics operations.
• Need to run analytics on the fleet per region in a given week or month.
• Examples of analytics that they cannot easily obtain today:
– Average idle time of the fleet
– Average miles and fuel usage
– Most problematic metrics by region
• Devices:
– In-cab devices reading sensors
– Cell phone apps
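To make the trucking example concrete, here is a sketch of the per-region, per-week rollup the slide describes (average idle time, average miles and fuel usage). The record layout (`TruckReading` with `region`, `day`, `idle_minutes`, `miles`, `fuel_gallons`) is hypothetical; the slide does not show the customer's schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class TruckReading:
    # Hypothetical per-truck daily summary reported by an in-cab device.
    truck_id: str
    region: str
    day: date
    idle_minutes: float
    miles: float
    fuel_gallons: float


def weekly_region_averages(readings):
    """Group readings by (region, ISO year, ISO week) and average idle time, miles and fuel."""
    groups = defaultdict(list)
    for r in readings:
        iso_year, iso_week, _ = r.day.isocalendar()
        groups[(r.region, iso_year, iso_week)].append(r)
    result = {}
    for key, rows in groups.items():
        n = len(rows)
        result[key] = {
            "avg_idle_minutes": sum(r.idle_minutes for r in rows) / n,
            "avg_miles": sum(r.miles for r in rows) / n,
            "avg_fuel_gallons": sum(r.fuel_gallons for r in rows) / n,
        }
    return result


if __name__ == "__main__":
    readings = [
        TruckReading("T1", "West", date(2016, 11, 7), 40, 300, 50),
        TruckReading("T2", "West", date(2016, 11, 8), 20, 500, 70),
        TruckReading("T3", "East", date(2016, 11, 7), 60, 200, 40),
    ]
    for key, stats in sorted(weekly_region_averages(readings).items()):
        print(key, stats)
```

A monthly rollup would follow the same pattern with `(r.region, r.day.year, r.day.month)` as the grouping key.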
19
22
Introducing…
DataConfluence: The Extreme Distributed Processing Service for Data Science & Analytics
DataConfluence Early Performance Study: electrical solar panel data on 24 Raspberry Pi devices with MySQL databases.
In this early experiment we study the performance of an aggregation query over a constellation of 24 devices holding real-world data from electrical solar panels in MySQL.
23
[Chart: Analytics at the Edge. Query 2: six aggregates and grouping on 2.5 years of data. Execution time (s, axis 0 to 3,400) for Hive & MapReduce and DataConfluence; annotated speedups of 83x and 21x.]
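The slides do not say how DataConfluence evaluates Query 2. One standard way to run a grouped aggregation over many small databases is for each device to compute partial aggregates over its own rows and for a coordinator to merge them, carrying AVG as (sum, count) so it merges exactly. The sketch below assumes that approach; `Partial`, `local_partials` and `merge_all` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Partial:
    # Mergeable partial aggregate for one group on one device.
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def merge(self, other: "Partial") -> "Partial":
        return Partial(self.count + other.count, self.total + other.total,
                       min(self.minimum, other.minimum), max(self.maximum, other.maximum))

    @property
    def average(self) -> float:
        return self.total / self.count


def local_partials(rows):
    """Runs on one device: fold its own (group_key, value) rows into partial aggregates."""
    partials = defaultdict(Partial)
    for key, value in rows:
        partials[key].add(value)
    return dict(partials)


def merge_all(per_device):
    """Runs on the coordinator: combine per-device partials group by group."""
    merged = {}
    for partials in per_device:
        for key, p in partials.items():
            merged[key] = merged[key].merge(p) if key in merged else p
    return merged


if __name__ == "__main__":
    # Hypothetical (panel_id, watts) readings held on two devices.
    device_a = [("panel-1", 100.0), ("panel-1", 300.0), ("panel-2", 50.0)]
    device_b = [("panel-1", 200.0), ("panel-2", 150.0)]
    result = merge_all([local_partials(device_a), local_partials(device_b)])
    for key in sorted(result):
        p = result[key]
        print(key, p.count, p.average, p.minimum, p.maximum)
```

Only a few numbers per group cross the network instead of every raw row, which is the usual motivation for pushing aggregation out to the devices holding the data.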
The Power of Many Together
• Video of the constellation growing to 349 nodes.
• Network stays compact.
• 2 to 10 links between nodes.
• No manual configuration.
• Actual system test performed by Emerging Technology Services, IBM Hursley, United Kingdom.
24
Real-world demonstration…
• Worldwide firsts
– True Bluemix service for analytics over distributed data
– RStudio query over distributed IoT data
– Spark and Jupyter notebooks on IoT data
• The setup
– 24 Raspberry Pis with real-world data from the electrical output of multiple solar panels (at Bob’s house)
– Format: MySQL database
• The data
– 41 solar panels
– 2½ years' worth of data
– 1,500 data points per panel per day
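A back-of-the-envelope estimate of the dataset size from these figures, assuming 365-day years, that every panel reported all 1,500 points every day, and (for the second line only) that the data was spread evenly over the 24 devices; the slide itself does not state a total:

```latex
41 \text{ panels} \times 1500 \,\frac{\text{points}}{\text{panel}\cdot\text{day}} \times 365 \,\frac{\text{days}}{\text{year}} \times 2.5 \text{ years} = 56{,}118{,}750 \approx 5.6 \times 10^{7} \text{ data points}

\frac{56{,}118{,}750}{24 \text{ devices}} \approx 2.3 \times 10^{6} \text{ data points per device}
```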
25
Simplify. Use DataConfluence whenever you want to obtain data analytics on multiple data sources.
26
Click. Deploy. Query. Visualize.
27
Jan 2017: Ready for trials!
• Scale to 10,000 data sources
• Query paradigms: Spark, SQL, R, Python
• Data sources: supports JDBC sources, text, Excel
• Operating systems: Linux, Windows, Android, iOS
January 2017. Interested? Email us!
info@data-confluence.com
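The trial feature list names JDBC sources and text among supported inputs. DataConfluence's own API is not shown in the slides, so the sketch below does not attempt it; it only illustrates the general idea of running one aggregation across heterogeneous sources, using Python's built-in `sqlite3` as a stand-in for a JDBC database and `csv` for a text file. The table layout and the names `rows_from_sql`, `rows_from_csv` and `total_by_panel` are hypothetical.

```python
import csv
import io
import sqlite3
from typing import Iterable, Iterator, Tuple

Row = Tuple[str, float]  # hypothetical (panel_id, watts) record


def rows_from_sql(conn: sqlite3.Connection) -> Iterator[Row]:
    """Stand-in for a JDBC/MySQL source: stream rows with a plain SQL query."""
    for panel_id, watts in conn.execute("SELECT panel_id, watts FROM readings"):
        yield panel_id, float(watts)


def rows_from_csv(text_file) -> Iterator[Row]:
    """Text source: a CSV file whose header row is 'panel_id,watts'."""
    for record in csv.DictReader(text_file):
        yield record["panel_id"], float(record["watts"])


def total_by_panel(sources: Iterable[Iterable[Row]]) -> dict:
    """One aggregation (SUM of watts per panel) over every source, whatever its format."""
    totals = {}
    for source in sources:
        for panel_id, watts in source:
            totals[panel_id] = totals.get(panel_id, 0.0) + watts
    return totals


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (panel_id TEXT, watts REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", [("p1", 100.0), ("p2", 40.0)])
    text = io.StringIO("panel_id,watts\np1,25.5\np3,10\n")
    print(total_by_panel([rows_from_sql(conn), rows_from_csv(text)]))
```

Each source is reduced to the same row shape before aggregating, so adding another format only means writing one more small adapter.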
Make the most of your data. Watson Data Platform: a ubiquitous data platform that fuels the Cognitive Era
1. dashDB: scale your data science to terabytes and petabytes
2. Data Science Experience: collaborate and leverage leading open source technologies for data science
3. DataConfluence: run data science analytics on distributed data, including massively distributed IoT data
28