ibm interconnect 2016 - 3505 - cloud-based analytics of the weather company in ibm bluemix


Uploaded by torsten-steinbach on 12-Jan-2017


TRANSCRIPT

Page 1: IBM InterConnect 2016 - 3505 - Cloud-Based Analytics of The Weather Company in IBM Bluemix

Cloud-Based Analytics of The Weather Company in IBM Bluemix

Session # 3505

Torsten Steinbach @torsstei

Page 2

Please Note:

• IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion.

• Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision.

• The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract.

• The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.

Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.


Page 3

dashDB


Page 4

IBM dashDB – A unified data warehouse providing the best economics and flexibility to support your business


• dashDB – DWaaS (public cloud)
• Netezza – Appliance (on-prem)
• dashDB SDE (Software-Defined Environment) (new) – on-prem / private cloud

Page 5

Loading Weather Data

Page 6

Populating dashDB with Data

• Bluemix Cloud Storage (S3, Swift)
• Geodata in Esri Shapefiles
• GeoJSON
• On-Premises Databases
• Mobile App Data in Cloudant
• Twitter
• The Weather Company
• CSVs
• Open Data: data.gc.ca, data.gov, data.gov.uk, datahub.io, openAFRICA

Page 7

IBM Insights for Weather Service in Bluemix


Page 8

The Weather Company API

• API Docs: http://bit.ly/1LAvNxb
• Endpoint: /weather/v2/geocode/observations/historical.json
• Contact: Andy Rice

Page 9

The Weather Company Data Loader Bluemix App

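The core step of such a loader (take historical observations returned as JSON by The Weather Company API and turn them into rows that can be bulk-loaded into a dashDB table) can be sketched as a JSON-to-CSV flattening function. This is a minimal sketch under stated assumptions, not the app's actual code: the `observations` array and the field names `valid_time_gmt`, `temp`, `rh` and `wspd` are hypothetical placeholders for whatever the real response contains, `observations_to_csv` is an invented helper, and the sample payload is made up.

```python
import csv
import io
import json

# Hypothetical field names: check the real API response before relying on them.
FIELDS = ["valid_time_gmt", "temp", "rh", "wspd"]


def observations_to_csv(payload: str, location_id: str) -> str:
    """Flatten a JSON observations payload into CSV text for a bulk load.

    Assumes (hypothetically) a top-level "observations" array of flat
    objects; fields missing from an observation become empty cells.
    """
    doc = json.loads(payload)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Header row: upper-case column names, as is common for DB2/dashDB tables.
    writer.writerow(["LOCATION_ID"] + [f.upper() for f in FIELDS])
    for obs in doc.get("observations", []):
        writer.writerow([location_id] + [obs.get(f, "") for f in FIELDS])
    return out.getvalue()


if __name__ == "__main__":
    # Made-up sample payload; the second observation lacks "wspd".
    sample = json.dumps({"observations": [
        {"valid_time_gmt": 1451606400, "temp": 3, "rh": 80, "wspd": 11},
        {"valid_time_gmt": 1451610000, "temp": 2, "rh": 84},
    ]})
    print(observations_to_csv(sample, "LOC0001"))
```

Keeping the location as an explicit column lets observations for many locations land in one table, which is what makes later in-database analytics over them straightforward.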

Page 10

R & Python for dashDB

Page 11


Predictive Analytics With R In dashDB 1/3

• Built-in R runtime and RStudio

• ibmdbR package
– Data frames logically representing data physically residing in dashDB tables
> con <- idaConnect("BLUDB", "", "")
> idaInit(con)
> sysusage <- ida.data.frame('DB2INST1.SHOWCASE_SYSUSAGE')
> systems <- ida.data.frame('DB2INST1.SHOWCASE_SYSTEMS')
> systypes <- ida.data.frame('DB2INST1.SHOWCASE_SYSTYPES')

– Push-down of R data preparation to dashDB
> sysusage2 <- sysusage[sysusage$MEMUSED > 50000, c("SID", "MEMUSED", "USERS")]  # keep SID: the second merge joins on it
> mergedSys <- idaMerge(systems, systypes, by='TYPEID')
> mergedUsage <- idaMerge(sysusage2, mergedSys, by='SID')

– Push-down of analytic algorithms to in-database execution
> lm1 <- idaLm(MEMUSED ~ USERS, mergedUsage)

[Diagram: ibmdbR in the R runtime inside dashDB, used from RStudio in the browser or from a REST client; ibmdbR can also be used from any R runtime]
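To make "push-down of analytic algorithms" concrete: a simple linear regression such as `MEMUSED ~ USERS` needs only five sums over the data, which the database can compute in one aggregate query, so the rows themselves never leave the database. The sketch below illustrates that idea with Python's built-in `sqlite3` standing in for dashDB; it is not how `idaLm` is implemented internally, the helper name `pushed_down_simple_lm` is invented, and the four-row `sysusage` table is toy data. The `where` argument plays the role of the pushed-down `MEMUSED > 50000` filter.

```python
import sqlite3


def pushed_down_simple_lm(conn, table, x, y, where="1=1"):
    """Fit y = intercept + slope * x from sums computed inside the database.

    Only five aggregate values travel back to the client, whatever the
    table size. Table/column names are interpolated into the SQL, so they
    must come from trusted code, never from user input.
    """
    n, sx, sy, sxy, sxx = conn.execute(
        f"SELECT COUNT(*), SUM({x}), SUM({y}), SUM({x} * {y}), SUM({x} * {x}) "
        f"FROM {table} WHERE {where}"
    ).fetchone()
    if n < 2:
        raise ValueError("need at least two rows")
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("x has no variance")
    # Closed-form least-squares solution from the sufficient statistics.
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return intercept, slope


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sysusage (SID INTEGER, USERS INTEGER, MEMUSED INTEGER)")
    # Toy rows: the three above the threshold lie on MEMUSED = 20000 + 1000 * USERS.
    conn.executemany(
        "INSERT INTO sysusage VALUES (?, ?, ?)",
        [(1, 5, 1000), (2, 40, 60000), (3, 50, 70000), (4, 60, 80000)],
    )
    print(pushed_down_simple_lm(conn, "sysusage", "USERS", "MEMUSED", "MEMUSED > 50000"))
    # prints (20000.0, 1000.0)
```

The single-pass sums are the simplest choice for a sketch; a production implementation would also care about numerical stability and about more than one predictor.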

Page 12

Predictive Analytics With R In dashDB 2/3

– Dynamite-native implementation of statistical functions
  • colnames, cor, cov, dim, head, length, max, mean, min, names, print, sd, summary, var

– Logically derived columns pushed down to Dynamite
> myDF <- ida.data.frame('DB2INST1.SHOWCASE_SYSUSAGE')
> myDF$MemPerUser <- myDF$MEMUSED / myDF$USERS

– Sampling of tables in Dynamite
> idaSample(myDF, 3)
  SID                       DATE USERS MEMUSED ALERT MemPerUser
1   8 2014-02-14 23:39:00.000000    34    5015     f        147
2   5 2014-01-22 07:52:00.000000    96   11512     f        119
3   7 2013-09-12 05:17:00.000000    39    5592     t        143

– Statistics about tables in Dynamite
> summary(myDF)
      SID            USERS            MEMUSED            ALERT          MemPerUser
 Min.   :0.000   Min.   :  3.000   Min.   :  350.000   f   :3655563   Min.   :105.000
 1st Qu.:2.000   1st Qu.: 35.000   1st Qu.: 5113.000   t   :1344437   1st Qu.:135.000
 Median :4.500   Median : 64.000   Median : 9455.000   NA's:     NA   Median :150.000
 Mean   :   NA   Mean   :     NA   Mean   :       NA                  Mean   :     NA
 3rd Qu.:7.000   3rd Qu.:111.000   3rd Qu.:16517.000                  3rd Qu.:165.000
 Max.   :9.000   Max.   :347.000   Max.   :62379.000                  Max.   :209.000

– Statistics about categorical values
> idaTable(myDF)
ALERT
      f       t
3655563 1344437
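The same principle covers the Dynamite-native statistics above: `min`, `max`, `mean`, `sd` and the category counts of `idaTable` all reduce to aggregate queries, and a derived column such as `MemPerUser` is just an expression inside the query. A minimal sketch, again with `sqlite3` as a stand-in for dashDB and a made-up four-row table; the helper names `column_stats` and `category_counts` are hypothetical. SQLite has no built-in standard deviation, so the sample variance is assembled from `COUNT`, `SUM(x)` and `SUM(x*x)`.

```python
import math
import sqlite3


def column_stats(conn, table, expr):
    """Min, max, mean and sample sd of a column or SQL expression,
    computed by one aggregate query inside the database."""
    n, lo, hi, s, ss = conn.execute(
        f"SELECT COUNT({expr}), MIN({expr}), MAX({expr}), "
        f"SUM({expr}), SUM(({expr}) * ({expr})) FROM {table}"
    ).fetchone()
    if n < 2:
        raise ValueError("need at least two non-NULL values")
    mean = s / n
    # One-pass sample variance: fine for a sketch, but numerically weaker
    # than a two-pass or Welford-style computation on large values.
    var = (ss - s * s / n) / (n - 1)
    return {"min": lo, "max": hi, "mean": mean, "sd": math.sqrt(max(var, 0.0))}


def category_counts(conn, table, column):
    """Frequency table of a categorical column: a single GROUP BY query."""
    rows = conn.execute(
        f"SELECT {column}, COUNT(*) FROM {table} GROUP BY {column} ORDER BY {column}"
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sysusage (USERS INTEGER, MEMUSED INTEGER, ALERT TEXT)")
    conn.executemany(
        "INSERT INTO sysusage VALUES (?, ?, ?)",
        [(10, 1500, "f"), (20, 2600, "f"), (30, 4200, "t"), (40, 5600, "f")],
    )
    # Derived column pushed down as an expression; "1.0 *" forces real division.
    print(column_stats(conn, "sysusage", "1.0 * MEMUSED / USERS"))
    print(category_counts(conn, "sysusage", "ALERT"))
```

Whether a derived column uses integer or real division is a property of the database's SQL dialect, which is worth checking when comparing pushed-down results with values computed locally.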

Page 13

Predictive Analytics With R In dashDB 3/3

– Store R objects in Dynamite database
> myPrivateObjects <- ida.list(type='private')
> myPrivateObjects['series100'] <- 1:100
> x <- myPrivateObjects['series100']
> x
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
[23] 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
> names(myPrivateObjects)
[1] "series100"
> myPrivateObjects['series100'] <- NULL

– Manage Dynamite tables
> idaExistTable('DB2INST1.SHOWCASE_SYSUSAGE')
[1] TRUE
> idaShowTables()
    Schema                   Name    Owner Type
1 BLUADMIN      R_OBJECTS_PRIVATE BLUADMIN    T
2 BLUADMIN R_OBJECTS_PRIVATE_META BLUADMIN    T
3 BLUADMIN       R_OBJECTS_PUBLIC BLUADMIN    T
> myView <- idaCreateView(myDF)
> idaIsView(myView)
[1] TRUE
> idaDropView(myView)
> idaIsView(myView)
[1] FALSE
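`ida.list` persists arbitrary R objects in database tables; the `R_OBJECTS_PRIVATE` tables listed by `idaShowTables()` above are presumably where they end up. A rough Python analogue follows: it serializes objects with `pickle` into a SQLite table with a simple name/blob layout. That layout and the class name `ObjectStore` are hypothetical and are not the actual schema or API behind `ida.list`. Assigning `None` deletes an entry, mirroring `myPrivateObjects['series100'] <- NULL`.

```python
import pickle
import sqlite3


class ObjectStore:
    """Dictionary-like store keeping pickled Python objects in a DB table.

    Hypothetical layout: one row per object (NAME primary key, OBJ blob).
    Only unpickle data you wrote yourself: pickle is unsafe on untrusted input.
    """

    def __init__(self, conn, table="objects_private"):
        self.conn = conn
        self.table = table
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} (NAME TEXT PRIMARY KEY, OBJ BLOB)"
        )

    def __setitem__(self, name, value):
        if value is None:  # like assigning NULL in R: remove the entry
            self.conn.execute(f"DELETE FROM {self.table} WHERE NAME = ?", (name,))
        else:
            self.conn.execute(
                f"INSERT OR REPLACE INTO {self.table} VALUES (?, ?)",
                (name, pickle.dumps(value)),
            )
        self.conn.commit()

    def __getitem__(self, name):
        row = self.conn.execute(
            f"SELECT OBJ FROM {self.table} WHERE NAME = ?", (name,)
        ).fetchone()
        if row is None:
            raise KeyError(name)
        return pickle.loads(row[0])

    def names(self):
        """Names of all stored objects, sorted."""
        return [r[0] for r in self.conn.execute(
            f"SELECT NAME FROM {self.table} ORDER BY NAME")]


if __name__ == "__main__":
    store = ObjectStore(sqlite3.connect(":memory:"))
    store["series100"] = list(range(1, 101))
    x = store["series100"]
    print(x[:5], store.names())
    store["series100"] = None
    print(store.names())
```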

Page 14

Running R in dashDB via REST API http://ibm.biz/dashdbapi

– Create your R script with RStudio
  • Store it in your home directory inside dashDB

– POST <dashdb-server>/dashdb-api/rscript/<fileName>
  • Run the specified stored R script

– POST <dashdb-server>/dashdb-api/rscript/
  • Run ad-hoc R code specified in the request parameter “rScriptBody”

– GET <dashdb-server>/dashdb-api/home
  • List all files under the user home directory (recursively), e.g. the output written by your R script

– GET <dashdb-server>/dashdb-api/home/<fileName>
  • Download the specified file
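A client for the REST endpoints listed above only needs to build four kinds of requests. The sketch below constructs them with Python's standard `urllib.request` without sending anything. Several details are assumptions rather than facts from the API documentation: the host `https://dashdb.example.com` is a placeholder for `<dashdb-server>`, `rScriptBody` is sent form-encoded, and authentication (not covered on the slide) is left out entirely. The helper function names are hypothetical; check http://ibm.biz/dashdbapi for the actual requirements.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Placeholder for <dashdb-server>; substitute your own dashDB host.
SERVER = "https://dashdb.example.com"
API = "/dashdb-api"


def run_stored_script(file_name, server=SERVER):
    """POST .../rscript/<fileName>: run an R script stored in the home dir."""
    return Request(f"{server}{API}/rscript/{quote(file_name)}", data=b"", method="POST")


def run_adhoc_script(r_code, server=SERVER):
    """POST .../rscript/: run ad-hoc R code passed as rScriptBody.

    Form encoding of the parameter is an assumption of this sketch.
    """
    body = urlencode({"rScriptBody": r_code}).encode("utf-8")
    req = Request(f"{server}{API}/rscript/", data=body, method="POST")
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    return req


def list_home(server=SERVER):
    """GET .../home: list all files under the user's home directory."""
    return Request(f"{server}{API}/home", method="GET")


def download_file(file_name, server=SERVER):
    """GET .../home/<fileName>: download one file, e.g. script output."""
    return Request(f"{server}{API}/home/{quote(file_name)}", method="GET")


if __name__ == "__main__":
    for req in (run_stored_script("myAnalysis.R"),
                run_adhoc_script("summary(cars)"),
                list_home(),
                download_file("output/plot.png")):
        print(req.get_method(), req.full_url)
    # Sending would be urllib.request.urlopen(req) once authentication is added.
```

A typical round trip is: run a stored script, list the home directory to find what it wrote, then download those files.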


Page 15

Publishing R Analytics via API in dashDB

Page 16

Publishing R Analytics as Web App in dashDB

Page 17


Predictive Analytics With Python In dashDB

• Bluemix Analytic Notebooks

• ibmdbPy package: https://pypi.python.org/pypi/ibmdbpy
– Data frames logically representing data physically residing in dashDB tables
from ibmdbpy import IdaDataBase, IdaDataFrame
idadb = IdaDataBase(dsn="BLUDB")  # connection details (DSN, credentials) depend on your setup
idadf = IdaDataFrame(idadb, "IRIS", indexer="ID")
idadf = idadf[["ID", "sepal_length", "sepal_width"]]
idadf['new'] = idadf['sepal_width'] + idadf['sepal_length'].mean()
idadf.head()

– Push-down of analytic algorithms to in-database execution
from ibmdbpy.learn import KMeans
kmeans = KMeans(3)  # clustering with 3 clusters
kmeans.fit_predict(idadf).head()

[Diagram: ibmdbPy used from an Analytics for Spark notebook in Bluemix (in the browser) and from any Python runtime]
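In the ibmdbPy example above the data stays in the database: projecting columns and adding `new` only change the SQL that will eventually run, and `mean()` is a single aggregate query. The sketch below imitates that behaviour with a tiny, hypothetical `LazyFrame` class over `sqlite3`; it is neither ibmdbPy's implementation nor its API, and the three IRIS rows are made-up sample values. Here the mean is fetched once and inlined as a literal; a scalar subquery would be an equally valid way to keep that step in SQL.

```python
import sqlite3


class LazyFrame:
    """Hypothetical lazy frame over one table: it records a projection and
    derived columns as SQL and only queries the database on demand."""

    def __init__(self, conn, table, indexer=None, columns=None, derived=None):
        self.conn, self.table, self.indexer = conn, table, indexer
        self.columns = list(columns) if columns else ["*"]
        self.derived = dict(derived or {})  # alias -> SQL expression

    def select(self, columns):
        """Projection: returns a new frame; nothing is executed yet."""
        return LazyFrame(self.conn, self.table, self.indexer, columns, self.derived)

    def with_column(self, name, sql_expr):
        """Derived column: stored as an SQL expression, not computed locally."""
        return LazyFrame(self.conn, self.table, self.indexer, self.columns,
                         {**self.derived, name: sql_expr})

    def mean(self, column):
        """Aggregate on a base column, evaluated in the database."""
        return self.conn.execute(
            f"SELECT AVG({column}) FROM {self.table}").fetchone()[0]

    def sql(self, limit=None):
        cols = self.columns + [f'{e} AS "{n}"' for n, e in self.derived.items()]
        query = f"SELECT {', '.join(cols)} FROM {self.table}"
        if self.indexer:  # deterministic row order, like ibmdbPy's indexer idea
            query += f" ORDER BY {self.indexer}"
        if limit is not None:
            query += f" LIMIT {int(limit)}"
        return query

    def head(self, n=5):
        """Only this call fetches rows, and at most n of them."""
        return self.conn.execute(self.sql(limit=n)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE IRIS (ID INTEGER, sepal_length REAL, sepal_width REAL)")
    conn.executemany("INSERT INTO IRIS VALUES (?, ?, ?)",
                     [(1, 5.0, 3.0), (2, 6.0, 2.5), (3, 7.0, 3.5)])
    df = LazyFrame(conn, "IRIS", indexer="ID").select(["ID", "sepal_length", "sepal_width"])
    avg = df.mean("sepal_length")  # one aggregate query
    df = df.with_column("new", f"sepal_width + {avg!r}")
    print(df.sql(limit=5))
    print(df.head())
```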

Page 18

Your Own R Dev Env for dashDB – Ready to Run

• Either use the readily available RStudio inside dashDB
• Or set up a ready-to-run RStudio Docker container for dashDB

https://github.com/ibmdbanalytics/dashdb_analytic_tools


Page 19

How to onboard for the dashDB SDE Preview

1. Share information
   Current blogs*:
   • Hybrid data warehouse by Sam Lightstone: http://ibm.co/1WgFA0O
   • Using Docker containers by Mitesh Shah: http://bit.ly/1TgH1yv

2. Learn Docker & dashDB
   • Watch this Docker video: https://www.youtube.com/watch?v=OsOLF3_fotM
   • Watch this dashDB overview video: https://www.youtube.com/watch?v=mLeNvqAZ8FM&feature=youtu.be
   • Learn about dashDB technology at dashDB.com

3. Register clients for the preview
   • Registration page: bit.ly/DDB-PRV
   • Docker Hub page with installation and getting-started information for dashDB SDE: bit.ly/DDB-DKR

* Follow @IBMDataWH on Twitter to be notified of the growing list of blogs as they are published.