IBM InterConnect 2016 - 3505 - Cloud-Based Analytics of The Weather Company in IBM Bluemix
Cloud-Based Analytics of The Weather Company in IBM Bluemix
Session # 3505
Torsten Steinbach @torsstei
Please Note:
• IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion.
• Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision.
• The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract.
• The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
Performance is based on measurements and projections using standard IBM benchmarks in a
controlled environment. The actual throughput or performance that any user will experience will vary
depending upon many factors, including considerations such as the amount of multiprogramming in the
user’s job stream, the I/O configuration, the storage configuration, and the workload processed.
Therefore, no assurance can be given that an individual user will achieve results similar to those stated
here.
dashDB
IBM dashDB – A unified data warehouse to provide best economics and flexibility to support your business
• dashDB: DWaaS (public cloud)
• Netezza: appliance (on-prem)
• dashDB SDE (SW-Defined-Environment) (**new**): on-prem / private cloud
Loading Weather Data
Populating dashDB with Data
[Diagram: data sources that can populate dashDB]
• Cloud storage: S3, Swift
• Geodata in Esri Shapefiles
• On-premise databases
• Mobile app data in Cloudant
• GeoJSON
• The Weather Company
• CSVs
• Open data: data.gc.ca, data.gov, data.gov.uk, datahub.io, openAFRICA
• Bluemix
IBM Insights for Weather Service in Bluemix
The Weather Company API
• API Docs: http://bit.ly/1LAvNxb
• /weather/v2/geocode/observations/historical.json
• Contact: Andy Rice [email protected]
The Weather Company Data Loader Bluemix App
R & Python for dashDB
dashDB
Predictive Analytics With R In dashDB 1/3
• Built-in R runtime & RStudio
• ibmdbR package
– Data frames logically representing data physically residing in dashDB tables
> con <- idaConnect("BLUDB", "", "")
> idaInit(con)
> sysusage <- ida.data.frame('DB2INST1.SHOWCASE_SYSUSAGE')
> systems <- ida.data.frame('DB2INST1.SHOWCASE_SYSTEMS')
> systypes <- ida.data.frame('DB2INST1.SHOWCASE_SYSTYPES')
– Push down of R data preparation to dashDB (SID is kept in the projection so the later merge by 'SID' can find it)
> sysusage2 <- sysusage[sysusage$MEMUSED>50000, c("SID","MEMUSED","USERS")]
> mergedSys <- idaMerge(systems, systypes, by='TYPEID')
> mergedUsage <- idaMerge(sysusage2, mergedSys, by='SID')
– Push down of analytic algorithms to in-db execution
> lm1 <- idaLm(MEMUSED~USERS, mergedUsage)
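To make "push down" concrete, here is a minimal Python sketch of the idea. It is not how ibmdbR is implemented: the `LazyTable` class is hypothetical, SQLite stands in for dashDB, and the rows are invented. The point is only that a filter and a column selection are recorded on the client, compiled into a single SQL statement, and executed by the database.

```python
import sqlite3


class LazyTable:
    """Hypothetical stand-in for a database-backed data frame (not the ibmdbR API)."""

    def __init__(self, conn, table, columns="*", where=None):
        self.conn = conn
        self.table = table
        self.columns = columns
        self.where = where

    def select(self, columns, where=None):
        # Nothing is fetched here; the request is only recorded.
        return LazyTable(self.conn, self.table, ", ".join(columns), where)

    def to_sql(self):
        sql = f"SELECT {self.columns} FROM {self.table}"
        if self.where:
            sql += f" WHERE {self.where}"
        return sql

    def fetch(self):
        # The filter and projection run inside the database engine.
        return self.conn.execute(self.to_sql()).fetchall()


# SQLite stands in for dashDB; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SHOWCASE_SYSUSAGE (SID INTEGER, USERS INTEGER, MEMUSED INTEGER)")
conn.executemany(
    "INSERT INTO SHOWCASE_SYSUSAGE VALUES (?, ?, ?)",
    [(1, 34, 5015), (2, 96, 61512), (3, 120, 75000)],
)

sysusage = LazyTable(conn, "SHOWCASE_SYSUSAGE")
sysusage2 = sysusage.select(["MEMUSED", "USERS"], where="MEMUSED > 50000")
print(sysusage2.to_sql())
print(sysusage2.fetch())
```

Only the rows that pass the filter ever leave the database, which is the benefit the slide attributes to pushing data preparation down.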
[Diagram labels: Browser, RStudio, R Runtime, Any R Runtime, ibmdbR, REST Client, REST]
Predictive Analytics With R In dashDB 2/3
– Dynamite-native implementation of statistical functions
• colnames, cor, cov, dim, head, length, max, mean, min, names, print, sd, summary, var
– Logically derived columns pushed down to Dynamite
> myDF <- ida.data.frame('DB2INST1.SHOWCASE_SYSUSAGE')
> myDF$MemPerUser <- myDF$MEMUSED / myDF$USERS
– Sampling of tables in Dynamite
> idaSample(myDF, 3)
  SID DATE                       USERS MEMUSED ALERT MemPerUser
1   8 2014-02-14 23:39:00.000000    34    5015     f        147
2   5 2014-01-22 07:52:00.000000    96   11512     f        119
3   7 2013-09-12 05:17:00.000000    39    5592     t        143
– Statistics about tables in Dynamite
> summary(myDF)
      SID            USERS             MEMUSED            ALERT          MemPerUser
 Min.   :0.000   Min.   :  3.000   Min.   :  350.000   f   :3655563   Min.   :105.000
 1st Qu.:2.000   1st Qu.: 35.000   1st Qu.: 5113.000   t   :1344437   1st Qu.:135.000
 Median :4.500   Median : 64.000   Median : 9455.000   NA's:     NA   Median :150.000
 Mean   :   NA   Mean   :     NA   Mean   :       NA                  Mean   :     NA
 3rd Qu.:7.000   3rd Qu.:111.000   3rd Qu.:16517.000                  3rd Qu.:165.000
 Max.   :9.000   Max.   :347.000   Max.   :62379.000                  Max.   :209.000
– Statistics about categorical values
> idaTable(myDF)
ALERT
      f       t
3655563 1344437
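As a rough illustration of what pushing a derived column and its statistics into the database can look like, the sketch below expresses both as SQL. It is an assumption-laden analogy, not ibmdbR's generated SQL: SQLite stands in for the database, the view name `SYSUSAGE_V` is hypothetical, and the three rows are the ones shown in the idaSample output above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SYSUSAGE (SID INTEGER, USERS INTEGER, MEMUSED INTEGER, ALERT TEXT)")
conn.executemany(
    "INSERT INTO SYSUSAGE VALUES (?, ?, ?, ?)",
    [(8, 34, 5015, "f"), (5, 96, 11512, "f"), (7, 39, 5592, "t")],
)

# The derived column lives in a view, so it is computed by the database on demand.
conn.execute(
    "CREATE VIEW SYSUSAGE_V AS "
    "SELECT SID, USERS, MEMUSED, ALERT, MEMUSED / USERS AS MemPerUser FROM SYSUSAGE"
)

# Summary statistics become aggregate queries; only the results reach the client.
stats = conn.execute("SELECT MIN(MemPerUser), MAX(MemPerUser) FROM SYSUSAGE_V").fetchone()

# Counts per categorical value become a GROUP BY.
counts = dict(conn.execute("SELECT ALERT, COUNT(*) FROM SYSUSAGE_V GROUP BY ALERT").fetchall())

print(stats)
print(counts)
```

Integer division in SQLite reproduces the MemPerUser values from the sample (147, 119, 143).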
Predictive Analytics With R In dashDB 3/3
– Store R objects in Dynamite database
> myPrivateObjects <- ida.list(type='private')
> myPrivateObjects['series100'] <- 1:100
> x <- myPrivateObjects['series100']
> x
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
[23] 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
> names(myPrivateObjects)
[1] "series100"
> myPrivateObjects['series100'] <- NULL
– Manage Dynamite tables
> idaExistTable('DB2INST1.SHOWCASE_SYSUSAGE')
[1] TRUE
> idaShowTables()
  Schema   Name                   Owner    Type
1 BLUADMIN R_OBJECTS_PRIVATE      BLUADMIN T
2 BLUADMIN R_OBJECTS_PRIVATE_META BLUADMIN T
3 BLUADMIN R_OBJECTS_PUBLIC       BLUADMIN T
> myView <- idaCreateView(myDF)
> idaIsView(myView)
[1] TRUE
> idaDropView(myView)
> idaIsView(myView)
[1] FALSE
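The object-store pattern above (assign by name, read back, list names, delete by assigning NULL) can be sketched in Python as follows. This is an analogy under stated assumptions, not a description of how ida.list works internally: the `ObjectStore` class is hypothetical, SQLite stands in for the database, objects are serialized with pickle, and the table name is borrowed from the idaShowTables output purely for flavor.

```python
import pickle
import sqlite3


class ObjectStore:
    """Hypothetical analog of a database-backed named object list."""

    def __init__(self, conn, table="R_OBJECTS_PRIVATE"):
        self.conn = conn
        self.table = table
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} (name TEXT PRIMARY KEY, payload BLOB)"
        )

    def __setitem__(self, name, obj):
        if obj is None:
            # Assigning None deletes the entry, mirroring `<- NULL` in the R example.
            self.conn.execute(f"DELETE FROM {self.table} WHERE name = ?", (name,))
        else:
            self.conn.execute(
                f"INSERT OR REPLACE INTO {self.table} VALUES (?, ?)",
                (name, pickle.dumps(obj)),
            )

    def __getitem__(self, name):
        row = self.conn.execute(
            f"SELECT payload FROM {self.table} WHERE name = ?", (name,)
        ).fetchone()
        if row is None:
            raise KeyError(name)
        return pickle.loads(row[0])

    def names(self):
        cursor = self.conn.execute(f"SELECT name FROM {self.table} ORDER BY name")
        return [row[0] for row in cursor]


store = ObjectStore(sqlite3.connect(":memory:"))
store["series100"] = list(range(1, 101))
x = store["series100"]
stored_names = store.names()
store["series100"] = None
remaining_names = store.names()
print(x[:5], stored_names, remaining_names)
```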
Running R in dashDB via REST API http://ibm.biz/dashdbapi
– Create your R script with RStudio
• Store it in the home dir inside dashDB
– POST <dashdb-server>/dashdb-api/rscript/<fileName>
• Run the specified stored R script
– POST <dashdb-server>/dashdb-api/rscript/
• Run ad-hoc R code specified in request parameter "rScriptBody"
– GET <dashdb-server>/dashdb-api/home
• List all files under user home (recursively)
• E.g. list the output written by your R script
– GET <dashdb-server>/dashdb-api/home/<fileName>
• Download the specified file
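The four endpoints listed above could be driven from a client roughly as sketched below. Several details are assumptions not stated on the slide: the server URL is a placeholder, HTTP basic authentication is assumed, and "rScriptBody" is assumed to be sent as a form-encoded field. The sketch only builds the requests so it runs offline; passing one to `urllib.request.urlopen` would send it.

```python
import base64
import urllib.parse
import urllib.request

BASE_URL = "https://dashdb.example.com"  # hypothetical server, replace with <dashdb-server>


def build_request(method, path, user, password, form=None):
    """Build (but do not send) a request against the dashdb-api endpoints."""
    data = urllib.parse.urlencode(form).encode("ascii") if form else None
    req = urllib.request.Request(f"{BASE_URL}/dashdb-api{path}", data=data, method=method)
    # Assumption: the service accepts HTTP basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req.add_header("Authorization", f"Basic {token}")
    return req


def run_stored_script(file_name, user, password):
    return build_request("POST", f"/rscript/{urllib.parse.quote(file_name)}", user, password)


def run_adhoc_script(r_code, user, password):
    # Assumption: rScriptBody travels as a form field in the request body.
    return build_request("POST", "/rscript/", user, password, form={"rScriptBody": r_code})


def list_home(user, password):
    return build_request("GET", "/home", user, password)


def download_file(file_name, user, password):
    return build_request("GET", f"/home/{urllib.parse.quote(file_name)}", user, password)


adhoc = run_adhoc_script("print(1 + 1)", "user", "secret")
listing = list_home("user", "secret")
print(adhoc.get_method(), adhoc.full_url)
print(listing.get_method(), listing.full_url)
```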
Publishing R Analytics via API in dashDB
Publishing R Analytics as Web App in dashDB
dashDB
Predictive Analytics With Python In dashDB
• Bluemix Analytic Notebooks
• ibmdbPy package
– https://pypi.python.org/pypi/ibmdbpy
– Data frames logically representing data physically residing in dashDB tables
from ibmdbpy import IdaDataBase, IdaDataFrame
idadb = IdaDataBase("BLUDB")  # assumption: a configured data source named BLUDB; the slide omits this step
idadf = IdaDataFrame(idadb, "IRIS", indexer = "ID")
idadf = idadf[["ID","sepal_length", "sepal_width"]]
idadf['new'] = idadf['sepal_width'] + idadf['sepal_length'].mean()
idadf.head()
– Push down of analytic algorithms to in-db execution
from ibmdbpy.learn import KMeans
kmeans = KMeans(3) # clustering with 3 clusters
kmeans.fit_predict(idadf).head()
[Diagram labels: Browser, Analytics for Spark Notebook in Bluemix, Any Python Runtime, ibmdbPy]
Your own R Dev Env for dashDB – Ready to Run
• Either use readily available RStudio inside dashDB
• Or set up ready to run RStudio docker container for dashDB
https://github.com/ibmdbanalytics/dashdb_analytic_tools
How to onboard for the dashDB SDE Preview
1. Share information
Current blogs*:
• Hybrid data warehouse by Sam Lightstone: http://ibm.co/1WgFA0O
• Using Docker containers by Mitesh Shah: http://bit.ly/1TgH1yv
2. Learn Docker & dashDB
• Watch this Docker video: https://www.youtube.com/watch?v=OsOLF3_fotM
• Watch this dashDB overview video: https://www.youtube.com/watch?v=mLeNvqAZ8FM&feature=youtu.be
• Learn about dashDB technology at dashDB.com
3. Register clients for the preview
• Registration page: bit.ly/DDB-PRV
• Docker hub page containing installation and get started information for dashDB SDE: bit.ly/DDB-DKR
* Follow on Twitter @IBMDataWH to be notified of the growing list of blogs as they are published.