Turbo-Charge Your Analytics with IBM Netezza and Revolution R Enterprise: A Step-by-Step Approach for Acceleration and Innovation
DESCRIPTION
Everyone involved in high-stakes analytics wants power, speed and flexibility, regardless of the size of the data set or the complexity of the analysis. Trailblazing organizations that have deployed IBM Netezza Analytics on their IBM Netezza data warehouse appliances (TwinFin) together with Revolution R Enterprise are getting all three.
TRANSCRIPT
© 2012 IBM Corporation
Revolution Confidential
Revolution R Enterprise for IBM Netezza
IBM Netezza with Revolution Analytics
High-performance, in-database analytics platform for Big Data
– Massively parallel processing delivers 10-100x performance
– Run analytics in-database and eliminate data movement
– Scalable architecture fosters experimentation
Innovation with Advanced Analytics
– Analytic modeling with most current statistical methods and 2,500+ open source packages
Enterprise ready advanced analytics software, services & support
– Security, IDE, training, professional services
– Web Services stack enables integration with front-end presentation layer
March 1, 2012
Revolution Analytics
What is R?
Data analysis software
A programming language
– Development platform designed by and for statisticians
– Object-oriented: vector, matrix, model, …
– Built-in libraries of algorithms
An environment
– Huge library of algorithms for data access, data manipulation, analysis and graphics
An open-source software project
– Free, open, and active
A community
– Thousands of contributors, 2 million users
– Resources and help in every domain
Download the White Paper
R is Hot: bit.ly/r-is-hot
The professor who invented analytic software for the experts now wants to take it to the masses
Most advanced statistical analysis software available
Half the cost of commercial alternatives
2M+ Users
2,500+ Applications
Statistics
Predictive Analytics
Data Mining
Visualization
Finance
Life Sciences
Manufacturing
Retail
Telecom
Social Media
Government
Power
Productivity
Enterprise Readiness
Revolution R Enterprise has the Open-Source R Engine at the core
2,500 community packages and growing exponentially
R Engine Language Libraries
Open Source R Packages
Technical Support
Web Services API
Big Data Analysis
Revolution Productivity Environment
Build Assurance
Parallel Tools
Multi-Threaded Math Libraries
Technology Partners
Working with Revolution R Enterprise for IBM Netezza
Revolution R Enterprise for IBM Netezza inside the IBM Netezza Architecture
IBM Netezza Analytics
In-Database Paradigms for using R
In-database Scoring
– Family of apply functions which score analytic models by using data parallelism
– Underlying truism is that there is a fact that can be applied across all data
Big Data Analytics
– Family of parallelized, in-database analytics that have R wrappers and work on entire data set
– Underlying truism exists across all data
Grouped by Row (tapply)
– Data and Task Parallelism
• Data flow technique to apply analytics to naturally occurring groups of data using non-parallelized analytics
– Underlying relationship in data is by a group
Examples
Scoring
– Customer lifetime value
– Credit score
– Affinity
– Good stock/bad stock
Big data analytics
– Clustering of all data to determine groupings
– Models that are applied across a whole data set: decision trees
– Data transformation: variable selection, correlation
Group
– Forecasting: by store, stock symbol, etc.
– Build a model for each customer, product, etc.
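The three paradigms above can be mimicked on a local data frame. The following is a minimal sketch in base R, assuming the built-in iris data set as a stand-in for a database table; it runs in memory and does not call any of the in-database nz functions. The scoring rule is hypothetical and chosen only for illustration.

```r
# Local, in-memory analogy of the three paradigms (base R only; no Netezza calls).
data(iris)

# 1. Scoring: a fixed rule applied independently to every row.
#    The rule itself is hypothetical and only serves as an illustration.
score_row <- function(row) as.numeric(row["Petal.Length"]) > 2.5
scores <- apply(iris[, 1:4], 1, score_row)

# 2. Whole-data analytics: one model fitted on the entire data set.
set.seed(1)
clusters <- kmeans(iris[, 1:4], centers = 3)

# 3. Grouped by row: the same non-parallel function applied to each natural group.
group_means <- tapply(iris$Sepal.Length, iris$Species, mean)

print(table(scores))
print(clusters$size)
print(group_means)
```

In the in-database setting the same division of labor would apply, with rows or groups distributed across the appliance rather than held in one R session.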
Access In-Database Language Support from R
SQL, Java, Python, C, Fortran, C++
Open Source R Package Support
Vertical
• Econometrics
• Experimental Design
• Computational Physics
• Clinical Trials
• Environmetrics
• Finance
• Genetics
• Medical Imaging
• Pharmacokinetics
• Phylogenetics
• Psychometrics
• Social Sciences
Horizontal
• Bayesian
• Cluster
• Distributions
• Graphics
• Graphical Models
• Machine Learning
• Multivariate
• Natural Language Processing
• Optimization
• Robust Statistical Metrics
• Spatial
• Survival Analysis
• Time Series
2,500+ community packages
Using Revolution R Enterprise with IBM Netezza
R packages integrate and push analytics processing in-database
[Architecture diagram. Components shown: Revolution R Enterprise Workstation; Revolution R Enterprise Server with RevoDeployR Server (Web Services interface for R); Business Intelligence, Excel or third-party application, with an HTTP link; RODBC and nzODBC links to the appliance; IBM Netezza Analytics on the Host and on each of five S-Blades.]
Deploying Revolution R Enterprise to IBM Netezza
• Open a remote terminal connection to the Host
• Create your R script
• Compile and register your R script as an AE (UDAP)
• Execute SQL that invokes the registered AE
• Go back to the Revolution R client to retrieve results and continue additional analysis
[Diagram: IBM Netezza Analytics on the Host and on each of five S-Blades]
Revolution R Enterprise Client Configuration
Revolution R Enterprise
– Productivity Environment
Netezza ODBC Drivers
‘nz’ R Packages
– nzA, nzR, nzMatrix
R Package Dependencies
– RODBC
– caTools
– tree
– bitops
– e1071
– rgl
– ca
– MASS
– XML
IBM Netezza In-Database Analytics from Revolution R
nzR Package
– Encapsulate database and expose “R”-like constructs
– R data.frame = database table
– Apply an R function to a row of data or grouped rows of data
nzA Package
– Entry point to the nzAnalytics
– Explicitly parallelized algorithms that run in database
nzMatrix Package
– Encapsulation of matrices and operations in database
– nz.matrix construct in R to access matrices in the database
– R operations on nz.matrix translate to matrix stored procedure operations
nzR Package
Basic Functions
Database Connection: nzConnect, nzConnectDSN
SQL Execution: nzQuery, nzScalarQuery, nzDeleteTable
Data Management: as.nz.data.frame, nz.data.frame
Apply an R function: nzApply, nzTApply, nzGroupedApply
R Package Management: nzInstallPackages, nzIsPackageInstalled
Sample Code
# load packages
library(nzr)

# connect to a database via ODBC
nzConnect("admin", "xyz", "127.0.0.1", "iclasstest")

# load the iris table
nzdf <- nz.data.frame("iris")

# run nzTApply against the nz data frame:
# fun returns the maximum of column 1, applied within each group of column 5
fun <- function(x) max(x[,1])
nzTApply(nzdf, nzdf[,5], fun)
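For comparison, the same grouped computation can be run locally without an appliance. This is a minimal sketch in base R, assuming the built-in iris data set has the same column layout as the iris table on the appliance; the shape of the result that nzTApply returns is not shown on the slide, so only the local result is illustrated.

```r
# Local base R counterpart of the nzTApply call (runs in memory, not in-database).
data(iris)

# Same function as on the slide: maximum of the first column of a group.
fun <- function(x) max(x[, 1])

# split() forms one data frame per value of column 5 (Species);
# sapply() applies fun to each group and returns a named numeric vector.
local_result <- sapply(split(iris, iris[, 5]), fun)
print(local_result)
```

Running the function on a small local sample first is a cheap way to debug it before pushing it to the database.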
nzA Package
Data Manipulation
Moments: nz.moments
Quantiles: nz.quantile, nz.quartile
Outlier Detection: nz.outliers
Frequency Table: nz.bitable
Histogram: nz.hist
Pearson's Correlation: nz.corr
Spearman's Correlation: nz.spearman.corr, nz.spearman.corr.s
Covariance: nz.cov, nz.cov.matrix
Mutual Information: nz.mutualinfo
Chi-Square Test: nzChisq.test, nz.chisq.test
t-Test: t.ls.test, t.me.test, t.pmd.test, t.umd.test
Mann-Whitney-Wilcoxon Test: nz.mww.test
Wilcoxon Test: nz.wilcoxon.test
Canonical Correlation: nz.canonical.corr
One-Way ANOVA: nzAnova, nz.anova.CRD.test, nz.anova.RBD.test
Principal Component Analysis: nzPCA
Tree-Shaped Bayesian Networks: nz.TBNet Apply, nz.TBNet Grow, nz.BigBNControl, nz.TBNet1g2p, nz.TBNet1g, nz.TBNet2g
nzA Package
Data Transformations
Discretization: nz.efdisc, nz.emdisc, nz.ewdisc
Standardization and Normalization: nz.std.norm
Data Imputation: nz.impute.data
Model Diagnostics
Misclassification Error: nz.cerror
Confusion Matrix: nz.acc, nz.CMATRIX STATS
Mean Absolute Error: nz.mae
Mean Square Error: nz.mse
Relative Absolute Error: nz.rae
Percentage Split: nz.percentage.split
Cross-Validation: nz.cross.validation
nzA Package
Classification
Naive Bayes: nzNaiveBayes, nz.naivebayes, nz.predict.naivebayes
Decision Trees: nzDecTree, nz.dectree, nz.grow.dectree, nz.print.dectree, nz.prune.dectree, nz.predict.dectree
Nearest Neighbors: nz.knn
Regression
Linear Regression: nzLm
Regression Trees: nzRegTree, nz.regtree, nz.grow.regtree, nz.print.regtree, nz.predict.regtree
Clustering
K-Means Clustering: nzKMeans, nz.kmeans, nz.predict.kmeans
Divisive Clustering: nz.divcluster, nz.predict.divcluster
Associative Rule Mining
FP-Growth: nz.fpgrowth, nz.prepare.fpgrowth
nzMatrix Package
Data Manipulation
Coerce or point to a nz.matrix: as.nz.matrix, as.nz.matrix.matrix, nz.matrix
Combine Matrices: nzCBind, nzRBind
Create Matrices From Tables: nzCreateMatrixFromTable, nzCreateTableFromMatrix
Create Special Matrices: nzIdentityMatrix, nzNormalMatrix, nzOnesMatrix, nzRandomMatrix, nzVecToDiag
Decomposition: nzSVD, svd, nzEigen
Delete Matrices: nzDeleteMatrix, nzDeleteMatrixByName
Dimensions: dim, NCOL, ncol, NROW, nrow
Mathematical Functions: abs, add, aubtr, ceiling, div, exp, floor, ln, log10, mod, mult, nzPowerMatrix, pow, rounding, sqrt, trunc
Matrix Engine Initialization: nzMatrixEngineInitialization
Matrix Info: is.nz.matrix, isSparse, nzExistMatrix, nzExistMatrixByName, nzGetValidMatrixName
Operators: *, +, -, <, ==, >, nzKronecker, nzPMax, nzPMin, nzSetValue, [, scale, t
Printing Matrices: print.nz.matrix
Solve: nzInv, nzSolve, nzSolveLLS
Sparse Matrices: isSparse, nzSparse2matrix
Summaries: nzAll, nzAny, nzMax, nzMin, nzSsq, nzSum, nzTr
Demonstration: Using Revolution R with IBM Netezza
Turbo-Charge Your Analytics with IBM Netezza and Revolution R Enterprise
Presented by:
Derek M Norton, Senior Sales Engineer
Use Case – Credit Risk
We have a data set of individuals and their credit risk, stored on the Netezza appliance.
The goal is to model whether someone is “approvable” for a loan.
This use case will follow a modeling process (though condensed) from start to finish.
I will discuss each of the parts, and at the end there will be a demo of the code.
Modeling Exercise
1. Learning more about the data
2. Prepare the data for modeling
3. Fit models to the data
4. Model Performance
1. Learning more about the data
• Connect to the IBM Netezza appliance
• Summarize the data
• Visualize the data
[Figure: two example plots. Left: histogram titled "Continuous Variable" (x-axis: x, y-axis: Frequency). Right: bar chart titled "Discrete Variable" with categories High School Diploma, Bachelors Degree, Masters Degree, Professional Degree, PhD.]
2. Prepare the data for modeling
• Split the data into 70/30 Training/Test sets
• Transform some variables
• Discretize numeric variables for later use
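The preparation steps can be sketched locally in base R. The nzA tables earlier in the deck list nz.percentage.split and nz.ewdisc for the in-database versions, but their signatures are not shown, so the sketch below uses only base R on the built-in iris data set. The log transform and the three bin labels are arbitrary choices made for illustration, not the transformations used in the demo.

```r
# Local sketch of a 70/30 split, a variable transform and equal-width discretization.
data(iris)
set.seed(42)  # fixed seed so the split is reproducible

n <- nrow(iris)
train_idx <- sample(n, size = round(0.7 * n))
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# Transform a variable (log transform chosen arbitrarily for illustration).
train$log_petal_length <- log(train$Petal.Length)

# Discretize a numeric variable into three equal-width bins.
# Breaks are computed on the full range so every training value falls in a bin.
breaks <- seq(min(iris$Sepal.Length), max(iris$Sepal.Length), length.out = 4)
train$sepal_bin <- cut(train$Sepal.Length, breaks = breaks,
                       include.lowest = TRUE,
                       labels = c("low", "mid", "high"))

print(c(train = nrow(train), test = nrow(test)))
print(table(train$sepal_bin))
```

The same breaks would need to be reused on the test set so that both sets share one binning.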
3. Fit models to the data
Build two different models to predict if an individual is “approvable”
• Decision Tree
• Naïve Bayes
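As a local illustration of the decision tree step, the sketch below fits a classification tree with the rpart package, which ships with standard R distributions. It is not the in-database nzDecTree call, whose arguments are not shown in the deck. The credit data is not available here, so a hypothetical binary target named approvable is derived from the built-in iris data purely as a stand-in. A Naïve Bayes fit would follow the same pattern, for instance with naiveBayes from the e1071 package listed among the client dependencies.

```r
# Local decision tree sketch with rpart (stand-in data, hypothetical target).
library(rpart)
data(iris)

# Hypothetical binary target standing in for "approvable".
d <- iris
d$approvable <- factor(ifelse(d$Species == "setosa", "no", "yes"))
d$Species <- NULL

# Fit a classification tree on all remaining columns.
tree_fit <- rpart(approvable ~ ., data = d, method = "class")

# Predicted class labels for the same rows.
tree_pred <- predict(tree_fit, d, type = "class")
print(tree_fit)
```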
4. Model Performance
Examine confusion matrices to determine:
• Training performance
• Test performance
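A confusion matrix is a cross-tabulation of predicted against actual labels. The sketch below builds one with base R; the label vectors are made up for illustration and are not results from the credit risk demo. The same code would be run once on training predictions and once on test predictions to compare the two.

```r
# Confusion matrix and accuracy from hypothetical labels (base R only).
actual <- factor(c("yes", "yes", "no", "no", "yes", "no", "yes", "no"),
                 levels = c("no", "yes"))
predicted <- factor(c("yes", "no", "no", "no", "yes", "yes", "yes", "no"),
                    levels = c("no", "yes"))

# Rows are predicted labels, columns are actual labels.
cm <- table(Predicted = predicted, Actual = actual)

# Accuracy is the share of cases on the diagonal.
accuracy <- sum(diag(cm)) / sum(cm)
misclassification <- 1 - accuracy

print(cm)
print(accuracy)  # 0.75 for these made-up labels
```

A large gap between training and test accuracy would suggest the model is overfitting the training set.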
Demo
Summary
Familiar environment for R Developers
– World-class productivity tools
– Enterprise class service, support and integration
Execution of analytics in-database
– Analytic computing distributed across Netezza nodes and run in a massively parallel manner
– Each Netezza node gets a data slice and analytics are pushed down from the Host to the individual nodes
Capabilities
– R Code executed on Netezza nodes in row-by-row fashion or on groups of rows
– Enables access to explicitly parallelized algorithms running on entire data set
– Large-scale parallel matrix operations on database tables
Performance
– 10-100x Performance improvements
Contact Us
Derek Norton
Solutions Executive
Revolution [email protected]
www.revolutionanalytics.com
+1 (650) 646 9545
Twitter: @RevolutionR
Bill Zanine
Business Solutions Executive, Analytics Solutions
IBM [email protected]