100% R and More: Plus What's New in Revolution R Enterprise 6.0
DESCRIPTION
R users already know why the R language is the lingua franca of statisticians today: because it's the most powerful statistical language in the world. Revolution Analytics builds on the power of open source R, and adds performance, productivity and integration features to create Revolution R Enterprise. In this webinar, author and blogger David Smith will introduce the additional capabilities of Revolution R Enterprise. VP of Product Development, Dr. Sue Ranney, will also provide an overview of the features introduced in Revolution R Enterprise 6.0, including:
1. Big Data Generalized Linear Model, the new RevoScaleR function that provides a fast, scalable, distributable implementation of generalized linear models, offering impressive speed-ups relative to glm on in-memory data frames
2. Platform LSF Cluster Support, which allows you to create a distributed compute context for the Platform LSF workload manager
3. Azure Burst support added to RxHpcServer
4. Updated R engine (R 2.14.2)
5. Ability to use RevoScaleR analysis functions with non-xdf data sources such as SAS, SPSS or text
6. New methods for RxXdfData data sources including head, tail, names, dim, colnames, length, str, and formula
7. New function rxRoc for generating ROC curves
TRANSCRIPT
Revolution Confidential
Revolution R Enterprise 6: 100% R and More
Presented by: David Smith, VP Marketing and Community
Sue Ranney, VP Product Management
Poll Question
Which stats package do you use most?
In today’s webcast:
About Open-Source R and Revolution R Enterprise
What’s New in Revolution R Enterprise 6
Resources, Q&A
What is R?
Data analysis software
  A programming language
  Development platform designed by and for statisticians
An environment
  Huge library of algorithms for data access, data manipulation, analysis and graphics
An open-source software project
  Free, open, and active
A community
  Thousands of contributors, 2 million users
  Resources and help in every domain
Download the White Paper
R is Hot: bit.ly/r-is-hot
R User Community
From: The R Ecosystem
bit.ly/R-ecosystem
Revolution R Enterprise is
R Productivity Environment (Windows)
Script with type ahead and code snippets
Solutions window for organizing code and data
Packages installed and loaded
Objects loaded in the R Environment
Object details
Sophisticated debugging with breakpoints, variable values etc.
http://www.revolutionanalytics.com/demos/revolution-productivity-environment/demo.htm
Performance: Multi-threaded Math
Computation (4-core laptop)          Open Source R   Revolution R   Speedup
Linear Algebra [1]
  Matrix Multiply                    176 sec         9.3 sec        18x
  Cholesky Factorization             25.5 sec        1.3 sec        19x
  Linear Discriminant Analysis       189 sec         74 sec         3x
General R Benchmarks [2]
  R Benchmarks (Matrix Functions)    22 sec          3.5 sec        5x
  R Benchmarks (Program Control)     5.6 sec         5.4 sec        Not appreciable
[1] http://www.revolutionanalytics.com/why-revolution-r/benchmarks.php
[2] http://r.research.att.com/benchmarks/
A common analytic platform across big data architectures
Hadoop | File Based | Cluster | In-database
RevoScaleR on Distributed Computing Clusters (Windows HPC Server, Platform LSF)
[Diagram labels: Big Data; Master Node; four Compute Nodes; four Data Partitions]
Data Step, Statistical Summary, Tables/Cubes, Covariance, Linear & Logistic Regression, GLM, K-means clustering, …
Scalable distributed computing with Revolution R Enterprise and Hadoop
Map-Reduce
RHadoop: http://bit.ly/RHadoop
In-Database Execution with IBM Netezza
More info: http://bit.ly/R-Netezza
Enterprise-Wide Deployment
Research and Development
Excel | BI | Web App
RevoDeployR Server
Web Services API
Management Console
Revolution R Enterprise Server + Hadoop + IBM Netezza + Server cluster
Data Scientists / Modelers
Production
Analysts / Corporate Users
End-User Deployment
On-Call Technical Support
Consulting: Migration | Analytics | Applications | Validation
Training: R | Revolution R | Statistical Topics
Systems Integration: BI | ERP | Databases | Cloud
www.revolutionanalytics.com/services
Why Revolution R?
Open-Source R | RRE6 Workstation | RRE6 Server
Interface with multiple data sources
Exploratory data analysis
Wide range of statistical methods
Parallel Programming
Multi-threaded performance
Big Data Analytics
Distributed Analytics (Grid / Cluster) Client
Cloud Computing
Hadoop Integration Client
IBM Netezza Integration Client
Multi-user support
Scheduled, monitored batch production
Secure code deployment, management
Integration into Data Apps
http://www.revolutionanalytics.com/why-revolution-r/which-r-is-right-for-me.php
Poll Question
What’s most important to you about Revolution R Enterprise?
What’s New in Revolution R Enterprise 6
Presented by: Sue Ranney, VP Product Development
Revolution R Enterprise 6
Key Areas of Enhancements
  Latest stable release of open-source R (2.14.2)
  High Performance Analytics: Fast, scalable, distributable, full-featured analysis of huge data sets
  High Performance Computing: Run arbitrary R functions in parallel across cores or nodes of a cluster
R 2.14.2
Incorporation of ‘parallel’ as base package
  ‘foreach’ users can use doParallel backend
  Users of RevoScaleR’s ‘rxExec’ HPC function can use new compute contexts to run arbitrary R functions in parallel
    Compute context for the ‘parallel’ package
    Compute context for any ‘foreach’ backend
Standard functions and packages in R are pre-compiled into byte-code using the ‘compiler’ package
  The benefit in speed depends on the specific function, but code performance can improve by a factor of 2 or more.
High Performance Analytics (HPA) in RevoScaleR
High Performance Computing + Data
Full-featured, fast, and scalable analysis functions
Same code works on small and big data
Same code works on a variety of compute contexts - a laptop, server, cluster, or the cloud
Scales approximately linearly with the number of observations – without increasing memory requirements
Directly Analyze External Data Sets with RevoScaleR HPA Functions (NEW)
The RevoScaleR package provides easy ways to directly access and analyze external data sets (data sources):
  Delimited ASCII
  Fixed format ASCII
  SAS data sets (.sas7bdat)
  SPSS data sets (.sav)
  ODBC connections
No need to have SAS or SPSS installed to access data in SAS or SPSS file formats.
Get started on analyses without first importing data
Still have the option of importing into efficient .xdf file format
RevoScaleR: HPA Algorithms
Descriptive statistics (rxSummary)
Tables and cubes (rxCube, rxCrossTabs)
Correlations/covariances (rxCovCor, rxCor, rxCov, rxSSCP)
K means clustering (rxKmeans)
Linear regressions (rxLinMod)
Logistic regressions (rxLogit)
Generalized Linear Models (rxGlm) NEW!
Predictions (scoring) (rxPredict)
Tips for Handling Big Data in R
Use algorithms that process data in chunks.
  The functions provided with RevoScaleR are scalable because they process data in ‘chunks.’
  If the number of observations doubles, you can still perform the same data analyses with the same amount of memory; it will just take longer.
Use functions optimized for big data.
  The implementations of RevoScaleR analysis algorithms are all optimized for handling big data.
  RevoScaleR analysis functions provide significant speed improvements over alternatives, even if you can fit all of your data in memory.
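The chunking tip above can be illustrated with a small, language-neutral sketch. It is written in plain Python purely for illustration and is not RevoScaleR code; the helper names (read_in_chunks, chunked_summary) are hypothetical. It computes a count, mean and sample variance while holding only one chunk in memory, so doubling the number of observations doubles the run time but not the memory footprint.

```python
from typing import Iterable, Iterator, List, Tuple


def read_in_chunks(values: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield successive chunks; a stand-in for reading blocks of rows from disk."""
    chunk: List[float] = []
    for v in values:
        chunk.append(v)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def chunked_summary(values: Iterable[float], chunk_size: int = 1000) -> Tuple[int, float, float]:
    """Return (n, mean, sample variance), keeping only one chunk in memory at a time."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the running mean
    for chunk in read_in_chunks(values, chunk_size):
        # Summarise this chunk on its own ...
        cn = len(chunk)
        cmean = sum(chunk) / cn
        cm2 = sum((x - cmean) ** 2 for x in chunk)
        # ... then merge the chunk summary into the running totals.
        delta = cmean - mean
        total = n + cn
        mean += delta * cn / total
        m2 += cm2 + delta * delta * n * cn / total
        n = total
    variance = m2 / (n - 1) if n > 1 else 0.0
    return n, mean, variance
```

Passing a generator (for example, values parsed line by line from a file) rather than a list keeps the whole data set out of memory, which is the point of the tip.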
Beyond In-Memory Data Analysis
RevoScaleR functions can read from data sets on disk in chunks, so you can increase the number of observations in the data set beyond what can be analyzed in memory all at once
RevoScaleR analysis functions process chunks of data in parallel, taking greater advantage of your computing resources (Parallel External Memory Algorithms)
  Multiple cores on a desktop/server
  Cluster/grids have added advantage of more hard drives for storing & accessing data
    Windows HPC Server Cluster
    “Burst” computations to Azure in the cloud (NEW)
    IBM Platform LSF Grid (NEW)
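One way to picture why chunks can be processed in parallel is a regression whose per-chunk work reduces to a few sums. The sketch below is plain Python, not RevoScaleR's implementation, and its names (Moments, chunk_moments, combine, fit_line) are hypothetical. Each chunk is summarised independently, as it could be on a separate core or node, and the partial results are simply added together before solving for the coefficients.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence, Tuple


@dataclass(frozen=True)
class Moments:
    """Sufficient statistics for the simple regression y = a + b*x."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def __add__(self, other: "Moments") -> "Moments":
        return Moments(
            self.n + other.n,
            self.sx + other.sx,
            self.sy + other.sy,
            self.sxx + other.sxx,
            self.sxy + other.sxy,
        )


def chunk_moments(rows: Sequence[Tuple[float, float]]) -> Moments:
    """Summarise one chunk of (x, y) rows; no other chunk is needed."""
    return Moments(
        n=len(rows),
        sx=sum(x for x, _ in rows),
        sy=sum(y for _, y in rows),
        sxx=sum(x * x for x, _ in rows),
        sxy=sum(x * y for x, y in rows),
    )


def combine(parts: Iterable[Moments]) -> Moments:
    """Add partial results; the order in which chunks finish does not matter."""
    total = Moments()
    for part in parts:
        total = total + part
    return total


def fit_line(m: Moments) -> Tuple[float, float]:
    """Solve the least-squares normal equations from the combined sums."""
    slope = (m.n * m.sxy - m.sx * m.sy) / (m.n * m.sxx - m.sx ** 2)
    intercept = (m.sy - slope * m.sx) / m.n
    return intercept, slope
```

Because combine only adds numbers, the per-chunk calls to chunk_moments could be handed to any parallel map (threads, processes, or cluster nodes) without changing the result up to floating-point rounding.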
‘Big Data’ Generalized Linear Models (NEW)
Relaxes the assumptions for a standard linear model.
Used in insurance, finance, biotech, and other industries.
Example 1: Count data (Poisson)
  Number of vehicles an auto policy holder owns
  Number of credit cards a person holds
  Number of bacterial colonies in a Petri dish
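To make the Poisson case concrete, here is a minimal sketch of fitting a one-covariate Poisson regression with a log link by Newton's method (equivalent to iteratively reweighted least squares for this model). It is plain Python for illustration only; it is not how rxGlm is implemented, and the function name fit_poisson is hypothetical. It assumes the counts are not all zero and that x takes more than one distinct value.

```python
import math
from typing import Sequence, Tuple


def fit_poisson(x: Sequence[float], y: Sequence[float],
                max_iter: int = 50, tol: float = 1e-10) -> Tuple[float, float]:
    """Fit log(E[y]) = b0 + b1*x by Newton's method; returns (b0, b1)."""
    b0 = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0            # score vector (gradient of the log-likelihood)
        h00 = h01 = h11 = 0.0    # Fisher information matrix (2 x 2, symmetric)
        for xi, yi in zip(x, y):
            mu = math.exp(b0 + b1 * xi)
            r = yi - mu
            g0 += r
            g1 += r * xi
            h00 += mu
            h01 += mu * xi
            h11 += mu * xi * xi
        # Newton step: solve the 2 x 2 system (information) * d = score.
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1
```

A call such as fit_poisson(ages, vehicles_owned) would return the intercept and slope on the log scale, so exp(b1) is the multiplicative change in the expected count per unit of x; both argument names here are hypothetical.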
GLM: Other Examples
Example 2: Positive values with positive skew (Gamma)
  Value of auto insurance claims for claims filed
Example 3: Positive data that also contains exact zeros (Tweedie Model)
  Data on insured vehicles (claims amount is zero for many vehicles; range of positive claims values for others)
  Rainfall data
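The mix of exact zeros and skewed positive values in Example 3 can be reproduced with a small simulation. For a power parameter between 1 and 2, a Tweedie variable is a compound Poisson-gamma sum: a Poisson number of claims, each with a gamma-distributed size. The sketch below is plain Python with hypothetical names (poisson_draw, simulate_claim) and illustrative parameter values that do not come from the webinar.

```python
import math
import random


def poisson_draw(rng: random.Random, lam: float) -> int:
    """Draw a Poisson count by Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k = 0
    p = rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_claim(rng: random.Random, claim_rate: float,
                   shape: float, scale: float) -> float:
    """Total claim amount for one policy: a Poisson number of gamma-sized claims."""
    n_claims = poisson_draw(rng, claim_rate)
    # With zero claims the total is exactly 0.0, which is what produces
    # the point mass at zero in the data.
    return float(sum(rng.gammavariate(shape, scale) for _ in range(n_claims)))
```

Drawing a few thousand policies with, say, claim_rate=0.5 gives a sample in which a share of roughly exp(-0.5) of the totals is exactly zero and the rest are positive and right-skewed.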
Quick Demo Incorporating rxGlm
Use 5% Sample of the U.S. 2000 Census to look at annual property insurance premiums
Data manipulations: sub-sample data and modify categorical data
Perform summary statistics; draw histogram
Estimate a Tweedie model using rxGlm
Estimate predictions for targeted demographic characteristics
Visualize the results
Analyze bigger model using a cluster
Cloud Computing with Azure Burst (NEW)
Windows Azure is a cloud platform that enables you to manage computations across a global network of Microsoft-managed datacenters
Revolution R Enterprise 6.0 can burst computations to Windows Azure from Windows HPC Server
Particularly suited to parallel HPC such as simulations
A Simple Simulation Example
For each run:
  Generate data with a known distribution (using code that accompanies the article "Pure Premium Regression with the Tweedie Model" by Glenn Meyers, Actuarial Review, May 2009)
  Estimate the model using rxGlm
Compare the means of the estimated coefficients with the known parameters of the underlying distribution
Do a small number of runs locally
Do a large number of runs ‘bursting’ to the Azure cloud (monitor jobs with HPC Job Scheduler, just as with on-premises nodes)
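The shape of this simulation can be sketched in a few lines. The version below is plain Python and swaps in a simpler model than the one described: an ordinary least-squares line with known intercept and slope instead of a Tweedie model fitted with rxGlm. All names and parameter values are assumptions made for illustration. Each run is seeded by its index, so runs are independent and could be farmed out to other cores or nodes.

```python
import random
from statistics import mean
from typing import Tuple

TRUE_INTERCEPT = 2.0  # known parameters of the data-generating process (assumed values)
TRUE_SLOPE = 3.0


def one_run(seed: int, n_obs: int = 200) -> Tuple[float, float]:
    """Generate data from the known model and return the estimated (intercept, slope)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n_obs)]
    ys = [TRUE_INTERCEPT + TRUE_SLOPE * x + rng.gauss(0.0, 1.0) for x in xs]
    xbar, ybar = mean(xs), mean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def run_simulation(n_runs: int) -> Tuple[float, float]:
    """Average the estimates over independent runs for comparison with the truth."""
    # The built-in map runs sequentially; since runs share no state, a parallel
    # map could be substituted here without changing the results.
    results = list(map(one_run, range(n_runs)))
    return mean(r[0] for r in results), mean(r[1] for r in results)
```

Running run_simulation with a handful of runs corresponds to the local check; increasing n_runs and replacing map with a distributed equivalent corresponds to the large burst job.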
Poll Question
What new feature of Revolution R Enterprise 6 is most interesting to you?
Thank You!
Download slides, replay from today’s webinar
http://bit.ly/z9xUG9
Learn more about Revolution R Enterprise
Overview: revolutionanalytics.com/products
New feature videos:
http://www.revolutionanalytics.com/products/new-features.php
Contact Revolution Analytics: http://bit.ly/hey-revo
June 28: Achieving High-Performing, Simulation-Based Operational Risk Measurement with RevoScaleR
David Humke, Vice President, The Northern Trust Company
www.revolutionanalytics.com/news-events/free-webinars
The leading commercial provider of software and support for the popular open source R statistics language.
www.revolutionanalytics.com
+1 (650) 646 9545
Twitter: @RevolutionR