100% R and More: Plus What's New in Revolution R Enterprise 6.0

Revolution Confidential. Revolution R Enterprise 6: 100% R and More. Presented by: David Smith, VP Marketing and Community; Sue Ranney, VP Product Management

Upload: revolution-analytics

Posted on 18-Dec-2014

Category:

Technology

DESCRIPTION

R users already know why the R language is the lingua franca of statisticians today: because it's the most powerful statistical language in the world. Revolution Analytics builds on the power of open-source R and adds performance, productivity and integration features to create Revolution R Enterprise. In this webinar, author and blogger David Smith will introduce the additional capabilities of Revolution R Enterprise. Dr. Sue Ranney, VP of Product Development, will also provide an overview of the features introduced in Revolution R Enterprise 6.0, including:

1. Big Data Generalized Linear Model, the new RevoScaleR function that provides a fast, scalable, distributable implementation of generalized linear models, offering impressive speed-ups relative to glm on in-memory data frames
2. Platform LSF Cluster Support, which allows you to create a distributed compute context for the Platform LSF workload manager
3. Azure Burst support added to RxHpcServer
4. Updated R engine (R 2.14.2)
5. Ability to use RevoScaleR analysis functions with non-xdf data sources such as SAS, SPSS or text
6. New methods for RxXdfData data sources, including head, tail, names, dim, colnames, length, str, and formula
7. New function rxRoc for generating ROC curves
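
The arguments of the rxRoc function listed above are not shown in this transcript. As a hedged illustration of what an ROC curve computes, the Python sketch below sweeps a classification threshold over predicted scores, records the false positive rate and true positive rate at each step, and then takes the area under the curve with the trapezoidal rule. The names `roc_points` and `auc` are hypothetical and are not part of RevoScaleR.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs, one per
    distinct score threshold, starting at (0, 0) and ending at (1, 1).
    labels must be 1 (positive) or 0 (negative)."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative case")
    # Highest scores first: lowering the threshold admits cases in this order.
    pairs = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Cases tied at the same score enter together as one threshold step.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

Tied scores are consumed as a single threshold step, which is why a tie between a positive and a negative case moves the curve diagonally instead of in two separate steps.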

TRANSCRIPT

Page 1: 100% R and More: Plus What's New in Revolution R Enterprise 6.0

Revolution R Enterprise 6: 100% R and More

Presented by: David Smith, VP Marketing and Community

Sue Ranney, VP Product Management

Page 2

Poll Question

Which stats package do you use most?

Page 3

In today’s webcast:

About Open-Source R and Revolution R Enterprise

What’s New in Revolution R Enterprise 6

Resources, Q&A

Page 4

What is R?

Data analysis software
A programming language: a development platform designed by and for statisticians
An environment: a huge library of algorithms for data access, data manipulation, analysis and graphics
An open-source software project: free, open, and active
A community: thousands of contributors, 2 million users; resources and help in every domain

Download the white paper "R is Hot": bit.ly/r-is-hot

Page 5

R User Community

From: The R Ecosystem (bit.ly/R-ecosystem)

Page 6

Revolution R Enterprise is

Page 7

R Productivity Environment (Windows)

Script with type-ahead and code snippets
Solutions window for organizing code and data
Packages installed and loaded
Objects loaded in the R environment
Object details
Sophisticated debugging with breakpoints, variable values, etc.

http://www.revolutionanalytics.com/demos/revolution-productivity-environment/demo.htm

Page 8

Performance: Multi-threaded Math

| Benchmark group | Computation (4-core laptop) | Open Source R | Revolution R | Speedup |
|---|---|---|---|---|
| Linear Algebra (1) | Matrix Multiply | 176 sec | 9.3 sec | 18x |
| Linear Algebra (1) | Cholesky Factorization | 25.5 sec | 1.3 sec | 19x |
| Linear Algebra (1) | Linear Discriminant Analysis | 189 sec | 74 sec | 3x |
| General R Benchmarks (2) | R Benchmarks (Matrix Functions) | 22 sec | 3.5 sec | 5x |
| General R Benchmarks (2) | R Benchmarks (Program Control) | 5.6 sec | 5.4 sec | Not appreciable |

1. http://www.revolutionanalytics.com/why-revolution-r/benchmarks.php
2. http://r.research.att.com/benchmarks/

Page 9

A common analytic platform across big data architectures

Hadoop | File-based | Cluster | In-database

Page 10

RevoScaleR on Distributed Computing Clusters (Windows HPC Server, Platform LSF)

[Diagram: big data split into data partitions, with a master node coordinating four compute nodes, each holding its own data partition]

Data Step, Statistical Summary, Tables/Cubes, Covariance, Linear & Logistic Regression, GLM, K-means clustering, …
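
The diagram pairs each compute node with its own data partition, and K-means clustering is among the listed distributed functions. The Python sketch below shows one generic way a single k-means update can be split along those lines: each node computes per-cluster sums and counts for its partition only, and the master adds them up and recomputes the centers. This is an assumption about the general pattern, not a description of how RevoScaleR implements rxKmeans; `partial_sums` and `combine` are hypothetical names, and a plain loop over partitions stands in for the compute nodes.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Partial = Tuple[List[List[float]], List[int]]


def nearest(point: Point, centers: Sequence[Point]) -> int:
    """Index of the center closest to point (squared Euclidean distance)."""
    best, best_d = 0, float("inf")
    for k, center in enumerate(centers):
        d = sum((a - b) ** 2 for a, b in zip(point, center))
        if d < best_d:
            best, best_d = k, d
    return best


def partial_sums(partition: Sequence[Point], centers: Sequence[Point]) -> Partial:
    """Compute-node step: per-cluster coordinate sums and point counts,
    using only this node's data partition."""
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for point in partition:
        k = nearest(point, centers)
        counts[k] += 1
        for j, value in enumerate(point):
            sums[k][j] += value
    return sums, counts


def combine(partials: Sequence[Partial], centers: Sequence[Point]) -> List[Point]:
    """Master-node step: add the partial results and recompute each center.
    A cluster that received no points keeps its previous center."""
    dim = len(centers[0])
    total_sums = [[0.0] * dim for _ in centers]
    total_counts = [0] * len(centers)
    for sums, counts in partials:
        for k in range(len(centers)):
            total_counts[k] += counts[k]
            for j in range(dim):
                total_sums[k][j] += sums[k][j]
    new_centers: List[Point] = []
    for k, center in enumerate(centers):
        n = total_counts[k]
        new_centers.append(tuple(s / n for s in total_sums[k]) if n else tuple(center))
    return new_centers
```

Only k sums and k counts travel from each node to the master, so the amount of data exchanged per iteration does not depend on how many rows each partition holds.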

Page 11

Scalable distributed computing with Revolution R Enterprise and Hadoop

Map-Reduce

RHadoop: http://bit.ly/RHadoop

Page 12

In-Database Execution with IBM Netezza

More info: http://bit.ly/R-Netezza

Page 13

Enterprise-Wide Deployment

[Diagram labels: Research and Development (Data Scientists / Modelers); Revolution R Enterprise Server + Hadoop + IBM Netezza + Server cluster; Production; RevoDeployR Server, Web Services API, Management Console; End-User Deployment (Analysts / Corporate Users); Excel, BI, Web App]

Page 14

On-Call Technical Support

Consulting: Migration | Analytics | Applications | Validation

Training: R | Revolution R | Statistical Topics

Systems Integration: BI | ERP | Databases | Cloud

www.revolutionanalytics.com/services

Page 15

Why Revolution R?

[Comparison table; columns: Open-Source R | RRE6 Workstation | RRE6 Server]

Interface with multiple data sources
Exploratory data analysis
Wide range of statistical methods
Parallel Programming
Multi-threaded performance
Big Data Analytics
Distributed Analytics (Grid / Cluster) Client
Cloud Computing
Hadoop Integration Client
IBM Netezza Integration Client
Multi-user support
Scheduled, monitored batch production
Secure code deployment, management
Integration into Data Apps

http://www.revolutionanalytics.com/why-revolution-r/which-r-is-right-for-me.php

Page 16

Poll Question

What’s most important to you about Revolution R Enterprise?

Page 17

What’s New in Revolution R Enterprise 6

Presented by: Sue Ranney, VP Product Development

Page 18

Revolution R Enterprise 6

Key areas of enhancement:
Latest stable release of open-source R (2.14.2)
High Performance Analytics: fast, scalable, distributable, full-featured analysis of huge data sets
High Performance Computing: run arbitrary R functions in parallel across cores or nodes of a cluster

Page 19

R 2.14.2

Incorporation of ‘parallel’ as a base package
‘foreach’ users can use the doParallel backend
Users of RevoScaleR’s ‘rxExec’ HPC function can use new compute contexts to run arbitrary R functions in parallel:
  Compute context for the ‘parallel’ package
  Compute context for any ‘foreach’ backend

Standard functions and packages in R are pre-compiled into byte-code using the ‘compiler’ package. The speed benefit depends on the specific function, but performance can improve by a factor of 2 or more.

Page 20

High Performance Analytics (HPA) in RevoScaleR

High Performance Computing + Data

Full-featured, fast, and scalable analysis functions
Same code works on small and big data
Same code works on a variety of compute contexts: a laptop, server, cluster, or the cloud
Scales approximately linearly with the number of observations, without increasing memory requirements

Page 21

Directly Analyze External Data Sets with RevoScaleR HPA Functions (NEW)

The RevoScaleR package provides easy ways to directly access and analyze external data sets (data sources):
Delimited ASCII
Fixed-format ASCII
SAS data sets (.sas7bdat)
SPSS data sets (.sav)
ODBC connections

No need to have SAS or SPSS installed to access data in SAS or SPSS file formats

Get started on analyses without first importing data

Still have the option of importing into the efficient .xdf file format
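
RevoScaleR's own data-source functions are not shown in this transcript. As a language-neutral illustration of analyzing an external file without importing it first, the Python sketch below streams a delimited text file with the standard `csv` module and summarizes one numeric column a chunk at a time. The function name `summarize_column`, the column names, and the sample data are hypothetical.

```python
import csv
import io
from itertools import islice
from typing import Dict, TextIO


def summarize_column(handle: TextIO, column: str, chunk_size: int = 10_000) -> Dict[str, float]:
    """Stream a delimited file with a header row and summarize one numeric
    column, holding at most chunk_size rows in memory at a time."""
    reader = csv.DictReader(handle)
    count, total = 0, 0.0
    low, high = float("inf"), float("-inf")
    while True:
        chunk = list(islice(reader, chunk_size))  # next block of rows only
        if not chunk:
            break
        for row in chunk:
            field = (row.get(column) or "").strip()
            if not field:  # empty field: treat as missing
                continue
            value = float(field)
            count += 1
            total += value
            low, high = min(low, value), max(high, value)
    if count == 0:
        raise ValueError(f"no non-missing values in column {column!r}")
    return {"count": count, "mean": total / count, "min": low, "max": high}


if __name__ == "__main__":
    # Toy in-memory "file"; a real file would be opened with open(path, newline="").
    sample = "age,premium\n34,520\n51,\n29,610\n62,480\n"
    print(summarize_column(io.StringIO(sample), "premium", chunk_size=2))
```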

Page 22

RevoScaleR: HPA Algorithms

Descriptive statistics (rxSummary)
Tables and cubes (rxCube, rxCrossTabs)
Correlations/covariances (rxCovCor, rxCor, rxCov, rxSSCP)
K-means clustering (rxKmeans)
Linear regressions (rxLinMod)
Logistic regressions (rxLogit)
Generalized Linear Models (rxGlm) NEW!
Predictions (scoring) (rxPredict)

Page 23

Tips for Handling Big Data in R

Use algorithms that process data in chunks. The functions provided with RevoScaleR are scalable because they process data in ‘chunks.’ If the number of observations doubles, you can still perform the same data analyses with the same amount of memory; it will just take longer.

Use functions optimized for big data. The implementations of RevoScaleR analysis algorithms are all optimized for handling big data. RevoScaleR analysis functions provide significant speed improvements over alternatives, even if you can fit all of your data in memory.
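
The fixed-memory claim above follows from the fact that many estimators need only running totals. For ordinary least squares those totals are the cross-products X'X and X'y, the sums of squares and cross-products that the rxSSCP name abbreviates. The Python sketch below accumulates them chunk by chunk and then solves the normal equations, so memory depends on the number of predictors rather than the number of rows. It is a textbook illustration under that assumption, not RevoScaleR's implementation; `accumulate` and `solve` are hypothetical names.

```python
from typing import Iterable, List, Sequence, Tuple

Row = Tuple[Sequence[float], float]  # (predictors incl. leading 1.0, response)


def accumulate(chunks: Iterable[Sequence[Row]], p: int) -> Tuple[List[List[float]], List[float], int]:
    """Add each chunk's contribution to X'X (p x p) and X'y (length p)."""
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    n = 0
    for chunk in chunks:  # only one chunk needs to be in memory at a time
        for x, y in chunk:
            n += 1
            for i in range(p):
                xty[i] += x[i] * y
                for j in range(p):
                    xtx[i][j] += x[i] * x[j]
    return xtx, xty, n


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ beta = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular; check for collinear predictors")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * size
    for r in range(size - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * beta[c] for c in range(r + 1, size))
        beta[r] = (m[r][size] - tail) / m[r][r]
    return beta
```

Solving the normal equations directly keeps the sketch short, but it can lose precision when predictors are nearly collinear, which is why production solvers typically use a Cholesky or QR factorization instead.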

Page 24

Page 25

Beyond In-Memory Data Analysis

RevoScaleR functions can read from data sets on disk in chunks, so you can increase the number of observations in the data set beyond what can be analyzed in memory all at once.

RevoScaleR analysis functions process chunks of data in parallel, taking greater advantage of your computing resources (Parallel External Memory Algorithms):
Multiple cores on a desktop/server
Clusters/grids, which have the added advantage of more hard drives for storing and accessing data:
  Windows HPC Server cluster
  “Burst” computations to Azure in the cloud (NEW)
  IBM Platform LSF grid (NEW)
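
"Parallel External Memory Algorithms" describes two steps: summarize each chunk independently, then merge the partial results. The Python sketch below shows that pattern for a mean and sample variance, using Welford's update within a chunk and the standard pairwise merge formula (often attributed to Chan, Golub and LeVeque) across chunks. It is a stand-in under stated assumptions, not RevoScaleR code: a thread pool plays the role of cores or cluster nodes, and the names `summarize`, `merge`, and `parallel_mean_var` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import List, Sequence, Tuple

Summary = Tuple[int, float, float]  # (count, mean, M2 = sum of squared deviations)


def summarize(chunk: Sequence[float]) -> Summary:
    """Welford's single-pass update over one chunk."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in chunk:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return n, mean, m2


def merge(a: Summary, b: Summary) -> Summary:
    """Combine two partial summaries with the pairwise update formula."""
    na, ma, m2a = a
    nb, mb, m2b = b
    n = na + nb
    if n == 0:
        return 0, 0.0, 0.0
    delta = mb - ma
    mean = ma + delta * nb / n
    m2 = m2a + m2b + delta * delta * na * nb / n
    return n, mean, m2


def parallel_mean_var(chunks: List[Sequence[float]], workers: int = 4) -> Tuple[float, float]:
    """Summarize chunks concurrently, then merge into (mean, sample variance)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(summarize, chunks))
    n, mean, m2 = reduce(merge, partials, (0, 0.0, 0.0))
    if n < 2:
        raise ValueError("need at least two observations for a sample variance")
    return mean, m2 / (n - 1)
```

Because of Python's global interpreter lock, threads will not speed up this pure-Python arithmetic; a process pool or separate machines would be needed for real parallel gains. The merge step is the part that carries over, since it lets chunks be summarized in any grouping and combined afterward.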

Page 26

‘Big Data’ Generalized Linear Models (NEW)

Relaxes the assumptions of the standard linear model (normally distributed response, identity link) by allowing other response distributions and link functions. Used in insurance, finance, biotech, and other industries.

Example 1: Count data (Poisson)
Number of vehicles an auto policy holder owns
Number of credit cards a person holds
Number of bacterial colonies in a Petri dish
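
For Example 1, a Poisson GLM with a log link models the expected count as exp(b0 + b1 x). The Python sketch below fits that model by iteratively reweighted least squares (IRLS), the algorithm R's own glm function uses; whether rxGlm fits models the same way is not stated in the transcript. Each iteration needs only sums over observations, so in principle each pass could be computed chunk by chunk. The single-predictor setup and the name `poisson_irls` are assumptions for illustration, and the response must have a positive total.

```python
import math
from typing import Sequence, Tuple


def poisson_irls(x: Sequence[float], y: Sequence[float],
                 max_iter: int = 50, tol: float = 1e-10) -> Tuple[float, float]:
    """Return (b0, b1) for log E[y] = b0 + b1 * x, fitted by IRLS."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at the intercept-only fit
    for _ in range(max_iter):
        # Accumulate the 2x2 weighted normal equations X'WX b = X'Wz.
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi
            mu = math.exp(eta)
            w = mu                    # Poisson with log link: weight = mu
            z = eta + (yi - mu) / mu  # working response
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        det = s_w * s_wxx - s_wx * s_wx
        if det == 0.0:
            raise ValueError("predictor has no variation")
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det  # Cramer's rule
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1
```

At the maximum-likelihood solution the fitted means reproduce the observed totals, sum(mu_i) = sum(y_i) and sum(x_i mu_i) = sum(x_i y_i), which gives a quick way to check a fit.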

Page 27

GLM: Other Examples

Example 2: Positive values with positive skew (Gamma)
Value of auto insurance claims for claims filed

Example 3: Positive data that also contains exact zeros (Tweedie model)
Data on insured vehicles (claim amount is zero for many vehicles; a range of positive claim values for others)
Rainfall data
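
The three examples belong to one family. Tweedie distributions have the variance function on the first line below, and the power parameter p selects the case: p = 1 with φ = 1 gives the Poisson model of Example 1, p = 2 the gamma model of Example 2, and 1 < p < 2 the compound Poisson-gamma model of Example 3. The second line explains the exact zeros: the response is a sum of a Poisson number N of gamma-distributed amounts, so it equals zero whenever N = 0. The symbols are introduced here for illustration; the parameter mapping is the standard one for this representation, with θ a gamma scale parameter, and the last line checks that it reproduces the mean and variance.

```latex
\operatorname{Var}(Y) = \phi\,\mu^{p}, \qquad \mu = \mathbb{E}[Y], \quad \phi > 0

Y = \sum_{i=1}^{N} X_i, \quad N \sim \operatorname{Poisson}(\lambda), \quad X_i \sim \operatorname{Gamma}(\alpha, \theta), \quad \Pr(Y = 0) = e^{-\lambda}

\lambda = \frac{\mu^{2-p}}{\phi\,(2-p)}, \qquad \alpha = \frac{2-p}{p-1}, \qquad \theta = \phi\,(p-1)\,\mu^{p-1}, \qquad 1 < p < 2

\lambda\alpha\theta = \mu, \qquad \lambda\,\alpha(\alpha+1)\,\theta^{2} = \phi\,\mu^{p}
```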

Page 28

Quick Demo Incorporating rxGlm

Use a 5% sample of the U.S. 2000 Census to look at annual property insurance premiums
Data manipulations: sub-sample data and modify categorical data
Perform summary statistics; draw a histogram
Estimate a Tweedie model using rxGlm
Estimate predictions for targeted demographic characteristics
Visualize the results
Analyze a bigger model using a cluster

Page 29

Cloud Computing with Azure Burst (NEW)

Windows Azure is a cloud platform that enables you to manage computations across a global network of Microsoft-managed datacenters

Revolution R Enterprise 6.0 can burst computations to Windows Azure from Windows HPC Server

Particularly suited to parallel HPC tasks such as simulations

Page 30

A Simple Simulation Example

For each run:
Generate data with a known distribution (using code that accompanies the article "Pure Premium Regression with the Tweedie Model" by Glenn Meyers, Actuarial Review, May 2009)
Estimate the model using rxGlm
Compare the means of the estimated coefficients with the known parameters of the underlying distribution

Do a small number of runs locally
Do a large number of runs by ‘bursting’ to the Azure cloud (monitor jobs with the HPC Job Scheduler, just as with on-premises nodes)
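
The slide's simulation uses Glenn Meyers' data-generating code and rxGlm, neither of which is reproduced in the transcript. The Python sketch below is a simplified stand-in under stated assumptions: it generates compound Poisson-gamma (Tweedie, 1 < p < 2) data with a known mean, fits an intercept-only model in each run (for which the maximum-likelihood estimate of the mean is the sample mean), and averages the estimates across runs. The parameter values mu = 2.0, phi = 1.5, p = 1.5 and all function names are hypothetical, and a local thread pool stands in for on-premises or Azure nodes.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import fmean
from typing import List


def poisson(rng: random.Random, lam: float) -> int:
    """Poisson draw by Knuth's multiplication method (fine for small lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def tweedie_sample(rng: random.Random, mu: float, phi: float, p: float, n: int) -> List[float]:
    """n compound Poisson-gamma draws with mean mu and variance phi * mu**p.
    Mapping (1 < p < 2): lam = mu^(2-p) / (phi (2-p)), alpha = (2-p)/(p-1),
    theta = phi (p-1) mu^(p-1), where theta is the gamma scale."""
    lam = mu ** (2 - p) / (phi * (2 - p))
    alpha = (2 - p) / (p - 1)
    theta = phi * (p - 1) * mu ** (p - 1)
    return [
        sum(rng.gammavariate(alpha, theta) for _ in range(poisson(rng, lam)))
        for _ in range(n)
    ]


def one_run(seed: int, mu: float = 2.0, phi: float = 1.5, p: float = 1.5, n: int = 2000) -> float:
    """One replication: simulate, then estimate mu. For an intercept-only
    model the maximum-likelihood estimate of mu is the sample mean."""
    rng = random.Random(seed)
    return fmean(tweedie_sample(rng, mu, phi, p, n))


def simulate(runs: int, workers: int = 4) -> List[float]:
    """Independent, separately seeded runs; results come back in seed order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(one_run, range(runs)))


if __name__ == "__main__":
    estimates = simulate(runs=20)
    print(f"true mu = 2.0, mean of estimates = {fmean(estimates):.3f}")
```

Each run draws from its own seeded generator, so runs are independent and reproducible, which is what allows them to be handed to any number of workers. As with any CPU-bound pure-Python code, threads will not give a real speed-up; separate processes or machines would.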

Page 31

Poll Question

What new feature of Revolution R Enterprise 6 is most interesting to you?

Page 32

Thank You! Download the slides and replay from today’s webinar:

http://bit.ly/z9xUG9

Learn more about Revolution R Enterprise
Overview: revolutionanalytics.com/products
New feature videos: http://www.revolutionanalytics.com/products/new-features.php

Contact Revolution Analytics: http://bit.ly/hey-revo

June 28: Achieving High-Performing, Simulation-Based Operational Risk Measurement with RevoScaleR
David Humke, Vice President, The Northern Trust Company

www.revolutionanalytics.com/news-events/free-webinars

Page 33

The leading commercial provider of software and support for the popular open-source R statistics language.

www.revolutionanalytics.com
+1 (650) 646 9545

Twitter: @RevolutionR