Revolution R - 100% R and More

Revolution Confidential. Revolution R: 100% R and More. Presented by: David Smith, VP Marketing and Community, Revolution Analytics

Upload: revolution-analytics

Post on 10-May-2015


TRANSCRIPT

Page 1: Revolution R - 100% R and More

Revolution Confidential

Revolution R: 100% R and More

Presented by: David Smith, VP Marketing and Community, Revolution Analytics

Page 2: Revolution R - 100% R and More

Poll Question

Which stats package do you use most?

Page 3: Revolution R - 100% R and More

February 22, 2011: Welcome!

Thanks for coming. Slides and replay available (soon) at: http://bit.ly/z9xUG9

David Smith
VP Marketing & Community, Revolution Analytics
Editor, Revolutions blog

http://blog.revolutionanalytics.com
Twitter: @revodavid

Page 4: Revolution R - 100% R and More

In today's webcast:

About Revolution Analytics and R

What Revolution R adds to R

Resources for getting more from R

Q&A

Introducing Revolution R

Page 5: Revolution R - 100% R and More

What is R?

Data analysis software
• A programming language
• Development platform designed by and for statisticians

An environment
• Huge library of algorithms for data access, data manipulation, analysis and graphics

An open-source software project
• Free, open, and active

A community
• Thousands of contributors, 2 million users
• Resources and help in every domain

Download the White Paper "R is Hot": bit.ly/r-is-hot

Page 6: Revolution R - 100% R and More

R is exploding in popularity and functionality

Scholarly Activity: Google Scholar hits ('05-'09 CAGR)
R | 46%
Stata | 10%
S-Plus | 0%
SAS | -11%
SPSS | -27%

[Chart: Package Growth, number of R packages listed on CRAN, 2002 to 2010]

Source: http://r4stats.com/popularity

“A key benefit of R is that it provides near-instant availability of new and

experimental methods created by its user base — without waiting for the

development/release cycle of commercial software. SAS recognizes the value of R

to our customer base…”

Product Marketing Manager, SAS Institute, Inc.

“I’ve been astonished by the rate at which R has been adopted. Four years ago,

everyone in my economics department [at the University of Chicago] was using

Stata; now, as far as I can tell, R is the standard tool, and students learn it first.”

Deputy Editor for New Products at Forbes

Page 7: Revolution R - 100% R and More

“R is the most powerful & flexible statistical programming language in the world” [1]

Capabilities
• Sophisticated statistical analyses
• Predictive analytics
• Data visualization

Applications
• Real-time trading
• Finance
• Risk assessment
• Forecasting
• Bio-technology
• Drug development
• Social networks
• .. and more

[Chart: MSFT time series, axis ticks 15 to 30, Last 29.29]

1. Norman Nie, multiple interviews

Page 8: Revolution R - 100% R and More

R User Community

From: The R Ecosystem
bit.ly/R-ecosystem

Page 9: Revolution R - 100% R and More

Poll Question

If you're not using R today, what would you most like to use R for?

Page 10: Revolution R - 100% R and More

Revolution R Enterprise is

Page 11: Revolution R - 100% R and More

R Productivity Environment (Windows)

• Script with type ahead and code snippets
• Solutions window for organizing code and data
• Packages installed and loaded
• Objects loaded in the R Environment
• Object details
• Sophisticated debugging with breakpoints, variable values etc.

http://www.revolutionanalytics.com/demos/revolution-productivity-environment/demo.htm

Page 12: Revolution R - 100% R and More

Interactive Debugging

• One-click to set a breakpoint in an R script
• Step in/out/over, inspect variables
• Eliminate the edit -> browser -> repair cycle

Page 13: Revolution R - 100% R and More

Performance: Multi-threaded Math

[Chart: Open Source R vs. Revolution R Enterprise]

Computation (4-core laptop)     | Open Source R | Revolution R | Speedup
Linear Algebra [1]              |               |              |
Matrix Multiply                 | 327 sec       | 13.4 sec     | 23x
Cholesky Factorization          | 31.3 sec      | 1.8 sec      | 17x
Linear Discriminant Analysis    | 216 sec       | 74.6 sec     | 2x
General R Benchmarks [2]        |               |              |
R Benchmarks (Matrix Functions) | 22 sec        | 3.5 sec      | 5x
R Benchmarks (Program Control)  | 5.6 sec       | 5.4 sec      | Not appreciable

1. http://www.revolutionanalytics.com/why-revolution-r/benchmarks.php
2. http://r.research.att.com/benchmarks/
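
To get a rough feel for the linear algebra rows on your own machine, a minimal base R sketch like the following can be run unchanged in any R build. It is not the benchmark script behind the figures on this slide: the matrix size `n` and the variable names are assumptions chosen for illustration, and the timings depend entirely on the math library your R is linked against.

```r
# Time a dense matrix multiply and a Cholesky factorization in the running R build.
set.seed(42)
n <- 500                          # hypothetical size; the slide does not state one
A <- matrix(rnorm(n * n), n, n)
S <- crossprod(A) + diag(n)       # symmetric positive definite input for chol()

t_mult <- system.time(P <- A %*% A)[["elapsed"]]
t_chol <- system.time(U <- chol(S))[["elapsed"]]

cat(sprintf("Matrix multiply: %.3f sec\n", t_mult))
cat(sprintf("Cholesky:        %.3f sec\n", t_chol))
```

Running the same script under two different R builds gives a like-for-like comparison of the two operations.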

Page 14: Revolution R - 100% R and More

Three Paradigms for Big Data

Standard R engine is constrained by capacity and performance

Revolution R Enterprise offers three methods for big data with R:
• Off-line: high-performance file-based analytics
• Off-line, parallel & distributed analytics
• On-line, in-database analytics (Hadoop, Netezza)
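
The file-based idea can be illustrated in plain R: read a file a block of rows at a time and keep only running totals, so memory use does not grow with file size. The sketch below is an assumption-laden illustration, not how Revolution R Enterprise implements it; `chunk_mean`, the `delay` column, and the demo file are hypothetical names invented for this example.

```r
# Write a small demo file (a stand-in for a data set too large for memory).
set.seed(1)
path <- tempfile(fileext = ".csv")
demo <- data.frame(delay = round(rnorm(1000, mean = 10, sd = 5), 2))
write.csv(demo, path, row.names = FALSE)

# Compute the mean of one column while holding only `chunk_rows` rows at a time.
chunk_mean <- function(path, column, chunk_rows = 100) {
  con <- file(path, open = "r")
  on.exit(close(con))
  header <- gsub('"', "", strsplit(readLines(con, n = 1), ",")[[1]])
  total <- 0
  count <- 0
  repeat {
    lines <- readLines(con, n = chunk_rows)   # character(0) at end of file
    if (length(lines) == 0) break
    chunk <- read.csv(text = lines, header = FALSE, col.names = header)
    total <- total + sum(chunk[[column]])     # running sum
    count <- count + nrow(chunk)              # running row count
  }
  total / count
}

streamed <- chunk_mean(path, "delay")
```

Only sums and counts cross chunk boundaries, which is why this pattern suits statistics that can be accumulated incrementally.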

Page 15: Revolution R - 100% R and More

Revolution R Enterprise with RevoScaleR
Big Data Statistics in R

www.revolutionanalytics.com/bigdata

Every US airline departure and arrival, 1987-2008

File: AirlineData87to08.xdf
Rows: 123.5 million
Variables: 29
Size on disk: 13.2Gb

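# The slide omits the data argument; it is assumed here to be the .xdf file listed above.
arrDelayLm2 <- rxLinMod(ArrDelay ~ DayOfWeek:F(CRSDepTime), data = "AirlineData87to08.xdf", cube = TRUE)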

Page 16: Revolution R - 100% R and More

RevoScaleR: Big Data algorithms

• Data processing (rxDataStep)
• Descriptive statistics (rxSummary)
• Tables and cubes (rxCube, rxCrossTabs)
• Correlations/covariances (rxCovCor, rxCor, rxCov, rxSSCP)
• Linear regressions (rxLinMod)
• Logistic regressions (rxLogit)
• K means clustering (rxKmeans)
• Predictions (scoring) (rxPredict)
• Custom distributed computing (rxExec)

Page 17: Revolution R - 100% R and More

RevoScaleR – Distributed Computing

[Diagram: one Master Node (RevoScaleR) and four Compute Nodes (RevoScaleR), each compute node with its own Data Partition]

• Portions of the data source are made available to each compute node
• RevoScaleR on the master node assigns a task to each compute node
• Each compute node independently processes its data, and returns its intermediate results back to the master node
• The master node aggregates all of the intermediate results from each compute node and produces the final result

*Available now for Microsoft HPC Server
Video demo: http://bit.ly/ugQ9KR
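
The partition, process, and aggregate pattern described on this slide can be sketched in base R for a linear regression. This is an illustration under stated assumptions, not RevoScaleR's implementation: the "nodes" are simulated with `lapply` on one machine, and `node_task`, the simulated data, and the choice of four partitions are hypothetical.

```r
set.seed(123)
n <- 1000
dat <- data.frame(x = runif(n))
dat$y <- 2 + 3 * dat$x + rnorm(n, sd = 0.1)

# Portions of the data: four partitions, one per (simulated) compute node.
partitions <- split(dat, rep(1:4, length.out = n))

# Task run independently on each partition: return intermediate cross-products.
node_task <- function(part) {
  X <- cbind(1, part$x)                       # intercept column plus predictor
  list(xtx = crossprod(X), xty = crossprod(X, part$y))
}
intermediate <- lapply(partitions, node_task)

# Master step: sum the intermediate results, then solve the normal equations.
xtx <- Reduce(`+`, lapply(intermediate, `[[`, "xtx"))
xty <- Reduce(`+`, lapply(intermediate, `[[`, "xty"))
beta <- drop(solve(xtx, xty))                 # intercept and slope
```

Because the cross-products are additive across partitions, the aggregated fit matches what `lm(y ~ x, data = dat)` computes on the full data, and each node only ever returns a small matrix rather than its rows.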

Page 18: Revolution R - 100% R and More

Platform-agnostic Big Data Analytics

• Set “compute context” to define hardware (one line of code)
• Native job-scheduler handles distribution, monitoring, failover etc.
• Same code runs on other supported architectures: just change compute context

Supported architectures:
• Windows: Microsoft HPC Server
• Linux: Platform Computing LSF (coming 2012)

42 seconds instead of 6 minutes

Page 19: Revolution R - 100% R and More

A common analytic platform across big data architectures

Hadoop | File Based | In-database

Page 20: Revolution R - 100% R and More

In-Database Execution with IBM Netezza

More info: http://bit.ly/R-Netezza

Page 21: Revolution R - 100% R and More

R and Hadoop

• Hadoop offers a scalable infrastructure for processing massive amounts of data
  Storage: HDFS, HBASE
  Distributed Computing: MapReduce
• R is a statistical programming language for developing advanced analytic applications
• Currently, writing analytics for Hadoop requires a combination of Java, Pig, Python, …
• The RHadoop project makes it possible to write Big Data algorithms for Hadoop using the R language alone.
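
To show what expressing a MapReduce job in R alone looks like, here is a word count written as a map function and a reduce function in plain R. It runs locally and does not use the RHadoop packages or their API: `run_mapreduce`, `map_fn`, and `reduce_fn` are hypothetical names for this sketch, and the grouping step merely imitates the shuffle that Hadoop would perform.

```r
# Map: turn one input record into a list of (key, value) pairs.
map_fn <- function(line) {
  words <- strsplit(tolower(line), "[^a-z]+")[[1]]
  words <- words[nzchar(words)]
  lapply(words, function(w) list(key = w, value = 1L))
}

# Reduce: combine all values that share a key.
reduce_fn <- function(key, values) sum(values)

# Local stand-in for the framework: map every record, group by key, reduce.
run_mapreduce <- function(records, map_fn, reduce_fn) {
  pairs <- unlist(lapply(records, map_fn), recursive = FALSE)
  keys <- vapply(pairs, function(p) p$key, character(1))
  vals <- vapply(pairs, function(p) p$value, integer(1))
  grouped <- split(vals, keys)                # imitates the shuffle phase
  mapply(reduce_fn, names(grouped), grouped)
}

records <- c("R is hot", "R and Hadoop", "Hadoop is scalable")
counts <- run_mapreduce(records, map_fn, reduce_fn)
print(counts)
```

The analyst writes only the two small functions; distribution, grouping, and scheduling are left to the framework.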

Page 22: Revolution R - 100% R and More

RevoConnectR for Hadoop

[Diagram labels: Revolution R Client, R, Map or Reduce, Job Tracker, Task Node, HDFS, HBASE, Thrift, rmr, rhdfs, rhbase]

Write Map-Reduce analytics using only R code with these R packages:
• rhdfs - R and HDFS
• rhbase - R and HBASE
• rmr - R and MapReduce

More information at: bit.ly/r-hadoop

Page 23: Revolution R - 100% R and More

Enterprise Readiness: Revolution R Enterprise Server

Multi-User Support
Production Applications

Integrate R analytics into Web based applications
• Data Analysis and Visualization
• Reporting
• Dashboards
• Interactive applications

Revolution R Enterprise Server with RevoDeployR

Page 24: Revolution R - 100% R and More

Enterprise-Wide Deployment

[Diagram labels]
Research and Development
Data Scientists / Modelers
Revolution R Enterprise Server + Hadoop + IBM Netezza + Windows HPC Server cluster
Production
RevoDeployR Server
Web Services API
Management Console
End-User Deployment
Analysts / Corporate Users
Excel | BI | Web App

Page 25: Revolution R - 100% R and More

On-Demand Analytics with RevoDeployR

Page 26: Revolution R - 100% R and More

The Advanced Analytics Stack

Deployment / Consumption

Advanced Analytics

ETL

Data / Infrastructure

“Open Analytics Stack” White Paper: bit.ly/lC43Kw

Page 27: Revolution R - 100% R and More

On-Call Technical Support

Consulting: Migration | Analytics | Applications | Validation

Training: R | Revolution R | Statistical Topics

Systems Integration: BI | ERP | Databases | Cloud

Page 28: Revolution R - 100% R and More


Wrapping Up

Page 29: Revolution R - 100% R and More

Why R?

• Every data analysis technique at your fingertips
• Create beautiful and unique data visualizations
• Get better results faster
• Draw on the talents of data scientists worldwide
• R is hot, and growing fast

Page 30: Revolution R - 100% R and More

Revolution R Enterprise

• High-performance R for multiprocessor systems
• Modern Integrated Development Environment
• Statistical Analysis of Terabyte-Class Data Sets
• In-database R analytics with Hadoop and Netezza
• Deploy R Applications via Web Services
• Telephone and email technical support
• Training and consulting services
• 100% compatible with R packages

Production-Grade Statistical Analysis for the Workplace

Page 31: Revolution R - 100% R and More

Revolution R Enterprise: Free to Academia

• Personal use
• Research
• Teaching
• Package development

Free Academic Download
www.revolutionanalytics.com/downloads/free-academic.php

Discounted Technical Support Subscriptions Available

Page 32: Revolution R - 100% R and More

Thank You!

Download slides, replay: http://bit.ly/z9xUG9

Learn more about Revolution R: revolutionanalytics.com/products

Contact Revolution Analytics: http://bit.ly/hey-revo

Feb 29: Turbo-Charge Your Analytics with IBM Netezza and Revolution R Enterprise

A Step-by-Step Approach for Acceleration and Innovation, presented by William Zanine (IBM Analytics Solutions).

www.revolutionanalytics.com/news-events/free-webinars

Page 33: Revolution R - 100% R and More

Poll Question

What interests you most about Revolution R Enterprise?

Page 34: Revolution R - 100% R and More

The leading commercial provider of software and support for the popular open source R statistics language.

www.revolutionanalytics.com
+1 (650) 646 9545

Twitter: @RevolutionR