R for SAS Users: Complement or Replace, Two Strategies


Upload: revolution-analytics

Post on 26-Jan-2015


DESCRIPTION

Are you working in a SAS shop but want to add R-based analytics to your portfolio? Learn why that is a great idea and how to do it.

TRANSCRIPT

Revolution Confidential

SAS: Complement or Replace

June, 2013

Nick Barber - Sales Director

Andrie de Vries – Business Services Director

Revolution Analytics

Introductions and welcome

Andrie de Vries, Business Services Director, Europe

Nick Barber, Sales Director, Europe

Strawpoll: experiences with R and SAS?

Agenda

• Quick introduction to Revolution Analytics
• Where do SAS and R fit in the analytical landscape?
• Introduction to R
• Typical challenges facing analytical organisations
• Differences between SAS and Revolution R
• Big Data
• Complex Computation
• Enterprise Readiness
• Production Efficiency
• Access to Talent
• Conclusions

Corporate Overview & Quick Facts

• Founded: 2008 (as REvolution Computing)
• Office locations: Palo Alto (HQ), Seattle (Engineering), Singapore, London
• CEO: David Rich
• Number of customers: 200+
• Investors: Northbridge Venture Partners, Intel Capital, Platform Vendor
• Web site: www.revolutionanalytics.com

Revolution: “Contender” in The Forrester Wave™: Big Data Predictive Analytics Solutions, Q1 2013

In the big data analytics context, speed and scale are critical drivers of success, and Revolution R delivers on both

Revolution R Enterprise is the leading commercial analytics platform based on the open source R statistical computing language

200+ Corporate Customers and growing

• Consumer & Info Svcs
• Finance & Insurance
• Healthcare & Life Sciences
• Manuf & Tech
• Academic & Gov’t

Where does R fit in the analytical lifecycle?

[Diagram: lifecycle stages ETL, Analytical data preparation, Analytical data exploration, Model development, Model deployment, BI / operations, with a band labelled "Open source R competencies"]

Open source R is not:
• ETL
• A business reporting tool
• An end-to-end solution such as SAS Marketing Automation or SAS Fraud Framework

R is:
• The way to do statistical computing
• A full-blown programming language
• The home of every data mining algorithm known to data science
• A vibrant world-wide community

R was written in the early 1990s by Robert Gentleman and Ross Ihaka.

Since 1997 a core group of ~20 developers has guided the evolution of the language.

Top companies are using R around the world

• The NHS uses R to advance patient care and diagnosis.
• The New York Times routinely uses R for interactive and print data visualization.
• Ogilvy Europe uses R to analyse digital media campaigns for major brands.
• Google has more than 500 R users.
• The FDA supports the use of R for clinical trials of new drugs.
• The National Weather Service uses R to predict the extent of events.
• Facebook uses R to model user behaviour.
• The Consumer Financial Protection Bureau uses R and other open source tools.
• Twitter uses R for data science applications on the Twitter database.
• John Deere uses R to forecast crop yields and optimize tractor manufacturing.

Companies are recognising the additional benefits of R

Incredible Graphics and Data Visualization lead the way vs SAS

Functions for standard graphs:
• Scatterplot, time series, histogram, smoothing, …
• Bar plot, pie chart, dot chart, …
• Image plot, 3-D surface, map, …

Customize without limits:
• Combine graph types
• Create entirely new graphics

R is open source and drives analytic innovation but has some limitations for Enterprises

Each area below contrasts open source R with Revolution R:
• Bigger data sizes: open source R is memory bound; Revolution R handles big data
• Speed of analysis: open source R is single threaded; Revolution R offers scale-out, parallel processing and high speed
• Production support: open source R has community support; Revolution R has commercial production support
• Innovation and scale: open source R is innovative, with 4,500+ packages and exponential growth; Revolution R combines with open source R packages where needed

Typical Challenges Facing Analytical Organisations

Big Data
• New Data Sources
• Data Variety & Velocity
• Fine Grain Control
• Data Movement, Memory Limits

Complex Computation
• Innovative Models
• Experimentation
• Many Small Models
• Ensemble Models
• Simulation

Enterprise Readiness
• Heterogeneous Landscape
• Write Once, Deploy Anywhere
• Production Support
• How to put analytics in the hands of business users

Speed & Production Efficiency
• Shorter Model Shelf Life
• Volume of Models
• Long End-to-End Cycle Time
• Pace of Decision Accelerated
• Hardware Required

Talent
• Finding data scientists
• Ensuring workforce is continually trained
• Creating an Analytical culture

Let's talk BIG DATA


How do SAS and Revolution R stack up for Big Data?

• Both handle large data sets well (but with big speed differences)
• Both have high-speed database connectors to handle variety / velocity
• The object-oriented nature of R handles data manipulation and visualisation in a superior way
• Parallel Data Step functions (such as merge/sort/cleansing) are built into Revolution R; in SAS they are available only in SAS HPA environments
• The RHadoop project (rhbase, rhdfs, rmr) runs inside Hadoop


Let's talk Complex Computation


How do SAS and Revolution R stack up for Complex Computation?

Innovative Models: More functions available in R

[Bar chart, axis 0 to 5,000]
• R (R 2.15.2 packages): 4,500
• SAS (SAS 9.3 statements, procedures, functions and call routines): 1,192

Source: http://r4stats.com/2013/03/19/r-2012-growth-exceeds-sas-all-time-total/

How do SAS and Revolution R stack up for Complex Computation?

• Revolution R runs in parallel across multiple nodes and cores
• SAS Grid runs multiple jobs in parallel, but each job is still single threaded
• SAS can run in parallel in SAS HPA
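
The "Many Small Models" workload named on these slides suits multi-core execution because each model is fitted independently of the others. The following is a minimal sketch of that pattern in plain Python, not Revolution R or SAS code. The segment data, the least-squares line used as the per-segment model, and the names `fit_line`, `fit_all` and `fit_all_parallel` are all illustrative assumptions, not part of any product API.

```python
from concurrent.futures import ProcessPoolExecutor


def fit_line(points):
    """Ordinary least-squares fit of y = intercept + slope * x for one segment."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def fit_all(segments, mapper=map):
    """Fit one model per segment; `mapper` decides serial vs parallel execution."""
    names = list(segments)
    fits = mapper(fit_line, [segments[name] for name in names])
    return dict(zip(names, fits))


def fit_all_parallel(segments, workers=None):
    """Same result as fit_all, with the fits spread across worker processes."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return fit_all(segments, mapper=pool.map)


# Hypothetical data: one small (x, y) series per customer segment.
SEGMENTS = {
    "north": [(0, 1.0), (1, 3.0), (2, 5.0)],
    "south": [(0, 2.0), (1, 2.5), (2, 3.0)],
}

# Serial run; swap in fit_all_parallel(SEGMENTS) to use several cores.
MODELS = fit_all(SEGMENTS)
```

Calling `fit_all_parallel(SEGMENTS)` from a script's main entry point gives the same dictionary as `fit_all(SEGMENTS)`. Only the scheduling of the independent fits changes, which is the distinction the slide draws between single-threaded and parallel execution.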


Let's talk Enterprise Readiness


How do SAS and Revolution R stack up for Enterprise Readiness?

• Both handle heterogeneous landscapes
• SAS runs on anything, but is mostly single threaded apart from Teradata and Greenplum (no cloud except through its own managed services)
• Revolution runs across Windows/Linux clusters, cores, Hadoop, Amazon Web Services, Microsoft Azure, Netezza and Teradata
• SAS programmers must write code for the required environment, whilst Revolution R code is device independent
• Both offer good production support
• SAS integrates with pretty much all common BI reporting tools, as does Revolution


Let's talk Production Efficiency


How do SAS and Revolution R stack up for Speed & Production Efficiency?

*As published by SAS in HPC Wire, April 21, 2011: http://www.hpcwire.com/hpcwire/2011-04-19/sas_brings_high_performance_analytics_to_database_appliances.html

Options for handling Speed

SAS
• Normal SAS
• Single threaded

SAS Grid
• Platform LSF
• Single threaded

SAS In-Database Scoring
• Teradata Accelerator
• Greenplum Accelerator

SAS High Performance Computing
• Visual Analytics
• HPA on Teradata / Greenplum

Revolution R
• DistributedR parallel compute contexts: Windows, Linux, Amazon Web Services, Microsoft Azure, Hadoop, Netezza
• …but multi-threaded
• …all databases that support PMML
• …commodity hardware, Hadoop, Netezza (Teradata October)

Let's see some R in action…

Andrie de Vries, Business Services Director, Europe

Let's talk Talent


Talent gap emerging: will finding SAS talent become more difficult?

• The programming community wants to keep up to date and work with modern object-oriented languages
• Many universities have adopted R as the de facto analytics standard for statistics
• Since 2012, USA job descriptions that included “SAS” declined by 7.3%, whilst jobs for “R” increased by 42% (number of jobs on indeed.com)

Search phrase: “Statistics Programming”, sorted by popularity (May 29, 2013)
• 7 out of 10 books based on R
• 0 out of 10 books based on SAS or SPSS

www.revolutionanalytics.com - Page Views

[Charts: Page Views - Top 10 Countries, 01/04/2013 – 25/05/2013; Page Views by Geo, 01/04/2013 – 25/05/2013 (Europe, North America, APJ, South America, Africa, Middle East, NA, Caribbean, Central America); EMEA Page Views by Organisation Type (Academic, Commercial)]

How do the modules break down?

Functionality / SAS Software / Revolution R
• Foundation: Base SAS
• Statistics: SAS/STAT
• Graphics: SAS/Graph
• Matrix Operations: SAS IML
• Optimization: SAS/OR
• Time Series: SAS ETS
• Quality Control: SAS QC
• Database Access: SAS/ACCESS
• Deploy in Excel, Deploy in BI: SAS Business Intelligence
• Distributed Algorithms: SAS HPA Server
• Parallel small compute: SAS Grid
• In Database Scoring: SAS DB Accelerators


Training courses helping companies train SAS users

Conclusions

Complement SAS when…
• End-to-end industry-based solutions from SAS are a good fit for a particular business problem (e.g. SAS Fraud Framework for Insurance, Marketing Automation for Retail)
• Complement when innovative models, visualisation or big data/complex model support is required
• Choose SAS when users are not coders and need a point-and-click interface (SAS Enterprise Guide, SAS Enterprise Miner)
• Replacing the existing SAS landscape would require significant re-training

Conclusions

Replace SAS when…
• You want cost savings, to do things faster and to deal with bigger data
• Big data and complex processing are required
• Innovative models give a competitive advantage
• Access to talent matters, today and in the future
• Flexible compute environments are required


www.revolutionanalytics.com  Twitter: @RevolutionR

The leading commercial provider of software and support for the popular open source R statistics language.

Thank you