R for SAS Users: Complement or Replace, Two Strategies
DESCRIPTION
Are you working in a SAS shop but want to add R-based analytics to your portfolio? Learn why that is a great idea and how to do it.
TRANSCRIPT
Revolution Confidential
SAS: Complement or Replace
June, 2013
Nick Barber - Sales Director
Andrie de Vries – Business Services Director
Revolution Analytics
Introductions and welcome
Andrie de Vries, Business Services Director, Europe
Nick Barber, Sales Director, Europe
Agenda
• Quick introduction to Revolution Analytics
• Where do SAS and R fit in the analytical landscape
• Introduction to R
• Typical challenges facing analytical organisations
• Differences between SAS and Revolution R: Big Data, Complex Computation, Enterprise Readiness, Production Efficiency, Access to Talent
• Conclusions…
Corporate Overview & Quick Facts
Founded: 2008 (as REvolution Computing)
Office locations: Palo Alto (HQ), Seattle (Engineering), Singapore, London
CEO: David Rich
Number of customers: 200+
Investors: Northbridge Venture Partners, Intel Capital, Platform Vendor
Web site: www.revolutionanalytics.com
Revolution – “Contender” in The Forrester Wave™: Big Data Predictive Analytics Solutions, Q1 2013
In the big data analytics context, speed and scale are critical drivers of success, and Revolution R delivers on both
Revolution R Enterprise is the leading commercial analytics platform based on the open source R statistical computing language
200+ Corporate Customers and growing
• Consumer & Info Svcs
• Finance & Insurance
• Healthcare & Life Sciences
• Manuf & Tech
• Academic & Gov’t
Where does R fit in the analytical lifecycle
[Figure: analytical lifecycle stages: ETL, Analytical data preparation, Analytical data exploration, Model development, Model deployment, BI / operations]
Open source R competencies
Open source R is not:
• ETL
• A business reporting tool
• An end-to-end solution such as SAS Marketing Automation or SAS Fraud Framework
R is:
• The way to do statistical computing
• A full-blown programming language
• The home of every data mining algorithm known to data science
• A vibrant world-wide community
R was written in the early 1990s by Robert Gentleman and Ross Ihaka.
Since 1997 a core group of ~20 developers has guided the evolution of the language.
Top companies are using R around the world
• The NHS uses R to advance patient care and diagnosis.
• The New York Times routinely uses R for interactive and print data visualization.
• Ogilvy Europe uses R to analyse digital media campaigns for major brands.
• Google has more than 500 R users.
• The FDA supports the use of R for clinical trials of new drugs.
• The National Weather Service uses R to predict the extent of events.
• Facebook uses R to model user behaviour.
• The Consumer Financial Protection Bureau uses R and other open source tools.
• Twitter uses R for data science applications on the Twitter database.
• John Deere uses R to forecast crop yields and optimize tractor manufacturing.
Companies are recognising the additional benefits of R
Incredible Graphics and Data Visualization lead the way vs SAS
Functions for standard graphs
• Scatterplot, time series, histogram, smoothing, …
• Bar plot, pie chart, dot chart, …
• Image plot, 3-D surface, map, …
Customize without limits
• Combine graph types
• Create entirely new graphics
R is open source and drives analytic innovation but has some limitations for Enterprises
Area: open source R / Revolution R
• Bigger data sizes: Memory bound / Big Data
• Speed of analysis: Single threaded / Scale out, parallel processing, high speed
• Production support: Community support / Commercial production support
• Innovation and scale: Innovative, 4500+ packages, exponential growth / Combines with open source R packages where needed
Typical Challenges Facing Analytical Organisations
Big Data
• New Data Sources
• Data Variety & Velocity
• Fine Grain Control
• Data Movement, Memory Limits
Complex Computation
• Innovative Models
• Experimentation
• Many Small Models
• Ensemble Models
• Simulation
Enterprise Readiness
• Heterogeneous Landscape
• Write Once, Deploy Anywhere
• Production Support
• How to put analytics in the hands of business users
Speed & Production Efficiency
• Shorter Model Shelf Life
• Volume of Models
• Long End-to-End Cycle Time
• Pace of Decision Accelerated
• Hardware Required
Talent
• Finding data scientists
• Ensuring workforce is continually trained
• Creating an Analytical culture
Let's talk BIG DATA
How do SAS and Revolution R stack up for Big Data
• Both handle large data sets well (but with big speed differences…)
• Both have high-speed database connectors to handle variety / velocity
• The object-orientated nature of R handles data manipulation and visualisation in a superior way
• Parallel data step functions (such as merge/sort/cleansing) are available in Revolution R; in SAS they are available only in SAS HPA environments
• The RHadoop project (rhbase, rhdfs, rmr) runs inside Hadoop
Let's talk Complex Computation
How do SAS and Revolution R stack up for Complex Computation
Innovative Models: More functions available in R
[Bar chart, scale 0 to 5,000]
• R: 4,500 (R 2.15.2 packages)
• SAS: 1,192 (SAS 9.3 statements, procedures, functions and call routines)
Source: http://r4stats.com/2013/03/19/r-2012-growth-exceeds-sas-all-time-total/
How do SAS and Revolution R stack up for Complex Computation
• Revolution R runs in parallel across multiple nodes and cores
• SAS Grid runs multiple jobs in parallel, but each job is still single threaded
• SAS can run in parallel in SAS HPA
Complex Computation at Speed: Innovative Models, Experimentation, Precision, Many Small Models, Ensemble Models, Simulation
Let's talk Enterprise Readiness
How do SAS and Revolution R stack up for Enterprise Readiness
Both handle heterogeneous landscapes
• SAS runs on anything but is mostly single threaded apart from Teradata and Greenplum (no cloud except through its own managed services)
• Revolution runs across Windows/Linux clusters, cores, Hadoop, Amazon Web Services, Microsoft Azure, Netezza and Teradata
• SAS programmers must write code for the required environment, whilst Revolution R code is device independent
Both offer good production support
• SAS integrates with pretty much all common BI reporting tools, as does Revolution
Let's talk Production Efficiency
How do SAS and Revolution R stack up for Speed & Production Efficiency?
*As published by SAS in HPC Wire, April 21, 2011
http://www.hpcwire.com/hpcwire/2011-04-19/sas_brings_high_performance_analytics_to_database_appliances.html
Options for handling Speed
SAS
• Normal SAS
• Single threaded
SAS Grid
• Platform LSF
• Single threaded
SAS In-Database Scoring
• Teradata Accelerator
• Greenplum Accelerator
SAS High Performance Computing
• Visual Analytics
• HPA on Teradata / Greenplum
Revolution R
• DistributedR parallel compute contexts: Windows, Linux, Amazon, Azure, Hadoop, Netezza
• …but multi-threaded
• …all databases that support PMML
• …commodity hardware, Hadoop, Netezza (Teradata October)
Let's see some R in action…
Andrie de Vries, Business Services Director, Europe
Let's talk Talent
Talent gap emerging
Will finding SAS talent become more difficult?
• The programming community wants to keep up to date and work on modern object-orientated languages
• Many universities have adopted R as the de facto analytics standard for statistics
• Since 2012, USA job descriptions that included “SAS” declined by 7.3% whilst jobs for “R” increased by 42% (number of jobs on indeed.com)
Search phrase: “Statistics Programming”, sorted by popularity (May 29, 2013)
• 7 out of 10 books based on R
• 0 out of 10 books based on SAS or SPSS
www.revolutionanalytics.com - Page Views
[Chart: Page Views - Top 10 Countries, 01/04/2013 – 25/05/2013]
[Chart: Page Views by Geo, 01/04/2013 – 25/05/2013: Europe, North America, APJ, South America, Africa, Middle East, NA, Caribbean, Central America]
[Chart: EMEA Page Views by Organisation Type: Academic, Commercial]
How do the modules break down
Functionality: SAS Software module (compared with Revolution R)
• Foundation: Base SAS
• Statistics: SAS/STAT
• Graphics: SAS/Graph
• Matrix Operations: SAS IML
• Optimization: SAS/OR
• Time Series: SAS ETS
• Quality Control: SAS QC
• Database Access: SAS/ACCESS
• Deploy in Excel, Deploy in BI: SAS Business Intelligence
• Distributed Algorithms: SAS HPA Server
• Parallel small compute: SAS Grid
• In Database Scoring: SAS DB Accelerators
Training courses helping companies train SAS users
Conclusions
Complement SAS when…
• End-to-end industry-based solutions from SAS are a good fit for a particular business problem (e.g. SAS Fraud Framework for Insurance, Marketing Automation for Retail)
• Complement when innovative models, visualisation or big data/complex model support is required
• Choose SAS when users are not coders and need a point-and-click interface (SAS Enterprise Guide, SAS Enterprise Miner)
• Existing SAS landscape requires significant re-training
Conclusions
Replace SAS when…
• Cost savings, do things faster, deal with bigger data
• Big data and complex processing is required
• Innovative models that give a competitive advantage
• Access to talent today and in the future
• Flexible compute environments are required