sas to r best practices - convert to r...

36
1 of 36 Confidential and Proprietary © 2012 Boston Decision, LLC SAS to R Best Practices in SAS to R Conversion

Upload: hatram

Post on 06-Feb-2018

247 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: SAS to R Best Practices - Convert to R Statisticalrconvert.com/Convert_SAS_To_R_Statistical_Data/wp-content/uploads/... · SAS to R Best Practices in SAS to R Conversion . ... –

1 of 36 Confidential and Proprietary © 2012 Boston Decision, LLC

SAS to R Best Practices in SAS to R Conversion

Page 2: SAS to R Best Practices - Convert to R Statisticalrconvert.com/Convert_SAS_To_R_Statistical_Data/wp-content/uploads/... · SAS to R Best Practices in SAS to R Conversion . ... –

2 of 36 www.bostondecision.com [email protected]

1 Broadway, 14th Fl Cambridge, MA 02142 Phone 617 500 0093

© 2012 Boston Decision, LLC

About Rconvert.com

Division of Boston Decision, LLC

Founded 2010 - Cambridge, MA Finance, Marketing, Technology

Located in the Cambridge Innovation Center

www.BostonDecision.com

Page 3: SAS to R Best Practices - Convert to R Statisticalrconvert.com/Convert_SAS_To_R_Statistical_Data/wp-content/uploads/... · SAS to R Best Practices in SAS to R Conversion . ... –

3 of 36 www.bostondecision.com [email protected]

1 Broadway, 14th Fl Cambridge, MA 02142 Phone 617 500 0093

© 2012 Boston Decision, LLC

SAS & R Compared

Page 4: SAS to R Best Practices - Convert to R Statisticalrconvert.com/Convert_SAS_To_R_Statistical_Data/wp-content/uploads/... · SAS to R Best Practices in SAS to R Conversion . ... –

4 of 36 www.bostondecision.com [email protected]

1 Broadway, 14th Fl Cambridge, MA 02142 Phone 617 500 0093

© 2012 Boston Decision, LLC

SAS – Circa 1966

• 4th Generation Language for Data Analysis

• Mostly written in the C language

• Proprietary

Page 5: SAS to R Best Practices - Convert to R Statisticalrconvert.com/Convert_SAS_To_R_Statistical_Data/wp-content/uploads/... · SAS to R Best Practices in SAS to R Conversion . ... –

5 of 36 www.bostondecision.com [email protected]

1 Broadway, 14th Fl Cambridge, MA 02142 Phone 617 500 0093

© 2012 Boston Decision, LLC

R – Circa 1993

• Origins in S language, circa 1976

• 4th Generation Language for Data Analysis

• Mostly written in the C language

• Open-source

Page 6: SAS to R Best Practices - Convert to R Statisticalrconvert.com/Convert_SAS_To_R_Statistical_Data/wp-content/uploads/... · SAS to R Best Practices in SAS to R Conversion . ... –

6 of 36 www.bostondecision.com [email protected]

1 Broadway, 14th Fl Cambridge, MA 02142 Phone 617 500 0093

© 2012 Boston Decision, LLC

SAS Components

• Data Step – Functions

• Procedure Step

• Macro Language

• SAS ODS

• Component Product Languages – SAS/IML

Page 7: SAS to R Best Practices - Convert to R Statisticalrconvert.com/Convert_SAS_To_R_Statistical_Data/wp-content/uploads/... · SAS to R Best Practices in SAS to R Conversion . ... –

7 of 36 www.bostondecision.com [email protected]

1 Broadway, 14th Fl Cambridge, MA 02142 Phone 617 500 0093

© 2012 Boston Decision, LLC

R Components

• Functions

Page 8: SAS to R Best Practices - Convert to R Statisticalrconvert.com/Convert_SAS_To_R_Statistical_Data/wp-content/uploads/... · SAS to R Best Practices in SAS to R Conversion . ... –

8 of 36 www.bostondecision.com [email protected]

1 Broadway, 14th Fl Cambridge, MA 02142 Phone 617 500 0093

© 2012 Boston Decision, LLC

A Paradigm Shift

• In R, all work is performed by functions

– Data steps = expressions with functions

– Procedures = expressions with functions

– Macros = expressions with functions

– SAS functions = R functions

Page 9: SAS to R Best Practices - Convert to R Statisticalrconvert.com/Convert_SAS_To_R_Statistical_Data/wp-content/uploads/... · SAS to R Best Practices in SAS to R Conversion . ... –

9 of 36 www.bostondecision.com [email protected]

1 Broadway, 14th Fl Cambridge, MA 02142 Phone 617 500 0093

© 2012 Boston Decision, LLC

SAS Data Step – Implied Loop

• Data steps leverage an implied loop

– Reads data by row (obs), passes row through code line-by-line, hits run, starts next row.

• R does not make use of an implicit loop

– R applies functions at the “column-level”

Page 10: SAS to R Best Practices - Convert to R Statisticalrconvert.com/Convert_SAS_To_R_Statistical_Data/wp-content/uploads/... · SAS to R Best Practices in SAS to R Conversion . ... –

10 of 36 www.bostondecision.com [email protected]

1 Broadway, 14th Fl Cambridge, MA 02142 Phone 617 500 0093

© 2012 Boston Decision, LLC

SAS Example

a a

A B

1 2

4 5

7 8

A B C

1 2 3

a a

A B

1 2

4 5

7 8

A B C

1 2 3

4 5 9

Page 11: SAS to R Best Practices - Convert to R Statisticalrconvert.com/Convert_SAS_To_R_Statistical_Data/wp-content/uploads/... · SAS to R Best Practices in SAS to R Conversion . ... –

11 of 36 www.bostondecision.com [email protected]

1 Broadway, 14th Fl Cambridge, MA 02142 Phone 617 500 0093

© 2012 Boston Decision, LLC

R Example

a

A B

1 2

4 5

7 8

A B C

1 2 3

4 5 9

7 8 15

1 4 7

2 5 8

+

Page 12: SAS to R Best Practices - Convert to R Statisticalrconvert.com/Convert_SAS_To_R_Statistical_Data/wp-content/uploads/... · SAS to R Best Practices in SAS to R Conversion . ... –

12 of 36 www.bostondecision.com [email protected]

1 Broadway, 14th Fl Cambridge, MA 02142 Phone 617 500 0093

© 2012 Boston Decision, LLC

Data Storage

• SAS data sets

– Arrays = special group of columns in a SAS data set

• In R, more variety.

– Data frame (similar to SAS data set)

– Vectors – single “column” of values (all same type)

– Matrices

– Lists – collection of objects

Page 13: SAS to R Best Practices - Convert to R Statisticalrconvert.com/Convert_SAS_To_R_Statistical_Data/wp-content/uploads/... · SAS to R Best Practices in SAS to R Conversion . ... –

13 of 36 www.bostondecision.com [email protected]

1 Broadway, 14th Fl Cambridge, MA 02142 Phone 617 500 0093

© 2012 Boston Decision, LLC

Object Orientation (OOP)

• Both SAS and R can be used in OOP fashion

– In practice, we don’t see this much with SAS

• In R, everything is an object

– Variables are objects

– Functions are objects

Page 14: SAS to R Best Practices - Convert to R Statisticalrconvert.com/Convert_SAS_To_R_Statistical_Data/wp-content/uploads/... · SAS to R Best Practices in SAS to R Conversion . ... –

14 of 36 www.bostondecision.com [email protected]

1 Broadway, 14th Fl Cambridge, MA 02142 Phone 617 500 0093

© 2012 Boston Decision, LLC

Native Memory Usage

• SAS

– Hard drive (I/O) Intensive

• R

– RAM Intensive

• RevoScaleR

– External memory algorithms with data on disk

Page 15: SAS to R Best Practices - Convert to R Statisticalrconvert.com/Convert_SAS_To_R_Statistical_Data/wp-content/uploads/... · SAS to R Best Practices in SAS to R Conversion . ... –

15 of 36 www.bostondecision.com [email protected]

1 Broadway, 14th Fl Cambridge, MA 02142 Phone 617 500 0093

© 2012 Boston Decision, LLC

Extensibility

• SAS

– New procedures added with additional SAS purchases.

• R

– New functions added by loading libraries from CRAN.

Page 16: SAS to R Best Practices - Convert to R Statisticalrconvert.com/Convert_SAS_To_R_Statistical_Data/wp-content/uploads/... · SAS to R Best Practices in SAS to R Conversion . ... –

16 of 36 www.bostondecision.com [email protected]

1 Broadway, 14th Fl Cambridge, MA 02142 Phone 617 500 0093

© 2012 Boston Decision, LLC

Conversion Process

Page 17: SAS to R Best Practices - Convert to R Statisticalrconvert.com/Convert_SAS_To_R_Statistical_Data/wp-content/uploads/... · SAS to R Best Practices in SAS to R Conversion . ... –

17 of 36 www.bostondecision.com [email protected]

1 Broadway, 14th Fl Cambridge, MA 02142 Phone 617 500 0093

© 2012 Boston Decision, LLC

Quick Start

• Developing tools and libraries to expedite conversion.

Page 18: SAS to R Best Practices - Convert to R Statisticalrconvert.com/Convert_SAS_To_R_Statistical_Data/wp-content/uploads/... · SAS to R Best Practices in SAS to R Conversion . ... –

18 of 36 www.bostondecision.com [email protected]

1 Broadway, 14th Fl Cambridge, MA 02142 Phone 617 500 0093

© 2012 Boston Decision, LLC

Hybrid Agile Conversion

Implement in R

Test & Feedback

Document

Conversion Design

SAS Code Review

Master Requirements

Iteration Plan

Page 19: SAS to R Best Practices - Convert to R Statisticalrconvert.com/Convert_SAS_To_R_Statistical_Data/wp-content/uploads/... · SAS to R Best Practices in SAS to R Conversion . ... –

19 of 36 www.bostondecision.com [email protected]

1 Broadway, 14th Fl Cambridge, MA 02142 Phone 617 500 0093

© 2012 Boston Decision, LLC

Impact

• Faster results.

• Surfaces challenges quickly.

• TRANSPARENCY!

Page 20: SAS to R Best Practices - Convert to R Statisticalrconvert.com/Convert_SAS_To_R_Statistical_Data/wp-content/uploads/... · SAS to R Best Practices in SAS to R Conversion . ... –

20 of 36 www.bostondecision.com [email protected]

1 Broadway, 14th Fl Cambridge, MA 02142 Phone 617 500 0093

© 2012 Boston Decision, LLC

Order of Review

SAS Macro Language

Procedures Data Steps

Page 21: SAS to R Best Practices - Convert to R Statisticalrconvert.com/Convert_SAS_To_R_Statistical_Data/wp-content/uploads/... · SAS to R Best Practices in SAS to R Conversion . ... –

21 of 36 www.bostondecision.com [email protected]

1 Broadway, 14th Fl Cambridge, MA 02142 Phone 617 500 0093

© 2012 Boston Decision, LLC

Re-Factorization Strategy

• SAS and R are not just different languages.

– They are different frameworks.

• Planning must ensure appropriate vectorization to conform to the R framework.

Page 22: SAS to R Best Practices - Convert to R Statisticalrconvert.com/Convert_SAS_To_R_Statistical_Data/wp-content/uploads/... · SAS to R Best Practices in SAS to R Conversion . ... –

22 of 36 www.bostondecision.com [email protected]

1 Broadway, 14th Fl Cambridge, MA 02142 Phone 617 500 0093

© 2012 Boston Decision, LLC

SAS Macro Assessment

• Assess business usage

– Level 1: Wrapper Macros

– Level 2: Macro Variables

– Level 3: Code Generation

Page 23: SAS to R Best Practices - Convert to R Statisticalrconvert.com/Convert_SAS_To_R_Statistical_Data/wp-content/uploads/... · SAS to R Best Practices in SAS to R Conversion . ... –

23 of 36 www.bostondecision.com [email protected]

1 Broadway, 14th Fl Cambridge, MA 02142 Phone 617 500 0093

© 2012 Boston Decision, LLC

Areas of Caution - Macros

• Macros to create or name SAS data sets

• Loops and iterations

• Specific syntax

– Call Symput

– %GOTO

Page 24: SAS to R Best Practices - Convert to R Statisticalrconvert.com/Convert_SAS_To_R_Statistical_Data/wp-content/uploads/... · SAS to R Best Practices in SAS to R Conversion . ... –

24 of 36 www.bostondecision.com [email protected]

1 Broadway, 14th Fl Cambridge, MA 02142 Phone 617 500 0093

© 2012 Boston Decision, LLC

SAS Procedure Conversion

• SAS procedures map to R functions

– Most common Base SAS, SAS/STAT, and SAS/GRAPH can be found in Base R.

• Most common SAS procedures have approximate 1:1 analogs in R

– E.g. univariate, means, freq, rank, corr, import

Page 25: SAS to R Best Practices - Convert to R Statisticalrconvert.com/Convert_SAS_To_R_Statistical_Data/wp-content/uploads/... · SAS to R Best Practices in SAS to R Conversion . ... –

25 of 36 www.bostondecision.com [email protected]

1 Broadway, 14th Fl Cambridge, MA 02142 Phone 617 500 0093

© 2012 Boston Decision, LLC

Areas of Caution - PROCS

• Some advanced SAS procedures may require literature review of implementation.

• R implements new capabilities faster – consequence of open source.

Page 26: SAS to R Best Practices - Convert to R Statisticalrconvert.com/Convert_SAS_To_R_Statistical_Data/wp-content/uploads/... · SAS to R Best Practices in SAS to R Conversion . ... –

26 of 36 www.bostondecision.com [email protected]

1 Broadway, 14th Fl Cambridge, MA 02142 Phone 617 500 0093

© 2012 Boston Decision, LLC

Data Step Conversion

• Many SAS functions have similar R syntax.

– E.g. String and data manipulation

• Consider each tool’s preferred analysis target

– SAS = data sets = row

– R = vector = column

Page 27: SAS to R Best Practices - Convert to R Statisticalrconvert.com/Convert_SAS_To_R_Statistical_Data/wp-content/uploads/... · SAS to R Best Practices in SAS to R Conversion . ... –

27 of 36 www.bostondecision.com [email protected]

1 Broadway, 14th Fl Cambridge, MA 02142 Phone 617 500 0093

© 2012 Boston Decision, LLC

Areas of Caution – Data Steps

• Missing Data Handling

– SAS uses “.” for numeric and “” for string.

– 27 other missing data values.

– R uses NA. No numeric or string equivalent.

• Date & Time

– SAS date – number of days since 1/1/1960

– R date – number of days since 1/1/1970

Page 28: SAS to R Best Practices - Convert to R Statisticalrconvert.com/Convert_SAS_To_R_Statistical_Data/wp-content/uploads/... · SAS to R Best Practices in SAS to R Conversion . ... –

28 of 36 www.bostondecision.com [email protected]

1 Broadway, 14th Fl Cambridge, MA 02142 Phone 617 500 0093

© 2012 Boston Decision, LLC

Best Practices

• Best practice in SAS may be poor practice in R due to paradigm shift.

• E.g. SAS loop should not convert to R loop – Loops in SAS should generally be reconstructed

as applies in R.

• SAS is procedural, R is functional.

Page 29: SAS to R Best Practices - Convert to R Statisticalrconvert.com/Convert_SAS_To_R_Statistical_Data/wp-content/uploads/... · SAS to R Best Practices in SAS to R Conversion . ... –

29 of 36 www.bostondecision.com [email protected]

1 Broadway, 14th Fl Cambridge, MA 02142 Phone 617 500 0093

© 2012 Boston Decision, LLC

Conversion Samples

Page 30: SAS to R Best Practices - Convert to R Statisticalrconvert.com/Convert_SAS_To_R_Statistical_Data/wp-content/uploads/... · SAS to R Best Practices in SAS to R Conversion . ... –

30 of 36 www.bostondecision.com [email protected]

1 Broadway, 14th Fl Cambridge, MA 02142 Phone 617 500 0093

© 2012 Boston Decision, LLC

Reading a CSV File

• SAS

• R

Page 31: SAS to R Best Practices - Convert to R Statisticalrconvert.com/Convert_SAS_To_R_Statistical_Data/wp-content/uploads/... · SAS to R Best Practices in SAS to R Conversion . ... –

31 of 36 www.bostondecision.com [email protected]

1 Broadway, 14th Fl Cambridge, MA 02142 Phone 617 500 0093

© 2012 Boston Decision, LLC

Merging Data

• SAS

• R

Page 32: SAS to R Best Practices - Convert to R Statisticalrconvert.com/Convert_SAS_To_R_Statistical_Data/wp-content/uploads/... · SAS to R Best Practices in SAS to R Conversion . ... –

32 of 36 www.bostondecision.com [email protected]

1 Broadway, 14th Fl Cambridge, MA 02142 Phone 617 500 0093

© 2012 Boston Decision, LLC

Contingency Tables

• SAS

• R

Page 33: SAS to R Best Practices - Convert to R Statisticalrconvert.com/Convert_SAS_To_R_Statistical_Data/wp-content/uploads/... · SAS to R Best Practices in SAS to R Conversion . ... –

33 of 36 www.bostondecision.com [email protected]

1 Broadway, 14th Fl Cambridge, MA 02142 Phone 617 500 0093

© 2012 Boston Decision, LLC

Logistic Regression

• SAS

• R

Page 34: SAS to R Best Practices - Convert to R Statisticalrconvert.com/Convert_SAS_To_R_Statistical_Data/wp-content/uploads/... · SAS to R Best Practices in SAS to R Conversion . ... –

34 of 36 www.bostondecision.com [email protected]

1 Broadway, 14th Fl Cambridge, MA 02142 Phone 617 500 0093

© 2012 Boston Decision, LLC

Non-Linear Optimization

• SAS

• R

Page 35: SAS to R Best Practices - Convert to R Statisticalrconvert.com/Convert_SAS_To_R_Statistical_Data/wp-content/uploads/... · SAS to R Best Practices in SAS to R Conversion . ... –

35 of 36 Confidential and Proprietary © 2012 Boston Decision, LLC

Thank you Timothy D‘Auria

Page 36: SAS to R Best Practices - Convert to R Statisticalrconvert.com/Convert_SAS_To_R_Statistical_Data/wp-content/uploads/... · SAS to R Best Practices in SAS to R Conversion . ... –

36 of 36 www.bostondecision.com [email protected]

1 Broadway, 14th Fl Cambridge, MA 02142 Phone 617 500 0093

© 2012 Boston Decision, LLC

Disclaimers

• SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries.

• R is available as Free Software under the terms of the Free Software Foundation's GNU General Public License. R is hosted at www.r-project.org.

• Boston Decision LLC and Rconvert.com are not affiliated with SAS Institute nor the core R development team.