sas to r best practices - convert to r...

Post on 06-Feb-2018

1 of 36 Confidential and Proprietary © 2012 Boston Decision, LLC

SAS to R: Best Practices in SAS to R Conversion

2 of 36 www.bostondecision.com info@bostondecision.com

1 Broadway, 14th Fl Cambridge, MA 02142 Phone 617 500 0093

© 2012 Boston Decision, LLC

About Rconvert.com

Division of Boston Decision, LLC

Founded 2010 - Cambridge, MA

Finance, Marketing, Technology

Located in the Cambridge Innovation Center

www.BostonDecision.com

SAS & R Compared

SAS – Circa 1966

• 4th Generation Language for Data Analysis

• Mostly written in the C language

• Proprietary

R – Circa 1993

• Origins in S language, circa 1976

• 4th Generation Language for Data Analysis

• Mostly written in the C language

• Open-source

SAS Components

• Data Step – Functions

• Procedure Step

• Macro Language

• SAS ODS

• Component Product Languages – SAS/IML

R Components

• Functions

A Paradigm Shift

• In R, all work is performed by functions

– Data steps = expressions with functions

– Procedures = expressions with functions

– Macros = expressions with functions

– SAS functions = R functions

SAS Data Step – Implied Loop

• Data steps leverage an implied loop

– Reads the data one row (observation) at a time, passes the row through the code line by line, reaches the RUN statement, then starts on the next row.

• R does not make use of an implicit loop

– R applies functions at the “column-level”

SAS Example

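[Figure: data set a (columns A, B; rows 1 2, 4 5, 7 8) is read one row at a time. Each pass of the implied loop writes one output row with a new column C = A + B: first 1 2 3, then 4 5 9.]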

R Example

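[Figure: data frame a (columns A, B; rows 1 2, 4 5, 7 8). The whole column A (1, 4, 7) is added to the whole column B (2, 5, 8) in a single step, giving column C = 3, 9, 15.]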
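
The column-level idea on this slide can be sketched in a few lines of R. The data frame `a` and its columns follow the slide's example; the explicit loop is included only as a contrast with the SAS implied loop and is an illustration, not part of the original deck.

```r
# Data frame 'a' with the values shown on the slide.
a <- data.frame(A = c(1, 4, 7), B = c(2, 5, 8))

# Column-level (vectorized): all of A is added to all of B in one expression.
a$C <- a$A + a$B

# Row-by-row version that mimics the SAS implied loop (for contrast only).
c_loop <- numeric(nrow(a))
for (i in seq_len(nrow(a))) {
  c_loop[i] <- a$A[i] + a$B[i]
}

print(a)
```

Both approaches produce the same column; the vectorized form is the one R is designed around.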

Data Storage

• SAS data sets

– Arrays = special group of columns in a SAS data set

• In R, more variety.

– Data frame (similar to SAS data set)

– Vectors – single “column” of values (all same type)

– Matrices

– Lists – collection of objects
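
As a minimal sketch, the four R structures listed above can be created as follows. All object names and values are made up for illustration.

```r
# Vector: one "column" of values, all of the same type.
ages <- c(34, 51, 27)

# Matrix: two-dimensional, still a single type throughout.
m <- matrix(1:6, nrow = 2, ncol = 3)

# Data frame: columns may differ in type, much like a SAS data set.
people <- data.frame(name = c("Ann", "Bob", "Cy"), age = ages,
                     stringsAsFactors = FALSE)

# List: a collection of arbitrary objects, including the structures above.
bundle <- list(ages = ages, grid = m, people = people)

str(bundle, max.level = 1)
```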

Object Orientation (OOP)

• Both SAS and R can be used in OOP fashion

– In practice, we don’t see this much with SAS

• In R, everything is an object

– Variables are objects

– Functions are objects
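
A short sketch of what "functions are objects" means in practice. The names `square` and `apply_twice` are hypothetical and used only for illustration.

```r
# A variable is an object with a class.
x <- c(1.5, 2.5, 4.0)
class(x)

# A function is also an object: it can be stored, inspected, and passed around.
square <- function(v) v^2
class(square)

# Passing one function object as an argument to another function.
apply_twice <- function(f, v) f(f(v))
result <- apply_twice(square, 2)   # computes (2^2)^2
```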

Native Memory Usage

• SAS

– Hard drive (I/O) Intensive

• R

– RAM Intensive

• RevoScaleR

– External memory algorithms with data on disk

Extensibility

• SAS

– New procedures added with additional SAS purchases.

• R

– New functions added by installing packages from CRAN and loading them with library().
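
A minimal sketch of the R side. The `install.packages()` call is commented out because it needs network access, and `somePackage` is a placeholder name. The `splines` package, which ships with R, stands in here for a package downloaded from CRAN.

```r
# One-time download from CRAN (placeholder package name, needs network access).
# install.packages("somePackage")

# Loading a package makes its functions available in the session.
library(splines)

# bs() comes from the 'splines' package and is usable once the package is loaded.
basis <- bs(1:10, df = 4)
dim(basis)
```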

Conversion Process

Quick Start

• Developing tools and libraries to expedite conversion.

Hybrid Agile Conversion

Implement in R

Test & Feedback

Document

Conversion Design

SAS Code Review

Master Requirements

Iteration Plan

Impact

• Faster results.

• Surfaces challenges quickly.

• TRANSPARENCY!

Order of Review

• SAS Macro Language

• Procedures

• Data Steps

Refactoring Strategy

• SAS and R are not just different languages.

– They are different frameworks.

• Planning must ensure appropriate vectorization to conform to the R framework.

SAS Macro Assessment

• Assess business usage

– Level 1: Wrapper Macros

– Level 2: Macro Variables

– Level 3: Code Generation
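
One way the three levels might map onto R, as a rough sketch. The data and the names `summarize_col` and `make_filter` are hypothetical, and the mapping is an illustration rather than the deck's own recipe.

```r
sales <- data.frame(region = c("E", "W", "E", "W"),
                    amount = c(10, 20, 30, 40))

# Level 1 (wrapper macro): an ordinary function wrapping a repeated task.
summarize_col <- function(df, col) {
  c(mean = mean(df[[col]]), max = max(df[[col]]))
}
stats_amount <- summarize_col(sales, "amount")

# Level 2 (macro variable): a plain R variable holds the value.
target_region <- "E"
east <- sales[sales$region == target_region, ]

# Level 3 (code generation): return functions instead of generating program text.
make_filter <- function(region) {
  force(region)  # evaluate the argument now so each closure keeps its own value
  function(df) df[df$region == region, ]
}
filters <- lapply(c(E = "E", W = "W"), make_filter)
west <- filters$W(sales)
```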

Areas of Caution - Macros

• Macros to create or name SAS data sets

• Loops and iterations

• Specific syntax

– Call Symput

– %GOTO
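
The sketch below shows one possible R treatment of each caution area, under the assumption that the macro code only names data sets, stores a computed value, and branches. All names and values are invented for illustration.

```r
# Instead of a macro that creates data sets named sales_2010, sales_2011, ...,
# keep the pieces together in one named list.
years <- 2010:2012
sales_by_year <- lapply(years, function(y) {
  data.frame(year = y, amount = c(100, 200) + (y - 2010))
})
names(sales_by_year) <- paste0("sales_", years)

# Instead of CALL SYMPUT, assign the computed value to an ordinary variable.
n_obs <- nrow(sales_by_year[["sales_2011"]])

# Instead of %GOTO, use ordinary control flow such as if/else.
status <- if (n_obs > 0) "process" else "skip"
```

Keeping generated data in a list also turns "loops and iterations" over data set names into a single `lapply()` over the list.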

SAS Procedure Conversion

• SAS procedures map to R functions

– The most common Base SAS, SAS/STAT, and SAS/GRAPH procedures have counterparts in base R.

• Most common SAS procedures have approximate 1:1 analogs in R

– E.g. univariate, means, freq, rank, corr, import
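
A hedged sketch of approximate base R counterparts for the procedures named above. The pairings are rough analogs, not exact equivalents, and the built-in `mtcars` data set stands in for a SAS data set.

```r
d <- mtcars

summary(d$mpg)                                # roughly PROC UNIVARIATE
col_means <- colMeans(d[, c("mpg", "hp")])    # roughly PROC MEANS
freq <- table(d$cyl)                          # roughly PROC FREQ
ranks <- rank(d$mpg)                          # roughly PROC RANK
r <- cor(d$mpg, d$wt)                         # roughly PROC CORR
# read.csv("file.csv")                        # roughly PROC IMPORT (hypothetical path)
```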

Areas of Caution - PROCS

• Some advanced SAS procedures may require literature review of implementation.

• R implements new capabilities faster – consequence of open source.

Data Step Conversion

• Many SAS functions have similar R syntax.

– E.g. String and data manipulation

• Consider each tool’s preferred analysis target

– SAS = data sets = row

– R = vector = column
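
As an illustration of similar function syntax, the sketch below applies a few base R string functions to a whole vector at once. The SAS function names in the comments are approximate counterparts, and the sample values are made up.

```r
name <- c("  alice smith ", "BOB JONES")

trimmed <- trimws(name)                    # SAS: STRIP()
upper   <- toupper(trimmed)                # SAS: UPCASE()
first3  <- substr(upper, 1, 3)             # SAS: SUBSTR(x, 1, 3)
len     <- nchar(trimmed)                  # SAS: LENGTH()
joined  <- paste(first3, len, sep = "-")   # SAS: CATX("-", ...)
```

Each call works on the entire vector (column), so no per-row loop is needed.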

Areas of Caution – Data Steps

• Missing Data Handling

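– SAS uses “.” for missing numeric values and a blank (“”) for missing strings.

– SAS also has 27 special numeric missing values (._ and .A through .Z).

– R uses a single NA marker for every type. There is no counterpart to the SAS special missing values, and an empty string “” is not missing in R.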

• Date & Time

– SAS date – number of days since 1/1/1960

– R date – number of days since 1/1/1970
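
A minimal sketch of both caution areas. The values are invented; the only assumption is that SAS dates arrive in R as plain day counts.

```r
# One NA marker covers numeric and character vectors alike.
x <- c(10, NA, 30)
s <- c("a", NA, "")

is.na(x)                           # TRUE only for the missing element
is.na(s)                           # "" is an empty string, not a missing value
mean_x <- mean(x, na.rm = TRUE)    # NA propagates unless removed explicitly

# SAS counts days from 1960-01-01; R's Date class counts from 1970-01-01.
sas_days <- c(0, 3653)
r_dates <- as.Date(sas_days, origin = "1960-01-01")

# Number of days between the two origins, as seen from R.
offset <- as.numeric(as.Date("1960-01-01"))
```

Note that SAS sorts a missing numeric value below every number, whereas in R a comparison involving NA returns NA, so filters that rely on the SAS ordering need an explicit `is.na()` check.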

Best Practices

• Best practice in SAS may be poor practice in R due to paradigm shift.

• E.g. a SAS loop should not be converted into an R loop. Loops in SAS should generally be rebuilt with the apply family of functions in R.

• SAS is procedural, R is functional.
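
The difference can be sketched as follows, with a hypothetical list of score vectors. The loop is the direct translation of SAS habits; the `sapply()` call is the functional form.

```r
scores <- list(a = c(1, 2, 3), b = c(10, 20), c = c(5, 5, 5, 5))

# Loop style carried over from SAS (works, but is not idiomatic R).
totals_loop <- numeric(length(scores))
for (i in seq_along(scores)) {
  totals_loop[i] <- sum(scores[[i]])
}

# Apply style: the function sum is handed to sapply(), which visits each element.
totals_apply <- sapply(scores, sum)
```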

Conversion Samples

Reading a CSV File

• SAS

• R
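
A minimal R sketch, not the slide's original code. It writes a tiny CSV to a temporary file so that it is self-contained; the column names and values are invented.

```r
# Write a small CSV to a temporary file so the example runs anywhere.
path <- tempfile(fileext = ".csv")
writeLines(c("id,name,score", "1,Ann,90.5", "2,Bob,82"), path)

# read.csv() plays the role of PROC IMPORT (or a DATA step with INFILE) in SAS.
dat <- read.csv(path, stringsAsFactors = FALSE)
str(dat)
```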

Merging Data

• SAS

• R
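
A minimal R sketch using base `merge()`, not the slide's original code. The two data frames are hypothetical.

```r
customers <- data.frame(id = c(1, 2, 3), name = c("Ann", "Bob", "Cy"))
orders    <- data.frame(id = c(1, 1, 3, 4), amount = c(50, 25, 70, 10))

# Inner join: only ids present in both data frames.
inner <- merge(customers, orders, by = "id")

# Full outer join: keeps every id from either side, filling gaps with NA.
full <- merge(customers, orders, by = "id", all = TRUE)
```

Unlike a SAS `MERGE ... BY`, `merge()` does not require the inputs to be sorted by the key first.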

Contingency Tables

• SAS

• R
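
A minimal R sketch, not the slide's original code, using the built-in `mtcars` data set as a stand-in. `table()` gives the counts that `PROC FREQ` would produce for a two-way table.

```r
d <- mtcars

# Two-way table of counts: cylinders by gears.
tab <- table(cyl = d$cyl, gear = d$gear)

# Row proportions (each row sums to 1).
row_pct <- prop.table(tab, margin = 1)

# Chi-square test of independence; the small-count warning is suppressed here.
test <- suppressWarnings(chisq.test(tab))
```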

Logistic Regression

• SAS

• R
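
A minimal R sketch, not the slide's original code. It fits a logistic regression with `glm()` on the built-in `mtcars` data; the choice of variables is arbitrary.

```r
# Model transmission type (am: 0 = automatic, 1 = manual) from weight.
fit <- glm(am ~ wt, data = mtcars, family = binomial)
coef(fit)

# Predicted probability of am == 1 for a hypothetical car with wt = 3.
prob <- predict(fit, newdata = data.frame(wt = 3), type = "response")
```

When comparing results, keep in mind that `glm()` models the probability of the outcome coded 1, while `PROC LOGISTIC` by default models the lower ordered response level unless told otherwise, so coefficient signs can flip between the two.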

Non-Linear Optimization

• SAS

• R
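
A minimal R sketch, not the slide's original code. It minimizes the Rosenbrock test function with base R's `optim()`; the objective and the starting point are illustrative choices.

```r
# Rosenbrock function, a standard test problem with its minimum at (1, 1).
rosenbrock <- function(p) {
  (1 - p[1])^2 + 100 * (p[2] - p[1]^2)^2
}

# Quasi-Newton (BFGS) search from a conventional starting point.
res <- optim(par = c(-1.2, 1), fn = rosenbrock, method = "BFGS")

res$par           # estimated minimizer
res$convergence   # 0 indicates successful convergence
```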

Thank you

Timothy D’Auria

Disclaimers

• SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries.

• R is available as Free Software under the terms of the Free Software Foundation's GNU General Public License. R is hosted at www.r-project.org.

• Boston Decision LLC and Rconvert.com are not affiliated with SAS Institute nor the core R development team.
