Intro to R for SAS and SPSS User Webinar


Uploaded by revolution-analytics on 15-Jan-2015


DESCRIPTION

R is free software for data analysis and graphics that is similar to SAS and SPSS. Two million people are part of the R open source community. Its use is growing very rapidly, and Revolution Analytics distributes a commercial version of R that adds capabilities not available in the open source version. This 60-minute webinar is for people who are familiar with SAS or SPSS and want to know how R can strengthen their analytics strategy.

TRANSCRIPT

Page 1: Intro to R for SAS and SPSS User Webinar

Bob Muenchen, author of R for SAS and SPSS Users and co-author of R for Stata Users

[email protected], http://r4stats.com

Copyright © 2010, 2011, Robert A Muenchen. All rights reserved.

1. What is R?
2. R’s Advantages
3. R’s Disadvantages
4. Installing and Maintaining R
5. Ways of Running R
6. An Example Program
7. Where to Learn More

“The most powerful statistical computing language on the planet.” -Norman Nie, Developer of SPSS

R is a language, a package, and an environment for graphics and data analysis.

It is free and open source.

It was created by Ross Ihaka & Robert Gentleman in 1996 and has been extended by many more.

It is an implementation of the S language developed by John Chambers and others.

R has 4,950 add-ons, or nearly 100,000 procs.

[Charts of R's popularity. Source: http://r4stats.com/popularity]

SAS and SPSS each consist of five main parts:

1. Data input & management (data step)
2. Analytics & graphics procedures (proc step)
3. Macro language
4. Matrix language
5. Output management systems (ODS/OMS)

R integrates all of these seamlessly.
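To make the "integrates seamlessly" point concrete, here is a minimal sketch that plays all five roles in one short R session. It uses R's built-in mtcars data rather than anything from the webinar, and the helper fit_one() is a hypothetical name chosen for illustration.

```r
# Sketch: one R session covering the five roles listed above.
# Uses the built-in mtcars data; fit_one() is a hypothetical helper.

# 1. Data management (DATA step role): copy the data and derive a variable
mt <- mtcars
mt$logwt <- log(mt$wt)

# 2. Analysis (PROC step role): fit a linear regression
fit <- lm(mpg ~ logwt, data = mt)

# 3. Macro-like reuse: an ordinary function, applied to several predictors
fit_one <- function(var) lm(reformulate(var, response = "mpg"), data = mt)
r2 <- sapply(c("wt", "logwt", "hp"), function(v) summary(fit_one(v))$r.squared)

# 4. Matrix language (SAS/IML or SPSS MATRIX role): the same estimates
#    from the normal equations (X'X) b = X'y
X <- cbind(1, mt$logwt)
beta <- solve(crossprod(X), crossprod(X, mt$mpg))

# 5. Output management (ODS/OMS role): results are ordinary objects
coef_table <- coef(summary(fit))   # a numeric matrix you can index or save
print(coef_table)
print(round(r2, 3))
```

The matrix step is only there to show the equivalence; lm() itself uses a QR decomposition, which is numerically more stable than solving the normal equations directly.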

* SAS Approach;
DATA A; SET A;
  logX = log(X);
PROC REG;
  MODEL Y = logX;
RUN;

# R Approach
lm( Y ~ log(X), data=A )
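A quick way to check that the two approaches agree is to fit both on the same data. The data set A below is invented for illustration (simulated with a fixed seed), not taken from the webinar.

```r
# Hypothetical data set A with variables X and Y (simulated, not real data)
set.seed(1)
A <- data.frame(X = 1:20)
A$Y <- 2 + 3 * log(A$X) + rnorm(20, sd = 0.1)

# SAS-style: create logX in a separate step, then fit
A$logX <- log(A$X)
fit_sas_style <- lm(Y ~ logX, data = A)

# R-style: transform inside the model formula
fit_r_style <- lm(Y ~ log(X), data = A)

# Same estimates; only the coefficient names differ ("logX" vs "log(X)")
print(coef(fit_sas_style))
print(coef(fit_r_style))

# The formula version also applies log() automatically when predicting
pred <- predict(fit_r_style, newdata = data.frame(X = 50))
print(pred)
```

With the SAS-style model you would have to compute log(50) yourself before predicting, because that model only knows about the precomputed logX column.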


Vast selection of analytics & graphics
New methods are available sooner
Many packages can run R (SAS, SPSS, Excel…)
Its object orientation “does the right thing”
Its language is powerful & fully integrated
Procedures you write are on an equal footing
It is the universal language of data analysis
It runs on any computer
Because it is open source, you can study and modify it
It is free

* Using SAS;
PROC TTEST DATA=classroom;
  CLASS gender;
  VAR score;
RUN;

# In R
t.test(score ~ gender, data=classroom)
with(classroom, t.test(posttest, pretest, paired=TRUE))
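The classroom data set is not shown in the webinar, so the sketch below simulates a hypothetical one with columns gender, score, pretest and posttest, just so both calls can run. Note that R's t.test() defaults to the Welch (unequal variances) test; pass var.equal = TRUE for the pooled test, whereas PROC TTEST prints both.

```r
# Hypothetical classroom data (simulated; the webinar's data set is not shown)
set.seed(42)
classroom <- data.frame(
  gender  = rep(c("f", "m"), each = 10),
  pretest = round(rnorm(20, mean = 70, sd = 8))
)
classroom$posttest <- classroom$pretest + round(rnorm(20, mean = 5, sd = 3))
classroom$score    <- classroom$posttest

# Two-sample t-test of score by gender (Welch version by default)
tt_welch <- t.test(score ~ gender, data = classroom)

# Pooled-variance version, comparable to PROC TTEST's "Pooled" row
tt_pooled <- t.test(score ~ gender, data = classroom, var.equal = TRUE)

# Paired t-test: with() makes the columns visible to the default method,
# which would otherwise ignore a data= argument
tt_paired <- with(classroom, t.test(posttest, pretest, paired = TRUE))

print(tt_welch)
print(tt_paired)
```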

R’s language is somewhat harder to learn.
Help files are sparse & complex.
You must find R and its add-ons yourself.
Graphical user interfaces are not as polished.
Most R functions hold data in main memory.
Rule of thumb: 10 million values per gigabyte of memory
SAS and SPSS: billions of records
Several efforts are underway to break R’s memory limit, including Revolution Analytics’ distribution.
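The rule of thumb is easier to judge with a quick measurement. R stores each numeric (double) value in 8 bytes, so the sketch below checks how much memory a vector of 10 million values occupies. That works out to roughly 80 MB, which suggests the rule leaves generous headroom for the temporary copies R makes while it works; that interpretation is an assumption, not something stated in the webinar.

```r
# Memory used by 10 million double-precision values
x <- numeric(1e7)                  # a vector of 10 million zeros
bytes <- as.numeric(object.size(x))

# 8 bytes per value plus a small fixed header for the vector itself
print(bytes)
print(format(object.size(x), units = "Mb"))

# Values that would fit in 1 GB (2^30 bytes) if nothing else were stored
print(floor(2^30 / 8))
```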


Base R plus the Recommended Packages is comparable to:
Base SAS, SAS/STAT, SAS/GRAPH, SAS/IML Studio
SPSS Statistics Base, SPSS Statistics Advanced, SPSS Regression

These are tested via extensive validation programs. But add-on packages are written by… the professor who invented the method? A student interpreting the method?
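To see which packages a given installation treats as part of base R and which as Recommended, you can query installed.packages() by priority. This is a small sketch; the exact list depends on how R was installed (some Linux builds ship the Recommended packages separately).

```r
# Packages shipped as part of base R (e.g., stats, graphics, utils)
base_pkgs <- unname(installed.packages(priority = "base")[, "Package"])

# Recommended packages (e.g., MASS, lattice, survival), if installed;
# this is an empty character vector when none are present
rec_pkgs <- unname(installed.packages(priority = "recommended")[, "Package"])

print(sort(unique(base_pkgs)))
print(sort(unique(rec_pkgs)))
```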

Email support is free, quick, and available 24 hours a day: www.r-project.org/mail.html
Other forums: Stackoverflow.com, Quora.com, and Cross Validated (stats.stackexchange.com/questions/tagged/r)
Phone support is available commercially.

1. Go to cran.r-project.org, the Comprehensive R Archive Network.
2. Download the binaries for Base R and run the installer.
3. Add-ons: install.packages("myPackage")
4. To update: update.packages()
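Steps 3 and 4 are often wrapped in a small helper so that a script installs a package only when it is missing. ensure_package() below is a hypothetical name, not part of R; install.packages(), requireNamespace() and library() are standard functions.

```r
# Hypothetical helper: install a package only if it is missing, then attach it
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # Downloads from the CRAN mirror set in getOption("repos")
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# stats ships with base R, so this call needs no download
ensure_package("stats")

# Update all installed add-ons without a prompt for each one
# (commented out here because it needs network access):
# update.packages(ask = FALSE)
```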

Comprehensive R Archive Network
Crantastic.com
Inside-R.org
R4Stats.com


Run code interactively
Submit code from Excel, SAS, SPSS, …
Point-and-click using Graphical User Interfaces (GUIs)
Batch mode
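For batch mode, a script is typically run from a shell with Rscript (or R CMD BATCH, which writes its output to a .Rout file). The sketch below is a hypothetical script, myprog.R, that takes an optional CSV file name on the command line; the file name and the behavior when no file is given are assumptions for illustration.

```r
# Hypothetical batch script "myprog.R". From a shell, run e.g.:
#   Rscript myprog.R mydata.csv
# Arguments after the script name are returned by commandArgs(trailingOnly = TRUE).
args <- commandArgs(trailingOnly = TRUE)
infile <- if (length(args) >= 1) args[1] else NA_character_

# interactive() is FALSE under Rscript and R CMD BATCH
run_mode <- if (interactive()) "interactive" else "batch"
message("Running in ", run_mode, " mode")

if (!is.na(infile) && file.exists(infile)) {
  mydata <- read.csv(infile)
  print(summary(mydata))
} else {
  message("No input file found; nothing to summarize")
}
```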


run ExportDataSetToR("mydata");
submit / R;
  mydata$workshop <- factor(mydata$workshop)
  summary(mydata)
endsubmit;

GET FILE='mydata.sav'.
BEGIN PROGRAM R.
mydata <- spssdata.GetDataFromSPSS(
  variables = c("workshop gender q1 to q4"),
  missingValueToNA = TRUE,
  row.label = "id" )
summary(mydata)
END PROGRAM.


Revolution Analytics:
A company focused on R development & support
Run by SPSS founder Norman Nie
Their enhanced distribution of R: Revolution R Enterprise
Free for colleges and universities, including for outside consulting


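# In R 4.0.0 and later, add stringsAsFactors = TRUE to get the gender counts shown below
mydata <- read.csv("mydata.csv")
print(mydata)
mydata$workshop <- factor(mydata$workshop)   # treat workshop as categorical
summary(mydata)
plot( mydata$q1, mydata$q4 )                  # scatter plot of q4 against q1
myModel <- lm( q4~q1+q2+q3, data=mydata )     # linear regression
summary( myModel )                            # coefficients and R-squared
anova( myModel )                              # sequential ANOVA table
plot( myModel )                               # regression diagnostic plots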

> mydata <- read.csv("mydata.csv")
> print(mydata)
  workshop gender q1 q2 q3 q4
1        1      f  1  1  5  1
2        2      f  2  1  4  1
3        1      f  2  2  4  3
4        2   <NA>  3  1 NA  3
5        1      m  4  5  2  4
6        2      m  5  4  5  5
7        1      m  5  3  4  4
8        2      m  4  5  5  5

> mydata$workshop <- factor(mydata$workshop)
> summary(mydata)
 workshop  gender
 1:4      f   :3
 2:4      m   :4
          NA's:1

       q1             q2              q3              q4
 Min.   :1.00   Min.   :1.00   Min.   :2.000   Min.   :1.00
 1st Qu.:2.00   1st Qu.:1.00   1st Qu.:4.000   1st Qu.:2.50
 Median :3.50   Median :2.50   Median :4.000   Median :3.50
 Mean   :3.25   Mean   :2.75   Mean   :4.143   Mean   :3.25
 3rd Qu.:4.25   3rd Qu.:4.25   3rd Qu.:5.000   3rd Qu.:4.25
 Max.   :5.00   Max.   :5.00   Max.   :5.000   Max.   :5.00
                               NA's   :1.000
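The factor() call matters because workshop is stored as the numbers 1 and 2. A short sketch using the workshop values from the listing above shows the difference in what summary() reports.

```r
# Workshop codes for the eight rows in the listing above
workshop <- c(1, 2, 1, 2, 1, 2, 1, 2)

# As a number, summary() reports quantiles and a mean, which mean little here
print(summary(workshop))

# As a factor, summary() reports counts per category, as in the output above
workshop_f <- factor(workshop)
print(summary(workshop_f))
```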


> myModel <- lm(q4 ~ q1+q2+q3, data=mydata)
> summary(myModel)

Call:
lm(formula = q4 ~ q1 + q2 + q3, data = mydata)

Residuals:
      1       2       3       5       6       7       8
-0.3113 -0.4261  0.9428 -0.1797  0.0765  0.0225 -0.1246

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -1.3243     1.2877  -1.028    0.379
q1            0.4297     0.2623   1.638    0.200
q2            0.6310     0.2503   2.521    0.086
q3            0.3150     0.2557   1.232    0.306

Multiple R-squared: 0.9299, Adjusted R-squared: 0.8598
F-statistic: 13.27 on 3 and 3 DF, p-value: 0.03084
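mydata.csv itself is not distributed with the slides. The sketch below rebuilds the eight rows from the printed listing (entered by hand rather than read from the file) and reruns the regression, so you can check that the numbers above come out the same. Row 4 has a missing q3 value, so lm() drops it under the default na.action (na.omit) and fits on 7 rows, which is why the residuals skip row 4.

```r
# The eight rows printed by print(mydata) above, entered by hand
mydata <- data.frame(
  workshop = factor(c(1, 2, 1, 2, 1, 2, 1, 2)),
  gender   = factor(c("f", "f", "f", NA, "m", "m", "m", "m")),
  q1 = c(1, 2, 2, 3, 4, 5, 5, 4),
  q2 = c(1, 1, 2, 1, 5, 4, 3, 5),
  q3 = c(5, 4, 4, NA, 2, 5, 4, 5),
  q4 = c(1, 1, 3, 3, 4, 5, 4, 5)
)
print(summary(mydata))

# Rows with a missing value in any model variable are dropped
myModel <- lm(q4 ~ q1 + q2 + q3, data = mydata)
print(summary(myModel))
print(anova(myModel))
```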


R for SAS and SPSS Users, Muenchen
R for Stata Users, Muenchen & Hilbe
R Through Excel: A Spreadsheet Interface for Statistics, Data Analysis, and Graphics, Heiberger & Neuwirth
Data Mining with Rattle and R: The Art of Excavating Data for Knowledge Discovery, Williams

R is powerful, extensible, and free.
Download it from CRAN.
Academics can download Revolution R Enterprise for free at www.revolutionanalytics.com.
You can run it many ways and from many packages.
Several graphical user interfaces are available.
R's programming language is the way to access its full power.

[email protected]

Slides: r4stats.com/misc/webinar
Presentation: bit.ly/R-sas-spss