SAS and R Code For Basic Statistics Avjinder Singh Kaler

Upload: avjinder-avi-kaler

Post on 10-Jan-2017

Page 1: SAS and R Code for Basic Statistics

SAS and R Code for Basic Statistics

Avjinder Singh Kaler

Table of Contents

1. Reading Data
2. Descriptive Statistics (DS)
3. Correlation and Covariance
4. Analysis of Variance (ANOVA)
5. Regression and Multiple Regression

1. Reading Data

1. SAS

I. There are many ways to read data in SAS; the most common is a DATA step with inline data:

data data_name;    /* name of the data set to create */
  input x y z;     /* variables to read */
  datalines;
1 2 4
4 5 7
;
run;

Copy and paste the data rows between "datalines;" and the closing ";".

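II. Data can be read from a file on the computer, either by giving the path directly in the INFILE statement:

data data_name;
  infile 'C:\Avi\soy.txt';      /* path to the text file */
  input GE ENV YIELD;           /* add $ after a name to read it as character */
run;

or through a FILENAME reference:

filename avi 'C:\Avi\soy.txt';  /* avi is a fileref pointing to the text file */

data data_name;
  infile avi;
  input GE ENV YIELD;
run;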

III. Data can also be imported using the Import option under the File menu.

2. R

First set the working directory to the folder that contains the data:

getwd()                # get the current working directory
setwd("C:/MyFolder")   # set the working directory

I. If the data is in text format, it can be read as:

mydata <- read.table("data_name.txt")   # add header = TRUE if the first line holds column names

II. If the data is in CSV format, it can be read as:

mydata <- read.csv("data_name.csv")
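The two read functions can be tried without any existing file. The sketch below is only an illustration: it writes a small made-up data set to temporary files and reads it back. The values are invented, and the column names GE, ENV, and YIELD are borrowed from the SAS example above.

```r
# Hypothetical toy data; the values are made up for illustration only.
toy <- data.frame(
  GE    = c("G1", "G2", "G1", "G2"),
  ENV   = c("E1", "E1", "E2", "E2"),
  YIELD = c(3.1, 2.8, 3.6, 3.0)
)

# Write the same data once as space-delimited text and once as CSV.
txt_file <- tempfile(fileext = ".txt")
csv_file <- tempfile(fileext = ".csv")
write.table(toy, txt_file, row.names = FALSE, quote = FALSE)
write.csv(toy, csv_file, row.names = FALSE)

# read.table() needs header = TRUE to use the first line as column names;
# read.csv() assumes a header line by default.
from_txt <- read.table(txt_file, header = TRUE)
from_csv <- read.csv(csv_file)

str(from_txt)
str(from_csv)
```

With real data, replace the temporary file names with the file names in the working directory.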

2. Descriptive Statistics

1. SAS

Descriptive statistics in SAS can be computed using PROC UNIVARIATE:

PROC UNIVARIATE DATA=data_name;
  VAR yield;     /* variable to describe */
  BY ENV;        /* one analysis per group; the data must first be sorted by ENV (PROC SORT) */
  HISTOGRAM;     /* histogram of yield */
RUN;

Other options are PROC FREQ (frequency tables) and PROC MEANS (summary statistics by class):

PROC FREQ DATA=data_name;
  TABLES yield;
RUN;

PROC MEANS DATA=data_name;
  CLASS GE;
  VAR yield;
RUN;

2. R

Descriptive statistics in R can be computed in the following ways:

summary(mydata)

Several R packages can also be loaded and used for DS, such as Hmisc, pastecs, and psych. Install these packages once before loading them:

install.packages(c("Hmisc", "pastecs", "psych"))

library(Hmisc)
Hmisc::describe(mydata)

library(pastecs)
stat.desc(mydata)

library(psych)
psych::describe(mydata)   # Hmisc and psych both define describe(), so name the package
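Grouped summaries like those from PROC MEANS with a CLASS statement can also be obtained in base R without extra packages. The following is a minimal sketch on made-up data; the values are invented, and GE and YIELD simply reuse the variable names from the SAS examples.

```r
# Hypothetical toy data; the values are made up for illustration only.
mydata <- data.frame(
  GE    = rep(c("G1", "G2"), each = 3),
  YIELD = c(3.0, 3.2, 3.4, 2.5, 2.7, 2.9)
)

# Overall summary of every column
print(summary(mydata))

# Mean and standard deviation of YIELD within each level of GE
group_means <- tapply(mydata$YIELD, mydata$GE, mean)
group_sds   <- tapply(mydata$YIELD, mydata$GE, sd)

print(group_means)
print(group_sds)
```

tapply() is used here because it needs only a value vector, a grouping vector, and a function; aggregate() would be an alternative that returns a data frame.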

3. Correlation and Covariance

Several correlation methods are available, such as Pearson, Spearman, and Kendall.

1. SAS

Correlation and covariance in SAS for different methods:

proc corr cov data=data_name pearson spearman kendall hoeffding plots=all;

var x y z;

run;

2. R

Correlation and covariance in R for different methods:

cor(mydata, use = "complete.obs", method = "pearson")
cov(mydata, use = "complete.obs", method = "pearson")

Note: Here mydata is a numeric data frame. The method can be changed to "spearman" or "kendall".

Packages such as Hmisc can also be loaded:

library(Hmisc)
rcorr(as.matrix(mydata), type = "pearson")   # rcorr() expects a matrix; type can be "pearson" or "spearman"

Correlation between two numeric vectors x and y:

cor(x, y)
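To see how the method argument and use = "complete.obs" behave, the calls above can be run on a small made-up data frame. This is only a sketch: the numbers are invented so that y is the square of x, and one missing value is included on purpose.

```r
# Hypothetical toy data: y = x^2, with one missing x value.
mydata <- data.frame(
  x = c(1, 2, 3, 4, 5, NA),
  y = c(1, 4, 9, 16, 25, 36)
)

# use = "complete.obs" drops the row that contains NA before computing.
pearson  <- cor(mydata, use = "complete.obs", method = "pearson")
spearman <- cor(mydata, use = "complete.obs", method = "spearman")
covar    <- cov(mydata, use = "complete.obs", method = "pearson")

print(pearson)    # linear association: high, but below 1 because y is curved in x
print(spearman)   # rank association: exactly 1 because y increases with x
print(covar)
```

The Pearson and Spearman results differ here because the relationship is monotonic but not linear, which is the usual reason for choosing one method over the other.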

4. Analysis of Variance (ANOVA)

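1. SAS

PROC ANOVA, PROC GLM, PROC MIXED, PROC GLIMMIX, and PROC HPMIXED can be used for ANOVA.

PROC ANOVA is for balanced data (no missing observations).

PROC GLM is for fixed-effect factors.

PROC MIXED, PROC GLIMMIX, and PROC HPMIXED are for models with both random and fixed-effect factors.

Fixed-effect model (use PROC ANOVA in place of PROC GLM for balanced data):

proc glm data=data_name;
  class factorvar;
  model responsevar = factorvar;
  means factorvar / bon t lsd tukey;
run;

Mixed model, where fixedvar and randomvar are placeholder names for the fixed and random factors:

proc mixed data=data_name;               /* proc glimmix accepts the same statements */
  class fixedvar randomvar;
  model responsevar = fixedvar;
  random randomvar;
  lsmeans fixedvar / diff adjust=tukey;  /* one adjustment per statement: BON, T, or TUKEY */
run;

Note: Select one procedure depending on your dataset.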

2. R

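Analysis of variance (ANOVA) can be computed in R using aov(). The grouping variables should be stored as factors.

x <- aov(responsevar ~ factorvar, data = mydata)   # completely randomized design (CRD)
x <- aov(y ~ A + B, data = mydata)                 # RCBD: treatment A plus block B
x <- aov(y ~ A + B + A:B, data = mydata)           # factorial design
summary(x)

Multiple comparisons:

TukeyHSD(x)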
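As a runnable illustration of the one-factor case, the same calls can be applied to PlantGrowth, a data set that ships with R (plant weight for a control and two treatment groups). This data set is not part of the original slides; it is used here only so the sketch runs as written.

```r
# PlantGrowth is built into R: columns weight (numeric) and group (factor, 3 levels).
data(PlantGrowth)

# One-way ANOVA: does mean weight differ between groups?
fit <- aov(weight ~ group, data = PlantGrowth)
anova_table <- summary(fit)[[1]]   # data frame with Df, Sum Sq, Mean Sq, F value, Pr(>F)
print(anova_table)

# Tukey's honest significant differences for all pairwise group comparisons
tukey <- TukeyHSD(fit)
print(tukey)
```

The rows of the Tukey table are the pairwise differences between groups, each with a confidence interval and an adjusted p-value.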

5. Regression and Multiple Regression

1. SAS

Simple Regression

Proc reg data=data_name;

model response_var = factor_var1;

run;

Multiple Regression

Proc reg data=data_name;

model response_var = factor_var1 factor_var2 factor_var3 factor_var4 ;

run;

2. R

Multiple Linear Regression

fit <- lm(y ~ x1 + x2 + x3, data = mydata)

summary(fit)   # show results
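A small runnable sketch of both the simple and the multiple case, using mtcars, a data set built into R. The choice of mpg as response and wt and hp as predictors is an assumption made only for illustration and does not come from the original slides.

```r
# mtcars is built into R: mpg = miles per gallon, wt = weight, hp = horsepower.
data(mtcars)

# Simple regression: one predictor
fit_simple <- lm(mpg ~ wt, data = mtcars)

# Multiple regression: two predictors
fit_multi <- lm(mpg ~ wt + hp, data = mtcars)

print(summary(fit_simple))   # coefficients, standard errors, R-squared
print(summary(fit_multi))

# Extract pieces of the fit for later use
coefs <- coef(fit_multi)
r2    <- summary(fit_multi)$r.squared
```

coef() and summary()$r.squared are convenient when the estimates are needed in further calculations rather than only printed.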