
SAS Lecture 5 – Some regression procedures

Aidan McDermott, April 25, 2005


What will the output from this program look like?

How many variables will be in the dataset example, and what will be the length and type of each variable?

What will the variable package look like?


What will the output from this program look like?


Modeling with SAS

examine relationships between variables

estimate parameters and their standard errors

calculate predicted values

evaluate the fit or lack of fit of a model

test hypotheses

design outcome


The linear model

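$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)$$

Example:

$$\text{Weight} = \beta_0 + \beta_1\,\text{Height} + \beta_2\,\text{Age} + \varepsilon$$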

Note: outcome variable must be continuous and normal given independent variables


the linear model with proc reg

estimates parameters by least squares

produces diagnostics to test model fit (e.g. scatter plots)

tests hypotheses

Example:

proc reg data=mydata;

model weight = height age;

run;


proc reg

Syntax:

proc reg <options>;

model response = effects </options>;

plot yvariable*xvariable = 'symbol';

by varlist;

output <OUT=SAS data set>

<output statistic list>;

run;


proc reg

proc reg statement syntax:

data = SAS data set name      input data set

outest = SAS data set name    creates data set with parameter estimates

simple                        prints simple statistics
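The three options above can be tried together in a short sketch. It assumes the SASHELP sample library is installed; sashelp.class (with variables weight, height and age) stands in for mydata, and the output dataset name est is only an illustration.

```sas
/* Sketch: fit the weight model on the SASHELP.CLASS sample data.   */
/* simple prints descriptive statistics for the model variables;    */
/* outest= saves the parameter estimates in a dataset (here: est).  */
proc reg data=sashelp.class outest=est simple;
  model weight = height age;
run;
quit;

/* Look at the saved parameter estimates */
proc print data=est;
run;
```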


proc reg

the model statement

model response=<effects></options>;

required

variables must be numeric

many options

can specify more than one model statement

Example:

model weight = height age;

model weight = height age / p clm cli;


proc reg

the plot statement

plot yvariable*xvariable <=symbol> </options>;

produces scatter plots - yvariable on the vertical axis and xvariable on the horizontal axis

can specify several plots

optional symbol to mark points

yvariable and xvariable can be variables specified in model statements or statistics available in output statement

Example:

plot weight * age / pred;

plot r. * p. / vref = 0;


proc reg

some statistics available for plotting:

P.       predicted values
R.       residuals
L95.     lower 95% CI bound for individual prediction
U95.     upper 95% CI bound for individual prediction
L95M.    lower 95% CI bound for mean of dependent variable
U95M.    upper 95% CI bound for mean of dependent variable

Example:

plot weight * age / pred;

plot r. * p. / vref = 0;

plot (weight p. l95. u95.) * age / overlay;


proc reg

the output statement

output <OUT=SAS data set> keywords=names;

creates SAS data set

all original variables included

keyword=names specifies the statistics to include

Example:

output out=pvals p=pred r=resid;
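A minimal sketch of the output statement in context, again assuming the SASHELP sample library is available. The names pvals, pred and resid follow the slide; the proc print step is added only to inspect the result.

```sas
/* Sketch: save predicted values and residuals alongside the original data */
proc reg data=sashelp.class;
  /* p, clm and cli print predicted values, confidence limits for the mean, */
  /* and limits for an individual prediction                                */
  model weight = height age / p clm cli;
  output out=pvals p=pred r=resid;
run;
quit;

/* pvals holds every original variable plus pred and resid */
proc print data=pvals(obs=5);
  var name weight height age pred resid;
run;
```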


Example: NMES

variables of interest:

totalexp – total medical expenditure ($)

chd5 – indicator of CHD

lastage – age at last interview

male – sex of participant


proc reg example

here:

1. model     estimate parameters etc
2. plot      make three plots
3. output    make an output dataset regout
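The program shown on this slide was not preserved in the transcript. The following is only a sketch consistent with the three annotations, not the lecture's actual code: the dataset lecture4.nmes and the variable names come from elsewhere in the lecture, while the particular plots and the names pred and resid are assumptions. The plot statement is the legacy one used in these slides; depending on the SAS release, ODS Graphics may need to be turned off for it to produce plots.

```sas
/* Sketch only: the choice of plots and output names is assumed */
proc reg data=lecture4.nmes;
  /* 1. model: estimate parameters etc */
  model totalexp = chd5 lastage male;

  /* 2. plot: three plots (illustrative choices) */
  plot totalexp * lastage;
  plot r. * p. / vref = 0;
  plot r. * lastage / vref = 0;

  /* 3. output: make an output dataset regout */
  output out=regout p=pred r=resid;
run;
quit;
```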


The run statement

Many people assume that the run statement ends a procedure such as proc reg.

This is because when SAS encounters a run statement it executes any outstanding instructions in the program buffer. But it may or may not end the procedure.

proc reg data=lecture4.nmes;
  model totalexp = chd5 lastage male;
run;
  model totalexp = chd5 lastage;
  plot r.*chd5;
run;
quit; /* ends the procedure */


proc glm (the general linear model)

uses least-squares with generalized inverses

performs linear regression, analysis of variance, analysis of covariance

accepts classification variables (discrete) and continuous variables

estimates and performs tests for general linear effects

proc anova is suitable for “balanced” designs; proc glm can be used for either balanced or unbalanced designs

suitable for random effects models


proc glm

Syntax:

proc glm data=name <options>;

class classification variables;

model response=effects /options;

means effects / options;

random effects / options;

estimate 'label' effect value / options;

contrast 'label' effect value / options;

run;


proc glm

response (dependent) variable is continuous – same normality assumption as in proc reg

independent variables are discrete or continuous; discrete variables must be listed on the class statement

interaction terms can be specified with an asterisk a*b, e.g.

model bmi= a b a*b;
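As a concrete sketch of an interaction term, assuming the SASHELP sample library is installed: sex is a discrete variable and height a continuous one, so sex*height lets the slope for height differ by sex. The choice of dataset and variables is only an illustration.

```sas
/* Sketch: main effects plus an interaction between a class variable */
/* and a continuous variable                                         */
proc glm data=sashelp.class;
  class sex;                                  /* discrete variable */
  model weight = sex height sex*height / solution;
run;
quit;
```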


proc glm

means effects / options;

computes arithmetic means and standard deviations of all continuous variables in the model (both dependent and independent) within each group for effects specified on the right-hand side of the model statement

only class variables may be specified as effects

options specify multiple comparison methods for main effect terms in the model
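A short sketch of the means statement, assuming the SASHELP sample library is installed. The tukey option is one of the available multiple comparison methods; with only two levels of sex it is shown purely to illustrate where such options go.

```sas
/* Sketch: group means for a class effect that appears in the model */
proc glm data=sashelp.class;
  class sex;
  model weight = sex height;
  means sex / tukey;      /* means and std devs of weight and height by sex */
run;
quit;
```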


proc glm example

here:

1. solution    show estimated parameters
2. means       show means for smoke variable
3. class       treat smoke as discrete


proc glm example

here:

1. format changes reference group
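The slide's program was not preserved, so the following is a sketch of the idea on different data, assuming the SASHELP sample library is installed. By default proc glm orders class levels by their formatted values and, with the solution option, sets the parameter for the last level to zero, making it the reference group. The format $sexref is hypothetical: its labels are chosen so that 'Female' sorts last and becomes the reference instead of 'M'.

```sas
proc format;
  /* Hypothetical format: the leading digits control the sort order */
  value $sexref 'M' = '1 Male'
                'F' = '2 Female';
run;

proc glm data=sashelp.class;
  class sex;
  format sex $sexref.;    /* levels now sort as '1 Male', '2 Female' */
  model weight = sex height / solution;
run;
quit;
```

Without the format statement the levels sort as F, M, and M is the reference group.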


reg and glm

Both the proc reg and proc glm procedures are suitable only when the outcome variable is normally distributed.

proc reg has many regression diagnostic features, while proc glm allows you to fit more sophisticated linear models such as random effects models, models for unbalanced designs etc.


non-normal outcomes

In many situations we cannot assume our response variable is normally distributed.

proc reg and proc glm are not suitable for modeling such outcomes.

Example: Suppose you are interested in estimating the prevalence of disease in a population. You have an indicator of disease (1 = Yes, 0 = No).


non-normal outcomes

Example: You are interested in estimating how the incidence of infant mortality has changed as a function of time.

Example: You are interested in estimating the median survival time for two groups of patients receiving either a placebo or treatment.


proc logistic

Example:

Survey data: parent agrees to close school when certain toxic elements are found in the environment

Variables:

close    0 = no, 1 = yes
lived    years lived in community


proc logistic

Syntax:

proc logistic <options>;

model response = effects </options>;

class variables;

by variables;

output <out=name> <keyword=name>;

run;


proc logistic

• descending option means that we are modeling the probability that close=1 and not the probability that close=0.
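The program for this slide was not preserved. A minimal sketch of where the descending option goes, using the survey variables close and lived from the example; the dataset name survey is an assumption.

```sas
/* Sketch: survey is a hypothetical dataset holding close and lived */
proc logistic data=survey descending;
  model close = lived;     /* models Pr(close = 1) because of descending */
run;

/* An equivalent way to pick the modeled level explicitly: */
/*   model close(event='1') = lived;                       */
```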


proc genmod

• implements the generalized linear model
• fits models with normal, binomial or poisson response variable (among others)
• fits generalized estimating equations for repeated measures data


proc genmod

Syntax:

proc genmod <options>;

by variables;

class variables;

model response = effects </options>;

output <out=name> <keyword=name>;

make 'table' out=name;

run;


proc genmod:

class statement says which variables are classification (categorical) variables

by statement produces a separate analysis for each level of the by variables (data must be sorted in the order of the by variables)

response variable is the response (dependent) variable in the regression model.

<effects> are a list of variables. These are the independent variables in the regression model. Any independent variables that are categorical must be listed in the class statement.


Example:

• Same model as we produced with proc glm. The default is a linear model.

• smoke will be treated as a categorical variable because of the class statement


options for the model statement

dist = option specifies the distribution of the response variable (default = normal)

link = option specifies the link function that relates the mean of the response variable to the linear predictor (default = identity when the distribution is normal)

Examples:

logistic regression: dist=binomial link=logit

poisson regression: dist=poisson link=log
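The two option pairs above can be sketched in full model statements. Both dataset names are assumptions: survey is a hypothetical dataset holding the close and lived variables from the proc logistic example, and counts is a hypothetical dataset with a count outcome deaths and a covariate year.

```sas
/* Sketch: logistic regression with genmod (hypothetical dataset survey) */
proc genmod data=survey descending;
  model close = lived / dist=binomial link=logit;
run;

/* Sketch: Poisson regression (hypothetical dataset counts) */
proc genmod data=counts;
  model deaths = year / dist=poisson link=log;
run;
```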


options for the model statement

alpha = sets the significance level for confidence intervals; the confidence level is 1 − alpha (default alpha = 0.05, i.e. 95% intervals)

waldci or lrci specifies that confidence intervals are to be computed. waldci gives approximate (Wald) intervals and doesn’t take as long as lrci. lrci gives intervals based on the likelihood ratio.
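A brief sketch combining these options, using the NMES dataset and variables named elsewhere in the lecture. The value alpha=0.10 is chosen only to show a non-default setting.

```sas
/* Sketch: 90% likelihood-ratio confidence intervals for the parameters */
proc genmod data=lecture4.nmes descending;
  model chd5 = lastage male / dist=binomial link=logit alpha=0.10 lrci;
run;
```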


the output statement

• the output statement is just one of the ways to create a new SAS dataset containing results from the genmod procedure.

• statement is similar to that found in proc means and proc glm.

Example:

output out=new

predicted=fit upper=upper lower=lower;


the make statement

• the make statement is another way to create a new SAS dataset containing results from the genmod procedure.

• ods is another more general way (see later).

Example:

make 'ParameterEstimates' out=parms;

make 'ParmInfo' out=parminfo;


example: logistic regression

Perform a logistic regression analysis to determine how the odds of CHD are associated with age and gender in the 1987 NMES.

Save the parameter estimates as a new dataset.

Save the predicted values along with the original data.


Example:

• The descending option means that we are modeling the probability that chd5=1 and not the probability that chd5=0.
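The program for this example was not preserved in the transcript. The following is a sketch of one way to carry out the three tasks, not the lecture's actual code. The dataset lecture4.nmes and the variables chd5, lastage and male come from earlier slides; the output names parms, chdpred, phat, lcl and ucl are assumptions. The ods output statement is used here in place of make to save the ParameterEstimates table.

```sas
/* Sketch: logistic regression of CHD on age and sex with genmod */
proc genmod data=lecture4.nmes descending;
  model chd5 = lastage male / dist=binomial link=logit;

  /* predicted probabilities and their limits, kept with the original data */
  output out=chdpred predicted=phat lower=lcl upper=ucl;

  /* parameter estimates saved as a new dataset */
  ods output ParameterEstimates=parms;
run;

proc print data=parms;
run;
```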
