sas lecture 5 – some regression procedures aidan mcdermott, april 25, 2005
TRANSCRIPT
SAS Lecture 5 – Some regression procedures
Aidan McDermott,April 25, 2005
What will the output from this program look like?How many variables will be in the dataset example, and what will be the length and type of each variable?What will the variable package look like?
What will the output from this program look like?
Modeling with SAS
examine relationships between variables
estimate parameters and their standard errors
calculate predicted values evaluate the fit or lack of fit of a model test hypotheses
design outcome
The linear model
Example:
kk xxxy 22110
),0(~ 2 N
AgeHeightWeight 210
Note: outcome variable must be continuous and normal given independent variables
the linear model with proc reg
estimates parameters by least squares produces diagnostics to test model fit
(e.g. scatter plots) tests hypotheses
Example:
proc reg data=mydata;
model weight = height age;
run;
proc reg
Syntax:
proc reg <options>;
model response = effects </options>;
plot yvariable*xvariable = ’symbol’;
by varlist;
output <OUT=SAS data set>
<output statistic list>;
run;
proc reg
proc reg statement syntax:
data = SAS data set name input data set
outest = SAS data set name creates data set with parameter estimates
simple prints simple statistics
proc regthe model statement
model response=<effects></options>;
required variables must be numeric many options can specify more than one model
statement
Example:
model weight = height age; model weight = height age / p clm cli;
proc regthe plot statement plot yvariable*xvariable <=symbol> </options>;
produces scatter plots - yvariable on the vertical axis and xvariable on the horizontal axis
can specify several plots optional symbol to mark points yvariable and xvariable can be variables specified
in model statements or statistics available in output statement
Example:
plot weight * age / pred; plot r. * p. / vref = 0;
proc reg
some statistics available for plotting: P. predicted values R. residuals L95. lower 95% CI bound for individual prediction U95. upper 95% CI bound for individual prediction L95M. lower 95% CI bound for mean of dependent variable U95M. upper 95% CI bound for mean of dependent variable
Example:
plot weight * age / pred; plot r. * p. / vref = 0; plot (weight p. l95. U95.) * age / overlay;
proc reg
the output statement
output <OUT=SAS data set> keywords=names;
creates SAS data set all original variables included keyword=names specifies the statistics to include
Example:
output out=pvals p=pred r=resid;
Example: NMES
variables of interest:
totalexp – total medical expenditure ($)
chd5 – indicator of CHD
lastage – age at last interview
male – sex of participant
proc reg example
here:
1. model estimate parameters etc 2. plot make three plots 3. output make an output dataset regout
The run statementMany people assume that the run statement ends a procedure such as proc reg.
This is because when SAS encounters a run statement it executes any outstanding instructions in the program buffer. But it may or may not end the procedure.
proc reg data=lecture4.nmes; model totalexp = chd5 lastage male;run; model totalexp = chd5 lastage; plot r.*chd5;run;quit; /* ends the procedure */
proc glm (the general linear model) uses least-squares with generalized inverses performs linear regression, analysis of variance, analysis of covariance accepts classification variables (discrete) and continuous variables estimates and performs tests for general linear effects proc anova is suitable for “balanced” designs; proc glm can be used
for either balanced or unbalanced designs suitable for random effects models
proc glm
Syntax:
proc glm data=name <options>;
class classification variables;
model response=effects /options;
means effects / options;
random effects / options;
estimate ‘label’ effect value / options;
contrast ‘label’ effect value / options;
run;
proc glm
response (dependent) variable is continuous – same normality assumption as in proc reg
independent variables are discrete or continuous; discrete must listed on class statement
interaction terms can be with an asterisk a*b, e.g.
model bmi= a b a*b;
proc glm
means effects / options; computes arithmetic means and standard
deviations of all continuous variables in the model (both dependent and independent) within each group for effects specified on the right-hand side of the model statement
only class variables may be specified as effects
options specify multiple comparison methods for main effect terms in the model
proc glm example
here:
1. solution show estimated parameters 2. means show means for smoke variable 3. class treat smoke as discrete
proc glm example
here:
1. format changes reference group
reg and glm Both the proc reg and proc glm
procedures are suitable only when the outcome variable is normally distributed.
proc reg has many regression diagnostic features, while proc glm allows you to fit more sophisticated linear models such as random effects models, models for unbalanced designs etc.
non-normal outcomes In many situations we cannot assume our
response variable is normally distributed. proc reg and proc glm are not suitable for
modeling such outcomes.
Example: Suppose you are interested in estimating
the prevalence of disease in a population. You have an indicator of disease (1 = Yes, 0 = No)
non-normal outcomes Example: You are interested in estimating how the
incidence of infant mortality has changed as a function of time
Example: You are interested in estimating the
median survival time for two groups of patients receiving either a placebo or treatment.
Example:
Survey data: parent agrees to close school when certain toxic elements are found in the environment
Variables: close 0 = no, 1 = yes lived years lived in
community
proc logistic
proc logistic
Syntax:
proc logistic <options>;
model response = effects </options>;
class variables;
by variables;
output <out=name> <keyword=name>;
run;
proc logistic
• descending option means that we are modeling the probability that close=1 and not the probability that close=0.
proc genmod• implements the generalized linear model• fits models with normal, binomial or
poisson response variable (among others)• fits generalized estimating equations for
repeated measures data
proc genmod
Syntax:
proc genmod <options>;
by variables;
class variables;
model response = effects </options>;
output <out=name> <keyword=name>;
make ‘table’ out=name;
run;
proc genmod: class statement says which variables are
classification (categorical) variables by statement produces a separate analysis for
each level of the by variables (data must be sorted in the order of the by variables)
response variable is the response (dependent) variable in the regression model.
<effects> are a list of variables. These are the independent variables in the regression model. Any independent variables that are categorical must be listed in the Class statement.
Example:
• Same model as we produced with proc glm. The default is a linear model.
• smoke will be treated as a categorical variable because of the class statement
options for the model statementdist = option specifies the distribution of the response
variable. (default = normal)
link = option specifies the link that will transform the response variable (default = identity)
Examples:
logistic regression: dist=binomial link=logit poisson regression: dist=poisson link=log
options for the model statementalpha = specifies confidence level for confidence
intervals
waldci or lrci specifies that confidence intervals are to be computed. The waldci gives approximate intervals and doesn’t take as long as lrci. The lrci give intervals based on likelihood ratio.
the output statement• the output statement is just one of the ways to create
a new SAS dataset containing results form the genmod procedure.
• statement is similar to that found in proc means and proc glm.
Example:
output out=new
predicted=fit upper=upper lower=lower;
the make statement• the make statement is another way to create a new
SAS dataset containing results form the genmod procedure.
• ods is another more general way (see later).
Example:
make ‘ParameterEstimates’ out=parms;
make ‘ParmInfo’ out=parminfo;
example: logistic regression
Perform a logistic regression analysis to determine how the odds of CHD are associated with age and gender in the 1987 NMES
Save the parameter estimates as a new dataset. Save the predicted values along with the
original data.
Example:
• descending options means that we are modeling the probability that chd5=1 and not the probability that chd5=0.