SAS Manual

8/22/2019 SAS Manual

1/53

Overview of SAS/STAT Software

SAS/STAT software, a component of the SAS System, provides comprehensive statistical tools for a wide range of statistical analyses, including analysis of variance, regression, categorical data analysis, multivariate analysis, survival analysis, psychometric analysis, cluster analysis, and nonparametric analysis. Examples of specific techniques include mixed models, generalized linear models, correspondence analysis, and structural equations. The software is constantly being updated to reflect new methodology.

In addition to 54 procedures for statistical analysis, SAS/STAT software also includes the Market Research Application (MRA), a point-and-click interface to commonly used techniques in market research. In addition, the Analyst Application in the SAS System provides convenient access to some of the more commonly used statistical analyses in SAS/STAT software, including analysis of variance, regression, logistic regression, mixed models, survival analysis, and some multivariate techniques.

About This Book

Since SAS/STAT software is a part of the SAS System, this book assumes that you are familiar with base SAS software and with the books SAS Language Reference: Dictionary, SAS Language Reference: Concepts, and the SAS Procedures Guide. It also assumes that you are familiar with basic SAS System concepts such as creating SAS data sets with the DATA step and manipulating SAS data sets with the procedures in base SAS software (for example, the PRINT and SORT procedures).


Chapter Organization

This book is organized as follows.

Chapter 1, this chapter, provides an overview of SAS/STAT software and summarizes related information, products, and services. The next ten chapters introduce the broad areas covered by SAS/STAT software. Subsequent chapters describe the SAS procedures that make up SAS/STAT software; these chapters appear in alphabetical order by procedure name.

The chapters documenting the SAS/STAT procedures are organized as follows:

The Overview section provides a brief description of the analysis provided by the procedure.

The Getting Started section provides a quick introduction to the procedure through a simple example.

The Syntax section describes the SAS statements and options that control the procedure.

The Details section discusses methodology and miscellaneous details.

The Examples section contains examples using the procedure.

The References section contains references for the methodology and examples for the procedure.

Following the chapters on the SAS/STAT procedures, Appendix A, "Special SAS Data Sets," documents the special SAS data sets associated with SAS/STAT procedures.

Typographical Conventions

This book uses several type styles for presenting information. The following list explains the meaning of the typographical conventions used in this book:

roman
is the standard type style used for most text.

UPPERCASE ROMAN
is used for SAS statements, options, and other SAS language elements when they appear in the text. However, you can enter these elements in your own SAS programs in lowercase, uppercase, or a mixture of the two.

UPPERCASE BOLD
is used in the "Syntax" sections' initial lists of SAS statements and options.

oblique
is used for user-supplied values for options in the syntax definitions. In the text, these values are written in italic.

helvetica
is used for the names of variables and data sets when they appear in the text.

bold
is used to refer to matrices and vectors.

italic
is used for terms that are defined in the text, for emphasis, and for references to publications.

monospace
is used for example code. In most cases, this book uses lowercase type for SAS code.

Options Used in Examples

Output of Examples

Most of the output shown in this book is produced with the following SAS System options:

options linesize=80 pagesize=200 nonumber nodate;

The template STATDOC.TPL is used to create the HTML output that appears in the online (CD) version. A style template controls stylistic HTML elements such as colors, fonts, and presentation attributes. The style template is specified in the ODS HTML statement as follows:

ODS HTML style=statdoc;

If you run the examples, you may get slightly different output. Such differences depend on the SAS System options in effect and on the precision your computer uses for floating-point calculations.
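As a minimal sketch of how a style is applied in practice, the following program opens the ODS HTML destination, runs one procedure, and closes the destination. It assumes the SASHELP.CLASS sample data set that ships with base SAS software; the file name reg.html is hypothetical, and because the STATDOC style may not be installed at your site, the built-in Default style stands in for it.

```sas
/* Open the HTML destination; reg.html is a hypothetical file name.  */
/* STYLE=DEFAULT stands in for STATDOC, which may not be installed.  */
ods html file='reg.html' style=default;

/* Output produced while the destination is open is written to the file. */
proc reg data=sashelp.class;
   model Weight = Height;
run;
quit;

/* Close the destination so the HTML file is completed and released. */
ods html close;
```

Swapping in another style requires changing only the STYLE= value; the procedure code stays the same.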

Graphics Options

The examples that contain graphical output are created with a specific set of options and SYMBOL statements. The code you see in the examples creates the color graphics that appear in the online (CD) version of this book. A slightly different set of options and statements is used to create the black-and-white graphics that appear in the printed version of the book.

If you run the examples, you may get slightly different results. This may occur because not all graphics options for color devices translate directly to black-and-white output formats. For complete information on SAS/GRAPH software and graphics options, refer to SAS/GRAPH Software: Reference.

The following GOPTIONS statement is used to create the online (color) version of the graphic output.

filename GSASFILE '';

goptions gsfname=GSASFILE gsfmode=replace
         fileonly
         transparency dev=gif
         ftext=swiss lfactor=1
         htext=4.0pct htitle=4.5pct
         hsize=5.625in vsize=3.5in
         noborder cback=white
         horigin=0in vorigin=0in;

The following GOPTIONS statement is used to create the black-and-white version of the graphic output, which appears in the printed version of the manual.

filename GSASFILE '';

goptions gsfname=GSASFILE gsfmode=replace
         gaccess=sasgaedt fileonly
         dev=pslepsf
         ftext=swiss lfactor=1
         htext=3.0pct htitle=3.5pct
         hsize=5.625in vsize=3.5in
         border cback=white
         horigin=0in vorigin=0in;

In most of the online examples, the plot symbols are specified as follows:

symbol1 value=dot color=white height=3.5pct;

The SYMBOLn statements used in online examples order the symbol colors as follows: white, yellow, cyan, green, orange, blue, and black.

In the examples appearing in the printed manual, SYMBOL statements specify COLOR=BLACK and order the plot symbols as follows: dot, square, triangle, circle, plus, x, diamond, and star.
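The printed-manual convention just described could be written as SYMBOLn statements like the following. This is a sketch consistent with the stated order, not the book's actual statements; the HEIGHT=3.5PCT value is borrowed from the online example above.

```sas
/* Printed-manual style: every symbol black, shapes in the documented order. */
symbol1 value=dot      color=black height=3.5pct;
symbol2 value=square   color=black height=3.5pct;
symbol3 value=triangle color=black height=3.5pct;
symbol4 value=circle   color=black height=3.5pct;
symbol5 value=plus     color=black height=3.5pct;
symbol6 value=x        color=black height=3.5pct;
symbol7 value=diamond  color=black height=3.5pct;
symbol8 value=star     color=black height=3.5pct;
```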

The %PLOTIT Macro

Examples that use the %PLOTIT macro are generated by defining a special macro variable to specify graphics options. See Appendix B, "Using the %PLOTIT Macro," for details on the options specified in these examples.

Where to Turn for More Information

This section describes other sources of information about SAS/STAT software.

Accessing the SAS/STAT Sample Library

Online Help System

SAS Institute Technical Support Services

Introduction to Regression Procedures

Overview

Statistical Background

References

Overview

This chapter reviews SAS/STAT software procedures that are used for regression analysis: CATMOD, GLM, LIFEREG, LOGISTIC, NLIN, ORTHOREG, PLS, PROBIT, REG, RSREG, and TRANSREG. The REG procedure provides the most general analysis capabilities; the other procedures give more specialized analyses. This chapter also briefly mentions several procedures in SAS/ETS software.

Introduction

Introductory Example

General Regression: The REG Procedure

Nonlinear Regression: The NLIN Procedure

Response Surface Regression: The RSREG Procedure

Partial Least Squares Regression: The PLS Procedure

Regression for Ill-conditioned Data: The ORTHOREG Procedure

Logistic Regression: The LOGISTIC Procedure

Regression With Transformations: The TRANSREG Procedure

Regression Using the GLM, CATMOD, LOGISTIC, PROBIT, and LIFEREG Procedures

Interactive Features in the CATMOD, GLM, and REG Procedures

Introduction

Many SAS/STAT procedures, each with special features, perform regression analysis. The following procedures perform at least one type of regression analysis:

CATMOD
analyzes data that can be represented by a contingency table. PROC CATMOD fits linear models to functions of response frequencies, and it can be used for linear and logistic regression. The CATMOD procedure is discussed in detail in Chapter 4, "Introduction to Categorical Data Analysis Procedures."

GENMOD
fits generalized linear models. PROC GENMOD is especially suited for responses with discrete outcomes, and it performs logistic regression and Poisson regression as well as fitting Generalized Estimating Equations for repeated measures data. See Chapter 4, "Introduction to Categorical Data Analysis Procedures," and Chapter 30, "The GENMOD Procedure," for more information.

GLM
uses the method of least squares to fit general linear models. In addition to many other analyses, PROC GLM can perform simple, multiple, polynomial, and weighted regression. PROC GLM has many of the same input/output capabilities as PROC REG, but it does not provide as many diagnostic tools or allow interactive changes in the model or data. See Chapter 3, "Introduction to Analysis-of-Variance Procedures," for a more detailed overview of the GLM procedure.

LIFEREG
fits parametric models to failure-time data that may be right censored. These types of models are commonly used in survival analysis. See Chapter 9, "Introduction to Survival Analysis Procedures," for a more detailed overview of the LIFEREG procedure.

LOGISTIC
fits logistic models for binomial and ordinal outcomes. PROC LOGISTIC provides a wide variety of model-building methods and computes numerous regression diagnostics. See Chapter 4, "Introduction to Categorical Data Analysis Procedures," for a brief comparison of PROC LOGISTIC with other procedures.

NLIN
builds nonlinear regression models. Several different iterative methods are available.

ORTHOREG
performs regression using the Gentleman-Givens computational method. For ill-conditioned data, PROC ORTHOREG can produce more accurate parameter estimates than other procedures such as PROC GLM and PROC REG.

PLS
performs partial least squares regression, principal components regression, and reduced rank regression, with cross validation for the number of components.

PROBIT
performs probit regression as well as logistic regression and ordinal logistic regression. The PROBIT procedure is useful when the dependent variable is either dichotomous or polychotomous and the independent variables are continuous.

REG
performs linear regression with many diagnostic capabilities, selects models using one of nine methods, produces scatter plots of raw data and statistics, highlights scatter plots to identify particular observations, and allows interactive changes in both the regression model and the data used to fit the model.

RSREG
builds quadratic response-surface regression models. PROC RSREG analyzes the fitted response surface to determine the factor levels of optimum response and performs a ridge analysis to search for the region of optimum response.

TRANSREG
fits univariate and multivariate linear models, optionally with spline and other nonlinear transformations. Models include ordinary regression and ANOVA, multiple and multivariate regression, metric and nonmetric conjoint analysis, metric and nonmetric vector and ideal point preference mapping, redundancy analysis, canonical correlation, and response surface regression.

Several SAS/ETS procedures also perform regression. The following procedures are documented in the SAS/ETS User's Guide.

AUTOREG
implements regression models using time-series data where the errors are autocorrelated.

PDLREG
performs regression analysis with polynomial distributed lags.



SYSLIN
handles linear simultaneous systems of equations, such as econometric models.

MODEL
handles nonlinear simultaneous systems of equations, such as econometric models.
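Because several of the SAS/STAT procedures above fit least-squares linear models, the same straight-line model can often be requested from more than one of them. The sketch below fits Weight on Height with PROC GLM and with PROC ORTHOREG; for well-conditioned data like these, the estimates should agree with those from PROC REG. It assumes the SASHELP.CLASS sample data set that ships with base SAS software.

```sas
/* General linear model; SOLUTION prints the parameter estimates. */
proc glm data=sashelp.class;
   model Weight = Height / solution;
run;
quit;

/* The same model fit with the Gentleman-Givens method of PROC ORTHOREG. */
proc orthoreg data=sashelp.class;
   model Weight = Height;
run;
```

The choice between them matters mainly for ill-conditioned data, where ORTHOREG's computational method is designed to be more accurate.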



Introductory Example

Regression analysis is the analysis of the relationship between one variable and another set of variables. The relationship is expressed as an equation that predicts a response variable (also called a dependent variable or criterion) from a function of regressor variables (also called independent variables, predictors, explanatory variables, factors, or carriers) and parameters. The parameters are adjusted so that a measure of fit is optimized. For example, the equation for the ith observation might be

$$ y_i = \beta_0 + \beta_1 x_i + \epsilon_i $$

where $y_i$ is the response variable, $x_i$ is a regressor variable, $\beta_0$ and $\beta_1$ are unknown parameters to be estimated, and $\epsilon_i$ is an error term.

You might use regression analysis to find out how well you can predict a child's weight if you know that child's height. Suppose you collect your data by measuring heights and weights of 19 school children. You want to estimate the intercept $\beta_0$ and the slope $\beta_1$ of a line described by the equation

$$ \text{Weight} = \beta_0 + \beta_1\,\text{Height} + \epsilon $$

where

Weight is the response variable.

$\beta_0$, $\beta_1$ are the unknown parameters.

Height is the regressor variable.

$\epsilon$ is the unknown error.

The data are included in the following program. The results are displayed in Figure 2.1 and Figure 2.2.

data class;
   input Name $ Height Weight Age;
   datalines;
Alfred  69.0 112.5 14
Alice   56.5  84.0 13
Barbara 65.3  98.0 13
Carol   62.8 102.5 14
Henry   63.5 102.5 14
James   57.3  83.0 12
Jane    59.8  84.5 12
Janet   62.5 112.5 15
Jeffrey 62.5  84.0 13
John    59.0  99.5 12
Joyce   51.3  50.5 11
Judy    64.3  90.0 14
Louise  56.3  77.0 12
Mary    66.5 112.0 15
Philip  72.0 150.0 16
Robert  64.8 128.0 12
Ronald  67.0 133.0 15
Thomas  57.5  85.0 11
William 66.5 112.0 15
;

/* Blue dots for the observed data in the plot. */
symbol1 v=dot c=blue height=3.5pct;

/* Fit Weight = b0 + b1*Height and plot the data with the fitted line. */
proc reg data=class;
   model Weight=Height;
   plot Weight*Height / cframe=ligr;
run;
quit;

The REG Procedure
Model: MODEL1
Dependent Variable: Weight

                        Analysis of Variance

                               Sum of           Mean
Source             DF         Squares         Square    F Value    Pr > F
Model               1      7193.24912     7193.24912      57.08

Root MSE             11.22625    R-Square     0.7705
Dependent Mean      100.02632    Adj R-Sq     0.7570
Coeff Var            11.22330

                        Parameter Estimates

                       Parameter       Standard
Variable      DF        Estimate          Error    t Value    Pr > |t|
Intercept      1      -143.02692       32.27459      -4.43      0.0004
Height         1         3.89903        0.51609       7.55

[Plot of Weight by Height with the fitted line Weight = -143.0 + 3.9*Height]
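The summary statistics in this output are tied together by standard least-squares identities. As a worked check using the printed values (n = 19 observations, p = 2 parameters), where any small discrepancies come only from rounding of the displayed numbers:

```latex
\begin{aligned}
\text{Coeff Var} &= 100 \times \frac{\text{Root MSE}}{\text{Dependent Mean}}
  = 100 \times \frac{11.22625}{100.02632} \approx 11.2233 \\
\text{Adj R-Sq} &= 1 - (1 - R^2)\,\frac{n - 1}{n - p}
  = 1 - (1 - 0.7705)\,\frac{18}{17} \approx 0.7570 \\
t_{\text{Intercept}} &= \frac{b_0}{\operatorname{se}(b_0)} = \frac{-143.02692}{32.27459} \approx -4.43 \\
t_{\text{Height}} &= \frac{b_1}{\operatorname{se}(b_1)} = \frac{3.89903}{0.51609} \approx 7.55 \\
F &= \frac{\text{MS}_{\text{Model}}}{(\text{Root MSE})^2}
  = \frac{7193.24912}{11.22625^2} \approx 57.08 \approx t_{\text{Height}}^2
\end{aligned}
```

The last line reflects the fact that, with a single regressor, the overall F test and the t test for the slope are equivalent.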

Regression is often used in an exploratory fashion to look for empirical relationships, such as the relationship between Height and Weight. In this example, Height is not the cause of Weight; you would need a controlled experiment to confirm the relationship scientifically. See the "Comments on Interpreting Regression Statistics" section for more information.

The method most commonly used to estimate the parameters is to minimize the sum of squares of the differences between the actual response value and the value predicted by the equation. The estimates are called least-squares estimates, and the criterion value is called the error sum of squares

$$ \text{SSE} = \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right)^2 $$

where $b_0$ and $b_1$ are the estimates of $\beta_0$ and $\beta_1$ that minimize SSE.
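For a single regressor, minimizing SSE has a closed-form solution: b1 = Sxy/Sxx, where Sxy = sum(x*y) - sum(x)*sum(y)/n and Sxx = sum(x**2) - sum(x)**2/n, and then b0 = mean(y) - b1*mean(x). A minimal sketch that computes these directly with PROC SQL follows. It assumes the SASHELP.CLASS sample data set, which holds the same 19 children as the example above; the output table name ls is hypothetical. The results should match the PROC REG estimates up to rounding.

```sas
/* Closed-form least-squares estimates for Weight = b0 + b1*Height.   */
/* b1 is the corrected cross-product over the corrected sum of        */
/* squares of Height; b0 then follows from the two means.             */
proc sql;
   create table ls as
   select (sum(Height*Weight) - sum(Height)*sum(Weight)/count(*))
          / (sum(Height**2) - sum(Height)**2/count(*))  as b1,
          mean(Weight) - calculated b1 * mean(Height)   as b0
   from sashelp.class;
quit;

proc print data=ls noobs;
run;
```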

For a general discussion of the theory of least-squares estimation of linear models and its application to regression and analysis of variance, refer to one of the applied regression texts, including Draper and Smith (1981), Daniel and Wood (1980), Johnston (1972), and Weisberg (1985).

SAS/STAT regression procedures produce the following information for a typical regression analysis:

parameter estimates using the least-squares criterion

estimates of the variance of the error term

estimates of the variance or standard deviation of the sampling distribution of the parameter estimates

tests of hypotheses about the parameters
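Most of these quantities can be captured in a data set as well as printed. As a sketch, assuming the SASHELP.CLASS sample data set, the OUTEST= and COVOUT options of PROC REG write the parameter estimates, the root mean squared error (variable _RMSE_, an estimate of the error standard deviation), and the estimated covariance matrix of the estimates to one data set; the name est is hypothetical.

```sas
/* OUTEST= saves the estimates; COVOUT appends their covariance matrix. */
proc reg data=sashelp.class outest=est covout;
   model Weight = Height;
run;
quit;

/* The row with _TYPE_='PARMS' holds the estimates and _RMSE_;  */
/* rows with _TYPE_='COV' hold the covariance of the estimates. */
proc print data=est;
run;
```

Taking the square root of the diagonal COV entries reproduces the standard errors shown in the Parameter Estimates table.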

SAS/STAT regression procedures can produce many other specialized diagnostic statistics, including

collinearity diagnostics to measure how strongly regressors are related to other regressors and how this affects the stability and variance of the estimates (REG)

influence diagnostics to measure how each individual observation contributes to determining the parameter estimates, the SSE, and the fitted values (LOGISTIC, REG, RSREG)

lack-of-fit diagnostics that measure the lack of fit of the regression model by comparing the error variance estimate to another pure error variance that is not dependent on the form of the model (CATMOD, PROBIT, RSREG)

diagnostic scatter plots that check the fit of the model and highlighted scatter plots that identify particular observations or groups of observations (REG)

predicted and residual values, and confidence intervals for the mean and for an individual value (GLM, LOGISTIC, REG)

time-series diagnostics for equally spaced time-series data that measure how much errors may be related across neighboring observations. These diagnostics can also measure functional goodness of fit for data sorted by regressor or response variables (REG, SAS/ETS procedures).
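As a minimal sketch of requesting several of these diagnostics in PROC REG, assuming the SASHELP.CLASS sample data set: Age is added as a second regressor (an illustrative choice) so that the collinearity output is not trivial. The MODEL statement options request confidence limits for the parameters (CLB), collinearity (COLLIN) and influence (INFLUENCE) diagnostics, residual analysis (R), and 95% limits for the mean (CLM) and for individual values (CLI). The OUTPUT statement saves predicted values, residuals, leverages, and Cook's D; the data set name diag and its variable names are hypothetical.

```sas
proc reg data=sashelp.class;
   /* CLB: parameter limits; COLLIN, INFLUENCE: diagnostics;    */
   /* R: residual analysis; CLM/CLI: limits for mean/individual. */
   model Weight = Height Age / clb collin influence r clm cli;
   /* Save per-observation statistics for plotting or screening. */
   output out=diag p=pred r=resid h=lev cookd=cooksd;
run;
quit;
```

Saving the statistics with OUTPUT makes it easy to screen for high-leverage or influential children afterward, for example with PROC PRINT and a WHERE clause on lev or cooksd.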


Copyright 2002 by SAS Institute Inc., Cary, NC, USA. All rights reserved.



General Regression: The REG Procedure

The REG procedure is a general-purpose procedure for regression that

handles multiple regression models

provides nine model-selection methods

allows interactive changes both in the model and in the data used to fit the model

allows linear equality restrictions on parameters

tests linear hypotheses and multivariate hypotheses

produces collinearity diagnostics, influence diagnostics, and partial regression leverage plots

saves estimates, predicted values, residuals, confidence limits, and other diagnostic statistics in output SAS data sets

generates plots of data and of various statistics

"paints" or highlights scatter plots to identify particular observations or groups of observations

uses, optionally, correlations or crossproducts for input
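A minimal sketch of the hypothesis-testing feature, assuming the SASHELP.CLASS sample data set: the labeled TEST statement requests a joint F test that the coefficients of both Height and Age are zero (the label Slopes is arbitrary). PROC REG is interactive, so it remains active after RUN until QUIT is submitted; that is what makes changes between RUN groups possible.

```sas
proc reg data=sashelp.class;
   model Weight = Height Age;
   /* Joint F test of H0: coefficient of Height = 0 and of Age = 0. */
   Slopes: test Height = 0, Age = 0;
run;
/* PROC REG stays active after RUN; QUIT ends the interactive session. */
quit;
```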

Model-selection Methods in PROC REG

The nine methods of model selection implemented in PROC REG are

NONE
no selection. This method is the default and uses the full model given in the MODEL statement to fit the linear regression.

FORWARD

forward selection. This method starts with no variables in the model and adds variables one by one to the model. At each step, the variable added is the one that maximizes the fit of the model. You can also specify groups of variables to treat as a unit during the selection process. An option enables you to specify the criterion for inclusion.

BACKWARD
backward elimination. This method starts with a full model and eliminates variables one by one from the model. At each step, the variable with the smallest contribution to the model is deleted. You can also specify groups of variables to treat as a unit during the selection process. An option enables you to specify the criterion for exclusion.

STEPWISE
stepwise regression, forward and backward. This method is a modification of the forward-selection method in that variables already in the model do not necessarily stay there. You can also specify groups of variables to treat as a unit during the selection process. Again, options enable you to specify criteria for entry into the model and for remaining in the model.

MAXR
maximum R² improvement. This method tries to find the best one-variable model, the best two-variable model, and so on. The MAXR method differs from the STEPWISE method in that many more models are evaluated with MAXR, which considers all switches before making any switch. The STEPWISE method may remove the "worst" variable without considering what the "best" remaining variable might accomplish, whereas MAXR would consider what the "best" remaining variable might accomplish. Consequently, MAXR typically takes much longer to run than STEPWISE.

MINR
minimum R² improvement. This method closely resembles MAXR, but the switch chosen is the one that produces the smallest increase in R².

RSQUARE
finds a specified number of models having the highest R² in each of a range of model sizes.

CP
finds a specified number of models with the lowest Cp within a range of model sizes.

ADJRSQ
finds a specified number of models having the highest adjusted R² within a range of model sizes.
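As a sketch of how a selection method is requested, assuming the SASHELP.CLASS sample data set: SELECTION= in the MODEL statement names the method, SLENTRY= and SLSTAY= set the significance levels for entry into and retention in the model (set explicitly here to 0.15), and BEST= limits how many models of each size the RSQUARE method lists. The model labels Step and AllSub are arbitrary, and with only two candidate regressors the selection itself is trivial; the point is the syntax.

```sas
proc reg data=sashelp.class;
   /* Stepwise selection with explicit entry and stay significance levels. */
   Step:   model Weight = Height Age / selection=stepwise slentry=0.15 slstay=0.15;
   /* All-subsets ranking by R-square, also reporting Cp and adjusted R-square. */
   AllSub: model Weight = Height Age / selection=rsquare best=2 cp adjrsq;
run;
quit;
```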



Nonlinear Regression: The NLIN Procedure

The NLIN procedure implements iterative methods that attempt to find least-squares estimates for nonlinear models. The default method is Gauss-Newton, although several other methods, such as Newton and Marquardt, are available. You must specify parameter names, starting values, and expressions for the model. All necessary analytical derivatives are calculated automatically for you. Grid search is also available to select starting values for the parameters. Since nonlinear models are often difficult to estimate, PROC NLIN may not always find the globally optimal least-squares estimates.
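A minimal PROC NLIN sketch, assuming the SASHELP.CLASS sample data set. The exponential model, its parameter names b0 and b1, and the grid of starting values are hypothetical choices made only to show the syntax. The PARMS statement supplies a grid of starting values (PROC NLIN evaluates the model at each grid point and starts from the one with the smallest SSE), METHOD=MARQUARDT overrides the Gauss-Newton default, and no derivatives are supplied by the user.

```sas
proc nlin data=sashelp.class method=marquardt;
   /* Grid search over starting values; the best grid point starts the iterations. */
   parms b0 = 50 to 150 by 50
         b1 = 0.02 to 0.06 by 0.02;
   /* Hypothetical growth curve; PROC NLIN computes the derivatives itself. */
   model Weight = b0 * exp(b1 * (Height - 60));
run;
```

Because nonlinear fits can converge to a local optimum, trying a wider grid or a different METHOD= value is a cheap way to check that the reported estimates are stable.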



Response Surface Regression: The RSREG Procedure

The RSREG procedure fits a quadratic response-surface model, which is useful in searching for factor values that optimize a response. The following features in PROC RSREG make it preferable to other regression procedures for analyzing response surfaces:

automatic generation of quadratic effects

a lack-of-fit test

solutions for critical values of the surface

eigenvalues of the associated quadratic form

a ridge analysis to search for the direction of optimum response
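A minimal PROC RSREG sketch, assuming the SASHELP.CLASS sample data set, with Height and Age used as illustrative factors (a designed experiment is the more usual setting). The procedure generates the squared and crossproduct terms itself and prints the critical values and eigenvalues automatically. LACKFIT requests the lack-of-fit test, which needs replicated factor combinations (only one Height/Age combination is repeated in these data, so the test is weak here), and RIDGE MAX requests the ridge analysis toward maximum response.

```sas
proc rsreg data=sashelp.class;
   /* Quadratic surface in Height and Age; squared and crossproduct terms are automatic. */
   model Weight = Height Age / lackfit;
   /* Ridge analysis: path of estimated maximum response away from the design center. */
   ridge max;
run;
```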


Partial Least Squares Regression: The PLS Procedure

The PLS procedure fits models using any one of a number of linear predictive methods, including partial least squares (PLS). Ordinary least-squares regression, as implemented in SAS/STAT procedures such as PROC GLM and PROC REG, has the single goal of minimizing sample response prediction error, seeking linear functions of the predictors that explain as much variation in each response as possible. The techniques implemented in the PLS procedure have the additional goal of accounting for variation in the predictors, under the assumption that directions in the predictor space that are well sampled should provide better prediction for new observations when the predictors are highly correlated. All of the techniques implemented in the PLS procedure work by extracting successive linear combinations of the predictors, called factors (also called components or latent vectors), which optimally address one or both of these two goals: explaining response variation and explaining predictor variation. In particular, the method of partial least squares balances the two objectives, seeking factors that explain both response and predictor variation.
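To make "extracting a factor" concrete, here is a minimal Python sketch of the first factor of single-response partial least squares (the PLS1 form), assuming centered, unscaled data. It illustrates the idea only and is not how PROC PLS is implemented: the weight vector is proportional to X'y, so the factor scores t = Xw form the unit-norm linear combination of the predictors whose covariance with the response is largest. The function name `first_pls_factor` and the data are hypothetical.

```python
import math

def center(v):
    """Subtract the mean from every element."""
    m = sum(v) / len(v)
    return [a - m for a in v]

def first_pls_factor(cols, y):
    """First PLS1 factor for predictor columns `cols` and one response `y`.

    Returns (w, t, q): unit-norm weights w, factor scores t = Xc w, and the
    least-squares coefficient q of the centered response on t.
    """
    xc = [center(c) for c in cols]
    yc = center(y)
    # Weights proportional to Xc'yc: the unit-norm direction whose scores
    # have the largest covariance with the response.
    raw = [sum(a * b for a, b in zip(c, yc)) for c in xc]
    norm = math.sqrt(sum(r * r for r in raw))
    w = [r / norm for r in raw]
    # Factor scores: one linear combination of the centered predictors per row.
    t = [sum(w[j] * xc[j][i] for j in range(len(xc))) for i in range(len(y))]
    q = sum(a * b for a, b in zip(t, yc)) / sum(a * a for a in t)
    return w, t, q

x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1, 6.2]   # nearly collinear with x1
y = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1]
w, t, q = first_pls_factor([x1, x2], y)
```

Further factors would be extracted the same way after deflating the centered predictors (removing the part explained by t); that step is omitted here.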



Regression for Ill-conditioned Data: The ORTHOREG Procedure

The ORTHOREG procedure performs linear least-squares regression using the Gentleman-Givens computational method, and it can produce more accurate parameter estimates for ill-conditioned data. PROC GLM and PROC REG produce very accurate estimates for most problems. However, if you have very ill-conditioned data, consider using the ORTHOREG procedure. The collinearity diagnostics in PROC REG can help you determine whether PROC ORTHOREG would be useful.
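Part of the appeal of Givens-based methods is that they never form X'X, whose condition number is the square of that of X. The Python sketch below folds each observation into an upper-triangular matrix R with plane rotations and then solves Rb = z by back substitution; whatever is left of each response after its rotations contributes to SSE. It uses ordinary Givens rotations with square roots rather than Gentleman's square-root-free formulation, it assumes X has full column rank, and the name `givens_lstsq` is hypothetical; it is not the ORTHOREG code.

```python
import math

def givens_lstsq(rows, ys):
    """Least-squares fit by row-wise Givens rotations (X'X is never formed).

    rows: regressor rows (include a 1 for the intercept); assumes full column rank.
    Returns (b, sse).
    """
    k = len(rows[0])
    R = [[0.0] * k for _ in range(k)]  # upper-triangular factor
    z = [0.0] * k                      # rotated response, so that R b = z
    sse = 0.0
    for row, yv in zip(rows, ys):
        x = [float(v) for v in row]
        y = float(yv)
        for i in range(k):
            if x[i] == 0.0:
                continue
            r = math.hypot(R[i][i], x[i])
            c, s = R[i][i] / r, x[i] / r
            # Rotate row i of R against the incoming row; this zeroes x[i].
            for j in range(i, k):
                R[i][j], x[j] = c * R[i][j] + s * x[j], -s * R[i][j] + c * x[j]
            z[i], y = c * z[i] + s * y, -s * z[i] + c * y
        sse += y * y  # the leftover part of y is this row's share of SSE
    b = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        b[i] = (z[i] - sum(R[i][j] * b[j] for j in range(i + 1, k))) / R[i][i]
    return b, sse

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
b, sse = givens_lstsq([[1.0, x] for x in xs], [1.2, 2.9, 5.1, 7.2, 8.8])
```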


Logistic Regression: The LOGISTIC Procedure

The LOGISTIC procedure fits logistic models, in which the response can be either dichotomous or polychotomous. Stepwise model selection is available. You can request regression diagnostics as well as predicted and residual values.



Regression With Transformations: The TRANSREG Procedure

The TRANSREG procedure can fit many standard linear models. In addition, PROC TRANSREG can find nonlinear transformations of data and fit a linear model to the transformed variables. This is in contrast to PROC REG and PROC GLM, which fit linear models to data, or PROC NLIN, which fits nonlinear models to data. The TRANSREG procedure fits many types of linear models, including

ordinary regression and ANOVA

metric and nonmetric conjoint analysis

metric and nonmetric vector and ideal point preference mapping

simple, multiple, and multivariate regression with variable transformations

redundancy analysis with variable transformations

canonical correlation analysis with variable transformations

response surface regression with variable transformations


Regression Using the GLM, CATMOD, LOGISTIC, PROBIT, and LIFEREG Procedures

The GLM procedure fits general linear models to data, and it can perform regression, analysis of variance, analysis of covariance, and many other analyses. The following features for regression distinguish PROC GLM from other regression procedures:

direct specification of polynomial effects

ease of specifying categorical effects (PROC GLM automatically generates dummy variables for class variables)
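As an illustration of what generating dummy variables means, this Python sketch builds one 0/1 indicator column per level of a class variable. Creating a column for every level (rather than dropping one) makes the indicator columns add up to the intercept column, so the resulting design matrix is not of full rank, the situation described under "Models Not of Full Rank." Sorting the levels alphabetically is an assumption made here for determinism, and `dummy_columns` is a hypothetical helper.

```python
def dummy_columns(values):
    """Return (levels, rows): one 0/1 indicator per level for each observation."""
    levels = sorted(set(values))
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows

levels, rows = dummy_columns(["a", "b", "a", "c"])
# Each row has exactly one 1, so the indicator columns sum to a column of 1s.
```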

Most of the statistics based on predicted and residual values that are available in PROC REG are also available in PROC GLM. However, PROC GLM does not produce collinearity diagnostics, influence diagnostics, or scatter plots. In addition, PROC GLM allows only one model and fits the full model.

See Chapter 3, "Introduction to Analysis-of-Variance Procedures," and Chapter 31, "The GLM Procedure," for more details.

The CATMOD procedure can perform linear regression and logistic regression of response functions for data that can be represented in a contingency table. See Chapter 4, "Introduction to Categorical Data Analysis Procedures," and Chapter 21, "The CATMOD Procedure," for more details.

The LOGISTIC and PROBIT procedures can perform logistic and ordinal logistic regression. See Chapter 4, "Introduction to Categorical Data Analysis Procedures," Chapter 40, "The LOGISTIC Procedure," and Chapter 57, "The PROBIT Procedure," for additional details.

The LIFEREG procedure is useful in fitting equations to data that may be right-censored. See Chapter 9, "Introduction to Survival Analysis Procedures," and Chapter 37, "The LIFEREG Procedure," for more details.


Interactive Features in the CATMOD, GLM, and REG Procedures

The CATMOD, GLM, and REG procedures do not stop after processing a RUN statement. More statements can be submitted as a continuation of the previous statements. Many new features in these procedures are useful to request after you have reviewed the results from previous statements. The procedures stop if a DATA step or another procedure is requested or if a QUIT statement is submitted.



Statistical Background

The rest of this chapter outlines the way many SAS/STAT regression procedures calculate various regression quantities. Exceptions and further details are documented with individual procedures.

Linear Models

Parameter Estimates and Associated Statistics

Comments on Interpreting Regression Statistics

Predicted and Residual Values

Testing Linear Hypotheses

Multivariate Tests


Linear Models

In matrix algebra notation, a linear model is written as

$$y = X\beta + \epsilon$$

where $y$ is the $n \times 1$ vector of responses, $X$ is the $n \times k$ design matrix (rows are observations and columns are the regressors), $\beta$ is the $k \times 1$ vector of unknown parameters, and $\epsilon$ is the $n \times 1$ vector of unknown errors. The first column of $X$ is usually a vector of 1s used in estimating the intercept term.

The statistical theory of linear models is based on strict classical assumptions. Ideally, the response is measured with all the factors controlled in an experimentally determined environment. If you cannot control the factors experimentally, some tests must be interpreted as being conditional on the observed values of the regressors.

Other assumptions are that

the form of the model is correct (all important explanatory variables have been included)

regressor variables are measured without error

the expected value of the errors is zero

the variance of the error (and thus the dependent variable) for the ith observation is $\sigma^2/w_i$, where $w_i$ is a known weight factor. Usually, $w_i = 1$ for all $i$ and thus $\sigma^2$ is the common, constant variance.

the errors are uncorrelated across observations

When hypotheses are tested, the additional assumption is made that the errors are normally distributed.

Statistical Model

If the model satisfies all the necessary assumptions, the least-squares estimates are the best linear unbiased estimates (BLUE). In other words, the estimates have minimum variance among the class of estimators that are unbiased and are linear functions of the responses. If the additional assumption that the error term is normally distributed is also satisfied, then

the statistics that are computed have the proper sampling distributions for hypothesis testing

parameter estimates are normally distributed

various sums of squares are distributed proportional to chi-square, at least under proper hypotheses

ratios of estimates to standard errors are distributed as Student's t under certain hypotheses

appropriate ratios of sums of squares are distributed as F under certain hypotheses

When regression analysis is used to model data that do not meet the assumptions, the results should be interpreted in a cautious, exploratory fashion. The significance probabilities under these circumstances are unreliable.

Box (1966) and Mosteller and Tukey (1977, chaps. 12 and 13) discuss the problems that are encountered with regression data, especially when the data are not under experimental control.



Parameter Estimates and Associated Statistics

Parameter estimates are formed using least-squares criteria by solving the normal equations

$$(X'WX)\,b = X'Wy$$

for the parameter estimates $b$, where $W$ is a diagonal matrix with the observed weights on the diagonal, yielding

$$b = (X'WX)^{-1}X'Wy$$
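A direct transcription of these two equations into Python is shown below: it accumulates X'WX and X'Wy from the rows of X and the weights, then solves the normal equations by Gauss-Jordan elimination. It is a teaching sketch under the full-rank assumption stated next, not how SAS procedures compute the solution internally; `solve` and `wls_normal_equations` are hypothetical names, and the data are made up.

```python
def solve(A, rhs):
    """Solve A x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(r)] for row, r in zip(A, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                for j in range(c, n + 1):
                    M[r][j] -= f * M[c][j]
    return [M[i][n] / M[i][i] for i in range(n)]

def wls_normal_equations(X, y, w):
    """Return b solving (X'WX) b = X'Wy with W = diag(w); X must have full column rank."""
    k = len(X[0])
    XtWX = [[sum(wt * row[i] * row[j] for row, wt in zip(X, w)) for j in range(k)]
            for i in range(k)]
    XtWy = [sum(wt * row[i] * yv for row, wt, yv in zip(X, w, y)) for i in range(k)]
    return solve(XtWX, XtWy)

X = [[1.0, x] for x in [1.0, 2.0, 3.0, 4.0]]
y = [3.1, 4.9, 7.2, 30.0]                                   # last response is an outlier
b_all = wls_normal_equations(X, y, [1.0, 1.0, 1.0, 1.0])
b_drop = wls_normal_equations(X, y, [1.0, 1.0, 1.0, 0.0])  # weight 0 removes it
```

Giving an observation a weight of 0 removes it from both X'WX and X'Wy, so `b_drop` matches an ordinary fit to the first three observations.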

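Assume for the present that $X'WX$ has full column rank $k$ (this assumption is relaxed later). The variance of the error is estimated by the mean square error

$$s^2 = \mathrm{MSE} = \frac{\mathrm{SSE}}{n-k} = \frac{1}{n-k}\sum_{i=1}^{n} w_i\,(y_i - x_i b)^2$$

where $x_i$ is the ith row of regressors. The parameter estimates are unbiased:

$$E(b) = \beta, \qquad E(s^2) = \sigma^2$$

The covariance matrix of the estimates is

$$\mathrm{Var}(b) = (X'WX)^{-1}\sigma^2$$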



The estimate of the covariance matrix is obtained by replacing $\sigma^2$ with its estimate, $s^2$, in the preceding formula:

$$\mathrm{COVB} = (X'WX)^{-1}s^2$$

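The correlations of the estimates are derived by scaling the covariance matrix to 1s on the diagonal. Let

$$S = \mathrm{diag}\left((X'WX)^{-1}\right)^{-1/2}$$

Then

$$\mathrm{CORRB} = S\,(X'WX)^{-1}\,S$$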

Standard errors of the estimates are computed using the equation

$$\mathrm{STDERR}(b_i) = \sqrt{(X'WX)^{-1}_{ii}\, s^2}$$

where $(X'WX)^{-1}_{ii}$ is the ith diagonal element of $(X'WX)^{-1}$. The ratio

$$t = \frac{b_i}{\mathrm{STDERR}(b_i)}$$

is distributed as Student's t under the hypothesis that $\beta_i$ is zero. Regression procedures display the t ratio and the significance probability, which is the probability under the hypothesis of a larger absolute t value than was actually obtained. When the probability is less than some small level, the event is considered so unlikely that the hypothesis is rejected.
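The chain from $b$ to $s^2$, COVB, CORRB, standard errors, and t ratios can be traced numerically. The Python sketch below does this for unweighted data ($W = I$, an assumption made to keep it short), inverting $X'X$ by Gauss-Jordan elimination; the helper names and data are hypothetical. Converting a t ratio into a significance probability needs the Student's t distribution function, which Python's standard library does not provide, so that step is left out.

```python
import math

def inverse(A):
    """Invert a small nonsingular matrix by Gauss-Jordan elimination."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]

def regression_stats(X, y):
    """Return b, s2, COVB, CORRB, standard errors, and t ratios (W = I)."""
    n, k = len(X), len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    inv = inverse(XtX)
    b = [sum(inv[i][j] * Xty[j] for j in range(k)) for i in range(k)]
    sse = sum((yi - sum(bj * xj for bj, xj in zip(b, row))) ** 2 for row, yi in zip(X, y))
    s2 = sse / (n - k)                                     # mean square error
    covb = [[inv[i][j] * s2 for j in range(k)] for i in range(k)]
    stderr = [math.sqrt(covb[i][i]) for i in range(k)]
    # Scaling COVB to 1s on the diagonal is the same as S (X'X)^-1 S.
    corrb = [[covb[i][j] / (stderr[i] * stderr[j]) for j in range(k)] for i in range(k)]
    t = [b[i] / stderr[i] for i in range(k)]
    return b, s2, covb, corrb, stderr, t

X = [[1.0, x] for x in [1.0, 2.0, 3.0, 4.0, 5.0]]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b, s2, covb, corrb, stderr, t = regression_stats(X, y)
```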

Type I SS and Type II SS measure the contribution of a variable to the reduction in SSE. Type I SS measure the reduction in SSE as that variable is entered into the model in sequence. Type II SS are the increment in SSE that results from removing the variable from the full model. Type II SS are equivalent to the Type III and Type IV SS reported in the GLM procedure. If Type II SS are used in the numerator of an F test, the test is equivalent to the t test for the hypothesis that the parameter is zero. In polynomial models, Type I SS measure the contribution of each polynomial term after it is orthogonalized to the previous terms in the model. The four types of SS are described in Chapter 11, "The Four Types of Estimable Functions."
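Both kinds of sums of squares are differences of error sums of squares from nested fits. The Python sketch below computes them for two correlated regressors and unweighted data (assumptions made for brevity), fitting each needed submodel through the normal equations; the helper names and data are hypothetical. For the last variable entered, Type I and Type II SS coincide, while for x1 they generally differ because x1 and x2 are correlated.

```python
def solve(A, rhs):
    """Solve A x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(r)] for row, r in zip(A, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                for j in range(c, n + 1):
                    M[r][j] -= f * M[c][j]
    return [M[i][n] / M[i][i] for i in range(n)]

def sse(cols, y):
    """SSE of the least-squares fit of y on the given columns."""
    k = len(cols)
    A = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(k)] for i in range(k)]
    b = solve(A, [sum(a * v for a, v in zip(c, y)) for c in cols])
    fit = [sum(b[j] * cols[j][t] for j in range(k)) for t in range(len(y))]
    return sum((v - f) ** 2 for v, f in zip(y, fit))

one = [1.0] * 6
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [3.1, 3.9, 7.2, 7.8, 11.1, 11.9]
full = sse([one, x1, x2], y)
type1 = {"x1": sse([one], y) - sse([one, x1], y),       # entered first
         "x2": sse([one, x1], y) - full}                # entered after x1
type2 = {"x1": sse([one, x2], y) - full,                # removed from full model
         "x2": sse([one, x1], y) - full}
```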

Standardized estimates are defined as the estimates that result when all variables are standardized to a mean of 0 and a variance of 1. Standardized estimates are computed by multiplying the original estimates by the sample standard deviation of the regressor variable and dividing by the sample standard deviation of the dependent variable.
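The two descriptions of a standardized estimate (rescale the raw estimate, or refit on standardized variables) agree, and with a single regressor the result equals the sample correlation. The Python sketch below checks this with the standard `statistics` module; the data and the helper name `standardized_estimate` are hypothetical.

```python
import statistics

def standardized_estimate(b, x, y):
    """Scale a raw coefficient b by sd(x) / sd(y)."""
    return b * statistics.stdev(x) / statistics.stdev(y)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
xm, ym = statistics.mean(x), statistics.mean(y)
slope = sum((a - xm) * (c - ym) for a, c in zip(x, y)) / sum((a - xm) ** 2 for a in x)
beta_std = standardized_estimate(slope, x, y)
```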

$R^2$ is an indicator of how much of the variation in the data is explained by the model. It is defined as

$$R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{TSS}}$$

where SSE is the sum of squares for error and TSS is the corrected total sum of squares. The adjusted $R^2$ statistic is an alternative to $R^2$ that is adjusted for the number of parameters in the model. It is calculated as

$$\mathrm{ADJRSQ} = 1 - \frac{n - i}{n - p}\left(1 - R^2\right)$$

where $n$ is the number of observations used to fit the model, $p$ is the number of parameters in the model (including the intercept), and $i$ is 1 if the model includes an intercept term, and 0 otherwise.
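Both statistics follow directly from SSE and TSS. The Python sketch below transcribes the two formulas, using the corrected TSS defined above; it is intended for models with an intercept, and the function name and the observed and fitted values are hypothetical.

```python
def r_squared(y, fitted, p, intercept=True):
    """Return (R^2, adjusted R^2); p counts all parameters, including the intercept."""
    n = len(y)
    ybar = sum(y) / n
    sse = sum((a - f) ** 2 for a, f in zip(y, fitted))
    tss = sum((a - ybar) ** 2 for a in y)        # corrected total sum of squares
    r2 = 1 - sse / tss
    i = 1 if intercept else 0
    adj = 1 - (n - i) / (n - p) * (1 - r2)
    return r2, adj

r2, adj = r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8], p=2)
# SSE = 0.10 and TSS = 5.0, so r2 is about 0.98 and adj about 0.97
```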

Tolerances and variance inflation factors measure the strength of interrelationships among the regressor variables in the model. If all variables are orthogonal to each other, both tolerance and variance inflation are 1. If a variable is very closely related to other variables, the tolerance goes to 0 and the variance inflation gets very large. Tolerance (TOL) is 1 minus the $R^2$ that results from regressing that regressor on the other variables in the model. Variance inflation (VIF) is the diagonal of $(X'WX)^{-1}$ if $X'WX$ is scaled to correlation form. The statistics are related as

$$\mathrm{VIF} = \frac{1}{\mathrm{TOL}}$$
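The two routes to VIF (one over the tolerance from an auxiliary regression, or the diagonal of the inverted correlation matrix of the regressors) can be checked against each other. The Python sketch below does so for three made-up, unweighted regressors; the auxiliary regressions include an intercept, and all helper names are hypothetical.

```python
def solve(A, rhs):
    """Solve A x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(r)] for row, r in zip(A, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                for j in range(c, n + 1):
                    M[r][j] -= f * M[c][j]
    return [M[i][n] / M[i][i] for i in range(n)]

def corr(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5

def tolerance(cols, j):
    """1 - R^2 from regressing cols[j] on an intercept and the other columns."""
    target = cols[j]
    others = [[1.0] * len(target)] + [c for m, c in enumerate(cols) if m != j]
    k = len(others)
    A = [[sum(a * b for a, b in zip(others[p], others[q])) for q in range(k)] for p in range(k)]
    coef = solve(A, [sum(a * b for a, b in zip(others[p], target)) for p in range(k)])
    fit = [sum(coef[p] * others[p][t] for p in range(k)) for t in range(len(target))]
    mean = sum(target) / len(target)
    sse = sum((a - f) ** 2 for a, f in zip(target, fit))
    tss = sum((a - mean) ** 2 for a in target)
    return sse / tss                      # equals 1 - R^2

cols = [[1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5], [1, 3, 2, 5, 4, 6]]
vif_from_tol = [1 / tolerance(cols, j) for j in range(3)]
R = [[corr(u, v) for v in cols] for u in cols]
# Column j of R^-1 is solve(R, e_j); its jth entry is the jth diagonal element.
vif_from_corr = [solve(R, [1.0 if m == j else 0.0 for m in range(3)])[j] for j in range(3)]
```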

Models Not of Full Rank

If the model is not full rank, then a generalized inverse can be used to solve the normal equations to minimize the SSE:

$$b = (X'WX)^{-}X'Wy$$

However, these estimates are not unique, since there are an infinite number of solutions using different generalized inverses. PROC REG and other regression procedures choose a nonzero solution for all variables that are linearly independent of previous variables and a zero solution for other variables. This corresponds to using a generalized inverse in the normal equations, and the expected values of the estimates are the Hermite normal form of $X'WX$ multiplied by the true parameters:

$$E(b) = (X'WX)^{-}(X'WX)\,\beta$$

Degrees of freedom for the zeroed estimates are reported as zero. The hypotheses that are not testable have t tests displayed as missing. The message that the model is not full rank includes a display of the relations that exist in the matrix.
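The rule "nonzero for variables that are linearly independent of previous variables, zero for the others" can be mimicked in a few lines of Python. The sketch below walks the columns in order, uses Gram-Schmidt orthogonalization to decide whether each column adds a new direction (the tolerance `tol` is an arbitrary choice for this illustration), fits the kept columns by the normal equations, and sets the remaining estimates to zero. It assumes unweighted data and is not PROC REG's actual algorithm; all names and data are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(A, rhs):
    """Solve A x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(r)] for row, r in zip(A, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                for j in range(c, n + 1):
                    M[r][j] -= f * M[c][j]
    return [M[i][n] / M[i][i] for i in range(n)]

def zeroed_solution(cols, y, tol=1e-10):
    """Least squares that gives a zero estimate to columns dependent on earlier ones."""
    kept, basis = [], []
    for j, col in enumerate(cols):
        r = [float(a) for a in col]
        for q in basis:                       # remove components along earlier columns
            c = dot(r, q)
            r = [a - c * b for a, b in zip(r, q)]
        if dot(r, r) > tol * dot(col, col):   # column adds a new direction
            kept.append(j)
            norm = dot(r, r) ** 0.5
            basis.append([a / norm for a in r])
    A = [[dot(cols[i], cols[j]) for j in kept] for i in kept]
    coef = solve(A, [dot(cols[i], y) for i in kept])
    b = [0.0] * len(cols)
    for j, c in zip(kept, coef):
        b[j] = c
    return b, kept

one = [1.0] * 6
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
x3 = [a + c for a, c in zip(x1, x2)]        # exactly x1 + x2
y = [3.1, 3.9, 7.2, 7.8, 11.1, 11.9]
b, kept = zeroed_solution([one, x1, x2, x3], y)
```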




Comments on Interpreting Regression Statistics

In most applications, regression models are merely useful approximations. Reality is often so complicated that you cannot know what the true model is. You may have to choose a model more on the basis of what variables can be measured and what kinds of models can be estimated than on a rigorous theory that explains how the universe really works. However, even in cases where theory is lacking, a regression model may be an excellent predictor of the response if the model is carefully formulated from a large sample. The interpretation of statistics such as parameter estimates may nevertheless be highly problematic.

Statisticians usually use the word "prediction" in a technical sense. Prediction in this sense does not refer to "predicting the future" (statisticians call that forecasting) but rather to guessing the response from the values of the regressors in an observation taken under the same circumstances as the sample from which the regression equation was estimated. If you developed a regression model for predicting consumer preferences in 1958, it may not give very good predictions in 1988 no matter how well it did in 1958. If it is the future you want to predict, your model must include whatever relevant factors may change over time. If the process you are studying does in fact change over time, you must take observations at several, perhaps many, different times. Analysis of such data is the province of SAS/ETS procedures such as AUTOREG and STATESPACE. Refer to the SAS/ETS User's Guide for more information on these procedures.

The comments in the rest of this section are directed toward linear least-squares regression. Nonlinear regression and non-least-squares regression often introduce further complications. For more detailed discussions of the interpretation of regression statistics, see Darlington (1968), Mosteller and Tukey (1977), Weisberg (1985), and Younger (1979).

Interpreting Parameter Estimates from a Controlled Experiment

Parameter estimates are easiest to interpret in a controlled experiment in which the regressors are manipulated independently of each other. In a well-designed experiment, such as a randomized factorial design with replications in each cell, you can use lack-of-fit tests and estimates of the standard error of prediction to determine whether the model describes the experimental process with adequate precision. If so, a regression coefficient estimates the amount by which the mean response changes when the regressor is changed by one unit while all the other regressors are unchanged. However, if the model involves interactions or polynomial terms, it may not be possible to interpret individual regression coefficients. For example, if the equation includes both linear and quadratic terms for a given variable, you cannot physically change the value of the linear term without also changing the value of the quadratic term. Sometimes it may be possible to recode the regressors, for example by using orthogonal polynomials, to make the interpretation easier.
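To see why orthogonal polynomials help, the Python sketch below fits a response that is exactly $x^2$ at five equally spaced levels. With raw powers, the coefficient on $x$ is 6 in the straight-line fit but 0 once $x^2$ is added; with the standard orthogonal polynomial codes for five levels (-2, -1, 0, 1, 2 and 2, -1, -2, -1, 2), the linear coefficient is 6 in both fits. The data and helper names are hypothetical.

```python
def solve(A, rhs):
    """Solve A x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(r)] for row, r in zip(A, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                for j in range(c, n + 1):
                    M[r][j] -= f * M[c][j]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit(cols, y):
    """Least-squares coefficients of y on the given columns."""
    k = len(cols)
    A = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(k)] for i in range(k)]
    return solve(A, [sum(a * b for a, b in zip(c, y)) for c in cols])

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 4.0, 9.0, 16.0, 25.0]          # a purely quadratic response
one = [1.0] * 5
# Raw powers: the coefficient on x changes when x^2 is added.
raw_lin_alone = fit([one, x], y)[1]
raw_lin_with_sq = fit([one, x, [v * v for v in x]], y)[1]
# Orthogonal polynomial codes: the linear coefficient does not change.
p1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
p2 = [2.0, -1.0, -2.0, -1.0, 2.0]
orth_lin_alone = fit([one, p1], y)[1]
orth_lin_with_sq = fit([one, p1, p2], y)[1]
```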

If the nonstatistical aspects of the experiment are also treated with sufficient care(including such things as use of placebos and double blinds), then you can stateconclusions in causal terms; that is, this change in a regressor causes thatchange in the response. Causality can never be inferred from statistical resultsalone or from an observational study.

If the model that you fit is not the true model, then the parameter estimates may depend strongly on the particular values of the regressors used in the experiment. For example, if the response is actually a quadratic function of a regressor but you fit a linear function, the estimated slope may be a large negative value if you use only small values of the regressor, a large positive value if you use only large values of the regressor, or near zero if you use both large and small regressor values. When you report the results of an experiment, it is important to include the values of the regressors. It is also important to avoid extrapolating the regression equation outside the range of regressors in the sample.
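The example in this paragraph can be reproduced in a few lines of Python. Assuming a true response of $x^2$ with no error (a hypothetical choice), a straight-line fit gives a slope of -16 when only x = -10, ..., -6 are used, +16 when only x = 6, ..., 10 are used, and 0 when both sets are used.

```python
def slope(x, y):
    """Ordinary least-squares slope of y on x (with an intercept)."""
    xm, ym = sum(x) / len(x), sum(y) / len(y)
    return sum((a - xm) * (b - ym) for a, b in zip(x, y)) / sum((a - xm) ** 2 for a in x)

def truth(x):
    return x * x          # the true response is quadratic

small = [-10.0, -9.0, -8.0, -7.0, -6.0]
large = [6.0, 7.0, 8.0, 9.0, 10.0]
results = {name: slope(xs, [truth(v) for v in xs])
           for name, xs in [("small", small), ("large", large), ("both", small + large)]}
```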

Interpreting Parameter Estimates from an Observational Study

In an observational study, parameter estimates can be interpreted as the expected difference in response of two observations that differ by one unit on the regressor in question and that have the same values for all other regressors. You cannot make inferences about "changes" in an observational study since you have not actually changed anything. It may not be possible even in principle to change one regressor independently of all the others. Neither can you draw conclusions about causality without experimental manipulation.

If you conduct an observational study and you do not know the true form of the model, interpretation of parameter estimates becomes even more convoluted. A coefficient must then be interpreted as an average over the sampled population of expected differences in response of observations that differ by one unit on only one regressor. The considerations that are discussed under controlled experiments for which the true model is not known also apply.

Comparing Parameter Estimates

Two coefficients in the same model can be directly compared only if the regressors are measured in the same units. You can make any coefficient large or small just by changing the units. If you convert a regressor from feet to miles, the parameter estimate is multiplied by 5280.
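
A quick numerical check of the feet-to-miles statement, using hypothetical distances and responses and an illustrative `ols_slope` helper: dividing the regressor by 5280 multiplies its least-squares coefficient by 5280.

```python
# Sketch: rescaling a regressor rescales its coefficient by the same factor.
# All data values are hypothetical.

def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (simple linear regression with intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

distance_ft = [1000.0, 2500.0, 4000.0, 5280.0, 8000.0]
response = [3.1, 4.0, 5.2, 6.1, 7.9]
distance_mi = [d / 5280.0 for d in distance_ft]   # same regressor, in miles

slope_per_foot = ols_slope(distance_ft, response)
slope_per_mile = ols_slope(distance_mi, response)

print(slope_per_mile / slope_per_foot)   # ratio of the two estimates: about 5280
```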

Sometimes standardized regression coefficients are used to compare the effects of regressors measured in different units. Standardizing the variables effectively makes the standard deviation the unit of measurement. This makes sense only if the standard deviation is a meaningful quantity, which usually is the case only if the observations are sampled from a well-defined population. In a controlled experiment, the standard deviation of a regressor depends on the values of the regressor selected by the experimenter. Thus, you can make a standardized regression coefficient large by using a large range of values for the regressor.
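
The following sketch, with hypothetical data, illustrates the last point. Both designs share the same true slope (2) and the same error pattern; only the spread of the regressor values differs. The standardized coefficient is computed as b * sd(x) / sd(y), one common definition.

```python
# Sketch: the standardized coefficient depends on the spread of the regressor
# values chosen by the experimenter. Data and error pattern are hypothetical.
from statistics import stdev

def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (simple linear regression with intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

noise = [0.3, -0.6, 0.6, -0.6, 0.3]           # same errors in both designs
x_narrow = [4.0, 4.5, 5.0, 5.5, 6.0]
x_wide = [0.0, 2.5, 5.0, 7.5, 10.0]

def raw_and_standardized(xs):
    ys = [2.0 * x + e for x, e in zip(xs, noise)]   # true slope is 2
    b = ols_slope(xs, ys)
    return b, b * stdev(xs) / stdev(ys)             # raw slope, standardized slope

raw_narrow, std_narrow = raw_and_standardized(x_narrow)
raw_wide, std_wide = raw_and_standardized(x_wide)
print(raw_narrow, std_narrow)
print(raw_wide, std_wide)
```

Here the raw slope is 2 in both designs, while the standardized coefficient is larger for the wide design (about 1.00 versus about 0.94).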

In some applications you may be able to compare regression coefficients in terms of the practical range of variation of a regressor. Suppose that each independent variable in an industrial process can be set to values only within a certain range. You can rescale the variables so that the smallest possible value is zero and the largest possible value is one. Then the unit of measurement for each regressor is the maximum possible range of the regressor, and the parameter estimates are comparable in that sense. Another possibility is to scale the regressors in terms of the cost of setting a regressor to a particular value, so that comparisons can be made in monetary terms.

Correlated Regressors

In an experiment, you can often select values for the regressors such that the regressors are orthogonal (not correlated with each other). Orthogonal designs have enormous advantages in interpretation. With orthogonal regressors, the parameter estimate for a given regressor does not depend on which other regressors are included in the model, although other statistics such as standard errors and p-values may change.

If the regressors are correlated, it becomes difficult to disentangle the effects of one regressor from another, and the parameter estimates may be highly dependent on which regressors are used in the model. Two correlated regressors may be nonsignificant when tested separately but highly significant when considered together. If two regressors have a correlation of 1.0, it is impossible to separate their effects.
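
A minimal NumPy sketch contrasting the two situations, using hypothetical responses from a replicated 2 x 2 factorial design. The coefficient for x1 is unchanged when the orthogonal regressor x2 is added, but it shifts substantially when x2 is replaced by a regressor that is strongly correlated with x1. The helper `coefficients` is illustrative.

```python
# Sketch: orthogonal versus correlated regressors. All data values are hypothetical.
import numpy as np

def coefficients(y, *columns):
    """OLS coefficients for y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

# 2 x 2 factorial design with two replicates: x1 and x2 are orthogonal.
x1 = np.array([-1, -1, 1, 1, -1, -1, 1, 1], dtype=float)
x2 = np.array([-1, 1, -1, 1, -1, 1, -1, 1], dtype=float)
y = np.array([4.8, 7.1, 9.2, 11.9, 5.3, 6.8, 8.9, 12.2])

b1_alone = coefficients(y, x1)[1]
b1_with_x2 = coefficients(y, x1, x2)[1]        # same as b1_alone

# Replace x2 by a regressor that is strongly correlated with x1.
x2_corr = x1 + np.array([0.2, -0.1, 0.1, -0.2, 0.1, 0.2, -0.1, -0.2])
b1_with_corr = coefficients(y, x1, x2_corr)[1]  # now quite different

print(b1_alone, b1_with_x2, b1_with_corr)
```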

It may be possible to recode correlated regressors to make interpretation easier. For example, if X and Y are highly correlated, they could be replaced in a linear regression by X + Y and X - Y without changing the fit of the model or statistics for other regressors.
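
The recoding claim can be verified numerically. In this sketch (simulated data; the variable names `reg_x`, `reg_y`, and `z` are hypothetical), replacing X and Y by X + Y and X - Y leaves the fitted values and the coefficient of the other regressor unchanged; the new coefficients are (b_X + b_Y)/2 and (b_X - b_Y)/2, because both models span the same column space.

```python
# Sketch: recoding two correlated regressors as their sum and difference.
import numpy as np

rng = np.random.default_rng(7)
n = 30
reg_x = rng.normal(size=n)                          # hypothetical regressor "X"
reg_y = reg_x + 0.1 * rng.normal(size=n)            # hypothetical "Y", highly correlated with X
z = rng.normal(size=n)                              # some other regressor
response = 1.0 + 2.0 * reg_x - 1.0 * reg_y + 0.5 * z + rng.normal(scale=0.3, size=n)

def fit(*columns):
    """OLS with an intercept; returns coefficients and fitted values."""
    X = np.column_stack([np.ones(n), *columns])
    b, *_ = np.linalg.lstsq(X, response, rcond=None)
    return b, X @ b

b_orig, fitted_orig = fit(reg_x, reg_y, z)
b_recoded, fitted_recoded = fit(reg_x + reg_y, reg_x - reg_y, z)

print(b_orig)
print(b_recoded)
```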

Errors in the Regressors

If there is error in the measurements of the regressors, the parameter estimates must be interpreted with respect to the measured values of the regressors, not the true values. A regressor may be statistically nonsignificant when measured with error even though it would have been highly significant if measured accurately.
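
A simulation sketch of the classical measurement-error setting, under assumed conditions: the error added to the regressor is independent of everything else and has the same variance as the true regressor. In that case the least-squares slope on the measured regressor is attenuated toward zero by roughly the factor var(x) / (var(x) + var(error)), here about one half.

```python
# Sketch: attenuation of a slope when the regressor is measured with error.
# The model, variances, and sample size are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x_true = rng.normal(0.0, 1.0, n)
y = 2.0 * x_true + rng.normal(0.0, 1.0, n)        # true slope is 2
x_measured = x_true + rng.normal(0.0, 1.0, n)     # measurement error, same variance as x_true

def slope(x, y):
    X = np.column_stack([np.ones(len(x)), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b[1]

b_true = slope(x_true, y)          # close to 2
b_measured = slope(x_measured, y)  # close to 2 * 1 / (1 + 1) = 1
print(b_true, b_measured)
```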

Probability Values (p-values)

Probability values (p-values) do not necessarily measure the importance of a regressor. An important regressor can have a large (nonsignificant) p-value if the sample is small, if the regressor is measured over a narrow range, if there are large measurement errors, or if another closely related regressor is included in the equation. An unimportant regressor can have a very small p-value in a large sample. Computing a confidence interval for a parameter estimate gives you more useful information than just looking at the p-value, but confidence intervals do not solve problems of measurement errors in the regressors or highly correlated regressors.
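
To illustrate how sample size alone drives a p-value, the sketch below holds a practically negligible slope estimate fixed and lets n grow. It uses a large-sample normal approximation and assumes the slope's standard error is roughly sd(error) / (sd(x) * sqrt(n)); both the estimate and that ratio are hypothetical. The p-value collapses, while the confidence interval makes clear that the effect is still tiny.

```python
# Sketch: p-value versus confidence interval for a fixed, tiny effect.
# Large-sample normal approximation; all numbers are hypothetical.
import math

def normal_wald(estimate, std_error, z_crit=1.959964):
    """Two-sided p-value and 95% interval from a normal approximation."""
    z = estimate / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return p_value, (estimate - z_crit * std_error, estimate + z_crit * std_error)

estimate = 0.01     # hypothetical, practically negligible slope
sd_ratio = 1.0      # hypothetical sd(error) / sd(regressor)

results = {}
for n in (100, 1_000_000):
    std_error = sd_ratio / math.sqrt(n)      # approximate standard error of the slope
    results[n] = normal_wald(estimate, std_error)
    p, (lo, hi) = results[n]
    print(f"n={n:>9}: p = {p:.3g}, 95% CI = ({lo:.4f}, {hi:.4f})")
```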

The p-values are always approximations. The assumptions required to compute exact p-values are never satisfied in practice.

Interpreting R²

R² is usually defined as the proportion of variance of the response that is predictable from (that can be explained by) the regressor variables. It may be easier to interpret $\sqrt{1 - R^2}$, which is approximately the factor by which the standard error of prediction is reduced by the introduction of the regressor variables.
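
A small sketch of the sqrt(1 - R²) interpretation on simulated data. sqrt(1 - R²) equals sqrt(SSE/SST) exactly; the ratio of the root mean square error to the standard deviation of the response differs from it only by the degrees-of-freedom factor sqrt((n - 1)/(n - 2)) for a one-regressor model, which is why the statement above says "approximately".

```python
# Sketch: sqrt(1 - R^2) versus the reduction in the standard error of prediction.
# The data-generating model is hypothetical.
import numpy as np

rng = np.random.default_rng(1)
n = 50
x = rng.uniform(0.0, 10.0, n)
y = 3.0 + 0.8 * x + rng.normal(0.0, 1.5, n)

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
sse = np.sum((y - X @ b) ** 2)
sst = np.sum((y - y.mean()) ** 2)
r2 = 1.0 - sse / sst

factor = np.sqrt(1.0 - r2)              # sqrt(1 - R^2)
root_mse = np.sqrt(sse / (n - 2))       # standard error of prediction using x
sd_y = y.std(ddof=1)                    # standard error of prediction without x
print(r2, factor, root_mse / sd_y)
```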

R² is easiest to interpret when the observations, including the values of both the regressors and the response, are randomly sampled from a well-defined population. Nonrandom sampling can greatly distort R². For example, excessively large values of R² can be obtained by omitting from the sample observations with regressor values near the mean.
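
A simulation sketch of the distortion described above (hypothetical population, NumPy random data): dropping the observations whose regressor values lie near the center inflates R² even though the relationship and the error variance are unchanged.

```python
# Sketch: nonrandom sampling (omitting observations near the mean) inflates R^2.
# The population model is hypothetical.
import numpy as np

rng = np.random.default_rng(2)
n = 2000
x = rng.uniform(0.0, 10.0, n)
y = x + rng.normal(0.0, 3.0, n)

def r_squared(x, y):
    X = np.column_stack([np.ones(len(x)), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

keep = np.abs(x - 5.0) > 3.0            # drop observations with x near the mean
r2_random = r_squared(x, y)             # roughly 0.5
r2_extremes = r_squared(x[keep], y[keep])  # noticeably larger
print(r2_random, r2_extremes)
```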

In a controlled experiment, R² depends on the values chosen for the regressors. A wide range of regressor values generally yields a larger R² than a narrow range. In comparing the results of two experiments on the same variables but with different ranges for the regressors, you should look at the standard error of prediction (root mean square error) rather than R².
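
The following sketch simulates two hypothetical experiments with the same model and the same error variance, differing only in the range of the regressor. R² differs sharply between them, whereas the root mean square error is about the same (close to the true error standard deviation of 1).

```python
# Sketch: R^2 versus root mean square error for narrow and wide regressor ranges.
# Both experiments are hypothetical simulations of the same model.
import numpy as np

rng = np.random.default_rng(3)

def fit_stats(x, y):
    """Return (R^2, root MSE) for a simple linear regression with intercept."""
    X = np.column_stack([np.ones(len(x)), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    sse = resid @ resid
    r2 = 1.0 - sse / np.sum((y - y.mean()) ** 2)
    root_mse = np.sqrt(sse / (len(y) - 2))
    return r2, root_mse

n = 500
results = {}
for label, (lo, hi) in {"narrow": (4.0, 6.0), "wide": (0.0, 10.0)}.items():
    x = rng.uniform(lo, hi, n)
    y = 1.0 + x + rng.normal(0.0, 1.0, n)      # same model and error variance
    results[label] = fit_stats(x, y)

for label, (r2, root_mse) in results.items():
    print(f"{label:>6}: R^2 = {r2:.2f}, root MSE = {root_mse:.2f}")
```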

Whether a given R² value is considered to be large or small depends on the context of the particular study. A social scientist might consider an R² of 0.30 to be large, while a physicist might consider 0.98 to be small.

You can always get an R² arbitrarily close to 1.0 by including a large number of completely unrelated regressors in the equation. If the number of regressors is close to the sample size, R² is very biased. In such cases, the adjusted R² and related statistics discussed by Darlington (1968) are less misleading.
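
A sketch of this effect with simulated noise: the response and all regressors are unrelated random numbers. R² never decreases as regressors are added and reaches 1 when the number of parameters equals the sample size. The adjusted R² shown uses one common definition, 1 - (1 - R²)(n - 1)/(n - k) with k parameters including the intercept; it is not necessarily the exact set of statistics Darlington discusses.

```python
# Sketch: R^2 climbs toward 1 as unrelated regressors are added.
# All data are hypothetical random noise.
import numpy as np

rng = np.random.default_rng(4)
n = 20
y = rng.normal(size=n)                   # response unrelated to anything
junk = rng.normal(size=(n, n - 1))       # completely unrelated regressors

def r2_and_adjusted(m):
    """R^2 and adjusted R^2 for an intercept plus the first m junk regressors."""
    X = np.column_stack([np.ones(n), junk[:, :m]])
    k = X.shape[1]                       # number of parameters, incl. intercept
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - k) if n > k else float("nan")
    return r2, adj

table = {m: r2_and_adjusted(m) for m in (1, 5, 10, 18, 19)}
for m, (r2, adj) in table.items():
    print(f"{m:2d} junk regressors: R^2 = {r2:.3f}, adjusted R^2 = {adj:.3f}")
```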

If you fit many different models and choose the model with the largest R², all the statistics are biased and the p-values for the parameter estimates are not valid.

Caution must be taken with the interpretation of R² for models with no intercept term. As a general rule, no-intercept models should be fit only when theoretical justification exists and the data appear to fit a no-intercept framework. The R² in those cases is measuring something different (refer to Kvalseth 1985).
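
One reason the no-intercept R² means something different: for such models R² is commonly computed from the uncorrected total sum of squares (squares about zero rather than about the mean). The sketch below assumes that definition and uses hypothetical data in which the response is unrelated to x; the uncentered R² still looks respectable (about 0.78), while the centered version is strongly negative.

```python
# Sketch: uncentered versus centered R^2 for a no-intercept fit.
# Data are hypothetical; the response does not depend on x.
import numpy as np

x = np.arange(1.0, 11.0)                        # 1, 2, ..., 10
y = 50.0 + np.array([1, -1] * 5, dtype=float)   # roughly constant, unrelated to x

b = (x @ y) / (x @ x)                           # least-squares slope through the origin
sse = np.sum((y - b * x) ** 2)

r2_uncentered = 1.0 - sse / np.sum(y ** 2)              # total SS about zero
r2_centered = 1.0 - sse / np.sum((y - y.mean()) ** 2)   # total SS about the mean
print(r2_uncentered, r2_centered)
```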

Incorrect Data Values

All regression statistics can be seriously distorted by a single incorrect data value. A decimal point in the wrong place can completely change the parameter estimates, R², and other statistics. It is important to check your data for outliers and influential observations. The diagnostics in PROC REG are particularly useful in this regard.
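
A tiny sketch with hypothetical data: one response value keyed as 170 instead of 17.0 changes the least-squares slope from 2 to 14.75.

```python
# Sketch: one misplaced decimal point distorts the fitted slope.
# Data are hypothetical.

def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (simple linear regression with intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

x = [1, 2, 3, 4, 5, 6, 7, 8]
y_clean = [2 * xi + 1 for xi in x]     # 3, 5, ..., 17
y_typo = y_clean[:-1] + [170]          # 17.0 keyed as 170

print(ols_slope(x, y_clean), ols_slope(x, y_typo))   # slopes: 2.0 and 14.75
```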

Predicted and Residual Values

After the model has been fit, predicted and residual values are usually calculated and output. The predicted values are calculated from the estimated regression equation; the residuals are calculated as actual minus predicted. Some procedures can calculate standard errors of residuals, predicted mean values, and individual predicted values.

Consider the ith observation, where $x_i$ is the row of regressors, $b$ is the vector of parameter estimates, and $s^2$ is the mean squared error.

Let

$$h_i = w_i \, x_i (X'WX)^{-} x_i'$$

where X is the design matrix for the observed data, $x_i$ is an arbitrary regressor vector (possibly but not necessarily a row of X), W is a diagonal matrix with the observed weights on the diagonal, and $w_i$ is the weight corresponding to $x_i$.

Then the predicted mean value and its standard error are

$$\hat{y}_i = x_i b \qquad \mathrm{STDERR}(\hat{y}_i) = \sqrt{h_i s^2 / w_i}$$

The standard error of the individual (future) predicted value $y_i$ is

$$\mathrm{STDERR}(y_i) = \sqrt{(1 + h_i) \, s^2 / w_i}$$

If the predictor vector $x_i$ corresponds to an observation in the analysis data, then the residual for that observation and its standard error are

$$\mathrm{RESID}_i = y_i - x_i b \qquad \mathrm{STDERR}(\mathrm{RESID}_i) = \sqrt{(1 - h_i) \, s^2 / w_i}$$

The ratio of the residual to its standard error, called the studentized residual, is sometimes shown as

$$\mathrm{STUDENT}_i = \frac{\mathrm{RESID}_i}{\mathrm{STDERR}(\mathrm{RESID}_i)}$$

There are two kinds of confidence intervals for predicted values. One type of confidence interval is an interval for the mean value of the response. The other type, sometimes called a prediction or forecasting interval, is an interval for the actual value of a response, which is the mean value plus error.

For example, you can construct for the ith observation a confidence interval that contains the true mean value of the response with probability $1 - \alpha$. The upper and lower limits of the confidence interval for the mean value are

$$x_i b \pm t_{\alpha/2} \sqrt{h_i s^2 / w_i}$$

where $t_{\alpha/2}$ is the tabulated t statistic with degrees of freedom equal to the degrees of freedom for the mean squared error.

The limits for the confidence interval for an actual individual response are

$$x_i b \pm t_{\alpha/2} \sqrt{(1 + h_i) \, s^2 / w_i}$$
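
The formulas above can be sketched in NumPy as follows. This is a minimal illustration, not PROC REG's implementation: it assumes an optionally weighted least-squares fit, uses the Moore-Penrose pseudoinverse as the generalized inverse (X'WX)^-, and takes the t critical value as an input rather than computing it. The function name `prediction_stats` and the data are hypothetical.

```python
# Sketch: leverage, standard errors, studentized residuals, and confidence limits.
import numpy as np

def prediction_stats(X, y, w=None):
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    w = np.ones(n) if w is None else np.asarray(w, dtype=float)
    G = np.linalg.pinv(X.T @ (w[:, None] * X))   # a generalized inverse (X'WX)^-
    b = G @ (X.T @ (w * y))                      # parameter estimates
    yhat = X @ b                                 # predicted mean values x_i b
    resid = y - yhat                             # RESID_i
    dfe = n - np.linalg.matrix_rank(X)           # error degrees of freedom
    s2 = np.sum(w * resid ** 2) / dfe            # mean squared error s^2
    h = w * np.einsum("ij,jk,ik->i", X, G, X)    # h_i = w_i x_i (X'WX)^- x_i'
    return {
        "yhat": yhat, "resid": resid, "h": h, "s2": s2, "dfe": dfe,
        "se_mean": np.sqrt(h * s2 / w),          # STDERR(yhat_i)
        "se_indiv": np.sqrt((1 + h) * s2 / w),   # STDERR(y_i)
        "se_resid": np.sqrt((1 - h) * s2 / w),   # STDERR(RESID_i)
    }

x = np.arange(1.0, 11.0)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 20.2])
X = np.column_stack([np.ones_like(x), x])
stats = prediction_stats(X, y)
student = stats["resid"] / stats["se_resid"]     # STUDENT_i

# t_{alpha/2} for alpha = 0.05 and dfe = 8, taken from a t table
# (for example scipy.stats.t.ppf(0.975, 8) gives about 2.306).
t_crit = 2.306
mean_lo = stats["yhat"] - t_crit * stats["se_mean"]
mean_hi = stats["yhat"] + t_crit * stats["se_mean"]
indiv_lo = stats["yhat"] - t_crit * stats["se_indiv"]
indiv_hi = stats["yhat"] + t_crit * stats["se_indiv"]
```

The leverages h_i sum to the rank of X (here 2), with or without weights, which is a handy sanity check on any implementation.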

Influential observations are those that, according to various criteria, appear to have a large influence on the parameter estimates. One measure of influence, Cook's D, measures the change to the estimates that results from deleting each observation:

$$\mathrm{COOKD}_i = \frac{1}{k} \, \mathrm{STUDENT}_i^2 \, \frac{\mathrm{STDERR}(\hat{y}_i)^2}{\mathrm{STDERR}(\mathrm{RESID}_i)^2} = \frac{\mathrm{STUDENT}_i^2}{k} \, \frac{h_i}{1 - h_i}$$

where k is the number of parameters in the model (including the intercept). For more information, refer to Cook (1977, 1979).

The predicted residual for observation i is defined as the residual for the ith observation that results from dropping the ith observation from the parameter estimates. The sum of squares of predicted residual errors is called the PRESS statistic:

$$\mathrm{PRESS} = \sum_{i=1}^{n} \left( \frac{\mathrm{RESID}_i}{1 - h_i} \right)^2$$
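
A sketch that computes Cook's D and PRESS from a single unweighted fit with the closed-form expressions above and checks them against brute force, deleting each observation and refitting. Function names and data are hypothetical; the last observation is given high leverage so that it dominates Cook's D.

```python
# Sketch: Cook's D and PRESS, closed form versus delete-and-refit.
import numpy as np

def influence(X, y):
    """Cook's D and PRESS from one fit (unweighted least squares, full-rank X)."""
    n, k = X.shape
    G = np.linalg.inv(X.T @ X)
    b = G @ X.T @ y
    resid = y - X @ b
    s2 = resid @ resid / (n - k)
    h = np.einsum("ij,jk,ik->i", X, G, X)          # leverages h_i
    student = resid / np.sqrt((1 - h) * s2)        # STUDENT_i
    cooks_d = student ** 2 * h / (k * (1 - h))     # COOKD_i
    press = np.sum((resid / (1 - h)) ** 2)         # PRESS
    return cooks_d, press

def influence_by_refitting(X, y):
    """The same quantities by literally deleting each observation and refitting."""
    n, k = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    s2 = resid @ resid / (n - k)
    cooks_d = np.empty(n)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        b_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        diff = b - b_i
        cooks_d[i] = diff @ (X.T @ X) @ diff / (k * s2)
        press += (y[i] - X[i] @ b_i) ** 2           # squared predicted residual
    return cooks_d, press

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 15.0])   # last point: high leverage
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1, 6.0, 7.2, 9.0])
X = np.column_stack([np.ones_like(x), x])
d_fast, press_fast = influence(X, y)
d_slow, press_slow = influence_by_refitting(X, y)
print(np.round(d_fast, 3), press_fast)
```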

Testing Linear Hypotheses

The general form of a linear hypothesis for the parameters is

$$L\beta = c$$

where L is q × k, β is k × 1, and c is q × 1. To test this hypothesis, the linear function is taken with respect to the parameter estimates:

$$Lb - c$$

This has variance

$$\mathrm{Var}(Lb - c) = L \, \mathrm{Var}(b) \, L' = L(X'WX)^{-}L' \sigma^2$$

where b is the estimate of β.

A quadratic form called the sum of squares due to the hypothesis is calculated:

$$\mathrm{SS}(Lb - c) = (Lb - c)' \left( L(X'WX)^{-}L' \right)^{-1} (Lb - c)$$

If you assume that this hypothesis is testable, the SS can be used as the numerator of the F test:

$$F = \frac{\mathrm{SS}(Lb - c) / q}{s^2}$$

This is compared with an F distribution with q and dfe degrees of freedom, where dfe is the degrees of freedom for residual error.
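
A minimal unweighted (W = I) sketch of this F test in NumPy. It assumes L has full row rank q, uses the pseudoinverse as the generalized inverse, and cross-checks the result against the familiar extra-sum-of-squares F from fitting the reduced model; the function name `linear_hypothesis_f` and the simulated data are hypothetical.

```python
# Sketch: F test for L beta = c, checked against a reduced-model fit.
import numpy as np

def linear_hypothesis_f(X, y, L, c):
    """F statistic for L beta = c with W = I; assumes L has full row rank."""
    L = np.atleast_2d(np.asarray(L, dtype=float))
    c = np.atleast_1d(np.asarray(c, dtype=float))
    n = X.shape[0]
    G = np.linalg.pinv(X.T @ X)                  # (X'X)^-
    b = G @ X.T @ y
    dfe = n - np.linalg.matrix_rank(X)
    s2 = np.sum((y - X @ b) ** 2) / dfe          # mean squared error
    d = L @ b - c                                # Lb - c
    ss = d @ np.linalg.solve(L @ G @ L.T, d)     # SS(Lb - c)
    q = L.shape[0]
    return (ss / q) / s2, q, dfe

rng = np.random.default_rng(5)
n = 40
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 0.3 * x2 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])

# Hypothesis beta_2 = 0, that is, L = [0, 0, 1] and c = 0.
F, q, dfe = linear_hypothesis_f(X, y, [[0.0, 0.0, 1.0]], [0.0])

def sse(Xm):
    b, *_ = np.linalg.lstsq(Xm, y, rcond=None)
    return np.sum((y - Xm @ b) ** 2)

F_extra = ((sse(X[:, :2]) - sse(X)) / q) / (sse(X) / dfe)   # reduced model drops x2
print(F, F_extra, q, dfe)
```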

Multivariate Tests

Multivariate hypotheses involve several dependent variables in the form

$$L\beta M = d$$

where L is a linear function on the regressor side, β is a matrix of parameters, M is a linear function on the dependent side, and d is a matrix of constants. The special case (handled by PROC REG) in which the constants are the same for each dependent variable is written

$$(L\beta - cj)M = 0$$

where c is a column vector of constants and j is a row vector of 1s. The special case in which the constants are 0 is

$$L\beta M = 0$$

These multivariate tests are covered in detail in Morrison (1976); Timm (1975); Mardia, Kent, and Bibby (1979); Bock (1975); and other works cited in Chapter 5, "Introduction to Multivariate Procedures."

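To test this hypothesis, construct two matrices, H and E, that correspond to the numerator and denominator of a univariate F test:

$$H = M'(LB - cj)' \left( L(X'WX)^{-}L' \right)^{-1} (LB - cj)M$$

$$E = M' \left( Y'WY - B'(X'WX)B \right) M$$

where B is the matrix of parameter estimates. Four test statistics, based on the eigenvalues of $E^{-1}H$ or $(E + H)^{-1}H$, are formed.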

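Let $\lambda_i$ be the ordered eigenvalues of $E^{-1}H$ (if the inverse exists), and let $\xi_i$ be the ordered eigenvalues of $(E + H)^{-1}H$. It happens that $\xi_i = \lambda_i / (1 + \lambda_i)$ and $\lambda_i = \xi_i / (1 - \xi_i)$, and it turns out that $\rho_i = \sqrt{\xi_i}$ is the ith canonical correlation.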
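
The eigenvalue relationships above are easy to confirm numerically. This sketch uses hypothetical H and E matrices (H positive semidefinite, E positive definite) and computes the four statistics in the forms in which they are commonly defined: Wilks' lambda, Pillai's trace, the Hotelling-Lawley trace, and Roy's greatest root (taken here as the largest λ_i). The F approximations used for p-values are not shown.

```python
# Sketch: eigenvalues of E^{-1}H and (E+H)^{-1}H and the four usual statistics.
# H and E are hypothetical matrices, not computed from a real model.
import numpy as np

rng = np.random.default_rng(6)
p = 3
A = rng.normal(size=(p, 2))
H = A @ A.T                                   # hypothesis SSCP matrix (rank 2)
B = rng.normal(size=(p, p))
E = B @ B.T + p * np.eye(p)                   # error SSCP matrix (positive definite)

lam = np.sort(np.linalg.eigvals(np.linalg.solve(E, H)).real)[::-1]      # eigenvalues of E^{-1}H
xi = np.sort(np.linalg.eigvals(np.linalg.solve(E + H, H)).real)[::-1]  # eigenvalues of (E+H)^{-1}H
canonical_corr = np.sqrt(np.clip(xi, 0.0, None))                        # rho_i = sqrt(xi_i)

wilks = np.prod(1.0 / (1.0 + lam))            # Wilks' lambda
pillai = np.sum(xi)                           # Pillai's trace
hotelling_lawley = np.sum(lam)                # Hotelling-Lawley trace
roy = lam[0]                                  # Roy's greatest root
print(wilks, pillai, hotelling_lawley, roy)
```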

Let p be the rank of (H + E), which is less than or equal to the number of columns of M. Let q be the rank of $L(X'WX)^{-}L'$. Let v be the