
Adapted by Peter Au, George Brown College


McGraw-Hill Ryerson Copyright © 2011 McGraw-Hill Ryerson Limited.

Chapter 12

Multiple Regression and Model Building


Multiple Regression and Model Building

Part 1 Basic Multiple Regression
Part 2 Using Squared and Interaction Terms
Part 3 Dummy Variables and Advanced Statistical Inferences (Optional)


Part 1 Basic Multiple Regression

12.1 The Multiple Regression Model
12.2 Model Assumptions and the Standard Error
12.3 The Least Squares Estimates and Point Estimation and Prediction
12.4 R² and Adjusted R²
12.5 The Overall F Test
12.6 Testing the Significance of an Independent Variable
12.7 Confidence and Prediction Intervals


Part 2 Using Squared and Interaction Terms

12.8 The Quadratic Regression Model (Optional)
12.9 Interaction (Optional)


Part 3 Dummy Variables and Advanced Statistical Inferences

12.10 Using Dummy Variables to Model Qualitative Independent Variables
12.11 The Partial F Test: Testing the Significance of a Portion of a Regression Model


BASIC MULTIPLE REGRESSION

Part 1


The Multiple Regression Model

• Simple linear regression uses one independent variable to explain the dependent variable
• Some relationships are too complex to be described using a single independent variable
• Multiple regression models use two or more independent variables to describe the dependent variable
• This allows multiple regression models to handle more complex situations
• There is no limit to the number of independent variables a model can use
• Like simple regression, multiple regression has only one dependent variable


The Multiple Regression Model

• The linear regression model relating y to x1, x2,…, xk is

y = μy|x1,x2,…,xk + ɛ = β0 + β1x1 + β2x2 + … + βkxk + ɛ

where
• μy|x1,x2,…,xk = β0 + β1x1 + β2x2 + … + βkxk is the mean value of the dependent variable y when the values of the independent variables are x1, x2,…, xk
• β0, β1, β2,…, βk are the regression parameters relating the mean value of y to x1, x2,…, xk
• ɛ is an error term that describes the effects on y of all factors other than the independent variables x1, x2,…, xk


Example: Multiple Regression

• Consider the following data table that relates two independent variables x1 and x2 to the dependent variable y (Table 12.1)

Data   x1     x2   y
1      28.0   18   12.4
2      28.0   14   11.7
3      32.5   24   12.4
4      39.0   22   10.8
5      45.9    8    9.4
6      57.8   16    9.5
7      58.1    1    8.0
8      62.5    0    7.5


Plotting y versus x1


Scatter Plot Analysis

• The plot shows that y tends to decrease in a straight-line fashion as x1 increases
• This suggests that if we wish to predict y on the basis of x1 only, the simple linear regression model y = β0 + β1x1 + ɛ relates y to x1


Plotting y versus x2


Scatter Plot Analysis

• This plot shows that y tends to increase in a straight-line fashion as x2 increases
• This suggests that if we wish to predict y on the basis of x2 only, the simple linear regression model y = β0 + β1x2 + ɛ relates y to x2


Geometric Interpretation

• The experimental region is defined to be the range of the combinations of the observed values of x1 and x2


Plane of Means

• The mean value of y when IV1 (independent variable one) is x1 and IV2 is x2 is μy|x1, x2 (mu of y given x1 and x2)
• Consider the equation μy|x1, x2 = β0 + β1x1 + β2x2, which relates mean y values to x1 and x2
• This is a linear equation in two variables; geometrically, it is the equation of a plane in three-dimensional space


Plane of Means



Model Assumptions and The Standard Error

• We need to make certain assumptions about the error term ɛ

• At any given combination of values of x1, x2, . . . , xk, there is a population of error term values that could occur



Model Assumptions and The Standard Error

• The model is

y = μy|x1,x2,…,xk + ɛ = β0 + β1x1 + β2x2 + … + βkxk + ɛ

• Assumptions for multiple regression are stated about the model error terms, the ɛ’s


The Regression Model Assumptions

1. Mean of Zero Assumption
   The mean of the error terms is equal to 0
2. Constant Variance Assumption
   The variance of the error terms, σ², is the same for every combination of values of x1, x2,…, xk
3. Normality Assumption
   The error terms follow a normal distribution for every combination of values of x1, x2,…, xk
4. Independence Assumption
   The values of the error terms are statistically independent of each other


Sum of Squared Errors

$$SSE = \sum e_i^2 = \sum (y_i - \hat{y}_i)^2$$


Mean Square Error

• This is the point estimate of the residual variance σ²
• This formula is slightly different from simple regression

$$s^2 = MSE = \frac{SSE}{n - (k + 1)}$$


Standard Error

• This is the point estimate of the residual standard deviation σ
• MSE is from the previous slide
• This formula, too, is slightly different from simple regression
• n − (k + 1) is the number of degrees of freedom associated with the SSE

$$s = \sqrt{MSE} = \sqrt{\frac{SSE}{n - (k + 1)}}$$


From Our Previous Data

• Using Table 12.6
• Compute the SSE to be 0.674

$$s^2 = MSE = \frac{SSE}{n - (k + 1)} = \frac{0.674}{8 - 3} = 0.1348$$

$$s = \sqrt{s^2} = \sqrt{0.1348} = 0.3671$$
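The arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch that takes SSE = 0.674, n = 8, and k = 2 from the slides; the variable names are illustrative only. Because the slide's SSE is itself rounded, the last printed digit of s may differ slightly from the slide.

```python
import math

# Values taken from the slides: SSE for the Table 12.1 data,
# n = 8 observations, k = 2 independent variables.
sse = 0.674
n, k = 8, 2

df = n - (k + 1)      # degrees of freedom associated with the SSE
mse = sse / df        # s^2, the point estimate of sigma^2
s = math.sqrt(mse)    # standard error, the point estimate of sigma

print("degrees of freedom:", df)
print("MSE (s^2):", round(mse, 4))
print("s:", round(s, 4))
```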


The Least Squares Estimates and Point Estimation and Prediction

• Estimation/prediction equation

$$\hat{y} = b_0 + b_1 x_{01} + b_2 x_{02} + \dots + b_k x_{0k}$$

• ŷ is the point estimate of the mean value of the dependent variable when the values of the independent variables are x01, x02,…, x0k
• It is also the point prediction of an individual value of the dependent variable when the values of the independent variables are x01, x02,…, x0k
• b0, b1, b2,…, bk are the least squares point estimates of the parameters β0, β1, β2,…, βk
• x01, x02,…, x0k are specified values of the independent predictor variables x1, x2,…, xk


Calculating the Model

• A formula exists for computing the least squares model for multiple regression
• This formula is written using matrix algebra and is presented in Appendix F, available on Connect
• In practice, the model can be easily computed using Excel, MegaStat, or many other computer packages
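As an illustration of what such a package does, the sketch below fits the Table 12.1 data by solving the normal equations (X'X)b = X'y in plain Python. It is not the Appendix F presentation or the Excel/MegaStat procedure; the helper names `solve` and `fit` are hypothetical, and a production fit would normally use a numerical library.

```python
# Least squares fit of y on x1 and x2 for the Table 12.1 data,
# obtained by solving the normal equations (X'X) b = X'y.

x1 = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]
x2 = [18, 14, 24, 22, 8, 16, 1, 0]
y = [12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5]


def solve(a, b):
    """Solve the square system a * coef = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry into place.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(columns, response):
    """Return the least squares estimates b0, b1, ..., bk."""
    rows = [[1.0] + [float(c[i]) for c in columns] for i in range(len(response))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, response)) for i in range(p)]
    return solve(xtx, xty)


b = fit([x1, x2], y)
fitted = [b[0] + b[1] * u + b[2] * v for u, v in zip(x1, x2)]
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

print("b0, b1, b2 =", [round(v, 4) for v in b])
print("SSE =", round(sse, 4))
```

The SSE printed here should agree with the value used on the standard error slides, up to rounding.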


Table 12.1 Excel Regression Analysis Output



Residual Calculation Table 12.1



R² and Adjusted R²

1. Total variation is given by the formula $\sum (y_i - \bar{y})^2$
2. Explained variation is given by the formula $\sum (\hat{y}_i - \bar{y})^2$
3. Unexplained variation is given by the formula $\sum (y_i - \hat{y}_i)^2$
4. Total variation is the sum of explained and unexplained variation
5. R² is the ratio of explained variation to total variation

$$R^2 = \frac{\text{Explained variation}}{\text{Total variation}}$$


What Does R² Mean?

• The multiple coefficient of determination, R², is the proportion of the total variation in the n observed values of the dependent variable that is explained by the multiple regression model


Multiple Correlation Coefficient R

• The multiple correlation coefficient R is just the square root of R²
• With simple linear regression, r would take on the sign of b1
• There are multiple bi’s in a multiple regression model
• For this reason, R is always positive
• To interpret the direction of the relationship between the x’s and y, you must look to the sign of the appropriate bi coefficient


The Adjusted R²

• Adding an independent variable to multiple regression will always raise R²
• R² will rise slightly even if the new variable has no relationship to y
• The adjusted R² corrects for this tendency in R²
• As a result, it gives a better estimate of the importance of the independent variables
• The bar notation indicates adjusted R²

$$\bar{R}^2 = \left(R^2 - \frac{k}{n - 1}\right)\left(\frac{n - 1}{n - (k + 1)}\right)$$


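Calculating R² and Adjusted R²

• Excel Multiple Regression Output from Table 12.1

$$R^2 = \frac{\text{Explained variation}}{\text{Total variation}} = \frac{24.8750}{25.5487} = 0.97363$$

$$\bar{R}^2 = \left(R^2 - \frac{k}{n - 1}\right)\left(\frac{n - 1}{n - (k + 1)}\right) = \left(0.97363 - \frac{2}{8 - 1}\right)\left(\frac{8 - 1}{8 - (2 + 1)}\right) = 0.963081$$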
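The two ratios can be reproduced directly. The sketch below assumes the explained and total variation reported for the Table 12.1 data (24.8750 and 25.5487), with n = 8 and k = 2; the variable names are illustrative. Results may differ from the slide in the last digit because the inputs are rounded.

```python
# R^2 and adjusted R^2 from the sums of squares reported for Table 12.1.
explained = 24.8750   # explained variation
total = 25.5487       # total variation
n, k = 8, 2           # observations, independent variables

r2 = explained / total
adj_r2 = (r2 - k / (n - 1)) * ((n - 1) / (n - (k + 1)))

print("R^2:", round(r2, 5))
print("adjusted R^2:", round(adj_r2, 5))
```

Note that the adjusted value is smaller than R², as expected, because the correction penalizes each added independent variable.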


The Overall F Test

• Hypothesis
  • H0: β1 = β2 = … = βk = 0 versus
  • Ha: At least one of β1, β2,…, βk ≠ 0
• Test Statistic

$$F(\text{model}) = \frac{(\text{Explained variation})/k}{(\text{Unexplained variation})/[n - (k + 1)]}$$

• Reject H0 in favor of Ha if:
  • F(model) > Fα* or
  • p-value < α

*Fα is based on k numerator and n − (k + 1) denominator degrees of freedom


EXCEL ANOVA: Table 12.1 Data

• Test Statistic

$$F(\text{model}) = \frac{(\text{Explained variation})/k}{(\text{Unexplained variation})/[n - (k + 1)]} = \frac{24.8750/2}{0.6737/(8 - 3)} = 92.33$$

• F-test at the α = 0.05 level of significance
• Fα is based on 2 numerator and 5 denominator degrees of freedom

$$F(\text{model}) = 92.33 > F_{.05} = 5.79 \quad \text{and} \quad p\text{-value} = 0.000 < 0.001$$

• Reject H0 at the α = 0.05 level of significance
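A short sketch of the overall F computation, using the explained and unexplained variation quoted above and the tabled rejection point F.05 = 5.79 from the slide (the critical value is hard-coded, not computed). Recomputing from these rounded inputs gives a statistic of roughly 92.3, close to the value reported on the slide.

```python
# Overall F statistic for the Table 12.1 model.
explained = 24.8750     # explained variation
unexplained = 0.6737    # unexplained variation (SSE)
n, k = 8, 2
f_crit = 5.79           # F.05 with 2 and 5 degrees of freedom, from the slide

f_model = (explained / k) / (unexplained / (n - (k + 1)))
reject_h0 = f_model > f_crit

print("F(model):", round(f_model, 2))
print("reject H0 at alpha = 0.05:", reject_h0)
```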


What Next?

• The F test tells us that at least one independent variable is significant
• The natural question is which one(s)?
• That question will be addressed in the next section


Testing the Significance of an Independent Variable

• A variable in a multiple regression model is not likely to be useful unless there is a significant relationship between it and y
• Significance Test Hypothesis
  • H0: βj = 0 versus
  • Ha: βj ≠ 0


Testing Significance of an Independent Variable

• If the regression assumptions hold, we can reject H0: βj = 0 at the α level of significance (probability of Type I error equal to α) if and only if the appropriate rejection point condition holds
• Or, equivalently, if the corresponding p-value is less than α


Rejection Rules

Alternative    Reject H0 If    p Value
Ha: βj ≠ 0     |t| > tα/2*     Twice the area under the t distribution to the right of |t|
Ha: βj > 0     t > tα          Area under the t distribution to the right of t
Ha: βj < 0     t < –tα         Area under the t distribution to the left of t

* That is, t > tα/2 or t < –tα/2

tα/2, tα, and all p values are based on n – (k + 1) degrees of freedom



• Test Statistic

$$t = \frac{b_j}{s_{b_j}}$$

• A 100(1 − α)% confidence interval for βj is

$$[b_j \pm t_{\alpha/2}\, s_{b_j}]$$

• tα, tα/2, and p-values are based on n – (k + 1) degrees of freedom



• It is customary to test the significance of every independent variable in a regression model
• If we can reject H0: βj = 0 at the 0.05 level of significance, then we have strong evidence that the independent variable xj is significantly related to y
• If we can reject H0: βj = 0 at the 0.01 level of significance, we have very strong evidence that the independent variable xj is significantly related to y
• The smaller the significance level α at which H0 can be rejected, the stronger is the evidence that xj is significantly related to y


A Note on Significance Testing

• Whether the independent variable xj is significantly related to y in a particular regression model is dependent on what other independent variables are included in the model
• That is, changing independent variables can cause a significant variable to become insignificant or cause an insignificant variable to become significant
• This issue is addressed in a later section on multicollinearity


Example 12.4 The Sales Territory Performance Case

• A sales manager evaluates the performance of sales representatives by using a multiple regression model that predicts sales performance on the basis of five independent variables
  • x1 = number of months the representative has been employed by the company
  • x2 = sales of the company’s product and competing products in the sales territory (market potential)
  • x3 = dollar advertising expenditure in the territory
  • x4 = weighted average of the company’s market share in the territory for the previous four years
  • x5 = change in the company’s market share in the territory over the previous four years
• y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + ɛ


Example 12.4 The Sales Territory Performance Case

• Using MegaStat, a regression model was computed from the collected data
• Because the p values associated with Time, MktPoten, Adver, and MktShare are all less than 0.01, we have very strong evidence that these variables are significantly related to y and, thus, are important in this model
• The p value associated with Change is 0.0530, suggesting weaker evidence that this variable is important


Confidence and Prediction Intervals

• The point on the regression line corresponding to particular values x01, x02,…, x0k of the independent variables is

$$\hat{y} = b_0 + b_1 x_{01} + b_2 x_{02} + \dots + b_k x_{0k}$$

• It is unlikely that this value will equal the mean value of y for these x values
• Therefore, we need to place bounds on how far the predicted value might be from the actual value
• We can do this by calculating a confidence interval for the mean value of y and a prediction interval for an individual value of y


Distance Value

• Both the confidence interval for the mean value of y and the prediction interval for an individual value of y employ a quantity called the distance value
• With simple regression, we were able to calculate the distance value fairly easily
• However, for multiple regression, calculating the distance value requires matrix algebra
• See Appendix F on Connect for more details


A Confidence Interval for a Mean Value of y

• Assume that the regression assumptions hold
• The formula for a 100(1 − α)% confidence interval for the mean value of y is as follows:

$$[\hat{y} \pm t_{\alpha/2}\, s \sqrt{\text{Distance value}}]$$

• This is based on n − (k + 1) degrees of freedom


A Prediction Interval for an Individual Value of y

• Assume that the regression assumptions hold
• The formula for a 100(1 − α)% prediction interval for an individual value of y is as follows:

$$[\hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \text{Distance value}}]$$

• This is based on n − (k + 1) degrees of freedom


Sales Territory Performance Case

Data

Sales     Time     MktPoten   Adver      MktShare  Change
3669.88    43.10   74065.11    4582.88    2.51      0.34
3473.95   108.13   58117.30    5539.78    5.51      0.15
2295.10    13.82   21118.49    2950.38   10.91     -0.72
4675.56   186.18   68521.27    2243.07    8.27      0.17
6125.96   161.79   57805.11    7747.08    9.15      0.50
2134.94     8.94   37806.94     402.44    5.51      0.15
5031.66   365.04   50935.26    3140.62    8.54      0.55
3367.45   220.32   35602.08    2086.16    7.07     -0.49
6519.45   127.64   46176.77    8846.25   12.54      1.24
4876.37   105.69   42053.24    5673.11    8.85      0.31
2468.27    57.72   36829.71    2761.76    5.38      0.37
2533.31    23.58   33612.67    1991.85    5.43     -0.65
2408.11    13.82   21412.79    1971.52    8.48      0.64
2337.38    13.82   20416.87    1737.38    7.80      1.01
4586.95    86.99   36272.00   10694.20   10.34      0.11
2729.24   165.85   23093.26    8618.61    5.15      0.04
3289.40   116.26   26879.59    7747.89    6.64      0.68
2800.78    42.28   39571.96    4565.81    5.45      0.66
3264.20    52.84   51866.15    6022.70    6.31     -0.10
3453.62   165.04   58749.82    3721.10    6.35     -0.03
1741.45    10.57   23990.82     860.97    7.37     -1.63
2035.75    13.82   25694.86    3571.51    8.39     -0.43
1578.00     8.13   23736.35    2845.50    5.15      0.04
4167.44    58.54   34314.29    5060.11   12.88      0.22
2799.97    21.14   22809.53    3552.00    9.14     -0.74


Confidence & Prediction Intervals

• Using the Sales Territory Performance Case
• The point prediction of the sales corresponding to:
  • Time = 85.42
  • MktPoten = 35182.73
  • Adver = 7281.65
  • MktShare = 9.64
  • Change = 0.28
• Using the regression model from before:
  • ŷ = -1,113.7879 + 3.6121(85.42) + 0.0421(35,182.73) + 0.1289(7,281.65) + 256.9555(9.64) + 324.5334(0.28) = 4,181.74 (that is, 418,174 units)
• This point prediction is given at the bottom of the MegaStat output in Figure 12.7, which we repeat here:


MegaStat Output



Confidence & Prediction Intervals

• 95% Confidence Interval

$$[\hat{y} \pm t_{\alpha/2}\, s \sqrt{\text{Distance value}}] = [4181.74 \pm (2.093)(430.232)\sqrt{0.109}] = [4181.74 \pm 296.829] = [3884.91,\ 4478.58]$$

• 95% Prediction Interval

$$[\hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \text{Distance value}}] = [4181.74 \pm (2.093)(430.232)\sqrt{1 + 0.109}] = [4181.74 \pm 948.137] = [3233.60,\ 5129.88]$$
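The point prediction and both intervals can be re-traced numerically. The sketch below uses only figures quoted on the slides (the rounded coefficients, ŷ = 4181.74, t = 2.093, s = 430.232, distance value 0.109); the variable names are illustrative. Because these inputs are rounded, the recomputed prediction and half-widths differ from the slide values by a fraction of a unit.

```python
import math

# Rounded least squares estimates and predictor values quoted on the slides.
b = [-1113.7879, 3.6121, 0.0421, 0.1289, 256.9555, 324.5334]
x0 = [85.42, 35182.73, 7281.65, 9.64, 0.28]  # Time, MktPoten, Adver, MktShare, Change

# Point prediction recomputed from the rounded coefficients.
y_hat_rounded = b[0] + sum(bj * xj for bj, xj in zip(b[1:], x0))

# Quantities read from the slides / MegaStat output.
y_hat = 4181.74          # point prediction reported by MegaStat
t_crit = 2.093           # t.025 with n - (k + 1) = 19 degrees of freedom
s = 430.232              # standard error
distance_value = 0.109   # rounded distance value

half_ci = t_crit * s * math.sqrt(distance_value)
half_pi = t_crit * s * math.sqrt(1 + distance_value)
conf_int = (y_hat - half_ci, y_hat + half_ci)
pred_int = (y_hat - half_pi, y_hat + half_pi)

print("point prediction from rounded coefficients:", round(y_hat_rounded, 2))
print("95% confidence interval:", [round(v, 2) for v in conf_int])
print("95% prediction interval:", [round(v, 2) for v in pred_int])
```

The prediction interval is much wider than the confidence interval because of the extra 1 under the square root, which accounts for the variation of an individual value around its mean.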


USING SQUARED AND INTERACTION TERMS

Part 2



The Quadratic Regression Model

• One useful form of linear regression is the quadratic regression model
• Assume that we have n observations of x and y
• The quadratic regression model relating y to x is

y = β0 + β1x + β2x² + ɛ, where

• β0 + β1x + β2x² is the mean value of the dependent variable y when the value of the independent variable is x
• β0, β1, and β2 are unknown regression parameters relating the mean value of y to x
• ɛ is an error term that describes the effects on y of all factors other than x and x²


The Quadratic Regression Model Visually



A Note on the Quadratic Model

• Even though the quadratic model employs the squared term x² and, as a result, assumes a curved relationship between the mean value of y and x, this model is a linear regression model
• This is because β0 + β1x + β2x² expresses the mean value of y as a linear function of the parameters β0, β1, and β2
• As long as the mean value of y is a linear function of the regression parameters, we have a linear regression model


Example 12.6 The Stress and Work Motivation Case

• The human resources department administers a stress questionnaire to 15 employees in which people rate their stress level on a 0 (no stress) to 4 (high stress) scale

• Work performance was measured as the average number of projects completed by the employee per year, averaged over the last five years



Example 12.6 The Stress and Work Motivation Case



Regression Analysis

$$\hat{y} = 25.7152 + 4.9762x - 1.01905x^2$$


More Variables

• We have only looked at the simple case where we have y and x
• That gave us the following quadratic regression model

y = β0 + β1x + β2x² + ɛ

• However, we are not limited to just two terms
• The following would also be a valid quadratic regression model

y = β0 + β1x1 + β2x1² + β3x2 + β4x3 + ɛ


Interaction

• Multiple regression models often contain interaction variables
• These are variables that are formed by multiplying two independent variables together
  • For example, x1·x2
• In this case, the x1·x2 variable would appear in the model along with both x1 and x2
• We use interaction variables when the relationship between the mean value of y and one of the independent variables is dependent on the value of another independent variable
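To see why an x1·x2 term makes the effect of one variable depend on the other, the sketch below evaluates a mean function of the form β0 + β1x1 + β2x2 + β3x1x2. The coefficient values are hypothetical, chosen only for illustration, and the function names are not from the text.

```python
# Interaction model: mean of y = b0 + b1*x1 + b2*x2 + b3*x1*x2.
# All coefficient values below are hypothetical.

def mean_y(x1, x2, b0, b1, b2, b3):
    """Mean value of y under the interaction model."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


def slope_in_x1(x2, b1, b3):
    """Change in mean y per unit increase in x1, at a given x2."""
    return b1 + b3 * x2


coef = dict(b0=10.0, b1=2.0, b2=3.0, b3=-0.5)

for x2 in (0, 2, 4):
    change = mean_y(1, x2, **coef) - mean_y(0, x2, **coef)
    print("x2 =", x2,
          "| change in mean y for +1 in x1:", change,
          "| b1 + b3*x2:", slope_in_x1(x2, coef["b1"], coef["b3"]))
```

With b3 = 0 the slope in x1 would be the same at every x2, which is the no-interaction case.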


Interaction Variable Example

• Consider a company that runs both radio and television ads for its products
• It is reasonable to assume that raising either ad amount would raise sales
• However, it is also reasonable to assume that the effectiveness of television ads depends, in part, on how often consumers hear the radio ads
• Thus, an interaction variable would be appropriate


Example 12.8 Froid Frozen Foods Experiment


Plot of y versus x1 (Plot Character Is the Corresponding Value of x2)


Plot of y versus x2 (Plot Character Is the Corresponding Value of x1)


Example 12.8 Froid Frozen Foods Experiment

• These last two figures imply that the more is spent on one type of advertising, the smaller the slope for the other type of advertising
• That is, the slope of one line depends on the value of the other variable
• This indicates that there is interaction between x1 and x2


MegaStat Output

• Froid Frozen Foods Experiment


Spotting Interactive Terms

• It is fairly easy to construct data plots to check for interaction when a careful experiment is carried out
• It is often not possible to construct the necessary plots with less structured data
• If an interaction is suspected, we can include the interactive term and see if it is significant


A Note on Interactive Model Construction

• When an interaction term (say x1x2) is important to a model, it is the usual practice to leave the corresponding linear terms (x1 and x2) in the model no matter what their p-values


DUMMY VARIABLES AND ADVANCED STATISTICAL INFERENCES

Part 3


Using Dummy Variables to Model Qualitative Independent Variables

• So far, we have only looked at including quantitative data in a regression model
• However, we may wish to include descriptive qualitative data as well
  • For example, we might want to include the sex of respondents
• We can model the effects of different levels of a qualitative variable by using what are called dummy variables
  • Also known as indicator variables


How to Construct Dummy Variables

• A dummy variable always has a value of either 0 or 1
• For example, to model sales at two locations, we would code the first location as 0 and the second as 1
  • Operationally, it does not matter which is coded 0 and which is coded 1


Example 12.10

• Suppose that Electronics World, a chain of stores that sells audio and video equipment, has gathered the data in Table 12.13
• These data concern store sales volume in July of last year (y, measured in thousands of dollars), the number of households in the store’s area (x, measured in thousands), and the location of the store


Example 12.10 The Electronics World Case

• Location Dummy Variable

$$D_M = \begin{cases} 1 & \text{if a store is in a mall location} \\ 0 & \text{otherwise} \end{cases}$$


Example 12.10 The Electronics World Case


MegaStat Output


What If We Have More Than Two Categories?

• Consider having three categories, say A, B, and C
• Cannot code this using one dummy variable
  • A=0, B=1, and C=2 would be invalid
  • Assumes the difference between A and B is the same as the difference between B and C
• We must use multiple dummy variables
  • Specifically, a categories require a − 1 dummy variables
• For A, B, and C, would need two dummy variables
  • x1 is 1 for A, zero otherwise
  • x2 is 1 for B, zero otherwise
  • If x1 and x2 are zero, must be C
  • This is why the third dummy variable is not needed
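The a − 1 coding rule can be sketched in a few lines of Python. The helper `dummy_code` and the sample observations are hypothetical; the category left out (here C) serves as the baseline that is identified when every dummy is zero.

```python
# Code a qualitative variable with a categories using a - 1 dummy variables.
# dummy_code is an illustrative helper, not a function from any package.

def dummy_code(values, baseline):
    """Return (levels, rows): one 0/1 column per non-baseline level."""
    levels = sorted(set(values) - {baseline})
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


locations = ["A", "B", "C", "A"]   # hypothetical observations
levels, rows = dummy_code(locations, baseline="C")

print("dummy variables for levels:", levels)
for loc, row in zip(locations, rows):
    print(loc, row)
```

Each row holds (x1, x2) as defined on the slide: an A observation is coded (1, 0), a B observation (0, 1), and a C observation (0, 0).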


Example 12.10 Electronics World: Two Dummy Variables

• Geometrical Interpretation of the Sales Volume Model y = β0 + β1x + β2DM + β3xDM + ɛ


Example 12.10 Electronics World: Two Dummy Variables


Interaction Models

• So far, we have only considered dummy variables as stand-alone variables
• Model so far is

y = β0 + β1x + β2D + ɛ, where D is a dummy variable

• However, we can also look at interaction between a dummy variable and other variables
• That model would take the form y = β0 + β1x + β2D + β3xD + ɛ
• With an interaction term, both the intercept and slope are shifted


Other Uses

• So far, we have seen dummy variables used to code categorical variables
• Dummy variables can also be used to flag unusual events that have an important impact on the dependent variable
• These unusual events can be one-time events
  • Impact of a strike on sales
  • Impact of a major sporting event coming to town
• Or they can be recurring events
  • Hot temperatures on soft drink sales
  • Cold temperatures on coat sales


The Partial F Test: Testing the Significance of a Portion of a Regression Model

• So far, we have looked at testing single slope coefficients using the t test
• We have also looked at testing all the coefficients at once using the F test
• The partial F test allows us to test the significance of any set of independent variables in a regression model


The Partial F Test Model

• We can use this F test to test the significance of a portion of a regression model


Example 12.11: Electronics World

• The model: y = β0 + β1x + β2DM + β3DD + ɛ
  • DM and DD are dummy variables
  • This is called the complete model
• We will now look at just the reduced model: y = β0 + β1x + ɛ
• Hypothesis to test
  • H0: β2 = β3 = 0 versus Ha: At least one of β2 and β3 does not equal zero
• The SSE for the complete model is SSEC = 443.4650
• The SSE for the reduced model is SSER = 2,467.8067


Example 12.11: Electronics World

$$F = \frac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]} = \frac{(2{,}467.8067 - 443.4650)/2}{443.4650/(15 - 4)} = 25.1066$$

• We compare F with F.01 = 7.21
  • Based on k – g = 2 numerator degrees of freedom
  • And n – (k + 1) = 11 denominator degrees of freedom
  • Note that k – g denotes the number of regression parameters set to 0
• Since F = 25.1066 > 7.21, we reject the null hypothesis at α = 0.01
• We conclude that it appears as though at least two locations have different effects on mean sales volume
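The partial F computation can be sketched as follows, using the SSE values and degrees of freedom stated for the Electronics World example. The function name `partial_f` is illustrative, and the rejection point 7.21 is taken from the slide rather than computed.

```python
# Partial F statistic comparing the complete and reduced models.

def partial_f(sse_reduced, sse_complete, n, k, g):
    """F = [(SSE_R - SSE_C) / (k - g)] / [SSE_C / (n - (k + 1))]."""
    numerator = (sse_reduced - sse_complete) / (k - g)
    denominator = sse_complete / (n - (k + 1))
    return numerator / denominator


sse_r = 2467.8067   # reduced model: y = b0 + b1*x
sse_c = 443.4650    # complete model: adds the dummies DM and DD
n, k, g = 15, 3, 1  # observations; predictors in complete and reduced models
f_crit = 7.21       # F.01 with 2 and 11 degrees of freedom, from the slide

f_stat = partial_f(sse_r, sse_c, n, k, g)
print("partial F:", round(f_stat, 4))
print("reject H0 at alpha = 0.01:", f_stat > f_crit)
```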


Summary

• The multiple regression model employs at least two independent variables to relate to the dependent variable
• Some ways to judge a model's overall utility are the standard error, the multiple coefficient of determination, the adjusted multiple coefficient of determination, and the overall F test
• Squared terms can be used to model quadratic relationships, while cross-product terms can be used to model interaction relationships
• Dummy variables can be used to model qualitative independent variables
• The partial F test can be used to evaluate a portion of the regression model
