
Upload: doanliem

Post on 21-Apr-2018


Statistics for Managers Using Microsoft Excel, 2/e © 1999 Prentice-Hall, Inc.

Chapter 14 Student Lecture Notes 14-1

© 2004 Prentice-Hall, Inc. Chap 14-1

Basic Business Statistics (9th Edition)

Chapter 14: Introduction to Multiple Regression

© 2004 Prentice-Hall, Inc. Chap 14-2

Chapter Topics

The Multiple Regression Model
Residual Analysis
Testing for the Significance of the Regression Model
Inferences on the Population Regression Coefficients
Testing Portions of the Multiple Regression Model
Dummy-Variables and Interaction Terms
Logistic Regression Model

© 2004 Prentice-Hall, Inc. Chap 14-3

The Multiple Regression Model

The relationship between 1 dependent and 2 or more independent variables is a linear function:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$

$Y_i$: dependent (response) variable
$X_{1i}, \ldots, X_{ki}$: independent (explanatory) variables
$\beta_0$: population Y-intercept
$\beta_1, \ldots, \beta_k$: population slopes
$\varepsilon_i$: random error


© 2004 Prentice-Hall, Inc. Chap 14-4

Multiple Regression Model

[Figure: response plane over the $(X_1, X_2)$ plane with intercept $\beta_0$; an observed $Y$ at the point $(X_{1i}, X_{2i})$ lies a distance $\varepsilon_i$ from the plane.]

Observed $Y$: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$

Response plane: $\mu_{Y|X} = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}$

© 2004 Prentice-Hall, Inc. Chap 14-5

Multiple Regression Equation

[Figure: estimated response plane over the $(X_1, X_2)$ plane with intercept $b_0$; an observed $Y$ at the point $(X_{1i}, X_{2i})$ lies a distance $e_i$ (the residual) from the plane.]

Observed $Y$: $Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + e_i$

Multiple regression equation (estimated response plane): $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$

© 2004 Prentice-Hall, Inc. Chap 14-6

Multiple Regression Equation

Too complicated to do by hand! Ouch!


© 2004 Prentice-Hall, Inc. Chap 14-7

Interpretation of Estimated Coefficients

Slope ($b_j$)

The average value of Y is estimated to change by $b_j$ for each 1-unit increase in $X_j$, holding all other variables constant (ceteris paribus).
Example: if $b_1 = -2$, then fuel oil usage (Y) is expected to decrease by an estimated 2 gallons for each 1-degree increase in temperature ($X_1$), holding the inches of insulation ($X_2$) constant.

Y-intercept ($b_0$)

The estimated average value of Y when all $X_j = 0$.

© 2004 Prentice-Hall, Inc. Chap 14-8

Multiple Regression Model: Example

Develop a model for estimating the heating oil used by a single-family home in the month of January, based on average temperature and amount of insulation in inches.

Oil (gal)   Temp (°F)   Insulation (in)
275.30      40          3
363.80      27          3
164.30      40          10
40.80       73          6
94.30       64          6
230.90      34          6
366.70      9           6
300.60      8           10
237.80      23          10
121.40      63          3
31.40       65          10
203.50      41          6
441.10      21          3
323.00      38          3
52.50       58          10

© 2004 Prentice-Hall, Inc. Chap 14-9

Multiple Regression Equation: Example

General form: $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$

Excel Output

               Coefficients
Intercept      562.1510092
X Variable 1   -5.436580588
X Variable 2   -20.01232067

Fitted equation: $\hat{Y}_i = 562.151 - 5.437 X_{1i} - 20.012 X_{2i}$

For each degree increase in temperature, the estimated average amount of heating oil used decreases by 5.437 gallons, holding insulation constant.

For each one-inch increase in insulation, the estimated average use of heating oil decreases by 20.012 gallons, holding temperature constant.
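The coefficients in the Excel output can be reproduced outside Excel. The following is a minimal sketch, not the method PHStat uses: it solves the least-squares normal equations $(X'X)b = X'y$ for the heating oil data with plain Python. The helper names `solve` and `fit_ols` are illustrative assumptions.

```python
# Heating oil data from the example: (oil in gallons, temperature in °F, insulation in inches).
DATA = [
    (275.30, 40, 3), (363.80, 27, 3), (164.30, 40, 10), (40.80, 73, 6), (94.30, 64, 6),
    (230.90, 34, 6), (366.70, 9, 6), (300.60, 8, 10), (237.80, 23, 10), (121.40, 63, 3),
    (31.40, 65, 10), (203.50, 41, 6), (441.10, 21, 3), (323.00, 38, 3), (52.50, 58, 10),
]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_ols(rows):
    """Return [b0, b1, ..., bk] from the normal equations (X'X) b = X'y."""
    X = [[1.0] + [float(v) for v in row[1:]] for row in rows]  # leading 1 for the intercept
    y = [float(row[0]) for row in rows]
    p = len(X[0])
    xtx = [[sum(xi[i] * xi[j] for xi in X) for j in range(p)] for i in range(p)]
    xty = [sum(xi[i] * yi for xi, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


b0, b1, b2 = fit_ols(DATA)
print(f"Y-hat = {b0:.3f} + ({b1:.3f}) X1 + ({b2:.3f}) X2")
```

The normal equations are used here only because they keep the sketch dependency-free; statistical software typically uses more numerically stable decompositions.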


© 2004 Prentice-Hall, Inc. Chap 14-10

Multiple Regression in PHStat

PHStat | Regression | Multiple Regression …

Excel spreadsheet for the heating oil example


© 2004 Prentice-Hall, Inc. Chap 14-11

Venn Diagrams and Explanatory Power of Regression

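[Figure: Venn diagram of two overlapping circles, Oil and Temp.]

Overlap of Oil and Temp: variation in Oil explained by Temp, or equivalently the variation in Temp used in explaining the variation in Oil (SSR)

Oil outside the overlap: variation in Oil explained by the error term (SSE)

Temp outside the overlap: variation in Temp not used in explaining the variation in Oil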

© 2004 Prentice-Hall, Inc. Chap 14-12

Venn Diagrams and Explanatory Power of Regression

(continued)

[Figure: Venn diagram of two overlapping circles, Oil and Temp.]

$$r^2 = \frac{SSR}{SSR + SSE}$$


© 2004 Prentice-Hall, Inc. Chap 14-13

Venn Diagrams and Explanatory Power of Regression

[Figure: Venn diagram of three overlapping circles, Oil, Temp, and Insulation.]

The overlapping variation in both Temp and Insulation is used in explaining the variation in Oil, but NOT in the estimation of $\beta_1$ or $\beta_2$.

Variation NOT explained by Temp or Insulation (SSE)

© 2004 Prentice-Hall, Inc. Chap 14-14

Coefficient of Multiple Determination

Proportion of Total Variation in Y Explained by All X Variables Taken Together

$$r^2_{Y \cdot 12 \cdots k} = \frac{SSR}{SST} = \frac{\text{Explained Variation}}{\text{Total Variation}}$$

Never Decreases When a New X Variable is Added to Model

This is a disadvantage when comparing among models

© 2004 Prentice-Hall, Inc. Chap 14-15

Venn Diagrams and Explanatory Power of Regression

[Figure: Venn diagram of three overlapping circles, Oil, Temp, and Insulation.]

$$r^2_{Y \cdot 12} = \frac{SSR}{SSR + SSE}$$


© 2004 Prentice-Hall, Inc. Chap 14-16

Adjusted Coefficient of Multiple Determination

Proportion of Variation in Y Explained by All the X Variables, Adjusted for the Sample Size and the Number of X Variables Used

$$r^2_{adj} = 1 - \left[\left(1 - r^2_{Y \cdot 12 \cdots k}\right)\frac{n - 1}{n - k - 1}\right]$$

Penalizes excessive use of independent variables
Smaller than $r^2_{Y \cdot 12 \cdots k}$
Useful in comparing among models
Can decrease if an insignificant new X variable is added to the model

© 2004 Prentice-Hall, Inc. Chap 14-17

Coefficient of Multiple Determination

Excel Output

Regression Statistics
Multiple R          0.982654757
R Square            0.965610371
Adjusted R Square   0.959878766
Standard Error      26.01378323
Observations        15

R Square: $r^2_{Y \cdot 12} = \frac{SSR}{SST}$

Adjusted $r^2$ reflects the number of explanatory variables and the sample size, and is smaller than $r^2$.

© 2004 Prentice-Hall, Inc. Chap 14-18

Interpretation of Coefficient of Multiple Determination

$r^2_{Y \cdot 12} = \frac{SSR}{SST} = .9656$

96.56% of the total variation in heating oil can be explained by temperature and amount of insulation.

$r^2_{adj} = .9599$

95.99% of the total variation in heating oil can be explained by temperature and amount of insulation after adjusting for the number of explanatory variables and sample size.
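As a check on the two percentages, the sketch below recomputes $r^2$ and adjusted $r^2$ from the sums of squares reported in the ANOVA output for the heating oil model elsewhere in these notes (SSR = 228014.6263, SST = 236135.2293, n = 15, k = 2). The function names are illustrative, not part of Excel or PHStat.

```python
def r_squared(ssr, sst):
    """Coefficient of multiple determination: explained variation over total variation."""
    return ssr / sst


def adjusted_r_squared(r2, n, k):
    """Adjust r-squared for sample size n and number of explanatory variables k."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


SSR = 228014.6263  # regression sum of squares from the ANOVA output
SST = 236135.2293  # total sum of squares from the ANOVA output
n, k = 15, 2       # 15 observations, 2 explanatory variables

r2 = r_squared(SSR, SST)
r2_adj = adjusted_r_squared(r2, n, k)
print(f"r2 = {r2:.4f}, adjusted r2 = {r2_adj:.4f}")
```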


© 2004 Prentice-Hall, Inc. Chap 14-19

Simple and Multiple Regression Compared

The slope coefficient in a simple regression picks up the impact of the independent variable plus the impacts of other variables that are excluded from the model but are correlated with both the included independent variable and the dependent variable.

Coefficients in a multiple regression net out the impacts of the other variables in the equation.

Hence, they are called the net regression coefficients.
They still pick up the effects of other variables that are excluded from the model but are correlated with the included independent variables and the dependent variable.

© 2004 Prentice-Hall, Inc. Chap 14-20

Simple and Multiple Regression Compared: Example

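Two Simple Regressions:

$$\text{Oil} = \beta_0 + \beta_1 \text{Temp} + \varepsilon$$
$$\text{Oil} = \beta_0 + \beta_2 \text{Insulation} + \varepsilon$$

Multiple Regression:

$$\text{Oil} = \beta_0 + \beta_1 \text{Temp} + \beta_2 \text{Insulation} + \varepsilon$$

The three $\beta_0$'s do not have the same value.
The two $\beta_1$'s do not have the same value.
The two $\beta_2$'s do not have the same value.
The three $\varepsilon$'s are different.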

© 2004 Prentice-Hall, Inc. Chap 14-21

Simple and Multiple Regression Compared: Slope Coefficients

Multiple regression: $\text{Oil} = b_0 + b_1 \text{Temp} + b_2 \text{Insulation} + e$

             Coefficients
Intercept    562.1510092
Temp         -5.436580588
Insulation   -20.01232067

Simple regression: $\text{Oil} = b_0 + b_1 \text{Temp} + e$

             Coefficients
Intercept    436.4382299
Temp         -5.462207697

Simple regression: $\text{Oil} = b_0 + b_2 \text{Insulation} + e$

             Coefficients
Intercept    345.3783784
Insulation   -20.35027027

$-5.4366 \neq -5.4622$ and $-20.0123 \neq -20.3503$

The three $e$'s are different.


© 2004 Prentice-Hall, Inc. Chap 14-22

Simple and Multiple Regression Compared: r2

Multiple regression: $\text{Oil} = b_0 + b_1 \text{Temp} + b_2 \text{Insulation} + e$

Regression Statistics
Multiple R          0.982654757
R Square            0.965610371
Adjusted R Square   0.959878766
Standard Error      26.01378323
Observations        15

Simple regression: $\text{Oil} = b_0 + b_1 \text{Temp} + e$

Regression Statistics
Multiple R          0.86974117
R Square            0.756449704
Adjusted R Square   0.737715065
Standard Error      66.51246564
Observations        15

Simple regression: $\text{Oil} = b_0 + b_2 \text{Insulation} + e$

Regression Statistics
Multiple R          0.465082527
R Square            0.216301757
Adjusted R Square   0.156017277
Standard Error      119.3117327
Observations        15

The $r^2$ of the multiple regression is not the sum of the two simple-regression values:

$$0.96561 \neq 0.75645 + 0.21630 = 0.97275$$

© 2004 Prentice-Hall, Inc. Chap 14-23

Example: Adjusted $r^2$ Can Decrease

Model with two explanatory variables: $\text{Oil} = \beta_0 + \beta_1 \text{Temp} + \beta_2 \text{Insulation} + \varepsilon$

Regression Statistics
Multiple R          0.982654757
R Square            0.965610371
Adjusted R Square   0.959878766
Standard Error      26.01378323
Observations        15

Try a 3rd explanatory variable: $\text{Oil} = \beta_0 + \beta_1 \text{Temp} + \beta_2 \text{Insulation} + \beta_3 \text{Rainfall} + \varepsilon$

Regression Statistics
Multiple R          0.983482856
R Square            0.967238528
Adjusted R Square   0.958303581
Standard Error      25.72417272
Observations        15

Adjusted $r^2$ decreases when k increases from 2 to 3.

Rainfall is not useful in explaining the variation in oil consumption.
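The drop in adjusted $r^2$ follows directly from the adjustment formula $r^2_{adj} = 1 - (1 - r^2)(n-1)/(n-k-1)$. A small sketch, using only the R Square values and n = 15 from the two Excel outputs in this example (the function name is an illustrative choice):

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted r-squared for n observations and k explanatory variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# R Square values taken from the two Excel outputs; n = 15 in both regressions.
adj_two_vars = adjusted_r_squared(0.965610371, n=15, k=2)    # Temp, Insulation
adj_three_vars = adjusted_r_squared(0.967238528, n=15, k=3)  # Temp, Insulation, Rainfall

# r-squared rose slightly, but the penalty (n - 1)/(n - k - 1) grew from 14/12 to 14/11.
print(f"k=2: {adj_two_vars:.6f}   k=3: {adj_three_vars:.6f}")
```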

© 2004 Prentice-Hall, Inc. Chap 14-24

Using the Regression Equation to Make Predictions

Predict the amount of heating oil used for a home if the average temperature is 30 °F and the insulation is 6 inches.

$$\hat{Y}_i = 562.151 - 5.437 X_{1i} - 20.012 X_{2i} = 562.151 - 5.437(30) - 20.012(6) = 278.969$$

The predicted heating oil used is 278.97 gallons.
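The same prediction as a short sketch, using the rounded coefficients from the fitted equation; the function name `predict_oil` is illustrative.

```python
# Rounded coefficients of the fitted heating oil equation.
B0, B1, B2 = 562.151, -5.437, -20.012


def predict_oil(temp_f, insulation_in):
    """Predicted January heating oil use (gallons) for a given temperature and insulation."""
    return B0 + B1 * temp_f + B2 * insulation_in


predicted = predict_oil(30, 6)
print(f"{predicted:.2f} gallons")
```

Predictions are only reliable inside the range of the observed data (temperatures of 8 to 73 °F and 3 to 10 inches of insulation in this sample).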


© 2004 Prentice-Hall, Inc. Chap 14-25

Predictions in PHStat

PHStat | Regression | Multiple Regression …
Check the “Confidence and Prediction Interval Estimate” box

Excel spreadsheet for the heating oil example

© 2004 Prentice-Hall, Inc. Chap 14-26

Residual Plots

Residuals vs. $\hat{Y}$: may need to transform the Y variable
Residuals vs. $X_1$: may need to transform the $X_1$ variable
Residuals vs. $X_2$: may need to transform the $X_2$ variable
Residuals vs. time: may have autocorrelation

© 2004 Prentice-Hall, Inc. Chap 14-27

Residual Plots: Example

[Figure: Insulation residual plot, residuals against insulation (horizontal axis 0 to 12).] No discernible pattern.

[Figure: Temperature residual plot, residuals (vertical axis -60 to 60) against temperature (horizontal axis 0 to 80).] Maybe some non-linear relationship.


© 2004 Prentice-Hall, Inc. Chap 14-28

Testing for Overall Significance

Shows if Y Depends Linearly on All of the X Variables Together as a Group
Use F Test Statistic
Hypotheses:
H0: β1 = β2 = … = βk = 0 (No linear relationship)
H1: At least one βj ≠ 0 (At least one independent variable affects Y)

The Null Hypothesis is a Very Strong Statement
The Null Hypothesis is Almost Always Rejected

© 2004 Prentice-Hall, Inc. Chap 14-29

Testing for Overall Significance

(continued)

Test Statistic:

$$F = \frac{MSR}{MSE} = \frac{SSR / k}{SSE / (n - k - 1)}$$

where F has k numerator and (n − k − 1) denominator degrees of freedom

© 2004 Prentice-Hall, Inc. Chap 14-30

Test for Overall Significance
Excel Output: Example

ANOVA
             df   SS         MS         F          Significance F
Regression   2    228014.6   114007.3   168.4712   1.65411E-09
Residual     12   8120.603   676.7169
Total        14   236135.2

Regression df = k = 2, the number of explanatory variables
Total df = n − 1
Test statistic: $F = \frac{MSR}{MSE}$
Significance F is the p-value


© 2004 Prentice-Hall, Inc. Chap 14-31

Test for Overall Significance: Example Solution

H0: β1 = β2 = … = βk = 0
H1: At least one βj ≠ 0
α = .05
df = 2 and 12

Critical Value: 3.89

[Figure: F distribution starting at 0 with an upper-tail rejection region of area α = 0.05 to the right of 3.89.]

Test Statistic: F = 168.47 (Excel Output)

Decision: Reject H0 at α = 0.05.

Conclusion: There is evidence that at least one independent variable affects Y.
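The F statistic in the Excel output can be rebuilt from the sums of squares. This sketch uses the unrounded SSR and SSE from the ANOVA output for the two-variable model, and takes the critical value 3.89 from the slide rather than computing it; the function name is illustrative.

```python
def overall_f(ssr, sse, n, k):
    """F = MSR / MSE, with k and (n - k - 1) degrees of freedom."""
    msr = ssr / k            # mean square regression
    mse = sse / (n - k - 1)  # mean square error
    return msr / mse


F = overall_f(ssr=228014.6263, sse=8120.603016, n=15, k=2)
F_CRITICAL = 3.89  # upper-tail critical value for df = 2 and 12 at alpha = .05 (from the slide)
reject_h0 = F > F_CRITICAL
print(f"F = {F:.2f}, reject H0: {reject_h0}")
```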

© 2004 Prentice-Hall, Inc. Chap 14-32

Test for Significance:Individual Variables

Shows If Y Depends Linearly on a Single Xj Individually While Holding the Effects of Other X’s Fixed
Use t Test Statistic
Hypotheses:
H0: βj = 0 (No linear relationship)
H1: βj ≠ 0 (Linear relationship between Xj and Y)

© 2004 Prentice-Hall, Inc. Chap 14-33

t Test Statistic
Excel Output: Example

             Coefficients    Standard Error   t Stat      P-value
Intercept    562.1510092     21.09310433      26.65094    4.77868E-12
Temp         -5.436580588    0.336216167      -16.1699    1.64178E-09
Insulation   -20.01232067    2.342505227      -8.543127   1.90731E-06

The Temp row gives the t test statistic for X1 (Temperature); the Insulation row gives the t test statistic for X2 (Insulation).

$$t = \frac{b_j}{S_{b_j}}$$


© 2004 Prentice-Hall, Inc. Chap 14-34

t Test: Example Solution

Does temperature have a significant effect on monthly consumption of heating oil? Test at α = 0.05.

H0: β1 = 0
H1: β1 ≠ 0
df = 12

Critical Values: ±2.1788

[Figure: t distribution centered at 0 with rejection regions of area .025 in each tail, below −2.1788 and above 2.1788.]

Test Statistic: t = −16.1699

Decision: Reject H0 at α = 0.05.

Conclusion: There is evidence of a significant effect of temperature on oil consumption, holding constant the effect of insulation.
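Each t statistic is the coefficient divided by its standard error. The sketch below recomputes both from the Excel output and compares them with the critical value 2.1788 quoted on the slide (df = 12, two-tailed, α = 0.05); the dictionary and function names are illustrative.

```python
# (coefficient, standard error) pairs from the Excel output.
COEFFICIENTS = {
    "Temp": (-5.436580588, 0.336216167),
    "Insulation": (-20.01232067, 2.342505227),
}
T_CRITICAL = 2.1788  # two-tailed critical value for df = 12, alpha = .05 (from the slide)


def t_statistic(b, standard_error):
    """t = b_j / S_bj for testing H0: beta_j = 0."""
    return b / standard_error


t_stats = {name: t_statistic(b, se) for name, (b, se) in COEFFICIENTS.items()}
significant = {name: abs(t) > T_CRITICAL for name, t in t_stats.items()}

for name, t in t_stats.items():
    print(f"{name}: t = {t:.4f}, reject H0: {significant[name]}")
```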

© 2004 Prentice-Hall, Inc. Chap 14-35

Venn Diagrams and Estimation of Regression Model

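[Figure: Venn diagram of three overlapping circles, Oil, Temp, and Insulation.]

Overlap of Oil with Temp only: only this information is used in the estimation of $\beta_1$
Overlap of Oil with Insulation only: only this information is used in the estimation of $\beta_2$
Overlap of all three circles: this information is NOT used in the estimation of $\beta_1$ or $\beta_2$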

© 2004 Prentice-Hall, Inc. Chap 14-36

Confidence Interval Estimate for the Slope

Provide the 95% confidence interval for the population slope β1 (the effect of temperature on oil consumption).

$$b_1 \pm t_{n-k-1} S_{b_1}$$

             Coefficients   Lower 95%      Upper 95%
Intercept    562.151009     516.1930837    608.108935
Temp         -5.4365806     -6.169132673   -4.7040285
Insulation   -20.012321     -25.11620102   -14.90844

$$-6.169 \le \beta_1 \le -4.704$$

We are 95% confident that the average consumption of oil is reduced by between 4.70 and 6.17 gallons for each 1 °F increase in temperature, holding insulation constant.

We can also perform the test for the significance of individual variables, H0: β1 = 0 vs. H1: β1 ≠ 0, using this confidence interval: because the interval does not contain 0, H0 is rejected at α = 0.05.
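The interval for the temperature slope can be reproduced from the coefficient, its standard error (0.336216167 in the Excel output), and the critical value 2.1788 for 12 degrees of freedom. A minimal sketch with an illustrative function name:

```python
def slope_confidence_interval(b, standard_error, t_critical):
    """Return (lower, upper) for b +/- t * S_b."""
    half_width = t_critical * standard_error
    return b - half_width, b + half_width


# Temperature slope and standard error from the Excel output; t for df = 12, 95% confidence.
lower, upper = slope_confidence_interval(-5.436580588, 0.336216167, 2.1788)
contains_zero = lower <= 0.0 <= upper

print(f"{lower:.3f} <= beta1 <= {upper:.3f}; interval contains 0: {contains_zero}")
```

Because the critical value is rounded to four decimals, the endpoints agree with Excel's only to about five decimal places.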


© 2004 Prentice-Hall, Inc. Chap 14-37

Contribution of a Single Independent Variable $X_j$

Let $X_j$ Be the Independent Variable of Interest

$$SSR(X_j \mid \text{all others except } X_j) = SSR(\text{all}) - SSR(\text{all others except } X_j)$$

Measures the additional contribution of $X_j$ in explaining the total variation in Y with the inclusion of all the remaining independent variables

© 2004 Prentice-Hall, Inc. Chap 14-38

Contribution of a Single Independent Variable $X_k$

$$SSR(X_1 \mid X_2 \text{ and } X_3) = SSR(X_1, X_2 \text{ and } X_3) - SSR(X_2 \text{ and } X_3)$$

$SSR(X_1, X_2 \text{ and } X_3)$: from the ANOVA section of the regression for $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{3i}$

$SSR(X_2 \text{ and } X_3)$: from the ANOVA section of the regression for $\hat{Y}_i = b_0 + b_2 X_{2i} + b_3 X_{3i}$

Measures the additional contribution of X1 in explaining Y with the inclusion of X2 and X3.

Note: the values of the coefficients b0, b2, and b3 differ between the two regression equations.

© 2004 Prentice-Hall, Inc. Chap 14-39

Coefficient of Partial Determination of $X_j$

Measures the proportion of variation in the dependent variable that is explained by $X_j$ while controlling for (holding constant) the other independent variables

$$r^2_{Yj \cdot (\text{all others})} = \frac{SSR(X_j \mid \text{all others})}{SST - SSR(\text{all}) + SSR(X_j \mid \text{all others})}$$


© 2004 Prentice-Hall, Inc. Chap 14-40

Coefficient of Partial Determination for $X_j$

(continued)

Example: Model with two independent variables

$$r^2_{Y1 \cdot 2} = \frac{SSR(X_1 \mid X_2)}{SST - SSR(X_1, X_2) + SSR(X_1 \mid X_2)}$$

© 2004 Prentice-Hall, Inc. Chap 14-41

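Venn Diagrams and Coefficient of Partial Determination for $X_j$

[Figure: Venn diagram of three overlapping circles, Oil, Temp, and Insulation; the part of Oil that overlaps Temp only is $SSR(X_1 \mid X_2)$.]

$$r^2_{Y1 \cdot 2} = \frac{SSR(X_1 \mid X_2)}{SST - SSR(X_1, X_2) + SSR(X_1 \mid X_2)}$$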

© 2004 Prentice-Hall, Inc. Chap 14-42

Coefficient of Partial Determination in PHStat

PHStat | Regression | Multiple Regression …
Check the “Coefficient of Partial Determination” box

Excel spreadsheet for the heating oil example


© 2004 Prentice-Hall, Inc. Chap 14-43

Contribution of a Subset of Independent Variables

Let $X_s$ Be the Subset of Independent Variables of Interest

$$SSR(X_s \mid \text{all others except } X_s) = SSR(\text{all}) - SSR(\text{all others except } X_s)$$

Measures the contribution of the subset $X_s$ in explaining SST with the inclusion of the remaining independent variables

© 2004 Prentice-Hall, Inc. Chap 14-44

Contribution of a Subset of Independent Variables: Example

Let $X_s$ be X1 and X3

$$SSR(X_1 \text{ and } X_3 \mid X_2) = SSR(X_1, X_2 \text{ and } X_3) - SSR(X_2)$$

$SSR(X_1, X_2 \text{ and } X_3)$: from the ANOVA section of the regression for $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{3i}$

$SSR(X_2)$: from the ANOVA section of the regression for $\hat{Y}_i = b_0 + b_2 X_{2i}$

© 2004 Prentice-Hall, Inc. Chap 14-45

Testing Portions of Model

Examines the Contribution of a Subset Xs of Explanatory Variables to the Relationship with Y

Null Hypothesis:
Variables in the subset do not improve the model significantly when all other variables are included

Alternative Hypothesis:
At least one variable in the subset is significant when all other variables are included


© 2004 Prentice-Hall, Inc. Chap 14-46

Testing Portions of Model

(continued)

One-Tailed Rejection Region
Requires Comparison of Two Regressions:
One regression includes everything
Another regression includes everything except the portion to be tested

© 2004 Prentice-Hall, Inc. Chap 14-47

Partial F Test for the Contribution of a Subset of X Variables

Hypotheses:
H0: Variables Xs do not significantly improve the model given all other variables included
H1: Variables Xs significantly improve the model given all others included

Test Statistic:

$$F = \frac{SSR(X_s \mid \text{all others}) / m}{MSE(\text{all})}$$

with df = m and (n − k − 1), where m = number of variables in the subset Xs

© 2004 Prentice-Hall, Inc. Chap 14-48

Partial F Test for the Contribution of a Single $X_j$

Hypotheses:
H0: Variable Xj does not significantly improve the model given all others included
H1: Variable Xj significantly improves the model given all others included

Test Statistic:

$$F = \frac{SSR(X_j \mid \text{all others})}{MSE(\text{all})}$$

with df = 1 and (n − k − 1); m = 1 here


© 2004 Prentice-Hall, Inc. Chap 14-49

Testing Portions of Model: Example

Test at the α = .05 level to determine if the variable of average temperature significantly improves the model, given that insulation is included.

© 2004 Prentice-Hall, Inc. Chap 14-50

Testing Portions of Model: Example

H0: X1 (temperature) does not improve model with X2 (insulation) included
H1: X1 does improve model

α = .05, df = 1 and 12

Critical Value = 4.75

ANOVA (for X1 and X2)
             SS            MS
Regression   228014.6263   114007.313
Residual     8120.603016   676.716918
Total        236135.2293

ANOVA (for X2)
             SS
Regression   51076.47
Residual     185058.8
Total        236135.2

$$F = \frac{SSR(X_1 \mid X_2)}{MSE(X_1, X_2)} = \frac{228{,}015 - 51{,}076}{676.717} = 261.47$$

Conclusion: Reject H0; X1 does improve model.
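The partial F statistic, and the coefficient of partial determination built from the same quantities, can be computed from the two ANOVA tables above. The sketch below is illustrative (function names are assumptions); it also checks numerically that the partial F for a single variable equals the square of that variable's t statistic (−16.1699 for temperature in the Excel output).

```python
SSR_FULL = 228014.6263   # SSR(X1, X2): regression on temperature and insulation
SSR_REDUCED = 51076.47   # SSR(X2): regression on insulation only
SST = 236135.2293        # total sum of squares
MSE_FULL = 676.716918    # MSE of the full model


def partial_f(ssr_full, ssr_reduced, mse_full, m):
    """Partial F for the m variables dropped from the full model."""
    return ((ssr_full - ssr_reduced) / m) / mse_full


def partial_r_squared(ssr_full, ssr_reduced, sst):
    """Coefficient of partial determination for the dropped variable(s)."""
    contribution = ssr_full - ssr_reduced  # e.g. SSR(X1 | X2)
    return contribution / (sst - ssr_full + contribution)


F = partial_f(SSR_FULL, SSR_REDUCED, MSE_FULL, m=1)
r2_partial = partial_r_squared(SSR_FULL, SSR_REDUCED, SST)
t_temp = -16.1699  # t statistic for temperature from the Excel output

print(f"partial F = {F:.2f}, t squared = {t_temp ** 2:.2f}, partial r2 = {r2_partial:.4f}")
```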

© 2004 Prentice-Hall, Inc. Chap 14-51

Testing Portions of Model in PHStat

PHStat | Regression | Multiple Regression …
Check the “Coefficient of Partial Determination” box

Excel spreadsheet for the heating oil example


© 2004 Prentice-Hall, Inc. Chap 14-52

Do We Need to Do This for One Variable?

The F Test for the Contribution of a Single Variable After All Other Variables are Included in the Model is IDENTICAL to the t Test of the Slope for that Variable
The Only Reason to Perform an F Test is to Test Several Variables Together

© 2004 Prentice-Hall, Inc. Chap 14-53

Dummy-Variable Models

Categorical Explanatory Variable with 2 or More Levels
Only Intercepts are Different
Assumes Equal Slopes Across Categories
The Number of Dummy-Variables Needed is (# of Levels − 1)
Regression Model Has Same Form:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$

Two Level Examples
Yes or No, On or Off
Use Dummy-Variable (Coded as 0 or 1)

© 2004 Prentice-Hall, Inc. Chap 14-54

Dummy-Variable Models (with 2 Levels)

Given:
Y = Assessed Value of House
X1 = Square Footage of House
X2 = Desirability of Neighborhood = 0 if undesirable, 1 if desirable

$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$$

Desirable (X2 = 1): $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2(1) = (b_0 + b_2) + b_1 X_{1i}$
Undesirable (X2 = 0): $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2(0) = b_0 + b_1 X_{1i}$

Same slopes


© 2004 Prentice-Hall, Inc. Chap 14-55

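Dummy-Variable Models (with 2 Levels)

(continued)

[Figure: assessed value Y against square footage X1, showing two parallel lines with common slope $b_1$: the desirable location with intercept $b_0 + b_2$ and the undesirable location with intercept $b_0$. Same slopes, different intercepts.]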

© 2004 Prentice-Hall, Inc. Chap 14-56

Interpretation of the Dummy-Variable Coefficient (with 2 Levels)

Example:

Y: Annual salary of college graduate in thousand $
X1: GPA
X2: 0 if non-business degree, 1 if business degree

$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} = 20 + 5 X_{1i} + 6 X_{2i}$$

With the same GPA, college graduates with a business degree are making an estimated 6 thousand dollars more than graduates with a non-business degree, on average.
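The interpretation of the dummy coefficient can be seen by evaluating the estimated equation for two graduates with the same GPA. In this sketch the GPA of 3.0 is an arbitrary illustrative value and the function name is hypothetical.

```python
def predicted_salary(gpa, business_degree):
    """Estimated annual salary (thousand $) from Y-hat = 20 + 5*GPA + 6*X2."""
    x2 = 1 if business_degree else 0  # dummy variable: 1 = business degree, 0 = non-business
    return 20 + 5 * gpa + 6 * x2


business = predicted_salary(3.0, business_degree=True)
non_business = predicted_salary(3.0, business_degree=False)
gap = business - non_business  # equals b2 regardless of the GPA chosen

print(f"business: {business}, non-business: {non_business}, gap: {gap}")
```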

© 2004 Prentice-Hall, Inc. Chap 14-57

Dummy-Variable Models (with 3 Levels)

Given:
Y = Assessed Value of the House (1000 $)
X1 = Square Footage of the House
Style of the House = Split-level, Ranch, Tudor (3 Levels; Need 2 Dummy Variables)
X2 = 1 if Split-level, 0 if not
X3 = 1 if Ranch, 0 if not

$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{3i}$$


© 2004 Prentice-Hall, Inc. Chap 14-58

Interpretation of the Dummy-Variable Coefficients (with 3 Levels)

Given the Estimated Model: $\hat{Y}_i = 20.43 + 0.045 X_{1i} + 18.84 X_{2i} + 23.53 X_{3i}$

For Split-level ($X_2 = 1$): $\hat{Y}_i = 20.43 + 0.045 X_{1i} + 18.84$
For Ranch ($X_3 = 1$): $\hat{Y}_i = 20.43 + 0.045 X_{1i} + 23.53$
For Tudor: $\hat{Y}_i = 20.43 + 0.045 X_{1i}$

With the same footage, a Split-level will have an estimated average assessed value of 18.84 thousand dollars more than a Tudor.
With the same footage, a Ranch will have an estimated average assessed value of 23.53 thousand dollars more than a Tudor.
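A sketch of how the three-level style variable maps onto two dummy variables, with Tudor as the baseline (both dummies 0). The 2,000 square feet used below is a hypothetical input, and the function names are illustrative.

```python
STYLES = ("Split-level", "Ranch", "Tudor")


def encode_style(style):
    """Return (X2, X3): X2 = 1 for Split-level, X3 = 1 for Ranch, both 0 for Tudor."""
    if style not in STYLES:
        raise ValueError(f"unknown style: {style!r}")
    return (1 if style == "Split-level" else 0, 1 if style == "Ranch" else 0)


def assessed_value(square_feet, style):
    """Estimated assessed value (thousand $) from the fitted three-level model."""
    x2, x3 = encode_style(style)
    return 20.43 + 0.045 * square_feet + 18.84 * x2 + 23.53 * x3


SQUARE_FEET = 2000  # hypothetical house size, for illustration only
tudor = assessed_value(SQUARE_FEET, "Tudor")
split_premium = assessed_value(SQUARE_FEET, "Split-level") - tudor
ranch_premium = assessed_value(SQUARE_FEET, "Ranch") - tudor

print(f"Split-level vs Tudor: {split_premium:.2f}, Ranch vs Tudor: {ranch_premium:.2f}")
```

Only two dummies are needed for three levels; adding a third dummy for Tudor would make the columns perfectly collinear with the intercept.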

© 2004 Prentice-Hall, Inc. Chap 14-59

Regression Model Containing an Interaction Term

Hypothesizes Interaction between a Pair of X Variables
Response to one X variable varies at different levels of another X variable

Contains a Cross-Product Term:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + \varepsilon_i$$

Can Be Combined with Other Models, E.g., Dummy-Variable Model

© 2004 Prentice-Hall, Inc. Chap 14-60

Effect of Interaction

Given:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + \varepsilon_i$$

Without Interaction Term, Effect of X1 on Y is Measured by β1

With Interaction Term, Effect of X1 on Y is Measured by β1 + β3 X2

Effect Changes as X2 Changes


© 2004 Prentice-Hall, Inc. Chap 14-61

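Interaction Example

$$Y = 1 + 2X_1 + 3X_2 + 4X_1X_2$$

For X2 = 1: Y = 1 + 2X1 + 3(1) + 4X1(1) = 4 + 6X1
For X2 = 0: Y = 1 + 2X1 + 3(0) + 4X1(0) = 1 + 2X1

[Figure: the two lines plotted against X1 (horizontal axis 0 to 1.5, vertical axis 0 to 12).]

Effect (slope) of X1 on Y depends on X2 value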
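The dependence of the slope on X2 can be checked numerically. This sketch evaluates the example equation and the implied slope $2 + 4X_2$; the function names are illustrative.

```python
def y(x1, x2):
    """The example model with an interaction term: Y = 1 + 2*X1 + 3*X2 + 4*X1*X2."""
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2


def slope_in_x1(x2):
    """Effect of a one-unit increase in X1 on Y at a given X2: beta1 + beta3 * X2."""
    return 2 + 4 * x2


for x2 in (0, 1):
    rise = y(1, x2) - y(0, x2)  # change in Y when X1 goes from 0 to 1
    print(f"X2 = {x2}: intercept {y(0, x2)}, slope {slope_in_x1(x2)}, rise over one unit {rise}")
```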

© 2004 Prentice-Hall, Inc. Chap 14-62

Interaction Regression Model Worksheet

Multiply X1 by X2 to get X1X2
Run regression with Y, X1, X2, X1X2

Case, i   Yi   X1i   X2i   X1i X2i
1         1    1     3     3
2         4    8     5     40
3         1    3     2     6
4         3    5     6     30
:         :    :     :     :

© 2004 Prentice-Hall, Inc. Chap 14-63

Interpretation When There Are 3+ Levels

Consider the effects of gender (male or female) and working status (working part-time, working full-time or not working) on income (Y).

Male = 0 if female; 1 if male
Part-time = 1 if working part-time; 0 if working full-time or not working
Full-time = 1 if working full-time; 0 if working part-time or not working
Male•Part-time = 1 if male and working part-time; 0 otherwise = (Male times Part-time)
Male•Full-time = 1 if male and working full-time; 0 otherwise = (Male times Full-time)

$$Y = \beta_0 + \beta_1 \text{Male} + \beta_2 \text{Part-time} + \beta_3 \text{Full-time} + \beta_4 (\text{Male} \cdot \text{Part-time}) + \beta_5 (\text{Male} \cdot \text{Full-time}) + \varepsilon$$


© 2004 Prentice-Hall, Inc. Chap 14-64

Interpretation When There Are 3+ Levels

(continued)

$$Y = \beta_0 + \beta_1 \text{Male} + \beta_2 \text{Part-time} + \beta_3 \text{Full-time} + \beta_4 (\text{Male} \cdot \text{Part-time}) + \beta_5 (\text{Male} \cdot \text{Full-time}) + \varepsilon$$

         Not-working   Part-time           Full-time
Female   β0            β0 + β2             β0 + β3
Male     β0 + β1       β0 + β1 + β2 + β4   β0 + β1 + β3 + β5

© 2004 Prentice-Hall, Inc. Chap 14-65

Interpreting Results

              Female    Male                Difference
Not-working   β0        β0 + β1             β1
Part-time     β0 + β2   β0 + β1 + β2 + β4   β1 + β4
Full-time     β0 + β3   β0 + β1 + β3 + β5   β1 + β5

Main Effects: Male, Part-time and Full-time

Interaction Effects: Male•Part-time and Male•Full-time
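The Difference column can be verified by plugging the dummy codes into the model. The coefficient values below are hypothetical, chosen only so the arithmetic is easy to follow; they do not come from the notes or from any data.

```python
# Hypothetical coefficients, for illustration only (not estimates from the notes).
BETA = {"b0": 10.0, "b1": 4.0, "b2": 8.0, "b3": 20.0, "b4": 1.0, "b5": 3.0}
STATUSES = ("not-working", "part-time", "full-time")


def expected_income(male, status, beta=BETA):
    """Expected Y from the model with main effects and Male x status interactions."""
    part = 1 if status == "part-time" else 0
    full = 1 if status == "full-time" else 0
    return (beta["b0"] + beta["b1"] * male + beta["b2"] * part + beta["b3"] * full
            + beta["b4"] * male * part + beta["b5"] * male * full)


# Male minus female within each working status.
gaps = {s: expected_income(1, s) - expected_income(0, s) for s in STATUSES}

for status, gap in gaps.items():
    print(f"{status}: male - female = {gap}")
```

With these values the gaps are β1 = 4, β1 + β4 = 5 and β1 + β5 = 7, matching the pattern in the Difference column.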

© 2004 Prentice-Hall, Inc. Chap 14-66

Evaluating the Presence of Interaction with Dummy-Variable

Suppose X1 and X2 are Numerical Variables and X3 is a Dummy-Variable
To Test if the Slopes of Y with X1 and/or X2 are the Same for the Two Levels of X3
Model:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{1i} X_{3i} + \beta_5 X_{2i} X_{3i} + \varepsilon_i$$

Hypotheses:
H0: β4 = β5 = 0 (No Interaction between X1 and X3 or X2 and X3)
H1: β4 and/or β5 ≠ 0 (X1 and/or X2 Interacts with X3)

Perform a Partial F Test, where X4 and X5 denote the interaction terms X1X3 and X2X3:

$$F = \frac{\left[SSR(X_1, X_2, X_3, X_4, X_5) - SSR(X_1, X_2, X_3)\right] / 2}{MSE(X_1, X_2, X_3, X_4, X_5)}$$


© 2004 Prentice-Hall, Inc. Chap 14-67

Evaluating the Presence of Interaction with Numerical Variables

Suppose X1, X2 and X3 are Numerical Variables
To Test If the Independent Variables Interact with Each Other
Model:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{1i} X_{2i} + \beta_5 X_{1i} X_{3i} + \beta_6 X_{2i} X_{3i} + \varepsilon_i$$

Hypotheses:
H0: β4 = β5 = β6 = 0 (no interaction among X1, X2 and X3)
H1: at least one of β4, β5, β6 ≠ 0 (at least one pair of X1, X2, X3 interact with each other)

Perform a Partial F Test, where X4, X5 and X6 denote the interaction terms X1X2, X1X3 and X2X3:

$$F = \frac{\left[SSR(X_1, X_2, X_3, X_4, X_5, X_6) - SSR(X_1, X_2, X_3)\right] / 3}{MSE(X_1, X_2, X_3, X_4, X_5, X_6)}$$

© 2004 Prentice-Hall, Inc. Chap 14-68

Logistic Regression Model

Enables the Use of Regression Model to Predict the Probability of a Particular Categorical Response for a Given Set of Explanatory Variables
Based on the Odds Ratio, which represents the probability of a success compared with the probability of failure

$$\text{Odds ratio} = \frac{\text{probability of success}}{1 - \text{probability of success}}$$

© 2004 Prentice-Hall, Inc. Chap 14-69

Logistic Regression Model

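(continued)

Logistic Regression Model:

$$\ln(\text{odds ratio}) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$

Logistic Regression Equation:

$$\ln(\text{estimated odds ratio}) = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$$

Estimated Odds Ratio:

$$\text{estimated odds ratio} = e^{\ln(\text{estimated odds ratio})}$$

Estimated Probability of Success:

$$\text{estimated probability of success} = \frac{\text{estimated odds ratio}}{1 + \text{estimated odds ratio}}$$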
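The chain from the logistic regression equation to an estimated probability can be sketched as follows. The coefficients `(-1.0, 0.5)` and the input `x = 2.0` are hypothetical values chosen for illustration; they are not estimates from these notes, and the function names are assumptions.

```python
import math


def odds_ratio(probability):
    """Odds ratio = P(success) / (1 - P(success))."""
    return probability / (1.0 - probability)


def log_odds(coefficients, xs):
    """ln(estimated odds ratio) = b0 + b1*X1 + ... + bk*Xk."""
    b0, slopes = coefficients[0], coefficients[1:]
    return b0 + sum(b * x for b, x in zip(slopes, xs))


def probability_from_log_odds(value):
    """Estimated probability of success = odds / (1 + odds), with odds = e^(log odds)."""
    odds = math.exp(value)
    return odds / (1.0 + odds)


# Hypothetical one-variable model: ln(odds) = -1.0 + 0.5 * X1, evaluated at X1 = 2.0.
estimate = probability_from_log_odds(log_odds((-1.0, 0.5), (2.0,)))

# Round trip: a probability of 0.8 corresponds to odds of 4, and back again.
p_back = probability_from_log_odds(math.log(odds_ratio(0.8)))

print(f"estimated probability: {estimate:.3f}, round trip: {p_back:.3f}")
```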


© 2004 Prentice-Hall, Inc. Chap 14-70

Interpretation of Estimated Slope Coefficients

Logistic Regression Equation Has to be Estimated Using Computer Statistical Software, e.g. Minitab®

The Estimated Slope Coefficient bj Measures the Estimated Change in the Natural Logarithm of the Odds Ratio as a Result of a One Unit Change in the Independent Variable Xj Holding Constant the Effects of all the Other Independent Variables

© 2004 Prentice-Hall, Inc. Chap 14-71

The Deviance Statistic

Used to Test whether the Logistic Regression is a Good-Fitting Model
Hypotheses
H0: The model is a good-fitting model
H1: The model is not a good-fitting model

Test Statistic
The deviance statistic has a χ2 distribution with (n – k – 1) degrees of freedom
The rejection region is always in the upper tail

© 2004 Prentice-Hall, Inc. Chap 14-72

Testing Significance of an Independent Variable

Hypotheses
H0: βj = 0 (Xj is not significant)
H1: βj ≠ 0 (Xj is significant)

Test Statistic
The Wald statistic is normally distributed
A two-tail test with left and right-tail rejection regions


© 2004 Prentice-Hall, Inc. Chap 14-73

Chapter Summary

Developed the Multiple Regression Model
Discussed Residual Plots
Addressed Testing the Significance of the Multiple Regression Model
Discussed Inferences on Population Regression Coefficients
Addressed Testing Portions of the Multiple Regression Model
Discussed Dummy-Variables and Interaction Terms
Addressed Logistic Regression Model