Lesson 14 - 3 Multiple Regression Models

Upload: ainsley-carpenter

Post on 31-Dec-2015


TRANSCRIPT

Page 1: Lesson 14 - 3

Lesson 14 - 3

Multiple Regression Models

Page 2: Lesson 14 - 3

Objectives

• Obtain the correlation matrix

• Use technology to find a multiple regression equation

• Interpret the coefficients of a multiple regression equation

• Determine R² and adjusted R²

• Perform an F-test for lack of fit

• Test individual regression coefficients for significance

• Construct confidence and prediction intervals

• Build a regression model

Page 3: Lesson 14 - 3

Vocabulary

• Correlation matrix – shows the linear correlation among all variables under consideration in a multiple regression model

• Multicollinearity – when two explanatory variables have a high linear correlation with each other

• Additive effect – the explanatory variables do not interact: the effect of each one on the response does not depend on the values of the others

• Adjusted R² – modifies the value of R² based on the sample size, n, and the number of explanatory variables, k; will decrease if an explanatory variable is added to the model that does little to explain the variation in the response variable

Page 4: Lesson 14 - 3

Multiple Regression Model

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + \varepsilon_i$$

where

• $y_i$ is the value of the response variable for the ith individual

• $\beta_0, \beta_1, \beta_2, \dots, \beta_k$ are the parameters to be estimated based on the sample data

• $x_{1i}$ is the ith observation for the first explanatory variable, $x_{2i}$ is the ith observation for the second explanatory variable, and so on

• $\varepsilon_i$ is an independent random error term that is normally distributed with mean 0 and variance $\sigma^2$

• $i = 1, 2, 3, \dots, n$, where n is the sample size

Note: although formulas exist to estimate $\beta_0, \beta_1, \beta_2, \dots, \beta_k$, we will use Excel to obtain the estimates
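The lesson relies on Excel for the estimates. As a rough illustration of what such software computes, the sketch below solves the least-squares normal equations in plain Python. The function names `solve` and `fit_multiple_regression` and the small data set are made up for illustration and do not come from the lesson.

```python
def solve(a, b):
    """Solve the square system a·x = b by Gauss-Jordan elimination."""
    n = len(b)
    aug = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def fit_multiple_regression(xs, y):
    """Return least-squares estimates [b0, b1, ..., bk].

    xs is a list of k explanatory-variable columns, each of length n.
    """
    # Design matrix: a leading 1 for the intercept, then x1i, ..., xki.
    rows = [[1.0] + [col[i] for col in xs] for i in range(len(y))]
    cols = list(zip(*rows))
    xtx = [[sum(p * q for p, q in zip(c1, c2)) for c2 in cols] for c1 in cols]
    xty = [sum(p * q for p, q in zip(c, y)) for c in cols]
    return solve(xtx, xty)


# Hypothetical data: n = 6 observations, k = 2 explanatory variables.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]
y = [12.1, 14.9, 13.2, 19.8, 17.1, 11.0]

b = fit_multiple_regression([x1, x2], y)
residuals = [yi - (b[0] + b[1] * u + b[2] * v) for yi, u, v in zip(y, x1, x2)]
print("estimates b0, b1, b2:", [round(v, 4) for v in b])
```

A quick sanity check on any least-squares fit with an intercept is that the residuals sum to (numerically) zero and are uncorrelated with each explanatory variable.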

Page 5: Lesson 14 - 3

Correlation Matrix

• It is good when the explanatory variables are highly correlated (either positively or negatively) with the response variable

• There may be problems if the explanatory variables are highly correlated with each other (multicollinearity)

• General rule: if |correlation| > 0.7 between two explanatory variables, multicollinearity may be a problem

Variables    X1         X2         X3         Response
X1            1         --------   --------   --------
X2            0.7826     1         --------   --------
X3           -0.2134    -0.1826     1         --------
Response     -0.7821    -0.9218     0.6487     1
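The general rule can be applied mechanically to the matrix above. The following is a minimal sketch, assuming the entry printed as "-.1826" means -0.1826; the function name `flag_multicollinearity` is hypothetical.

```python
# Lower triangle of the correlation matrix as given on the slide.
correlations = {
    ("X2", "X1"): 0.7826,
    ("X3", "X1"): -0.2134,
    ("X3", "X2"): -0.1826,
    ("Response", "X1"): -0.7821,
    ("Response", "X2"): -0.9218,
    ("Response", "X3"): 0.6487,
}


def flag_multicollinearity(corr, response="Response", threshold=0.7):
    """Return pairs of explanatory variables with |r| above the threshold."""
    return [
        (a, b, r)
        for (a, b), r in corr.items()
        if response not in (a, b) and abs(r) > threshold
    ]


flagged = flag_multicollinearity(correlations)
print(flagged)  # [('X2', 'X1', 0.7826)]
```

Only the X1 and X2 pair exceeds 0.7, so that is the pair to watch. The large correlations in the Response row are desirable rather than a problem.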

Page 6: Lesson 14 - 3

R² and Adjusted R² Values

$$R^2 = \frac{\text{explained variation}}{\text{total variation}} = 1 - \frac{\text{unexplained variation}}{\text{total variation}}$$

$$R^2_{\text{adj}} = 1 - \frac{n - 1}{n - k - 1}\,(1 - R^2)$$

Note: adjusted R² modifies R² based on the sample size, n, and the number of explanatory variables, k, to compensate for adding more variables to the model

Page 7: Lesson 14 - 3

Adjusted R²

• The adjusted R² is used in multiple regression models

• The adjusted R² will decrease if a variable is added to the model that does little to explain the variation in the response variable.

• The adjusted R² will increase if a variable that does little to explain the variation in the response variable is removed from the model.
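The behaviour described in these bullets can be checked numerically with the adjusted R² formula, 1 – (n – 1)/(n – k – 1) · (1 – R²). The R² values and sample size below are made up for illustration, and `adjusted_r2` is a hypothetical helper name.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R² for n observations and k explanatory variables."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)


before = adjusted_r2(0.900, n=25, k=3)  # three explanatory variables
after = adjusted_r2(0.901, n=25, k=4)   # a weak fourth variable is added

print(round(before, 4), round(after, 4))  # 0.8857 0.8812
```

R² itself rises slightly (0.900 to 0.901) when the fourth variable is added, but adjusted R² falls, which signals that the extra variable is not earning its place in the model.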

Page 8: Lesson 14 - 3

Hypothesis Test in Multiple Regression

• The null hypothesis is that none of the explanatory variables have a significant linear relation with the response variable

• The alternative hypothesis is that at least one of the explanatory variables has a significant linear relation with the response variable

Page 9: Lesson 14 - 3

F Test Statistic for Multiple Regression

$$F = \frac{\text{Mean Square due to Regression}}{\text{Mean Square Error}} = \frac{MSR}{MSE}$$

with k degrees of freedom in the numerator and n – k – 1 degrees of freedom in the denominator, where k is the number of explanatory variables and n is the sample size

F Test Statistic Using R²

$$F = \frac{R^2}{1 - R^2} \cdot \frac{n - (k + 1)}{k}$$

NOTE: H0: β1 = β2 = … = βk = 0 (the intercept β0 is not part of the null hypothesis)

Use the P-value compared to the level of significance, α, for the decision rule
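The two forms of the F statistic give the same number. The sketch below checks this with made-up sums of squares, taking MSR = SSR / k and MSE = SSE / (n – k – 1) for a model with k explanatory variables and an intercept. The function names are hypothetical, and the P-value itself would still come from software such as Excel.

```python
def f_from_r2(r2, n, k):
    """F statistic computed from R², sample size n, and k explanatory variables."""
    return r2 / (1 - r2) * (n - (k + 1)) / k


def f_from_sums_of_squares(ssr, sse, n, k):
    """F statistic computed as MSR / MSE."""
    msr = ssr / k            # mean square due to regression
    mse = sse / (n - k - 1)  # mean square error
    return msr / mse


# Hypothetical values: total variation 200, unexplained variation 20.
n, k = 25, 3
sst, sse = 200.0, 20.0
ssr = sst - sse
r2 = ssr / sst  # 0.9

f1 = f_from_sums_of_squares(ssr, sse, n, k)
f2 = f_from_r2(r2, n, k)
print(round(f1, 4), round(f2, 4))  # 63.0 63.0
```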

Page 10: Lesson 14 - 3

Guidelines in Developing a Multiple Regression Model (backwards step-wise regression)

1. Construct a correlation matrix to help identify the explanatory variables that have a high correlation with the response variable. In addition, look for any indication that the explanatory variables are correlated with each other. If two explanatory variables have high correlation, then it’s a tip-off to watch out for multicollinearity – but not conclusive evidence.

2. Fit the multiple regression model using all the explanatory variables that have been identified by the researcher.

3. If the null hypothesis that all the slope coefficients are zero has been rejected, we proceed to look at the individual slope coefficients. Identify those slope coefficients that have small t-test statistics (hence large p-values). These explanatory variables are candidates for removal from the model. Remove one at a time and then recompute the regression model.

4. Repeat Step 3 until all slope coefficients are significantly different from zero.

5. Use residual plots to check model appropriateness.

Page 11: Lesson 14 - 3

Backwards Step-wise Regression

• Put all possible variables into the model

• Run the regression model (focus on adjusted R²)

• Pull out the variable with the highest p-value

– the one least likely to have a linear relationship with the response variable

• Rerun the model

– if adjusted R² goes up, repeat the procedure

– if adjusted R² goes down, then stop
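The loop above can be sketched in plain Python. This is an illustration under stated assumptions, not the textbook's procedure verbatim: it ranks variables by |t| instead of computing p-values (within one model every t-test has the same degrees of freedom, so the smallest |t| has the highest p-value), it uses only the adjusted R² stopping rule, and it skips the overall F-test. All function names and the data are hypothetical.

```python
from itertools import product


def inverse(m):
    """Invert a square matrix by Gauss-Jordan elimination (no singularity check)."""
    n = len(m)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit(columns, y):
    """Fit y on the named columns; return (adjusted R², {name: t statistic})."""
    names = list(columns)
    n, k = len(y), len(names)
    rows = [[1.0] + [columns[name][i] for name in names] for i in range(n)]
    cols = list(zip(*rows))
    xtx_inv = inverse([[sum(p * q for p, q in zip(c1, c2)) for c2 in cols]
                       for c1 in cols])
    xty = [sum(p * q for p, q in zip(c, y)) for c in cols]
    b = [sum(v * w for v, w in zip(row, xty)) for row in xtx_inv]
    sse = sum((yi - sum(bj * xj for bj, xj in zip(b, row))) ** 2
              for yi, row in zip(y, rows))
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)
    adj_r2 = 1 - (n - 1) / (n - k - 1) * (sse / sst)
    mse = sse / (n - k - 1)
    # t statistic of each slope: estimate divided by its standard error.
    t = {name: b[j + 1] / (mse * xtx_inv[j + 1][j + 1]) ** 0.5
         for j, name in enumerate(names)}
    return adj_r2, t


def backward_eliminate(columns, y):
    """Drop the weakest variable while doing so raises adjusted R²."""
    kept = dict(columns)
    adj_r2, t = fit(kept, y)
    while len(kept) > 1:
        weakest = min(t, key=lambda name: abs(t[name]))  # highest p-value
        trial = {name: col for name, col in kept.items() if name != weakest}
        trial_adj_r2, trial_t = fit(trial, y)
        if trial_adj_r2 <= adj_r2:
            break  # adjusted R² did not go up: keep the current model
        kept, adj_r2, t = trial, trial_adj_r2, trial_t
    return list(kept), adj_r2


# Hypothetical data: y depends on x1 and x2 only; x3 has no effect by construction.
design = list(product([-1.0, 1.0], repeat=3))
data = {f"x{j + 1}": [row[j] for row in design] for j in range(3)}
y = [5 + 2 * a + 3 * b + 0.5 * a * b * c for a, b, c in design]

kept, adj_r2 = backward_eliminate(data, y)
print(kept, round(adj_r2, 4))  # ['x1', 'x2'] 0.9736
```

In this toy data set, removing x3 raises adjusted R² from about 0.967 to about 0.974, while removing x1 next would drop it to about 0.626, so the procedure stops with x1 and x2.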

Page 12: Lesson 14 - 3

Example 9 on pages 770 - 773

Page 13: Lesson 14 - 3

Summary and Homework

• Summary

– Given the appropriate conditions, we can perform inference on whether the slope and intercept are significantly different from 0

– We can also calculate confidence and prediction intervals to quantify the accuracy of our predictions of the response variable y

– Multiple regression models are models where more than one explanatory variable is considered

• Homework

– pg 774 - 782: 1, 3, 4, 6, 8, 17