multiple regression (1)

59
Slide 1 Shakeel Nouman M.Phil Statistics Multiple Regression (1) Multiple Regression (1) By Shakeel Nouman M.Phil Statistics Govt. College University Lahore, Statistical Officer

Upload: shakeel-nouman

Post on 17-Feb-2017


Category:

Technology



TRANSCRIPT

Page 1: Multiple regression (1)

Slide 1

Multiple Regression (1) By Shakeel Nouman M.Phil Statistics Govt. College University Lahore, Statistical Officer

Shakeel Nouman, M.Phil Statistics

Multiple Regression (1)

Page 2: Multiple regression (1)

Slide 2

11 Multiple Regression (1)

• Using Statistics
• The k-Variable Multiple Regression Model
• The F Test of a Multiple Regression Model
• How Good is the Regression
• Tests of the Significance of Individual Regression Parameters
• Testing the Validity of the Regression Model
• Using the Multiple Regression Model for Prediction


Page 3: Multiple regression (1)

Slide 3

11 Multiple Regression (2)

• Qualitative Independent Variables
• Polynomial Regression
• Nonlinear Models and Transformations
• Multicollinearity
• Residual Autocorrelation and the Durbin-Watson Test
• Partial F Tests and Variable Selection Methods
• The Matrix Approach to Multiple Regression Analysis
• Summary and Review of Terms


Page 4: Multiple regression (1)

Slide 4

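11-1 Using Statistics

Lines: Any two points (A and B), or an intercept and slope ($\beta_0$ and $\beta_1$), define a line on a two-dimensional surface.

Planes: Any three points (A, B, and C), or an intercept and coefficients of $x_1$ and $x_2$ ($\beta_0$, $\beta_1$, and $\beta_2$), define a plane in a three-dimensional space.

[Figure: a line through points A and B in the (x, y) plane, with intercept $\beta_0$ and slope $\beta_1$; a plane through points A, B, and C over the ($x_1$, $x_2$) axes]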


Page 5: Multiple regression (1)

Slide 5

11-2 The k-Variable Multiple Regression Model

The population regression model of a dependent variable, Y, on a set of k independent variables, $X_1, X_2, \ldots, X_k$, is given by:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$$

where $\beta_0$ is the Y-intercept of the regression surface and each $\beta_i$, $i = 1, 2, \ldots, k$, is the slope of the regression surface (sometimes called the response surface) with respect to $X_i$.

[Figure: the regression plane $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$ over the ($x_1$, $x_2$) axes, with intercept $\beta_0$ and slopes $\beta_1$ and $\beta_2$]

Model assumptions:
1. $\varepsilon \sim N(0, \sigma^2)$, independent of other errors.
2. The variables $X_i$ are uncorrelated with the error term.


Page 6: Multiple regression (1)

Slide 6

Simple and Multiple Least-Squares Regression

In a simple regression model, the least-squares estimators minimize the sum of squared errors from the estimated regression line:

$$\hat{y} = b_0 + b_1 x$$

In a multiple regression model, the least-squares estimators minimize the sum of squared errors from the estimated regression plane:

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2$$


Page 7: Multiple regression (1)

Slide 7

The Estimated Regression Relationship

The estimated regression relationship:

$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k$$

where $\hat{Y}$ is the predicted value of Y, the value lying on the estimated regression surface. The terms $b_0, \ldots, b_k$ are the least-squares estimates of the population regression parameters $\beta_i$.

The actual, observed value of Y is the predicted value plus an error:

$$y_j = b_0 + b_1 x_{1j} + b_2 x_{2j} + \cdots + b_k x_{kj} + e_j$$


Page 8: Multiple regression (1)

Slide 8

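Least-Squares Estimation: The 2-Variable Normal Equations

Minimizing the sum of squared errors with respect to the estimated coefficients $b_0$, $b_1$, and $b_2$ yields the following normal equations:

$$\sum y = n b_0 + b_1 \sum x_1 + b_2 \sum x_2$$
$$\sum x_1 y = b_0 \sum x_1 + b_1 \sum x_1^2 + b_2 \sum x_1 x_2$$
$$\sum x_2 y = b_0 \sum x_2 + b_1 \sum x_1 x_2 + b_2 \sum x_2^2$$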


Page 9: Multiple regression (1)

Slide 9

Example 11-1

   Y    X1    X2   X1X2   X1^2   X2^2    X1Y    X2Y
  72    12     5     60    144     25    864    360
  76    11     8     88    121     64    836    608
  78    15     6     90    225     36   1170    468
  70    10     5     50    100     25    700    350
  68    11     3     33    121      9    748    204
  80    16     9    144    256     81   1280    720
  82    14    12    168    196    144   1148    984
  65     8     4     32     64     16    520    260
  62     8     3     24     64      9    496    186
  90    18    10    180    324    100   1620    900
 ---   ---   ---    ---   ----    ---   ----   ----
 743   123    65    869   1615    509   9382   5040

Normal Equations:

743 = 10 b0 + 123 b1 + 65 b2
9382 = 123 b0 + 1615 b1 + 869 b2
5040 = 65 b0 + 869 b1 + 509 b2

Solution:

b0 = 47.164942
b1 = 1.5990404
b2 = 1.1487479

Estimated regression equation:

$$\hat{Y} = 47.164942 + 1.5990404 X_1 + 1.1487479 X_2$$
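The three normal equations of Example 11-1 form a 3-by-3 linear system, so the estimates can be checked numerically. The following is a minimal sketch using NumPy; the variable names `A` and `rhs` are illustrative and not part of the original slides.

```python
import numpy as np

# Coefficient matrix and right-hand side of the three normal equations
# from Example 11-1 (unknowns ordered as b0, b1, b2).
A = np.array([
    [10.0, 123.0, 65.0],
    [123.0, 1615.0, 869.0],
    [65.0, 869.0, 509.0],
])
rhs = np.array([743.0, 9382.0, 5040.0])

# Solve A @ [b0, b1, b2] = rhs for the least-squares estimates.
b0, b1, b2 = np.linalg.solve(A, rhs)
print(f"b0 = {b0:.6f}, b1 = {b1:.7f}, b2 = {b2:.7f}")
```

Solving the system directly avoids forming an explicit inverse, which is usually the more numerically stable choice.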


Page 10: Multiple regression (1)

Slide 10

Example 11-1: Using the Template

Regression results for Alka-Seltzer sales


Page 11: Multiple regression (1)

Slide 11

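Decomposition of the Total Deviation in a Multiple Regression Model

Total deviation: $Y - \bar{Y}$
Regression deviation: $\hat{Y} - \bar{Y}$
Error deviation: $Y - \hat{Y}$

Total Deviation = Regression Deviation + Error Deviation
SST = SSR + SSE

[Figure: a data point above the regression plane over the ($x_1$, $x_2$) axes, with its total deviation split into regression and error deviations]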


Page 12: Multiple regression (1)

Slide 12

11-3 The F Test of a Multiple Regression Model

A statistical test for the existence of a linear relationship between Y and any or all of the independent variables $X_1, X_2, \ldots, X_k$:

$H_0$: $\beta_1 = \beta_2 = \cdots = \beta_k = 0$
$H_1$: Not all the $\beta_i$ ($i = 1, 2, \ldots, k$) are 0

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square             F Ratio
Regression            SSR              k                    MSR = SSR/k             F = MSR/MSE
Error                 SSE              n-(k+1)              MSE = SSE/(n-(k+1))
Total                 SST              n-1                  MST = SST/(n-1)


Page 13: Multiple regression (1)

Slide 13

Using the Template: Analysis of Variance Table (Example 11-1)

The test statistic, F = 86.34, is greater than the critical point of F(2, 7) for any common level of significance (p-value ≈ 0), so the null hypothesis is rejected, and we conclude that the dependent variable is related to one or more of the independent variables.

[Figure: F Distribution with 2 and 7 Degrees of Freedom, showing the critical point F0.01 = 9.55 (α = 0.01) and the test statistic 86.34 far in the right tail]
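The F ratio for Example 11-1 can be recomputed from the raw data in the example's table. The sketch below assumes NumPy and fits the model with `np.linalg.lstsq`; the critical value 9.55 is the F0.01 point with 2 and 7 degrees of freedom quoted on the slide, and all variable names are illustrative.

```python
import numpy as np

# Data of Example 11-1 (Y, X1, X2 columns of the table).
y = np.array([72, 76, 78, 70, 68, 80, 82, 65, 62, 90], dtype=float)
x1 = np.array([12, 11, 15, 10, 11, 16, 14, 8, 8, 18], dtype=float)
x2 = np.array([5, 8, 6, 5, 3, 9, 12, 4, 3, 10], dtype=float)

n, k = len(y), 2
X = np.column_stack([np.ones(n), x1, x2])   # design matrix with intercept
b, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares estimates
y_hat = X @ b

sst = np.sum((y - y.mean()) ** 2)           # total sum of squares
sse = np.sum((y - y_hat) ** 2)              # error sum of squares
ssr = sst - sse                             # regression sum of squares

msr = ssr / k
mse = sse / (n - (k + 1))
f_ratio = msr / mse

f_crit = 9.55  # F(2, 7) critical point at alpha = 0.01, as quoted on the slide
print(f"F = {f_ratio:.2f}, reject H0 at 0.01: {f_ratio > f_crit}")
```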


Page 14: Multiple regression (1)

Slide 14

11-4 How Good is the Regression

The mean square error is an unbiased estimator of the variance of the population errors $\varepsilon$, denoted by $\sigma^2$:

$$MSE = \frac{SSE}{n-(k+1)} = \frac{\sum (y - \hat{y})^2}{n-(k+1)}$$

Standard error of estimate:

$$s = \sqrt{MSE}$$

The multiple coefficient of determination, $R^2$, measures the proportion of the variation in the dependent variable that is explained by the combination of the independent variables in the multiple regression model:

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

[Figure: data points around the regression plane over the ($x_1$, $x_2$) axes; errors: $y - \hat{y}$]


Page 15: Multiple regression (1)

Slide 15

Decomposition of the Sum of Squares and the Adjusted Coefficient of Determination

SST = SSR + SSE

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

The adjusted multiple coefficient of determination, $\bar{R}^2$, is the coefficient of determination with the SSE and SST divided by their respective degrees of freedom:

$$\bar{R}^2 = 1 - \frac{SSE/(n-(k+1))}{SST/(n-1)}$$

Example 11-1: s = 1.911, R-sq = 96.1%, R-sq(adj) = 95.0%
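The three summary figures quoted for Example 11-1 can be reproduced from the data. This is a minimal sketch, assuming NumPy; the names `r2`, `r2_adj`, and `s` are illustrative.

```python
import numpy as np

# Data of Example 11-1.
y = np.array([72, 76, 78, 70, 68, 80, 82, 65, 62, 90], dtype=float)
x1 = np.array([12, 11, 15, 10, 11, 16, 14, 8, 8, 18], dtype=float)
x2 = np.array([5, 8, 6, 5, 3, 9, 12, 4, 3, 10], dtype=float)

n, k = len(y), 2
X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

sse = np.sum((y - X @ b) ** 2)
sst = np.sum((y - y.mean()) ** 2)

s = np.sqrt(sse / (n - (k + 1)))                       # standard error of estimate
r2 = 1 - sse / sst                                     # coefficient of determination
r2_adj = 1 - (sse / (n - (k + 1))) / (sst / (n - 1))   # adjusted for degrees of freedom

print(f"s = {s:.3f}, R-sq = {100 * r2:.1f}%, R-sq(adj) = {100 * r2_adj:.1f}%")
```

The adjusted value is always at most the unadjusted one, because the error sum of squares is divided by the smaller number of degrees of freedom.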


Page 16: Multiple regression (1)

Slide 16

Measures of Performance in Multiple Regression and the ANOVA Table

Source of Variation   Sum of Squares   Degrees of Freedom    Mean Square             F Ratio
Regression            SSR              k                     MSR = SSR/k             F = MSR/MSE
Error                 SSE              n-(k+1) = n-k-1       MSE = SSE/(n-(k+1))
Total                 SST              n-1                   MST = SST/(n-1)

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

$$\bar{R}^2 = 1 - \frac{SSE/(n-(k+1))}{SST/(n-1)} = 1 - \frac{MSE}{MST}$$

$$F = \frac{R^2}{1 - R^2} \cdot \frac{n-(k+1)}{k}$$


Page 17: Multiple regression (1)

Slide 17

11-5 Tests of the Significance of Individual Regression Parameters

Hypothesis tests about individual regression slope parameters:

(1) $H_0$: $\beta_1 = 0$, $H_1$: $\beta_1 \neq 0$
(2) $H_0$: $\beta_2 = 0$, $H_1$: $\beta_2 \neq 0$
. . .
(k) $H_0$: $\beta_k = 0$, $H_1$: $\beta_k \neq 0$

Test statistic for test i:

$$t_{(n-(k+1))} = \frac{b_i - 0}{s(b_i)}$$


Page 18: Multiple regression (1)

Slide 18

Regression Results for Individual Parameters

Variable    Coefficient Estimate   Standard Error   t-Statistic
Constant           53.12                5.43            9.783 *
X1                  2.03                0.22            9.227 *
X2                  5.60                1.30            4.308 *
X3                 10.35                6.88            1.504
X4                  3.45                2.70            1.259
X5                 -4.25                0.38          -11.184 *

n = 150, t0.025 = 1.96. An asterisk marks a coefficient whose t statistic exceeds 1.96 in absolute value.
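The asterisks in the table follow from comparing each t statistic, in absolute value, with the critical point 1.96. The sketch below applies that rule to the printed t statistics; the dictionary name `t_stats` is illustrative, and the t statistic for X5 is taken as negative because its coefficient estimate is negative.

```python
# Printed t statistics from the table (X5 taken as negative, matching the
# sign of its coefficient estimate).
t_stats = {
    "Constant": 9.783,
    "X1": 9.227,
    "X2": 4.308,
    "X3": 1.504,
    "X4": 1.259,
    "X5": -11.184,
}
t_crit = 1.96  # two-tailed critical point at alpha = 0.05 for large n

# A coefficient is flagged as significant when |t| exceeds the critical point.
significant = {name: abs(t) > t_crit for name, t in t_stats.items()}
for name, flag in significant.items():
    print(f"{name:9s} t = {t_stats[name]:8.3f}  significant: {flag}")
```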


Page 19: Multiple regression (1)

Slide 19

Example 11-1: Using the Template

Regression results for Alka-Seltzer sales


Page 20: Multiple regression (1)

Slide 20

Using the Template: Example 11-2

Regression results for Exports to Singapore


Page 21: Multiple regression (1)

Slide 21

11-6 Testing the Validity of the Regression Model: Residual Plots

Residuals vs M1

It appears that the residuals are randomly distributed with no pattern and with equal variance as M1 increases.


Page 22: Multiple regression (1)

Slide 22

11-6 Testing the Validity of the Regression Model: Residual Plots

Residuals vs Price

It appears that the spread of the residuals increases as Price increases; the variance of the residuals is not constant.


Page 23: Multiple regression (1)

Slide 23

Normal Probability Plot for the Residuals: Example 11-2

An approximately linear trend in the plot indicates that the residuals are approximately normally distributed.


Page 24: Multiple regression (1)

Slide 24

Investigating the Validity of the Regression: Outliers and Influential Observations

[Figure, panel "Outliers": scatter of y against x with one outlier (*), showing the regression line with the outlier and the regression line without the outlier]

[Figure, panel "Influential Observations": a cluster of points with no relationship, plus one point (*) with a large value of x; the regression line shown is the one obtained when all data are included]


Page 25: Multiple regression (1)

Slide 25

Outliers and Influential Observations: Example 11-2

Unusual Observations
Obs.     M1   EXPORTS      Fit   Stdev.Fit   Residual   St.Resid
   1   5.10    2.6000   2.6420      0.1288    -0.0420      -0.14 X
   2   4.90    2.6000   2.6438      0.1234    -0.0438      -0.14 X
  25   6.20    5.5000   4.5949      0.0676     0.9051       2.80 R
  26   6.30    3.7000   4.6311      0.0651    -0.9311      -2.87 R
  50   8.30    4.3000   5.1317      0.0648    -0.8317      -2.57 R
  67   8.20    5.6000   4.9474      0.0668     0.6526       2.02 R

R denotes an obs. with a large st. resid.
X denotes an obs. whose X value gives it large influence.


Page 26: Multiple regression (1)

Slide 26

11-7 Using the Multiple Regression Model for Prediction

[Figure: Estimated Regression Plane for Example 11-1, plotting Sales (63.42 to 89.76) against Advertising (8.00 to 18.00) and Promotions (3 to 12)]


Page 27: Multiple regression (1)

Slide 27

Prediction in Multiple Regression

A $(1 - \alpha)100\%$ prediction interval for a value of Y given values of $X_i$:

$$\hat{y} \pm t_{(\alpha/2,\,(n-(k+1)))} \sqrt{s^2(\hat{y}) + MSE}$$

A $(1 - \alpha)100\%$ prediction interval for the conditional mean of Y given values of $X_i$:

$$\hat{y} \pm t_{(\alpha/2,\,(n-(k+1)))}\, s[\hat{E}(Y)]$$
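As an illustration of the two intervals, the sketch below uses the data of Example 11-1 and a hypothetical new point (X1 = 10, X2 = 5) that does not appear in the slides. It assumes that the squared standard error of the fitted value is computed as MSE times $x_0'(X'X)^{-1}x_0$, and it hard-codes the tabulated value $t_{0.025}$ = 2.365 for 7 degrees of freedom instead of calling a statistics library.

```python
import numpy as np

# Data of Example 11-1.
y = np.array([72, 76, 78, 70, 68, 80, 82, 65, 62, 90], dtype=float)
x1 = np.array([12, 11, 15, 10, 11, 16, 14, 8, 8, 18], dtype=float)
x2 = np.array([5, 8, 6, 5, 3, 9, 12, 4, 3, 10], dtype=float)

n, k = len(y), 2
X = np.column_stack([np.ones(n), x1, x2])
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
mse = np.sum((y - X @ b) ** 2) / (n - (k + 1))

# Hypothetical new observation: intercept term, X1 = 10, X2 = 5.
x0 = np.array([1.0, 10.0, 5.0])
y0_hat = x0 @ b

# Assumed form of the squared standard error of the fitted value.
s2_fit = mse * (x0 @ XtX_inv @ x0)

t_crit = 2.365  # tabulated t(0.025) with n-(k+1) = 7 degrees of freedom

half_mean = t_crit * np.sqrt(s2_fit)         # interval for the conditional mean
half_pred = t_crit * np.sqrt(s2_fit + mse)   # interval for a single new value

print(f"y_hat = {y0_hat:.2f}")
print(f"95% interval for E(Y): +/- {half_mean:.2f}")
print(f"95% interval for Y:    +/- {half_pred:.2f}")
```

The interval for a single value is always wider than the interval for the conditional mean, because it adds the error variance MSE under the square root.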


Page 28: Multiple regression (1)

Slide 28

11-8 Qualitative (or Categorical) Independent Variables (in Regression)

An indicator (dummy, binary) variable of qualitative level A:

$$X_h = \begin{cases} 1 & \text{if level A is obtained} \\ 0 & \text{if level A is not obtained} \end{cases}$$

Example 11-3

MOVIE   EARN   COST   PROM   BOOK
    1     28    4.2    1.0      0
    2     35    6.0    3.0      1
    3     50    5.5    6.0      1
    4     20    3.3    1.0      0
    5     75   12.5   11.0      1
    6     60    9.6    8.0      1
    7     15    2.5    0.5      0
    8     45   10.8    5.0      0
    9     50    8.4    3.0      1
   10     34    6.6    2.0      0
   11     48   10.7    1.0      1
   12     82   11.0   15.0      1
   13     24    3.5    4.0      0
   14     50    6.9   10.0      0
   15     58    7.8    9.0      1
   16     63   10.1   10.0      0
   17     30    5.0    1.0      1
   18     37    7.5    5.0      0
   19     45    6.4    8.0      1
   20     72   10.0   12.0      1

Page 29: Multiple regression (1)

Slide 29

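Picturing Qualitative Variables in Regression

A regression with one quantitative variable (X1) and one qualitative variable (X2):

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2$$

[Figure: two parallel lines of Y against X1. Line for X2 = 0, with intercept b0; line for X2 = 1, with intercept b0+b2.]

A multiple regression with two quantitative variables (X1 and X2) and one qualitative variable (X3):

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3$$

[Figure: two parallel planes over the ($x_1$, $x_2$) axes, separated vertically by b3]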


Page 30: Multiple regression (1)

Slide 30

Picturing Qualitative Variables in Regression: Three Categories and Two Dummy Variables

A regression with one quantitative variable (X1) and two qualitative variables (X2 and X3):

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3$$

A qualitative variable with r levels or categories is represented with (r-1) 0/1 (dummy) variables.

Category     X2   X3
Adventure     0    0
Drama         0    1
Romance       1    0

[Figure: three parallel lines of Y against X1. Line for X2 = 0 and X3 = 0, with intercept b0; line for X2 = 1 and X3 = 0, with intercept b0+b2; line for X2 = 0 and X3 = 1, with intercept b0+b3.]
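The (r-1) coding rule can be sketched as a small helper. The function name `dummy_code` and the ordering of the `levels` list are assumptions made for illustration; the first level is treated as the baseline, and the remaining levels are ordered so that the output matches the (X2, X3) columns of the category table.

```python
def dummy_code(category, levels):
    """Return r-1 indicator values (0 or 1) for a variable with r levels.

    The first entry of `levels` is the baseline and maps to all zeros.
    """
    if category not in levels:
        raise ValueError(f"unknown category: {category!r}")
    return [1 if category == level else 0 for level in levels[1:]]


# Baseline first; then the level indicated by X2, then the level indicated by X3.
levels = ["Adventure", "Romance", "Drama"]

for movie_type in ["Adventure", "Drama", "Romance"]:
    x2, x3 = dummy_code(movie_type, levels)
    print(f"{movie_type:10s} X2 = {x2}  X3 = {x3}")
```

Using r-1 indicators rather than r avoids a column that is an exact linear combination of the intercept and the other indicators.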


Page 31: Multiple regression (1)

Slide 31

Using Qualitative Variables in Regression: Example 11-4

Salary =  8547   + 949 Education + 1258 Experience - 3256 Gender
(SE)     (32.6)     (45.1)           (78.5)           (212.4)
(t)      (262.2)    (21.0)           (16.0)           (-15.3)

Gender = 1 if Female, 0 if Male

On average, female salaries are $3256 below male salaries, holding education and experience constant.


Page 32: Multiple regression (1)

Slide 32

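Interactions between Quantitative and Qualitative Variables: Shifting Slopes

A regression with interaction between a quantitative variable (X1) and a qualitative variable (X2):

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_1 x_2$$

[Figure: two non-parallel lines of Y against X1. Line for X2 = 0, with intercept b0 and slope b1; line for X2 = 1, with intercept b0+b2 and slope b1+b3.]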


Page 33: Multiple regression (1)

Slide 33

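11-9 Polynomial Regression

One-variable polynomial regression model:

$$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \cdots + \beta_m X^m + \varepsilon$$

where m is the degree of the polynomial, the highest power of X appearing in the equation. The degree of the polynomial is the order of the model.

[Figure, two panels plotting Y against X1. First panel: the line $\hat{y} = b_0 + b_1 X$ and the parabola $\hat{y} = b_0 + b_1 X + b_2 X^2$ ($b_2 < 0$). Second panel: the line $\hat{y} = b_0 + b_1 X$ and the cubic $\hat{y} = b_0 + b_1 X + b_2 X^2 + b_3 X^3$.]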
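A polynomial model is still linear in its coefficients, so it can be fitted by ordinary least squares on the columns 1, X, X², and so on. The sketch below uses hypothetical noiseless data generated from a known quadratic so that the recovered coefficients can be checked; none of the numbers come from the slides.

```python
import numpy as np

# Hypothetical data from a known quadratic, y = 2 + 3x - 0.5x^2, with no noise.
x = np.arange(0.0, 8.0)
y = 2.0 + 3.0 * x - 0.5 * x ** 2

m = 2  # degree (order) of the polynomial model

# Design matrix with columns 1, x, x^2, ..., x^m.
X = np.vander(x, m + 1, increasing=True)

# Ordinary least squares on the polynomial terms.
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print("b0, b1, b2 =", np.round(b, 6))
```

With real, noisy data the estimates would only approximate the underlying coefficients, and high-degree terms tend to be strongly correlated with one another.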


Page 34: Multiple regression (1)

Slide 34

Polynomial Regression: Example 11-5


Page 35: Multiple regression (1)

Slide 35

Polynomial Regression: Other Variables and Cross-Product Terms

Variable   Estimate   Standard Error   T-statistic
X1           2.34          0.92            2.54
X2           3.11          1.05            2.96
X1^2         4.22          1.00            4.22
X2^2         3.57          2.12            1.68
X1X2         2.77          2.30            1.20


Page 36: Multiple regression (1)

Slide 36

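11-10 Nonlinear Models and Transformations: Multiplicative Model

The multiplicative model:

$$Y = \beta_0 X_1^{\beta_1} X_2^{\beta_2} X_3^{\beta_3} \varepsilon$$

The logarithmic transformation:

$$\log Y = \log \beta_0 + \beta_1 \log X_1 + \beta_2 \log X_2 + \beta_3 \log X_3 + \log \varepsilon$$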
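After taking logarithms, the multiplicative model becomes an ordinary linear regression of log Y on the logs of the X variables. The sketch below illustrates this with hypothetical, noiseless data generated from chosen parameter values (2.0, 0.7, -0.3, 1.5), which are assumptions for the example and not figures from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical positive predictors and chosen "true" parameters.
x1 = rng.uniform(1.0, 10.0, size=50)
x2 = rng.uniform(1.0, 10.0, size=50)
x3 = rng.uniform(1.0, 10.0, size=50)
beta0, beta1, beta2, beta3 = 2.0, 0.7, -0.3, 1.5

# Multiplicative model without the error term, so the fit is exact.
y = beta0 * x1 ** beta1 * x2 ** beta2 * x3 ** beta3

# Regress log Y on log X1, log X2, log X3.
X = np.column_stack([np.ones_like(x1), np.log(x1), np.log(x2), np.log(x3)])
coef, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)

beta0_hat = np.exp(coef[0])   # the intercept estimates log(beta0)
print("beta0:", round(beta0_hat, 4), "exponents:", np.round(coef[1:], 4))
```

Note that the intercept of the transformed regression estimates log β0, so it has to be exponentiated to return to the original scale.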


Page 37: Multiple regression (1)

Slide 37

Transformations: Exponential Model

The exponential model:

$$Y = \beta_0 e^{\beta_1 X_1} \varepsilon$$

The logarithmic transformation:

$$\log Y = \log \beta_0 + \beta_1 X_1 + \log \varepsilon$$


Page 38: Multiple regression (1)

Slide 38

Plots of Transformed Variables

[Figure: Simple Regression of Sales on Advertising (SALES against ADVERT); Y = 6.59271 + 1.19176 X, R-Squared = 0.895]

[Figure: Regression of Log(Sales) on Log(Advertising) (LOGSALE against LOGADV); Y = 1.70082 + 0.553136 X, R-Squared = 0.947]

[Figure: Regression of Sales on Log(Advertising) (SALES against LOGADV); Y = 3.66825 + 6.784 X, R-Squared = 0.978]

[Figure: Residual Plots: Sales vs Log(Advertising) (RESIDS against Y-HAT)]


Page 39: Multiple regression (1)

Slide 39

Variance Stabilizing Transformations

• Square root transformation: $Y' = \sqrt{Y}$
  Useful when the variance of the regression errors is approximately proportional to the conditional mean of Y
• Logarithmic transformation: $Y' = \log(Y)$
  Useful when the variance of the regression errors is approximately proportional to the square of the conditional mean of Y
• Reciprocal transformation: $Y' = 1/Y$
  Useful when the variance of the regression errors is approximately proportional to the fourth power of the conditional mean of Y


Page 40: Multiple regression (1)

Slide 40

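Regression with Dependent Indicator Variables

The logistic function:

$$E(Y \mid X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}$$

Transformation to linearize the logistic function:

$$p' = \log\left(\frac{p}{1 - p}\right)$$

[Figure: Logistic Function, an S-shaped curve of y against x rising from 0 to 1]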
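The linearizing transformation can be checked numerically: applying the log-odds transformation to values of the logistic function returns the linear predictor β0 + β1X. The parameter values 0.5 and 1.2 below are hypothetical and chosen only for the demonstration.

```python
import numpy as np


def logistic(x, b0, b1):
    """Logistic function: e^(b0 + b1*x) / (1 + e^(b0 + b1*x))."""
    z = b0 + b1 * x
    return np.exp(z) / (1.0 + np.exp(z))


def logit(p):
    """Log-odds transformation p' = log(p / (1 - p))."""
    return np.log(p / (1.0 - p))


b0, b1 = 0.5, 1.2              # hypothetical parameter values
x = np.linspace(-3.0, 3.0, 7)
p = logistic(x, b0, b1)        # probabilities strictly between 0 and 1

# The transformed values lie on the straight line b0 + b1*x.
print(np.round(logit(p), 6))
print(np.round(b0 + b1 * x, 6))
```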


Page 41: Multiple regression (1)

Slide 41

11-11 Multicollinearity

[Figure: four pairs of vectors x1 and x2 illustrating different degrees of collinearity]

• Orthogonal X variables provide information from independent sources. No multicollinearity.
• Perfectly collinear X variables provide identical information content. No regression.
• Some degree of collinearity. Problems with regression depend on the degree of collinearity.
• A high degree of negative collinearity also causes problems with regression.


Page 42: Multiple regression (1)

Slide 42

Effects of Multicollinearity

• Variances of regression coefficients are inflated.
• Magnitudes of regression coefficients may be different from what is expected.
• Signs of regression coefficients may not be as expected.
• Adding or removing variables produces large changes in coefficients.
• Removing a data point may cause large changes in coefficient estimates or signs.
• In some cases, the F ratio may be significant while the t ratios are not.


Page 43: Multiple regression (1)

Slide 43

Detecting the Existence of Multicollinearity: Correlation Matrix of Independent Variables and Variance Inflation Factors


Page 44: Multiple regression (1)

Slide 44

Variance Inflation Factor

The variance inflation factor associated with $X_h$:

$$VIF(X_h) = \frac{1}{1 - R_h^2}$$

where $R_h^2$ is the $R^2$ value obtained for the regression of $X_h$ on the other independent variables.

[Figure: Relationship between VIF and $R_h^2$, with VIF (0 to 100) plotted against $R_h^2$ (0.0 to 1.0)]
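The definition translates directly into code: regress each independent variable on the others, take the resulting R², and apply 1/(1 - R²). The sketch below assumes NumPy; the function names `vif` and `vif_from_data` and the small two-column data set are hypothetical.

```python
import numpy as np


def vif(r_squared_h):
    """Variance inflation factor for a given R_h^2 (must satisfy 0 <= R_h^2 < 1)."""
    if not 0.0 <= r_squared_h < 1.0:
        raise ValueError("R_h^2 must lie in [0, 1)")
    return 1.0 / (1.0 - r_squared_h)


def vif_from_data(X):
    """VIF of each column of X, regressing it on the remaining columns plus an intercept."""
    n, k = X.shape
    result = []
    for h in range(k):
        target = X[:, h]
        others = np.column_stack([np.ones(n), np.delete(X, h, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        sse = np.sum((target - others @ coef) ** 2)
        sst = np.sum((target - target.mean()) ** 2)
        result.append(vif(1.0 - sse / sst))
    return result


# Hypothetical predictors that are uncorrelated with each other.
X = np.array([[1.0, 1.0], [2.0, -1.0], [3.0, 0.0], [4.0, -1.0], [5.0, 1.0]])
print(vif(0.8), vif(0.9), vif_from_data(X))
```

Uncorrelated predictors give a VIF of 1, and the factor grows without bound as $R_h^2$ approaches 1.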


Page 45: Multiple regression (1)

Slide 45

Variance Inflation Factor (VIF)

Observation: The VIF (Variance Inflation Factor) values for the variables Lend and Price are both greater than 5. This indicates that some degree of multicollinearity exists with respect to these two variables.

Page 46: Multiple regression (1)

Slide 46

Solutions to the Multicollinearity Problem

• Drop a collinear variable from the regression
• Change the sampling plan to include elements outside the multicollinearity range
• Transformations of variables
• Ridge regression


Page 47: Multiple regression (1)

Slide 47

11-12 Residual Autocorrelation and the Durbin-Watson Test

An autocorrelation is a correlation of the values of a variable with values of the same variable lagged one or more periods back. Consequences of autocorrelation include inaccurate estimates of variances and inaccurate predictions.

Lagged Residuals

  i    e_i   e_(i-1)   e_(i-2)   e_(i-3)   e_(i-4)
  1    1.0      *         *         *         *
  2    0.0     1.0        *         *         *
  3   -1.0     0.0       1.0        *         *
  4    2.0    -1.0       0.0       1.0        *
  5    3.0     2.0      -1.0       0.0       1.0
  6   -2.0     3.0       2.0      -1.0       0.0
  7    1.0    -2.0       3.0       2.0      -1.0
  8    1.5     1.0      -2.0       3.0       2.0
  9    1.0     1.5       1.0      -2.0       3.0
 10   -2.5     1.0       1.5       1.0      -2.0

The Durbin-Watson test (first-order autocorrelation):

$H_0$: $\rho_1 = 0$
$H_1$: $\rho_1 \neq 0$

The Durbin-Watson test statistic:

$$d = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}$$
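As a worked illustration, the Durbin-Watson statistic can be computed for the ten residuals listed in the lagged-residuals table. This is a minimal sketch assuming NumPy; the table is used here only as convenient input, and the slides do not report a value of d for it.

```python
import numpy as np

# Residuals e_1, ..., e_10 from the lagged-residuals table.
e = np.array([1.0, 0.0, -1.0, 2.0, 3.0, -2.0, 1.0, 1.5, 1.0, -2.5])

# Numerator: squared successive differences (i = 2, ..., n).
# Denominator: squared residuals (i = 1, ..., n).
numerator = np.sum(np.diff(e) ** 2)
denominator = np.sum(e ** 2)
d = numerator / denominator

print(f"sum of squared differences = {numerator}, sum of squares = {denominator}")
print(f"d = {d:.4f}")
```

Values of d near 2 are consistent with no first-order autocorrelation; values toward 0 point to positive autocorrelation and values toward 4 to negative autocorrelation.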


Page 48: Multiple regression (1)

Slide 48

Critical Points of the Durbin-Watson Statistic: α = 0.05, n = Sample Size, k = Number of Independent Variables

          k = 1         k = 2         k = 3         k = 4         k = 5
   n    dL    dU      dL    dU      dL    dU      dL    dU      dL    dU
  15   1.08  1.36    0.95  1.54    0.82  1.75    0.69  1.97    0.56  2.21
  16   1.10  1.37    0.98  1.54    0.86  1.73    0.74  1.93    0.62  2.15
  17   1.13  1.38    1.02  1.54    0.90  1.71    0.78  1.90    0.67  2.10
  18   1.16  1.39    1.05  1.53    0.93  1.69    0.82  1.87    0.71  2.06
   .     .     .       .     .       .     .       .     .       .     .
  65   1.57  1.63    1.54  1.66    1.50  1.70    1.47  1.73    1.44  1.77
  70   1.58  1.64    1.55  1.67    1.52  1.70    1.49  1.74    1.46  1.77
  75   1.60  1.65    1.57  1.68    1.54  1.71    1.51  1.74    1.49  1.77
  80   1.61  1.66    1.59  1.69    1.56  1.72    1.53  1.74    1.51  1.77
  85   1.62  1.67    1.60  1.70    1.57  1.72    1.55  1.75    1.52  1.77
  90   1.63  1.68    1.61  1.70    1.59  1.73    1.57  1.75    1.54  1.78
  95   1.64  1.69    1.62  1.71    1.60  1.73    1.58  1.75    1.56  1.78
 100   1.65  1.69    1.63  1.72    1.61  1.74    1.59  1.76    1.57  1.78


Page 49: Multiple regression (1)

Slide 49

Using the Durbin-Watson Statistic

Range of d        Conclusion
0 to dL           Positive autocorrelation
dL to dU          Test is inconclusive
dU to 4-dU        No autocorrelation
4-dU to 4-dL      Test is inconclusive
4-dL to 4         Negative autocorrelation

For n = 67, k = 4: dU ≈ 1.73, 4-dU ≈ 2.27, dL ≈ 1.47, 4-dL ≈ 2.53. The computed statistic d = 2.58 exceeds 4-dL ≈ 2.53, so H0 is rejected, and we conclude there is negative first-order autocorrelation.
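The five regions of the decision rule can be written as a short function. The name `dw_decision` and the exact handling of boundary values are assumptions for illustration; the critical points 1.47 and 1.73 and the statistic 2.58 are the ones quoted for the example with n = 67 and k = 4.

```python
def dw_decision(d, d_lower, d_upper):
    """Classify a Durbin-Watson statistic d using critical points dL and dU."""
    if not 0.0 <= d <= 4.0:
        raise ValueError("the Durbin-Watson statistic lies between 0 and 4")
    if d < d_lower:
        return "positive autocorrelation"
    if d < d_upper:
        return "inconclusive"
    if d <= 4.0 - d_upper:
        return "no autocorrelation"
    if d <= 4.0 - d_lower:
        return "inconclusive"
    return "negative autocorrelation"


# Example from the slide: d = 2.58 with dL = 1.47 and dU = 1.73.
print(dw_decision(2.58, 1.47, 1.73))
```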


Page 50: Multiple regression (1)

Slide 50

11-13 Partial F Tests and Variable Selection Methods

Full model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \varepsilon$

Reduced model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$

Partial F test:

$H_0$: $\beta_3 = \beta_4 = 0$
$H_1$: $\beta_3$ and $\beta_4$ not both 0

Partial F statistic:

$$F_{(r,\,(n-(k+1)))} = \frac{(SSE_R - SSE_F)/r}{MSE_F}$$

where $SSE_R$ is the sum of squared errors of the reduced model, $SSE_F$ is the sum of squared errors of the full model, $MSE_F$ is the mean square error of the full model [$MSE_F = SSE_F/(n-(k+1))$], and r is the number of variables dropped from the full model.
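The partial F statistic is a short computation once the two error sums of squares are known. The sketch below wraps the formula in a hypothetical helper, `partial_f`; the input numbers are made up for illustration and do not come from any example in the slides.

```python
def partial_f(sse_reduced, sse_full, r, n, k):
    """Partial F statistic for dropping r variables from a full model.

    n is the sample size and k is the number of independent variables
    in the full model, so the full model has n - (k + 1) error degrees
    of freedom.
    """
    mse_full = sse_full / (n - (k + 1))
    return ((sse_reduced - sse_full) / r) / mse_full


# Hypothetical inputs: full model with k = 4 variables, r = 2 of them dropped.
f_stat = partial_f(sse_reduced=10.0, sse_full=6.0, r=2, n=20, k=4)
print(f"partial F = {f_stat:.3f} with (2, 15) degrees of freedom")
```

The statistic is compared with the critical point of the F distribution with r and n-(k+1) degrees of freedom.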


Page 51: Multiple regression (1)

Slide 51

Variable Selection Methods

• All possible regressions
  Run regressions with all possible combinations of independent variables and select the best model

A p-value of 0.001 indicates that we should reject the null hypothesis H0: the slopes for Lend and Exch. are zero.


Page 52: Multiple regression (1)

Slide 52

Variable Selection Methods

• Stepwise procedures
  Forward selection
    » Add one variable at a time to the model, on the basis of its F statistic
  Backward elimination
    » Remove one variable at a time, on the basis of its F statistic
  Stepwise regression
    » Adds variables to the model and subtracts variables from the model, on the basis of the F statistic


Page 53: Multiple regression (1)

Slide 53

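Stepwise Regression

[Flowchart]
• Compute F statistic for each variable not in the model
• Is there at least one variable with p-value > Pin? No: Stop. Yes: continue.
• Enter most significant (smallest p-value) variable into model
• Calculate partial F for all variables in the model
• Is there a variable with p-value > Pout? Yes: Remove variable. No: return to the first step.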


Page 54: Multiple regression (1)

Slide 54

Stepwise Regression: Using the Computer (MINITAB)

MTB > STEPWISE 'EXPORTS' PREDICTORS 'M1' 'LEND' 'PRICE' 'EXCHANGE'

Stepwise Regression

F-to-Enter: 4.00    F-to-Remove: 4.00

Response is EXPORTS on 4 predictors, with N = 67

Step             1         2
Constant    0.9348   -3.4230

M1           0.520     0.361
T-Ratio       9.89      9.21

PRICE                 0.0370
T-Ratio                 9.05

S            0.495     0.331
R-Sq         60.08     82.48


Page 55: Multiple regression (1)

Slide 55

Using the Computer: MINITAB

MTB > REGRESS 'EXPORTS' 4 'M1' 'LEND' 'PRICE' 'EXCHANGE';
SUBC> vif;
SUBC> dw.

Regression Analysis

The regression equation is
EXPORTS = - 4.02 + 0.368 M1 + 0.0047 LEND + 0.0365 PRICE + 0.27 EXCHANGE

Predictor       Coef      Stdev   t-ratio       p    VIF
Constant      -4.015      2.766     -1.45   0.152
M1           0.36846    0.06385      5.77   0.000    3.2
LEND         0.00470    0.04922      0.10   0.924    5.4
PRICE       0.036511   0.009326      3.91   0.000    6.3
EXCHANGE       0.268      1.175      0.23   0.820    1.4

s = 0.3358    R-sq = 82.5%    R-sq(adj) = 81.4%

Analysis of Variance

SOURCE       DF        SS       MS       F       p
Regression    4   32.9463   8.2366   73.06   0.000
Error        62    6.9898   0.1127
Total        66   39.9361

Durbin-Watson statistic = 2.58


Page 56: Multiple regression (1)

Slide 56

Using the Computer: SAS (continued)

Parameter Estimates

                  Parameter     Standard    T for H0:
Variable   DF      Estimate        Error   Parameter=0   Prob > |T|
INTERCEP    1     -4.015461   2.76640057        -1.452       0.1517
M1          1      0.368456   0.06384841         5.771       0.0001
LEND        1      0.004702   0.04922186         0.096       0.9242
PRICE       1      0.036511   0.00932601         3.915       0.0002
EXCHANGE    1      0.267896   1.17544016         0.228       0.8205

                   Variance
Variable   DF     Inflation
INTERCEP    1    0.00000000
M1          1    3.20719533
LEND        1    5.35391367
PRICE       1    6.28873181
EXCHANGE    1    1.38570639

Durbin-Watson D              2.583
(For Number of Obs.)            67
1st Order Autocorrelation   -0.321


Page 57: Multiple regression (1)

Slide 57

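11-15: The Matrix Approach to Regression Analysis (1)

The population regression model:

$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & x_{12} & x_{13} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & x_{23} & \cdots & x_{2k} \\ 1 & x_{31} & x_{32} & x_{33} & \cdots & x_{3k} \\ \vdots & \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & x_{n3} & \cdots & x_{nk} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \vdots \\ \varepsilon_n \end{bmatrix}$$

$$Y = X\beta + \varepsilon$$

The estimated regression model:

$$Y = Xb + e$$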


Page 58: Multiple regression (1)

Slide 58

The Matrix Approach to Regression Analysis (2)

The normal equations:

$$X'Xb = X'Y$$

Estimators:

$$b = (X'X)^{-1}X'Y$$

Predicted values:

$$\hat{Y} = Xb = X(X'X)^{-1}X'Y = HY$$

$$V(b) = \sigma^2 (X'X)^{-1}$$

$$s^2(b) = MSE\,(X'X)^{-1}$$
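The matrix formulas can be tried on the data of Example 11-1. The sketch below assumes NumPy and forms $(X'X)^{-1}$ explicitly to mirror the slide; in practice a solver such as `np.linalg.lstsq` is usually preferred for numerical stability. Variable names are illustrative.

```python
import numpy as np

# Data of Example 11-1.
y = np.array([72, 76, 78, 70, 68, 80, 82, 65, 62, 90], dtype=float)
x1 = np.array([12, 11, 15, 10, 11, 16, 14, 8, 8, 18], dtype=float)
x2 = np.array([5, 8, 6, 5, 3, 9, 12, 4, 3, 10], dtype=float)

X = np.column_stack([np.ones(len(y)), x1, x2])   # n x (k+1) design matrix
n, p = X.shape                                   # p = k + 1

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y            # estimators: b = (X'X)^-1 X'Y
H = X @ XtX_inv @ X.T            # hat matrix
y_hat = H @ y                    # predicted values: HY = Xb

mse = np.sum((y - y_hat) ** 2) / (n - p)
cov_b = mse * XtX_inv            # estimated covariance matrix s^2(b)
se_b = np.sqrt(np.diag(cov_b))   # standard errors of b0, b1, b2

print("b    =", np.round(b, 6))
print("s(b) =", np.round(se_b, 4))
```

The diagonal of $s^2(b)$ gives the squared standard errors used in the individual t tests.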


Page 59: Multiple regression (1)

Slide 59

M.Phil (Statistics)

GC University, . (Degree awarded by GC University)

M.Sc (Statistics) GC University, . (Degree awarded by GC University)

Statistical Officer (BS-17) (Economics & Marketing Division)

Livestock Production Research Institute Bahadurnagar (Okara), Livestock & Dairy Development

Department, Govt. of Punjab

Name: Shakeel Nouman
Religion: Christian
Domicile: Punjab (Lahore)
Contact #: 0332-4462527, 0321-9898767
