
Page 1:

Economics 173 Business Statistics

Lecture 20

Fall, 2001©

Professor J. Petry

http://www.cba.uiuc.edu/jpetry/Econ_173_fa01/

Page 2:

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.4924143
R Square             0.2424719
Adjusted R Square    0.20189
Standard Error       40.243529
Observations         60

ANOVA
              df    SS            MS           F          Significance F
Regression     3    29029.71625   9676.57208   5.974883   0.001315371
Residual      56    90694.33308   1619.54166
Total         59    119724.0493

              Coefficients   Standard Error   t Stat       P-value    Lower 95%      Upper 95%
Intercept     51.391216      23.51650385      2.18532554   0.033064   4.282029664    98.5004
Lot size      0.6999045      0.558855319      1.25238937   0.215633   -0.419616528   1.819425
Trees         0.6788131      0.229306132      2.96029204   0.0045     0.219458042    1.138168
Distance      -0.3783608     0.195236549      -1.9379609   0.057676   -0.769466342   0.012745

Page 3:

• Example – Vacation Homes (18.1)
1. What is the standard error of the estimate? Interpret its value.
2. What is the coefficient of determination? What does this statistic tell you?
3. What is the coefficient of determination, adjusted for degrees of freedom? Why does this value differ from the coefficient of determination? What does this tell you about the model?
=========================================================
1. Test the overall validity of the model. What does the p-value of the test statistic tell you?
2. Interpret each of the coefficients.
3. Test to determine whether each of the independent variables is linearly related to the price of the lot.
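
The quantities these questions ask about can also be pulled out of software other than Excel. A minimal sketch, assuming a hypothetical file "vacation_homes.csv" with illustrative column names "Price", "LotSize", "Trees" and "Distance" for the 60 observations (none of these names come from the lecture):

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical data file and column names; the lecture only shows the Excel output.
lots = pd.read_csv("vacation_homes.csv")

X = sm.add_constant(lots[["LotSize", "Trees", "Distance"]])
model = sm.OLS(lots["Price"], X).fit()

print(model.summary())                              # coefficients, t stats, p-values, F test
print("s_e =", model.mse_resid ** 0.5)              # standard error of estimate (about 40.24 above)
print("R^2 =", model.rsquared, " adj R^2 =", model.rsquared_adj)
```

The summary output reproduces the same ANOVA F statistic and coefficient t tests that the questions above refer to.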

Page 4:

18.4 Regression Diagnostics - II

• The required conditions for the model assessment to apply must be checked.
– Is the error variable normally distributed? Draw a histogram of the residuals.
– Is the error variance constant? Plot the residuals versus the predicted values of y.
– Are the errors independent? Plot the residuals versus the time periods.
– Can we identify outliers?
– Is multicollinearity a problem?
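
A hedged sketch of how these three plots could be drawn with matplotlib, reusing a fitted statsmodels result such as the `model` object assumed in the earlier sketch:

```python
import matplotlib.pyplot as plt

resid = model.resid                  # residuals of the fitted model
fitted = model.fittedvalues          # predicted values of y

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].hist(resid, bins=10)                         # normality check: histogram of residuals
axes[0].set_title("Histogram of residuals")
axes[1].scatter(fitted, resid)                       # constant variance: residuals vs predicted y
axes[1].axhline(0, linewidth=1)
axes[1].set_title("Residuals vs predicted y")
axes[2].plot(range(len(resid)), resid, marker="o")   # independence: residuals in time order
axes[2].set_title("Residuals vs observation order")
plt.tight_layout()
plt.show()
```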

Page 5:

• Example 18.2 – House price and multicollinearity

– A real estate agent believes that a house's selling price can be predicted using the house size, the number of bedrooms, and the lot size.

– A random sample of 100 houses was drawn and the data recorded.

– Analyze the relationship among the four variables.

Price    Bedrooms  H Size  Lot Size
124100   3         1290    3900
218300   4         2080    6600
117800   3         1250    3750
.        .         .       .
.        .         .       .

Page 6:

• Solution
– The proposed model is
  PRICE = β0 + β1·BEDROOMS + β2·H-SIZE + β3·LOT-SIZE + ε
– Excel solution:

Regression Statistics
Multiple R           0.74833
R Square             0.559998
Adjusted R Square    0.546248
Standard Error       25022.71
Observations         100

ANOVA
              df    SS         MS         F         Significance F
Regression     3    7.65E+10   2.55E+10   40.7269   4.57E-17
Residual      96    6.01E+10   6.26E+08
Total         99    1.37E+11

              Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept     37717.59       14176.74         2.660526   0.009145   9576.963    65858.23
Bedrooms      2306.081       6994.192         0.329714   0.742335   -11577.3    16189.45
H Size        74.29681       52.97858         1.402393   0.164023   -30.8649    179.4585
Lot Size      -4.36378       17.024           -0.25633   0.798244   -38.1562    29.42862

The model is valid, but no variable is significantly related to the selling price!!

Page 7:

• However,
– when regressing the price on each independent variable alone, it is found that each variable is strongly related to the selling price.
– Multicollinearity is the source of this problem.

            Price     Bedrooms  H Size    Lot Size
Price       1
Bedrooms    0.645411  1
H Size      0.747762  0.846454  1
Lot Size    0.740874  0.83743   0.993615  1

• Multicollinearity causes two kinds of difficulties:
– The t statistics appear to be too small.
– The coefficients cannot be interpreted as "slopes".
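
The correlation matrix above can be computed directly, and variance inflation factors (VIFs) — not shown in the slides, but a standard companion diagnostic — make the same point numerically. A sketch, assuming a DataFrame `houses` with hypothetical columns "Price", "Bedrooms", "HSize", "LotSize":

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

houses = pd.read_csv("house_prices.csv")   # hypothetical file name

# Correlation matrix (compare with the table above)
print(houses[["Price", "Bedrooms", "HSize", "LotSize"]].corr())

# VIFs for the three independent variables; values far above 10 signal severe
# multicollinearity (here driven by the 0.9936 correlation between H Size and Lot Size).
X = sm.add_constant(houses[["Bedrooms", "HSize", "LotSize"]])
for i, name in enumerate(X.columns):
    if name != "const":
        print(name, variance_inflation_factor(X.values, i))
```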

Page 8:

• Remedying violations of the required conditions

– Nonnormality or heteroscedasticity can be remedied using transformations on the y variable.

– The transformations can improve the linear relationship between the dependent variable and the independent variables.

– Many computer software systems allow us to make the transformations easily.

Page 9:

• A brief list of transformations
» y' = log y (for y > 0)
  • Use when σε increases with y, or
  • Use when the error distribution is positively skewed.
» y' = y²
  • Use when σε² is proportional to E(y), or
  • Use when the error distribution is negatively skewed.
» y' = y^(1/2) (for y > 0)
  • Use when σε² is proportional to E(y).
» y' = 1/y
  • Use when σε² increases significantly when y increases beyond some value.
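
A minimal helper sketch (an assumption, not part of the lecture) that applies each transformation in the list above to a positive response vector y:

```python
import numpy as np

def transform_y(y, kind):
    """Apply one of the variance-stabilizing transformations listed above."""
    y = np.asarray(y, dtype=float)
    if kind == "log":         # sigma_eps increases with y, or positively skewed errors
        return np.log(y)
    if kind == "square":      # sigma_eps^2 proportional to E(y), or negatively skewed errors
        return y ** 2
    if kind == "sqrt":        # sigma_eps^2 proportional to E(y)
        return np.sqrt(y)
    if kind == "reciprocal":  # sigma_eps^2 increases sharply beyond some value of y
        return 1.0 / y
    raise ValueError(f"unknown transformation: {kind}")

# Example: transform_y([20, 23, 26], "log") -> approximately [3.00, 3.14, 3.26]
```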

Page 10:

• Example 18.3: Analysis, diagnostics, transformations.

– A statistics professor wanted to know whether the time limit affects the marks on a quiz.
– A random sample of 100 students was split into 5 groups.
– Each student wrote a quiz, but each group was given a different time limit. See the data below.

Time    40   45   50   55   60
Marks   20   24   26   30   32
        23   26   25   32   31
        .    .    .    .    .
        .    .    .    .    .

Analyze these results, and include diagnostics.
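
The data are laid out in the wide form shown above (one column of marks per time limit), while the regression needs (Time, Mark) pairs. A hedged sketch of the reshape with pandas, assuming a hypothetical file whose column headers are the five time limits:

```python
import pandas as pd

# Hypothetical file: 20 rows of marks under the columns "40", "45", "50", "55", "60".
wide = pd.read_csv("quiz_marks.csv")

long = wide.melt(var_name="Time", value_name="Mark")
long["Time"] = long["Time"].astype(int)     # column labels become the numeric time limits

print(long.head())                          # e.g. Time=40 Mark=20, Time=40 Mark=23, ...
```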

Page 11:

The model tested:  MARK = β0 + β1·TIME + ε

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.86254
R Square             0.743974
Adjusted R Square    0.741362
Standard Error       2.304609
Observations         100

ANOVA
              df    SS         MS         F          Significance F
Regression     1    1512.5     1512.5     284.7743   9.42E-31
Residual      98    520.5      5.311224
Total         99    2033

              Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept     -2.2           1.64582          -1.33672   0.184409   -5.46608    1.066077
Time          0.55           0.032592         16.87526   9.42E-31   0.485322    0.614678

This model is useful and provides a good fit.

[Histogram of the standardized residuals]
The errors seem to be normally distributed.

Page 12:

[Plot: standardized errors vs. predicted mark]

The standard error of estimate seems to increase with the predicted value of y.

Two transformations are used to remedy this problem:
1. y' = log_e y
2. y' = 1/y
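
A sketch of fitting both remedial models with statsmodels, reusing the `long` (Time, Mark) DataFrame assumed in the earlier reshaping sketch:

```python
import numpy as np
import statsmodels.api as sm

X = sm.add_constant(long["Time"])

log_fit = sm.OLS(np.log(long["Mark"]), X).fit()   # y' = log_e y
inv_fit = sm.OLS(1.0 / long["Mark"], X).fit()     # y' = 1/y

print(log_fit.params)   # compare with the equation LogMark = 2.1295 + 0.0217*Time on page 14
print(inv_fit.params)
```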

Page 13:

Let us see what happens when a transformation is applied

[Plot: the original data, where "Mark" is a function of "Time"; for example the points (40, 18) and (40, 23)]

[Plot: the modified data, where "LogMark" is a function of "Time"; the same points become (40, 2.89) and (40, 3.135), since log_e 18 = 2.89 and log_e 23 = 3.135]

Page 14:

The new regression analysis and the diagnostics are:

The model tested:  LOGMARK = β'0 + β'1·TIME + ε'

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.8783
R Square             0.771412
Adjusted R Square    0.769079
Standard Error       0.084437
Observations         100

ANOVA
              df    SS         MS         F          Significance F
Regression     1    2.357901   2.357901   330.7181   3.58E-33
Residual      98    0.698705   0.00713
Total         99    3.056606

              Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept     2.129582       0.0603           35.31632   1.51E-57   2.009918    2.249246
Time          0.021716       0.001194         18.18566   3.58E-33   0.019346    0.024086

Predicted LogMark = 2.1295 + 0.0217·Time

This model is useful and provides a good fit.

Page 15:

[Histogram of the standardized residuals]
The errors seem to be normally distributed.

[Plot: standardized residuals vs. predicted LogMark]
The standard errors still change with the predicted y, but the change is smaller than before.

Page 16:

How do we use the modified model to predict?

Let TIME = 55 minutes.

Predicted LogMark = 2.1295 + 0.0217·Time = 2.1295 + 0.0217(55) = 3.323

To find the predicted mark, take the antilog:

antilog_e(3.323) = e^3.323 = 27.743
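
The same back-transformation in a short sketch (the coefficient values are taken from the slide above):

```python
import numpy as np

log_mark = 2.1295 + 0.0217 * 55   # predicted LogMark = 3.323
mark = np.exp(log_mark)           # antilog: e**3.323, about 27.74
print(log_mark, mark)
```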