
Part 16: Regression Model Specification

Statistics and Data Analysis

Professor William Greene

Stern School of Business

IOMS Department

Department of Economics


Statistics and Data Analysis

Part 16 – Aspects of Regression


Regression Models
  Prediction
  Loose Ends: Trimming, Truncation
  Summary
  Where to next


Prediction

Use of the model for prediction: use "x" to predict y based on y = α + βx + ε.

Sources of uncertainty:
  Predicting "x" first
  Using sample estimates of α and β (and, possibly, σ)
  Can't predict noise, ε
  Predicting outside the range of experience: uncertainty about the reach of the regression model.


Base Case Prediction

For a given value of x*: use the equation.
  True y = α + βx* + ε
  Obvious estimate: ŷ = a + bx* (note: no prediction for ε)

Minimal sources of prediction error:
  We can never predict ε at all.
  The farther from the center of experience, the greater the uncertainty.
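A minimal Python sketch of the base case, assuming the usual least-squares formulas for a and b. The sample data and the helper names `fit_line` and `point_prediction` are hypothetical, chosen only for illustration. As on the slide, the point prediction has no term for ε.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def point_prediction(a, b, x_star):
    """Point prediction y-hat = a + b*x_star; epsilon is not predicted."""
    return a + b * x_star


# Hypothetical sample, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
a, b = fit_line(xs, ys)
y_hat = point_prediction(a, b, 6.0)  # x* = 6 lies outside the observed x values
print(f"a = {a:.2f}, b = {b:.2f}, y-hat at x* = 6: {y_hat:.2f}")
```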


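Prediction Interval

Prediction includes a range of uncertainty.

Point estimate: ŷ = a + bx*

The range of uncertainty around the prediction:

$$a + bx^* \pm 1.96\sqrt{s_e^2\left[1 + \frac{1}{N} + \frac{(x^* - \bar{x})^2}{\sum_{i=1}^{N}(x_i - \bar{x})^2}\right]}$$

Here 1.96 gives the usual 95%; the leading 1 inside the brackets is due to ε; the 1/N and (x* - x̄)² terms are due to estimating α and β with a and b.

(Remember the empirical rule: 95% of the distribution lies within two standard deviations.)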
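The interval can be computed directly from raw data. This is a sketch, assuming s_e is the usual regression standard error: the square root of the sum of squared residuals divided by N - 2 (the quantity regression software reports as S). The function name `prediction_interval` and the sample data are hypothetical.

```python
import math


def prediction_interval(xs, ys, x_star, z=1.96):
    """Approximate 95% prediction interval for y at x_star:
    a + b*x* +/- z * s_e * sqrt(1 + 1/N + (x* - x_bar)^2 / sum((x_i - x_bar)^2))."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    # s_e: residual standard deviation with the N - 2 divisor
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))
    center = a + b * x_star
    # The 1 is due to epsilon; 1/N and the x-term come from estimating a and b
    half = z * s_e * math.sqrt(1 + 1 / n + (x_star - x_bar) ** 2 / sxx)
    return center - half, center + half


xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
near = prediction_interval(xs, ys, 3.0)   # at the center, x* = x_bar
far = prediction_interval(xs, ys, 8.0)    # well outside the data
print(near, far)
```

The interval at x* = 8 comes out wider than the one at x* = 3, which is the "farther from the center of experience" effect.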


Slightly Simpler Formula for Prediction

Prediction includes a range of uncertainty.

Point estimate: ŷ = a + bx*

The range of uncertainty around the prediction:

$$a + bx^* \pm 1.96\sqrt{s_e^2\left(1 + \frac{1}{N}\right) + (x^* - \bar{x})^2\,[\mathrm{SE}(b)]^2}$$
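The two formulas agree because, in simple regression, SE(b) = s_e / sqrt(Σ(x_i - x̄)²), so (x* - x̄)²[SE(b)]² equals s_e²(x* - x̄)²/Σ(x_i - x̄)². A quick numerical check, with hypothetical input values and illustrative function names:

```python
import math


def half_width_full(s_e, n, x_star, x_bar, sxx, z=1.96):
    """Half-width from the formula using the sum of squares of x."""
    return z * s_e * math.sqrt(1 + 1 / n + (x_star - x_bar) ** 2 / sxx)


def half_width_simple(s_e, n, x_star, x_bar, se_b, z=1.96):
    """Half-width from the 'slightly simpler' formula using SE(b)."""
    return z * math.sqrt(s_e ** 2 * (1 + 1 / n) + (x_star - x_bar) ** 2 * se_b ** 2)


# Hypothetical inputs
s_e, n, x_bar, sxx, x_star = 2.0, 30, 5.0, 40.0, 9.0
se_b = s_e / math.sqrt(sxx)  # standard error of the slope in simple regression
full = half_width_full(s_e, n, x_star, x_bar, sxx)
simple = half_width_simple(s_e, n, x_star, x_bar, se_b)
print(full, simple)
```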


Prediction from Internet Buzz Regression

Mean of Buzz = 0.48242

Max(Buzz) = 0.79


Prediction Interval for Buzz = .8

Predict Box Office for Buzz = .8

a + bx = -14.36 + 72.72(.8) = 43.82

$$\sqrt{s_e^2\left(1 + \frac{1}{N}\right) + (.8 - \overline{\text{Buzz}})^2\,[\mathrm{SE}(b)]^2} = \sqrt{13.3863^2\left(1 + \frac{1}{62}\right) + (.8 - .48242)^2(10.94)^2} = 13.93$$

Interval = 43.82 ± 1.96(13.93) = 16.52 to 71.12
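The arithmetic above can be reproduced from the slide's inputs (a = -14.36, b = 72.72, s_e = 13.3863, N = 62, mean Buzz = 0.48242, SE(b) = 10.94). Note that x* = .8 sits just above Max(Buzz) = 0.79, so this is a mild extrapolation.

```python
import math

# Inputs read off the Buzz regression slides
a, b = -14.36, 72.72          # intercept and slope
s_e, n = 13.3863, 62          # standard error of the regression, sample size
buzz_bar, se_b = 0.48242, 10.94
x_star = 0.8                  # slightly above Max(Buzz) = 0.79

point = a + b * x_star
se_pred = math.sqrt(s_e ** 2 * (1 + 1 / n) + (x_star - buzz_bar) ** 2 * se_b ** 2)
lower = point - 1.96 * se_pred
upper = point + 1.96 * se_pred
print(f"prediction {point:.2f}, s.e. {se_pred:.2f}, interval {lower:.2f} to {upper:.2f}")
```

Carrying full precision gives limits of roughly 16.51 to 71.13; the slide's 16.52 to 71.12 come from rounding the prediction and 13.93 before multiplying.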


Predicting Using a Loglinear Equation

Predict the log first:
  Prediction of the log
  Prediction interval for the log: (Lower to Upper)
Prediction = exp(Lower) to exp(Upper)

This produces very wide intervals.
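A small sketch of why the intervals get so wide, using hypothetical log-scale values (center 10.0, half-width 2.0) and an illustrative helper name. A symmetric interval for ln y becomes lopsided in levels: the upper limit is exp(half-width) times exp(center), the lower limit is exp(half-width) times smaller, so upper/lower = exp(2 × half-width).

```python
import math


def log_interval_to_levels(log_lower, log_upper):
    """Convert a prediction interval for ln y into one for y by
    exponentiating both endpoints (exp is increasing, so order is kept)."""
    return math.exp(log_lower), math.exp(log_upper)


# Hypothetical log-scale prediction: center 10.0, half-width 2.0
center, half = 10.0, 2.0
lower, upper = log_interval_to_levels(center - half, center + half)
point = math.exp(center)
print(f"{lower:,.0f} to {upper:,.0f}; upper/lower = {upper / lower:.1f}")
```

With a half-width of 2 on the log scale, the upper limit is exp(4), about 55, times the lower limit.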


Interval Estimates for the Sample of Monet Paintings

[Fitted line plot of ln (US$) against ln (SurfaceArea) with the regression line and 95% prediction interval bands: ln (US$) = 2.825 + 1.725 ln (SurfaceArea); S = 1.00645, R-Sq = 20.0%, R-Sq(adj) = 19.8%]

Regression Analysis: ln (US$) versus ln (SurfaceArea)

The regression equation is ln (US$) = 2.83 + 1.72 ln (SurfaceArea)

Predictor            Coef   SE Coef     T      P
Constant            2.825     1.285  2.20  0.029
ln (SurfaceArea)   1.7246    0.1908  9.04  0.000

S = 1.00645   R-Sq = 20.0%   R-Sq(adj) = 19.8%

Mean of ln (SurfaceArea) = 6.72918


Prediction for an Out of Sample Monet

Claude Monet: Bridge Over a Pool of Water Lilies, 1899. Original, 36.5” x 29”.

$$\ln\text{Surface} = \ln(36.5 \times 29) = 6.96461$$

$$\text{Prediction} = 2.83 + 1.72(6.96461) = 14.809$$

$$\text{Uncertainty} = 1.96\sqrt{1.00645^2\left(1 + \frac{1}{328}\right) + (6.96461 - 6.72918)^2(.1908)^2}$$

$$= 1.96\sqrt{1.012942(1.003049) + (.23543)^2(.1908)^2} = 1.96(1.008984) = 1.977608$$

Prediction Interval = 14.809 ± 1.977608 = 12.83139 to 16.786608
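The slide's arithmetic can be checked with a few lines of Python, using only the values shown (the rounded coefficients 2.83 and 1.72, s_e = 1.00645, N = 328, mean ln area 6.72918, SE(b) = 0.1908). It also confirms that the difference in the middle term is 6.96461 - 6.72918 = .23543.

```python
import math

# Inputs from the slides: rounded coefficients 2.83 and 1.72, s_e = 1.00645,
# N = 328, mean of ln(SurfaceArea) = 6.72918, SE(b) = 0.1908
ln_area = math.log(36.5 * 29)          # surface area in square inches
prediction = 2.83 + 1.72 * ln_area
se_pred = math.sqrt(1.00645 ** 2 * (1 + 1 / 328)
                    + (ln_area - 6.72918) ** 2 * 0.1908 ** 2)
half_width = 1.96 * se_pred
print(f"ln area {ln_area:.5f}, prediction {prediction:.3f}, "
      f"interval {prediction - half_width:.3f} to {prediction + half_width:.3f}")
```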


Predicting y when the Model Describes log y

The interval predicts log price. What about the price?

Predicted Price = Exp(a + bx*) = Exp(14.809) = $2,700,641.78

Upper Limit = Exp(14.809 + 1.9776) = $19,513,166.53

Lower Limit = Exp(14.809 - 1.9776) = $373,771.53
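Exponentiating the log-scale prediction and both endpoints reproduces the dollar figures, up to rounding of the half-width (the slide uses 1.9776 for 1.977608). The values 14.809 and 1.977608 come from the previous slide.

```python
import math

log_prediction, half_width = 14.809, 1.977608   # from the previous slide
point = math.exp(log_prediction)
lower = math.exp(log_prediction - half_width)
upper = math.exp(log_prediction + half_width)
print(f"${lower:,.2f}  ${point:,.2f}  ${upper:,.2f}")
print(f"upper/point = {upper / point:.3f}, point/lower = {point / lower:.3f}")
```

Both ratios equal exp(1.977608), about 7.2, so the upper limit sits about $16.8 million above the point prediction while the lower limit sits only about $2.3 million below it.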


39.5 x 39.125. Prediction by our model = $17.903M. Painting is in our data set. Sold for $16.81M on 5/6/04; sold for $7.729M on 2/5/01. Last sale in our data set was in May 2004. Record sale was 6/25/08, at the market peak, just before the crash.

Van Gogh: Irises


Uncertainty in Prediction

$$\pm 1.96\sqrt{s_e^2\left(1 + \frac{1}{N}\right) + (x^* - \bar{x})^2\,[\mathrm{SE}(b)]^2}$$

The interval is narrowest at x* = x̄, the center of our experience. The interval widens as we move away from the center of our experience to reflect the greater uncertainty:
(1) Uncertainty about the prediction of x
(2) Uncertainty that the linear relationship will continue to exist as we move farther from the center.
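To see the interval narrow at x̄ and widen away from it, here is a sketch that evaluates the half-width at a few values of x*. The defaults are the Monet regression values from the earlier slides (s_e = 1.00645, N = 328, x̄ = 6.72918, SE(b) = 0.1908); the function name is illustrative.

```python
import math


def half_width(x_star, s_e=1.00645, n=328, x_bar=6.72918, se_b=0.1908, z=1.96):
    """Half-width of the prediction interval at x_star; defaults are the
    Monet regression values (ln surface area as x)."""
    return z * math.sqrt(s_e ** 2 * (1 + 1 / n) + (x_star - x_bar) ** 2 * se_b ** 2)


for x_star in (6.0, 6.72918, 7.5, 9.484):
    print(f"x* = {x_star:.5f}  half-width = {half_width(x_star):.4f}")
```

Because x* enters only through (x* - x̄)², the half-width is symmetric around x̄ and smallest there. Note that this formula captures only the uncertainty from ε and from estimating a and b, not the two additional sources listed above.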


http://www.nytimes.com/2006/05/16/arts/design/16oran.html


32.1” (2 feet 8 inches)
26.2” (2 feet 2.2 inches)
167” (13 feet 11 inches)
78.74” (6 feet 7 inches)

"Morning", Claude Monet, 1920-1926, oil on canvas, 200 x 425 cm, Musée de l'Orangerie, Paris, France. Left panel.


Predicted Price for a Huge Painting

Regression Equation: ln $ = 2.825 + 1.725 ln Surface Area

Width = 167 Inches

Height = 78.74 Inches

Area = 13,149.58 Square inches, ln = 9.484

Predicted ln Price = 2.825 + 1.725 (9.484) = 19.185

Predicted Price = exp(19.185) = $214,785,473.40
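The calculation for the huge painting, as a short Python check using the equation and dimensions on the slide:

```python
import math

width, height = 167, 78.74            # inches, from the slide
area = width * height                 # square inches
ln_area = math.log(area)
ln_price = 2.825 + 1.725 * ln_area    # regression equation from the slide
price = math.exp(ln_price)
print(f"area {area:,.2f}, ln area {ln_area:.3f}, ln price {ln_price:.3f}, price ${price:,.2f}")
```

Small differences in the last dollars come from how many digits of ln price are carried into exp().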


Prediction Interval for Price

Prediction Interval for ln Price is

$$\text{Predicted ln Price} \pm 1.96\sqrt{s_e^2\left(1 + \frac{1}{N}\right) + \left(\ln\text{Area}^* - \overline{\ln\text{Area}}\right)^2[\mathrm{SE}(b)]^2}$$

ln Area* = ln(167 × 78.74) = 9.484
Mean of ln Area = 6.72918 (computed from the data)
s_e = 1.00645 (S from the regression results)
SE(b) = 0.1908

$$19.185 \pm 1.96\sqrt{(1.00645)^2\left(1 + \frac{1}{328}\right) + (9.484 - 6.72918)^2(.1908)^2}$$

= 19.185 ± 2.228 = [16.957 to 21.413]

Predicted Price = exp(16.957) to exp(21.413) = $23,138,304 to $1,993,185,600
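A Python version of this interval, using the slide's values (predicted ln price 19.185 from the previous slide, s_e = 1.00645, N = 328, mean ln area 6.72918, SE(b) = 0.1908):

```python
import math

ln_area_star = math.log(167 * 78.74)      # about 9.484
predicted_ln_price = 19.185               # from the previous slide
se_pred = math.sqrt(1.00645 ** 2 * (1 + 1 / 328)
                    + (ln_area_star - 6.72918) ** 2 * 0.1908 ** 2)
half_width = 1.96 * se_pred
lower_ln = predicted_ln_price - half_width
upper_ln = predicted_ln_price + half_width
print(f"ln price: {lower_ln:.3f} to {upper_ln:.3f}")
print(f"price: ${math.exp(lower_ln):,.0f} to ${math.exp(upper_ln):,.0f}")
```

The upper dollar limit is about 86 times the lower one (exp(2 × 2.228)), reflecting how far ln Area* = 9.484 lies from the mean ln area of 6.72918.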


Use the Monet Model to Predict a Price for a Dali?

Hallucinogenic Toreador: 157” (13 feet 1 inch) x 118” (9 feet 10 inches)

Average Sized Monet: 32.1” (2 feet 8 inches) x 26.2” (2 feet 2.2 inches)


Forecasting Out of Sample

[Fitted line plot of G against Income with the regression line and 95% prediction interval bands: G = 1.928 + 0.000179 Income; S = 0.370241, R-Sq = 88.0%, R-Sq(adj) = 87.8%]

Per Capita Gasoline Consumption vs. Per Capita Income, 1953-2004.

How to predict G for 2017? You would first need to predict Income for 2017. How should we do that?

Regression Analysis: G versus Income

The regression equation is G = 1.93 + 0.000179 Income

Predictor         Coef      SE Coef      T      P
Constant        1.9280       0.1651  11.68  0.000
Income      0.00017897   0.00000934  19.17  0.000

S = 0.370241   R-Sq = 88.0%   R-Sq(adj) = 87.8%
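A sketch of how the G forecast depends on the income forecast, using the fitted coefficients above. The 2017 income values are hypothetical scenarios, not forecasts from the source, and `predict_g` is an illustrative name.

```python
def predict_g(income, a=1.9280, b=0.00017897):
    """Point forecast of per capita gasoline consumption G from per capita
    income, using the fitted coefficients reported above."""
    return a + b * income


# Hypothetical income forecasts for 2017 (not from the source)
for income in (28000, 30000, 32000):
    print(income, round(predict_g(income), 3))

# Each $1,000 of error in the income forecast moves the G forecast by 1000*b
shift_per_1000 = predict_g(31000) - predict_g(30000)
print(round(shift_per_1000, 5))
```

Any error in the income forecast passes straight into the G forecast, scaled by b, on top of the regression's own prediction uncertainty.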


Data Trimming

[Fitted line plot, all sales: ln (US$) = 5.290 + 1.326 ln (SurfaceArea); S = 1.10354, R-Sq = 33.4%, R-Sq(adj) = 33.2%]

[Fitted line plot, trimmed sample: ln (US$) = 3.068 + 1.662 ln (SurfaceArea); S = 1.09636, R-Sq = 17.8%, R-Sq(adj) = 17.6%]

All 430 sales: 5.290 + 1.326 log area

377 sales with 403.4 < area < 2981.0 (log > 6 and < 8): 3.068 + 1.662 log area

The sample is restricted to particular values of X: area between 403 and 2981. Trimming is generally benign, but the regression should be understood to apply only to the specified range of x. The trimming is based on a variable not related to the underlying noise in Y.

Data > Subset Worksheet > Rows that match condition.
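Why trimming on x is generally benign can be illustrated by simulation. This sketch uses simulated data, not the Monet sample: it assumes y = 1 + 1.5x + ε with standard normal x and ε, and keeps only observations with -1 < x < 1. The rule depends on x alone, so it is unrelated to the noise in y.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Simulated data (an assumption, not the Monet sample): y = 1 + 1.5x + noise
random.seed(2016)
xs = [random.gauss(0.0, 1.0) for _ in range(20000)]
ys = [1.0 + 1.5 * x + random.gauss(0.0, 1.0) for x in xs]

# Trimming: select observations by a rule on x only
kept = [(x, y) for x, y in zip(xs, ys) if -1.0 < x < 1.0]
slope_all = ols_slope(xs, ys)
slope_trimmed = ols_slope([x for x, _ in kept], [y for _, y in kept])
print(f"all: {slope_all:.3f}  trimmed on x: {slope_trimmed:.3f}  kept {len(kept)}")
```

Both slopes land close to the true 1.5; the trimmed fit is simply less precise and should only be read as describing -1 < x < 1.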


Truncation

[Fitted line plot, subsample 500,000 < Price < 3,000,000: ln (US$) = 11.44 + 0.3821 ln (SurfaceArea); S = 0.487426, R-Sq = 5.9%, R-Sq(adj) = 5.4%]

[Fitted line plot, entire sample: ln (US$) = 5.290 + 1.326 ln (SurfaceArea); S = 1.10354, R-Sq = 33.4%, R-Sq(adj) = 33.2%]

Entire sample: 5.290 + 1.326 log Area

Subsample (500,000 < Price < 3,000,000): 11.44 + 0.3821 log Area

Truncation based on the values of the dependent variable is VERY BAD. It reduces and sometimes destroys the relationship. This is one reason we resist removing "outliers" from the sample.
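The damage from truncating on y can be seen in the same kind of simulation. Again the data are simulated, not the Monet sample: y = 1 + 1.5x + ε with standard normal x and ε, and only observations with -0.5 < y < 2.5 are kept. Because the rule depends on y, it selects on the noise as well as on x.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Simulated data (an assumption, not the Monet sample): y = 1 + 1.5x + noise
random.seed(2016)
xs = [random.gauss(0.0, 1.0) for _ in range(20000)]
ys = [1.0 + 1.5 * x + random.gauss(0.0, 1.0) for x in xs]

# Truncation: select observations by a rule on the dependent variable y
kept = [(x, y) for x, y in zip(xs, ys) if -0.5 < y < 2.5]
slope_all = ols_slope(xs, ys)
slope_truncated = ols_slope([x for x, _ in kept], [y for _, y in kept])
print(f"all: {slope_all:.3f}  truncated on y: {slope_truncated:.3f}")
```

The slope in the truncated sample comes out at less than half of the true 1.5, the same kind of flattening as the drop from 1.326 to 0.3821 in the Monet subsample.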


Where Have We Been?
  Sample data: describing, displaying
  Probability models
    Models for random experiments
    Models for random processes underlying sample data
  Random variables
  Models for covariation of random variables
  Linear regression model for covariation of a pair of variables


Where Do We Go From Here?
  Simple linear regression
    Thus far, mostly a descriptive device
    Use for prediction and forecasting
    Yet to consider: statistical inference, testing the relationship
  Multiple linear regression
    More than one variable to explain the variation of Y
    More elaborate model building