Linear Regression Models
Powerful modeling technique: tease out relationships between “independent” variables and one “dependent” variable.
Models are not perfect, so we need an error term: measurement errors, wrong model, omitted variables, inherent randomness.
Linear models are often misused.

Post on 22-Dec-2015



Example: Lake Water Quality
Chlorophyll-a (C): widely used indicator, a measure of eutrophication.
Nitrogen (N): associated with eutrophication.
Q: Golf course development. Nitrogen is expected to increase. By how much will C increase or decrease?

How should we proceed?


Plot C vs. N

[Figure: scatter plot of Chlorophyll (0 to 150) vs. Nitrogen (5 to 25)]


A “Better” Model
Explain the (single) regression line (model?).
A negative relationship suggests a problem. Omitted variable: phosphorus (P).
We want to tease out the effects of N and P separately. Write a multiple linear regression model:
$C_i = \beta_0 + \beta_1 P_i + \beta_2 N_i + \varepsilon_i$
The model is designed to “tease out” the effect of N and the effect of P, separately, on C.
(**) Define and interpret the variables and parameters.


Estimation
Use data to estimate the parameter values that give the “best fit”: b0 = -9.4, b1 = 0.3, b2 = 1.2.
Answer: Holding P constant, a one-unit increase in N results in about a 1.2-unit increase in C.
Importance: Omitting phosphorus from the model introduced significant bias!
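The bias from omitting phosphorus can be reproduced in a small simulation. This is a sketch on synthetic data, not the lake data: we assume N and P are negatively correlated (P falls by about 6 units per unit of N) and borrow the slide's estimates as the "true" coefficients. The helper `ols` and every number in the data-generating process are hypothetical. Regressing C on N alone then gives a negative slope even though N's true effect is +1.2.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical data-generating process (synthetic, not the lake data):
# P falls as N rises, and both raise chlorophyll C.
N = rng.uniform(5, 25, n)
P = 200.0 - 6.0 * N + rng.normal(0.0, 15.0, n)
C = -9.4 + 0.3 * P + 1.2 * N + rng.normal(0.0, 5.0, n)


def ols(y, *regressors):
    """OLS coefficients [intercept, slope_1, ...] via least squares."""
    X = np.column_stack([np.ones_like(y), *regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


short = ols(C, N)    # C on N only: phosphorus omitted
full = ols(C, P, N)  # C on P and N, ordered as C = b0 + b1*P + b2*N

print("slope on N, P omitted: ", round(short[1], 2))  # negative
print("slope on N, P included:", round(full[2], 2))   # close to 1.2
```

With P omitted, the N slope absorbs part of P's effect: roughly 1.2 + 0.3 × (-6) = -0.6 under these assumed values.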


Question: U.S. Gas Consumption
Gasoline consumption produces many negative byproducts. Policy may be directed at increasing the price of gas to reduce consumption.
But what is the effect of a price change? Question: What is the price elasticity of demand for gasoline in the U.S.?


Some Gasoline Data

[Figure, two panels: G.POP (gas consumption per capita, 0.7 to 1.2) vs. YEAR (1962 to 1992); G.POP vs. PG (gas price, 0.6 to 4.1)]


Gas Data Cont’d
Gas consumption increases through time, but this plot tells us nothing about price.
The next plot shows a positive (+) relationship between gas price and gas consumption, the opposite of a demand curve. Something is wrong here…
Just as in the eutrophication problem, we may have omitted important variables.
There may be other problems, too.


The OLS “Estimator”
Estimator: a rule or strategy for using data to estimate an unknown parameter, defined before the data are drawn.
The Ordinary Least Squares (OLS) estimator finds the parameter values that minimize the sum of squared deviations between the observed and fitted values of the dependent variable (see C vs. N plot).
Several assumptions must hold for the OLS estimator to apply to a model.
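As a minimal sketch, the OLS estimate can be computed directly from the normal equations $(Z^\top Z)\,b = Z^\top y$, where Z holds a column of ones plus the regressors. The function names `ols_normal_equations` and `sse` and the five data points are made up for illustration.

```python
import numpy as np


def ols_normal_equations(X, y):
    """Solve the normal equations (Z'Z) b = Z'y, where Z = [1, X].

    The solution minimizes the sum of squared deviations sum((y - Z b)^2).
    """
    Z = np.column_stack([np.ones(len(y)), X])  # intercept column plus regressors
    return np.linalg.solve(Z.T @ Z, Z.T @ y)


def sse(b0, b1, x, y):
    """Sum of squared deviations of y from the line b0 + b1 * x."""
    return float(np.sum((y - (b0 + b1 * x)) ** 2))


# Small illustrative data set (made up)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

b0, b1 = ols_normal_equations(x, y)
print(round(b0, 3), round(b1, 3))

# Nudging either coefficient away from the OLS values increases the SSE
print(sse(b0, b1, x, y) < sse(b0 + 0.1, b1, x, y))  # True
print(sse(b0, b1, x, y) < sse(b0, b1 - 0.1, x, y))  # True
```

In practice `np.linalg.lstsq` is numerically safer than forming Z'Z explicitly; the normal equations are shown because they make the "minimize squared deviations" idea explicit.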


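Linear Model
The model must be linear in the parameters, not necessarily in the variables. Note the difference between a parameter and a variable.
Examples:
$R_{t+1} = S_t \, e^{\alpha(1 - S_t)} \, \varepsilon_t$ (Ricker model)
[Further examples relate $Y_t$ to $X_t$, $\log(X_t)$, and $Z_t$; their exact forms are not recoverable.]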


Transforming Models
The previous “Ricker” model is nonlinear (in the parameter). Sometimes we can transform a model so that it is linear in the parameters, even though its graph is still nonlinear when plotted.
Take the log of both sides, giving:
$\log(R_{t+1}) = \log(S_t) + \alpha(1 - S_t) + \log(\varepsilon_t)$
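A minimal sketch of estimating the Ricker parameter by OLS after the log transform, assuming the form $R_{t+1} = S_t e^{\alpha(1-S_t)}\varepsilon_t$ with lognormal error. Rearranging gives $\log(R_{t+1}) - \log(S_t) = \alpha(1 - S_t) + \log(\varepsilon_t)$, a regression through the origin of the left side on $(1 - S_t)$. The simulated stocks and the value α = 1.5 are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha_true = 1.5  # hypothetical value of the Ricker parameter

# Simulated stock sizes S_t and the recruits they produce, R_{t+1},
# from R_{t+1} = S_t * exp(alpha * (1 - S_t)) * eps_t with lognormal eps_t
S = rng.uniform(0.1, 2.0, 300)
eps = np.exp(rng.normal(0.0, 0.2, S.size))
R_next = S * np.exp(alpha_true * (1.0 - S)) * eps

# Log transform: log(R_{t+1}) - log(S_t) = alpha * (1 - S_t) + log(eps_t),
# which is linear in alpha (a regression through the origin)
y = np.log(R_next) - np.log(S)
x = 1.0 - S

# OLS estimate of alpha for a no-intercept regression: sum(x*y) / sum(x^2)
alpha_hat = float(np.sum(x * y) / np.sum(x ** 2))
print(round(alpha_hat, 3))
```

The multiplicative error is what makes this work: after taking logs it becomes additive, as Assumption 1 below requires.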


CLRM (Classical Linear Regression Model): Assumption 1
The dependent variable (Y) is a function of a specific set of independent variables (X’s):
• linear in parameters
• additive error
• coefficients are constant but unknown
Violations are called “specification errors”, e.g.:
• wrong regressors (a.k.a. independent variables; X’s)
• nonlinearity
• changing parameters (e.g., through time)


CLRM: Assumption 2
Disturbances ($\varepsilon_i$’s) are independently and identically distributed $\sim (0, \sigma^2)$. Typically we assume $\varepsilon_i \sim N(0, \sigma^2)$:
• mean = 0
• constant variance $\sigma^2$ (but unknown)
• errors uncorrelated with one another
Examples of violations:
• measurement bias (seep gas flux)
• heteroskedasticity (variance differs)
• autocorrelated errors (disturbances correlated)
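One common check for autocorrelated errors, not named on the slide, is the Durbin-Watson statistic $d = \sum_{t=2}^{n}(e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2$. It is near 2 when residuals are uncorrelated and falls toward 0 under positive autocorrelation (roughly $d \approx 2(1-\rho)$). The sketch computes it for simulated white-noise and AR(1) residuals; the series and ρ = 0.8 are hypothetical.

```python
import numpy as np


def durbin_watson(e):
    """Durbin-Watson statistic for a sequence of residuals e_1..e_n."""
    e = np.asarray(e, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))


rng = np.random.default_rng(2)
n = 2000

white = rng.normal(size=n)  # uncorrelated disturbances

rho = 0.8  # hypothetical AR(1) coefficient
ar = np.empty(n)
ar[0] = rng.normal()
for t in range(1, n):
    ar[t] = rho * ar[t - 1] + rng.normal()  # e_t = rho * e_{t-1} + u_t

print(round(durbin_watson(white), 2))  # near 2
print(round(durbin_watson(ar), 2))     # near 2 * (1 - 0.8) = 0.4
```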


CLRM: Assumption 3
It is possible to repeat the sample with the same independent variables: if we had the same levels of the explanatory variables, would it be possible to generate the same value of Y?
Common violations:
• errors in variables: measurement error in X
• autoregression: when a lagged dependent variable should be an independent variable
• simultaneous equations: several relationships act jointly
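A quick simulation of the first violation, errors in variables: when X is observed with measurement error u, the OLS slope is biased toward zero by the factor $\mathrm{Var}(X)/(\mathrm{Var}(X)+\mathrm{Var}(u))$. The true slope of 2 and the unit variances below are hypothetical choices, so the slope on the mismeasured X should come out near 1.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
beta = 2.0  # hypothetical true slope

x_true = rng.normal(0.0, 1.0, n)           # Var(X) = 1
y = beta * x_true + rng.normal(0.0, 1.0, n)
x_obs = x_true + rng.normal(0.0, 1.0, n)   # measurement error u, Var(u) = 1


def slope(x, y):
    """Simple-regression OLS slope: Cov(x, y) / Var(x)."""
    return float(np.cov(x, y)[0, 1] / np.var(x, ddof=1))


print(round(slope(x_true, y), 2))  # near 2.0
print(round(slope(x_obs, y), 2))   # near 2.0 * 1 / (1 + 1) = 1.0
```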


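Properties of Estimators
Estimators have many properties. “6” is an estimator, but not a very good one. Two main properties we care about:
• Unbiased: the expected value of the estimator equals the thing it is estimating (its expected error is 0).
• Efficient: small variance (spread).
“6” is biased (unless the true value happens to be 6), but has a very small variance (zero).
Under the CLRM assumptions, the OLS estimator is unbiased and has the minimum variance of all linear unbiased estimators (Gauss-Markov theorem); with normal errors, it has the minimum variance of all unbiased estimators.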
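A small Monte Carlo experiment makes the two properties concrete. Under hypothetical settings (true slope 0.5, X fixed across repeated samples, normal errors with standard deviation 2), we draw many samples, compute both the OLS slope and the constant estimator “6” for each, and compare their averages and spreads.

```python
import numpy as np

rng = np.random.default_rng(4)
beta_true = 0.5                 # hypothetical true slope
x = np.linspace(0.0, 10.0, 30)  # regressors held fixed across repeated samples
reps = 5000

ols_draws = np.empty(reps)
six_draws = np.empty(reps)
for r in range(reps):
    y = 1.0 + beta_true * x + rng.normal(0.0, 2.0, x.size)
    xc = x - x.mean()
    ols_draws[r] = np.sum(xc * (y - y.mean())) / np.sum(xc ** 2)  # OLS slope
    six_draws[r] = 6.0                                            # ignores the data

print("OLS: mean", round(ols_draws.mean(), 3), "variance", round(ols_draws.var(), 5))
print("'6': mean", six_draws.mean(), "variance", six_draws.var())
```

The OLS draws center on 0.5 with some spread; the “6” draws have zero spread but are centered on the wrong value.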


Correlation vs. Causation
Now we know just enough to be dangerous! We can estimate how any set of variables affects some other variable…very powerful.
The problem: correlation doesn’t imply causation! …This is why data mining is bad. Example: chicken production and global CO2.
A relationship may be “spurious” (no underlying relationship), and this is difficult to tease out statistically (cf. “Granger causality”).
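To see how a spurious relationship can arise, the sketch below builds two synthetic series that share nothing except an upward time trend; the names `chicken` and `co2` are labels only, not real data. Their levels are almost perfectly correlated, yet their year-to-year changes are essentially uncorrelated. Differencing is one simple check among several; it is not a causality test such as Granger causality.

```python
import numpy as np

rng = np.random.default_rng(5)
t = np.arange(200)

# Two independent series that share only a linear time trend (synthetic)
chicken = 100.0 + 2.0 * t + rng.normal(0.0, 5.0, t.size)
co2 = 320.0 + 1.5 * t + rng.normal(0.0, 5.0, t.size)

level_corr = np.corrcoef(chicken, co2)[0, 1]
diff_corr = np.corrcoef(np.diff(chicken), np.diff(co2))[0, 1]

print(round(level_corr, 3))  # very close to 1: driven by the shared trend
print(round(diff_corr, 3))   # near 0: no relationship beyond the trend
```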


Violations & Consequences

Problem                                      Consequences
Autocorrelation                              Unbiased, wrong inference
Heteroskedasticity                           Unbiased, wrong inference
Contemporaneous correlation (X with error)   Biased
Multicollinearity                            Usually OK
Omitted variables                            Biased
Irrelevant regressors included               Unbiased, extra noise
True model nonlinear                         Biased, wrong inference


Guide to Model Specification
1. Start with theory to generate the model.
2. Check the assumptions of the CLRM.
3. Collect and plot data.
4. Estimate the model, test restrictions. Possibly perform a Box-Cox transform.
5. Check R² and “adjusted R²”.
6. Plot residuals; look for patterns.
7. Seek explanations for patterns.
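Step 4 mentions the Box-Cox transform. As a reminder of its form (a sketch, not the course's code): $y^{(\lambda)} = (y^\lambda - 1)/\lambda$ for $\lambda \ne 0$ and $\log(y)$ for $\lambda = 0$, defined for $y > 0$. λ = 1 gives a shifted linear specification and λ = 0 the log specification. In practice λ is chosen by maximum likelihood (for example, `scipy.stats.boxcox` does this when no λ is given), which the hypothetical `box_cox` helper below does not attempt.

```python
import numpy as np


def box_cox(y, lam):
    """Box-Cox transform of positive data y for a given lambda."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return np.log(y)
    return (y ** lam - 1.0) / lam


y = np.array([0.5, 1.0, 2.0, 4.0])
print(box_cox(y, 1.0))  # y - 1: values -0.5, 0, 1, 3
print(box_cox(y, 0.0))  # log(y)

# As lambda approaches 0, the transform approaches log(y)
print(np.allclose(box_cox(y, 1e-8), np.log(y), atol=1e-6))  # True
```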


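What’s a Residual?
General form of the linear model:
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ (true)
$\hat{Y}_i = \hat\beta_0 + \hat\beta_1 X_i$ (predicted)
$\hat\varepsilon_i = Y_i - \hat{Y}_i$ (“residual”)
Graphically on board.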
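A minimal sketch of the true/predicted/residual distinction for a simple regression: estimate $\hat\beta_0, \hat\beta_1$ by OLS, form the predictions $\hat{Y}_i$, and take residuals $Y_i - \hat{Y}_i$. The data points are made up. With an intercept in the model, OLS residuals sum to zero and are orthogonal to X, which the last two lines check.

```python
import numpy as np

# Illustrative data (hypothetical)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])

# OLS estimates for Y_i = b0 + b1 * X_i + e_i
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()

Y_hat = b0 + b1 * X  # predicted values
resid = Y - Y_hat    # residuals

print(np.round(resid, 3))
print(abs(resid.sum()) < 1e-9)        # True: residuals sum to zero
print(abs(np.sum(resid * X)) < 1e-9)  # True: residuals orthogonal to X
```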


Residual Plots
[Figure, model Fitted: Phosphorus + Nitrogen. Left: Residuals vs. Fit (fitted values 50 to 200; residuals -40 to 60). Right: Normal Quantile Plot, residuals vs. Quantiles of Standard Normal (-2 to 2). Points 7, 10, and 14 are labeled in both panels.]
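The normal quantile plot pairs the sorted residuals with standard-normal quantiles; points close to a straight line suggest roughly normal errors. A sketch of how those coordinates can be computed, assuming the plotting positions $(i - 0.5)/n$ (other conventions exist) and using Python's `statistics.NormalDist`; the helper `normal_qq` and the residuals are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def normal_qq(resid):
    """Return (theoretical quantiles, sorted residuals) for a normal Q-Q plot."""
    r = np.sort(np.asarray(resid, dtype=float))
    n = r.size
    probs = (np.arange(1, n + 1) - 0.5) / n  # plotting positions
    theo = np.array([NormalDist().inv_cdf(float(p)) for p in probs])
    return theo, r


rng = np.random.default_rng(6)
resid = rng.normal(0.0, 20.0, 25)  # simulated residuals
theo, r = normal_qq(resid)

# A correlation near 1 between the two axes indicates a nearly straight line
print(round(np.corrcoef(theo, r)[0, 1], 3))
```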


Back to Gasoline Consumption
Recall that we are interested in how gas consumption is affected by a price increase (say $0.10/gal.).
Variables:
• gas consumption per capita (G)
• gas price (Pg)
• income (Y)
• new car price (Pnc)
• used car price (Puc)


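2 Alternative Specifications
Linear specification:
$G_t = \beta_0 + \beta_1 \mathit{Pg}_t + \beta_2 Y_t + \beta_3 \mathit{Pnc}_t + \beta_4 \mathit{Puc}_t + \varepsilon_t$
Log-log specification (often used with economic data):
$\log(G_t) = \beta_0 + \beta_1 \log(\mathit{Pg}_t) + \beta_2 \log(Y_t) + \beta_3 \log(\mathit{Pnc}_t) + \beta_4 \log(\mathit{Puc}_t) + \varepsilon_t$
One way to test the specification is the Box-Cox transform (see 3 lectures back).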
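The question is about the price elasticity of demand, the percentage change in consumption per percentage change in price. A short standard derivation, not taken from the slides, of what each specification implies for it, holding the other regressors fixed:

```latex
\eta = \frac{\partial G}{\partial \mathit{Pg}} \cdot \frac{\mathit{Pg}}{G}
\qquad
\text{Linear: } \eta_t = \beta_1 \, \frac{\mathit{Pg}_t}{G_t}
\qquad
\text{Log-log: } \eta = \frac{\partial \log G_t}{\partial \log \mathit{Pg}_t} = \beta_1
```

So in the log-log specification β1 is itself a constant elasticity, while in the linear specification the elasticity depends on where it is evaluated (often at the sample means of Pg and G).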


Results of Linear Model
Parameter estimate (p-value of t-test). Low p-value: “statistically significant”.
R² measures the goodness of fit of the model. A low p-value of the F statistic means the model has explanatory power.

b0            b1             b2              b3            b4            R²     p (F)
-.09 (.08)    -.04 (.002)    .0002 (.000)    -.10 (.11)    -.04 (.08)    .97    .000
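A sketch of how the reported statistics are computed from an OLS fit, using only NumPy on simulated data; the helper `ols_summary`, the regressors, and the coefficients are hypothetical, not the gasoline data. p-values would additionally need the t and F distribution functions (e.g. from SciPy), which are left out here.

```python
import numpy as np


def ols_summary(X, y):
    """OLS fit with intercept; returns coefficients, t stats, R^2, adjusted R^2, F."""
    n = len(y)
    Z = np.column_stack([np.ones(n), X])      # design matrix with intercept
    k = Z.shape[1]                            # parameters, including intercept
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    sse = float(resid @ resid)                # residual sum of squares
    sst = float(np.sum((y - y.mean()) ** 2))  # total sum of squares
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    sigma2 = sse / (n - k)                    # estimated error variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Z.T @ Z)))
    t_stats = beta / se                       # tests H0: coefficient = 0
    f_stat = (r2 / (k - 1)) / ((1.0 - r2) / (n - k))  # H0: all slopes = 0
    return beta, t_stats, r2, adj_r2, f_stat


rng = np.random.default_rng(7)
n = 100
X = rng.normal(size=(n, 2))
y = 1.0 + 0.8 * X[:, 0] + 0.0 * X[:, 1] + rng.normal(0.0, 1.0, n)

beta, t_stats, r2, adj_r2, f_stat = ols_summary(X, y)
print("coefficients:", np.round(beta, 2))
print("t statistics:", np.round(t_stats, 2))
print("R^2:", round(r2, 3), " adjusted R^2:", round(adj_r2, 3), " F:", round(f_stat, 2))
```

Adjusted R² penalizes extra regressors, which is why step 5 of the specification guide checks both.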


Answer to Question
A 1-unit ($1) increase in price leads to a .04-unit decrease in gas consumption.
Units are: G (1000 gallons), Pg ($). So, a $0.10 increase in gas price leads, on average, to a 4-gallon decrease in per-capita gas consumption…not much!
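The unit conversion in the answer, written out as a short calculation. It uses b1 = -0.04 from the linear-model results, with G in thousands of gallons per capita and Pg in dollars; the variable names are illustrative.

```python
b1 = -0.04           # estimated change in G (1000 gallons) per $1 increase in Pg
price_change = 0.10  # a $0.10 per gallon increase

delta_G_thousands = b1 * price_change       # change in G, in 1000s of gallons
delta_G_gallons = delta_G_thousands * 1000  # convert to gallons

print(round(delta_G_gallons, 6))  # -4.0
```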