Part 6: Multiple Regression 6-1/35
Regression Models
Professor William Greene
Stern School of Business
IOMS Department
Department of Economics
Regression and Forecasting Models
Part 6 – Multiple Regression
Multiple Regression Agenda
The concept of multiple regression
Computing the regression equation
Multiple regression “model”
Using the multiple regression model
Building the multiple regression model
Regression diagnostics and inference
Concept of Multiple Regression
Different conditional means
  Application: Monet’s signature
Holding things constant
  Application: Price and income effects
  Application: Age and education
  Sales promotion: Price and competitors
The general idea of multiple regression
Monet in Large and Small
[Figure: Fitted Line Plot of ln (US$) against ln (SurfaceArea). Fitted line: ln (US$) = 2.825 + 1.725 ln (SurfaceArea); S = 1.00645, R-Sq = 20.0%, R-Sq(adj) = 19.8%]
Log of $price = a + b log surface area + e
Logs of Sale prices of 328 signed Monet paintings
The residuals do not show any obvious patterns that seem inconsistent with the assumptions of the model.
How much for the signature?
The sample also contains 102 unsigned paintings
Average Sale Price
Signed $3,364,248
Not signed $1,832,712
Average price of a signed Monet is almost twice that of an unsigned one.
Can we separate the two effects?
Average Prices
             Small       Large
Unsigned   346,845   5,795,000
Signed     689,422   5,556,490
What do the data suggest?
(1) The size effect is huge
(2) The signature effect is confined to the small paintings.
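The two conclusions above can be checked with simple arithmetic on the four reported averages. The sketch below does only that; the dictionary layout and variable names are illustrative choices, not part of the original analysis.

```python
# Average sale prices from the table above (US$).
avg_price = {
    ("unsigned", "small"): 346_845,
    ("unsigned", "large"): 5_795_000,
    ("signed", "small"): 689_422,
    ("signed", "large"): 5_556_490,
}

# Size effect: large vs. small, holding signature status fixed.
size_ratio = {
    status: avg_price[(status, "large")] / avg_price[(status, "small")]
    for status in ("unsigned", "signed")
}

# Signature effect: signed vs. unsigned, holding size fixed.
signature_ratio = {
    size: avg_price[("signed", size)] / avg_price[("unsigned", size)]
    for size in ("small", "large")
}

for status, ratio in size_ratio.items():
    print(f"{status}: large/small = {ratio:.2f}")
for size, ratio in signature_ratio.items():
    print(f"{size}: signed/unsigned = {ratio:.2f}")
```

Large paintings sell for roughly 17 times (unsigned) and 8 times (signed) the small-painting average. The signed/unsigned ratio is about 2 for small paintings and slightly below 1 for large ones.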
Thought experiments: Ceteris paribus
Consider Monets of the same size, some signed and some not, and compare prices. This is the signature effect.
Consider signed Monets and compare large ones to small ones. Likewise for unsigned Monets. This is the size effect.
A Multiple Regression
[Figure: Scatterplot of ln (US$) vs ln (SurfaceArea), with points marked by Signed (0 or 1); the plot carries an annotation labeled b2]
Ln Price = b0 + b1 ln Area + b2 (0 if unsigned, 1 if signed) + e
Monet Multiple Regression
Regression Analysis: ln (US$) versus ln (SurfaceArea), Signed
The regression equation is
ln (US$) = 4.12 + 1.35 ln (SurfaceArea) + 1.26 Signed

Predictor           Coef     SE Coef       T      P
Constant          4.1222      0.5585    7.38  0.000
ln (SurfaceArea)  1.3458     0.08151   16.51  0.000
Signed            1.2618      0.1249   10.11  0.000

S = 0.992509   R-Sq = 46.2%   R-Sq(adj) = 46.0%
Interpretation (to be explored as we develop the topic):
(1) Elasticity of price with respect to surface area is 1.3458, which is very large.
(2) The signature multiplies the price by exp(1.2618) (about 3.5), for any given size.
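Both interpretations follow from the log-linear form of the equation, and a short calculation makes that visible. This is a sketch using the reported coefficients; the function name and the choice of ln(SurfaceArea) = 7.0 are illustrative assumptions.

```python
import math

# Coefficients from the regression output above.
b0, b_area, b_signed = 4.1222, 1.3458, 1.2618

def predicted_log_price(ln_area, signed):
    """Fitted ln(US$) for a given ln(SurfaceArea); signed is 0 or 1."""
    return b0 + b_area * ln_area + b_signed * signed

# Signature multiplier: the same at any size, because ln_area cancels.
multiplier = math.exp(b_signed)

# Hypothetical size, chosen inside the plotted range of ln(SurfaceArea).
ln_area = 7.0
gap = predicted_log_price(ln_area, 1) - predicted_log_price(ln_area, 0)

# Elasticity: 1% more area multiplies the fitted price by 1.01 ** b_area.
pct_change = (1.01 ** b_area - 1) * 100

print(f"signature multiplier: {multiplier:.2f}")
print(f"log-price gap at ln_area={ln_area}: {gap:.4f}")
print(f"price change for 1% more area: {pct_change:.2f}%")
```

The gap in fitted log price between signed and unsigned paintings equals the Signed coefficient whatever value of ln_area is used, which is why the multiplier applies "for any given size".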
Ceteris Paribus in Theory
Demand for gasoline: G = f(price, income)
Demand (price) elasticity: eP = % change in G for a given % change in P, holding income constant.
How do you do that in the real world?
  The “percentage changes”
  How to change price and hold income constant?
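In theory the thought experiment is easy to run, because the demand function can be evaluated at any price with income fixed. The sketch below assumes a hypothetical constant-elasticity demand function; the functional form and every parameter value are made up for illustration and are not estimates from the gasoline data.

```python
# Hypothetical constant-elasticity demand: G = A * P**e_p * income**e_i.
# A, e_p and e_i are made-up values for illustration only.
A, e_p, e_i = 2.0, -0.5, 1.0

def demand(price, income):
    """Quantity demanded under the assumed functional form."""
    return A * price ** e_p * income ** e_i

price, income = 1.50, 20_000.0

# Raise price by 1% while holding income at the same value.
g0 = demand(price, income)
g1 = demand(price * 1.01, income)
pct_change_g = (g1 / g0 - 1) * 100

# The % change in G per 1% change in P approximates the elasticity e_p.
print(f"% change in G for a 1% price increase: {pct_change_g:.3f}")
```

With real data there is no function to evaluate, only observed years in which price and income changed together, which is the difficulty the question points to.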
The Real World Data
U.S. Gasoline Market, 1953-2004
[Figure: Time Series Plot of logG, logIncome, logPg; horizontal axis Year, vertical axis Data]
Shouldn’t Demand Curves Slope Downward?
[Figure: Scatterplot of GasPrice vs G; horizontal axis G, vertical axis GasPrice]
A Thought Experiment
The main driver of gasoline consumption is income, not price.
Income is growing over time.
We are not holding income constant when we change price!
How do we do that?
[Figure: Scatterplot of g vs Income; horizontal axis Income, vertical axis g]
How to Hold Income Constant?
Multiple Regression Using Price and Income
Regression Analysis: G versus GasPrice, Income
The regression equation is
G = 0.134 - 0.00163 GasPrice + 0.000026 Income

Predictor        Coef     SE Coef      T      P
Constant      0.13449     0.02081   6.46  0.000
GasPrice   -0.0016281   0.0004152  -3.92  0.000
Income     0.00002634  0.00000231  11.43  0.000
It looks like the theory works: with income in the equation, the coefficient on GasPrice is negative.
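The fitted equation lets us run the thought experiment that the raw data could not: change price while income stays put. The sketch below uses the reported coefficients; the income level and the two prices are hypothetical values picked from inside the ranges shown in the scatterplots.

```python
# Coefficients from the regression output above.
b0, b_price, b_income = 0.13449, -0.0016281, 0.00002634

def fitted_g(gas_price, income):
    """Fitted G from the estimated equation."""
    return b0 + b_price * gas_price + b_income * income

# Hypothetical values inside the ranges shown in the scatterplots.
income = 20_000.0
low_price, high_price = 60.0, 70.0

g_low = fitted_g(low_price, income)
g_high = fitted_g(high_price, income)

# With income fixed, the change in fitted G depends only on the price term.
change = g_high - g_low

print(f"fitted G at price {low_price}: {g_low:.4f}")
print(f"fitted G at price {high_price}: {g_high:.4f}")
print(f"change in G: {change:.4f}")
```

The change equals the price coefficient times the 10-unit price difference, and it is negative at any income level.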
Application: WHO
WHO data on 191 countries in 1995-1999.
Analysis of Disability Adjusted Life Expectancy = DALE
EDUC = average years of education
PCHexp = Per capita health expenditure
DALE = α + β1EDUC + β2HealthExp + ε
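A model of this form, with a constant and two regressors, can be estimated by least squares in a few lines. The sketch below uses NumPy's `numpy.linalg.lstsq` on synthetic data: the data are generated, not the WHO data, and the "true" coefficient values, ranges, and variable names are assumptions made only to have something to fit.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 191  # same number of observations as countries in the WHO sample

# Synthetic regressors and outcome; these are NOT the WHO data.
educ = rng.uniform(0, 12, n)
health_exp = rng.uniform(10, 3000, n)
alpha, beta1, beta2 = 40.0, 2.0, 0.004  # made-up "true" values
eps = rng.normal(0, 3, n)
dale = alpha + beta1 * educ + beta2 * health_exp + eps

# Design matrix with a column of ones for the constant term.
X = np.column_stack([np.ones(n), educ, health_exp])

# Least squares solution of X b = dale.
b, *_ = np.linalg.lstsq(X, dale, rcond=None)

# R-squared: share of the variation in dale explained by the fit.
residuals = dale - X @ b
r_squared = 1 - residuals @ residuals / np.sum((dale - dale.mean()) ** 2)

print("estimates (constant, EDUC, HealthExp):", np.round(b, 4))
print("R-squared:", round(float(r_squared), 3))
```

Statistical packages report the same coefficients along with standard errors and t ratios, as in the Minitab output shown earlier for the Monet and gasoline examples.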
The (Famous) WHO Data
Specify the Variables in the Model
Graphs
Regression Results
Practical Model Building
Understanding the regression: The left out variable problem
Using different kinds of variables
  Dummy variables
  Logs
  Time trend
  Quadratic
A Fundamental Result
What happens when you leave a crucial variable out of your model?

Regression Analysis: g versus GasPrice (no income)
The regression equation is
g = 3.50 + 0.0280 GasPrice

Predictor      Coef    SE Coef      T      P
Constant     3.4963     0.1678  20.84  0.000
GasPrice   0.028034   0.002809   9.98  0.000

Regression Analysis: G versus GasPrice, Income
The regression equation is
G = 0.134 - 0.00163 GasPrice + 0.000026 Income

Predictor        Coef     SE Coef      T      P
Constant      0.13449     0.02081   6.46  0.000
GasPrice   -0.0016281   0.0004152  -3.92  0.000
Income     0.00002634  0.00000231  11.43  0.000
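The sign flip on GasPrice in the two regressions above can be reproduced in a small simulation. This sketch uses synthetic data, not the gasoline data: the trend in income, the link between price and income, and the "true" demand coefficients are all made-up assumptions, chosen so that price and income rise together as they do in the time series.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 52  # one observation per year, as in 1953-2004

# Synthetic data, NOT the gasoline data: income trends upward and
# price moves with income, so the two regressors are positively correlated.
income = np.linspace(10_000, 27_500, n) + rng.normal(0, 300, n)
price = 0.004 * income + rng.normal(0, 8, n)

# Made-up "true" demand: price lowers consumption, income raises it.
g = 0.1 - 0.002 * price + 0.00003 * income + rng.normal(0, 0.01, n)

def ols(y, *regressors):
    """Least squares coefficients, constant first."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

short_fit = ols(g, price)          # income left out
long_fit = ols(g, price, income)   # income included

print("price coefficient, income omitted: ", round(float(short_fit[1]), 5))
print("price coefficient, income included:", round(float(long_fit[1]), 5))
```

When income is left out, the price coefficient absorbs part of the positive income effect and comes out positive; once income is included, it is negative, matching the pattern in the two regressions above.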
An Elaborate Multiple Loglinear Regression Model
A Conspiracy Theory for Art Sales at Auction
Sotheby’s and Christie’s conspired on commission rates from 1995 to about 2000.
If the Theory is Correct…
[Figure: Scatterplot of ln (US$) vs ln (SurfaceArea); legend: Sold from 1995 to 2000, Sold before 1995 or after 2000]
Evidence
The statistical evidence seems to be consistent with the theory.
A Production Function Multiple Regression Model
Sales of (Cameras/Videos/Warranties) = f(Floor Space, Staff)
Production Function for Videos
How should I interpret the negative coefficient on logFloor?
An Application to Credit Modeling
Age and Education Effects on Income
A Multiple Regression
+----------------------------------------------------+
| LHS=HHNINC    Mean                 = .3520836      |
|               Standard deviation   = .1769083      |
| Model size    Parameters           = 3             |
|               Degrees of freedom   = 27323         |
| Residuals     Sum of squares       = 794.9667      |
|               Standard error of e  = .1705730      |
| Fit           R-squared            = .07040754     |
+----------------------------------------------------+
+--------+--------------+-----------+
|Variable| Coefficient  | Mean of X |
+--------+--------------+-----------+
|Constant|  -.39266196  |           |
|AGE     |   .02458140  | 43.5256898|
|EDUC    |   .01994416  | 11.3206310|
+--------+--------------+-----------+
Education and Age Effects on Income
Effect on log Income of 8 more years of education