Ch. I Introduction to Regression and Least Squares
TRANSCRIPT
Chapter I. Slide 1
Introduction to Rossi’s MGMT 264B Slides
“I have made this longer than usual, because I lack the time to make it short.” - Pascal
I had the time! A lot of work has gone into simplifying and stripping down to the
essence. This means that the slides will require careful reading. They are designed to be self-contained but efficient! I hope you enjoy reading them.
Read them three times:
1. Before class (write questions in margin)
2. During class
3. After class
Introduction to Rossi’s MGMT 264B Slides
At the end of each chapter, there are symbol and R command glossaries, and a list of important equations.
The following symbols are used throughout the slides:
Note page available
Cool Move but difficult
Requires further thought
Chapter I. Slide 2
I. Introduction to Regression and Least Squares
a. Conditional Prediction
b. Hedonic Pricing and Flat Panel TVs
c. Linear Prediction
d. Least Squares
e. Intuition behind Least Squares
f. Relationship between b and r
g. Decomposing the variance and R2
Chapter I. Slide 3
Chapter I. Slide 4
a. Conditional Prediction
The basic problem:
– Available data
– Formulate a model to predict a variable of interest
– Use prediction to make a business decision
Chapter I. Slide 5
a. Examples of Conditional Prediction
1. Pricing Product Characteristics (Hedonic Pricing):
– Predict market value for various product characteristics
– Decision: optimal product configuration
2. Optimal portfolio choice:
– Predict future joint distribution of asset returns
– Decision: construct optimal portfolio (choose weights)
3. Determination of promotional strategy:
– Predict sales volume response to a price reduction
– Decision: what is the optimal promotional strategy?
Chapter I. Slide 6
b. Hedonic Regression
A Hedonic regression is a method that relates market prices to product characteristics.
The Hedonic regression provides a measure of what the market will bear. Contrast this with conjoint analysis, which measures only the demand side – what consumers are willing to pay. Closely related to the Value Equivalence Line.
Prediction problem: predict what the market will bear for a product that doesn’t necessarily exist, e.g. value a property with no market transaction.
Chapter I. Slide 7
b. Example: Predicting Flat-Panel TV Prices
Problem:
– Predict market price based on observed characteristics.
For example, we are considering introducing a new model with a 58” diagonal. We’d like to know what the market will bear for a TV with “average quality.”
Solution:
– Look at existing products for which we know the price and some observed characteristics
– Build a prediction rule that predicts price as a function of the observed characteristics.
Chapter I. Slide 8
b. Flat-Panel TV Data
What characteristics do we use? We must select factors useful for prediction, and we have to develop a specific quantitative measure of these factors (a variable).
– Many factors or variables affect the price of a flat-panel TV:
• Screen size
• Technology (plasma/LCD/LED)
• Brand name (?)
– Easy to quantify price and size, but what about other variables such as aesthetics?
For simplicity, let’s focus only on screen size.
b. R and course package, PERregress
Chapter I. Slide 9
Where to get R? http://cran.stat.ucla.edu/
R is a free package with versions for Windows, Mac OS, and Linux. Visit the CRAN site nearest you (see above or google “CRAN”) and download and install R.
Next you need to install the “package” made specifically for this course. This is easy to do. The package contains all of the datasets needed for the course as well as some customized functions. Start up R and select the “Packages and Data” menu item. Select a mirror site nearest you (such as UCLA) and then install the package “PERregress.”
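If you prefer typing commands to using the menus, the same installation can be done from the R console. A minimal sketch (it assumes “PERregress” can be found in the repository you select; if it cannot, use the menu route described above):

install.packages("PERregress")   # one-time setup: install the course package from your chosen mirror
library(PERregress)              # load the package at the start of each R session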
b. Install course package, PERregress
Chapter I. Slide 10
R must be version 2.12.1 or higher!
b. Alternatively, install RStudio
Chapter I. Slide 11
See Videos section of CCLE web site.
Chapter I. Slide 12
b. Flat-Panel TV Data
R stores the data as a table or spreadsheet called a dataframe.
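As a sketch of how loading and inspecting such a data frame might look (the dataset name tvdata is only a placeholder here, not necessarily the name used in PERregress):

library(PERregress)   # course package containing the datasets
data(tvdata)          # load the flat-panel TV dataset (hypothetical name)
head(tvdata)          # first few rows: one TV per row, with columns Size and Price
str(tvdata)           # structure of the data frame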
Chapter I. Slide 13
Another way is to think of the data as a scatter plot. In other words, view the data as a collection of points in the plane (Xi, Yi).
b. Flat-Panel TV Data
[Scatter plot of the flat-panel TV data: Price ($500 to $4000) on the vertical axis versus Size (35 to 65 inches) on the horizontal axis]
Chapter I. Slide 14
There appears to be a linear relationship between price and size. Let’s plot a line fit by the “eyeball” method. The R command c() combines two or more numbers into a list called a vector. Here the list contains the intercept and slope!
The line shown was fit by the “eyeball” method.
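A sketch of the plotting commands behind this slide, assuming the flat-panel TV data frame from above (tvdata is still a placeholder name) has been attached so that Size and Price can be used directly:

attach(tvdata)           # make Size and Price accessible by name
plot(Size, Price)        # scatter plot of price ($) against screen size (inches)
abline(c(-1400, 55))     # add the eyeball line: c() makes the vector (intercept, slope)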
c. Linear Prediction
[Scatter plot of Price versus Size with the eyeball line drawn through the points]
Chapter I. Slide 15
c. Linear Prediction
Let Y denote the “dependent” variable and X the “independent” or predictor variable.
Recall that the equation of a line is:
Y = b0 + b1X
where:
b0 = the intercept
b1 = the slope
– The intercept is in the units of Y ($)
– The slope is in units of Y per unit of X ($/inch)
Chapter I. Slide 16
c. Linear Prediction
[Diagram: the line Y = b0 + b1X, showing the intercept b0 on the Y axis and the slope b1 as the rise per unit run]
Our “eyeball” line has b0 = -1400 and b1 = 55.
Chapter I. Slide 17
c. Linear Prediction
We can now predict the price of a flat-panel TV when we know only the size. We simply “read the value off the line” that we drew. For example, given a flat-panel TV with size = 58:
Predicted price = -1400 + 55(58 inches) = $1790
NOTE: the unit conversion from inches to dollars is done for us by the slope coefficient (b1)
Chapter I. Slide 18
c. Linear Fitting & Prediction in R
We can fit a line to the data using the R function lm()!
[R screenshot of the lm() call, with labels marking the dependent variable (left of ~) and the independent variable (right of ~)]
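A sketch of the call, assuming Size and Price are attached as in the earlier plotting sketch:

fit <- lm(Price ~ Size)   # dependent variable ~ independent variable
summary(fit)              # coefficients: intercept about -1408.93, slope about 57.13
abline(fit)               # add the least squares line to the current scatter plot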
Chapter I. Slide 19
c. Linear Fitting & Prediction in R
R chooses a different line from ours. R fit: Price = -1408.93 + 57.13 Size
[Scatter plot of Price versus Size showing both the R (least squares) line and the eyeball line]
Chapter I. Slide 20
c. Linear Fitting & Prediction in R
Let’s look at the R prediction results when size = 58:
Predicted price = -1408.93 + 57.13(58) ≈ $1905
[Scatter plot of Price versus Size with the least squares line; the prediction at Size = 58 is read off the line]
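A sketch of the prediction command, assuming the fitted model fit from the lm() sketch above:

predict(fit, newdata = data.frame(Size = 58))   # predicted price for a 58-inch TV
# close to the hand calculation -1408.93 + 57.13*58, i.e. about $1905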
Chapter I. Slide 21
Natural questions to ask at this point:
– How does R select a best fitting line?
– Can we say anything about the accuracy of the predictions?
– What does all that other R output mean?
c. Linear Fitting & Prediction in R
Chapter I. Slide 22
d. Least Squares
A strategy for estimating the slope and intercept parameters
Data: We observe the data recorded as the pairs (Xi, Yi) i = 1,…,N
Problem: Choose a fitted line, i.e. choose (b0, b1).
A reasonable way to fit a line is to minimize the amount by which the fitted value differs from the actual value. This difference is called the residual.
Chapter I. Slide 23
d. Least Squares: Fitted Values & Residuals
What does “fitted value” mean?
[Plot of the data points and the fitted line; at each Xi, Ŷi is the height of the line and Yi is the observed point]
The dots are the observed values and the line represents our fitted values given by:
Ŷi = b0 + b1Xi
Chapter I. Slide 24
d. Least Squares: Fitted Values & Residuals
What about the residual for the ith observation?
[Plot showing one observation (Xi, Yi), its fitted value Ŷi on the line, and the vertical gap between them]
ei = Yi – Ŷi = Residual
Chapter I. Slide 25
d. Least Squares: Fitted Values & Residuals
Note: The lengths of the red and green lines are the residuals.
[Plot of the data and the fitted line, with vertical segments marking positive residuals (points above the line) and negative residuals (points below the line)]
The fitted value and the residual decompose the Yi observation into two parts: Yi = Ŷi + (Yi – Ŷi) = Ŷi + ei
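In R the fitted values and residuals of a least squares fit can be pulled out directly; a short sketch, again assuming the model fit and the attached data from above, verifies the decomposition Yi = Ŷi + ei:

yhat <- fitted(fit)                      # fitted values  Ŷi = b0 + b1*Xi
e    <- resid(fit)                       # residuals      ei = Yi - Ŷi
all.equal(as.numeric(yhat + e), Price)   # TRUE: each Yi equals Ŷi + ei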
Chapter I. Slide 26
d. The Least Squares Criterion
Ideally we want to minimize the size of all residuals. We must therefore trade off between moving closer to some points and, at the same time, moving away from other points.
The line fitting process:
i. Compute residuals
ii. Square residuals and add up
iii. Pick the best fitting line (intercept and slope)
Least Squares:
choose b0 and b1 to minimize Σei², the sum of squared residuals over i = 1,…,N
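A quick sketch of the criterion in R, assuming Size and Price are attached: compute residuals for two candidate lines, square and add them up, and compare:

e_eyeball <- Price - (-1400 + 55 * Size)         # residuals from the eyeball line
e_ls      <- Price - (-1408.93 + 57.13 * Size)   # residuals from the (rounded) least squares line
sum(e_eyeball^2)   # sum of squared residuals for the eyeball line
sum(e_ls^2)        # smaller: the least squares coefficients minimize this sum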
Chapter I. Slide 27
d. The Least Squares Criterion
Click on chart in slideshow to activate applet
Chapter I. Slide 28
What are the formulas which do the job? Least Squares Solution:
d. The Least Squares Criterion
b1 = Σ(Xi – X̄)(Yi – Ȳ) / Σ(Xi – X̄)² = sXY / sX²    (sums run from i = 1 to N)

where:
sXY = sample covariance of (X, Y)
sX² = sample variance of X

b0 = Ȳ – b1X̄
Chapter I. Slide 29
d. The Least Squares Criterion
Let’s put the data into Excel:
Now simply calculate 10.75/5 to determine the slope = 2.15
Chapter I. Slide 30
e. Intuition Behind the Least Squares Formula
The equations for b0 and b1 show us how to “process” the data (X1, Y1), (X2, Y2),…, (XN, YN) to generate guesses for the intercept and slope. The formulas use sample quantities that we are familiar with and have used in the past: the sample covariance (numerator) and the sample variance (denominator). Can we develop an intuition for the formulas that is based on prediction?
Chapter I. Slide 31
e. Intuition Behind the Least Squares Formula
The intercept: The formula for b0 ensures that the fitted regression line passes through the point of means (X̄, Ȳ). If you put in the average value of X, the least squares prediction is the average value of Y.
If we substitute in for the intercept using the LS formula, we obtain:
Ŷ – Ȳ = b1(X – X̄)
If X is above the mean, then b1 tells us how much to scale this deviation above the mean to produce a forecast of Y relative to the mean of Y.
Chapter I. Slide 32
e. Intuition Behind the Least Squares Formula
We can think of least squares as a two-part process:
i. Plot the point of means
ii. Find the line through that point which has the smallest sum of squared residuals
There are many possible lines that pass through the point of means. The least squares approach suggests that one line is the best. Can we understand this formula by applying a fundamental intuition derived from prediction?
Chapter I. Slide 33
Quick Review: Correlation
An intuition for the slope formula can be developed, but it is more complicated. Let’s first review the basic concept of correlation. The sample correlation coefficient between two variables is defined as:
Remember: the correlation coefficient is a unitless measure of linear association between two variables.
rXY = Σ(Xi – X̄)(Yi – Ȳ) / √[ Σ(Xi – X̄)² · Σ(Yi – Ȳ)² ] = sXY / (sX · sY)
Chapter I. Slide 34
Quick Review: Correlation
Examples of various samples with varying degrees of correlation
[Four scatter plots of samples with varying degrees of correlation: r = 0, r = .5, r = .75, and r = 1]
Chapter I. Slide 35
e. Intuition Behind the Least Squares Formula
What is the intuition for the LS formula for the slope, b1? Does there seem to be any relationship between e and Size in the flat-panel TV dataset?
[Scatter plot of the least squares residuals e against Size: no apparent relationship]
Does it make sense that e and size should not be correlated? YES! Suppose we decide to ignore the least squares fit and make a bad choice for b1 (green line)
[Scatter plot of Price versus Size with a deliberately bad line (the green line) drawn through the data]
Chapter I. Slide 36
e. Intuition Behind the Least Squares Formula
Our bad choice: -2543 + 80 * size
Least squares fit: -1408.93 + 57.13 * size
Chapter I. Slide 37
e. Intuition Behind the Least Squares Formula
Let’s look at the relationship between e and Size for the “bad fit”: Clearly, we have left something on the table in terms of prediction!
[Scatter plot of the bad-fit residuals against Size: a clear relationship remains in the residuals]
We tend to underestimate (+e) the value of the TVs with size below the average size and overestimate (-e) the value of flat-panel TVs larger than average
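A sketch of this comparison in R, assuming Size and Price are attached:

e_bad <- Price - (-2543 + 80 * Size)   # residuals from the deliberately bad line
e_ls  <- resid(lm(Price ~ Size))       # residuals from the least squares line
cor(e_bad, Size)                       # clearly non-zero: "Xness" left in the residuals
cor(e_ls, Size)                        # essentially zero for the least squares fit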
Chapter I. Slide 38
Y = Ŷ + e
e. Intuition Behind the Least Squares Formula
As long as the correlation between e and X is non-zero, we could always adjust our prediction rule to do better. That is, we need to exploit all of the predictive power available in the X values and put it into Ŷ, leaving no “Xness” in the residuals. In summary, each observation is decomposed into fitted and residual values:
Ŷ: “made from X”, corr(Ŷ, X) = 1
e: unrelated to X, corr(e, X) = 0
Chapter I. Slide 39
e. Intuition Behind the Least Squares Formula
The Fundamental Properties of Optimal Prediction:
– The optimal prediction of Y for an “average” or representative X should be the “average” or representative value of Y
– Prediction errors should be unrelated to the information used to formulate the predictions
For linear models:
– If X = X̄, then Ŷ = Ȳ
– corr(Y – Ŷ, X) = corr(e, X) = 0
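Both properties can be checked numerically for the least squares fit; a sketch, assuming the attached flat-panel TV data from before:

fit <- lm(Price ~ Size)
predict(fit, newdata = data.frame(Size = mean(Size)))   # prediction at the average Size ...
mean(Price)                                             # ... equals the average Price
cor(resid(fit), Size)                                   # zero, up to rounding error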
Chapter I. Slide 40
f. The Relationship Between b and r
The least squares slope, b1, can be written as the ratio of the sample covariance to the sample variance of X. This is close to, but not the same as, the sample correlation. Recall that b must “convert” the units of X into the units of Y and is expressed in units of Y per unit of X. The sample correlation coefficient, however, is unit-less. The relationship is given by:
b1 = sXY / sX²

b1 = sXY / sX² = rXY · (sY / sX)
Chapter I. Slide 41
f. The Relationship between b and r
Returning to the flat-panel TV price data, we can use commands in R to compute the least squares estimate of the slope using only simple statistics: the sample covariance sXY and the sample variance sX².
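A sketch of those commands, assuming Size and Price are attached:

sxy <- cov(Size, Price)   # sample covariance of Size and Price
sx2 <- var(Size)          # sample variance of Size
sxy / sx2                 # the least squares slope b1 (about 57.13)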
Chapter I. Slide 42
f. The Relationship between b and r
Now let’s use the relationship between the regression coefficient and the sample correlation, b1 = rXY · (sY / sX), to compute the slope estimate.
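A sketch of the corresponding commands, again assuming Size and Price are attached:

rxy <- cor(Size, Price)   # sample correlation (unit-less)
sy  <- sd(Price)          # sample standard deviation of Price ($)
sx  <- sd(Size)           # sample standard deviation of Size (inches)
rxy * sy / sx             # b1 again: the same answer as sxy/sx2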
Chapter I. Slide 43
g. Decomposing the Variance – The ANOVA Table
We now know that:
• Σei = 0 (which implies ē = 0, as well!)
• corr(e, X) = 0
• corr(Ŷ, e) = 0
We can now use these facts to decompose the total variance of Y:
Var(Y) = Var(Ŷ) + Var(e) + 2 cov(Ŷ, e), or
Var(Y) = Var(Ŷ) + Var(e)
What is true for the average square (the variance) is also true for the total sum of squares:
Σ(Yi – Ȳ)² = Σ(Ŷi – Ȳ)² + Σei²    (sums over i = 1,…,N)
Total Sum of Squares (SST) = Regression SS (SSR) + Error SS (SSE)
Chapter I. Slide 44
Let’s find this on the R printout
g. Decomposing the Variance – The ANOVA Table
[R printout of the ANOVA table, with the SSR and SSE entries marked]
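A sketch of how that table is produced in R, assuming the fitted model from before:

fit <- lm(Price ~ Size)
anova(fit)   # the Size row's Sum Sq is SSR; the Residuals row's Sum Sq is SSE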
Chapter I. Slide 45
g. A Goodness of Fit Measure: R2
We have a good fit if:
• SSR is large
• SSE is small
• The fit would be perfect if SST = SSR
To summarize how close SSR is to SST, we define the coefficient of determination:
R² = SSR/SST = 1 – SSE/SST
Two interpretations:
i. Percentage of variation in Y explained by X
ii. Square of the correlation (hence “r” squared)
Chapter I. Slide 46
g. A Goodness of Fit Measure: R2
R2 on the R printout
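A sketch of several equivalent ways to compute R² in R, assuming the fitted model and attached data from before (the row labels "Size" and "Residuals" are those of the one-regressor ANOVA table above):

a   <- anova(fit)
SSR <- a["Size", "Sum Sq"]        # regression sum of squares
SSE <- a["Residuals", "Sum Sq"]   # error sum of squares
SSR / (SSR + SSE)                 # R² = SSR / SST
cor(Size, Price)^2                # the squared correlation gives the same number
summary(fit)$r.squared            # the value reported on the R printout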
Chapter I. Slide 47
g. Misuse of R2
R-squared is often misused. Some establish arbitrary benchmarks for “high” values; for example, most regard values over .8 as “high.”
“High” values of R² are often associated with claims that the model is:
1. adequate or correctly specified – “valid”
2. highly accurate for prediction
Unfortunately, neither is correct. More later!
Chapter I. Slide 48
Appendix: Derivation of LS Slope
We have satisfied ourselves that corr(e, X) = 0. Now let us use this intuition to derive the least squares formula for b1.
Verification: realize that corr(e, X) = 0 is equivalent to cov(e, X) = 0 (why?). Substitute in for e, using ei = Yi – b0 – b1Xi and b0 = Ȳ – b1X̄:
cov(e, X) = cov(X, e) = 1/(N–1) Σ(Xi – X̄)ei
Σ(Xi – X̄)(Yi – b0 – b1Xi) = Σ(Xi – X̄)(Yi – Ȳ – b1Xi + b1X̄) = 0
Chapter I. Slide 49
Appendix: Derivation of LS Slope
Verification (continued):
Σ(Xi – X̄)[(Yi – Ȳ) – b1Xi + b1X̄] = 0
or
Σ(Xi – X̄)[(Yi – Ȳ) – b1(Xi – X̄)] = 0
or
Σ[(Xi – X̄)(Yi – Ȳ) – b1(Xi – X̄)²] = 0
Solving for b1:
b1 = Σ(Xi – X̄)(Yi – Ȳ) / Σ(Xi – X̄)²
Chapter I. Slide 50
Glossary of Symbols
"X= independent or explanatory variable Y= dependent variable "" b0 - least squares estimate of intercept b1 - least squares estimate of regression slope ei - least squares residual Ŷ - fitted value s - sample standard dev (subscript tells you of what var)
s2 - sample variance (subscript tells you of what var) r - sample correlation coefficient SST - total sum of squares SSR - regression sum of squares SSE - error sum of squares R2 - coefficient of determination, goodness of fit measure
Chapter I. Slide 51
Important Equations
Ŷi = b0 + b1Xi    (fitted value)
ei = Yi – Ŷi    (residual)
b1 = Σ(Xi – X̄)(Yi – Ȳ) / Σ(Xi – X̄)² = sXY / sX²
b0 = Ȳ – b1X̄    (least squares formulae for slope and intercept)
Chapter I. Slide 52
Important Equations
Ŷ – Ȳ = b1(X – X̄)    (the least squares line passes through the point of means)
b1 = sXY / sX² = rXY · (sY / sX)    (relationship between the slope coefficient and the correlation)
R² = SSR/SST = 1 – SSE/SST    (alternative definitions for R-squared)
Glossary of R commands
- abline(c(intercept,slope)): Adds a line with the given intercept and slope to the current plot.
- attach(A): The data frame A is attached to the R search path, so objects in A can be accessed by simply giving their names.
- anova(): Computes analysis of variance (or deviance) tables for one or more fitted model objects.
- c(): Combines values into a vector or list.
- cov(x,y), cor(x,y): Compute the covariance or correlation of variables x and y.
- data(A): Allows access to data frame A that is in the data library.
- head(A): Returns the first parts of a data frame A.
- library("PERregress"): Loads the R package containing datasets and customized functions for our class.
- lm(Y~X): Fits a linear model. Y is the dependent variable and X is the independent variable.
Chapter I. Slide 53
Glossary of R commands
- plot(X,Y): Produces a scatter plot of Y against X.
- predict(): Predictions from the results of various model fitting functions.
- A=read.csv("file_name"): Reads a file in .csv format and creates data frame A from it.
- str(object): Tells you the “structure” of an object, e.g. str(dataframe).
- summary(): Generic function used to produce summaries of the results of various model fitting functions.
- var(X), mean(X), sd(X): Compute the variance, mean, and standard deviation of a variable named X.
Chapter I. Slide 54