linear regression - ut · 2012. 2. 13. · quiz: what does it mean: linear? 0 10 20 30 40 50 60 70...
TRANSCRIPT
![Page 2: Linear Regression - ut · 2012. 2. 13. · Quiz: What does it mean: linear? 0 10 20 30 40 50 60 70 80 90 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr East West North. Dummy variables • Are binary](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff5683c754590421604d9d7/html5/thumbnails/2.jpg)
Which of the following is most
related to linear regression?
0
10
20
30
40
50
60
70
80
90
1st Qtr 2nd Qtr 3rd Qtr 4th Qtr
East
West
North
1) Information Gain
2) Linear Atavism
3) Regression to
Mean
4) Method of Least
Squares
![Page 3: Linear Regression - ut · 2012. 2. 13. · Quiz: What does it mean: linear? 0 10 20 30 40 50 60 70 80 90 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr East West North. Dummy variables • Are binary](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff5683c754590421604d9d7/html5/thumbnails/3.jpg)
Introduction to Linear Regression
Linear regression is an approach to modeling the
relationship between a response variable Y and one or
more explanatory variables denoted X (predictors), e.g
regression is the study of dependence.
A response variable Y must be continuous.
The case of one explanatory variable is called simple
regression.
More than one explanatory variable is multiple
regression.
![Page 4: Linear Regression - ut · 2012. 2. 13. · Quiz: What does it mean: linear? 0 10 20 30 40 50 60 70 80 90 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr East West North. Dummy variables • Are binary](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff5683c754590421604d9d7/html5/thumbnails/4.jpg)
Scatterplot
![Page 5: Linear Regression - ut · 2012. 2. 13. · Quiz: What does it mean: linear? 0 10 20 30 40 50 60 70 80 90 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr East West North. Dummy variables • Are binary](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff5683c754590421604d9d7/html5/thumbnails/5.jpg)
Scatterplot
![Page 6: Linear Regression - ut · 2012. 2. 13. · Quiz: What does it mean: linear? 0 10 20 30 40 50 60 70 80 90 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr East West North. Dummy variables • Are binary](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff5683c754590421604d9d7/html5/thumbnails/6.jpg)
Short Quiz
Sketch on each plot what you think is the best-fitting line
for predicting y from x.
1) 2)
3) 4)
![Page 7: Linear Regression - ut · 2012. 2. 13. · Quiz: What does it mean: linear? 0 10 20 30 40 50 60 70 80 90 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr East West North. Dummy variables • Are binary](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff5683c754590421604d9d7/html5/thumbnails/7.jpg)
Short quiz
pic y prediction Residual sum of
squares
1
2
3
4
![Page 8: Linear Regression - ut · 2012. 2. 13. · Quiz: What does it mean: linear? 0 10 20 30 40 50 60 70 80 90 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr East West North. Dummy variables • Are binary](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff5683c754590421604d9d7/html5/thumbnails/8.jpg)
• Cross at the average y-value for each x and draw the
best-fitting line to the crosses
• Re-compute the y prediction and sum of squared
errors.
1) 2)
3) 4)
![Page 9: Linear Regression - ut · 2012. 2. 13. · Quiz: What does it mean: linear? 0 10 20 30 40 50 60 70 80 90 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr East West North. Dummy variables • Are binary](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff5683c754590421604d9d7/html5/thumbnails/9.jpg)
Linear Regression Function
![Page 10: Linear Regression - ut · 2012. 2. 13. · Quiz: What does it mean: linear? 0 10 20 30 40 50 60 70 80 90 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr East West North. Dummy variables • Are binary](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff5683c754590421604d9d7/html5/thumbnails/10.jpg)
Linear Regression Function
Mean function
Intercept
Slope
![Page 11: Linear Regression - ut · 2012. 2. 13. · Quiz: What does it mean: linear? 0 10 20 30 40 50 60 70 80 90 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr East West North. Dummy variables • Are binary](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff5683c754590421604d9d7/html5/thumbnails/11.jpg)
Linear Regression Function
Mean function
Intercept
Slope
Intercept and slope are unknown, want to estimate
![Page 12: Linear Regression - ut · 2012. 2. 13. · Quiz: What does it mean: linear? 0 10 20 30 40 50 60 70 80 90 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr East West North. Dummy variables • Are binary](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff5683c754590421604d9d7/html5/thumbnails/12.jpg)
Linear regression function
![Page 13: Linear Regression - ut · 2012. 2. 13. · Quiz: What does it mean: linear? 0 10 20 30 40 50 60 70 80 90 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr East West North. Dummy variables • Are binary](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff5683c754590421604d9d7/html5/thumbnails/13.jpg)
Residuals (Errors)
![Page 14: Linear Regression - ut · 2012. 2. 13. · Quiz: What does it mean: linear? 0 10 20 30 40 50 60 70 80 90 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr East West North. Dummy variables • Are binary](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff5683c754590421604d9d7/html5/thumbnails/14.jpg)
Objective function
residual sum of squares (RSS, SSE):
Ordinary Least Squares (OLS)
![Page 15: Linear Regression - ut · 2012. 2. 13. · Quiz: What does it mean: linear? 0 10 20 30 40 50 60 70 80 90 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr East West North. Dummy variables • Are binary](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff5683c754590421604d9d7/html5/thumbnails/15.jpg)
Minimization
![Page 16: Linear Regression - ut · 2012. 2. 13. · Quiz: What does it mean: linear? 0 10 20 30 40 50 60 70 80 90 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr East West North. Dummy variables • Are binary](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff5683c754590421604d9d7/html5/thumbnails/16.jpg)
Example
b1 = sum((x-mean(x))*(y - mean(y)))
/ sum((x-mean(x))^2)
[1] 0.541747
b0 = mean(y) - b1*mean(x)
[1] 75.99029
![Page 17: Linear Regression - ut · 2012. 2. 13. · Quiz: What does it mean: linear? 0 10 20 30 40 50 60 70 80 90 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr East West North. Dummy variables • Are binary](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff5683c754590421604d9d7/html5/thumbnails/17.jpg)
Example
b1 = sum((x-mean(x))*(y - mean(y)))
/ sum((x-mean(x))^2)
[1] 0.541747
b0 = mean(y) - b1*mean(x)
[1] 75.99029
b1 = cov(x,y)/var(x)
![Page 18: Linear Regression - ut · 2012. 2. 13. · Quiz: What does it mean: linear? 0 10 20 30 40 50 60 70 80 90 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr East West North. Dummy variables • Are binary](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff5683c754590421604d9d7/html5/thumbnails/18.jpg)
Example
lm(y ~ x)
![Page 19: Linear Regression - ut · 2012. 2. 13. · Quiz: What does it mean: linear? 0 10 20 30 40 50 60 70 80 90 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr East West North. Dummy variables • Are binary](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff5683c754590421604d9d7/html5/thumbnails/19.jpg)
Example
![Page 20: Linear Regression - ut · 2012. 2. 13. · Quiz: What does it mean: linear? 0 10 20 30 40 50 60 70 80 90 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr East West North. Dummy variables • Are binary](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff5683c754590421604d9d7/html5/thumbnails/20.jpg)
Example
y = 75.99 + 0.54 M_height_cm
![Page 21: Linear Regression - ut · 2012. 2. 13. · Quiz: What does it mean: linear? 0 10 20 30 40 50 60 70 80 90 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr East West North. Dummy variables • Are binary](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff5683c754590421604d9d7/html5/thumbnails/21.jpg)
Multiple regression
Usually we have more than one variable:
or in matrix notation:
![Page 22: Linear Regression - ut · 2012. 2. 13. · Quiz: What does it mean: linear? 0 10 20 30 40 50 60 70 80 90 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr East West North. Dummy variables • Are binary](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff5683c754590421604d9d7/html5/thumbnails/22.jpg)
Matrix notation
n observations, p explanatory variables,
dim(Y) = n × 1, dim(X) = n ×(p+1), dim(ß) = (p+1) ×1,
dim(e) = n × 1
![Page 23: Linear Regression - ut · 2012. 2. 13. · Quiz: What does it mean: linear? 0 10 20 30 40 50 60 70 80 90 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr East West North. Dummy variables • Are binary](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff5683c754590421604d9d7/html5/thumbnails/23.jpg)
OLS for multiple regression
![Page 24: Linear Regression - ut · 2012. 2. 13. · Quiz: What does it mean: linear? 0 10 20 30 40 50 60 70 80 90 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr East West North. Dummy variables • Are binary](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff5683c754590421604d9d7/html5/thumbnails/24.jpg)
Example
b = solve(t(X) %*% X) %*% t(X) %*% y
![Page 25: Linear Regression - ut · 2012. 2. 13. · Quiz: What does it mean: linear? 0 10 20 30 40 50 60 70 80 90 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr East West North. Dummy variables • Are binary](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff5683c754590421604d9d7/html5/thumbnails/25.jpg)
Example
b = solve(t(X) %*% X) %*% t(X) %*% y
b = ginv(X) %*% y
lm(y ~ X)
lm(y ~ x1 + x2)
![Page 26: Linear Regression - ut · 2012. 2. 13. · Quiz: What does it mean: linear? 0 10 20 30 40 50 60 70 80 90 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr East West North. Dummy variables • Are binary](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff5683c754590421604d9d7/html5/thumbnails/26.jpg)
Types of predictors • The intercept (model can be with or without);
lm(y ~ x1 + x2 – 1)
• Transformations of predictors lm(y ~ x1 + log(x2))
• Polynomials
lm(y ~ x1 + I(x2^2))
• Interactions and other combinations of predictors
lm(y ~ x1/x2)
• Dummy variables and factors lm(y ~ is_male)
![Page 27: Linear Regression - ut · 2012. 2. 13. · Quiz: What does it mean: linear? 0 10 20 30 40 50 60 70 80 90 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr East West North. Dummy variables • Are binary](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff5683c754590421604d9d7/html5/thumbnails/27.jpg)
Polynomials
![Page 28: Linear Regression - ut · 2012. 2. 13. · Quiz: What does it mean: linear? 0 10 20 30 40 50 60 70 80 90 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr East West North. Dummy variables • Are binary](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff5683c754590421604d9d7/html5/thumbnails/28.jpg)
Polynomials
m2 <- lm(Salary ~ Experience + I(Experience^2), data = prof)
![Page 29: Linear Regression - ut · 2012. 2. 13. · Quiz: What does it mean: linear? 0 10 20 30 40 50 60 70 80 90 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr East West North. Dummy variables • Are binary](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff5683c754590421604d9d7/html5/thumbnails/29.jpg)
Quiz: What does it mean: linear?
In which case we cannot use linear
regression?
![Page 30: Linear Regression - ut · 2012. 2. 13. · Quiz: What does it mean: linear? 0 10 20 30 40 50 60 70 80 90 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr East West North. Dummy variables • Are binary](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff5683c754590421604d9d7/html5/thumbnails/30.jpg)
Quiz: What does it mean:
linear?
0
10
20
30
40
50
60
70
80
90
1st Qtr 2nd Qtr 3rd Qtr 4th Qtr
East
West
North
![Page 31: Linear Regression - ut · 2012. 2. 13. · Quiz: What does it mean: linear? 0 10 20 30 40 50 60 70 80 90 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr East West North. Dummy variables • Are binary](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff5683c754590421604d9d7/html5/thumbnails/31.jpg)
Dummy variables
• Are binary variables (i.e 0 or 1) created
from a variable with the higher level of
measurement (categorical variable):
Eye color Code
Brown 1
Blue 2
Grey 3
Eye color Is _Brown Is_Blue Is_Grey
Brown 1 0 0
Blue 0 1 0
Grey 0 0 1
![Page 32: Linear Regression - ut · 2012. 2. 13. · Quiz: What does it mean: linear? 0 10 20 30 40 50 60 70 80 90 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr East West North. Dummy variables • Are binary](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff5683c754590421604d9d7/html5/thumbnails/32.jpg)
Dummy variables
• Are binary variables (i.e 0 or 1) created
from a variable with the higher level of
measurement (categorical variable):
Eye color Code
Brown 1
Blue 2
Grey 3
Eye color Is _Brown Is_Blue Is_Grey
Brown 1 0 0
Blue 0 1 0
Grey 0 0 1
![Page 33: Linear Regression - ut · 2012. 2. 13. · Quiz: What does it mean: linear? 0 10 20 30 40 50 60 70 80 90 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr East West North. Dummy variables • Are binary](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff5683c754590421604d9d7/html5/thumbnails/33.jpg)
Example
Salary for males: 85181.8 +958.1 yrs.since.phd + 7923.6 * 1 =
93105.4 + 958.1 yrs.since.phd
Salary for females: 85181.8 +958.1 yrs.since.phd + 7923.6 * 0 =
85181.8 +958.1 yrs.since.phd
![Page 34: Linear Regression - ut · 2012. 2. 13. · Quiz: What does it mean: linear? 0 10 20 30 40 50 60 70 80 90 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr East West North. Dummy variables • Are binary](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff5683c754590421604d9d7/html5/thumbnails/34.jpg)
Diagnostics
![Page 35: Linear Regression - ut · 2012. 2. 13. · Quiz: What does it mean: linear? 0 10 20 30 40 50 60 70 80 90 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr East West North. Dummy variables • Are binary](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff5683c754590421604d9d7/html5/thumbnails/35.jpg)
Leverage points Demo: http://www.stat.sc.edu/~west/javahtml/Regression.html
![Page 36: Linear Regression - ut · 2012. 2. 13. · Quiz: What does it mean: linear? 0 10 20 30 40 50 60 70 80 90 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr East West North. Dummy variables • Are binary](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff5683c754590421604d9d7/html5/thumbnails/36.jpg)
• a leverage point is an observation that has an
extreme value on one or more explanatory variables.
• a point is a bad leverage point if its Y -value does
not follow the pattern set by the other data points.
• a bad leverage point is a leverage point which is
also an outlier.
![Page 37: Linear Regression - ut · 2012. 2. 13. · Quiz: What does it mean: linear? 0 10 20 30 40 50 60 70 80 90 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr East West North. Dummy variables • Are binary](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff5683c754590421604d9d7/html5/thumbnails/37.jpg)
Standardized residuals
![Page 38: Linear Regression - ut · 2012. 2. 13. · Quiz: What does it mean: linear? 0 10 20 30 40 50 60 70 80 90 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr East West North. Dummy variables • Are binary](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff5683c754590421604d9d7/html5/thumbnails/38.jpg)
Goodness-of-fit-measures
• R-squared
(square of the sample correlation coefficient between
the outcomes and their predicted values)
• Coefficient Significance: (used to test the hypothesis that the true value of the coefficient is
non-zero, in order to confirm that the independent variable really
belongs in the model)
-------------------------------------------------------------------------------------
• Measures on the test set (RSS, R-squared)
![Page 39: Linear Regression - ut · 2012. 2. 13. · Quiz: What does it mean: linear? 0 10 20 30 40 50 60 70 80 90 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr East West North. Dummy variables • Are binary](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff5683c754590421604d9d7/html5/thumbnails/39.jpg)
Over- and underfitting
![Page 40: Linear Regression - ut · 2012. 2. 13. · Quiz: What does it mean: linear? 0 10 20 30 40 50 60 70 80 90 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr East West North. Dummy variables • Are binary](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff5683c754590421604d9d7/html5/thumbnails/40.jpg)
Regularization
• Simple objective function:
min(Error)
• … with regularization:
min(Error + ʎ Complexity)
![Page 41: Linear Regression - ut · 2012. 2. 13. · Quiz: What does it mean: linear? 0 10 20 30 40 50 60 70 80 90 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr East West North. Dummy variables • Are binary](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff5683c754590421604d9d7/html5/thumbnails/41.jpg)
Regularization
• Simple objective function:
min(Error)
• … with regularization:
min(Error + ʎ Complexity)
Penalty for more complex models:
with larger values of lambda,
greater penalty – more compact
model
![Page 42: Linear Regression - ut · 2012. 2. 13. · Quiz: What does it mean: linear? 0 10 20 30 40 50 60 70 80 90 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr East West North. Dummy variables • Are binary](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff5683c754590421604d9d7/html5/thumbnails/42.jpg)
Regularization
• OLS objective function:
min( 𝑒2)
• OLS with regularization (Ridge regression):
min( 𝑒2 + 𝜆 𝛽𝑖2)
![Page 43: Linear Regression - ut · 2012. 2. 13. · Quiz: What does it mean: linear? 0 10 20 30 40 50 60 70 80 90 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr East West North. Dummy variables • Are binary](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff5683c754590421604d9d7/html5/thumbnails/43.jpg)
Regularization
• OLS objective function:
min( 𝑒2)
• OLS with regularization (Ridge regression):
min( 𝑒2 + 𝜆 𝛽𝑖2)
![Page 44: Linear Regression - ut · 2012. 2. 13. · Quiz: What does it mean: linear? 0 10 20 30 40 50 60 70 80 90 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr East West North. Dummy variables • Are binary](https://reader034.vdocuments.us/reader034/viewer/2022051904/5ff5683c754590421604d9d7/html5/thumbnails/44.jpg)
Literature
• A modern approach to Regression with R, Simon
Sheather;
• Applied linear regression, Weisberg