linear regression
DESCRIPTION
Linear Regression: the least squares regression model. A regression line describes how a response variable y changes as an explanatory variable x changes. We often use regression to predict the value of y for a given value of x. - PowerPoint presentation transcript
Linear Regression
The least squares regression model
Regression Line
A regression line is a line that describes how a response variable y changes as an explanatory variable x changes.
We often use regression to predict the value of y for a given x value.
Equation of a Regression Line
A regression line relating x to y has an equation of the form:

$\hat{y} = a + bx$

• $\hat{y}$ (read "y hat") is the predicted value of the response variable y for a given value of the explanatory variable x.
• b is the slope, the amount by which y is expected to change when x increases by one unit.
• a is the y-intercept, the predicted value of y when x = 0.
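A minimal sketch of evaluating a regression line in Python (the intercept and slope values below are made up for illustration):

```python
def predict(a, b, x):
    """Predicted response y-hat = a + b*x for explanatory value x."""
    return a + b * x

# Hypothetical line with intercept a = 2.0 and slope b = 0.5:
# each one-unit increase in x is expected to raise y by 0.5.
y_hat = predict(2.0, 0.5, 10)  # 7.0
```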
Prediction
Interpolation is the use of a regression line to predict within the range of known observations.
Extrapolation is the use of a regression line to predict outside the range of known observations.
• Predictions from extrapolation are often not accurate.
Residuals
A residual is the difference between an observed value of the response variable and the value predicted by the regression line.
• Residual = observed y − predicted y
• Residual = $y - \hat{y}$
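In code, a residual is just observed minus predicted, one per observation (the numbers below are invented for illustration):

```python
def residuals(ys, y_hats):
    """Residual = observed y - predicted y, one per observation."""
    return [y - yh for y, yh in zip(ys, y_hats)]

# Made-up observations and the line's predictions for them:
obs = [3.0, 5.0, 7.5]
pred = [2.8, 5.4, 7.1]
res = residuals(obs, pred)  # approximately [0.2, -0.4, 0.4]
```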
Least Squares Regression Line
The least squares regression line of y on x is the line that makes the sum of the squared residuals as small as possible.
Equation: $\hat{y} = a + bx$, where

$b = r\frac{s_y}{s_x} \qquad a = \bar{y} - b\bar{x}$
Other Calculations

$b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} \qquad a = \bar{y} - b\bar{x}$
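The same line can also be computed from the raw sums, with no summary statistics needed; a sketch on invented, perfectly linear data:

```python
def least_squares_sums(xs, ys):
    """b = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2); a = y_bar - b*x_bar."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    return a, b

# Toy data following y = 1 + 2x exactly, so a = 1 and b = 2.
a2, b2 = least_squares_sums([1, 2, 3, 4], [3, 5, 7, 9])
```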
How well does a line fit the data?
Since residuals tell us how far the data fall from the regression line, they are a natural place to look when assessing the fit.
A residual plot is a scatterplot of the residuals against the explanatory variable.
How do residual plots help us assess the fit of the data?
A residual plot in effect turns the regression line horizontal.
Residual plots magnify the deviations of points from the line, making it easier to see unusual observations and patterns.
What we look for in residual plots
There should be NO obvious pattern:
• A curved pattern shows a nonlinear relationship.
• A megaphone pattern shows that the residuals grow with x.
The residuals should be relatively small:
• They represent the typical prediction error.
The Average Prediction Error
The standard deviation of the residuals (s):

$s = \sqrt{\frac{\sum \text{residuals}^2}{n-2}} = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n-2}}$
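A sketch of the residual standard deviation formula in code (the observed and predicted values are made up):

```python
import math

def residual_stdev(ys, y_hats):
    """s = sqrt( sum((y_i - y-hat_i)^2) / (n - 2) )."""
    n = len(ys)
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hats))
    return math.sqrt(sse / (n - 2))

# Made-up observations and predictions; the squared residuals sum to 4
# and n - 2 = 2, so s = sqrt(2), about 1.414.
s = residual_stdev([1, 2, 3, 4], [1, 2, 3, 2])
```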
Home Example
We want to predict the price of a home in Arvada. A random sample of 10 homes for sale is taken (prices in thousands).
Make a prediction for the cost of the 11th house if we know its square footage is 1789 ft².

| Square ft | Price (thousands) | Square ft | Price (thousands) |
| --- | --- | --- | --- |
| 1429 | 201 | 1785 | 325 |
| 1982 | 333 | 2001 | 450 |
| 1359 | 205 | 1835 | 360 |
| 1761 | 370 | 1948 | 407 |
| 1883 | 454 | 1489 | 293 |
Well, here is what I would do:
I would make a scatter plot. Then I would find the linear regression line. Finally, I would use the regression line to predict the cost.
Here's what I found:
• $\hat{y} = -231.67 + 0.33x$
• r = .87
• r² = .76
Thus the price of the home would be $353.47 thousand.
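We can check this worked example by fitting the least squares line to the ten houses from the table, using the sum formulas from earlier in the deck (standard library only):

```python
sqft  = [1429, 1982, 1359, 1761, 1883, 1785, 2001, 1835, 1948, 1489]
price = [201, 333, 205, 370, 454, 325, 450, 360, 407, 293]  # thousands

x_bar = sum(sqft) / len(sqft)
y_bar = sum(price) / len(price)            # 339.8, the mean price

# b = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2), a = y_bar - b*x_bar
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price))
     / sum((x - x_bar) ** 2 for x in sqft))
a = y_bar - b * x_bar                      # roughly -231.7

# Predicted price of the 11th house at 1789 sq ft: about 353.5 thousand.
prediction = a + b * 1789
```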
What is this r² thing?
r² is the coefficient of determination.
• Yes, I know it is r squared, but why do we bother?
More House Example
Now I am going to change one small thing in our house example: we don't know the size of the 11th house. What would you predict the price to be now?
Without an x value, the best available prediction is the mean price: I would predict the price to be $339.8 thousand.
Not as good as our last prediction, but not bad.
Explained vs. Unexplained Variability
We would expect our linear regression model to predict the price better than the mean, but is it really that much different?
The sum of squared prediction errors if we use the mean is 70913.6.
• This is the sum of squares of TOTAL variation, SST.
The sum of squared residuals is 16754.6.
• This is the sum of squares of the ERROR, SSE.
How SST and SSE make r²
The ratio SSE/SST tells us the proportion of variation in y still unexplained.
• SSE/SST = 16754.6 / 70913.6 = .236
Thus 23.6% of the variation is unaccounted for in our model.
• The proportion accounted for by our model is 1 − .236 = .764.
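The bookkeeping on this slide, using the SSE and SST values quoted from the house example (a sketch):

```python
sse = 16754.6   # sum of squared residuals (from the slide)
sst = 70913.6   # total sum of squares about the mean (from the slide)

unexplained = sse / sst          # about .236 of the variation remains
r_squared = 1 - unexplained      # about .764 explained by the model
```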
HOLD ON. Wasn't r² = .76?
Yes, it was. In fact, we can calculate r² by finding the ratio SSE/SST and subtracting it from 1.
Thus, what r² tells us is the proportion of variability explained by the model.
So, finally:

$r^2 = 1 - \frac{SSE}{SST}$

where

$SSE = \sum (y_i - \hat{y}_i)^2 \qquad SST = \sum (y_i - \bar{y})^2$