Linear Regression: The Least Squares Regression Model

Upload: carlyn

Post on 07-Jan-2016



TRANSCRIPT

Page 1: Linear Regression

Linear Regression: The Least Squares Regression Model

Page 2: Linear Regression

Regression Line

A regression line is a line that describes how a response variable y changes as an explanatory variable x changes.

We often use regression to predict the value of y given an x value.

Page 3: Linear Regression

Equation of a Regression Line

A regression line relating x to y has an equation of the form

ŷ = a + bx

• ŷ (read "y hat") is the predicted value of the response variable y for a given value of the explanatory variable x.
• b is the slope, the amount by which ŷ is expected to change when x increases by one unit.
• a is the y-intercept, the predicted value of y when x = 0.
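As a quick sketch of using this equation (the values of a and b below are made up for illustration, not from the slides):

```python
# Minimal sketch of the regression-line equation y-hat = a + b*x.
# The intercept a and slope b here are hypothetical illustration values.
def predict(a, b, x):
    """Predicted response y-hat for explanatory value x."""
    return a + b * x

# With a = 2.0 and b = 0.5, raising x by one unit raises y-hat by 0.5:
print(predict(2.0, 0.5, 10))  # 7.0
print(predict(2.0, 0.5, 11))  # 7.5
```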

Page 4: Linear Regression

Prediction

Interpolation is the use of a regression line to predict y between known observations.

Extrapolation is the use of a regression line to predict y outside the known observations.
• Predictions from extrapolation are often not accurate.

Page 5: Linear Regression

Residuals

A residual is the difference between an observed value of the response variable and the value predicted by the regression line.
• Residual = observed y − predicted y
• Residual = y − ŷ
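A minimal sketch of computing residuals, with hypothetical data and a hypothetical line (a = 0, b = 2):

```python
# Residual = observed y minus predicted y, one per observation.
# The data points and the line (a = 0, b = 2) are made-up illustration values.
def residuals(xs, ys, a, b):
    return [y - (a + b * x) for x, y in zip(xs, ys)]

xs = [1, 2, 3, 4]
ys = [2.5, 3.5, 6.5, 7.5]
print(residuals(xs, ys, a=0.0, b=2.0))  # [0.5, -0.5, 0.5, -0.5]
```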

Page 6: Linear Regression
Page 7: Linear Regression

Least Squares Regression Line

The least squares regression line of y on x is the line that makes the sum of the squared residuals as small as possible.

Equations:

b = r · (s_y / s_x)

a = ȳ − b·x̄

(Here s_x and s_y are the standard deviations of x and y, and r is their correlation.)

Page 8: Linear Regression

Other Calculations

b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²

a = ȳ − b·x̄
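These formulas can be sketched directly in code (the toy data below are my own, chosen to lie exactly on y = 1 + 2x so the fit recovers a = 1, b = 2):

```python
# From-scratch sketch of the slide's formulas:
#   b = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2),   a = ybar - b*xbar
def least_squares(xs, ys):
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    num = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    den = sum((x - xbar) ** 2 for x in xs)
    b = num / den
    a = ybar - b * xbar
    return a, b

# Hypothetical data lying exactly on y = 1 + 2x:
a, b = least_squares([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 1.0 2.0
```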

Page 9: Linear Regression

How well does a line fit the data?

Since residuals tell us how far the data points are from the regression line, they are a natural place to look when assessing fit.

A residual plot is a scatterplot of the residuals against the explanatory variable.

Page 10: Linear Regression

How do residual plots help us assess the fit of the data?

A residual plot in effect turns the regression line horizontal.

Residual plots magnify the deviations of the points from the line,
• making it easier to see unusual observations and patterns.

Page 11: Linear Regression

What we look for in residual plots

NO obvious pattern:
• A curved pattern shows a nonlinear relationship.
• A megaphone pattern shows that the residuals grow as x increases.

The residuals should be relatively small:
• they represent the typical prediction error.

Page 12: Linear Regression

The average prediction error

The standard deviation of the residuals (s):

s = √( Σ residuals² / (n − 2) ) = √( Σ (y_i − ŷ_i)² / (n − 2) )
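A small sketch of this calculation, using hypothetical observed and fitted values:

```python
import math

# Sketch of s = sqrt(sum(residual^2) / (n - 2)).
# Observed and fitted values below are made-up illustration numbers.
def residual_std(ys, yhats):
    n = len(ys)
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, yhats))
    return math.sqrt(sse / (n - 2))

ys    = [2.0, 4.0, 5.0, 8.0]
yhats = [1.0, 4.0, 6.0, 7.0]   # hypothetical predicted values
# Squared residuals are 1 + 0 + 1 + 1 = 3, so s = sqrt(3 / 2):
print(residual_std(ys, yhats))
```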

Page 13: Linear Regression

Home example

We want to predict the price of a home in Arvada. A random sample of 10 homes for sale is taken (prices in thousands of dollars).

Make a prediction for the cost of the 11th house if we know its square footage is 1789 ft².

Square ft   Price (thousands)   Square ft   Price (thousands)
1429        201                 1785        325
1982        333                 2001        450
1359        205                 1835        360
1761        370                 1948        407
1883        454                 1489        293

Page 14: Linear Regression

Well, here is what I would do

I would make a scatterplot. Then I would find the linear regression line. Finally, I would use the regression line to predict the cost.

Here's what I found:
• ŷ = −231.67 + 0.33x
• r = 0.87
• r² = 0.76
Thus the predicted price of the home would be $353.47 thousand.
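The slide's numbers can be checked by running the Page 8 formulas on the table data (this is my own reworking, not code from the presentation):

```python
# Reworking the Arvada house example with the slide deck's own formulas.
sqft  = [1429, 1982, 1359, 1761, 1883, 1785, 2001, 1835, 1948, 1489]
price = [201, 333, 205, 370, 454, 325, 450, 360, 407, 293]  # $ thousands

n = len(sqft)
xbar, ybar = sum(sqft) / n, sum(price) / n
num = sum((x - xbar) * (y - ybar) for x, y in zip(sqft, price))
den = sum((x - xbar) ** 2 for x in sqft)
b = num / den
a = ybar - b * xbar

sse = sum((y - (a + b * x)) ** 2 for x, y in zip(sqft, price))
sst = sum((y - ybar) ** 2 for y in price)
r2 = 1 - sse / sst

print(round(a, 2), round(b, 2))   # -231.67 0.33
print(round(r2, 2))               # 0.76
print(round(a + b * 1789, 2))     # 353.47  (predicted price for 1789 sq ft)
```

Note that the intercept is negative; the prediction $353.47 thousand only works out with ŷ = −231.67 + 0.33x.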

Page 15: Linear Regression

So what is this r² thing?

r² is the coefficient of determination.
• Yes, I know it is r squared, but why do we bother?

Page 16: Linear Regression

More house example

Now I am going to change one small thing in our house example: we don't know the size of the 11th house. What would you predict the price to be now?

I would predict the price to be $339.8 thousand (the mean price, ȳ).

Not as good as our last prediction, but not bad.
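Checking this number: with no explanatory variable, the prediction is just the mean of the ten prices from the table:

```python
# With no square footage known, the best single guess is the mean price.
price = [201, 333, 205, 370, 454, 325, 450, 360, 407, 293]  # $ thousands
ybar = sum(price) / len(price)
print(ybar)  # 339.8
```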

Page 17: Linear Regression

Explained vs. Unexplained Variability

We would expect our linear regression model to predict the price better than the mean, but is it really that much different?

The sum of squared prediction errors if we use the mean is 70913.6.
• This is the sum of squares of TOTAL variation, SST.

The sum of squared residuals is 16754.6.
• This is the sum of squares of the ERROR, SSE.

Page 18: Linear Regression

How SST and SSE make r²

The ratio SSE/SST tells us the proportion of variation in y still remaining (unexplained).
• SSE/SST = 16754.6 / 70913.6 = 0.236

Thus 23.6% of the variation is unaccounted for by our model.
• Thus the proportion accounted for by our model is 1 − 0.236 = 0.764.
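A quick check of this arithmetic, using the SSE and SST values from the house example:

```python
# Proportion of variation in y left unexplained, and the complement.
sse, sst = 16754.6, 70913.6
unexplained = sse / sst
explained = 1 - unexplained

print(round(unexplained, 3))  # 0.236
print(round(explained, 3))    # 0.764
```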

Page 19: Linear Regression

HOLD ON. Wasn't r² = 0.76?

Yes, it was. In fact, we can calculate r² by finding the ratio SSE/SST and subtracting it from 1.

Thus, what r² tells us is the proportion of variability explained by the model.

Page 20: Linear Regression

So finally

r² = 1 − SSE/SST

SSE = Σ (y_i − ŷ_i)²

SST = Σ (y_i − ȳ)²

Page 21: Linear Regression