Regression

Upload: scot-mason

Post on 03-Jan-2016


Page 1

Regression

Regression relationship = trend + scatter

Observed value = predicted value + prediction error

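[Figure: the line y = 5 + 2x with the data point (8, 25). At x = 8 the predicted value is 21 and the observed value is 25; the vertical gap between them is the prediction error.]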

Regression is about fitting a line or curve to bivariate data to predict the value of a variable y based on the value of an independent variable x.

Page 2

Residual

A line of best fit will be used to predict a value of y for a given value of x. The difference between the measured value y and the predicted value ŷ is called the residual.

Residual = y - ŷ

Residual = observed value - predicted value
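As a worked instance, take the line y = 5 + 2x and the data point (8, 25) shown on the opening slide. The arithmetic below simply substitutes x = 8 into the line and subtracts the result from the observed y:

```latex
\[
\hat{y} = 5 + 2(8) = 21, \qquad
\text{residual} = y - \hat{y} = 25 - 21 = 4
\]
```

A positive residual means the observed point lies above the line; a negative residual means it lies below.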

Page 3

Regression Line

Obviously, we would like all these residuals to be as small as possible.

The technique of least squares regression minimises the sum of the squares of the residuals. The line found by this technique is therefore called the least squares regression line of y on x, or simply the regression line.

Page 4

Complete the table below

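| Data point | (8, 25) | (3, 7) | (-2, -3) | (x, y) |
|---|---|---|---|---|
| Observed y-value | 25 | | | y |
| Fitted line | $\hat{y} = 5 + 2x$ | $\hat{y} = 5 + 2x$ | $\hat{y} = 5 + 2x$ | $\hat{y} = b_0 + b_1 x$ |
| Predicted value / fitted value | 21 | | | $\hat{y}$ |
| Prediction error / residual | 4 | | | $y - \hat{y}$ |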

Page 5

Complete the table below

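| Data point | (8, 25) | (3, 7) | (-2, -3) | (x, y) |
|---|---|---|---|---|
| Observed y-value | 25 | 7 | -3 | y |
| Fitted line | $\hat{y} = 5 + 2x$ | $\hat{y} = 5 + 2x$ | $\hat{y} = 5 + 2x$ | $\hat{y} = b_0 + b_1 x$ |
| Predicted value / fitted value | 21 | 11 | 1 | $\hat{y}$ |
| Prediction error / residual | 4 | -4 | -4 | $y - \hat{y}$ |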
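The rows of the table can be recomputed with a few lines of Python. This is a minimal sketch that assumes the fitted line ŷ = 5 + 2x applies to all three data points; the function name `predict` and the variable names are illustrative and do not come from the slides.

```python
def predict(x, b0=5, b1=2):
    """Return the predicted (fitted) value b0 + b1 * x."""
    return b0 + b1 * x


# (x, y) data points from the table
points = [(8, 25), (3, 7), (-2, -3)]

# Predicted value: substitute x (not y) into the fitted line.
predicted = [predict(x) for x, _ in points]

# Residual: observed value minus predicted value.
residuals = [y - predict(x) for x, y in points]

for (x, y), y_hat, res in zip(points, predicted, residuals):
    print(f"point ({x}, {y}): predicted = {y_hat}, residual = {res}")
```

The predicted value always comes from the x-coordinate of the data point; the y-coordinate is used only when forming the residual.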

Page 6

The Least Squares Regression Line

Choose the line with the smallest sum of squared prediction errors.

Minimise the sum of squared prediction errors: minimise $\sum (\text{prediction error})^2$

Which line?
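To make "smallest sum of squared prediction errors" concrete, the sketch below scores two candidate lines on the three data points from the earlier table. The second candidate, ŷ = 2 + 3x, is a made-up comparison line chosen only for illustration, and `sse` is a hypothetical helper name.

```python
def sse(points, b0, b1):
    """Sum of squared prediction errors for the line y-hat = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in points)


points = [(8, 25), (3, 7), (-2, -3)]

sse_a = sse(points, b0=5, b1=2)  # line used in the slides: y-hat = 5 + 2x
sse_b = sse(points, b0=2, b1=3)  # hypothetical alternative: y-hat = 2 + 3x

print(sse_a, sse_b)  # 48 18
```

Of these two candidates the second has the smaller sum of squared errors, so it is the better fit by this criterion. The least squares line is the one that beats every possible candidate, not just these two.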

Page 7

The Least Squares Regression Line

There is one and only one least squares regression line for every linear regression.

$\sum \text{prediction errors} = 0$ for the least squares line, but it is also true for many other lines.

$(\bar{x}, \bar{y})$ is on the least squares line.

A calculator or computer gives the equation of the least squares line.
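These properties can be checked numerically. The sketch below uses the standard closed-form formulas for the least squares slope and intercept, which the slides do not state, on the three data points from the earlier table. The function name `least_squares` and the arbitrary slope of 2 for the comparison line are assumptions made for illustration.

```python
def least_squares(points):
    """Return (b0, b1) for the least squares line y-hat = b0 + b1 * x."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


points = [(8, 25), (3, 7), (-2, -3)]
b0, b1 = least_squares(points)

n = len(points)
x_bar = sum(x for x, _ in points) / n
y_bar = sum(y for _, y in points) / n

# Property 1: the prediction errors of the least squares line sum to zero.
residual_sum = sum(y - (b0 + b1 * x) for x, y in points)

# Property 2: the least squares line passes through (x-bar, y-bar).
value_at_x_bar = b0 + b1 * x_bar

# Any other line through (x-bar, y-bar) also has errors summing to zero,
# here one with an arbitrary slope of 2 (chosen only for illustration).
other_b1 = 2
other_b0 = y_bar - other_b1 * x_bar
other_sum = sum(y - (other_b0 + other_b1 * x) for x, y in points)

print(f"slope = {b1:.4f}, intercept = {b0:.4f}")
print(f"sum of residuals = {residual_sum:.10f}")
print(f"line at x-bar = {value_at_x_bar:.4f}, y-bar = {y_bar:.4f}")
print(f"sum of residuals for the other line = {other_sum:.10f}")
```

A zero sum of prediction errors is therefore necessary but not sufficient: it is the sum of squared errors that singles out the least squares line.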

Page 8

Residuals Plot

The pattern of residuals allows you to see if your regression line is a good fit for the data and how reliable interpolation and extrapolation will be.

If the model is a good fit, the residuals will oscillate closely above and below the zero line.

Page 9

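| Temperature (°F) | Chirps per second |
|---|---|
| 69.4 | 15.4 |
| 69.7 | 14.7 |
| 71.6 | 16.0 |
| 75.2 | 15.5 |
| 76.3 | 14.4 |
| 79.6 | 15.0 |
| 80.6 | 17.1 |
| 80.6 | 16.0 |
| 82.0 | 17.1 |
| 82.6 | 17.2 |
| 83.3 | 16.2 |
| 83.5 | 17.0 |
| 84.3 | 18.4 |
| 88.6 | 20.0 |
| 93.3 | 19.8 |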

[Figure: scatter plot "Crickets: Temperature vs Chirps" of number of chirps per second (10.0 to 22.0) against temperature (60 to 100 °F), with fitted line y = 0.2119x - 0.3091 and R² = 0.6975.]

Correlation coefficient = 0.8352. This is √0.6975.
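The fitted line and correlation quoted for the cricket data can be reproduced from the table. The sketch below uses the standard closed-form least squares formulas in plain Python; the variable names are illustrative, and the slides themselves leave this calculation to a calculator or computer.

```python
import math

# (temperature in °F, chirps per second) pairs from the table
data = [
    (69.4, 15.4), (69.7, 14.7), (71.6, 16.0), (75.2, 15.5), (76.3, 14.4),
    (79.6, 15.0), (80.6, 17.1), (80.6, 16.0), (82.0, 17.1), (82.6, 17.2),
    (83.3, 16.2), (83.5, 17.0), (84.3, 18.4), (88.6, 20.0), (93.3, 19.8),
]

n = len(data)
x_bar = sum(x for x, _ in data) / n
y_bar = sum(y for _, y in data) / n

# Sums of squares and cross-products about the means.
sxx = sum((x - x_bar) ** 2 for x, _ in data)
syy = sum((y - y_bar) ** 2 for _, y in data)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in data)

slope = sxy / sxx
intercept = y_bar - slope * x_bar
r = sxy / math.sqrt(sxx * syy)  # correlation coefficient
r_squared = r ** 2              # proportion of variation explained

print(f"y = {slope:.4f}x + ({intercept:.4f})")
print(f"R^2 = {r_squared:.4f}, r = {r:.4f}")
```

Because the slope is positive, the correlation coefficient is the positive square root of R²; for a downward-sloping line it would be the negative root.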

Page 10

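| Predicted chirps per second | Observed chirps per second | Residual |
|---|---|---|
| 14.4 | 15.4 | 1.0 |
| 14.5 | 14.7 | 0.2 |
| 14.9 | 16.0 | 1.1 |
| 15.6 | 15.5 | -0.1 |
| 15.9 | 14.4 | -1.5 |
| 16.6 | 15.0 | -1.6 |
| 16.8 | 17.1 | 0.3 |
| 16.8 | 16.0 | -0.8 |
| 17.1 | 17.1 | 0.0 |
| 17.2 | 17.2 | 0.0 |
| 17.3 | 16.2 | -1.1 |
| 17.4 | 17.0 | -0.4 |
| 17.6 | 18.4 | 0.8 |
| 18.5 | 20.0 | 1.5 |
| 19.5 | 19.8 | 0.3 |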

The regression line is y = 0.2119x - 0.3091, which is what we use to get the predicted value of y.

E.g. for x = 69.4 °F: y = 0.2119(69.4) - 0.3091 = 14.4 chirps per second.

Residual = observed value - predicted value
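The same calculation can be repeated for every temperature in the data set. The sketch below applies the regression line quoted on the slide to each observation and prints predicted value, observed value, and residual to one decimal place; the constant and variable names are illustrative.

```python
# Regression line quoted on the slide: y-hat = 0.2119x - 0.3091
B0, B1 = -0.3091, 0.2119

# (temperature in °F, observed chirps per second) from the data table
data = [
    (69.4, 15.4), (69.7, 14.7), (71.6, 16.0), (75.2, 15.5), (76.3, 14.4),
    (79.6, 15.0), (80.6, 17.1), (80.6, 16.0), (82.0, 17.1), (82.6, 17.2),
    (83.3, 16.2), (83.5, 17.0), (84.3, 18.4), (88.6, 20.0), (93.3, 19.8),
]

# Predicted value for each temperature, then observed minus predicted.
predicted = [B0 + B1 * temp for temp, _ in data]
residuals = [chirps - y_hat for (_, chirps), y_hat in zip(data, predicted)]

print("temp  predicted  observed  residual")
for (temp, chirps), y_hat, res in zip(data, predicted, residuals):
    print(f"{temp:4.1f}  {y_hat:9.1f}  {chirps:8.1f}  {res:8.1f}")
```

Residuals are computed from the unrounded predicted values and rounded only for display, which avoids compounding rounding error.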


Residuals

Page 11


[Figure: "Residuals" plot of the residuals (-2.0 to 2.0) against temperature (60 to 100).]

The plot of the residuals shows that they are randomly scattered, so in this case a linear model is appropriate.