
Page 1: LINEAR REGRESSION

LINEAR REGRESSION

Page 2: LINEAR REGRESSION

The equation of the linear model

y = a + b x

represents a generic line on the scatter plot. How can we find a and b in such a way that the line is the best fit and is determined uniquely?

We refer to the so-called Method of Least Squares. The line obtained by using the least squares method is called the least squares regression line.

LINEAR REGRESSION

Page 3: LINEAR REGRESSION

LINEAR REGRESSION: the method of least squares

Each value of y obtained for a member of the survey is called the observed or actual value of y. The corresponding value on the regression line is called the theoretical or predicted value, ŷ. The difference e = y - ŷ is called the residual (or error). For a given household, e indicates the difference between the observed value of food expenditure and the theoretical (predicted) value given by the regression model.

[Figure: scatter plot of food expenditure versus income, showing the regression line, an observed value y, the predicted value ŷ, and the residual e.]

Page 4: LINEAR REGRESSION

LINEAR REGRESSION: the method of least squares

e = y - ŷ = observed food expenditure - theoretical food expenditure

The value of e is positive if the observed point is above the regression line and negative if it is below the regression line.

Among all the possible lines that interpolate the observed points, the best line should be the one that minimizes all the differences; that is, the sum of the residuals should be minimized.

But, whatever the line is, the sum of these residuals is always zero:

Σe = Σ(y - ŷ) = 0

Page 5: LINEAR REGRESSION

LINEAR REGRESSION: the method of least squares

Hence, to find the line that best fits the scatter of points, we minimize the error sum of squares, denoted by SSE, which is obtained by adding the squares of the errors. Thus

SSE = Σe² = Σ(y - ŷ)² = min

The values of â and b̂ which give the minimum SSE are called the least squares estimates of a and b, and the regression line obtained is called the least squares line.

Page 6: LINEAR REGRESSION

LINEAR REGRESSION: the method of least squares

The least squares values of a and b are computed as follows:

b̂ = SSxy / SSxx    and    â = ȳ - b̂ x̄

where

SSxy = Σxy - (Σx)(Σy)/n    and    SSxx = Σx² - (Σx)²/n

and SS stands for "sum of squares".

The least squares regression line ŷ = â + b̂ x is also called the regression of y on x.
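As a minimal illustration of these formulas (not part of the original slides; the function and variable names are only assumptions for this sketch), the following Python code computes the least squares estimates from two lists of observations:

def fit_least_squares(x, y):
    # Return (a_hat, b_hat) for the least squares line y_hat = a_hat + b_hat * x.
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)

    ss_xy = sum_xy - sum_x * sum_y / n       # SSxy = Σxy - (Σx)(Σy)/n
    ss_xx = sum_x2 - sum_x ** 2 / n          # SSxx = Σx² - (Σx)²/n

    b_hat = ss_xy / ss_xx                    # slope b̂ = SSxy / SSxx
    a_hat = sum_y / n - b_hat * sum_x / n    # intercept â = ȳ - b̂ x̄
    return a_hat, b_hat

For example, fit_least_squares([1, 2, 3], [2, 4, 6]) returns (0.0, 2.0), i.e. the line ŷ = 2x.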

Page 7: LINEAR REGRESSION

LINEAR REGRESSION: example 1

Find the least squares regression line for the data on incomes and food expenditure of the seven households.

We have to compute the least squares values of a and b.

We can do it in 4 steps.

Income (X):             35   49   21   39   15   28   25
Food Expenditure (Y):    9   15    7   11    5    8    9

Page 8: LINEAR REGRESSION

LINEAR REGRESSION: example 1

Step 1. Compute Σx, Σy, x̄ and ȳ.

Income (X):             35   49   21   39   15   28   25    Σx = 212
Food Expenditure (Y):    9   15    7   11    5    8    9    Σy = 64

x̄ = Σx/n = 212/7 = 30.2857

ȳ = Σy/n = 64/7 = 9.1429

Page 9: LINEAR REGRESSION

LINEAR REGRESSION: example 1

Step 2. Compute Σxy and Σx².

Income (X)    Food Expenditure (Y)       xy        x²
    35                 9                 315      1225
    49                15                 735      2401
    21                 7                 147       441
    39                11                 429      1521
    15                 5                  75       225
    28                 8                 224       784
    25                 9                 225       625
 Σx = 212          Σy = 64          Σxy = 2150   Σx² = 7222

Thus Σxy = 2150 and Σx² = 7222.

Page 10: LINEAR REGRESSION

LINEAR REGRESSION: example 1

Step 3. Compute SSxy and SSxx.

SSxy = Σxy - (Σx)(Σy)/n = 2150 - (212)(64)/7 = 211.7143

SSxx = Σx² - (Σx)²/n = 7222 - (212)²/7 = 801.4286

Step 4. Compute â and b̂.

b̂ = SSxy / SSxx = 211.7143 / 801.4286 = .2642

â = ȳ - b̂ x̄ = 9.1429 - (.2642)(30.2857) = 1.1414
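The four steps can be checked numerically. The sketch below (illustrative, not part of the original slides; the variable names are assumptions) runs them on the seven-household data, in hundreds of dollars:

x = [35, 49, 21, 39, 15, 28, 25]   # income
y = [9, 15, 7, 11, 5, 8, 9]        # food expenditure
n = len(x)

# Step 1: sums and means
sum_x, sum_y = sum(x), sum(y)                     # 212, 64
x_bar, y_bar = sum_x / n, sum_y / n               # 30.2857, 9.1429

# Step 2: Σxy and Σx²
sum_xy = sum(xi * yi for xi, yi in zip(x, y))     # 2150
sum_x2 = sum(xi ** 2 for xi in x)                 # 7222

# Step 3: sums of squares
ss_xy = sum_xy - sum_x * sum_y / n                # 211.7143
ss_xx = sum_x2 - sum_x ** 2 / n                   # 801.4286

# Step 4: least squares estimates
b_hat = ss_xy / ss_xx                             # 0.2642
a_hat = y_bar - b_hat * x_bar                     # 1.1422 here; the slides report 1.1414 because they use rounded intermediate values
print(round(a_hat, 4), round(b_hat, 4))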

Page 11: LINEAR REGRESSION

LINEAR REGRESSION: example 1

Thus, the estimated regression model is

ŷ = 1.1414 + .2642x

Using this model, we can find the predicted value of y for any specific value of x.

For instance, suppose we randomly select a household whose monthly income is $3500, so that x=35. The predicted value of food expenditure for this household is

ŷ = 1.1414 + (.2642)(35) = $10.3884 hundred = $1038.84

In other words, based on our regression line, we predict that a household with a monthly income of $3500 is expected to spend $1038.84 per month on food.

Page 12: LINEAR REGRESSION

LINEAR REGRESSION: example 1

In our data, there is one household whose income is $3500. The observed food expenditure for the household is $900.

Income (X):             35   49   21   39   15   28   25
Food Expenditure (Y):    9   15    7   11    5    8    9

The difference between the observed and the predicted values gives the residual, the error of prediction. It is equal to:

e = y - ŷ = 9.00 - 10.3884 = -1.3884 hundred = -$138.84

The negative error indicates that the predicted value of y is greater than the observed value of y. Thus, if we use the regression model, the household's food expenditure is overestimated by $138.84.
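A short illustrative sketch of this prediction and residual (not part of the original slides; the rounded estimates are taken from the example above):

a_hat, b_hat = 1.1414, 0.2642    # rounded least squares estimates from the example

x_obs, y_obs = 35, 9.00          # income and observed food expenditure, in hundreds of dollars
y_pred = a_hat + b_hat * x_obs   # 1.1414 + 0.2642*35 = 10.3884
residual = y_obs - y_pred        # 9.00 - 10.3884 = -1.3884, i.e. about -$138.84
print(y_pred, residual)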

Page 13: LINEAR REGRESSION

Interpretation of a and b

Interpretation of a

Consider the household with zero income

ŷ = 1.1414 + .2642(0) = $1.1414 hundred

Thus, we can state that a household with no income is expected to spend $114.14 per month on food. In other words, a gives the predicted value of y for x=0 based on the regression model estimated from the sample data. However, the regression line is valid only for values of x between 15 and 49. Thus, the prediction for a household with zero income is not credible. We should be very careful in interpreting a!

Page 14: LINEAR REGRESSION

Interpretation of a and b

Interpretation of b

The value of b in the regression model gives the change in y due to a change of one unit in x. It is called the regression coefficient. For example, in our regression model we have that

when x=30: ŷ = 1.1414 + .2642(30) = 9.0674

when x=31: ŷ = 1.1414 + .2642(31) = 9.3316

Hence, when x increases by one unit, from 30 to 31, y increases by 9.3316 - 9.0674 = 0.2642, which is the value of b. Because our unit of measurement is hundreds of dollars, we can state that, on average, a $100 increase in income will cause a $26.42 increase in food expenditure. We can also state that, on average, a $1 increase in the income of a household will increase its food expenditure by $0.2642. "On average" means that the food expenditure of an individual household may or may not increase by this amount when its income increases by $100.

Page 15: LINEAR REGRESSION

Coefficient of determination

Linear regression should be applied with caution. When we use a linear regression, we assume that the relationship between the two variables is described by a straight line. In the real world, the relationship between the two variables may not be linear.

In such cases, fitting a linear regression would be wrong.

Page 16: LINEAR REGRESSION

Coefficient of determination

If we want to evaluate how good the regression model is, that is, how well the independent variable explains the dependent variable using a linear model, we can use the coefficient of determination.

Consider the case in which b=0. The regression model becomes:

ŷ = a

But recall that the least squares value of a is

â = ȳ - b̂ x̄

so if b̂=0 then â = ȳ. Thus, if b=0, the regression line is ŷ = ȳ.

Page 17: LINEAR REGRESSION

Coefficient of determination

Graphically

[Figure: scatter plot of y versus x with the horizontal line ŷ = ȳ.]

The picture represents the extreme situation in which there is no linear relation between x and y.

Page 18: LINEAR REGRESSION

Coefficient of determination

[Figure: scatter plot of y versus x showing, for an individual j, the observed value y_j, the predicted value ŷ_j on the regression line, and the mean ȳ.]

In the picture we can add the regression model and the observed y for the individual j. We obtain:

y_j - ȳ = (ŷ_j - ȳ) + (y_j - ŷ_j)

We can observe that

ŷ_j ≈ ȳ    if there is linear independence between x and y

ŷ_j ≈ y_j    if there is strong dependence between x and y

Page 19: LINEAR REGRESSION

Coefficient of determination

If we take the squares of these differences and sum over all the individuals, we obtain:

Σ_j (y_j - ȳ)² = Σ_j (ŷ_j - ȳ)² + Σ_j (y_j - ŷ_j)²

SST = SSE + SSR

SST = total sum of squares. It expresses all the variability of the y variable.

SSR = regression sum of squares. It expresses the portion of SST explained by the regression model.

SSE = error sum of squares. It expresses the portion of SST not explained by the regression model.
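The decomposition can be verified numerically. The sketch below (illustrative, not part of the original slides) fits the regression on the seven-household data and checks that SST = SSE + SSR up to rounding:

x = [35, 49, 21, 39, 15, 28, 25]
y = [9, 15, 7, 11, 5, 8, 9]
n = len(x)
y_bar = sum(y) / n

# least squares fit, using the same formulas as before
ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
ss_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
b_hat = ss_xy / ss_xx
a_hat = y_bar - b_hat * sum(x) / n
y_hat = [a_hat + b_hat * xi for xi in x]               # predicted values

sst = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # regression sum of squares
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # error sum of squares
print(round(sst, 4), round(sse + ssr, 4))              # both print 60.8571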

Page 20: LINEAR REGRESSION

Coefficient of determination

The coefficient of determination, denoted by r², represents the proportion of SST that is explained by the use of the regression model. It is given by

r² = SSR / SST    or    r² = (SST - SSE) / SST

The value of r² lies in the range 0 to 1. If it is close to 1, it means that almost all the variability of y is explained by the regression model; in other words, the regression model is a good model.

The computational formula for r² is:

r² = b̂ SSxy / SSyy
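As an illustrative check (not from the slides; the variable names are assumptions), the computational formula reproduces r² ≈ 0.92 for the income and food expenditure data:

x = [35, 49, 21, 39, 15, 28, 25]
y = [9, 15, 7, 11, 5, 8, 9]
n = len(x)

ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n   # 211.7143
ss_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n                   # 801.4286
ss_yy = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n                   # 60.8571 (this is also SST)
b_hat = ss_xy / ss_xx

r_squared = b_hat * ss_xy / ss_yy    # ≈ 0.92
print(round(r_squared, 2))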

Page 21: LINEAR REGRESSION

Coefficient of determination: example 1

For the data on monthly incomes and food expenditures of the seven households, calculate the coefficient of determination.

From the earlier calculations we know

b̂ = .2642, SSxy = 211.7143 and SSyy = 60.8571

r² = b̂ SSxy / SSyy = (0.2642)(211.7143) / 60.8571 = 0.92

We can state that 92% of the variability of y is explained by the regression model (92% of the variation in food expenditure is explained by the monthly income of the seven households).

Page 22: LINEAR REGRESSION

LINEAR CORRELATION COEFFICIENT

Page 23: LINEAR REGRESSION

LINEAR CORRELATION COEFFICIENT

Another measure of the relationship between two variables is the linear correlation coefficient.

The linear correlation coefficient measures how closely the points in a scatter diagram are spread around the regression line. It is indicated with r.

The value of the correlation coefficient always lies in the range -1 to 1, that is, -1 ≤ r ≤ 1.

There are three possible extreme values of r.

Page 24: LINEAR REGRESSION

LINEAR CORRELATION COEFFICIENT

There are 3 possible extreme values of r:

1. r=1: perfect positive linear correlation between the two variables

2. r=-1: perfect negative linear correlation

3. r=0: no linear correlation

[Figure: three scatter plots illustrating r=1 (perfect positive linear correlation), r=-1 (perfect negative linear correlation), and r=0 (no linear correlation).]

Page 25: LINEAR REGRESSION

LINEAR CORRELATION COEFFICIENT

We do not usually encounter an example with perfect positive or perfect negative correlation.

What we observe in real-world problems is either a positive linear correlation with 0<r<1 or a negative linear correlation with -1<r<0.

If the correlation between two variables is positive and close to 1, we say that the variables have a strong positive correlation.

If the correlation between two variables is positive but close to 0, we say that the variables have a weak positive correlation.

Page 26: LINEAR REGRESSION

LINEAR CORRELATION COEFFICIENT

If the correlation between two variables is negative and close to -1, we say that the variables have a strong negative correlation.

If the correlation between two variables is negative but close to 0, we say that the variables have a weak negative correlation.

Page 27: LINEAR REGRESSION

LINEAR CORRELATION COEFFICIENT

The simple linear correlation measures the strength of the linear relationship between two variables for a sample and is calculated as

r = SSxy / √(SSxx · SSyy)
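A minimal Python sketch of this formula (illustrative; the function name and data layout are assumptions):

from math import sqrt

def correlation(x, y):
    # simple linear correlation coefficient r = SSxy / sqrt(SSxx * SSyy)
    n = len(x)
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
    ss_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
    ss_yy = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n
    return ss_xy / sqrt(ss_xx * ss_yy)

# the income / food expenditure data give r ≈ 0.96
print(round(correlation([35, 49, 21, 39, 15, 28, 25], [9, 15, 7, 11, 5, 8, 9]), 2))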

Page 28: LINEAR REGRESSION

LINEAR CORRELATION COEFFICIENT: example 1

Calculate the correlation coefficient for the example on incomes and food expenditures of seven households.

r = SSxy / √(SSxx · SSyy) = 211.7143 / √((801.4286)(60.8571)) = .96

The linear correlation coefficient tells us how strongly the two variables are linearly related. The correlation coefficient of .96 for the incomes and food expenditures of the seven households indicates that income and food expenditure are very strongly and positively correlated.