Chapter 5 Regression

Upload: lee-atkins

Post on 24-Dec-2015



Linear Regression

Objective: To quantify the linear relationship between an explanatory variable (x) and a response variable (y).

We can then predict the average response for all subjects with a given value of the explanatory variable.


Regression Line

A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes. We often use a regression line to predict the value of y for a given value of x.


Example – Staying Thin

STATE: Some people don’t gain weight even when they overeat. Maybe differences in “nonexercise activity” account for this.

Researchers deliberately overfed 16 healthy young adults for eight weeks and measured fat gain (kg) and changes in energy use (cal) from nonexercise activities.


Example – Staying Thin

Question: Do people with large increases in nonexercise activity (NEA) gain less weight?


NEA change (cal)   -94   -57   -29   135   143   151   245   355
Fat gain (kg)      4.2   3.0   3.7   2.7   3.2   3.6   2.4   1.3

NEA change (cal)   392   473   486   535   571   580   620   690
Fat gain (kg)      3.8   1.7   1.6   2.2   1.0   0.4   2.3   1.1


Example – Staying Thin

PLAN: Make a scatterplot of the data and examine the pattern. If it’s linear, use correlation to measure the strength and direction. Draw a regression line on the plot to predict fat gain from change in NEA.


Example – Staying Thin

SOLVE: The scatterplot shows a moderately strong negative correlation: r = -0.78. The line on the plot is the regression line that predicts weight gain from changes in NEA.


Least Squares

Used to determine the “best” line

We want the line to be as close as possible to the data points in the vertical (y) direction (since that is what we are trying to predict)

Least Squares: use the line that minimizes the sum of the squares of the vertical distances of the data points from the line
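The least squares idea can be checked numerically. The sketch below is illustrative only: it uses the 16 NEA observations from the Staying Thin example, computes the intercept and slope with the standard closed-form least squares formulas (not yet introduced at this point in the slides), and confirms that nudging either value makes the sum of squared vertical distances larger. The function names and the sizes of the nudges are assumptions chosen for this example.

```python
# NEA change (cal) and fat gain (kg) for the 16 subjects in the Staying Thin example
nea = [-94, -57, -29, 135, 143, 151, 245, 355,
       392, 473, 486, 535, 571, 580, 620, 690]
fat = [4.2, 3.0, 3.7, 2.7, 3.2, 3.6, 2.4, 1.3,
       3.8, 1.7, 1.6, 2.2, 1.0, 0.4, 2.3, 1.1]


def sum_squared_errors(a, b, xs, ys):
    """Sum of squared vertical distances from the points to the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def least_squares(xs, ys):
    """Closed-form intercept and slope that minimize the sum of squared errors."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


a, b = least_squares(nea, fat)
best = sum_squared_errors(a, b, nea, fat)

# Arbitrary small nudges to the intercept and slope (hypothetical values);
# each nudged line should fit worse than the least squares line.
nudges = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.0005), (0.0, -0.0005)]
perturbed = [sum_squared_errors(a + da, b + db, nea, fat) for da, db in nudges]

print(f"a = {a:.3f}, b = {b:.5f}, sum of squares = {best:.3f}")
print("every nudged line is worse:", all(p > best for p in perturbed))
```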


Straight Lines

A straight line relating response variable y to explanatory variable x is given by:

$y = a + bx$

b is the slope, the amount y changes when x increases by one unit

a is the intercept, the value of y when x = 0


Least Squares Regression Line

Regression equation: $\hat{y} = a + bx$

– x is the value of the explanatory variable

– “y-hat” is the average value of the response variable (predicted response for a value of x)

– note that a and b are just the intercept and slope of a straight line

– note that r and b are not the same thing, but their signs will agree


Example – Using Regression Line

Any line describing the weight gain-NEA data has the form:

$\text{fat gain} = a + (b \times \text{NEA change})$

The line in the plot has specific values for a and b to produce this regression line:

$\text{fat gain} = 3.505 - 0.00344 \times \text{NEA change}$


Example – Using Regression Line

The slope b = -0.00344 says that for each added calorie of NEA, fat gain goes down by 0.00344 kg. This is the rate of change of fat gain with changes in NEA.

The intercept a = 3.505 says that our estimate of the amount of weight gained for someone whose NEA doesn’t change at all (NEA change = 0) is 3.505 kg.
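As a small illustration of reading the slope and intercept, the sketch below evaluates the fitted line at a few NEA changes. The function name `predict_fat_gain` and the input value 400 are hypothetical choices for this example; only the coefficients 3.505 and -0.00344 come from the slides.

```python
def predict_fat_gain(nea_change_cal):
    """Predicted fat gain (kg) from the regression line quoted in the slides."""
    intercept = 3.505    # predicted fat gain when NEA change = 0
    slope = -0.00344     # change in predicted fat gain per added calorie of NEA
    return intercept + slope * nea_change_cal


at_zero = predict_fat_gain(0)      # equals the intercept
at_400 = predict_fat_gain(400)     # 400 cal is an arbitrary value inside the data range
per_calorie = predict_fat_gain(401) - predict_fat_gain(400)  # equals the slope

print(f"NEA change = 0:   {at_zero:.3f} kg")
print(f"NEA change = 400: {at_400:.3f} kg")
print(f"one extra calorie changes the prediction by {per_calorie:.5f} kg")
```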


Regression Line Calculation

Regression equation: $\hat{y} = a + bx$

$b = r \dfrac{s_y}{s_x}$

$a = \bar{y} - b\bar{x}$

where $s_x$ and $s_y$ are the standard deviations of the two variables, $r$ is their correlation, and $\bar{x}$ and $\bar{y}$ are their means


Regression Calculation Case Study

Per Capita Gross Domestic Product and Average Life Expectancy for Countries in Western Europe


Regression Calculation Case Study

Country           Per Capita GDP (x)   Life Expectancy (y)
Austria           21.4                 77.48
Belgium           23.2                 77.53
Finland           20.0                 77.32
France            22.7                 78.63
Germany           20.8                 77.17
Ireland           18.6                 76.39
Italy             21.5                 78.51
Netherlands       22.0                 78.15
Switzerland       23.8                 78.99
United Kingdom    21.2                 77.37


Regression Calculation Case Study

$\bar{x} = 21.52 \qquad \bar{y} = 77.754 \qquad r = 0.809$

$s_x = 1.532 \qquad s_y = 0.795$

Linear regression equation:

$b = r \dfrac{s_y}{s_x} = (0.809) \dfrac{0.795}{1.532} = 0.420$

$a = \bar{y} - b\bar{x} = 77.754 - (0.420)(21.52) = 68.716$

$\hat{y} = 68.716 + 0.420x$
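The case study arithmetic can be reproduced from the table of ten countries. The sketch below is one possible way to do it with Python's standard `statistics` module; the variable names are assumptions, and the correlation is computed from standardized values with n - 1 in the denominator, which is a common textbook definition rather than something stated in the slides.

```python
from statistics import mean, stdev

# Per capita GDP (x) and life expectancy (y) from the case study table
gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
life = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]

n = len(gdp)
x_bar, y_bar = mean(gdp), mean(life)
s_x, s_y = stdev(gdp), stdev(life)   # sample standard deviations (divide by n - 1)

# Correlation: sum of products of standardized values, divided by n - 1
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(gdp, life)) / (n - 1)

b = r * s_y / s_x        # slope
a = y_bar - b * x_bar    # intercept

print(f"x_bar = {x_bar:.2f}, y_bar = {y_bar:.3f}, r = {r:.3f}")
print(f"s_x = {s_x:.3f}, s_y = {s_y:.3f}")
print(f"y_hat = {a:.3f} + {b:.3f} x")
```

Because the code carries full precision through every step, the last digit of the intercept may differ slightly from the slide, which plugs the rounded slope 0.420 into the formula for a.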


Coefficient of Determination ($R^2$)

Measures usefulness of regression prediction

$R^2$ (or $r^2$, the square of the correlation): measures what fraction of the variation in the values of the response variable (y) is explained by the regression line

– $r = 1$: $R^2 = 1$: regression line explains all (100%) of the variation in y

– $r = 0.7$: $R^2 = 0.49$: regression line explains almost half (49%) of the variation in y


Example – Using $R^2$

For the NEA data (predicting weight gain with changes in NEA):

$r = -0.7786$

$r^2 = (-0.7786)^2 = 0.6062$

That is, about 61% of the variation in weight gained is accounted for by the linear relationship with changes in NEA. The other 39% or so is not explained by this relationship.
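To make "fraction of variation explained" concrete, the sketch below recomputes r for the 16 NEA observations and compares the squared correlation with 1 minus the ratio of leftover variation (squared residuals) to total variation in fat gain. The variable names are assumptions; the identity between the two quantities is a standard property of least squares lines with an intercept.

```python
from statistics import mean

# NEA change (cal) and fat gain (kg) from the Staying Thin example
nea = [-94, -57, -29, 135, 143, 151, 245, 355,
       392, 473, 486, 535, 571, 580, 620, 690]
fat = [4.2, 3.0, 3.7, 2.7, 3.2, 3.6, 2.4, 1.3,
       3.8, 1.7, 1.6, 2.2, 1.0, 0.4, 2.3, 1.1]

x_bar, y_bar = mean(nea), mean(fat)
sxx = sum((x - x_bar) ** 2 for x in nea)
syy = sum((y - y_bar) ** 2 for y in fat)    # total variation in y
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(nea, fat))

r = sxy / (sxx * syy) ** 0.5
r_squared = r ** 2

# Fit the least squares line and measure the variation it leaves unexplained
b = sxy / sxx
a = y_bar - b * x_bar
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(nea, fat))
explained_fraction = 1 - sse / syy

print(f"r = {r:.4f}, r^2 = {r_squared:.4f}")
print(f"1 - SSE/SST = {explained_fraction:.4f}")
```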


Residuals

A residual is the difference between an observed value of the response variable and the value predicted by the regression line:

$\text{residual} = y - \hat{y}$


Residuals

A residual plot is a scatterplot of the regression residuals against the explanatory variable

– used to assess the fit of a regression line

– look for a “random” scatter around zero
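A residual plot is built from (x, residual) pairs. As a rough sketch, the code below computes those pairs for the NEA data using the fitted line quoted in the slides (fat gain = 3.505 - 0.00344 × NEA change) and prints them as a table instead of drawing a plot. The helper name `predict` is an assumption made for this example.

```python
# NEA change (cal) and fat gain (kg) from the Staying Thin example
nea = [-94, -57, -29, 135, 143, 151, 245, 355,
       392, 473, 486, 535, 571, 580, 620, 690]
fat = [4.2, 3.0, 3.7, 2.7, 3.2, 3.6, 2.4, 1.3,
       3.8, 1.7, 1.6, 2.2, 1.0, 0.4, 2.3, 1.1]


def predict(nea_change):
    """Regression line quoted in the slides (rounded coefficients)."""
    return 3.505 - 0.00344 * nea_change


# residual = observed y minus predicted y, one per subject
residuals = [y - predict(x) for x, y in zip(nea, fat)]
mean_residual = sum(residuals) / len(residuals)

# These (x, residual) pairs are the points of the residual plot
for x, res in zip(nea, residuals):
    print(f"{x:5d}  {res:+.3f}")

print(f"mean residual = {mean_residual:+.4f}")
```

The mean residual comes out very close to zero; it is not exactly zero here only because the slope and intercept are rounded.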


Example - Empathy

“Empathy” is understanding how others feel

To investigate empathy, researchers measured brain activity in women who watched their partner receive a harmless but painful electric shock and compared that to their score on an ‘empathy test’

Question: do women who score high on empathy also respond more to watching their partner being zapped?


Example 5.5 - Empathy

The next slide displays a scatterplot of the data collected by the researchers

There’s a positive association – as we suspected

The overall pattern is moderately linear with $r = 0.515$

The line on the plot is the least squares line of brain activity vs. empathy score


Example 5.5 - Empathy

The least squares regression line for brain activity vs. empathy score is given by:

$\hat{y} = -0.0578 + 0.00761x$

For subject 1 whose values are empathy score = 38 and brain activity value = -0.120, we predict:

$\hat{y} = -0.0578 + 0.00761(38) = 0.231$


Example 5.5 - Empathy

So the residual for subject 1 is:

$\text{residual} = \text{observed} - \text{predicted} = y - \hat{y} = -0.120 - 0.231 = -0.351$

The residual is negative because the regression line is above the observed value
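The subject 1 calculation can be written out as a short check. This is only a sketch: the function name `predict_brain_activity` is hypothetical, while the coefficients and the observed values (empathy score 38, brain activity -0.120) are the ones given in the example.

```python
def predict_brain_activity(empathy_score):
    """Least squares line from the example: y_hat = -0.0578 + 0.00761 x."""
    return -0.0578 + 0.00761 * empathy_score


observed = -0.120                          # subject 1's brain activity
predicted = predict_brain_activity(38)     # subject 1's empathy score is 38
residual = observed - predicted            # observed minus predicted

print(f"predicted = {predicted:.3f}")
print(f"residual  = {residual:.3f}")
```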


Example 5.5 - Empathy

We can plot the residuals themselves against the explanatory variable to create a residual plot

The residual plot helps to show how well the regression line describes the data

The horizontal red “residual = 0” line corresponds to the regression line in the scatterplot


Outliers and Influential Points

An outlier is an observation that lies far away from the other observations

– outliers in the y direction have large residuals

– outliers in the x direction are often influential for the least-squares regression line, meaning that the removal of such points would markedly change the equation of the line


Example – Influential Observation

Subject 16 in the empathy data seems to be an outlier:

– Removing subject 16 from the calculation of r reduces r from 0.515 to 0.331, so yes, it is influential in this sense

– Calculating r = 0.515 isn’t useful because it depends so much on subject 16

Question: does subject 16 also have a big influence on the least squares regression line?


Example – Influential Observation

As you can see in the plot on the previous slide, removing the point for subject 16 from the calculation of the least squares regression line has almost no effect, so it is not influential where it is.

However, if we move it around, it does become influential


Example – Influential Observation

When we move the point for subject 16 straight down, it pulls the regression line with it

When an outlier does not lie close to the regression line determined by the rest of the points in the dataset, it is influential

In the regression setting, not all outliers are influential
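The same effect can be reproduced with a toy dataset. The numbers below are made up for illustration and are not the empathy data: eight points lie close to a line, and a ninth point far out in the x direction is placed first near that line and then moved straight down. The function name `fit_line` is also an assumption.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical data: eight points scattered closely around a line
base_x = [1, 2, 3, 4, 5, 6, 7, 8]
base_y = [1.2, 1.9, 3.1, 3.8, 5.2, 5.9, 7.1, 7.8]

_, b_base = fit_line(base_x, base_y)

# Outlier in x that sits close to the line set by the other points
_, b_near = fit_line(base_x + [20], base_y + [19.6])

# Same x value, but moved straight down, away from that line
_, b_moved = fit_line(base_x + [20], base_y + [5.0])

print(f"slope without the outlier:       {b_base:.3f}")
print(f"slope with outlier near the line: {b_near:.3f}")
print(f"slope with outlier moved down:    {b_moved:.3f}")
```

With the outlier near the line the slope barely moves, while moving it straight down drags the slope toward it.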


Cautions about Correlation and Regression

only describe linear relationships

are both affected by outliers

always plot the data before interpreting

beware of extrapolation

– predicting outside of the range of x

beware of lurking variables

– have important effect on the relationship among the variables in a study, but are not included in the study

association does not imply causation
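The extrapolation caution can be illustrated with the NEA regression line from earlier in the chapter. In this sketch the observed range of x (-94 to 690 cal) is taken from the data table, and the input value 1500 is an arbitrary choice well outside that range; the function name and the returned flag are assumptions made for this example.

```python
NEA_MIN, NEA_MAX = -94, 690   # smallest and largest NEA change in the study data


def predict_fat_gain(nea_change_cal):
    """Return (predicted fat gain in kg, whether x lies inside the observed range)."""
    prediction = 3.505 - 0.00344 * nea_change_cal
    in_range = NEA_MIN <= nea_change_cal <= NEA_MAX
    return prediction, in_range


inside, inside_ok = predict_fat_gain(400)      # within the observed range
outside, outside_ok = predict_fat_gain(1500)   # far beyond the observed range

print(f"x = 400:  {inside:.3f} kg, inside data range: {inside_ok}")
print(f"x = 1500: {outside:.3f} kg, inside data range: {outside_ok}")
```

At x = 1500 the line predicts a negative fat gain for people who were deliberately overfed, which shows why predictions outside the range of the data should not be trusted.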


Evidence of Causation

A properly conducted experiment establishes the connection

Other considerations:

– The association is strong

– The association is consistent
    The connection happens in repeated trials
    The connection happens under varying conditions

– Higher doses are associated with stronger responses

– Alleged cause precedes the effect in time

– Alleged cause is plausible (reasonable explanation)
