simple linear regression

34
Statistics: Unlocking the Power of Data STAT 101 Dr. Kari Lock Morgan 11/6/12 Simple Linear Regression SECTION 2.6 Interpreting coefficients Prediction Cautions Least Squares regression

Upload: louvain

Post on 23-Feb-2016

34 views

Category:

Documents


0 download

DESCRIPTION

STAT 101 Dr. Kari Lock Morgan 11/6/12. Simple Linear Regression. SECTION 2.6 Interpreting coefficients Prediction Cautions Least Squares regression. Crickets and Temperature. Can you estimate the temperature on a summer evening, just by listening to crickets chirp? - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Simple Linear Regression

Statistics: Unlocking the Power of Data Lock5

STAT 101Dr. Kari Lock Morgan

11/6/12

Simple Linear Regression

SECTION 2.6• Interpreting coefficients• Prediction• Cautions• Least Squares regression

Page 2: Simple Linear Regression

Statistics: Unlocking the Power of Data Lock5

• Can you estimate the temperature on a summer evening, just by listening to crickets chirp?

• We will fit a model to predict temperature based on cricket chirp rate

Crickets and Temperature

Page 3: Simple Linear Regression

Statistics: Unlocking the Power of Data Lock5

Crickets and Temperature

Response Variable, y

ExplanatoryVariable, x

Page 4: Simple Linear Regression

Statistics: Unlocking the Power of Data Lock5

Linear Model

A linear model predicts a response variable, y, using a linear function of

explanatory variables

Simple linear regression predicts on response variable, y, as a linear

function of one explanatory variable, x

Page 5: Simple Linear Regression

Statistics: Unlocking the Power of Data Lock5

Regression LineGoal: Find a straight line that best fits the data in a

scatterplot

Page 6: Simple Linear Regression

Statistics: Unlocking the Power of Data Lock5

Equation of the Line

��= ��0+ ��1𝑥where x is the explanatory variable, and is the predicted response variable.

The estimated regression line is

Intercept Slope

• Slope: increase in predicted y for every unit increase in x

• Intercept: predicted y value when x = 0

Page 7: Simple Linear Regression

Statistics: Unlocking the Power of Data Lock5

Regression in StatKey

𝑇𝑒𝑚𝑝=37.679+0.231 h𝐶 𝑖𝑟𝑝𝑠

Page 8: Simple Linear Regression

Statistics: Unlocking the Power of Data Lock5

Regression in RStudio

𝑇𝑒𝑚𝑝=37.6786+0.2307 h𝐶 𝑖𝑟𝑝𝑠

Page 9: Simple Linear Regression

Statistics: Unlocking the Power of Data Lock5

Regression Model

Which is a correct interpretation?

a) The average temperature is 37.68

b) For every extra 0.23 chirps per minute, the predicted temperature increases by 1 degree

c) Predicted temperature increases by 0.23 degrees for each extra chirp per minute

d) For every extra 0.23 chirps per minute, the predicted temperature increases by 37.68

𝑇𝑒𝑚𝑝=37.68+0.23 h𝐶 𝑖𝑟𝑝𝑠

Page 10: Simple Linear Regression

Statistics: Unlocking the Power of Data Lock5

Units• It is helpful to think about units when

interpreting a regression equation

y units y units x units units unitsyx

degrees degrees chirps per minute

degrees/chirps per min

��= ��0+ ��1𝑥

𝑇𝑒𝑚𝑝=37.68+0.23 h𝐶 𝑖𝑟𝑝𝑠

Page 11: Simple Linear Regression

Statistics: Unlocking the Power of Data Lock5

Prediction• The regression equation can be used to

predict y for a given value of x

• If you listen and hear crickets chirping about 140 times per minute, your best guess at the outside temperature is

37.68 0.23 140 69.88

𝑇𝑒𝑚𝑝=37.68+0.23 h𝐶 𝑖𝑟𝑝𝑠

Page 12: Simple Linear Regression

Statistics: Unlocking the Power of Data Lock5

Prediction37.68 0.23 140 69.88

Page 13: Simple Linear Regression

Statistics: Unlocking the Power of Data Lock5

Prediction

If the crickets are chirping about 180 times per minute, your best guess at the temperature is

(a) 60(b) 70(c) 80

37.68 0.23 180 79.08

Page 14: Simple Linear Regression

Statistics: Unlocking the Power of Data Lock5

Prediction

The intercept tells us that the predicted temperature when the crickets are not chirping at all is 37.68. Do you think this is a good prediction?

(a) Yes(b) No

None of the observed chirp rates are anywhere close to 0, so we have no way of knowing if the linear trend continues all the way down to 0.

𝑇𝑒𝑚𝑝=37.68+0.23 h𝐶 𝑖𝑟𝑝𝑠

Page 15: Simple Linear Regression

Statistics: Unlocking the Power of Data Lock5

Regression Caution 1

• Do not use the regression equation or line to predict outside the range of x values available in your data (do not extrapolate!)

• If none of the x values are anywhere near 0, then the intercept is meaningless!

Page 16: Simple Linear Regression

Statistics: Unlocking the Power of Data Lock5

Duke Rank and Duke Shirts

a) positively associated

b) negatively associated

c) not associated

d) other

Are the rank of Duke among schools applied to and the number of Duke shirts owned

2 4 6 8

1020

3040

DukeRank

DukeShirts

Page 17: Simple Linear Regression

Statistics: Unlocking the Power of Data Lock5

Duke Rank and Duke Shirts

a) positively associated

b) negatively associated

c) not associated

d) other

Are the rank of Duke among schools applied to and the number of Duke shirts owned

2 4 6 8

1020

3040

DukeRank

DukeShirts

2 4 6 8

1020

3040

DukeRank

DukeShirts

Page 18: Simple Linear Regression

Statistics: Unlocking the Power of Data Lock5

Regression Caution 2

• Computers will calculate a regression line for any two quantitative variables, even if they are not associated or if the association is not linear

• ALWAYS PLOT YOUR DATA!

• The regression line/equation should only be used if the association is approximately linear

Page 19: Simple Linear Regression

Statistics: Unlocking the Power of Data Lock5

Regression Caution 3

• Outliers (especially outliers in both variables) can be very influential on the regression line

• ALWAYS PLOT YOUR DATA!

http://illuminations.nctm.org/LessonDetail.aspx?ID=L455

Page 20: Simple Linear Regression

Statistics: Unlocking the Power of Data Lock5

Life Expectancy and Birth Rate

Coefficients: (Intercept) LifeExpectancy 83.4090 -0.8895

Which of the following interpretations is correct?

(a) A decrease of 0.89 in the birth rate corresponds to a 1 year increase in predicted life expectancy

(b) Increasing life expectancy by 1 year will cause birth rate to decrease by 0.89

(c) Both

(d) Neither(a) The model is predicting birth rate based on life

expectancy, not the other way around(b) Can only make conclusions about causality from a

randomized experiment.

Page 21: Simple Linear Regression

Statistics: Unlocking the Power of Data Lock5

Regression Caution 4

• Higher values of x may lead to higher (or lower) predicted values of y, but this does NOT mean that changing x will cause y to increase or decrease

• Causation can only be determined if the values of the explanatory variable were determined randomly (which is rarely the case for a continuous explanatory variable)

Page 22: Simple Linear Regression

Statistics: Unlocking the Power of Data Lock5

Explanatory and Response

• Unlike correlation, for linear regression it does matter which is the explanatory variable and which is the response

Page 23: Simple Linear Regression

Statistics: Unlocking the Power of Data Lock5

Regression LineHow do we find the best fitting line???

Page 24: Simple Linear Regression

Statistics: Unlocking the Power of Data Lock5

Predicted and Actual Values• The actual response value, y, is the response

value observed for a particular data point

• The predicted response value, , is the response value that would be predicted for a given x value, based on a model

• In linear regression, the predicted values fall on the regression line directly above each x value

• The best fitting line is that which makes the predicted values closest to the actual values

Page 25: Simple Linear Regression

Statistics: Unlocking the Power of Data Lock5

Predicted and Actual Values

y y

Page 26: Simple Linear Regression

Statistics: Unlocking the Power of Data Lock5

Residual

The residual for each data point is

actual – predicted =

The residual is also the vertical distance from each point to the line

Page 27: Simple Linear Regression

Statistics: Unlocking the Power of Data Lock5

Residual

residual { ˆ 63.5 61.44 2.06yy

• Want to make all the residuals as small as possible.

• How would you measure this?

Page 28: Simple Linear Regression

Statistics: Unlocking the Power of Data Lock5

Least Squares Regression

Least squares regression chooses the regression line that minimizes

the sum of squared residuals

2

1

minimize ( )ˆn

i ii

y y

Page 29: Simple Linear Regression

Statistics: Unlocking the Power of Data Lock5

Least Squares Regression

Page 30: Simple Linear Regression

Statistics: Unlocking the Power of Data Lock5

Presidential ElectionsCan we use the current approval rating (50%,

according to a Rasmussen Poll yesterday) to predict whether he will win the election tonight?

We can build a model using data from past elections to predict an incumbent’s margin of victory based on approval rating

Page 31: Simple Linear Regression

Statistics: Unlocking the Power of Data Lock5

Presidential Elections

What is Obama’s predicted margin of victory, based on his approval rating?

Page 32: Simple Linear Regression

Statistics: Unlocking the Power of Data Lock5

Presidential ElectionsFor this number to be useful, we need an

interval estimate for Obama’s margin of victory

We’ll learn how to be compute these next class

For now: if Obama’s approval rating really is 50%, then we are 95% confident that Obama’s margin of victory will be between -8.8 and 19.4

Page 33: Simple Linear Regression

Statistics: Unlocking the Power of Data Lock5

• Group project on regression (modeling) to be done in your lab groups

• Need a data set with a quantitative response variable and multiple explanatory variables; must have at least one categorical and at least one quantitative explanatory variable

Project 2

Page 34: Simple Linear Regression

Statistics: Unlocking the Power of Data Lock5

To DoRead Section 2.6, 9.1

Complete the anonymous midterm evaluation TODAY by 5pm

Do Homework 7 (due Tuesday, 11/13) NO LATE HOMEWORK ACCEPTED – SOLUTIONS

WILL BE POSTED IMMEDIATELY AFTER CLASS

Find data for Project 2 Presentations last lab (Monday, 12/3) Report due last day of class (Thursday, 12/6)