warm up use calculator to find r,, a, b. chapter 8 lsrl-least squares regression line
TRANSCRIPT
Warm upRoller Coaster State Drop (ft) Duration (sec)
Incredible Hulk FL 105 135 Millennium Force OH 300 105
Goliath CA 255 180 Nitro NJ 215 240
Magnum XL-2000 OH 195 120 The Beast OH 141 65
Son of Beast OH 214 140 Thunderbolt PA 95 90 Ghost Rider CA 108 160
Raven IN 86 90
Use calculator to find r, , a , b
Chapter 8
LSRL-Least Squares Regression Line
Find the Least Squares Regression Line and interpret its slope, y-intercept, and the coefficients of correlation and determination
AP Statistics Objectives Ch8
Bivariate data
• x – variable: is the independent or explanatory variable
• y- variable: is the dependent or response variable
• Use x to predict y
Copyright © 2010, 2007, 2004 Pearson Education, Inc. Slide 8 - 5
Fat Versus Protein: An Example
The following is a scatterplot of total fat versus protein for 30 items on the Burger King menu:
Copyright © 2010, 2007, 2004 Pearson Education, Inc. Slide 8 - 6
The Linear Model
The correlation in this example is 0.83. It says “There seems to be a linear association between these two variables,” but it doesn’t tell what that association is.
We can say more about the linear relationship between two quantitative variables with a model.
A model simplifies reality to help us understand underlying patterns and relationships.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. Slide 8 - 7
The Linear Model (cont.)
The linear model is just an equation of a straight line through the data. The points in the scatterplot don’t all line up,
but a straight line can summarize the general pattern with only a couple of parameters.
The linear model can help us understand how the values are associated.
bxay ˆ
b – is the slope– it is the amount by which y increases when
x increases by 1 unita – is the y-intercept
– it is the height of the line when x = 0– in some situations, the y-intercept has no
meaning
y - (y-hat) means the predicted y
Be sure to put the hat on the y
Least Squares Regression LineLSRL
• The line that gives the bestbest fit to the data set
• The line that minimizesminimizes the sum of the squares of the deviations from the line
1 - 1 - 1010
0204060
0 20 40 60
X
Y
ScattergramScattergram
1.1. Plot of All (Plot of All (XXii, , YYii) Pairs) Pairs
2.2. Suggests How Well Model Will FitSuggests How Well Model Will Fit
1 - 1 - 1111
0204060
0 20 40 60
X
Y
Thinking ChallengeThinking Challenge
How would you draw a line through the How would you draw a line through the points? How do you determine which line points? How do you determine which line ‘fits best’?‘fits best’?
1 - 1 - 1212
0204060
0 20 40 60
X
Y
Thinking ChallengeThinking Challenge
How would you draw a line through the How would you draw a line through the points? How do you determine which line points? How do you determine which line ‘fits best’?‘fits best’?
1 - 1 - 1313
0204060
0 20 40 60
X
Y
Thinking ChallengeThinking Challenge
How would you draw a line through the How would you draw a line through the points? How do you determine which line points? How do you determine which line ‘fits best’?‘fits best’?
1 - 1 - 1414
0204060
0 20 40 60
X
Y
Thinking ChallengeThinking Challenge
How would you draw a line through the How would you draw a line through the points? How do you determine which line points? How do you determine which line ‘fits best’?‘fits best’?
1 - 1 - 1515
0204060
0 20 40 60
X
Y
Thinking ChallengeThinking Challenge
How would you draw a line through the How would you draw a line through the points? How do you determine which line points? How do you determine which line ‘fits best’?‘fits best’?
1 - 1 - 1616
0204060
0 20 40 60
X
Y
Thinking ChallengeThinking Challenge
How would you draw a line through the How would you draw a line through the points? How do you determine which line points? How do you determine which line ‘fits best’?‘fits best’?
1 - 1 - 1717
0204060
0 20 40 60
X
Y
Thinking ChallengeThinking Challenge
How would you draw a line through the How would you draw a line through the points? How do you determine which line points? How do you determine which line ‘fits best’?‘fits best’?
Sum of the squares = 61.25
45 xy .ˆ
-4
4.5
-5
y =.5(0) + 4 = 4
0 – 4 = -4
(0,0)
(3,10)
(6,2)
(0,0)
y =.5(3) + 4 = 5.5
10 – 5.5 = 4.5
y =.5(6) + 4 = 7
2 – 7 = -5
Copyright © 2010, 2007, 2004 Pearson Education, Inc. Slide 8 - 19
Residuals
The model won’t be perfect, regardless of the line we draw.
Some points will be above the line and some will be below.
The estimate made from a model is the predicted value (denoted as ).y
Copyright © 2010, 2007, 2004 Pearson Education, Inc. Slide 8 - 20
Residuals (cont.)
The difference between the observed value and its associated predicted value is called the residual.
To find the residuals, we always subtract the predicted value from the observed one:
residual observed predicted y y
(0,0)
(3,10)
(6,2)
Sum of the squares = 54
33
1 xy
Use a calculator to find the line of
best fit
Find y - y
-3
6
-3
What is the sum of the deviations from the line?
Will it always be zero?
The line that minimizesminimizes the sum of the squares of the deviations from the
line is the LSRLLSRL.
Types of Regression Models
Positive Linear Relationship
Negative Linear Relationship
Relationship NOT Linear
No Relationship
Slope:
For each unitunit increase in xx, there is an approximate increase/decreaseincrease/decrease of bb in yy.
Interpretations
The ages (in months) and heights (in inches) of seven children are given.
x 16 24 42 60 75 102 120
y 24 30 35 40 48 56 60
Find the LSRL.
Interpret the slope and correlation coefficient in the context of the problem.
Slope:
For each unitunit increase in xx, there is an approximate increase/decreaseincrease/decrease of bb in yy.
Interpretations
Correlation coefficient:
There is a _________________
association between the________
Slope:For an increase in ______________________________, there is an approximate ____________ of ______________________________________________________________
Correlation coefficient:
There is a strong, positive, linearstrong, positive, linear association between the age and age and height of childrenheight of children.
Slope:For an increase in age of one monthage of one month, there is an approximate increaseincrease of .34 .34 inches in heights of children.inches in heights of children.
The ages (in months) and heights (in inches) of seven children are given.
x 16 24 42 60 75 102 120
y 24 30 35 40 48 56 60
Predict the height of a child who is 4.5 years old.
Predict the height of someone who is 20 years old.
ExtrapolationExtrapolation• The LSRL should notshould not be used to
predict y for values of x outside the data set.
• It is unknown whether the pattern observed in the scatterplot continues outside this range.
The ages (in months) and heights (in inches) of seven children are given.
x 16 24 42 60 75 102 120
y 24 30 35 40 48 56 60
Calculate x & y.
Plot the point (x, y) on the LSRL.
Will this point always be on the LSRL?
The correlation coefficient and the LSRL are both non-resistantnon-resistant measures.
Formulas – on chart
x
y
i
ii
s
srb
xbyb
xx
yyxxb
xbby
1
10
21
10ˆ
The following statistics are found for the variables posted speed limit and the average number of accidents.
99814818
61140
.,.,
,.,
rsy
sx
y
x
Find the LSRL & predict the number of accidents for a posted speed limit of 50 mph.
9210723 ..ˆ xy accidents2325.ˆ y
Chapter 8
Chapter 8
Chapter 8
Chapter 8
Chapter 8
Chapter 8
Chapter 8
Chapter 8
Chapter 8 R2 – Must also be interpreted when describing a regression model
“With the linear regression model, _____% of the variability in _______ (response variable) is accounted for by variation in ________ (explanatory variable)”
The remaining variation is due to the residuals
R2 = +1
Examples of Approximate R2 Values
y
x
y
x
R2 = 1
R2 = 1
Perfect linear relationship between x and y:
100% of the variation in y is explained by variation in x
Examples of Approximate R2 Values
y
x
y
x
0 < R2 < 1
Weaker linear relationship between x and y:
Some but not all of the variation in y is explained by variation in x
Examples of Approximate R2 Values
R2 = 0
No linear relationship between x and y:
The value of Y does not depend on x. (None of the variation in y is explained by variation in x)
y
xR2 = 0
Did you say 2? Wrong. Try again.
So what?
Important Note: The correlation is not given directly in this software package. You need to look in two places for it. Taking the square root of the “R squared” (coefficient of determination) is not enough. You must look at the sign of the slope too. Positive slope is a positive r-value. Negative slope is a negative r-value.
So here you should note that the slope is positive. The correlation will be positive too. Since R2 is 0.482, r will be +0.694.
Coefficient of Determination =
(0.694)2 = 0.4816
0.4816
With the linear regression model, 48.2% of the variability in airline fares is accounted for by the variation in distance of the flight.
There is an increase of 7.86 cents for every additional mile.
There is an increase of $7.86 for every additional 100 miles.
The model predicts a flight of zero miles will cost $177.23. The airline may have built in an initial cost to pay for some of its expenses.
8. Using those estimates, draw the line on the scatterplot.