1.9 Comparing Two Data Sets
Revisiting Go For the Gold!
3a) Whose slope is larger?
• The women’s slope is larger: their distances are growing faster (0.01875 m/year > 0.01353 m/year)
[Figure: Go for the Gold! scatter plots of MensDistance and WomensDistance against Year (1940 to 2010), each with a residual plot]
MensDistance = 0.01353Year - 18.43; r^2 = 0.49; Sum of squares = 1.037
WomensDistance = 0.01875Year - 30.3; r^2 = 0.69; Sum of squares = 0.8692
3b) Residuals?
• The men’s residuals seem slightly more scattered
• The women’s residuals have more of a pattern
3b) r values?
• The women have a strong positive linear correlation (r = 0.83)
• The men have a weaker but still strong positive linear correlation (r = 0.70)
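The r values quoted here can be recovered from the r^2 values printed under the two scatter plots. A minimal sketch, assuming only those reported r^2 values (the variable names are illustrative):

```python
import math

# r^2 values as reported under the two scatter plots
r_squared = {"men": 0.49, "women": 0.69}

# Both fitted slopes are positive, so r is the positive square root of r^2
r = {group: math.sqrt(value) for group, value in r_squared.items()}

for group, value in r.items():
    print(f"{group}: r = {value:.2f}")
```

If a slope were negative, r would be the negative square root instead.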
3b) r^2 values?
• 69% of the variation in the women’s distances is explained by the linear relationship with year (r^2 = 0.69)
• The men have a much less reliable fit; only 49% of the variation in the men’s distances is explained by year (r^2 = 0.49)
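One way to see what r^2 measures is to split the total variation into an explained and an unexplained part. The sketch below assumes that the "Sum of squares" printed under each plot is the sum of squared residuals (SSE); under that assumption the total variation (SST) follows from r^2 = 1 - SSE/SST. The function name is illustrative.

```python
# Reported under each plot: r^2 and the sum of squares.
# Assumption: "Sum of squares" is the sum of squared residuals (SSE).
fits = {
    "men": {"r2": 0.49, "sse": 1.037},
    "women": {"r2": 0.69, "sse": 0.8692},
}


def total_variation(r2, sse):
    """Back out SST from the identity r^2 = 1 - SSE/SST."""
    return sse / (1.0 - r2)


for group, fit in fits.items():
    sst = total_variation(fit["r2"], fit["sse"])
    explained = sst - fit["sse"]
    print(f"{group}: SST ~ {sst:.2f}, explained ~ {explained:.2f}, unexplained = {fit['sse']}")
```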
3b) Predicted values?
• Both predicted values for 2012 are too high: about 8.8 m for the men (actual 8.31 m) and about 7.4 m for the women (actual 7.12 m)
Neither model is extremely reliable. The women’s fit seems generally more reliable, although the pattern in its residuals suggests the linear model suits the women’s data less well.
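The 2012 predictions can be checked by substituting the year into the two fitted lines. This is a minimal sketch using the rounded coefficients printed under the plots and the actual 2012 distances (8.31 m and 7.12 m) quoted elsewhere in these slides; the function names are illustrative.

```python
def men_model(year):
    """MensDistance = 0.01353Year - 18.43 (rounded coefficients)."""
    return 0.01353 * year - 18.43


def women_model(year):
    """WomensDistance = 0.01875Year - 30.3 (rounded coefficients)."""
    return 0.01875 * year - 30.3


# Actual 2012 results as quoted elsewhere in these slides
actual_2012 = {"men": 8.31, "women": 7.12}
predicted_2012 = {"men": men_model(2012), "women": women_model(2012)}

for group in actual_2012:
    error = predicted_2012[group] - actual_2012[group]
    print(f"{group}: predicted {predicted_2012[group]:.2f} m, "
          f"actual {actual_2012[group]:.2f} m, error {error:+.2f} m")
```

Because the coefficients are rounded, the predictions are only good to about a centimetre or so.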
3c) y-intercepts?
• In year 0, the models have the men and women jumping backwards (-18.43 m and -30.3 m)!
• Not much meaning
• Limits to predictive value of this model
3d) When will they jump equal distances?
• Find the year when their distances are the same.
y = 0.01353x - 18.43
y = 0.01875x - 30.3
0.01353x - 18.43 = 0.01875x - 30.3
0.01353x - 0.01875x = -30.3 + 18.43
-0.00522x = -11.87
x = 2273.9
• The men and women will jump equal distances by the year 2274. Don’t wait up.
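The same algebra can be written as a small helper that finds where two lines meet. This is a sketch only; the function name is illustrative and the coefficients are the rounded ones from the slides.

```python
def intersection_x(slope_a, intercept_a, slope_b, intercept_b):
    """x where slope_a*x + intercept_a equals slope_b*x + intercept_b."""
    if slope_a == slope_b:
        raise ValueError("parallel lines never meet")
    return (intercept_b - intercept_a) / (slope_a - slope_b)


# Men's line: 0.01353x - 18.43; women's line: 0.01875x - 30.3
year = intersection_x(0.01353, -18.43, 0.01875, -30.3)
distance = 0.01353 * year - 18.43  # distance both models give in that year

print(f"equal distances in year {year:.1f}, at about {distance:.1f} m")
```

The distance at the crossing point is far outside the range of the data, another sign that the linear models should not be extrapolated this far.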
4. Comparing the two sets of data
• For each metre the men’s distance increases, the women’s increases by about 0.95 m.
• When the men jump 0 m, the women jump -1.1 m. Backwards?
• Or just that the women’s distances will generally be less than the men’s – seems reasonable
• Remember we are comparing like quantities
• The y-intercept lowers the line
[Figure: Go for the Gold! scatter plot of WomensDistance against MensDistance, with residual plot]
WomensDistance = 0.949MensDistance - 1.1; r^2 = 0.66
• r = 0.81
• Strong positive linear correlation
• r^2 = 0.66
• 34% of the variation in the women’s distances is not explained by the men’s distances
• Residuals:
– Scattered, so linear fit is a good model.
– We could probably use it to predict women’s distances based on the men’s (or vice versa).
Knowing the men’s distance in 2012 is 8.31 m, the model predicts a women’s distance of about 6.8 m (the actual distance is 7.12 m).
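As a quick check of that prediction, the men-versus-women line can be evaluated at 8.31 m and compared with the actual result. A minimal sketch with an illustrative function name:

```python
def women_from_men(men_distance):
    """WomensDistance = 0.949MensDistance - 1.1 (rounded coefficients)."""
    return 0.949 * men_distance - 1.1


predicted = women_from_men(8.31)
residual = 7.12 - predicted  # actual minus predicted

print(f"predicted {predicted:.2f} m, residual {residual:+.2f} m")
```

The residual of roughly a third of a metre is within the range shown on the residual plot for this model.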
5. Effect of Outliers
[Figure: scatter plots of WomensDistance and MensDistance against Year (1940 to 2010) before removing the outlier, each with a residual plot]
WomensDistance = 0.01875Year - 30.3; r^2 = 0.69
MensDistance = 0.01353Year - 18.43; r^2 = 0.49
• Be careful to remove the data from only the men’s or the women’s side
• Should we remove it at all?
– Typo or other human error?
– Is the sample representative of the population?
– Is this merely a bad regression model?
• Note that removing the outlier increases the correlation for the men’s model (r^2 rises from 0.49 to 0.70) but decreases it for the women’s (r^2 falls from 0.69 to 0.67)!
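The long jump data themselves are not reproduced in these slides, so the sketch below uses made-up points (clearly not the Olympic results) to show how refitting without one point changes the slope and r^2. The helper `linear_fit` is illustrative and implements an ordinary least-squares line by hand.

```python
def linear_fit(xs, ys):
    """Least-squares slope, intercept and r^2 for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r2


# Hypothetical data, NOT the long jump results:
# y = 2x + 1, with the point at x = 3 pushed well off the line
xs = [0, 1, 2, 3, 4, 5]
ys = [1, 3, 5, 15, 9, 11]

with_outlier = linear_fit(xs, ys)

# Refit after dropping the suspect point at x = 3
kept = [(x, y) for x, y in zip(xs, ys) if x != 3]
without_outlier = linear_fit([x for x, _ in kept], [y for _, y in kept])

print("with outlier:    slope %.3f, intercept %.3f, r^2 %.3f" % with_outlier)
print("without outlier: slope %.3f, intercept %.3f, r^2 %.3f" % without_outlier)
```

In this toy case r^2 rises when the point is removed, as it does for the men. For a point that lies close to the trend, removing it can lower r^2 instead, as happens for the women.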
[Figure: Go for the Gold Modified Scatter Plot, MensDistance and WomensDistance against Year (1940 to 2010) with the outlier removed, each with a residual plot]
MensDistance = 0.01493Year - 21.25; r^2 = 0.70
WomensDistance = 0.01493Year - 22.7; r^2 = 0.67
• Consider the residuals.
• The linear model is still not a good one for this data.
• A logarithmic model is probably better.
• The point is a likely outlier for the men’s data, but probably not for the women’s data
• Removing the outliers doesn’t affect the men vs women model too much
• Predicted value for women’s distance in 2012 is the same: 6.8 m
[Figure: Go for the Gold Modified Scatter Plot, WomensDistance against MensDistance with the outlier removed, with residual plot]
WomensDistance = 0.933MensDistance - 0.91; r^2 = 0.84
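To see why the 2012 prediction barely moves, both versions of the men-versus-women line can be evaluated at the men’s 2012 distance of 8.31 m. A minimal sketch using the rounded coefficients from the slides (the dictionary labels are illustrative):

```python
# (slope, intercept) for WomensDistance as a function of MensDistance
models = {
    "all points": (0.949, -1.1),
    "outlier removed": (0.933, -0.91),
}

men_2012 = 8.31  # men's 2012 distance in metres, as quoted in the slides

predictions = {
    name: slope * men_2012 + intercept
    for name, (slope, intercept) in models.items()
}

for name, value in predictions.items():
    print(f"{name}: {value:.2f} m")
```

The two predictions differ by only a few centimetres, so both round to 6.8 m.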