1.9 Comparing Two Data Sets

Upload: angel-walsh

Post on 17-Jan-2016

Page 1: 1.9 Comparing Two Data Sets. Revisiting Go For the Gold! 3a) Whose slope is larger? The women’s is growing faster (0.01875 m/year > 0.01353 m/year)

1.9 Comparing Two Data Sets

Page 2

Revisiting Go For the Gold!

3a) Whose slope is larger?

• The women’s slope is larger, so their distances are growing faster (0.01875 m/year > 0.01353 m/year)

[Figure: Go for the Gold! scatter plots of MensDistance and WomensDistance against Year (1940 to 2010), each with its residual plot]

MensDistance = 0.01353Year - 18.43; r^2 = 0.49; Sum of squares = 1.037

WomensDistance = 0.01875Year - 30.3; r^2 = 0.69; Sum of squares = 0.8692

Page 3

Revisiting Go For the Gold!

3b) Residuals?

• The men’s residuals seem slightly more scattered

• The women’s residuals have more of a pattern

[Figure: Go for the Gold! scatter plots of MensDistance and WomensDistance against Year, each with its residual plot]

Page 4

Revisiting Go For the Gold!

3b) r values?

• The women have a strong positive linear correlation (r = 0.83)

• The men have a weaker but still strong positive linear correlation (r = 0.70)
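
The r values above can be recovered from the r^2 values reported with the two regression lines. A minimal Python sketch, assuming r takes the sign of the slope of the fitted line (the function name `correlation_from_r2` is made up for illustration):

```python
import math


def correlation_from_r2(r_squared, slope):
    """Return r as the square root of r^2, signed like the slope."""
    return math.copysign(math.sqrt(r_squared), slope)


# r^2 and slope values as reported with the regression lines.
r_women = correlation_from_r2(0.69, 0.01875)
r_men = correlation_from_r2(0.49, 0.01353)

print(f"women: r = {r_women:.2f}")  # women: r = 0.83
print(f"men:   r = {r_men:.2f}")    # men:   r = 0.70
```

Both slopes are positive, so both correlations are positive.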

[Figure: Go for the Gold! scatter plots of MensDistance and WomensDistance against Year, each with its residual plot]

Page 5

Revisiting Go For the Gold!

3b) r^2 values?

• 69% of the variation in the women’s distances is explained by the linear relationship with year (r^2 = 0.69)

• The men have a much less reliable fit; only 49% of the variation in the men’s distances is explained by year (r^2 = 0.49)

[Figure: Go for the Gold! scatter plots of MensDistance and WomensDistance against Year, each with its residual plot]

Page 6

Revisiting Go For the Gold!

3b) Predicted values?

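• Both predicted values for 2012 are well off! Using the rounded equations, the men’s prediction is about 8.8 m (actual 8.31 m) and the women’s is about 7.4 m (actual 7.12 m), so they miss by roughly 0.5 m and 0.3 m.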
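
As a rough check, the 2012 predictions can be computed from the rounded regression equations and compared with the actual 2012 results quoted later in these slides (8.31 m for the men, 7.12 m for the women). This is only a sketch: the helper name `predict` is hypothetical, and the rounded coefficients may not reproduce values computed from the unrounded fit.

```python
def predict(slope, intercept, year):
    """Evaluate the fitted line: distance = slope * year + intercept."""
    return slope * year + intercept


# Rounded coefficients from the regression equations on the slides.
men_2012 = predict(0.01353, -18.43, 2012)
women_2012 = predict(0.01875, -30.3, 2012)

# Actual 2012 distances (metres) as quoted later in the slides.
men_actual, women_actual = 8.31, 7.12

print(f"men:   predicted {men_2012:.3f} m, miss {men_2012 - men_actual:+.3f} m")
print(f"women: predicted {women_2012:.3f} m, miss {women_2012 - women_actual:+.3f} m")
```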

[Figure: Go for the Gold! scatter plots of MensDistance and WomensDistance against Year, each with its residual plot]

Neither model is extremely reliable, but the women’s seems to be generally more reliable, although its residuals show more of a pattern.

Page 7

Revisiting Go For the Gold!

3c) y-intercepts?

• In year 0, the men and women would be jumping backwards! (Both y-intercepts are negative: -18.43 m and -30.3 m.)

• Not much meaning

• Limits to predictive value of this model

[Figure: Go for the Gold! scatter plots of MensDistance and WomensDistance against Year, each with its residual plot]

Page 8

Revisiting Go For the Gold!

3d) When will they jump equal distances?

• Find the year when their distances are the same.

y = 0.01353x - 18.43
y = 0.01875x - 30.3

0.01353x - 18.43 = 0.01875x - 30.3
0.01353x - 0.01875x = -30.3 + 18.43
-0.00522x = -11.87
x = 2273.9

• If both linear trends continued, the men and women would jump equal distances around the year 2274. Don’t wait up.
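
The same calculation can be sketched in Python by setting the two lines equal and solving for the year. The function name `intersection_year` is hypothetical, and the coefficients are the rounded ones from the regression equations:

```python
def intersection_year(m1, b1, m2, b2):
    """Solve m1*x + b1 = m2*x + b2 for x."""
    if m1 == m2:
        raise ValueError("parallel lines never intersect")
    return (b2 - b1) / (m1 - m2)


# Men: y = 0.01353x - 18.43; women: y = 0.01875x - 30.3
year = intersection_year(0.01353, -18.43, 0.01875, -30.3)
distance = 0.01353 * year - 18.43  # distance both lines give at that year

print(f"year: {year:.1f}")            # year: 2273.9
print(f"distance: {distance:.1f} m")  # distance: 12.3 m
```

The implied common distance of about 12.3 m is another hint that extrapolating these lines this far is not meaningful.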

Page 9

4. Comparing the two sets of data

• For each metre the men’s distance increases, the women’s increases by about 0.95 m.

• When the men jump 0 m, the women jump -1.1 m. Backwards?

• Or just that the women’s distances will generally be less than the men’s – seems reasonable

• Remember we are comparing like quantities

• The y-intercept lowers the line

[Figure: Go for the Gold! scatter plot of WomensDistance against MensDistance, with residual plot]

WomensDistance = 0.949MensDistance - 1.1; r^2 = 0.66

Page 10

4. Comparing the two sets of data

• r = 0.81

– Strong positive linear correlation

• r^2 = 0.66

– 34% of the variation in the women’s distances is not explained by the men’s distances

• Residuals:

– Scattered, so linear fit is a good model.

– We could probably use it to predict women’s distances based on the men’s (or vice versa).

[Figure: Go for the Gold! scatter plot of WomensDistance against MensDistance, with residual plot]

Knowing the men’s distance in 2012 is 8.31 m, the model predicts a women’s distance of about 6.8 m (actual is 7.12 m).
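
This prediction can be reproduced by plugging the men’s 2012 distance into the men vs women line. A small Python sketch, where `predict_women` is a hypothetical helper and the coefficients are the rounded ones from the regression equation:

```python
def predict_women(men_distance, slope=0.949, intercept=-1.1):
    """WomensDistance = slope * MensDistance + intercept (metres)."""
    return slope * men_distance + intercept


predicted = predict_women(8.31)  # men's 2012 distance
residual = 7.12 - predicted      # actual minus predicted

print(f"predicted: {predicted:.2f} m")  # predicted: 6.79 m
print(f"residual:  {residual:+.2f} m")  # residual:  +0.33 m
```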

Page 11

5. Effect of Outliers

[Figure: Go for the Gold! scatter plots of WomensDistance and MensDistance against Year, each with its residual plot]

WomensDistance = 0.01875Year - 30.3; r^2 = 0.69

MensDistance = 0.01353Year - 18.43; r^2 = 0.49

Page 12

5. Effect of Outliers

• Be careful to remove the data point from only the men’s or only the women’s side

• Should we remove it at all?

– Typo or other human error?

– Is the sample representative of the population?

– Is this merely a bad regression model?

Page 13

5. Effect of Outliers

• Note that removing the outlier increases the correlation for the men’s model (r^2 rises from 0.49 to 0.70) but decreases it for the women’s (r^2 falls from 0.69 to 0.67)!

[Figure: Go for the Gold Modified scatter plots of MensDistance and WomensDistance against Year, each with its residual plot]

MensDistance = 0.01493Year - 21.25; r^2 = 0.70

WomensDistance = 0.01493Year - 22.7; r^2 = 0.67

Page 14

5. Effect of Outliers

• Consider the residuals.

• The linear model is still not a good one for this data.

• A logarithmic model is probably better.

[Figure: Go for the Gold Modified scatter plots of MensDistance and WomensDistance against Year, shown with the original WomensDistance scatter plot, each with its residual plot]

Page 15

5. Effect of Outliers

• The point is a likely outlier for the men’s data, but probably not for the women’s data

[Figure: Go for the Gold Modified scatter plots of MensDistance and WomensDistance against Year, each with its residual plot]

Page 16

5. The Effect of Outliers

• Removing the outlier changes the men vs women equation only slightly (slope 0.949 to 0.933, intercept -1.1 to -0.91), although r^2 rises from 0.66 to 0.84

• Predicted value for women’s distance in 2012 is the same: 6.8 m
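
To see why the prediction is unchanged, both versions of the men vs women line can be evaluated at the men’s 2012 distance of 8.31 m. A short Python sketch; the dictionary labels are made up for illustration and the coefficients are the rounded ones from the two regression equations:

```python
MEN_2012 = 8.31  # men's 2012 distance in metres, as quoted in the slides

# (slope, intercept) of WomensDistance against MensDistance
models = {
    "with outlier": (0.949, -1.1),
    "outlier removed": (0.933, -0.91),
}

predictions = {
    name: slope * MEN_2012 + intercept
    for name, (slope, intercept) in models.items()
}

for name, value in predictions.items():
    print(f"{name}: {value:.2f} m (rounds to {value:.1f} m)")
```

The two predictions differ by less than 0.1 m, so both round to 6.8 m.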

[Figure: Go for the Gold Modified scatter plot of WomensDistance against MensDistance, with residual plot]

WomensDistance = 0.933MensDistance - 0.91; r^2 = 0.84