lecture13 - php.scripts.psu.edu
9/30/13
Sept. 30 Statistic for the Day: (David) Justice vs. (Derek) Jeter, 1995 and 1996
• Overall batting average: Jeter’s is higher (.310 vs. .270)
• Each year: Justice’s is higher (.253 vs. .250 in 1995; .321 vs. .314 in 1996)
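The reversal happens because the two players' at-bats are split very differently between the two seasons. A minimal Python sketch, assuming the hit and at-bat counts commonly quoted for this example (the slide itself gives only the averages), recomputes the yearly and overall averages. The names `batting` and `summarize` are illustrative.

```python
# Assumed hit / at-bat counts (commonly quoted for this example;
# the slide lists only the resulting averages).
batting = {
    "Jeter":   {1995: (12, 48),   1996: (183, 582)},
    "Justice": {1995: (104, 411), 1996: (45, 140)},
}


def summarize(seasons):
    """Return (per-season averages, overall average) for one player."""
    per_season = {year: hits / at_bats for year, (hits, at_bats) in seasons.items()}
    # The overall average pools hits and at-bats across seasons;
    # it is NOT the simple mean of the yearly averages.
    total_hits = sum(hits for hits, _ in seasons.values())
    total_at_bats = sum(at_bats for _, at_bats in seasons.values())
    return per_season, total_hits / total_at_bats


for player, seasons in batting.items():
    per_season, overall = summarize(seasons)
    yearly = ", ".join(f"{year}: {avg:.3f}" for year, avg in per_season.items())
    print(f"{player}: {yearly}; overall: {overall:.3f}")
```

With these counts the printed averages match the slide: .250, .314 and .310 overall for Jeter; .253, .321 and .270 overall for Justice. Jeter had most of his at-bats in 1996 and Justice most of his in 1995, so each overall average is pulled toward a different season.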
Assignment: Read Chapter 10
    Sandwich                              Weight (g)   Calories
 1  Big Montana                           309          590
 2  Giant Roast Beef                      224          450
 3  Regular Roast Beef                    154          320
 4  Beef ‘n Cheddar                       195          440
 5  Super Roast Beef                      230          440
 6  Junior Roast Beef                     125          270
 7  Chicken Breast Fillet                 233          500
 8  Chicken Bacon ‘n Swiss                209          550
 9  Roast Chicken Club                    228          470
10  Market Fresh Turkey Ranch Bacon       379          830
11  Market Fresh Ultimate BLT             293          780
12  Market Fresh Roast Beef Swiss         357          780
13  Market Fresh Roast Ham Swiss          357          700
14  Market Fresh Roast Turkey Swiss       357          720
15  Market Fresh Chicken Salad            322          770
Arby’s sandwiches (from a while ago)
(a → b: older value → 2012 value; words in brackets appear in the 2012 names)
    Sandwich                                    Weight (g)   Calories
 1  Big Montana                                 309          590
 2  Giant Roast Beef [2012: Max]                224 → 281    450 → 580
 3  Regular Roast Beef [2012: Classic]          154          320 → 350
 4  Beef ‘n Cheddar [2012: Classic]             195          440
 5  Super Roast Beef [2012: Mid]                230 → 210    440
 6  Junior Roast Beef                           125 → 87     270 → 210
 7  Chicken Breast Fillet [2012: Crispy]        233 → 221    500 → 510
 8  Chicken Bacon ‘n Swiss [2012: Crispy]       209 → 205    550 → 610
 9  Roast Chicken Club [2012: Grand Turkey]     228 → 233    470 → 490
10  Market Fresh Turkey Ranch Bacon             379 → 344    830 → 800
11  Market Fresh Ultimate BLT                   293          780
12  Market Fresh Roast Beef Swiss               357          780
13  Market Fresh Roast Ham Swiss                357          700
14  Market Fresh Roast Turkey Swiss             357 → 326    720 → 700
15  Market Fresh Chicken Salad                  322          770
Arby’s sandwiches (2012 update)
Research Question: At Arby’s, are calories related to the weight of the sandwich?
Let’s try using tools from previous chapters first:
Observational study
• Response = calories
• Explanatory variable = small or large sandwich
   Small sandwich means less than 225 grams (n = 6)
   Large sandwich means more than 225 grams (n = 4)
[Figure: Arby’s Sandwiches, Calories for Large vs. Small sandwiches]
There seems to be a difference. (Is it statistically significant? That question comes later in the course!)
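A minimal Python sketch of the small/large comparison, using the weights and calories from the "from a while ago" table as illustrative input. The slide does not say exactly which sandwiches went into the plot, so the group sizes here (5 small, 10 large) need not match n = 6 and n = 4. As on the slide, "small" means under 225 g and "large" means over 225 g; a sandwich weighing exactly 225 g would fall in neither group, and none in this table does. The function name `calories_by_size` is hypothetical.

```python
# Weight (g) and calories from the "from a while ago" table
sandwiches = [
    ("Big Montana", 309, 590),
    ("Giant Roast Beef", 224, 450),
    ("Regular Roast Beef", 154, 320),
    ("Beef 'n Cheddar", 195, 440),
    ("Super Roast Beef", 230, 440),
    ("Junior Roast Beef", 125, 270),
    ("Chicken Breast Fillet", 233, 500),
    ("Chicken Bacon 'n Swiss", 209, 550),
    ("Roast Chicken Club", 228, 470),
    ("Market Fresh Turkey Ranch Bacon", 379, 830),
    ("Market Fresh Ultimate BLT", 293, 780),
    ("Market Fresh Roast Beef Swiss", 357, 780),
    ("Market Fresh Roast Ham Swiss", 357, 700),
    ("Market Fresh Roast Turkey Swiss", 357, 720),
    ("Market Fresh Chicken Salad", 322, 770),
]

CUTOFF_G = 225  # small: under 225 g, large: over 225 g (slide's definitions)


def calories_by_size(data, cutoff):
    """Split calories into small and large groups by weight."""
    small = [cal for _, weight, cal in data if weight < cutoff]
    large = [cal for _, weight, cal in data if weight > cutoff]
    return small, large


small, large = calories_by_size(sandwiches, CUTOFF_G)
for label, group in (("Small", small), ("Large", large)):
    print(f"{label}: n = {len(group)}, mean calories = {sum(group) / len(group):.0f}")
```

Collapsing weight into two categories throws away information, which is the motivation for keeping the numerical weights.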
This is where the new topic of Chapter 10 comes in: we can refine the explanatory variable and get more information about the relationship between calories and weight.
(Note: when we do this, we can no longer think of the explanatory variable as identifying which subpopulation the observation belongs to.)
Rather than split it into small and large, keep the numerical values of the explanatory variable.
This type of plot, with two measurements per subject, is called a scatterplot (see p. 166).
[Figure: Arby's Sandwiches scatterplot, Weight vs. Calories]
The correlation measures the strength of the linear relationship between weight and calories.
[Figure: Arby's Sandwiches scatterplot, Weight vs. Calories; Correlation = 0.95]
Facts about Correlation:
• We use the letter “r” to denote the correlation coefficient.
• The correlation coefficient is a measure of the strength of the linear relationship between the two variables in a scatterplot.
• The value of r must always be between −1 and 1:
   a. r = 0 means no linear relationship.
   b. Positive r means the two variables tend to increase together (with r = 1 meaning a perfect positive linear relationship).
   c. Negative r means that one variable tends to decrease as the other increases (with r = −1 meaning a perfect negative linear relationship).
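The slides do not give a formula for r. A minimal Python sketch using the standard Pearson definition (the sum of cross-products of deviations from the means, divided by the square root of the product of the sums of squared deviations) shows the cases from the list. The function name `correlation` and the points are hypothetical, for illustration only.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical points, for illustration only
print(correlation([1, 2, 3, 4], [3, 5, 7, 9]))  # points on a rising line
print(correlation([1, 2, 3, 4], [9, 7, 5, 3]))  # points on a falling line
print(correlation([1, 2, 3], [1, 0, 1]))        # no linear trend
```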
The best-fitting line through the data is called the regression line.
How should we describe this line?
[Figure: Arby's Sandwiches scatterplot, Weight vs. Calories, with the best-fitting line]
Formula for a regression line
Remember your algebra! The equation for a line is
y = (intercept) + (slope)(x)
or, in this case,
calories = (intercept) + (slope)(weight)
So all we need to describe the line is the intercept and the slope.
The intercept is 41 in this case and the slope is 2.1.
In this class, you don’t need to know how to calculate the slope and intercept (but see p. 195 if you like formulas).
[Figure: Arby's Sandwiches scatterplot, Weight vs. Calories, with the regression line cal = 41 + (2.1)(wt)]
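For readers who like formulas, a minimal Python sketch of the standard least-squares line: slope = Sxy / Sxx and intercept = (mean of y) - slope × (mean of x). This is the usual textbook formula, though the notation on p. 195 may differ; `least_squares_line` is a hypothetical name. The check uses made-up points lying exactly on cal = 41 + 2.1 wt, so the fit should recover that line.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sxy: sum of cross-products of deviations; Sxx: sum of squared x deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical points lying exactly on cal = 41 + 2.1 * wt
weights = [100, 200, 300]
calories = [251, 461, 671]
intercept, slope = least_squares_line(weights, calories)
print(f"calories = {intercept:.1f} + ({slope:.1f})(weight)")
```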
For example, for a 200 g sandwich you expect, on average, about:
41 + (2.1)(200) = 41 + 420 = 461 calories
For a 350 g sandwich:
41 + (2.1)(350) = 41 + 735 = 776 calories
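The predictions above amount to plugging a weight into the fitted line. A minimal Python sketch, taking the intercept (41) and slope (2.1) from the slide; `predicted_calories` is a hypothetical helper.

```python
INTERCEPT = 41  # calories, from the slide
SLOPE = 2.1     # calories per gram, from the slide


def predicted_calories(weight_g):
    """Average calories predicted by the fitted line for a given weight."""
    return INTERCEPT + SLOPE * weight_g


for w in (200, 350):
    print(f"{w} g -> about {round(predicted_calories(w))} calories")
```

Predictions for weights 1 g apart differ by the slope, 2.1 calories, which matches the interpretation of the slope.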
calories = 41 + (2.1)(weight in grams)
(intercept = 41, slope = 2.1)
For every extra gram of weight, you expect an increase of 2.1 calories in your Arby’s sandwich.
Interpretation of slope: the expected increase in the response for every unit increase (increase of one) in the explanatory variable.
More scatterplots: Exercise hours vs. GPA
[Figure: scatterplot of Exercise Hours vs. Grade point average]
“For how many hours do you typically exercise in a typical week during the semester?”
Linear relationship with GPA?
More scatterplots: TV hours vs. GPA
“For how many hours do you typically watch television during an average weekday (Monday through Friday) during the semester? ”
[Figure: scatterplot of TV Hours (axis 0 to 60) vs. Grade point average]
We may have a slight problem with outliers!
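To see why a few extreme responses can matter, here is a minimal Python sketch with made-up numbers (hypothetical, not the class survey data): five points with a clear downward trend, then the same points plus one respondent reporting 50 hours. In this made-up example, the single outlier is enough to flip the sign of r. The function name `correlation` is illustrative.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, for illustration only (not the STAT 100 survey responses)
tv_hours = [1, 2, 3, 4, 5]
gpa = [3.6, 3.4, 3.1, 3.0, 2.7]

r_without = correlation(tv_hours, gpa)
# Add one extreme response: 50 hours of TV with a high GPA
r_with = correlation(tv_hours + [50], gpa + [3.9])

print(f"r without outlier: {r_without:.2f}")
print(f"r with outlier:    {r_with:.2f}")
```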
[Figure: scatterplot of TV Hours (axis 0 to 10) vs. Grade point average (axis 2.0 to 4.0)]
More scatterplots: Weight vs. Ideal Weight
Question: What is the relationship between weight and ideal weight? We’ll use SP2004 data.
[Figure: Men and Women Combined, scatterplot of Weight vs. Ideal Weight]
Compare with case study 10.2, page 193
Dotted red line: Weight = Ideal Weight (not a regression line; rather, a line for comparison purposes)
[Figure: Men and Women Combined, Weight vs. Ideal Weight, with the dotted red line]
The green line is the regression line:
Ideal weight = 25.6 + 0.78 Weight
Correlation = .867
R-squared = .752
S=15.17
[Figure: Men and Women Combined, Weight vs. Ideal Weight, with the green regression line and the dotted red line Weight = Ideal Weight]
[Figure: Men Only, scatterplot of Weight vs. Ideal Weight]
Green regression line:
Ideal weight = 66.2 + 0.61 Weight
Correlation = .850
R-squared = .723
S=12.36
What does it mean when the lines cross at 169 pounds?
[Figure: Men Only, Weight vs. Ideal Weight, with the green regression line and the dotted red line Weight = Ideal Weight]
[Figure: Women Only, scatterplot of Weight vs. Ideal Weight]
Green regression line:
Ideal weight = 56.1 + 0.50 Weight
Correlation = .831
R-squared = .691
S=8.20
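For a regression with a single explanatory variable, R-squared equals the square of the correlation. A minimal Python sketch checks this against the three fits reported on these slides; because r and R-squared are each rounded to three decimals, the agreement holds only up to rounding (for example, .850² = .7225 against the reported .723). The dictionary name `reported` is illustrative.

```python
# Correlation r and R-squared as reported on the slides
reported = {
    "Men and Women Combined": (0.867, 0.752),
    "Men Only": (0.850, 0.723),
    "Women Only": (0.831, 0.691),
}

for group, (r, r_squared) in reported.items():
    # With one explanatory variable, R-squared = r ** 2 (up to rounding here)
    print(f"{group}: r^2 = {r ** 2:.4f}, reported R-squared = {r_squared:.3f}")
```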
[Figure: Women Only, Weight vs. Ideal Weight, with the green regression line and the dotted red line Weight = Ideal Weight]
The lines cross at 112 pounds.
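The crossing point is the weight at which the regression line predicts an ideal weight equal to the actual weight, that is, the solution of w = a + b·w, or w = a / (1 - b). A minimal Python sketch using the rounded coefficients from the slides gives about 169.7 pounds for men (66.2 / 0.39) and 112.2 pounds for women (56.1 / 0.50), consistent with the 169 and 112 quoted above up to rounding. The helper `crossing_weight` is hypothetical.

```python
def crossing_weight(intercept, slope):
    """Weight w where the regression line meets Ideal Weight = Weight.

    Solves w = intercept + slope * w, giving w = intercept / (1 - slope).
    """
    return intercept / (1 - slope)


# Regression coefficients (intercept, slope) from the slides
fits = {"Men Only": (66.2, 0.61), "Women Only": (56.1, 0.50)}

for group, (a, b) in fits.items():
    print(f"{group}: lines cross at about {crossing_weight(a, b):.1f} pounds")
```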
            Spring 2001 means (pounds)       Fall 2008 means (pounds)
            Wt.    Ideal Wt.   Diff.         Wt.    Ideal Wt.   Diff.
Combined    146    138         8             154    146         8
Men         175    171         4             174    172         2
Women       132    122         10            138    126         12
This pattern remained fairly steady over many years of STAT 100: Men on average are about 0-5 pounds heavier than their ideal, whereas women on average are about 10-12 pounds heavier than their ideal.
Note, however, that the regression lines tell a more complete story!
A weighty puzzle: SP 2001 vs. FA 2008 in STAT 100
             SP 2001 Mean Weight   FA 2008 Mean Weight
Combined     146                   154
Men          175                   174
Women        132                   138
Percent men  32%                   43%
Notice: Combined mean weight is 8 pounds heavier in 2008. But women are only 6 pounds heavier on average, and men are actually lighter. How is this possible?
The answer is related to Simpson’s paradox.
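The combined mean is a weighted average of the men's and women's means, weighted by the share of men and women in the class. A minimal Python sketch using the means and percent men from the table reproduces the combined means to within a pound (the group means are themselves rounded). It shows how a larger share of men, the heavier group on average, can raise the combined mean even though the men's mean fell. The names `terms` and `combined_mean` are illustrative.

```python
# Mean weights (pounds) and share of men, from the table
terms = {
    "SP 2001": {"men": 175, "women": 132, "share_men": 0.32},
    "FA 2008": {"men": 174, "women": 138, "share_men": 0.43},
}


def combined_mean(mean_men, mean_women, share_men):
    """Class-wide mean weight as a weighted average of the two group means."""
    return share_men * mean_men + (1 - share_men) * mean_women


for term, t in terms.items():
    combined = combined_mean(t["men"], t["women"], t["share_men"])
    print(f"{term}: combined mean is about {combined:.1f} pounds")
```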