STATISTICS 200 Lecture #21 Tuesday, November 1, 2016 Textbook: 9.7, 9.8, 11.1, 11.2, 11.3, 11.4
Objectives:
• Apply sampling distribution for one sample mean to confidence intervals.
• Apply sampling distribution for difference of two sample means to confidence intervals.
• Apply sampling distribution for sample mean of (paired) differences to confidence intervals.
• Recognize similarities between one mean and mean of paired differences.
We have begun a strong focus on Inference
Proportions:
• One population proportion
• Two population proportions
Means (this week):
• One population mean
• Difference between means
• Mean difference
Example 2 from Thursday: We ask each of 31 students “how many regular ‘text’ friends do you have?”
Clicker Question: What kind of variable is this? A. Categorical B. Quantitative
Survey results: n = 31 X-bar = 6 friends s = 2.0 friends
Calculate a 95% Confidence Interval: How can we estimate the population mean number of regular “text” friends for all STAT 200 students using these data?
Confidence Interval Formula
Generic Formula:
sample estimate ± (margin of error)
sample estimate ± (multiplier × standard error)
Survey results: n = 31 X-bar = 6 friends s = 2.0 friends
Thus, the 95% CI is
6.00 ± 2.04 × 2.0/sqrt(31) = 6.00 ± 0.73
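The one-sample calculation above can be checked with a short Python sketch. The function name `mean_confidence_interval` is hypothetical (it is not from the lecture), and the multiplier 2.04 is simply the value used on the slide rather than something the code derives.

```python
import math

def mean_confidence_interval(sample_mean, sample_sd, n, multiplier):
    """Return (lower, upper, margin) for: sample mean ± multiplier × s/sqrt(n)."""
    standard_error = sample_sd / math.sqrt(n)  # standard error of the sample mean
    margin = multiplier * standard_error       # margin of error
    return sample_mean - margin, sample_mean + margin, margin

# Survey results from the slide: n = 31, X-bar = 6, s = 2.0.
# The multiplier 2.04 is taken from the slide, not computed here.
lower, upper, margin = mean_confidence_interval(6.0, 2.0, 31, 2.04)
print(f"margin of error = {margin:.2f}")
print(f"95% CI: {lower:.1f} to {upper:.1f}")
```

Passing the multiplier in as an argument keeps the sketch free of any statistics library; a different confidence level would only need a different multiplier.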
Confidence Interval Interpretation
Calculated Interval: 6.0 ± 0.7 friends (5.3 to 6.7 friends)
We are 95% confident that the…
a. sample mean
b. sample proportion
c. population mean
d. population proportion
e. range of values for the
…number of regular “text” friends for STAT 200 students is between 5.3 and 6.7 friends.
Confidence Interval Conclusion
In the population, we may conclude, with 95% confidence, that on average, STAT 200 students have A. more than 6 friends. B. more than 4 friends. C. fewer than 5 friends. D. fewer than 6 friends.
95% C.I.: 5.3 to 6.7 friends
Are all sampling distributions normal? No
When do we have to be cautious?
1. with small sample sizes
2. where the original population is not normal in shape
One-Sample t procedure is valid if one of the conditions for normality is met:
Sample data suggest a normal shape
or
We have a large sample size (n ≥ 30)
Sampling distribution will look normal in shape
Example: Compare predicted GPAs of males and females in STAT 200
Q: What do you think your actual GPA will be when you graduate?
Students in this class: Representative sample (?) of all STAT 200 students, with n_f = 157 and n_m = 130.
Parameter of interest: µ_f − µ_m
Example: Compare predicted GPAs of males and females in STAT 200
• Parameter of interest: µ_f − µ_m
• Estimate of the parameter: X-bar_f − X-bar_m
• Statistics collected from the sample:
X-bar_f = 3.470, s_f = 0.286
X-bar_m = 3.456, s_m = 0.304
What do we need in order to create a CI for µ_f − µ_m?
Formula for CI for µ_f − µ_m:
sample estimate ± (multiplier × standard error)
• sample estimate: X-bar_f − X-bar_m
• multiplier: roughly 2.0 for 95% confidence
• standard error: ???
What is the standard error of X-bar_f − X-bar_m?
sqrt[(S.E.#1)^2 + (S.E.#2)^2]
Example: Compare predicted GPAs of males and females in STAT 200
Here are the data, summarized:
X-bar_f = 3.470, s_f = 0.286, n_f = 157
X-bar_m = 3.456, s_m = 0.304, n_m = 130
Thus,
• Estimate = 3.470 – 3.456, which is 0.014.
• Multiplier = roughly 2 (more on this later…)
• SE (estimate) = sqrt(0.286^2/157 + 0.304^2/130) = 0.035
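As a rough check of the arithmetic above, here is a minimal Python sketch that combines the two standard errors and forms the approximate interval. The helper name `two_sample_standard_error` is hypothetical, and the multiplier of 2 is the lecture's rough value, so the resulting interval is only approximate.

```python
import math

def two_sample_standard_error(s1, n1, s2, n2):
    """Combine two standard errors: sqrt[(S.E.#1)^2 + (S.E.#2)^2]."""
    se1 = s1 / math.sqrt(n1)  # standard error of the first sample mean
    se2 = s2 / math.sqrt(n2)  # standard error of the second sample mean
    return math.sqrt(se1**2 + se2**2)

# Summary statistics from the slide (females first, then males).
estimate = 3.470 - 3.456
se = two_sample_standard_error(0.286, 157, 0.304, 130)

multiplier = 2.0  # rough value from the lecture; an assumption, not an exact t multiplier
lower = estimate - multiplier * se
upper = estimate + multiplier * se
print(f"estimate = {estimate:.3f}, SE = {se:.3f}")
print(f"approximate 95% CI: {lower:.3f} to {upper:.3f}")
```

With these inputs the approximate interval straddles 0, which is the kind of result to look for when asking whether a difference exists.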
Example: Are smokers and non-smokers different heights on average?
Summary of class data from Minitab:
Variable  SmokeCig    N  N*    Mean  SE Mean  StDev
Height    No        264   5  67.275    0.249  4.041
          Yes        19   0  68.211    0.736  3.207
Based on these data, a 95% CI would be roughly:
(A) 264 – 19 ± 2 × sqrt(4.041^2 + 3.207^2)
(B) 67.28 – 68.21 ± 2 × sqrt(4.041^2 + 3.207^2)
(C) 264 – 19 ± 2 × sqrt(0.249^2 + 0.736^2)
(D) 67.28 – 68.21 ± 2 × sqrt(0.249^2 + 0.736^2)
However, there is a slight problem with the multiplier of 2…
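For readers who want to work the heights example numerically, the sketch below applies the same "estimate ± multiplier × standard error" pattern to the Minitab summary, using the Mean and SE Mean columns. The variable names are illustrative, and the multiplier of 2 is kept only as the rough value the slide itself flags as slightly problematic.

```python
import math

# Values read from the Minitab summary (Mean and SE Mean columns).
mean_no, se_no = 67.275, 0.249    # non-smokers
mean_yes, se_yes = 68.211, 0.736  # smokers

estimate = mean_no - mean_yes            # difference of the sample means
se = math.sqrt(se_no**2 + se_yes**2)     # sqrt[(S.E.#1)^2 + (S.E.#2)^2]

rough_multiplier = 2.0  # assumption: the rough value, which the lecture notes is not exact
lower = estimate - rough_multiplier * se
upper = estimate + rough_multiplier * se
print(f"estimate = {estimate:.3f}, SE = {se:.3f}")
print(f"rough 95% CI: {lower:.2f} to {upper:.2f}")
```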
Example: Computer versus TV
25 students in a liberal arts course were given a survey that asked them how many hours per week they watched television and how many hours per week they used a computer. The goal is to determine if there is a difference in the mean number of hours spent per week on computers versus TV.
Consider the statements below:
• The two samples are dependent
• The experimental unit is a student
• The response variable is quantitative
• This is a randomized experiment
Clicker Question: How many of those statements are TRUE?
A. 0 B. 1 C. 2 D. 3
Example: Computer versus TV
student  Computer  TV
1        30        20
2        20        25
3        10        10
4        10        5
…        …         …
25       20        15.0
(experimental) unit: student
response variable: Number of hours
Variation you want to…
• reduce: the variation from student to student
• explain: the variation due to type of screen
What if we construct an independent-samples CI?
Difference = mu (Computer) - mu (TV)
95% CI for difference: (-3.29, 9.17)
Conclusion: Since the C.I. contains 0, we cannot claim that a difference exists.
Problem:
• The two samples are dependent (paired)
• Two measurements on each unit
When the two-sample t procedure is incorrectly used,
• it captures unwanted variation found with the two individual standard deviations
• it is less able to find significance
Instead use: Paired t procedure
Data used for paired analysis
What do you notice when examining the signs of the differences?
student  Computer  TV     Comp - TV
1        30        20     10
2        20        25     -5
3        10        10     0
4        10        5      5
…        …         …      …
25       20        15.0   5

Summary Statistics for Samples
         Computer     TV          Comp - TV
Mean     17.04        14.10       2.94 (mean of the differences)
StDev    s1 = 12.36   s2 = 9.26   sd = 5.34
Confidence Interval (key: treat it like a single mean)
Calculate a 95% confidence interval to estimate the population mean difference in hours spent on a computer vs watching TV
Formula: d-bar ± t* × s_d/sqrt(n)
d-bar = 2.9, s_d = 5.34, n = 25 students, df = n-1 = 24
Calculation: 2.9 ± [2.06 × 5.34/sqrt(25)] = 2.9 ± 2.2 = 0.7 to 5.1
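The paired calculation can be sketched in Python as a one-sample interval on the differences. The sketch also computes the standard error that the independent-samples approach would use, to show why that approach is less able to find significance here. The variable names are illustrative, the multiplier 2.06 is taken from the slide, and the unrounded mean difference 2.94 is used where the slide rounds to 2.9.

```python
import math

n = 25              # students, each measured twice (Computer and TV)
mean_diff = 2.94    # mean of the differences (Comp - TV)
sd_diff = 5.34      # standard deviation of the differences
t_multiplier = 2.06 # value from the slide for df = n - 1 = 24; not computed here

# Paired approach: treat the differences like a single sample.
paired_se = sd_diff / math.sqrt(n)
margin = t_multiplier * paired_se
lower, upper = mean_diff - margin, mean_diff + margin

# For comparison only: the standard error if the two columns were
# (incorrectly) treated as independent samples, with s1 = 12.36 and s2 = 9.26.
independent_se = math.sqrt(12.36**2 / n + 9.26**2 / n)

print(f"paired SE = {paired_se:.2f}, independent SE = {independent_se:.2f}")
print(f"paired 95% CI: {lower:.1f} to {upper:.1f}")
```

The paired standard error is much smaller because the student-to-student variation cancels out when each student's two measurements are subtracted.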
If you understand today’s lecture… 11.25, 11.30, 11.32, 11.33, 11.45, 11.46, 11.52, 11.53, 11.55