Review: Large Sample Confidence Intervals
• $1-\alpha$ confidence interval for a mean: $\bar{x} \pm z_{\alpha/2}\, s/\sqrt{n}$
• $1-\alpha$ confidence interval for a proportion: $p \pm z_{\alpha/2} \sqrt{p(1-p)/n}$
• $1-\alpha$ confidence interval for the difference between two means: $\bar{x}_1 - \bar{x}_2 \pm z_{\alpha/2} \sqrt{s_1^2/n_1 + s_2^2/n_2}$
Large-sample conditions: n > 30 or so for means; np and n(1-p) both > 5 for proportions.
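The three intervals above can be sketched in Python. This is a minimal illustration, not part of the original slides: the function names (`z_crit`, `ci_mean`, `ci_proportion`, `ci_diff_means`) are made up here, and the numbers in the usage lines are hypothetical. The critical value $z_{\alpha/2}$ comes from `statistics.NormalDist` in the standard library.

```python
from math import sqrt
from statistics import NormalDist


def z_crit(alpha):
    """Upper alpha/2 critical value of the standard normal, z_{alpha/2}."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def ci_mean(xbar, s, n, alpha=0.05):
    """xbar +/- z_{alpha/2} * s / sqrt(n)."""
    half = z_crit(alpha) * s / sqrt(n)
    return xbar - half, xbar + half


def ci_proportion(p, n, alpha=0.05):
    """p +/- z_{alpha/2} * sqrt(p(1-p)/n)."""
    half = z_crit(alpha) * sqrt(p * (1 - p) / n)
    return p - half, p + half


def ci_diff_means(x1, s1, n1, x2, s2, n2, alpha=0.05):
    """(x1 - x2) +/- z_{alpha/2} * sqrt(s1^2/n1 + s2^2/n2)."""
    half = z_crit(alpha) * sqrt(s1**2 / n1 + s2**2 / n2)
    diff = x1 - x2
    return diff - half, diff + half


# Hypothetical inputs, chosen only to exercise the functions.
mean_lo, mean_hi = ci_mean(xbar=10.0, s=2.0, n=100)
prop_lo, prop_hi = ci_proportion(p=0.5, n=100)
print(f"mean: ({mean_lo:.3f}, {mean_hi:.3f})")
print(f"proportion: ({prop_lo:.3f}, {prop_hi:.3f})")
```

Each function returns the pair (Lower Bound, Upper Bound); a smaller `alpha` widens the interval.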
In General:
Estimate (that is normally distributed via the Central Limit Theorem) $\pm\ z_{\alpha/2} \times$ (standard deviation of estimate)
This gives an interval: (Lower Bound, Upper Bound)
Interpretation: This is a plausible range for the true value of the number that we’re estimating. $\alpha$ is a tuning parameter for level of plausibility: smaller $\alpha$ = more conservative estimate.
Large Sample Confidence Intervals
• $1-\alpha$ confidence interval for the difference between two proportions:
$p_1 - p_2 \pm z_{\alpha/2} \sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}$
np and n(1-p) > 5 for all p’s…
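As a rough sketch, the two-proportion interval and its large-sample check could be coded as below. The function names `large_sample_ok` and `ci_diff_proportions` and the example inputs (0.6 and 0.5 with 100 observations each) are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_ok(p, n):
    """Rule of thumb from the slide: np and n(1-p) should both exceed 5."""
    return n * p > 5 and n * (1 - p) > 5


def ci_diff_proportions(p1, n1, p2, n2, alpha=0.05):
    """(p1 - p2) +/- z_{alpha/2} * sqrt(p1(1-p1)/n1 + p2(1-p2)/n2)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - half, diff + half


# Hypothetical sample proportions and sample sizes.
p1, n1, p2, n2 = 0.6, 100, 0.5, 100
if large_sample_ok(p1, n1) and large_sample_ok(p2, n2):
    lower, upper = ci_diff_proportions(p1, n1, p2, n2)
    print(f"95% CI for p1 - p2: ({lower:.4f}, {upper:.4f})")
```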
Designing an Experiment and Choosing a Sample Size
• Example: Compare the shrinkage in a tumor due to a “new” cancer treatment relative to standard treatment
• 100 patients randomly assigned to “new” treatment or standard treatment
$x_i^{new}$ = reduction in tumor size for person i under new treatment
$x_j^{std}$ = reduction in tumor size for person j under std treatment
$\bar{x}_{new}$ and $s_{new}^2$, $\bar{x}_{std}$ and $s_{std}^2$: mean and sample variance of the changes in size for the new and standard treatments
Suppose the data are:
$\bar{x}_{new}$ = 25.3
$s_{new}$ = 2.0
$\bar{x}_{std}$ = 24.8
$s_{std}$ = 2.3
95% Confidence Interval for difference:
$\bar{x}_1 - \bar{x}_2 \pm z_{\alpha/2} \sqrt{s_1^2/n_1 + s_2^2/n_2}$
= 0.5 +/- 0.84
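The arithmetic behind this interval can be checked with a few lines of Python. One assumption is needed that the slides do not state outright: the 100 patients are taken to be split evenly, 50 per group. Under that assumption the half-width comes out at the 0.84 shown above.

```python
from math import sqrt
from statistics import NormalDist

x_new, s_new = 25.3, 2.0
x_std, s_std = 24.8, 2.3
n_new = n_std = 50  # ASSUMPTION: 100 patients split evenly between treatments

z = NormalDist().inv_cdf(0.975)  # z_{alpha/2} for a 95% interval
diff = x_new - x_std
half_width = z * sqrt(s_new**2 / n_new + s_std**2 / n_std)
lower, upper = diff - half_width, diff + half_width

print(f"{diff:.1f} +/- {half_width:.2f}")        # 0.5 +/- 0.84
print(f"interval: ({lower:.2f}, {upper:.2f})")
print("zero inside interval:", lower <= 0 <= upper)
```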
What can we conclude?
• There’s no difference?
• Can’t see a difference?
• There’s a difference, but it’s too small to care about?
There is a difference between:
• Can’t see a difference
• There’s no difference
Situation for the cancer example: can’t see a difference (the interval 0.5 +/- 0.84 contains zero).
(In cancer experiment, we can assume we care about small differences.)
• Can’t see a difference (that is big enough to care about) = wasted experiment
• AVOID / PREVENT THE WASTE AND ASSOCIATED TEARS
USE SAMPLE SIZE PLANNING
Sample Size Planning
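• Length of a $1-\alpha$ level confidence interval is:
“$2 z_{\alpha/2} \times$ std deviation of estimate”
Mean: $2 z_{\alpha/2}\, s/\sqrt{n}$
Proportion: $2 z_{\alpha/2} \sqrt{p(1-p)/n}$
Difference between two means: $2 z_{\alpha/2} \sqrt{s_1^2/n_1 + s_2^2/n_2}$
Difference between two proportions: $2 z_{\alpha/2} \sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}$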
1. Suppose we want a 95% confidence interval no wider than W units.
2. $\alpha$ is fixed. Assume a value for the standard deviation (or variance) of the estimator.
3. Solve for an n (or n1 and n2) so that the width is less than W units.
4. When there are two sample sizes (n1 and n2), we often assume that n1 = n2.
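The four steps above can be sketched for the single-sample cases by solving the width formula for n. The helper names `n_for_mean` and `n_for_proportion` are hypothetical, as are the inputs in the usage lines (a guessed standard deviation of 10 with W = 4, and a guessed proportion of 0.5 with W = 0.1).

```python
from math import ceil
from statistics import NormalDist


def n_for_mean(s, width, alpha=0.05):
    """Smallest n with 2 * z_{alpha/2} * s / sqrt(n) <= width."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((2 * z * s / width) ** 2)


def n_for_proportion(p, width, alpha=0.05):
    """Smallest n with 2 * z_{alpha/2} * sqrt(p(1-p)/n) <= width."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((2 * z) ** 2 * p * (1 - p) / width**2)


# Hypothetical planning values.
print(n_for_mean(s=10, width=4))           # 97
print(n_for_proportion(p=0.5, width=0.1))  # 385
```

Because the required n depends on the assumed standard deviation, a larger (more conservative) guess gives a larger sample size.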
Cancer example
• Let W = 0.1. Want 95% CI for difference between means with width less than W.
• Suppose $s_{new}^2 = s_{std}^2 = 6$ (conservative guess)
$W > 2 z_{\alpha/2} \sqrt{s_{new}^2/n_1 + s_{std}^2/n_2}$
$0.1 > 2(1.96)\sqrt{6/n + 6/n}$
$0.1 > 3.92\sqrt{12/n}$
$0.01 > (3.92^2)\, 12/n$
$n > 18439.68$ (each group…)
Book’s B = our W/2
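The same calculation in Python, as a quick check of the algebra. It uses the rounded critical value 1.96 from the slide rather than the exact quantile, so that the bound matches the figure above; the variable names are illustrative.

```python
from math import ceil, sqrt

W = 0.1          # target width of the 95% CI
z = 1.96         # z_{alpha/2} as rounded on the slide
var_new = 6.0    # conservative guess for s_new^2
var_std = 6.0    # conservative guess for s_std^2

# With n1 = n2 = n:  W > 2 z sqrt((var_new + var_std)/n)
#               =>   n > (2 z)^2 (var_new + var_std) / W^2
n_bound = (2 * z) ** 2 * (var_new + var_std) / W**2
n_each = ceil(n_bound)

width_at_n = 2 * z * sqrt((var_new + var_std) / n_each)
print(f"n > {n_bound:.2f} per group, so use n = {n_each}")
print(f"width at n = {n_each}: {width_at_n:.5f}")
```

Compare this with the 50 or so patients per group in the original experiment: a width of 0.1 needs several hundred times as many.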
Hypothesis testing and p-values (Chapter 9)
We used confidence intervals in two ways:
1. To determine an interval of plausible values for the quantity that we estimate. Level of plausibility is determined by $1-\alpha$. 90% ($\alpha$=0.1) is less conservative than 95% ($\alpha$=0.05), which is less conservative than 99% ($\alpha$=0.01)...
2. To see if a certain value is plausible in light of the data: if that value is not in the interval, it is not plausible (at a certain level of confidence). Zero is a common value to test, but not the only one.
Hypothesis tests address the second use directly
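Both uses can be illustrated with the cancer data from earlier. The sketch below builds the interval at the 90%, 95% and 99% levels and checks whether zero falls inside each. The function name `diff_interval` is made up, and the 50 patients per group is an assumption (100 patients split evenly).

```python
from math import sqrt
from statistics import NormalDist

diff = 25.3 - 24.8                     # x_new - x_std
se = sqrt(2.0**2 / 50 + 2.3**2 / 50)   # ASSUMPTION: 50 patients per group


def diff_interval(level):
    """Large-sample interval for the difference at the given confidence level."""
    alpha = 1 - level
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return diff - z * se, diff + z * se


for level in (0.90, 0.95, 0.99):
    lower, upper = diff_interval(level)
    plausible = lower <= 0 <= upper
    print(f"{level:.0%}: ({lower:.2f}, {upper:.2f})  zero plausible: {plausible}")
```

Higher confidence gives a wider interval, so a value that is plausible at 90% stays plausible at 95% and 99%.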
Example: Dietary Folate
• Data from the Framingham Heart Study
[Figure: histogram of Dietary Folate (micrograms / day, calorie adjusted to 2000 calorie diet); y-axis: Count]
n = 333 Elderly Men
Mean = $\bar{x}$ = 336.4
Std Dev = s = 193.4
Can we conclude that the mean is greater than 300 at 5% significance? (same as 95% confidence)
Five Components of the Hypothesis test:
1. Null Hypothesis
= “What we want to disprove”
= “H0” = “H naught”
= Mean dietary folate in the population represented by these data is <= 300.
= $\mu \le 300$
2. Alternative Hypothesis
= “What we want to prove”
= “HA”
= Mean dietary folate in the population represented by these data is > 300.
= $\mu > 300$
3. Test Statistic
To test about a mean with a large sample test, the statistic is $z = (\bar{x} - \mu)/(s/\sqrt{n})$, where $\mu$ is the hypothesized mean
(i.e. how many standard deviations (of $\bar{X}$) away from the hypothesized mean is the observed $\bar{x}$?)
4. Significance Level of Test, Rejection Region, and P-value (detailed after item 5)
5. Conclusion
Reject H0 and conclude HA if test stat is in rejection region. Otherwise, “fail to reject” (not the same as concluding H0; can only cite a “lack of evidence”: think “innocent until proven guilty”).
(Equivalently, reject H0 if p-value is less than $\alpha$.)
• Significance Level: $\alpha$ = 1% or 5% or 10%... (smaller is more conservative) (Significance = 1-Confidence)
• Rejection Region:
– Reject if test statistic in rejection region.
– Rejection region is set by:
  • Assume H0 is true “at the boundary”.
  • Rejection region is set so that the probability of the test statistic falling in it (that far or further from the null hypothesis) is less than or equal to $\alpha$.
• P-value
– Assume H0 is true “at the boundary”.
– P-value is the probability of seeing the observed test statistic or something further from the null hypothesis.
– = “observed level of significance”
Note that you reject if the p-value is less than $\alpha$. (Small p-values mean “more observed significance”)
Example:
• H0: $\mu \le 300$, HA: $\mu > 300$
• $z = (\bar{x} - \mu)/(s/\sqrt{n})$ = (336.4 – 300)/(193.4/sqrt(333)) = 3.43
• Significance level = 0.05
• When H0 is true, Z~N(0,1). As a result, the cutoff is $z_{0.05}$ = 1.645. (Pr(Z>1.645) = 0.05.)
• P-value = Pr(Z>3.43 when true mean is 300) = 0.0003
• Reject. Mean is greater than 300.
• Would you reject at significance level 0.0001?
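The one-sided folate test can be reproduced as a short sketch. The variable names are illustrative; the data values are the ones given in the example.

```python
from math import sqrt
from statistics import NormalDist

n, xbar, s = 333, 336.4, 193.4
mu0 = 300        # boundary value of H0: mu <= 300
alpha = 0.05

std_normal = NormalDist()
z = (xbar - mu0) / (s / sqrt(n))        # test statistic
cutoff = std_normal.inv_cdf(1 - alpha)  # z_{0.05}
p_value = 1 - std_normal.cdf(z)         # Pr(Z > z) when the true mean is 300

print(f"z = {z:.2f}, cutoff = {cutoff:.3f}, p-value = {p_value:.4f}")
print("reject at 0.05:  ", z > cutoff)
print("reject at 0.0001:", p_value < 0.0001)
```

Rejecting when `z > cutoff` and rejecting when `p_value < alpha` always give the same answer; the p-value simply reports how small $\alpha$ could be made before the rejection is lost.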
Picture
[Figure: density of the distribution of Z = (X – 300)/(193.4/sqrt(333)) when true mean is 300 (x-axis: test statistic, y-axis: density). The rejection region lies to the right of 1.645; area to right of 1.645 = 0.05 = sig level. The observed test statistic is 3.43; area to right of 3.43 = 0.0003 = p-value.]
One Sided versus Two Sided Tests
• Previous test was “one sided” since we’d only reject if the test statistic is far enough to “one side” (i.e. if $z > z_{0.05}$)
• Two sided tests are more common (my opinion):
H0: $\mu = 0$, HA: $\mu$ does not equal 0
Two Sided Tests (cont’d)
Test Statistic (large sample test of mean)
$z = (\bar{x} - \mu)/(s/\sqrt{n})$
Rejection Region:
reject H0 at significance level $\alpha$ if $|z| > z_{\alpha/2}$
i.e. if $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$
Note that this “doubles” p-values. See next example.
Example:
• H0: $\mu = 300$, HA: $\mu$ doesn’t equal 300
• $z = (\bar{x} - \mu)/(s/\sqrt{n})$ = (336.4 – 300)/(193.4/sqrt(333)) = 3.43
• Significance level = 0.05
• When H0 is true, Z~N(0,1). As a result, the cutoff is $z_{0.025}$ = 1.96. (Pr(|Z|>1.96) = 2*Pr(Z>1.96) = 0.05.)
• P-value = Pr(|Z|>3.43 when true mean is 300) = Pr(Z>3.43) + Pr(Z<-3.43) = 2(0.0003) = 0.0006
• Reject. Mean is not equal to 300.
• Would you reject at significance level 0.0005?
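A sketch of the two-sided version of the folate test, showing the doubled p-value. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, xbar, s = 333, 336.4, 193.4
mu0 = 300        # H0: mu = 300
alpha = 0.05

std_normal = NormalDist()
z = (xbar - mu0) / (s / sqrt(n))
cutoff = std_normal.inv_cdf(1 - alpha / 2)         # z_{0.025}
one_sided_p = 1 - std_normal.cdf(abs(z))
p_value = 2 * one_sided_p                          # Pr(|Z| > |z|)

print(f"|z| = {abs(z):.2f}, cutoff = {cutoff:.2f}, p-value = {p_value:.4f}")
print("reject at 0.05:  ", abs(z) > cutoff)
print("reject at 0.0005:", p_value < 0.0005)
```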
Picture
[Figure: density of the distribution of Z = (X – 300)/(193.4/sqrt(333)) when true mean is 300 (x-axis: test statistic, y-axis: density). The rejection region lies to the right of 1.96 and to the left of -1.96; sig level = area to right of 1.96 + area to the left of -1.96 = 0.05. Area to right of 3.43 = 0.0003 and area to left of -3.43 = 0.0003, so p-value = 0.0006 = Pr(|Z|>3.43).]
Power and Type 1 and Type 2 Errors

| Truth \ Action | Fail to Reject H0 | Reject H0 |
|---|---|---|
| H0 True | correct | Type 1 error |
| HA True | Type 2 error | correct |

Significance level = $\alpha$ = Pr( Making type 1 error )
Power = 1 - Pr( Making type 2 error )
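To make the definition of power concrete, here is a rough sketch for the one-sided folate test. It rests on assumptions that are not in the slides: the true mean is taken to be 330 purely for illustration, and the sample standard deviation 193.4 is treated as the known population value. The function name `power_one_sided` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_one_sided(mu_true, mu0, sigma, n, alpha=0.05):
    """Pr(reject H0: mu <= mu0) for the upper-tailed z test when the mean is mu_true."""
    se = sigma / sqrt(n)
    z_cut = NormalDist().inv_cdf(1 - alpha)
    # Under mu_true, Z = (Xbar - mu0)/se is normal with mean (mu_true - mu0)/se, sd 1.
    return 1 - NormalDist().cdf(z_cut - (mu_true - mu0) / se)


# ASSUMPTION: a true mean of 330 is an illustrative value, not from the data.
power = power_one_sided(mu_true=330, mu0=300, sigma=193.4, n=333)
type2_prob = 1 - power
print(f"power = {power:.3f}, Pr(type 2 error) = {type2_prob:.3f}")

# At the boundary of H0 the rejection probability equals the significance level.
print(f"{power_one_sided(300, 300, 193.4, 333):.3f}")
```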
• Assuming H0 is true, what’s the probability of making a type 1 error?
• H0 is true means the true mean is 0.
• This means that the test statistic has a N(0,1) distribution.
• Type 1 error means reject, which means |test statistic| is greater than $z_{\alpha/2}$.
• This has probability $\alpha$.
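The argument above can be checked by simulation. The sketch below repeatedly draws samples for which H0 is true (mean 0), runs the two-sided large-sample test, and counts how often it rejects. The function name, sample size, number of repetitions, seed, and the choice of normal data with standard deviation 1 are all arbitrary illustration choices. Because s is estimated from each sample, the empirical rate is only approximately $\alpha$.

```python
import random
from math import sqrt
from statistics import NormalDist


def type1_error_rate(n=100, reps=4000, alpha=0.05, seed=0):
    """Fraction of two-sided z tests that reject a true H0: mean = 0."""
    rng = random.Random(seed)
    z_cut = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # H0 true: mean is 0
        xbar = sum(sample) / n
        s = sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        z = (xbar - 0.0) / (s / sqrt(n))
        if abs(z) > z_cut:
            rejections += 1  # rejecting a true H0 is a type 1 error
    return rejections / reps


rate = type1_error_rate()
print(f"empirical type 1 error rate: {rate:.3f} (nominal 0.05)")
```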