![Page 1: Understanding P-values and Confidence Intervals Thomas B. Newman, MD, MPH 20 Nov 08](https://reader035.vdocuments.us/reader035/viewer/2022062307/551b78a1550346a6148b5339/html5/thumbnails/1.jpg)
Understanding P-values and Confidence Intervals
Thomas B. Newman, MD, MPH
20 Nov 08
![Page 2: Understanding P-values and Confidence Intervals Thomas B. Newman, MD, MPH 20 Nov 08](https://reader035.vdocuments.us/reader035/viewer/2022062307/551b78a1550346a6148b5339/html5/thumbnails/2.jpg)
Announcements
Optional reading about P-values and Confidence Intervals on the website
Exam questions due Monday 11/24/08 5:00 PM
Next week (11/27) is Thanksgiving
Following week: Physicians and Probability (Chapter 12) and Course Review
Final exam to be distributed in SECTION 12/4 and posted on web
Exam due 12/11 8:45 AM
Key will be posted shortly thereafter
![Page 3: Understanding P-values and Confidence Intervals Thomas B. Newman, MD, MPH 20 Nov 08](https://reader035.vdocuments.us/reader035/viewer/2022062307/551b78a1550346a6148b5339/html5/thumbnails/3.jpg)
Overview
Introduction and justification
What P-values and Confidence Intervals don’t mean
What they do mean: analogy between diagnostic tests and clinical research
Useful confidence interval tips
– CI for “negative” studies; absolute vs. relative risk
– Confidence intervals for small numerators
![Page 4: Understanding P-values and Confidence Intervals Thomas B. Newman, MD, MPH 20 Nov 08](https://reader035.vdocuments.us/reader035/viewer/2022062307/551b78a1550346a6148b5339/html5/thumbnails/4.jpg)
Why cover this material here?
P-values and confidence intervals are ubiquitous in clinical research
Widely misunderstood and mistaught
Pedagogical argument:
– Is it important?
– Can you handle it?
![Page 5: Understanding P-values and Confidence Intervals Thomas B. Newman, MD, MPH 20 Nov 08](https://reader035.vdocuments.us/reader035/viewer/2022062307/551b78a1550346a6148b5339/html5/thumbnails/5.jpg)
Example: Douglas Altman Definition of 95% Confidence Intervals*
“A strictly correct definition of a 95% CI is, somewhat opaquely, that 95% of such intervals will contain the true population value.
“Little is lost by the less pure interpretation of the CI as the range of values within which we can be 95% sure that the population value lies.”
*Quoted in: Guyatt, G., D. Rennie, et al. (2002). Users' guides to the medical literature: essentials of evidence-based clinical practice. Chicago, IL, AMA Press.
![Page 6: Understanding P-values and Confidence Intervals Thomas B. Newman, MD, MPH 20 Nov 08](https://reader035.vdocuments.us/reader035/viewer/2022062307/551b78a1550346a6148b5339/html5/thumbnails/6.jpg)
Understanding P-values and confidence intervals is important because
It explains things which otherwise do not make sense, e.g. the need to state hypotheses in advance and correction for multiple hypothesis testing
You will be using them all the time
You are future leaders in clinical research
![Page 7: Understanding P-values and Confidence Intervals Thomas B. Newman, MD, MPH 20 Nov 08](https://reader035.vdocuments.us/reader035/viewer/2022062307/551b78a1550346a6148b5339/html5/thumbnails/7.jpg)
You can handle it because
We have already covered the important concepts at length earlier in this course
– Prior probability
– Posterior probability
– What you thought before + new information = what you think now
We will support you through the process
![Page 8: Understanding P-values and Confidence Intervals Thomas B. Newman, MD, MPH 20 Nov 08](https://reader035.vdocuments.us/reader035/viewer/2022062307/551b78a1550346a6148b5339/html5/thumbnails/8.jpg)
Review of traditional statistical significance testing
State null (Ho) and alternative (Ha) hypotheses
Choose α
Calculate value of test statistic from your data
Calculate P-value from test statistic
If P-value < α, reject Ho
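The steps above can be sketched in a few lines of Python. This is only an illustration: it uses the bacteremia counts that appear later in these slides (19/507 vs 8/448) and assumes an uncorrected Pearson chi-square test, which the slides do not name as the test behind the reported P = 0.07. The function name `chi2_test_2x2` is hypothetical.

```python
import math

def chi2_test_2x2(a, b, c, d, alpha=0.05):
    """Uncorrected Pearson chi-square test for a 2x2 table (assumed test).

    a, b = events in group 1 and group 2; c, d = non-events in each group.
    Returns (chi2, p_value, reject_h0).
    """
    n = a + b + c + d
    # Test statistic computed from the data
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    # Reject Ho only if the P-value is below the alpha chosen in advance
    return chi2, p_value, p_value < alpha

# Bacteremia: 19/507 with amoxicillin vs 8/448 with placebo
chi2, p, reject = chi2_test_2x2(19, 8, 507 - 19, 448 - 8)
print(f"chi2 = {chi2:.2f}, P = {p:.3f}, reject Ho: {reject}")
```

Under these assumptions the P-value comes out close to the 0.07 reported on the results slide, so Ho is not rejected at α = 0.05.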
![Page 9: Understanding P-values and Confidence Intervals Thomas B. Newman, MD, MPH 20 Nov 08](https://reader035.vdocuments.us/reader035/viewer/2022062307/551b78a1550346a6148b5339/html5/thumbnails/9.jpg)
Problem: Traditional statistical significance testing
has led to widespread misinterpretation of P-values
![Page 10: Understanding P-values and Confidence Intervals Thomas B. Newman, MD, MPH 20 Nov 08](https://reader035.vdocuments.us/reader035/viewer/2022062307/551b78a1550346a6148b5339/html5/thumbnails/10.jpg)
What P-values don’t mean
If the P-value is 0.05, there is a 95% probability that…
– The results did not occur by chance
– The null hypothesis is false
– There really is a difference between the groups
![Page 11: Understanding P-values and Confidence Intervals Thomas B. Newman, MD, MPH 20 Nov 08](https://reader035.vdocuments.us/reader035/viewer/2022062307/551b78a1550346a6148b5339/html5/thumbnails/11.jpg)
So if P = 0.05, what IS there a 95% probability of?
![Page 12: Understanding P-values and Confidence Intervals Thomas B. Newman, MD, MPH 20 Nov 08](https://reader035.vdocuments.us/reader035/viewer/2022062307/551b78a1550346a6148b5339/html5/thumbnails/12.jpg)
White board: 2x2 tables and “false positive confusion”
Analogy with diagnostic tests
(This is covered step-by-step in the course book.)
![Page 13: Understanding P-values and Confidence Intervals Thomas B. Newman, MD, MPH 20 Nov 08](https://reader035.vdocuments.us/reader035/viewer/2022062307/551b78a1550346a6148b5339/html5/thumbnails/13.jpg)
Analogy between diagnostic tests and research studies
| Diagnostic Test | Research Study |
|---|---|
| Absence of disease | |
| Presence of disease | |
| Severity of disease in the diseased group | |
| Cutoff for distinguishing positive and negative results | |
| Test result | |
![Page 14: Understanding P-values and Confidence Intervals Thomas B. Newman, MD, MPH 20 Nov 08](https://reader035.vdocuments.us/reader035/viewer/2022062307/551b78a1550346a6148b5339/html5/thumbnails/14.jpg)
Analogy between diagnostic tests and research studies
| Diagnostic Test | Research Study |
|---|---|
| Negative result (test within normal limits) | |
| Positive result | |
| Sensitivity | |
| False positive rate (1 - specificity) | |
| Prior probability of disease (of a given severity) | |
| Posterior probability of disease, given test result | |
![Page 15: Understanding P-values and Confidence Intervals Thomas B. Newman, MD, MPH 20 Nov 08](https://reader035.vdocuments.us/reader035/viewer/2022062307/551b78a1550346a6148b5339/html5/thumbnails/15.jpg)
Extending the Analogy
Intentionally ordered tests and hypotheses stated in advance
Multiple tests and multiple hypotheses
Laboratory error and bias
Alternative diagnoses and confounding
![Page 16: Understanding P-values and Confidence Intervals Thomas B. Newman, MD, MPH 20 Nov 08](https://reader035.vdocuments.us/reader035/viewer/2022062307/551b78a1550346a6148b5339/html5/thumbnails/16.jpg)
Bonferroni Inequality: If we do k different tests, each with significance level α, and all the null hypotheses are true, the probability that one or more will be significant is less than or equal to kα
Correction: If we test k different hypotheses and want our total Type 1 error rate to be no more than α, then we should reject Ho only if P < α/k
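As a minimal sketch of the correction, the snippet below applies the α/k threshold to a list of P-values. The function name `bonferroni_reject` and the four P-values are hypothetical, chosen only for illustration.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return one True/False per hypothesis: reject Ho only if P < alpha / k."""
    k = len(p_values)
    threshold = alpha / k  # per-test significance level after correction
    return [p < threshold for p in p_values]

# Hypothetical P-values from k = 4 tests (illustrative, not from the lecture)
p_values = [0.003, 0.02, 0.04, 0.30]
decisions = bonferroni_reject(p_values)
print(decisions)  # threshold is 0.05 / 4 = 0.0125
```

Three of these P-values are below 0.05, but only the first is below the corrected threshold of 0.0125.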
![Page 17: Understanding P-values and Confidence Intervals Thomas B. Newman, MD, MPH 20 Nov 08](https://reader035.vdocuments.us/reader035/viewer/2022062307/551b78a1550346a6148b5339/html5/thumbnails/17.jpg)
Derivation
Let A and B be the events of making a Type 1 error on hypotheses A and B, respectively
P(A or B) = P(A) + P(B) – P(A & B)
Under Ho, P(A) = P(B) = α
So P(A or B) = α + α – P(A & B) = 2α – P(A & B)
Of course, it is possible to falsely reject 2 different null hypotheses, so P(A & B) > 0. Therefore, the probability of falsely rejecting either of the null hypotheses must be less than 2α.
Note that often A & B are not independent. When they are positively correlated, P(A & B) is larger than it would be under independence, so Bonferroni will be even more conservative
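A quick numerical check of the inequality, assuming the k tests are independent and every null hypothesis is true (an assumption made here for illustration; the derivation above does not require independence). In that case the probability of at least one Type 1 error is 1 − (1 − α)^k, which for k = 2 equals 2α − α².

```python
def familywise_error_independent(k, alpha=0.05):
    """P(at least one Type 1 error) for k independent tests, all nulls true."""
    return 1 - (1 - alpha) ** k

for k in (1, 2, 5, 10, 20):
    exact = familywise_error_independent(k)
    bound = min(1.0, k * 0.05)  # Bonferroni bound, capped at 1
    print(f"k = {k:2d}: P(one or more) = {exact:.4f} <= k*alpha = {bound:.2f}")
```

The gap between the two columns widens as k grows, which is one way to see why the correction becomes increasingly conservative.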
![Page 18: Understanding P-values and Confidence Intervals Thomas B. Newman, MD, MPH 20 Nov 08](https://reader035.vdocuments.us/reader035/viewer/2022062307/551b78a1550346a6148b5339/html5/thumbnails/18.jpg)
Problems with Bonferroni correction
Overly conservative (especially when hypotheses are not independent)
Maintains specificity at the expense of sensitivity
Does not take prior probability into account
Not clear when to use it
BUT can be useful if results are still significant after correction
![Page 19: Understanding P-values and Confidence Intervals Thomas B. Newman, MD, MPH 20 Nov 08](https://reader035.vdocuments.us/reader035/viewer/2022062307/551b78a1550346a6148b5339/html5/thumbnails/19.jpg)
CONFIDENCE INTERVALS
![Page 20: Understanding P-values and Confidence Intervals Thomas B. Newman, MD, MPH 20 Nov 08](https://reader035.vdocuments.us/reader035/viewer/2022062307/551b78a1550346a6148b5339/html5/thumbnails/20.jpg)
What Confidence Intervals don’t mean
There is a 95% chance that the true value is within the interval
If you conclude that the true value is within the interval you have a 95% chance of being right
The range of values within which we can be 95% sure that the population value lies
![Page 21: Understanding P-values and Confidence Intervals Thomas B. Newman, MD, MPH 20 Nov 08](https://reader035.vdocuments.us/reader035/viewer/2022062307/551b78a1550346a6148b5339/html5/thumbnails/21.jpg)
One source of confusion: Statistical “confidence”
(Some) statisticians say: “You can be 95% confident that the population value is in the interval.”
This is NOT the same as “There is a 95% probability that the population value is in the interval.”
“Confidence” is tautologously defined by statisticians as what you get from a confidence interval
![Page 22: Understanding P-values and Confidence Intervals Thomas B. Newman, MD, MPH 20 Nov 08](https://reader035.vdocuments.us/reader035/viewer/2022062307/551b78a1550346a6148b5339/html5/thumbnails/22.jpg)
Illustration
If a 95% CI has a 95% chance of containing the true value, then a 90% CI should have a 90% chance and a 40% CI should have a 40% chance.
Study: 4 deaths in 10 subjects in each group
RR = 1.0 (95% CI: 0.34 to 2.9)
40% CI: 0.75 to 1.33
Conclude from this study that there is a 60% chance that the true RR is < 0.75 or > 1.33?
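The intervals in this illustration can be reproduced with the usual large-sample (Wald) interval on the log risk ratio. That method is an assumption here, since the slide does not say how its intervals were computed; the function name `risk_ratio_ci` is hypothetical.

```python
import math
from statistics import NormalDist

def risk_ratio_ci(a, n1, b, n0, level=0.95):
    """Risk ratio (a/n1) / (b/n0) with a Wald CI computed on the log scale."""
    rr = (a / n1) / (b / n0)
    se_log_rr = math.sqrt(1 / a - 1 / n1 + 1 / b - 1 / n0)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # e.g. 1.96 for level = 0.95
    return rr, rr * math.exp(-z * se_log_rr), rr * math.exp(z * se_log_rr)

# 4 deaths in 10 subjects in each group
for level in (0.95, 0.90, 0.40):
    rr, lo, hi = risk_ratio_ci(4, 10, 4, 10, level)
    print(f"{level:.0%} CI: RR = {rr:.1f} ({lo:.2f} to {hi:.2f})")
```

Lowering the confidence level only shrinks the z multiplier; it says nothing new about where the true RR is, which is the point of the illustration.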
![Page 23: Understanding P-values and Confidence Intervals Thomas B. Newman, MD, MPH 20 Nov 08](https://reader035.vdocuments.us/reader035/viewer/2022062307/551b78a1550346a6148b5339/html5/thumbnails/23.jpg)
Confidence Intervals apply to a Process
Consider a bag with 19 white and 1 pink grapefruit
The process of selecting a grapefruit at random has a 95% probability of yielding a white one
But once I’ve selected one, does it still have a 95% chance of being white?
You may have prior knowledge that changes the probability (e.g., pink grapefruit have thinner peel, are denser, etc.)
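One way to see the "process" interpretation is to simulate many repeated studies and count how often the interval covers the true value. The sketch below does this for the mean of a normal variable with known standard deviation; the true mean, standard deviation, sample size, and seed are all hypothetical choices made for illustration.

```python
import random
from statistics import NormalDist

def coverage(true_mean=10.0, sigma=2.0, n=25, level=0.95, reps=2000, seed=1):
    """Fraction of simulated studies whose CI for the mean contains true_mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / n ** 0.5  # known-sigma interval for a mean
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        if xbar - half_width <= true_mean <= xbar + half_width:
            hits += 1
    return hits / reps

print(coverage(level=0.95))  # close to 0.95 over many repetitions
print(coverage(level=0.40))  # close to 0.40 over many repetitions
```

The coverage is a property of the procedure across repetitions. For any single interval, as with the selected grapefruit, what you believe can also depend on what else you know.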
![Page 24: Understanding P-values and Confidence Intervals Thomas B. Newman, MD, MPH 20 Nov 08](https://reader035.vdocuments.us/reader035/viewer/2022062307/551b78a1550346a6148b5339/html5/thumbnails/24.jpg)
Confidence Intervals for negative studies: 5 levels of sophistication
Example 1: Oral amoxicillin to treat possible occult bacteremia in febrile children*
– Randomized, double-blind trial
– 3-36 month old children with T ≥ 39 °C (N = 955)
– Treatment: Amox 125 mg tid (≤ 10 kg) or 250 mg tid (> 10 kg)
– Outcome: major infectious morbidity
*Jaffe et al., New Engl J Med 1987;317:1175-80
![Page 25: Understanding P-values and Confidence Intervals Thomas B. Newman, MD, MPH 20 Nov 08](https://reader035.vdocuments.us/reader035/viewer/2022062307/551b78a1550346a6148b5339/html5/thumbnails/25.jpg)
Amoxicillin for possible occult bacteremia 2: Results
Bacteremia in 19/507 (3.7%) with amox, vs 8/448 (1.8%) with placebo (P = 0.07)
“Major Infectious Morbidity” 2/19 (10.5%) with amox vs 1/8 (12.5%) with placebo (P = 0.9)
Conclusion: “Data do not support routine use of standard doses of amoxicillin…”
![Page 26: Understanding P-values and Confidence Intervals Thomas B. Newman, MD, MPH 20 Nov 08](https://reader035.vdocuments.us/reader035/viewer/2022062307/551b78a1550346a6148b5339/html5/thumbnails/26.jpg)
5 levels of sophistication
Level 1: P > 0.05 = treatment does not work
Level 2: Look at power for study. (Authors reported power = 0.24 for OR = 4. Therefore, study underpowered and negative study uninformative.)
![Page 27: Understanding P-values and Confidence Intervals Thomas B. Newman, MD, MPH 20 Nov 08](https://reader035.vdocuments.us/reader035/viewer/2022062307/551b78a1550346a6148b5339/html5/thumbnails/27.jpg)
5 levels of sophistication, cont’d
Level 3: Look at 95% CI!
Authors calculated OR = 1.2 (95% CI: 0.02 to 30.4)
– This is based on 1/8 (12.5%) with placebo vs 2/19 (10.5%) with amox
– (They put placebo on top)
– (Silly to use OR)
With amox on top, RR = 0.84 (95% CI: 0.09 to 8.0)
This was level of TBN in letter to the editor (1987)
![Page 28: Understanding P-values and Confidence Intervals Thomas B. Newman, MD, MPH 20 Nov 08](https://reader035.vdocuments.us/reader035/viewer/2022062307/551b78a1550346a6148b5339/html5/thumbnails/28.jpg)
5 levels of sophistication, cont’d
Level 4: Make sure you do an “intention to treat” analysis!
– It is not OK to restrict attention to bacteremic patients
– So it should be 2/507 (0.39%) with amox vs 1/448 (0.22%) with placebo
– RR = 1.8 (95% CI: 0.16 to 19.4)
![Page 29: Understanding P-values and Confidence Intervals Thomas B. Newman, MD, MPH 20 Nov 08](https://reader035.vdocuments.us/reader035/viewer/2022062307/551b78a1550346a6148b5339/html5/thumbnails/29.jpg)
Level 5: the clinically relevant quantity is the Absolute Risk Reduction (ARR)!
2/507 (0.39%) with amox vs 1/448 (0.22%) with placebo
ARR = −0.17% {amoxicillin worse}
95% CI (−0.9% {harm} to +0.5% {benefit})
Therefore, LOWER limit of 95% CI for benefit (i.e., best case) is NNT = 1/0.5% = 200
So this study suggests need to treat ≥ 200 children to prevent “Major Infectious Morbidity” in one
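The Level 5 arithmetic can be sketched as follows, assuming a simple Wald interval for the difference in proportions (this appears to match the risk difference interval in the Stata output on the next slide, with the sign flipped because ARR is placebo risk minus amoxicillin risk). The function name `arr_ci` is hypothetical.

```python
import math
from statistics import NormalDist

def arr_ci(events_treated, n_treated, events_control, n_control, level=0.95):
    """Absolute risk reduction (control risk - treated risk) with a Wald CI."""
    p_t = events_treated / n_treated
    p_c = events_control / n_control
    arr = p_c - p_t  # negative means the treated group did worse
    se = math.sqrt(p_t * (1 - p_t) / n_treated + p_c * (1 - p_c) / n_control)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return arr, arr - z * se, arr + z * se

# Intention to treat: 2/507 with amoxicillin vs 1/448 with placebo
arr, lo, hi = arr_ci(2, 507, 1, 448)
best_case_nnt = 1 / hi  # only meaningful because hi > 0 (benefit not excluded)
print(f"ARR = {arr:.2%} (95% CI {lo:.2%} to {hi:.2%})")
print(f"Best-case NNT = {best_case_nnt:.0f}")
```

With the unrounded upper limit (about 0.53%) the best-case NNT is roughly 190; the slide rounds the limit to 0.5% and so reports 200.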
![Page 30: Understanding P-values and Confidence Intervals Thomas B. Newman, MD, MPH 20 Nov 08](https://reader035.vdocuments.us/reader035/viewer/2022062307/551b78a1550346a6148b5339/html5/thumbnails/30.jpg)
Stata output
. csi 2 1 505 447
| Exposed Unexposed | Total
-----------------+------------------------+----------
Cases | 2 1 | 3
Noncases | 505 447 | 952
-----------------+------------------------+----------
Total | 507 448 | 955
| |
Risk | .0039448 .0022321 | .0031414
| |
| Point estimate | [95% Conf. Interval]
|------------------------+----------------------
Risk difference | .0017126 | -.005278 .0087032
Risk ratio | 1.767258 | .1607894 19.42418
Attr. frac. ex. | .4341518 | -5.219315 .9485178
Attr. frac. pop | .2894345 |
+-----------------------------------------------
chi2(1) = 0.22 Pr>chi2 = 0.6369
![Page 31: Understanding P-values and Confidence Intervals Thomas B. Newman, MD, MPH 20 Nov 08](https://reader035.vdocuments.us/reader035/viewer/2022062307/551b78a1550346a6148b5339/html5/thumbnails/31.jpg)
Example 2: Pyelonephritis and new renal scarring in the International Reflux Study in Children*
RCT of ureteral reimplantation vs prophylactic antibiotics for children with vesicoureteral reflux
Overall result: surgery group had fewer episodes of pyelonephritis (8% vs 22%; NNT = 7; P < 0.05) but more new scarring (31% vs 22%; P = .4)
This raises questions about whether new scarring is caused by pyelonephritis
*Weiss et al. J Urol 1992; 148:1667-73
![Page 32: Understanding P-values and Confidence Intervals Thomas B. Newman, MD, MPH 20 Nov 08](https://reader035.vdocuments.us/reader035/viewer/2022062307/551b78a1550346a6148b5339/html5/thumbnails/32.jpg)
Within groups no association between new pyelo and new scarring
Trend goes in the OPPOSITE direction
RR = 0.28; 95% CI (0.09-1.32)
Weiss, J Urol 1992:148;1672

| | New Scarring | No New Scarring | N | % |
|---|---|---|---|---|
| New pyelo | 2 | 18 | 20 | 10% |
| No new pyelo | 28 | 58 | 86 | 29% |
| Total | 30 | 76 | 106 | |
![Page 33: Understanding P-values and Confidence Intervals Thomas B. Newman, MD, MPH 20 Nov 08](https://reader035.vdocuments.us/reader035/viewer/2022062307/551b78a1550346a6148b5339/html5/thumbnails/33.jpg)
Stata output to get 95% CI:
. csi 2 18 28 58
| Exposed Unexposed | Total
-----------------+------------------------+------------
Cases | 2 18 | 20
Noncases | 28 58 | 86
-----------------+------------------------+------------
Total | 30 76 | 106
| |
Risk | .0666667 .2368421 | .1886792
| |
| Point estimate | [95% Conf. Interval]
|------------------------+------------------------
Risk difference | -.1701754 | -.3009557 -.0393952
Risk ratio | .2814815 | .069523 1.13965
Prev. frac. ex. | .7185185 | -.1396499 .930477
Prev. frac. pop | .2033543 |
+-----------------------------------------
chi2(1) = 4.07 Pr>chi2 = 0.0437
![Page 34: Understanding P-values and Confidence Intervals Thomas B. Newman, MD, MPH 20 Nov 08](https://reader035.vdocuments.us/reader035/viewer/2022062307/551b78a1550346a6148b5339/html5/thumbnails/34.jpg)
Conclusions
No evidence that new pyelonephritis causes scarring
Some evidence that it does not
P-values and confidence intervals are approximate, especially for small sample sizes
There is nothing magical about 0.05
Key concept: calculate 95% CI for negative studies
– ARR for clinical questions (less generalizable)
– RR for etiologic questions
![Page 35: Understanding P-values and Confidence Intervals Thomas B. Newman, MD, MPH 20 Nov 08](https://reader035.vdocuments.us/reader035/viewer/2022062307/551b78a1550346a6148b5339/html5/thumbnails/35.jpg)
Confidence intervals for small numerators
| Observed numerator | Approximate numerator for upper limit of 95% CI |
|---|---|
| 0 | 3 |
| 1 | 5 |
| 2 | 7 |
| 3 | 9 |
| 4 | 10 |
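The slide does not say how these approximate numerators were derived. One plausible reading, offered here only as an assumption, is that they are rounded exact Poisson upper limits. The sketch below computes those limits by bisection for two tail areas; the function names are hypothetical.

```python
import math

def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu)."""
    return sum(math.exp(-mu) * mu ** i / math.factorial(i) for i in range(k + 1))

def poisson_upper_limit(k, tail=0.025):
    """Mean mu at which P(X <= k | mu) equals tail, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if poisson_cdf(k, mid) > tail:
            lo = mid  # k or fewer events still too likely: mu must be larger
        else:
            hi = mid
    return (lo + hi) / 2

print("observed  tail=0.05  tail=0.025")
for k in range(5):
    one_sided = poisson_upper_limit(k, 0.05)
    two_sided = poisson_upper_limit(k, 0.025)
    print(f"{k:8d}  {one_sided:9.2f}  {two_sided:10.2f}")
```

For an observed numerator of 0, the 0.05-tail limit is about 3 (the familiar "rule of 3"); for 1 to 4, the 0.025-tail limits fall near the remaining values in the table. To get an upper limit for a proportion, divide the numerator by the denominator.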
![Page 36: Understanding P-values and Confidence Intervals Thomas B. Newman, MD, MPH 20 Nov 08](https://reader035.vdocuments.us/reader035/viewer/2022062307/551b78a1550346a6148b5339/html5/thumbnails/36.jpg)
When P-values and Confidence Intervals Disagree
Usually P < 0.05 means 95% CI excludes null value.
But both 95% CI and P-values are based on approximations, so this may not be the case
Illustrated by IRSC slide above
If you want 95% CI and P-values to agree, use “test-based” confidence intervals – see next slide
![Page 37: Understanding P-values and Confidence Intervals Thomas B. Newman, MD, MPH 20 Nov 08](https://reader035.vdocuments.us/reader035/viewer/2022062307/551b78a1550346a6148b5339/html5/thumbnails/37.jpg)
Alternative Stata output: Test-based CI
. csi 2 18 28 58,tb
| Exposed Unexposed | Total
-----------------+-----------------------+------------
Cases | 2 18 | 20
Noncases | 28 58 | 86
-----------------+-----------------------+------------
Total | 30 76 | 106
| |
Risk | .0666667 .2368421 | .1886792
| |
| Point estimate | [95% Conf. Interval]
|-----------------------+------------------------
Risk difference | -.1701754 | -.3363063 -.0040446 (tb)
Risk ratio | .2814815 | .0816554 .9703199 (tb)
Prev. frac. ex. | .7185185 | .0296801 .9183446 (tb)
Prev. frac. pop | .2033543 |
+-------------------------------------------------
chi2(1) = 4.07 Pr>chi2 = 0.0437