False-Positives, p-Hacking, Statistical Power, and Evidential Value
Leif D. Nelson University of California, Berkeley
Haas School of Business
Summer Institute June 2014
Who am I?
• Experimental psychologist who studies judgment and decision making.
– And has interests in methodological issues
Who are you?
• Grad Student vs. Post-Doc vs. Faculty?
• Psychology vs. Economics vs. Other?
• Have you read any papers that I have written?
– Really? Which ones? [not a rhetorical question]
Things I want you to get out of this
• It is quite easy to get a false-positive finding through p-hacking. (5%)
• Transparent reporting is critical to improving scientific value. (5%)
• It is (very) hard to know how to correctly power studies, but there is no such thing as overpowering. (30%)
• You can learn a lot from a few p-values. (remainder %)
This will be most helpful to you if you ask questions.
A discussion will be more interesting than a lecture.
SLIDES ABOUT P-HACKING
False-Positives are Easy
• It is common practice in all sciences to report less than everything.
– So people only report the good stuff. We call this p-Hacking.
– Accordingly, what we see is too “good” to be true.
– We identify six ways in which people do that.
Six Ways to p-Hack
1. Stop collecting data once p<.05.
2. Analyze many measures, but report only those with p<.05.
3. Collect and analyze many conditions, but only report those with p<.05.
4. Use covariates to get p<.05.
5. Exclude participants to get p<.05.
6. Transform the data to get p<.05.
OK, but does that matter very much?
• As a field we have agreed on p<.05. (i.e., a 5% false positive rate).
• If we allow p-hacking, then that false positive rate is actually 61%.
• Conclusion: p-hacking is a potential catastrophe to scientific inference.
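To get a feel for how quickly the nominal 5% erodes, here is a minimal sketch of just one of the six practices (analyzing several measures and reporting any that reaches p<.05). It is not the simulation behind the 61% figure. The assumptions are mine: the measures are independent, the SD is known to be 1 (so a z-test stands in for a t-test), and the function names are hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def z_test_p(group_a, group_b):
    """Two-sided p-value for a difference in means, treating the SD as known (=1)."""
    n_a, n_b = len(group_a), len(group_b)
    diff = sum(group_a) / n_a - sum(group_b) / n_b
    z = diff / (1 / n_a + 1 / n_b) ** 0.5
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def false_positive_rate(n_measures, n_per_cell=20, n_sims=4000, seed=1):
    """Share of null studies in which at least one measure reaches p<.05."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        for _ in range(n_measures):
            # Both groups come from the same distribution: the true effect is zero.
            a = [rng.gauss(0, 1) for _ in range(n_per_cell)]
            b = [rng.gauss(0, 1) for _ in range(n_per_cell)]
            if z_test_p(a, b) < 0.05:
                hits += 1
                break  # the p-hacker reports this measure and stops looking
    return hits / n_sims


def analytic_rate(n_measures, alpha=0.05):
    """Chance that at least one of n independent null tests is significant."""
    return 1 - (1 - alpha) ** n_measures


if __name__ == "__main__":
    for k in (1, 2, 3, 5):
        print(k, round(false_positive_rate(k), 3), round(analytic_rate(k), 3))
```

Real measures are usually correlated, which softens the inflation from this one practice, but combining it with the other five pushes the rate up again.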
P-Hacking is Solved Through Transparent Reporting
• Instead of reporting only the good stuff, just report all the stuff.
P-Hacking is Solved Through Transparent Reporting
• Solution 1:
1. Report sample size determination.
2. N>20 [note: I will tell you later about how this number is insanely low. Sorry. Our mistake.]
3. List all of your measures.
4. List all of your conditions.
5. If excluding, report without exclusion as well.
6. If covariates, report without.
P-Hacking is Solved Through Transparent Reporting
• Solution 2:
P-Hacking is Solved Through Transparent Reporting
• Implications:
– Exploration is necessary; therefore replication is as well.
– Without p-hacking, fewer significant findings; therefore fewer papers.
– Without p-hacking, need more power; therefore more participants.
SLIDES ABOUT POWER
Motivation
• With p-hacking,
– statistical power is irrelevant, most studies work
• Without p-hacking,
– take power seriously, or most studies fail
• Reminder. Power analysis:
– Guess effect size (d)
– Set sample size (n)
• Our question: Can we make guessing d easier?
• Our answer: No
• Power analysis is not a practical way to take power seriously
How to guess d?
• Pilot
• Prior literature
• Theory/gut
Some kind words before the bashing
• Pilots: They are good for:
– Do participants get it?
– Ceiling effects?
– Smooth procedure?
• Kind words end here.
Pilots: useless to set sample size
• Say Pilot: n=20
– d̂ = .2
– d̂ = .5
– d̂ = .8
• In words
– Estimates of d have too much sampling error.
• In more interesting words
– Next.
Think of it this way
Say in actuality you need n=75
Run Pilot: n=20
What will Pilot say you need?
• Pilot 1: “you need n=832”
• Pilot 2: “you need n=53”
• Pilot 3: “you need n=96”
• Pilot 4: “you need n=48”
• Pilot 5: “you need n=196”
• Pilot 6: “you need n=10”
• Pilot 7: “you need n=311”
Thanks Pilot!
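The spread in those pilot recommendations can be reproduced roughly with a short simulation. This is a sketch under my own assumptions, not the calculation behind the slide: the true d is 0.46 (about what the normal approximation pairs with n=75 per cell at 80% power), the power formula is the textbook normal approximation, and all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def required_n(d, alpha=0.05, power=0.80):
    """Per-cell n for a two-sample comparison, using the normal approximation."""
    if d <= 0:
        return math.inf  # the pilot points the wrong way: no finite n exists
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


def pilot_estimate(true_d, n_pilot, rng):
    """Observed Cohen's d from one simulated pilot with n_pilot per cell."""
    treat = [rng.gauss(true_d, 1) for _ in range(n_pilot)]
    ctrl = [rng.gauss(0, 1) for _ in range(n_pilot)]
    mean_t, mean_c = sum(treat) / n_pilot, sum(ctrl) / n_pilot
    var_t = sum((x - mean_t) ** 2 for x in treat) / (n_pilot - 1)
    var_c = sum((x - mean_c) ** 2 for x in ctrl) / (n_pilot - 1)
    pooled_sd = math.sqrt((var_t + var_c) / 2)
    return (mean_t - mean_c) / pooled_sd


def pilot_recommendations(true_d=0.46, n_pilot=20, n_pilots=7, seed=2014):
    """What each of several independent pilots would say you need."""
    rng = random.Random(seed)
    return [required_n(pilot_estimate(true_d, n_pilot, rng)) for _ in range(n_pilots)]


if __name__ == "__main__":
    print("n actually needed:", required_n(0.46))
    for i, n in enumerate(pilot_recommendations(), start=1):
        print(f"Pilot {i}: you need n={n}")
```

Because the required n scales with 1/d², modest sampling error in the pilot's d̂ turns into very large swings in the recommended sample size.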
n=20 is not enough.
How many subjects do you need to know how many subjects you need?
n=25 vs. n=50?
Need a Pilot with… n=133
n=50 vs. n=100?
Need a Pilot with… n=276
“Theorem” 1
n vs. 2n?
Need: 5n
How to guess d?
• Pilot
• Existing findings
• Theory/gut
Existing findings
• One hand
– Larger samples
• Other hand
– Publication bias
– More noise
  • ≠ sample
  • ≠ design
  • ≠ measures
Best (im)possible case scenario
• Would guessing d be reasonable based on other studies?
“Many Labs” Replication Project
• Klein et al.
• 36 labs
• 12 countries
• N=6344
• Same 13 experiments
NOISE
How much TV per day?
If 5 identical studies already done
• Best guess: n=85
• How sure are you?
Best case scenario gives range 3:1
Reality is massively worse
• Nobody runs 6th identical study.
– Moderator: Fluency
– Mediator: Perceived-norms
– DV: ‘Real’ behavior
• Publication bias
Where to get d from?
• Pilot
• Existing findings
• Theory/gut
Say you think/feel d~.4
• d=.44 ≈ .4 → n=83
• d=.35 ≈ .4 → n=130
• Rounding error in d = about 100 more participants
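The jump in n comes from the 1/d² dependence of sample size on effect size. As a rough check, here is the textbook normal approximation for a two-cell comparison, assuming a two-sided α=.05 and 80% power (the slide does not state either, so both are assumptions):

```latex
n_{\text{per cell}} \;\approx\; \frac{2\,\bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr)^{2}}{d^{2}}
\;=\; \frac{2\,(1.96 + 0.84)^{2}}{d^{2}}
\;\approx\; \frac{15.7}{d^{2}}
\qquad\Longrightarrow\qquad
d = .44:\; n \approx 81, \qquad d = .35:\; n \approx 128
```

The approximation lands a little below the slide's 83 and 130, which presumably come from an exact t-based calculation. Either way the gap is roughly 47 per cell, or close to 100 participants across two cells.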
Transition (key) slide
• Guessing d is completely impractical
– Power analysis is also.
• Step back: Problem with underpowering?
– Unclear what failure means.
• Well, when you put it that way: Let’s power so that we know what failure means.
Existing view
1. Goal: Success
2. Guess d
3. Set n: “80%” success
New View
1. Goal: Learn from results
2. Accept d is unknown
– If interesting → 0 possible
– If 0 possible → very small possible
3. Set n: 100% learning
– Works: keep going
– Fails: Go Home
What is “Going Big”?
A. Limited resources (most cases) (e.g., lab studies)
– What n are you willing to pay for this effect?
– Run n
  • Fails, too small for me.
  • Works, keep going, adjust n.
B. ‘Unlimited’ resources (fewest cases) (e.g., Project Implicit, Facebook)
– Smallest effect you care about
SLIDES ABOUT P-VALUES
Defining Evidential Value
• Statistical significance
– Single finding: unlikely result of chance
– Could be caused by selective reporting rather than chance
• Evidential value
– Set of significant findings: unlikely result of selective reporting
Motivation: we only publish if p<.05
Motivation
Nonexisting effects: only see false-positive evidence
Existing effects: only see strongest evidence
Published scientific evidence is not representative of reality.
Outline
• Shape
• Inference
• Demonstration
• How often is p-curve wrong?
• Effect-size estimation
• Selecting p-values
p-curve’s shape
• Effect does not exist: flat
• Effect exists: right-skew. (more lows than highs)
• Intensely p-hacked: left-skew (more highs than lows)
Why flat if null is true?
p-value: prob(result at least this extreme | null is true).
Under the null:
• What percent of findings p ≤ .30? – 30%
• What percent of findings p ≤ .05? – 5%
• What percent of findings p ≤ .04? – 4%
• What percent of findings p ≤ .03? – 3%
Got it.
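A quick simulation makes the same point: when the true effect is zero, the share of p-values at or below any cutoff is about that cutoff. This is a sketch with assumptions of my own (two groups of 20, SD known to be 1 so a z-test can be used, hypothetical function names).

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def null_p_values(n_per_cell=20, n_sims=5000, seed=43):
    """Two-sided p-values from simulated two-group studies with no true effect."""
    rng = random.Random(seed)
    standard_error = (2 / n_per_cell) ** 0.5  # SD is assumed known and equal to 1
    p_values = []
    for _ in range(n_sims):
        mean_a = sum(rng.gauss(0, 1) for _ in range(n_per_cell)) / n_per_cell
        mean_b = sum(rng.gauss(0, 1) for _ in range(n_per_cell)) / n_per_cell
        z = (mean_a - mean_b) / standard_error
        p_values.append(2 * (1 - STD_NORMAL.cdf(abs(z))))
    return p_values


def share_at_or_below(p_values, cutoff):
    """Fraction of p-values that are <= cutoff."""
    return sum(p <= cutoff for p in p_values) / len(p_values)


if __name__ == "__main__":
    p_values = null_p_values()
    for cutoff in (0.30, 0.05, 0.04, 0.03):
        print(f"p <= {cutoff:.2f}: {share_at_or_below(p_values, cutoff):.3f}")
```

Since each .01-wide slice below .05 holds about 1% of all null results, the significant ones are spread evenly across the slices, which is the flat p-curve.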
Why more lows than highs if true? (right skew)
• Height: men vs. women
• N = Philadelphia
• What result is more likely?
– In Philadelphia, men taller than women (p=.047)
– In Philadelphia, men taller than women (p=.007)
• Not into intuition? Differential convexity of the density function: Wallis (Econometrica, 1942)
Why left skew with p-hacking?
• Because p-hackers have limited ambition
• p=.21 → Drop if >2.5 SD
• p=.13 → Control for gender
• p=.04 → Write Intro
• If we stop p-hacking as soon as p<.05, won’t get to p=.02 very often.
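The "limited ambition" argument can be sketched with a simulation of one p-hacking tactic: peeking at the data and stopping at the first p<.05. The assumptions are mine: no true effect, peeks at n=20, 25, 30, 35, 40 per cell, and SD known to be 1 (a z-test standing in for a t-test). The function names are hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()
LOOKS = (20, 25, 30, 35, 40)  # per-cell sample sizes at which the researcher peeks


def p_from_sums(sum_a, sum_b, n):
    """Two-sided z-test p-value from running sums, SD assumed known (=1)."""
    z = (sum_a / n - sum_b / n) / (2 / n) ** 0.5
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def hacked_significant_p_values(n_sims=10000, seed=45):
    """p-values reported by researchers who stop at the first look with p<.05."""
    rng = random.Random(seed)
    reported = []
    for _ in range(n_sims):
        sum_a = sum_b = 0.0
        n = 0
        for look in LOOKS:
            while n < look:  # collect data up to the next peek (true effect is zero)
                sum_a += rng.gauss(0, 1)
                sum_b += rng.gauss(0, 1)
                n += 1
            p = p_from_sums(sum_a, sum_b, n)
            if p < 0.05:
                reported.append(p)  # "Write Intro"
                break
    return reported


def p_curve_shares(p_values):
    """Share of significant p-values in five .01-wide bins from 0 to .05."""
    counts = [0] * 5
    for p in p_values:
        counts[min(int(p / 0.01), 4)] += 1
    return [count / len(p_values) for count in counts]


if __name__ == "__main__":
    reported = hacked_significant_p_values()
    print("false-positive rate:", len(reported) / 10000)
    print("bin shares:", [round(s, 2) for s in p_curve_shares(reported)])
```

In this setup the last bin (p just under .05) should hold more results than the first (p under .01). Results that reach significance only after a later peek have just barely crossed the line.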
Plotting Expected P-curves
• Two-sample t-tests.
• True effect sizes
– d=0, d=.3, d=.6, d=.9
• p-hacking
– No: n=20
– Yes: n={20,25,30,35,40}
Nonexisting effect (n=20, d=0)
As many p<.01 as p>.04
n=20, d=.3 / power=14%
Two p<.01 for every p>.04
n=20, d=.6 / power=45%
Five p<.01 for every p>.04
n=20, d=.9 / power=79%
Eighteen p<.01 for every p>.04
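The expected p-curves without p-hacking can be approximated analytically. The sketch below swaps the slides' two-sample t-test for a z-test with known SD, so its numbers will differ somewhat from the power figures and ratios quoted above. The function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def prob_p_below(cutoff, d, n_per_cell):
    """P(two-sided p <= cutoff) for a two-sample z-test with true effect d."""
    shift = d * (n_per_cell / 2) ** 0.5  # expected value of the z statistic
    z_crit = STD_NORMAL.inv_cdf(1 - cutoff / 2)
    upper_tail = 1 - STD_NORMAL.cdf(z_crit - shift)
    lower_tail = STD_NORMAL.cdf(-z_crit - shift)
    return upper_tail + lower_tail


def expected_p_curve(d, n_per_cell=20):
    """Expected share of significant results in each .01-wide bin below .05."""
    edges = (0.01, 0.02, 0.03, 0.04, 0.05)
    cumulative = [prob_p_below(edge, d, n_per_cell) for edge in edges]
    power = cumulative[-1]  # chance of p<.05 at all
    shares, previous = [], 0.0
    for value in cumulative:
        shares.append((value - previous) / power)
        previous = value
    return shares


if __name__ == "__main__":
    for d in (0.0, 0.3, 0.6, 0.9):
        shares = expected_p_curve(d)
        ratio = shares[0] / shares[-1]
        print(f"d={d}: shares={[round(s, 2) for s in shares]}, "
              f"p<.01 per p>.04 = {ratio:.1f}")
```

With d=0 every bin gets a fifth of the significant results. As d grows, the first bin takes an ever larger share, which is the right skew described in the slides.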
Adding p-hacking
n={20,25,30,35,40}
d=0
d=.3 / original power=14%
d=.6 / original power=45%
d=.9 / original power=79%
[2×2 figure grid. Columns: Effect Exists? (NO, YES). Rows: p-hacked findings? (YES, NO).]
Note:
• p-curve does not test if p-hacking happens. (it “always” does)
Rather:
• Whether p-hacking was so intense that it eliminated evidential value (if any).
Outline
• Shape
• Inference
• Demonstration
• How often is p-curve wrong?
• Effect-size estimation
• Selecting p-values
Inference with p-curve
1) Right-skewed?
2) Flatter than studies powered at 33%?
3) Left-skewed?
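The slides do not spell out the test statistics, so the following is only a sketch of how question 1 could be checked. It rescales each significant p-value to p/.05, which is uniform if there is no effect, and combines them with Fisher's method. Treating that as the right-skew test is my assumption, and the function names are hypothetical. Question 2 needs noncentral distributions and is not shown.

```python
import math


def chi2_sf_even_df(x, df):
    """P(chi-square with an even df > x), using the closed-form series."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def right_skew_test(p_values, alpha=0.05):
    """Fisher's method on rescaled significant p-values.

    A small returned p-value means the curve is more right-skewed than a flat
    (no-effect) p-curve would produce.
    """
    significant = [p for p in p_values if 0 < p < alpha]
    if not significant:
        raise ValueError("need at least one p-value below alpha")
    # Conditional on p < alpha, p / alpha is uniform on (0, 1) under the null.
    pp_values = [p / alpha for p in significant]
    chi2 = -2 * sum(math.log(pp) for pp in pp_values)
    df = 2 * len(pp_values)
    return chi2, df, chi2_sf_even_df(chi2, df)


if __name__ == "__main__":
    mostly_low = [0.001, 0.002, 0.003, 0.010, 0.004]
    mostly_high = [0.045, 0.048, 0.041, 0.049, 0.044]
    print("mostly low p-values: ", right_skew_test(mostly_low))
    print("mostly high p-values:", right_skew_test(mostly_high))
```

Running the same combination on 1 − p/.05 would give a left-skew check for question 3, again as an assumption rather than a documented procedure.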
Outline
• Shape
• Inference
• Demonstration
• How often is p-curve wrong?
• Effect-size estimation
• Selecting p-values
Set 1: JPSP with no exclusions or transformations
Set 2: JPSP result reported only with covariate
• Next: New Example
Anchoring and WTA
• Bad replication ¬→ Good original
• Was original a false-positive?
When effect exists, how often does p-curve say “evidential value”
Highlights: More power at 5 Certain with 80%
When effect exists, how often does p-curve say “no evidential value”
Highlights
• P-curve is ‘never’ wrong on properly powered studies.
Broad big picture applications
• Possible uses:
– Meta-analyses of X on Y
– Meta-analyses of X on anything
– Meta-analyses of anything on Y
– Relative truth of opposing findings
  • X is good for Y, vs
  • X is bad for Y
– Is this journal, on average, true?
– Universities vs. pharmaceuticals
Everyday applications (note: 5 p-values can be plenty)
• Reader: Should I read this paper?
• Researcher: Run expensive follow-up?
• Researcher: Explain inconsistent previous finding
• Reviewer: Ask for direct replications?
• Next.
– Simulated meta-analysis, file-drawering studies.
[Figure, panel B: estimated effect size (Cohen's d) against true effect size (d=0, d=.2, d=.4, d=.6, d=.8); y-axis 0.0 to 1.0; data labels .72, .75, .79, .85, .93. Predetermined sample size: between N=10 & N=70. Fixed effect size: di=d.]
• Next.
– Simulated meta-analysis, p-hacking
• Next. Precision from few studies
[Figure, four panels: estimated effect size (Cohen's d, axis -0.6 to 1.2) by number of studies in p-curve (5, 10, 20, 30, 40, 50), with separate series for the sample size of each study (n=20 and n=50). Panels: True d = 0; True d = .3; True d = .6; True d = .9.]
• Next. Demonstration 1: Many Labs Replication project
– Real study, participants, data
– But, see all attempts
• 36 labs
• 13 “effects”
– Example 1. Sunk Cost (significant in 50% of labs)
– Example 2. Asian Disease (significant in 86% of labs)
• Next. Demonstration 2: Choice Overload
A demonstration: Choice Overload meta-analysis
[Figure: plot labeled "Choice is bad" and "Choice is good", with two ** markers.]
How to think about p-values
• When a study has lots of statistical power (big effect + big sample), expect to see very small p-values.
• When a significant p-value is really big (e.g., p = .048, just under .05), you should be concerned.
• Unexpected thought: When the p-values are really small in the absence of statistical power, you can have different (more unsettling) concerns.
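The first two bullets can be checked with a short calculation. This is a sketch under a normal (z-test) approximation to a two-group comparison, not an exact t-test. It asks what share of significant results should have p < .01 versus .04 < p < .05. The helper names (`power`, `share_of_significant`, `ncp_two_groups`) and the scenarios are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power(ncp, alpha):
    """P(two-sided p < alpha) when the z statistic is distributed N(ncp, 1)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return (1 - STD_NORMAL.cdf(z_crit - ncp)) + STD_NORMAL.cdf(-z_crit - ncp)


def share_of_significant(ncp, low, high, alpha=0.05):
    """Among results with p < alpha, the fraction with low < p < high."""
    below_low = power(ncp, low) if low > 0 else 0.0
    return (power(ncp, high) - below_low) / power(ncp, alpha)


def ncp_two_groups(d, n_per_group):
    """Expected z statistic for effect size d with n_per_group in each of two groups."""
    return d * sqrt(n_per_group / 2)


# Illustrative scenarios, chosen for this sketch.
scenarios = [
    ("d = 0 (no effect)", 0.0, 20),
    ("d = .5, n = 20 per group", 0.5, 20),
    ("d = .5, n = 100 per group", 0.5, 100),
]
for label, d, n in scenarios:
    ncp = ncp_two_groups(d, n)
    small = share_of_significant(ncp, 0, 0.01)
    large = share_of_significant(ncp, 0.04, 0.05)
    print(f"{label}: power = {power(ncp, 0.05):.2f}, "
          f"p < .01: {small:.2f}, .04 < p < .05: {large:.2f}")
```

With no effect, significant p-values are spread evenly, so each .01-wide band holds a fifth of them. As power rises, almost all significant results land below .01, which is why a lone p = .048 from a supposedly well-powered study is a warning sign.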
I don’t have any more slides, but I have many more thoughts and opinions. Ask.
datacolada.org
p-curve.com