![Page 1: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/1.jpg)
On p-values
Maarten van Smeden
Annual Julius Symposium 2016
![Page 2: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/2.jpg)
About
• statistician by training
• PhD (2016): diagnostic research in the absence of a gold standard (JC)
• post-doc: biostatistics / epidemiological methods (JC)
![Page 3: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/3.jpg)
![Page 4: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/4.jpg)
About this workshop
p-value?
ASA statement: why and what?
p-value alternatives?
![Page 5: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/5.jpg)
Go to:
pvalue.presenterswall.nl
![Page 6: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/6.jpg)
Point of departure
skeptical whenever I see a p-value
![Page 7: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/7.jpg)
The term “inference”
![Page 8: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/8.jpg)
p-value?
![Page 9: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/9.jpg)
Formally defined by
![Page 10: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/10.jpg)
The pioneers
Ronald Aylmer Fisher (1890 - 1962)
Jerzy Neyman (1894-1981)
Egon Pearson (1895-1980)
![Page 11: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/11.jpg)
p-value ≥ α
“no effect”
p-value < α
“effect!”
α = .05, unless…
![Page 12: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/12.jpg)
… the p-value fails
“arguably significant” (P = 0.07)
“direction heading to significance” (P = 0.10)
“flirting with conventional levels of significance” (P > 0.1)
“marginally significant” (P ≥ 0.1)
convenience sample from: https://mchankins.wordpress.com/2013/04/21/still-not-significant-2/ listing 509 expressions for non-significant results at the α = .05 level (24 October 2016)
![Page 13: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/13.jpg)
+ 23!!! supplementary files
Wasserstein & Lazar (2016) The ASA's Statement on p-Values: Context, Process, and Purpose, The American Statistician, 70:2, 129-133
![Page 14: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/14.jpg)
A few quotes (1)
“The ASA has not previously taken positions on specific matters of statistical practice.”
nb. founded in 1839
“Nothing in the ASA statement is new.”
from the ASA Statement
![Page 15: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/15.jpg)
A few quotes (2)
“… process was lengthier and more controversial than anticipated.”
“… the statement articulates in non-technical terms a few select principles that could improve the conduct or interpretation of quantitative science, according to widespread consensus in the statistical community.”
from the ASA Statement
![Page 16: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/16.jpg)
p-value?
why?
![Page 17: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/17.jpg)
Go to
pvalue.presenterswall.nl
![Page 18: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/18.jpg)
Why do we need a statement?
‘“It’s science’s dirtiest secret: The ‘scientific method’ of testing hypotheses by statistical analysis stands on a flimsy foundation.”’
Quoting Siegfried (2010), Odds Are, It’s Wrong: Science Fails to Face the Shortcomings of Statistics, Science News, 177, 26.
from the ASA Statement: Wasserstein & Lazar (2016) The ASA's Statement on p-Values: Context, Process, and Purpose, The American Statistician, 70:2, 129-133
![Page 19: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/19.jpg)
OK, but why now?
“… highly visible discussions over the last few years”
“The statistical community has been deeply concerned about issues of reproducibility and replicability …”
from the ASA statement
![Page 20: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/20.jpg)
In popular media
http://www.vox.com/2016/3/15/11225162/p-value-simple-definition-hacking (~ 50 million unique visitors monthly)
![Page 21: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/21.jpg)
The social sciences
![Page 22: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/22.jpg)
Drastic measures…
NHST = Null hypothesis significance testing
![Page 23: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/23.jpg)
![Page 24: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/24.jpg)
P-value increasingly central in reporting
From: Chavalarias et al. JAMA. 2016;315(11):1141-1148, doi:10.1001/jama.2016.1952
Using text mining of >1.6 million abstracts
![Page 25: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/25.jpg)
In the large (‘big’) data era
“With a combination of large datasets, confounding, flexibility in analytical choices …, and superimposed selective reporting bias, using a P < 0.05 threshold to declare “success,” …. means next to nothing.”
From ASA supplementary material, response by Ioannidis.
![Page 26: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/26.jpg)
To summarise: why?
• p-values and the P < .05 rule are at the core of inference in today’s science (social, biomedical, …)
• there is growing concern that these inferences are often wrong
• perhaps, if we understand p-values better, we’ll be less often wrong
![Page 27: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/27.jpg)
p-value?
why?
what?
![Page 28: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/28.jpg)
The statement: 6 principles
1. P-values can indicate how incompatible the data are with a specified statistical model.
2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
4. Proper inference requires full reporting and transparency.
5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.
from the ASA statement
![Page 29: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/29.jpg)
Statistical model?
• every method of statistical inference relies on a web of assumptions which together can be viewed as a ‘statistical model’
• the tested hypothesis is one of these assumptions, often a ‘zero-effect’ hypothesis called the ‘null hypothesis’
![Page 30: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/30.jpg)
About assumptions
the calculation of p-values always relies on assumptions besides the hypothesis tested. It is easy to ignore/forget those assumptions while analysing.
Your assumptions are your windows on the world. Scrub them off every once in a while, or the light won't come in.
Alan Alda
![Page 31: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/31.jpg)
The statement: 6 principles
1. P-values can indicate how incompatible the data are with a specified statistical model.
2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
4. Proper inference requires full reporting and transparency.
5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.
from the ASA statement
![Page 32: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/32.jpg)
From a probability point of view
p-value*: P(Data|Hypothesis)
is not: P(Hypothesis|Data)
*Somewhat simplified, correct notation would be: P(T(X) ≥ x | Hypothesis)
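As a minimal sketch of the tail probability P(T(X) ≥ x | Hypothesis), consider a hypothetical example that is not part of the original slides: 15 successes in 20 trials, tested against a null hypothesis of a success probability of 0.5. The function name and the numbers are assumptions chosen for illustration only.

```python
from math import comb


def binomial_tail_p_value(successes: int, trials: int, p_null: float = 0.5) -> float:
    """One-sided p-value: P(T >= successes | H0: success probability = p_null)."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical data: 15 successes out of 20 trials.
p = binomial_tail_p_value(successes=15, trials=20)
print(f"p-value = {p:.4f}")
```

The result is a statement about the data given the hypothesis (and the binomial model); it says nothing directly about the probability that the hypothesis is true.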
![Page 33: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/33.jpg)
Does it matter?
P(Death|Handgun)
= 5% to 20%*
P(Handgun|Death)
= 0.028%**
* from New York Times (http://www.nytimes.com article published: 2008/04/03/)
** from CBS StatLine (concerning deaths and registered gun crimes in 2015 in the Netherlands)
![Page 34: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/34.jpg)
If there only was a way…
P(Data|Hypothesis)
P(Hypothesis|Data)
![Page 35: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/35.jpg)
There is…
Reverend Thomas Bayes (1702-1761)
$$P(H \mid D) = \frac{P(D \mid H)\,P(H)}{P(D)}$$
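A small numerical sketch may help show how Bayes' theorem turns P(D|H) into P(H|D). The prior and the two conditional probabilities below are made-up values for illustration, and P(D) is expanded with the law of total probability over H and not-H.

```python
def posterior_probability(prior: float, p_data_given_h: float, p_data_given_not_h: float) -> float:
    """Return P(H|D) from P(H), P(D|H) and P(D|not H) using Bayes' theorem."""
    # Law of total probability: P(D) = P(D|H) P(H) + P(D|not H) P(not H)
    p_data = p_data_given_h * prior + p_data_given_not_h * (1 - prior)
    return p_data_given_h * prior / p_data


# Hypothetical inputs, not taken from the slides.
posterior = posterior_probability(prior=0.5, p_data_given_h=0.9, p_data_given_not_h=0.3)
print(f"P(H|D) = {posterior:.2f}")
```

With these assumed inputs the posterior is 0.75. The same P(D|H) with a smaller prior would give a smaller P(H|D), which is why the two quantities cannot be swapped.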
![Page 36: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/36.jpg)
The statement: 6 principles
1. P-values can indicate how incompatible the data are with a specified statistical model.
2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
4. Proper inference requires full reporting and transparency.
5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.
from the ASA statement
![Page 37: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/37.jpg)
On bright-line rules
“Practices that reduce data analysis or scientific inference to mechanical “bright-line” rules (such as “p < 0.05”) for justifying scientific claims or conclusions can lead to erroneous beliefs and poor decision making. A conclusion does not immediately become “true” on one side of the divide and “false” on the other.”
from the ASA statement
![Page 38: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/38.jpg)
If p ~ .05
D Colquhoun (2014). An investigation of the false discovery rate and the misinterpretation of p-values. R. Soc. Open Sci. 1:140216.
“If you want to avoid making a fool of yourself very often, do not regard anything greater than p < 0.001 as a demonstration that you have discovered something”
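The reasoning behind such a warning can be sketched with a simple false discovery rate calculation. The inputs below (10% of tested hypotheses reflect a real effect, 80% power, α = 0.05) are illustrative assumptions, and the function name is hypothetical.

```python
def false_discovery_rate(prevalence: float, power: float, alpha: float) -> float:
    """Expected fraction of 'significant' results that are false positives."""
    false_positives = alpha * (1 - prevalence)  # true nulls that reach p < alpha
    true_positives = power * prevalence         # real effects that reach p < alpha
    return false_positives / (false_positives + true_positives)


# Assumed scenario: 10% real effects, 80% power, alpha = 0.05.
fdr = false_discovery_rate(prevalence=0.10, power=0.80, alpha=0.05)
print(f"false discovery rate = {fdr:.2f}")
```

Under these assumptions about 36% of results declared significant would be false positives, far more than the 5% that α might suggest.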
![Page 39: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/39.jpg)
If p > .05
![Page 40: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/40.jpg)
The statement: 6 principles
1. P-values can indicate how incompatible the data are with a specified statistical model.
2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
4. Proper inference requires full reporting and transparency.
5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.
from the ASA statement
![Page 41: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/41.jpg)
The issue of pre-specified hypotheses
From: http://compare-trials.org/ accessed on November 20 2016
![Page 42: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/42.jpg)
Ed Yong (2012). Replication studies: Bad copy, Nature. Data credits to: D Fanelli.
![Page 43: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/43.jpg)
Why this enormous positivity?
If you torture the data long enough, it will confess to anything
Ronald Coase
besides journal editors' requirement for p < .05
![Page 44: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/44.jpg)
Multiple (potential) comparisons
aka
- p-hacking
- data fishing
- data dredging
- multiple testing
- multiplicity
- significance chasing
- significance questing
- selective inference
- etc.
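A rough sketch of why multiple comparisons matter: if every null hypothesis is true and the tests are independent (both simplifying assumptions), the chance of at least one p < α grows quickly with the number of tests. The function name is hypothetical.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """P(at least one p < alpha) across n independent tests of true null hypotheses."""
    return 1 - (1 - alpha) ** n_tests


for n_tests in (1, 5, 20, 100):
    print(f"{n_tests:>3} tests: {familywise_error_rate(n_tests):.3f}")
```

With 20 such tests the probability of at least one "significant" finding is already close to two in three, even though no real effect exists.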
![Page 45: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/45.jpg)
Selective reporting
“Whenever a researcher chooses what to present based on statistical results, valid interpretation of those results is severely compromised if the reader is not informed of the choice and its basis. Researchers should disclose the number of hypotheses explored during the study, all data collection decisions, all statistical analyses conducted, and all p-values computed. Valid scientific conclusions based on p-values and related statistics cannot be drawn without at least knowing how many and which analyses were conducted, and how those analyses (including p-values) were selected for reporting.”
from the ASA statement
![Page 46: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/46.jpg)
The statement: 6 principles
1. P-values can indicate how incompatible the data are with a specified statistical model.
2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
4. Proper inference requires full reporting and transparency.
5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.
from the ASA statement
![Page 47: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/47.jpg)
About effect size
• statistical significance does not imply practical importance
• to understand practical importance we need information on the effect size
• Is the p-value a good measure for effect size?
![Page 48: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/48.jpg)
Dance of the p-values
https://www.youtube.com/watch?v=5OL1RqHrZQ8&t=10s
Credits to Professor Geoff Cumming
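The idea behind the video can be imitated with a small simulation. This is a sketch under assumed settings (true standardised effect 0.5, 32 observations per group, known standard deviation of 1, a two-sample z-test), not a reproduction of Cumming's demonstration.

```python
import math
import random
import statistics


def two_sample_z_p_value(a, b, sd=1.0):
    """Two-sided p-value for a difference in means, treating sd as known."""
    se = sd * math.sqrt(1 / len(a) + 1 / len(b))
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))


rng = random.Random(2016)  # fixed seed so the run is repeatable
p_values = []
for _ in range(20):  # 20 replications of the same hypothetical experiment
    treated = [rng.gauss(0.5, 1.0) for _ in range(32)]
    control = [rng.gauss(0.0, 1.0) for _ in range(32)]
    p_values.append(two_sample_z_p_value(treated, control))

print(" ".join(f"{p:.3f}" for p in sorted(p_values)))
```

Although the true effect is identical in every replication, the p-values vary widely from run to run, which is one reason a single p-value is a poor stand-in for effect size.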
![Page 49: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/49.jpg)
The statement: 6 principles
1. P-values can indicate how incompatible the data are with a specified statistical model.
2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
4. Proper inference requires full reporting and transparency.
5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.
from the ASA Statement
![Page 50: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/50.jpg)
P-values in isolation
“Researchers should recognize that a p-value without context or other evidence provides limited information. For example, a p-value near 0.05 taken by itself offers only weak evidence against the null hypothesis. Likewise, a relatively large p-value does not imply evidence in favour of the null hypothesis; many other hypotheses may be equally or more consistent with the observed data. For these reasons, data analysis should not end with the calculation of a p-value when other approaches are appropriate and feasible.”
from the ASA statement
![Page 51: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/51.jpg)
The statement: 6 principles
1. P-values can indicate how incompatible the data are with a specified statistical model.
2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
4. Proper inference requires full reporting and transparency.
5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.
from the ASA statement
![Page 52: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/52.jpg)
Agreement reached?
“you can believe me that had it been any stronger, then all but one of the statisticians would have resigned.”
“If only the rest could have agreed with me, we would have a much stronger statement.”
from SlideShare, by Stephen Senn: P Values and the art of herding cats (accessed on Oct 30 2016)
Stephen Senn, involved in the ASA statement
![Page 53: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/53.jpg)
From a practical point of view
if you work with p-values (derived from the 6 ASA principles):
1. think carefully about the underlying assumptions
2. avoid statements about the truth of the tested hypothesis
3. avoid strong statements about effect based solely on p < .05 or absence of effect based solely on p > .05
4. report the number and sequence of analyses; avoid data torture
5. avoid statements about effect size based on p-value
6. if feasible, use additional information from other inferential tools
![Page 54: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/54.jpg)
p-value?
why?
what?
p-value alternatives?
![Page 55: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/55.jpg)
Other approaches
• Methods that emphasise estimation rather than testing
  • confidence intervals
  • prediction intervals
  • credible intervals
• Bayesian methods
• Alternative measures of evidence
  • likelihood ratios
  • Bayes factors
• Other approaches
  • Decision-theoretic modelling
  • False discovery rates
From ASA statement
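As one example of estimation rather than testing, the sketch below computes an approximate 95% confidence interval for a proportion. It uses the simple Wald interval and hypothetical data (15 successes in 20 trials); other interval methods may be preferable for small samples.

```python
import math
from statistics import NormalDist


def wald_interval(successes: int, trials: int, level: float = 0.95):
    """Approximate (Wald) confidence interval for a proportion."""
    estimate = successes / trials
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    half_width = z * math.sqrt(estimate * (1 - estimate) / trials)
    return estimate - half_width, estimate + half_width


# Hypothetical data: 15 successes out of 20 trials.
lower, upper = wald_interval(successes=15, trials=20)
print(f"estimate = {15 / 20:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Reporting the estimate together with its interval conveys both the size of the effect and the uncertainty around it, which a p-value alone does not.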
![Page 56: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/56.jpg)
A too short introduction to Bayesian inference
Remember Bayes?
Reverend Thomas Bayes (1702-1761)
![Page 57: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/57.jpg)
Using Bayes theorem
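$$P(\theta \mid D) = \frac{P(D \mid \theta)\,P(\theta)}{P(D)}$$
$$P(\theta \mid D) \propto P(D \mid \theta)\,P(\theta)$$
where P(D|θ) is the “likelihood”, P(θ) is the “prior distribution” and P(θ|D) is the “posterior distribution”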
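The proportionality between posterior and likelihood times prior can be sketched with a grid approximation. The example is hypothetical: θ is a success probability, the prior is uniform on [0, 1], and the data are 15 successes in 20 trials.

```python
def grid_posterior(successes: int, trials: int, n_grid: int = 1001):
    """Posterior for a success probability theta on a grid, with a uniform prior."""
    grid = [i / (n_grid - 1) for i in range(n_grid)]
    prior = [1.0] * n_grid  # uniform prior (an assumption for this sketch)
    likelihood = [t**successes * (1 - t) ** (trials - successes) for t in grid]
    unnormalised = [lik * pri for lik, pri in zip(likelihood, prior)]
    total = sum(unnormalised)  # plays the role of P(D) on the grid
    return grid, [u / total for u in unnormalised]


grid, posterior = grid_posterior(successes=15, trials=20)
posterior_mean = sum(t * w for t, w in zip(grid, posterior))
print(f"posterior mean of theta = {posterior_mean:.3f}")
```

Dividing by the sum of the unnormalised values is what P(D) does in Bayes' theorem: it rescales the product of likelihood and prior so that the posterior sums to one. With a uniform prior the posterior mean is close to 16/22, the mean of the corresponding Beta(16, 6) distribution.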
![Page 58: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/58.jpg)
Rationale for Bayesian inference
the posterior distribution P(θ|D) is “more informative” than the likelihood P(D|θ)
However: “Proponents of the “Bayesian revolution” should be wary of chasing yet another chimera: an apparently universal inference procedure. A better path would be to promote both an understanding of various devices in the “statistical toolbox” and informed judgment to select among these.”
Gigerenzer and Marewski (2015), Surrogate Science: The Idol of a Universal Method for Scientific Inference. Journal of Management
![Page 59: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/59.jpg)
p-value?
why?
what?
p-value alternatives?
some final remarks
![Page 60: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/60.jpg)
The words of the pioneer
No scientific worker has a fixed level of significance at which from year to year, and in all circumstances, he rejects hypotheses; he rather gives his mind to each particular case in the light of his evidence and his ideas.
Ronald Fisher
![Page 61: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/61.jpg)
Many initiatives to improve science…
see: http://www.scienceintransition.nl/english
![Page 62: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/62.jpg)
and reduce waste
~ 85% of all health research is being avoidably “wasted”
see also: http://blogs.bmj.com/bmj/2016/01/14/paul-glasziou-and-iain-chalmers-is-85-of-health-research-really-wasted/
and: Lancet’s 2014 series on increasing value, reducing waste (incl. videos etc.): http://www.thelancet.com/series/research
![Page 63: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/63.jpg)
Conclusion
• statistical inference is inherently difficult; we should avoid making a fool of ourselves too often
• p-values can be useful tools for inference; most often, p-values should not be the ‘star of the inference show’
• bright line rules such as p < .05 give a false sense of scientific objectivity
• like to play around with data? Me too! Think twice before you publish such explorations; if you do, be honest and transparent in reporting
![Page 64: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/64.jpg)
Some random thoughts
• inference is often thought of as a primarily mathematical or computational problem; it should not be.
• we should ban the term “significant” from scientific output for describing effects that are accompanied by p < .05.
• in applied statistics education, we should invest more time in discussing various forms of inference (e.g., Bayesian inference) and their merits and pitfalls
![Page 65: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/65.jpg)
Go to:
pvalue.presenterswall.nl
![Page 66: On p-values](https://reader034.vdocuments.us/reader034/viewer/2022052318/589cdf541a28abf86d8b4d45/html5/thumbnails/66.jpg)
Points for discussion
• is there a need for changing the way we do inference?
• if so, how and what do we change?
  • education?
  • journals?
• should we downplay the role of p < .05 in scientific output?