A special edition of The American Statistician, published on 20 March 2019, presents over 40 papers from “forward-looking statisticians” on “Statistical inference in the 21st century: a world beyond P < 0.05” (1).
Statistical significance has come to be treated as an infallible doctrine of scientific research. However, many scientists and statisticians argue that long-held beliefs about statistical significance have, in fact, harmed the scientific community.
In hypothesis testing, the p-value gives the probability or likelihood that the null hypothesis is true and is frequently used as a measure of “statistical significance.” This threshold tends to be too rigidly enforced, often to the detriment of science: statistically significant results are inflated, while non-significant results are downgraded even though they may still provide important, potentially crucial, insights.
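For concreteness, here is a minimal sketch of how a p-value is typically obtained in a simple two-group comparison. The measurements are invented purely for illustration, and the choice of SciPy’s two-sample t-test is an assumption, not something prescribed by the article:

```python
import numpy as np
from scipy import stats

# Invented measurements for two hypothetical groups (illustration only).
control = np.array([5.1, 4.9, 5.3, 5.0, 4.8, 5.2])
treated = np.array([5.4, 5.6, 5.2, 5.7, 5.5, 5.3])

# Two-sample t-test; by convention the resulting p-value is compared
# against a threshold such as 0.05 to declare "statistical significance".
t, p = stats.ttest_ind(treated, control)
print(f"t = {t:.2f}, p = {p:.4f}")
```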
The series of thought-provoking articles makes an interesting read. The overall consensus is that the thoughtless practice of testing statistical significance leads to “hyped claims” while potentially important findings are sometimes dismissed. Clearly, changes in the scientific reporting process are needed to put an end to journals publishing misleading results.
One of the articles in the issue calls for the term “statistically significant” to be completely abandoned, and the authors received the backing of 48 other statisticians (2). The problem is not the use of p-values but rather their misuse and misinterpretation, argues another paper (3). Prof Todd Kuffner and Prof Stephen Walker suggest that informal definitions and loose interpretations are what have led to all this controversy and confusion.
In a separate commentary published on 20 March in Nature, Prof Valentin Amrhein, Prof Sander Greenland, and Prof Blake McShane also argue for an end to the blind adherence to statistical significance (4). Amrhein and his co-authors received endorsements for their commentary from more than 800 signatories, including statisticians, clinical and medical researchers, biologists, and psychologists in more than 50 countries.
They argue that all results should be published and pose an interesting question: “How do statistics so often lead scientists to deny differences that those not educated in statistics can plainly see?”
The experts offer some useful advice on how researchers can avoid “falling prey to these misconceptions” (1):
- Do not conclude there is “no difference” just because a p-value is larger than 0.05 or some other predefined threshold.
- Do not conclude that two studies are conflicting simply because one has a statistically significant result and the other does not (see the simulation sketch after this list).
- Do not believe that an association or effect exists just because it was statistically significant, or that it is absent just because it was not.
- Do not draw conclusions about scientific or practical importance based on statistical significance (or lack thereof).
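To illustrate the second point above, here is a minimal simulation sketch (the effect size, sample size, and use of a two-sample t-test are assumptions chosen for illustration). With a modest true effect and typical statistical power, two identical, flawlessly executed studies will frequently land on opposite sides of the 0.05 threshold by chance alone:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_effect, n, reps = 0.3, 50, 1000  # identical true effect in every study

# Simulate many identical studies: treated group shifted by 0.3, control
# group centred at 0, both with unit variance and n = 50 per arm.
pvals = np.array([
    stats.ttest_ind(rng.normal(true_effect, 1, n), rng.normal(0, 1, n)).pvalue
    for _ in range(reps)
])

power = (pvals < 0.05).mean()
print(f"Share of studies reaching p < 0.05: {power:.0%}")        # roughly 30-40%
print(f"Chance two such studies 'conflict': {2 * power * (1 - power):.0%}")
```

Every simulated study measures exactly the same underlying effect; the apparent “conflict” between a significant and a non-significant result is produced entirely by sampling variation meeting a hard threshold.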
The scientists and statisticians are not suggesting that we abandon statistical significance altogether; it can still be useful in many ways. However, the use of p-values “to decide whether a result refutes or supports a scientific hypothesis” should end.
(1) Wasserstein, R. L., Schirm, A. L., and Lazar, N. A. Moving to a World Beyond “p < 0.05.” The American Statistician (2019). DOI: 10.1080/00031305.2019.1583913
(2) Hurlbert, S. H., Levine, R. A., and Utts, J. Coup de Grâce for a Tough Old Bull: “Statistically Significant” Expires. The American Statistician (2019). DOI: 10.1080/00031305.2018.1543616
(3) Kuffner, T. A. and Walker, S. G. Why are p-values controversial? The American Statistician (2019). DOI: 10.1080/00031305.2016.1277161
(4) Amrhein, V., Greenland, S., and McShane, B. Scientists rise up against statistical significance. Nature (2019). DOI: 10.1038/d41586-019-00857-9
Judging by Google, this article (Dunphy, 20.03.2019) seems to be quite credible. In fact, when I googled “Moving to a World Beyond ‘p < 0.05’” today, it came up as the first hit that was something more than a link to The American Statistician.
As I see it, a main reason for the widespread misuse of p-values is a misunderstanding of what a p-value actually is.
This misunderstanding is present here (Dunphy, 20.03.2019) in one of the first sentences: “In hypothesis testing, the p-value gives the probability or likelihood that the null hypothesis is true and is frequently used as a measure of ‘statistical significance.’”
This topic is mentioned at the beginning of the editorial of The American Statistician (https://doi.org/10.1080/00031305.2019.1583913): “Don’t believe that your p-value gives the probability that chance alone produced the observed association or effect or the probability that your test hypothesis is true.” Also see Wikipedia (https://en.wikipedia.org/wiki/P-value, visited 22.03.2019): “Another concern is that the p-value is often misunderstood as being the probability that the null hypothesis is true.”
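A small simulation makes the distinction concrete (a sketch assuming Python with NumPy and SciPy; the group size and replication count are arbitrary). When the null hypothesis is true by construction, the p-value is approximately uniformly distributed over [0, 1], so a single p-value, however small, is not the probability that the null hypothesis is true:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, reps = 30, 10_000

# Simulate experiments in which the null hypothesis is true by construction:
# both groups are drawn from the very same N(0, 1) distribution.
pvals = np.array([
    stats.ttest_ind(rng.normal(0, 1, n), rng.normal(0, 1, n)).pvalue
    for _ in range(reps)
])

# Under a true null, p is (approximately) uniform: about 5% of experiments
# still yield p < 0.05, and the mean p-value sits near 0.5.
print(f"Fraction of p-values below 0.05: {(pvals < 0.05).mean():.3f}")
print(f"Mean p-value: {pvals.mean():.3f}")
```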
It would help a great deal if all scientists learned and understood what a p-value actually is.