Should Significance Tests Be Banned? Introduction to a Special Section Exploring the Pros and Cons
Author(s): Patrick E. Shrout
Source: Psychological Science, Vol. 8, No. 1 (Jan., 1997), pp. 1-2
Published by: Sage Publications, Inc. on behalf of the Association for Psychological Science
Stable URL: http://www.jstor.org/stable/40062835


SHOULD SIGNIFICANCE TESTS BE BANNED? Introduction to a Special Section Exploring the Pros and Cons

Patrick E. Shrout New York University

PSYCHOLOGICAL SCIENCE

Special Section

Abstract - Significance testing of null hypotheses is the standard epistemological method for advancing scientific knowledge in psychology, even though it has drawbacks and it leads to common inferential mistakes. These mistakes include accepting the null hypothesis when it fails to be rejected, automatically interpreting rejected null hypotheses as theoretically meaningful, and failing to consider the likelihood of Type II errors. Although these mistakes have been discussed repeatedly for decades, there is no evidence that the academic discussion has had an impact. A group of methodologists is proposing a new approach: simply ban significance tests in psychology journals. The impact of a similar ban in public-health and epidemiology journals is reported.

For decades, methodologists and statisticians have talked about the drawbacks of null-hypothesis testing for making inferences in behavioral and medical sciences. These drawbacks include common mistakes such as accepting the null hypothesis when it fails to be rejected, automatically interpreting rejected null hypotheses as theoretically meaningful, and failing to consider the likelihood of Type II errors (Bakan, 1966; Carver, 1978; Cohen, 1994; Henkel & Morrison, 1970; Rothman, 1986; Rozeboom, 1960; Schmidt, 1996). In the December 1996 issue of Current Directions in Psychological Science, this controversy is discussed by Loftus.

Although the problems of significance tests are well known, psychology journals continue to encourage use of such tests and to publish new examples of abuse. Significance testing has become a habit that is difficult to break. Textbooks emphasize testing, professors socialize their students in the use and abuse of these tests, and reviewers ask that authors report more rather than fewer tests.

Many people in our field see significance testing as providing the machinery to convert observations about behavior into apparently objective scientific conclusions. Reviewers and editors use significance testing to help decide which articles are allocated precious pages in the top journals. Purveyors of statistical software know that significance tests are demanded by the market. Alternatives to significance tests have not been in great demand.

To break this impasse, a group of methodologists is asking the American Psychological Association (APA) to consider banning significance tests altogether in its journals. This proposal is currently receiving serious consideration from APA's Board of Scientific Affairs. Some people believe a ban is totally justified on the basis of the shortcomings of significance tests in general. Others believe it would be good intellectual medicine for the field. Others are totally opposed.

In this Special Section, PS presents articles based on a symposium at the 1996 convention of the American Psychological Society. Readers are assured (and some will be dismayed) that statistical significance testing will not be banned during the tenure of the current Editor.

A ban of significance tests would not be completely novel. A decade ago, a virtual ban of significance tests was instituted in the American Journal of Public Health, the top journal in that field. The ban was not discussed in forums such as this, but was announced in revise-and-resubmit letters sent to would-be authors. The editor behind this move was Kenneth Rothman, an epidemiologist at the University of Massachusetts. His instructions to authors have been cited by Fleiss (1986, p. 559). They went something like this:

All references to statistical hypothesis testing and statistical significance should be removed from the paper. I ask that you delete p values as well as comments about statistical significance. If you do not agree with my standards (concerning the inappropriateness of significance tests), you should feel free to argue the point, or simply ignore what you may consider to be my misguided view, by publishing elsewhere. As editor, however, I can hardly be expected to accept papers that violate the scientific principle that I espouse.

Not surprisingly, there was a vigorous exchange about the ban's merits and the way it was implemented (DeRouen, 1987; Fleiss, 1986; Lachenbruch et al., 1987; Poole, 1987; Savitz, 1987; Thompson, 1987; Walker, 1986). One of the complaints was that, under the ban, editors chose to publish some results that had wide confidence intervals - results that would not have been published under traditional significance-testing norms.

Meanwhile, the statistics courses at major schools of public health scrambled to train their students how to publish without significance tests. Confidence intervals were given more time in graduate courses, and significance tests considerably less time. Special symposia on significance testing educated senior faculty as well as the graduate students.

When Rothman stepped down as associate editor after about 2 years, the ban was lifted. However, in the postban period, one sees few examples of significance-test abuse in the American Journal of Public Health. Confidence intervals, although no longer mandatory by editorial fiat, are evident in most articles. New statistical software for epidemiologists has made reporting most confidence intervals as easy as reporting p values.

Does psychology need significance-test shock therapy? Readers can form their own impression as they digest this Special Section.

Address correspondence to Patrick E. Shrout, Department of Psychology, New York University, 6 Washington Pl. #550, New York, NY 10003; e-mail: [email protected].

Copyright © 1997 American Psychological Society

REFERENCES

Bakan, D. (1966). The test of significance in psychological research. Psychological Bulletin, 66, 423-437.

Carver, R.P. (1978). The case against statistical significance testing. Harvard Educational Review, 48, 378-399.

Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997-1003.

DeRouen, T.A. (1987). Comment on statistical testing and confidence intervals. American Journal of Public Health, 77, 237.

Fleiss, J.L. (1986). Significance tests have a role in epidemiologic research: Reactions to A. M. Walker. American Journal of Public Health, 76, 559-560.

Henkel, R.E., & Morrison, D.E. (Eds.). (1970). The significance test controversy. London: Butterworth.

Lachenbruch, P.A., Clark, V.A., Cumberland, W.G., Chang, P.C., Afifi, A.A., Rack, V.F., & Elashoff, R.M. (1987). Comment on statistical testing and confidence intervals. American Journal of Public Health, 77, 237.

Loftus, G.R. (1996). Psychology will be a much better science when we change the way we analyze data. Current Directions in Psychological Science, 5, 161-171.

Poole, C. (1987). Beyond the confidence interval. American Journal of Public Health, 77, 195-199.

Rothman, K.J. (1986). Modern epidemiology. Boston: Little, Brown.

Rozeboom, W.W. (1960). The fallacy of the null-hypothesis significance test. Psychological Bulletin, 57, 416-428.

Savitz, D. (1987). Comment on statistical testing and confidence intervals. American Journal of Public Health, 77, 237-238.

Schmidt, F.L. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for training of researchers. Psychological Methods, 1, 115-129.

Thompson, W.D. (1987). Statistical criteria in the interpretation of epidemiologic data. American Journal of Public Health, 77, 191-194.

Walker, A.M. (1986). Reporting the results of epidemiologic studies. American Journal of Public Health, 76, 556-558.
