
  • ARTICLE

    Precognition or Pathological Science? An Analysis of Daryl Bem's Controversial "Feeling the Future" Paper

    BY NICOLAS GAUVRIT

    RENOWNED EXPERIMENTAL PSYCHOLOGIST DARYL BEM recently published an astonishing article in the prestigious Journal of Personality and Social Psychology (an online version is available now: http://dbem.ws/FeelingFuture.pdf) entitled "Feeling the Future: Experimental Evidence for Anomalous Retroactive Influences on Cognition and Affect." [1] In this paper, Bem claims to prove the existence of precognition in various situations through a series of nine experiments. Well before its publication, this article was already being discussed, criticized, or acclaimed in the media (e.g., The New York Times, January 10, 2011), and was even featured on the Colbert Report.

    In most of these experiments, subjects were seated in front of a computer. The screen first showed a picture of two curtains side by side. Subjects were then asked to guess behind which curtain a picture was "hidden". The target pictures to be found were either positive (e.g., smiling people), negative (e.g., car accidents), neutral (e.g., a building), or "erotic" (e.g., man and woman having sexual intercourse). In fact, no picture was hidden. Subjects first selected one side of the screen. The computer then used a random method to determine on which side to display the picture. Bem claims that the subjects knew in advance which side would be randomly chosen, but since this happened after they made their selection it is an example of precognition or backwards time causality.

    The basic idea behind Bem's experiments is to produce reversed forms of classical psychological effects. For instance, a subliminal (or non-subliminal) presentation of a stimulus is known to facilitate the recognition of a subsequent related stimulus, an effect called "priming." In a typical priming task, subjects are told to indicate as quickly as possible whether a series of letters (called a "target") displayed on a screen is an English word or not. For instance, TABLE may appear, and subjects would say it is a word. Unbeknownst to the subjects, a prior subliminal presentation of a word, either "CHAIR" or "APPLE", is briefly displayed on screen. The response time (delay in responding) is significantly shorter when subjects are exposed to "CHAIR" (an item related to the target) than when they are exposed to "APPLE".

    In Experiments 3 and 4, Bem conducted a retroactive version of an affective priming experiment: a target picture is first displayed, for which subjects have to say if the picture is "pleasant" or "unpleasant". A random priming word (either pleasant or unpleasant) appears afterwards on the screen.

    Overall, four psychological effects (priming, facilitation of recognition, habituation, approach-avoidance) are reversed in time in Bem's nine experiments. Except for retroactive induction of boredom (Experiment 7), Bem claims that he found significant statistical data in support of precognition. However, criticisms have already been made concerning Bem's paper, based on both methodological and statistical viewpoints.

    James Alcock's Methodological Critique

    In an article in the March/April 2011 issue of Skeptical Inquirer, James Alcock highlights the extreme fuzziness of Bem's methodology. [2] Bem describes in detail a random generator based on physical processes, known to be more reliable than any pseudo-random software. He is indeed being cautious using such a generator. However, for some unexplained reason, not all his experiments are carried out using it.

    Experiment 1 (detection of erotic stimuli) begins

    54 SKEPTIC MAGAZINE volume 16 number 3 2011

  • with each subject viewing a series of 12 erotic, 12 positive, and 12 negative pictures... but only for the first 40 participants. The last 60 subjects were exposed to 18 erotic pictures and 18 non-erotic but positive pictures. Alcock identifies this and a second methodological irregularity among a long series. These non-conventional methodological techniques are puzzling. For example, Bem needs to know for classification purposes if pictures shown on the screen are positive or negative, and emotionally strong or not. The best way to achieve this goal is to select images from a reliable database, such as the International Affective Picture System. [3] This Bem did, but only for "most of the pictures". We do not have any indication of the proportion of pictures not taken from the databank, nor do we have details of the procedure used to classify these images as being "positive", "neutral", or "negative".

    The question naturally arises from this weird methodology: do we even know for sure that the classification was made before the experiment? If that is not the case, then of course the data would be ad hoc. Bem also claims that his prior hypothesis in Experiment 1 (where subjects have to find on which side of the screen a picture will appear) was that people would find the picture in more than 50% of the cases for erotic pictures. If this was really his hypothesis, why would he need to classify non-erotic pictures as positive, negative, neutral, or non-erotic positive? The methodology strongly suggests that what is presented as an a priori hypothesis is in fact an a posteriori hypothesis, based on the results of the experiment.

    This is not acceptable in research using statistics to prove a general hypothesis, for a reason linked with "risk". Each statistical test (i.e., a method that may allow one to "prove" a general hypothesis) carries a risk, usually 0.05, or 5%. This 5% risk means that (in Bem's case) if there is no such thing as precognition, then the result of a statistical test will lead us to believe the contrary one time in 20. Since we can imagine at least hundreds of tests, choosing the test afterward may induce misleading results.
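    The arithmetic behind this warning can be sketched in a few lines of Python. This is only an illustration: it assumes the tests are independent of one another (which tests run on the same data are not), and the function name is ours, not anything taken from Bem's paper or its critics.

```python
def chance_of_spurious_result(num_tests: int, risk: float = 0.05) -> float:
    """Probability that at least one of `num_tests` independent tests
    rejects the null hypothesis even though the null hypothesis is true."""
    # Each test stays quiet with probability (1 - risk); all of them
    # stay quiet with probability (1 - risk) ** num_tests.
    return 1.0 - (1.0 - risk) ** num_tests


if __name__ == "__main__":
    for num_tests in (1, 6, 20, 100):
        chance = chance_of_spurious_result(num_tests)
        print(f"{num_tests:>3} tests: {chance:.1%} chance of at least one false alarm")
```

    Under that independence assumption, a researcher free to pick among 20 tests after the fact has roughly a 64% chance of finding at least one "significant" result when nothing is there.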

    Wagenmakers et al.'s Critique

    This is, more or less, what Eric-Jan Wagenmakers and his colleagues claim in their paper that appears in the same journal as Bem's article. [4] Wagenmakers et al. argue that what is presented as a confirmatory analysis of some previous psi hypothesis is in fact an exploratory analysis. The distinction may seem meaningless at first glance, but the statistics involved in Bem's paper are fundamentally confirmatory, and cannot be used for exploration.

    Indeed, in an exploratory analysis, one does not know exactly which hypothesis to test. For instance, Bem may have had a feeling that, in some circumstances, people will show precognition. If he does not know before the experiment if this will be for erotic, negative or positive pictures, then he may proceed to a series of tests to accredit whatever situation will appear as the "good" one for the psi phenomenon to show up. But if he does so (and he did so), since he uses several tests, the above argument shows that the results obtained cannot be taken as proving precognition. The usual way to deal with such a situation is to consider any result coming out from this kind of study as "exploratory", i.e., preliminary studies. The right conclusion would then not be "there is precognition with erotic pictures," but "if there is precognition, it is more likely to appear with erotic pictures. Now, one has to build up an experiment to test precognition with erotic pictures..."

    Wagenmakers et al. go even further, arguing that when psi or any other improbable effect is involved, one should take into account the multitude of experimental failures to prove it. This cannot be achieved through classical statistical testing (except through meta-analysis). For that reason, Wagenmakers et al. advocate the use of a method that rests on a totally different basis.

    Their arguments are certainly well-advised. But such a change of paradigm is not necessary to deny the link that Bem sees between his data and conclusions. Actually, as we will see, a classical statistical approach suffices to show that the very data exhibited by Bem do not support the final claim of precognition.

    Multiple Testing

    Classical statistical testing is based on a simple idea. It aims at refuting a so-called "null hypothesis." In Bem's case, the null hypothesis is that there is no such thing as precognition, something Bem wishes to prove false. As in any scientific experiment, we begin by assuming that the null hypothesis is true, and reject it only if the data show statistically significant differences between conditions and we can safely rule out chance as an explanation. That is, the observed experimental result has a probability (the so-called "p-value") of less than 5% under the null hypothesis. This means that if the null hypothesis is true and the effect is nonexistent, there is still a 5% probability that it will be observed anyway, by chance. This classical testing method is bound to fail (if, again, the null hypothesis is actually true) once in every 20 attempts. For this reason, statistical handbooks for researchers recommend: (1) using only one test for each hypothesis; and (2) choosing the test before the experiment begins. This is not what Bem did. For instance, he cites using as many as 6 tests for testing precognition in Experiment 1, and 8 in Experiment 2.

    Exp.  Number of tests mentioned                       Maximum p-value (%)  Best p-value observed (%)
    1     20                                              .25                  1.0
    2     48 (2 groups, 3 types of pictures, 4 tests)     .10                  0.9
    3     16 (2 types of pictures, 4 tests)               .31                  0.7
    4     8 (2 cutoffs, 2 transformations)                .62                  1.4
    5     24 (2 genders, 3 types of pictures, 2 tests)    .21                  1.4
    6     32 (2 genders, 4 types of pictures, 2 tests)    .16                  3.7
    7     16 (4 tests, 2 types of pictures)               .31                  9.6
    8     6 (3 groups)                                    .82                  2.9
    9     6 (3 groups)                                    .82                  0.2*
    All   176                                             .03                  0.2

    Table 1. For each experiment (Exp.), the number of tests mentioned in Bem's article is multiplied by a factor of 2 to correct for unilateral testing, and the corresponding corrected maximum p-value (or risk one should accept) and the best p-value given by Bem are reported. Note that the corrected risk calculated here may well be over-estimated.

    The last line shows the same analysis considering every test in Bem's article as an attempt to prove precognition.

    Correction procedures for multiple testing do exist. Social scientists are normally taught to use such procedures whenever they have to use several tests for the same hypothesis, a situation that should rarely occur. One way of achieving multiple-testing correction is to divide the maximum admissible risk of each test by the number of tests being considered. For instance, considering Experiment 1 from Bem's article, this would imply concluding that there is proof of precognition only if at least one particular test (but possibly all of them) shows a p-value of less than 0.83% (5/6=0.83). No test meets that criterion in Bem's first experiment.
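    The division just described is commonly known as the Bonferroni correction, a name the article itself does not use. A minimal Python sketch of it follows, with function names of our own choosing and p-values expressed in percent as in the article; the 1.0% figure is the best p-value that Table 1 lists for Experiment 1.

```python
def bonferroni_threshold(risk_percent: float, num_tests: int) -> float:
    """Largest p-value (in %) a single test may show once the overall
    risk is shared equally among `num_tests` tests."""
    if num_tests < 1:
        raise ValueError("num_tests must be at least 1")
    return risk_percent / num_tests


def is_significant(p_value_percent: float, risk_percent: float, num_tests: int) -> bool:
    """True if the p-value falls below the corrected threshold."""
    return p_value_percent < bonferroni_threshold(risk_percent, num_tests)


if __name__ == "__main__":
    # Experiment 1: six tests mentioned, overall risk of 5%.
    threshold = bonferroni_threshold(5.0, 6)
    print(f"corrected threshold: {threshold:.2f}%")
    # Best p-value reported for Experiment 1 in Table 1 (1.0%).
    print("significant after correction?", is_significant(1.0, 5.0, 6))
```

    With six tests the threshold comes to about 0.83%, so a best p-value of 1.0% does not pass.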

    However severe this might seem, it is still not enough. The correct number to use is not the number of tests actually mentioned in the article, but the number of tests from which the author would have derived the conclusion. In the case of Bem's first experiment, this is far more than the 6 tests acknowledged by the psychologist. First of all, every test in the paper is made unilaterally (see below). In Experiment 5, for example, Bem conducts a retroactive habituation task: subjects are asked to indicate which of two pictures they prefer, and are afterward exposed to a random picture chosen among the two prior stimuli (for instance, a car accident and a broken leg). In a traditional habituation task the target is first presented several times. This leads either to an increased preference for that same target (simple exposure effect) or to a decreased preference (boredom effect), depending on many factors, including the number of repetitions. Here, Bem says that he is testing for a simple retrograde exposure effect.

    Doing a unilateral test means that had the subjects chosen the target with a probability significantly less than 50%, Bem would not have concluded that this was proof of precognition. But this would, of course, be either an example of negative psi or an example of a boredom effect... thus a proof of precognition in both cases. We therefore have good reasons to believe that such a result would have actually led the psychologist to conclude he has proof of precognition through a retrograde boredom effect, or negative psi. Conducting a unilateral test chosen a posteriori is equivalent to doing two unilateral tests. Since Bem did not have a good reason to believe that a probability significantly less than 50% would not be a proof of precognition, we must think that he actually chose the "right" test after the experiment had been finished. This amounts to doing two tests: one to check if the observed percentage is significantly more than 50%, one to check if it is significantly less.
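    Why a direction chosen after the fact counts as two tests can be checked with an exact binomial calculation. The Python sketch below is illustrative only: the 100 guesses are a hypothetical figure rather than Bem's design, the function names are ours, and subjects are assumed to guess at pure 50/50 chance.

```python
from math import comb


def upper_tail(num_trials: int, hits: int) -> float:
    """P(X >= hits) when X counts successes in `num_trials` fair 50/50 guesses."""
    favourable = sum(comb(num_trials, i) for i in range(hits, num_trials + 1))
    return favourable / 2 ** num_trials


def false_alarm_rates(num_trials: int, risk: float = 0.05) -> tuple[float, float]:
    """Return (rate for one pre-chosen direction, rate when either direction counts)."""
    # Smallest hit count whose upper-tail probability stays within the risk.
    cutoff = next(k for k in range(num_trials + 2) if upper_tail(num_trials, k) <= risk)
    one_direction = upper_tail(num_trials, cutoff)
    # For fair guesses the lower tail mirrors the upper tail, and the two
    # tails cannot both occur, so accepting either direction doubles the rate.
    either_direction = 2 * one_direction
    return one_direction, either_direction


if __name__ == "__main__":
    one, either = false_alarm_rates(100)  # 100 guesses: hypothetical
    print(f"direction fixed in advance: {one:.1%}")
    print(f"direction picked afterward: {either:.1%}")
```

    The first rate stays within the nominal 5%; the second is exactly twice as large, which is why Table 1 doubles the number of tests.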

    In Experiment 1, Bem tests for negative, neutral, positive, romantic but non-erotic, and erotic pictures. This would make 5 tests. But Bem did 2 tests in each case (a so-called t-test, and a binomial test), leading to 10 tests.

    Table 1 shows the number of tests involved for each experiment, the least risk for which at least one test mentioned leads to a "proof of precognition," and the maximum p-value after correction for multiple testing, considering the number of tests Bem would probably have accepted, based on the number of tests actually mentioned.

    Bem claims to have produced 8 proofs of precognition through 9 experiments (he recognizes a lack of conclusive results for only Experiment 7). After a correction procedure for multiple testing, only 1 of the 9 experiments still shows what looks like a significant result.

    Is there, then, one proof of precognition? This is far from sure. Before coming to such a conclusion, one must indeed consider (as Bem himself admits) that Experiments 8 and 9 test the same hypothesis of retroactive facilitation of recall. Should we not, then, stop treating them separately and look at Experiments 8 and 9 together?

    Even more, shouldn't we consider that all nine experiments together test the same precognition hypothesis? Doing so, we face 176 tests attempting to prove precognition, as displayed in the last line of Table 1. The maximum admissible p-value is then .03%, far less than the best value Bem reports (.2%). From this we may reasonably conclude that there is no statistical proof of precognition in Bem's study!
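    The per-experiment and pooled corrections summarized in Table 1 can be re-derived mechanically. The Python sketch below transcribes the test counts and best p-values from Table 1; the function and variable names are ours, and the correction is the simple division of the 5% risk by the number of tests described earlier.

```python
# Per experiment: (number of tests after doubling, best p-value in %),
# transcribed from Table 1.
TABLE_1 = {
    1: (20, 1.0), 2: (48, 0.9), 3: (16, 0.7),
    4: (8, 1.4), 5: (24, 1.4), 6: (32, 3.7),
    7: (16, 9.6), 8: (6, 2.9), 9: (6, 0.2),
}
RISK_PERCENT = 5.0


def corrected_threshold(num_tests: int) -> float:
    """Maximum admissible p-value (in %) when the risk is split over the tests."""
    return RISK_PERCENT / num_tests


def surviving_experiments(table: dict) -> list:
    """Experiments whose best p-value is below their own corrected threshold."""
    return [exp for exp, (num_tests, best) in sorted(table.items())
            if best < corrected_threshold(num_tests)]


def pooled_analysis(table: dict) -> tuple:
    """Treat every test in every experiment as one attempt at the same hypothesis."""
    total_tests = sum(num_tests for num_tests, _ in table.values())
    best_overall = min(best for _, best in table.values())
    threshold = corrected_threshold(total_tests)
    return total_tests, threshold, best_overall < threshold


if __name__ == "__main__":
    print("experiments still significant:", surviving_experiments(TABLE_1))
    total, threshold, significant = pooled_analysis(TABLE_1)
    print(f"{total} tests pooled, threshold {threshold:.2f}%, significant: {significant}")
```

    Taken separately, only Experiment 9 clears its threshold; pooled over all 176 tests, the threshold drops to about .03% and the best reported p-value of .2% no longer passes.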

    Wagenmakers et al. wisely advise distinguishing between exploratory studies and confirmatory studies. A work such as Bem's is the consequence of not following that piece of advice. Being generous, we may say that Bem's research is an investigation of what might be true in various precognition conditions. Since only one situation leads to suggestive results, it is now time to test it, following the golden rule of statisticians: "One hypothesis, one test, chosen prior to the experiment."

    We must admit that the statistical flaw in Bem's article often shows up in social science. Psychologists often publish many tests around the same hypothesis. There is, however, a big difference between psychological science and parapsychology. When a psychologist claims to have a proof of an unexpected strange phenomenon, many other psychologists attempt to replicate the experiment. So do parapsychologists. But when psychologists fail to reproduce a conclusion, they abandon their previous hypothesis (such as the "Mozart effect"), whereas parapsychologists show a tendency to explain the failure by factors other than their previous error. They may, for instance, invoke the presence of a skeptical mind around the laboratory (James Randi has been so accused). Social scientists and parapsychologists share some bad statistical habits, but social scientists usually employ efficient correction methods that parapsychologists do not.

    REFERENCES

    1. Bem, D. 2011. "Feeling the Future: Experimental Evidence for Anomalous Retroactive Influences on Cognition and Affect." Journal of Personality and Social Psychology.

    2. Alcock, J. 2011. "Back from the Future: Parapsychology and the Bem Affair." Skeptical Inquirer, March/April.

    3. Lang, P. J., Greenwald, M. K. 1993. International Affective Picture System Standardization Procedure and Results for Affective Judgments. Gainesville, FL: University of Florida Center for Research in Psychophysiology.

    4. Wagenmakers, E. J., Wetzels, R., Borsboom, D., van der Maas, H. (in press). "Why Psychologists Must Change the Way They Analyze Their Data: The Case of Psi." Journal of Personality and Social Psychology.


  • Copyright of Skeptic is the property of Skeptics Society and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use.