
Weak Models, Nil Hypotheses, and Decorative Statistics

IS THERE REALLY NO HOPE?

BARRY O'NEILL
Yale University

Donald Green and Ian Shapiro contend that rational choice models have made negligible contributions to the empirical study of politics. Published tests systematically violate some basic research principles, they say, and they ascribe the problem to modelers' universalist aim of explaining all human behavior. This review critiques some of the authors' principles, which seem to derive from an extreme form of Popperianism combined with norms around null hypothesis testing. Their attribution of universalism is exaggerated; what they are seeing is actually a desire to unify different selected areas, a basic goal in theoretical explanation. One advantage of rational choice models is that they frequently make precise predictions. Models that can do this offer an escape from the uninformative ritual of null hypothesis tests.

Donald P. Green and Ian Shapiro, Pathologies of Rational Choice Theory: A Critique of Applications in Political Science. New Haven, CT: Yale University Press, 1994.

AUTHOR'S NOTE: I would like to thank David Dessler, Lynn Eden, Jonathan Mercer, and Bob Powell for their helpful comments.

JOURNAL OF CONFLICT RESOLUTION, Vol. 39 No. 4, December 1995 731-748 © 1995 Sage Publications, Inc.

According to Donald Green and Ian Shapiro's count, about a third of the articles in the American Political Science Review now use rational choice methods. These authors, a statistical methodologist and a political theorist, contend that the approach has produced essentially nothing of empirical importance. Tests that have claimed success have been either flawed or uninteresting. The problem, in the authors' view, is that rational choice practitioners entertain a universalist aim. They hold the goal of explaining all human behavior and continually see weak or negative empirical results as positive in hopes of keeping this goal alive (p. 6).1

The book starts by proposing a conversation between fields and promising constructive suggestions (p. x), but the rhetorical temperature soon rises, and generalizations start to flow freely. Rational choice theorists do not advocate their views but "trumpet" them (pp. 29, 65); research goals do not get adopted but become "fashionable" (p. 68); the field displays not limits or problems but "pathologies." Quotes from individual authors are portrayed as the general belief. A recurrent metaphor involves armies waging battle, and the authors intend to give up very little ground. Readers who see science as akin to battle may appreciate the book's style, but those looking for balanced criticism from outside the field may feel it is a missed opportunity.

As an applier of game theory, I agree with the authors that better tests are needed, and I would welcome good advice. However, advice is better when it is based on a realistic assessment. The authors themselves set a universalist goal, trying to show that "rational choice theories have contributed virtually nothing to the empirical study of politics" (p. 195). They stress repeatedly that rational choice empirical work is inadequate, but nowhere do they point to other bodies of work in political science as good examples of theory validated by data. The book argues that flaws in testing are characteristic of the method; among the authors' research principles is an emphasis on control groups, so it is odd that they provide none for their own argument.

Better advice also comes from an empathy with the topic of one's critique, a sense of what the other is trying to accomplish. Other critiques of rational choice have appeared from psychological, sociological, and postmodernist viewpoints, and whether one agrees with them or not, they have focused on the theoretical elements of rational choice theory that are intellectually distinctive.2 That is missing here. In the book's last chapter, the reader is offered several explanations of the method's popularity: that there are no alternative theories; that it is favored by ideological biases that "link rational choice to a particular politics" (evidently conservative, according to examples on pages 11-12); that outsiders who control academic resources are easily impressed by its formal techniques; and that the scarcity of data in some political science fields allows it to move in (p. 195). The authors also suggest that each social science discipline using rational choice does so from a belief that the method has succeeded in the other disciplines (pp. 179-81). Each of these explanations sees allegiance to the approach as basically nonintellectual. I find the method attractive because it combines beliefs and goals in a simple and precise way; reasoning to achieve our goals is, after all, a large part of what makes us human. How far it can succeed empirically is a discussible issue, but a critique should be open to recognizing its distinctive positive side.

1. Page numbers in parentheses refer to pages in Green and Shapiro's (1994) Pathologies of Rational Choice Theory.

2. Many critiques speak to the intellectual content of rational choice theory and only a sample can be cited, but new contributions are mentioned regularly in the pages of the journals Economics and Philosophy and Rationality and Society. Denzin (1990), a postmodernist sociologist, argues with its treatment of emotion, and England and Kilbourne (1990) give a feminist critique of its concept of self. In international relations, the notion of military deterrence as a weighing of costs and benefits was attacked from the viewpoint of decision and organizational psychology (Jervis, Lebow, and Snyder 1985). McCloskey's analysis of the metaphor and rhetoric of economic analysis is outstanding for its wit and knowledge of the subject (McCloskey 1985; Klamer, McCloskey, and Solow 1988; Nelson 1992).

In the second chapter, the authors discuss the basics of rational choice, but their description contains a number of errors and missed points. A central concept of rational choice theory is the Nash equilibrium, and their formal definition starts by specifying this as an agreement among the players (p. 25). In fact, a Nash equilibrium is generally not an agreement. In a prisoner's dilemma, for example, mutual defection is an equilibrium with no agreement involved. The authors state that at an equilibrium no coalition can unilaterally improve the welfare of its members. In fact, Nash equilibria are not stable against coalitions. Their error stems from confounding an equilibrium with the core, a different concept from a different branch of game theory. Another central rational choice idea is utility. Von Neumann and Morgenstern's treatment of it was revolutionary, initiating what has become known as "modern" utility theory, but the authors give the reader no understanding of what von Neumann and Morgenstern discovered (p. 15). Rational choice models, they also say, generally assume that tastes are similar across the players (p. 17). This is not true in my field of international relations, or in the studies of voting, or in numerous other areas. In their preface (pp. ix-x), Green and Shapiro separate themselves from other skeptics: "Critics tend to ignore or heap scorn on the rational choice approach without understanding it fully. . . . [T]hey get it wrong in elementary ways. . . . None of the fundamental contentions of rational choice theory is inherently difficult to understand." I would say that the fundamentals are easy to misunderstand. They may not be complex, but they are subtle, and this can engender overconfidence and these kinds of errors.
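The prisoner's dilemma point can be checked mechanically. Here is a minimal Python sketch that enumerates the pure-strategy Nash equilibria of a 2x2 game; the payoff numbers are illustrative assumptions, not taken from the book or from this review:

```python
# Minimal sketch: find pure-strategy Nash equilibria of a prisoner's dilemma.
# The payoff numbers are illustrative assumptions.
from itertools import product

# payoffs[(row_move, col_move)] = (row_payoff, col_payoff)
payoffs = {
    ("C", "C"): (3, 3),   # mutual cooperation
    ("C", "D"): (0, 5),   # row cooperates, column defects
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),   # mutual defection
}
moves = ["C", "D"]

def is_nash(row, col):
    # A profile is a Nash equilibrium if neither player gains by a
    # unilateral deviation; no agreement between the players is involved.
    row_pay, col_pay = payoffs[(row, col)]
    best_row = max(payoffs[(m, col)][0] for m in moves)
    best_col = max(payoffs[(row, m)][1] for m in moves)
    return row_pay >= best_row and col_pay >= best_col

print([p for p in product(moves, moves) if is_nash(*p)])  # [('D', 'D')]
```

The check confirms the point in the text: mutual defection survives all unilateral deviations and so is an equilibrium, even though both players deviating together would do better, which is also why equilibria need not be stable against coalitions.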

After summarizing the basics of rational choice, Pathologies of Rational Choice Theory lists the characteristic methodological failings of the empirical work. The authors then illustrate the failings using four literatures: collective action, voter turnout, the paradox of voting, and spatial models of electoral competition. A final chapter refutes some expected criticisms. I will focus on the "methodological pathologies" section, which develops the "basic requirements of sound empirical research" (p. 33) violated by rational choice tests.3

3. Many other aspects of this book are worth discussion. For some of them, see Critical Review (1995).

THE ORDER OF BATTLE

The philosophy behind many of the authors' principles seems to be that of Karl Popper. (Their adherence to his views, or those of his student Imre Lakatos, becomes explicit in the last chapter [p. 188].) Popper is widely known, but his theories are by no means dominant among philosophers. Many see his importance more in the questions he raised than in the answers he provided. He formulated his doctrines in the 1930s and was resistant to change in spite of problems raised by other philosophers, including his followers. Critical assessments usually see him as erring on the narrow side, being too restrictive about what can count as proper science. He banished not only psychoanalysis and Marxism from the kingdom but also the theory of biological evolution: it was "not a testable theory" (Popper 1974a, 134) and "almost tautological" (Popper 1972, 241). Later, he partially recanted (Popper 1978), but creationists still quote him in their campaigns against teaching evolution as science in public schools (Numbers 1992, 246, 411; Awtry 1994).

Other principles endorsed by the authors are drawn from the approach of null hypothesis testing. Current methods are an amalgam of ideas from Neyman, Pearson, and Fisher and various textbook writers who interpreted them (Gigerenzer 1993). They do not have a clear philosophical basis and, in important ways, are inconsistent with Popperian principles.

Explaining one's research philosophy by critiquing an existing literature, as the authors do, has advantages and problems. The meaning is clear from the immediate examples, but less time can be spent justifying one's ideas or showing that they are mutually compatible. Some of the authors' principles of research are not mutually compatible, and many are not sound.

The principles are as follows. (They appear in order in chapter 3 as section topics and within sections; the wording and numbering are my own.)

1. Don't theorize post hoc; don't test your theory with data you used to construct it (pp. 34-6).
2. Try to construct alternative explanations for the data, and compare them with your theory (p. 37).
3. Specify a null hypothesis clearly, and choose one that is credible (p. 37).
4. Don't make slippery or vague predictions (p. 39).
5. Avoid too many theoretical entities relative to the amount of data (p. 40).
6. Avoid a prediction of the form that the data will be "close" to some point value, because this cannot be evaluated statistically (p. 41).
7. Try to falsify your theory; don't just assemble confirming instances (pp. 42-3).
8. In lab tests, include control groups (p. 43).
9. Don't bias your interpretation of the evidence to fit your theory (pp. 43-4).
10. Don't put arbitrary restrictions on the domain of validity you claim for your theory (pp. 44-6).

These sound convincing. If they are actually valid, one would expect to see them in the best research, but some celebrated examples seem to ignore them. A good case is Tversky and Kahneman's work on the heuristics of probability judgment and prospect theory (Tversky and Kahneman 1974; Kahneman and Tversky 1979; and the supporting articles they cite). It certainly qualifies as acclaimed research; the first article has become one of the most referenced behavioral science pieces ever, and the second has been applied in negotiation, marketing, crisis decision making, and elsewhere. Green and Shapiro themselves cite it as evidence against rational choice.

As principle 1, Green and Shapiro state that "data that inspire a theory . . . can't properly be used to test it" (p. 35). Tversky and Kahneman's (1974) probability article recounts a list of results that support their theory, but many were previously known. As to the rest, they make no promise that they devised their theory first and only then examined the data. It seems likely that they followed the usual psychological procedure of letting theory and experimental design mutually influence each other in a sequential process. Principle 2 calls for comparing one's explanation with alternatives, but Tversky and Kahneman suggest no explanations other than their favored one for the violations of the standard theories of probability judgment or choice. Principle 3 calls for credible null hypotheses, but many of their null hypotheses are decidedly not credible (e.g., that the phrasing of a question makes no difference at all to subjects' responses, a notion long known to be false). Contrary to principle 4, some of their hypotheses are quite vague: a typical one predicts that greater mental availability of an event's instances will cause its judged probability to increase but says nothing about how much.

Regarding principle 7 on falsification, there is no suggestion that Kahneman and Tversky chose experiments aimed at falsifying their theories. By all appearances, their practice was the normal one of designing experiments that they expected to disconfirm the standard theory and support their own. Most experiments they report used no control groups, contrary to principle 8, and it is often not clear what a control group would mean in their context. As principle 10, Green and Shapiro interpret nonarbitrary domain restrictions to mean "specifying the relevant domain conditions in advance by specifying the limiting conditions" (p. 45). In other words, "For the domain restriction to be adequate, the relevant domain must be specified independently of whether the theory explains the phenomenon within it" (pp. 45-6). Tversky and Kahneman put no clear restrictions on their theories' domains. Some heuristics were found to give opposite predictions for the same context (Einhorn and Hogarth 1988, 130-1). When some were found to be wrong in some circumstances, the conclusion was not that their theories had been falsified but that further work should be done to determine their limits (e.g., Olson 1976).

PREDICTING OLD EVIDENCE

My aim is not to cast doubt on Kahneman and Tversky's work. It is some of Green and Shapiro's "basic requirements of sound empirical research" (p. 33) that I want to question. A closer look reveals their problems. Principle 1, that data used to design a theory cannot confirm it, is the authors' frequent argument against rational choice work. Popper (e.g., 1965, 241-2) stressed the general idea, as have many working scientists (e.g., Keohane, King, and Verba 1994, 46). However, Rosenkrantz (1983, 83) termed it "an old chestnut . . . [Scientists] never fail to pay lip service to it and never fail to disregard it in practice." Historical studies have continually found important theories accepted mainly because they fit the known evidence (Hofmann 1988; Zandvoort 1988b, 1988a, 150; Worrall 1988; Brush 1989; Scerri 1994). Brush (1989) studied the corroboration of general relativity theory by spectral redshifts and the advance of the perihelion of Mercury's orbit (old evidence at the time of Einstein's theory), as well as the degree of bending of light from a distant star as it passed the sun (a measurement that had not yet been made). This is an especially telling case because the news that light bending fit Einstein's prediction influenced young Popper's views on corroboration (1965, 117, 339-40). However, Brush found that although the popular press stressed the new evidence, physicists themselves, including Einstein, gave more weight to the old. It made little difference to them that the advance of Mercury's orbit was a postdiction instead of a forecast, and the nondistinction in general is reflected in the language of physicists, who often talk of a theory "predicting" an already known fact.

Old evidence should count for a number of reasons. If it has repeatedly resisted explanation, a new theory that explains it deserves the credit. If scientific norms did not stress existing evidence, one would face a greater epistemic mess, with scientists looking mainly to future data and letting unexplained oddities pile up. Also, the case that influenced Popper was unusual. Apart from his theory, Einstein could not have known just how much starlight bent before Arthur Eddington led the 1919 expedition to Principe to take measurements during the solar eclipse. In many fields, completely novel evidence is the exception. A psychologist usually has a fair idea how the next experiment will come out, based on pilot runs or past experiments that were roughly similar. The data will not be entirely new or entirely old.

If it seems puzzling that a theory can be bolstered by the same evidence that inspired it, the contrary thesis would also be odd. A theory's acceptability should depend on the information available, not on the time it arrived. To assess a theory, one should look at the objective facts, not what was in the mind of the scientist who thought it up. Someone who insists on new evidence may be thinking that theorizing is mainly curve fitting. The argument that one can always draw a curve through known points is the main one given by the authors (pp. 34-5). However, a good fit is only one desideratum. A body of theory is more than a list of true general facts. A scientific theory should take diverse phenomena and unify them under a simple structure. The kinetic theory of gases, for example, elegantly unifies laws concerning how gases expand when heated, contract under pressure, and diffuse into one another, phenomena that are superficially quite separate. A unifying structure yields understanding, as opposed to mere factual knowledge. To build one that connects dissimilar phenomena, that is simple and plausible given our background knowledge, is by no means as easy as curve fitting. The challenge is to account not for novel evidence but for diverse evidence in a simple way.

FALSIFY, BUT VERIFY

Principle 7 (falsify, don't verify) is the element of Popper's theory that most spread his name among scientists. Seemingly, confirming instances should count; finding the bones of a humanoid ape should support the theory of evolution. However, Popper viewed scientific laws as different in form from "the missing link exists." They are not existential statements but universal ones like "all X are Y." No finite data can fully verify them, but they can be falsified; a single contrary instance will do. Verification is patting oneself on the back, but falsification makes scientific progress.

However, if one views data as justifying a certain degree of belief in a hypothesis, as telling us more than "not yet falsified," the verify/falsify distinction starts to dissolve.4 Failing to falsify a hypothesis ipso facto increases its credibility (Jeffrey 1975). Suppose one's hypothesis H implies various observable events. One intends to check whether one of these, label it E, is true or false and will revise one's probability for H accordingly. If E turns out false, the new probability for H is P(H | -E) = 0 (read, "the probability of H given not-E"), but if E is true, it becomes P(H | E) = P(H)/P(E). Because H implies E, P(E) must be at least as large as P(H), but the closer P(E) is to this minimum, the more a finding that E is true will increase H's credibility, according to this ratio formula. So to confirm the law maximally, one should choose a prediction E with the lowest P(E). The value P(E) is E's probability not assuming the hypothesis, so trying to confirm the hypothesis means testing a prediction that is highly unlikely without one's hypothesis. If one is trying to falsify the hypothesis, one should choose a test that maximizes one's chance of ending up with P(H | -E) = 0. This chance is 1 - P(E), so again, one looks for an E with minimum P(E). Verifying and falsifying call for the same strategy; therefore, stressing one over the other is pointless. Good science means checking one's theory.

4. Popper, as well as Fisher, would have objected to assigning a probability value to H, in practice and in principle. Popper's anti-inductivism and falsificationism were closely tied, the two prongs in his attack on logical positivism. The alternative position taken here is that P(H) makes sense for general hypotheses as long as one views them as approximations, not exact statements. Scientists themselves constantly talk about the credibility or likelihood of their hypotheses; in Gigerenzer's (1993) only partly jocular metaphor, their Bayesian id is asserting itself over their Fisherian (or Popperian) superego.
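To make the ratio formula concrete, here is a small Python sketch; the prior P(H) and the candidate values of P(E) are invented for illustration:

```python
# If H implies E, then P(H | E) = P(H) / P(E): the less likely the
# prediction E is without the hypothesis, the stronger the confirmation.
p_h = 0.10  # assumed prior probability of the hypothesis (illustrative)

for p_e in (0.90, 0.50, 0.12):  # P(E) can be no smaller than P(H)
    posterior = p_h / p_e
    print(f"P(E)={p_e:.2f} -> P(H|E)={posterior:.3f}, "
          f"chance of falsification 1-P(E)={1 - p_e:.2f}")
```

A risky prediction (P(E) = .12) raises the hypothesis's credibility from .10 to about .83 if it succeeds, and it also carries the largest chance, .88, of falsifying the hypothesis outright, which is the point: verifying and falsifying recommend the same test.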

Green and Shapiro are possibly suggesting that writers have deliberately avoided reporting their negative evidence (p. 43). If so, that is another argument, but it is not clear how they would substantiate it. They also criticize some rational choice writers' presentation of confirmatory instances on the grounds that these results were only to be expected. This, too, is a different argument: that the E chosen was too likely. Assuming that Green and Shapiro's judgment is accurate in the particular case, such a test would indeed add less weight, but the authors want to circumvent case-by-case judgments and institute a general ban. They criticize the "pathology [which] leads researchers to dwell on instances of successful prediction" (p. 43). This is not a pathology. Confirming instances can be quite informative and should not be dismissed.

AUXILIARY HYPOTHESES

Popper's stress on falsification was excessive for another reason. The more interesting scientific theories cannot be fully falsified, just as they cannot be fully confirmed. The assumption P(H | -E) = 0 above was unrealistic. As many philosophers have pointed out, no observable evidence could falsify Newton's theory of gravitation. To yield an observable prediction, the theory of gravitation must be supplemented with other propositions called auxiliary hypotheses or other names, depending on one's philosophical roots. Some auxiliary hypotheses are correspondence rules linking theoretical and observational concepts, stating, for example, which bodies have mass; some are provisos, stating the absence of other forces beyond those specified; some state initial conditions; some involve assumptions about measurement methods; some are even other major theories, like the laws of motion. This whole ensemble is testable, but gravitational theory by itself is not. Popper (1974b, 994) recognized the need for auxiliary hypotheses but de-emphasized them, sometimes trying to interpret them as simply initial conditions or trying to limit their choice through a rule prohibiting "ad hocness." The authors accept Popper's rule (p. 45 n. 11) and use it to dismiss some rational choice research, but other philosophers of science have concluded that it does not work (Grunbaum 1976, and references). Grunbaum notes Hempel's conclusion that there is no precise criterion for impermissible ad hocness, that one must wait and watch as the theory develops.

Just as Newton's law of gravity requires a specification of what counts as mass, a game or decision model must state what the actors' goals and beliefs are. A few modelers have wanted to admit only self-interested, materialist, and hedonistic considerations, but the majority allow for goals such as duty, altruism, or the motives generated by various emotions. Is the postulate that utilities are strictly self-centered innate to the theory? Or is it a correspondence rule, to be used in some applications but not in others? This is a semantic question; one just has to decide what one means by rational choice theory. The book never states which it will take as its working definition, and it vacillates between the narrow and the broad meaning. In the expository chapter, it seems to admit broader utilities (p. 14), but later, it pronounces the literature on voter turnout and collective action to be rational choice failures because researchers appealed to these wider motives (p. 44). As one who believes that people's utilities involve the broadest range of goals, I see these as really failures of the narrow correspondence rule in these contexts, a rule that seemed implausible from the start.

An important scientific activity, on a par with doing empirical tests, is to take a theory and take known data and look for plausible correspondence rules that make the theory consistent with the data. When rational choice theorists do this regarding the content of the actors' utility functions, the authors see it as cheating, a "vexing" attempt to rescue a theory from a decisive refutation (p. 37). Theorists who do this "exploit the ambiguity in the meaning of rationality . . . one must question whether the succession of theories is susceptible to empirical evaluation in any meaningful sense" (p. 36). In fact, scientific theories are typically not susceptible to empirical evaluation unless conjoined with further auxiliary hypotheses. If no plausible ones are found, the theoretical structure will be dropped, but the process is regular and necessary for science (Putnam 1974).

WEAK MODELS, NIL HYPOTHESES, DECORATIVE STATISTICS

Principle 4 calls for nonslippery, nonvague predictions. On the face of it, rational choice models should escape this charge because of their mathematical nature. They are strong models (Meehl 1967) in the sense that if their parameters can be measured, they predict the exact value of a variable. Weak models are those that give purely comparative arguments (e.g., that factors are associated to some unspecified degree). Weak models have dominated the journals, and the precision of formal hypotheses should be a major advance. To paraphrase Tukey (1969), how useful would engineers find Hooke's law of elasticity if it read, "when you pull on it, it gets longer"?

The book sees only the downside. A point prediction cannot be exactly right, so the researcher feels justified in taking data that come close to the prediction as confirming evidence. But what, the authors ask, counts as close? Things get murky, statistical methods no longer give a dichotomous verdict, and rational choice theory survives. Admitting the notion of proximity permits rational choice theorists to see a failed test as successful.

The standard tests in the journals do not deal with proximity. They try to falsify a null hypothesis, which is the one that the researcher's theory does not predict. The null hypothesis is always strong (i.e., precise), stating, for example, that a certain association is zero. Under this assumption, one can derive the distribution of the test statistic, calculate the significance level, and possibly reject the null hypothesis. The alternative, or substantive, hypothesis represents the one supported by the researcher, the one that follows from the researcher's theory. It cannot be tested because it is weak (i.e., vague), typically saying that the parameter is greater than, less than, or simply different from zero. Rejecting the null hypothesis is taken to justify a continuation in entertaining the substantive hypothesis. This is the Fisherian approach, currently prevalent in political science.

If the researcher's model were strong, the situation would be reversed. Then the model itself would predict an exact value of a parameter and become testable. Researchers would be rejecting or failing to reject their substantive hypotheses.

Which is the right procedure: weak models and null hypothesis tests, or strong models and substantive hypothesis tests? In principles 3 and 6, the authors come down for the Fisherian status quo. Strong models require the notion of proximity, and Green and Shapiro express doubt that any "rigorous test" is possible for approximations to point predictions (p. 41). As a scientific principle, they call for testing a null hypothesis alternative to one's own theory and, in fact, perform one of their own in the book (p. 53). Popper would disagree. His message is to try to falsify one's own theory. The authors are tapping Popper's philosophy in principle 7 but are contradicting their previous Fisherian maxims. Trying to falsify one's theory means trying to reject the substantive, not the null, hypothesis.

Here, I would side with Popper. Statistical inference5 in political science seems trapped in a loop involving weak models, null hypotheses, and irrelevant statistics.6 Rejecting null hypotheses has great appeal because it is easy to do, but having done it, one learns almost nothing. Inferential statistics were meant to guide research by measuring the strength of evidence for various explanations, but in much current political science, their role is often decorative. Many researchers present their p values in tables and include the appropriate number of asterisks, but having rejected an exact zero null hypothesis, they often make what they will of the data, unguided and unchecked by inferential statistics. I surveyed the 1994 issues of the American Political Science Review, looking at those articles that were not rational choice but that used inferential statistics. I found about 20. Almost all the tests in the sample were null hypothesis tests of associations: either correlations, beta values, or associations in contingency tables. In principle, one could posit a null hypothesis that predicted a correlation of .1, but all values were set at zero, what Cohen (1994) pejoratively calls nil hypotheses. The writers gave no reason to expect that, their theory aside, the associations should be zero. Relevant to principle 3, Green and Shapiro state, "we should accord explanatory power to rational choice theories in proportion to the credibility of the null hypotheses over which they triumph" (p. 37). These null hypotheses were chosen not for their credibility based on some evidence or theory but because of some attraction of zero. Examples, pronounced but not too atypical, were a claim that one's party affiliation has no correlation with the candidate for whom one votes or that the degree of military control is uncorrelated with the degree of democracy.

5. By the term statistical inference, I mean the use of statistics to measure degree of evidence, rather than for estimation or description.

6. Many methodologists in psychology have discussed the problems of null hypothesis testing over the years (Morrison and Henkel 1970; Greenwald 1975; Cohen 1990, 1994; Gigerenzer 1993; and their references). These sources make interesting and important reading, although they have not had much effect.

In fact, the null hypotheses are never credible. Any statement of an exactly zero relation between two variables is surely wrong, because some association will be there, given enough data. Meehl (1990) made this point by cross-tabulating 15 routine items for 57,000 Minnesota high school students, including father's occupation, mother's education, birth order, occupational goal, and so forth. The 15 items yielded 105 chi-squares, and all 105 were significant at the p < .05 level; 101 were significant at p < .000001. Associations were tiny but significant nonetheless. When Green and Shapiro call for credible null hypotheses, it is not clear to me what examples they might offer.
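Meehl's point is easy to reproduce by simulation. The sketch below, in Python, assumes a tiny true correlation of .03 (an invented figure) and shows the nil hypothesis losing ever more decisively as the sample grows:

```python
# Simulate a trivially small but nonzero correlation and test the nil
# hypothesis at increasing sample sizes. The effect size .03 is an assumption.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
true_r = 0.03

for n in (100, 10_000, 1_000_000):
    x = rng.standard_normal(n)
    y = true_r * x + np.sqrt(1 - true_r**2) * rng.standard_normal(n)
    r, p = pearsonr(x, y)
    print(f"n={n:>9,}  r={r:+.4f}  p={p:.2g}")
```

At small samples the association typically goes undetected; at a million cases the p value collapses to astronomical levels, although the association is exactly as unimportant as before. Rejection reflects sample size, not substance.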

Popper would frown also on some of the consequences of relying on null hypothesis tests. A more powerful microscope puts biological theories to a more exacting test. However, when statistical tests become more powerful, they become more likely to reject the null hypothesis, and the substantive theory fares better (Meehl 1967). The closer one looks, the easier it is for one's theories to pass inspection. No wonder, because one is staring at the null hypothesis.

A statistical test requires some theory of the variability in the data. In many game theory models, the distribution of the test statistic arises directly from the model. In a test I ran of the minimax theory of zerosum games (O'Neill 1987), for example, the theory itself yielded the distribution for the moves observed in a large sample of plays of the game. In the articles perused in the political science journal, a statistical theory was typically tacked on without discussion or examination: the existence of a numerical scale was enough to justify an assumption of normality. With two numerical scales, the writer was justified in assuming a linear relationship (unless one scale was a probability, in which case the relationship was surely logistic). One can interpret this in the weak/strong framework: weak models must be strengthened to allow statistical tests, but there is no theoretical guide for just how to do it. After these statistical-testing hypotheses were silently appended to the "theoretical" null hypothesis of no association, the set was tested jointly. If the test rejected the set, there was no way to know which one was wrong, but writers regularly talked as though they had disproven the theoretical hypothesis.

Assuming that the statistical auxiliary hypotheses were known to be right and assuming that one were truly interested in whether a parameter is exactly zero, does rejecting the null hypothesis really show that it is probably false? Unfortunately not. The basic logic is the following:


Given that the null hypothesis is correct, certain data will probably not occur.
The data have occurred.
Therefore, the null hypothesis is probably not correct.

The argument is as fallacious as this one (Pollard and Richardson 1987):

Given that a certain person is an American, that person is probably not a member of Congress.
The person is a member of Congress.
Therefore, the person is probably not an American.

These arguments would be sound only if the instances of probably were deleted. The source of the problem is that instead of calculating the probability of interest, that of the hypothesis given the data, the test calculates the inverse, the probability of the data given the hypothesis. The latter is reported as the significance level. Writers often talk as though the two were equal, P(H | D) = P(D | H), but this is valid only if the base rates are equal (in this case, only if Americans appear as frequently in the sampling as members of Congress). To assume that a rejection of a null hypothesis means, per se, that it is unlikely is to ignore the base rates, a probability bias analyzed by Tversky and Kahneman (1974).
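Rough numbers make the gap between the two conditional probabilities vivid; the population figure below is an assumption used only for illustration:

```python
# P(data | hypothesis) versus P(hypothesis | data) with unequal base rates.
americans = 260_000_000  # assumed mid-1990s U.S. population (rough figure)
congress = 535           # members of Congress, all assumed American here

p_congress_given_american = congress / americans  # ~ 2e-06: the "significance level"
p_american_given_congress = 1.0                   # the probability of actual interest

print(f"P(in Congress | American) = {p_congress_given_american:.1e}")
print(f"P(American | in Congress) = {p_american_given_congress:.1f}")
```

The analogue of the significance test reports the first number and invites the reader to treat it as the second; with base rates this lopsided, the two differ by a factor of about half a million.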

Given these limits of null hypothesis tests, current inferential statistics methods do not tell what one wants to know. The results of using proper procedures are often quite different (Berger and Delampady 1987). Green and Shapiro's implied argument against precise hypotheses has the tone that theorizing should accommodate itself to current methodology, but this attitude reverses the true priorities: statistical methods exist for testing theories. In any case, their pessimism about testing precise substantive hypotheses overlooks existing techniques. As well as those reviewed by Berger and Delampady, Serlin and Lapsley's (1993) good-enough principle gives another approach to dealing with proximities.

Null hypothesis testing is also prevalent in current rational choice research. This is unfortunate, but at least the use of strong models offers hope for a way out. To cite an example, my laboratory experiment on a zerosum game used a matrix that was unique in that it allowed testing with only weak auxiliary hypotheses about utilities (O'Neill 1987, 1991). The crucial predictions were that each of two player-types would choose a certain move with frequency .400. The observed values were .362 and .426. If one assumes a uniform prior distribution on these two frequencies, it follows that, given the data, with probability 95% the true parameters lie within .05 and .04 of the predictions. The theory makes a nonobvious prediction of a point value, and this prediction could be evaluated with inferential statistics. The strength of rational choice models in general is their strong predictions. If one wants to assess their validity, one must combine them with statistical methods that test these predictions rather than null hypotheses.
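A sketch of the kind of calculation described: with a uniform prior and binomial data, the posterior on a move frequency is a Beta distribution. Only the frequencies .362 and .426 and the predicted .400 come from the text; the sample size is a hypothetical placeholder, so the interval widths here will not match the published ones:

```python
# Uniform prior Beta(1,1) + k successes in n plays -> posterior Beta(k+1, n-k+1).
# n is an assumed placeholder; the published test used the actual counts.
from scipy.stats import beta

n = 1000  # hypothetical number of observed plays per player type
for observed in (0.362, 0.426):  # observed frequencies reported in the text
    k = round(observed * n)
    posterior = beta(k + 1, n - k + 1)
    lo, hi = posterior.ppf([0.025, 0.975])
    dist = max(abs(0.400 - lo), abs(hi - 0.400))
    print(f"observed {observed:.3f}: 95% interval ({lo:.3f}, {hi:.3f}), "
          f"max distance from the predicted .400 = {dist:.3f}")
```

The substantive prediction, the point value .400, is what gets evaluated: one asks how far the credible region sits from the predicted value, rather than whether some parameter differs from zero.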

UNIVERSALISM OR UNIFICATION

Rational choice's original sin is seen to be universalism, trying to explain all human behavior (pp. 23-4). This is not correct. Many political science game and decision modelers view their work as drawing from and flowing into the rest of social science. The authors present universalist quotes by some researchers, but these are given little context and are assigned an interpretation in the book that is contradicted by a more complete reading of the source's work. Riker (1990) had the zeal of a pioneer, and his optimism for the theory put him at an extreme position, but even so, where Riker wrote that "the major achievements in social science" spring from microeconomics (p. 177), the authors paraphrase this passage as saying the "only genuine advances ever to occur" are from rational choice theory (p. 2, emphasis added).

Green and Shapiro allow that many rational choice theorists believe in "partial universalism" or "segmented universalism" (p. 26). They define the first as a position that tries to explain part of every domain of human behavior. Because the authors do not specify what counts as different domains, the application of partial universalism is left indeterminate. It is surely an oxymoron, and when they speak of the "rational choice penchant for holding onto some form of universalism, no matter how qualified" (p. 30), I begin to perceive a penchant of their own for attributing universalism to rational choice theorists, no matter how far the word has to be stretched. They must maintain this position, however. Universalism, in their view, is what steers rational choice wrong. Without it, their book becomes a general critique of political science empirical research, with an "arbitrary domain restriction" to rational choice.

Given the earlier discussion of correspondence rules, universalism is a pointless position. Interesting theories become testable only when supplemented with other theories, as Newton's law of gravitation required the laws of motion. Unless one holds that the only goals of humanity are money and pleasure and that people's beliefs in any situation are somehow obvious, one must supplement rational choice postulates with other social science theories. A model would not be based on game theory rather than, for example, cognitive psychology. It would be based on game theory and cognitive psychology. As a recommendation to the field, the authors stress "the importance of keeping rational choice explanations analytically distinct from other accounts" (p. 97). I cannot imagine any reason for imposing such a limit on ourselves.

What the authors perceive as universalism is really, in my view, a desire for unification. Unification is an essential part of the explanation of general regularities. The aim of scientific, theoretical explanation is to bring diverse regularities under a common structure. The result is systematization and understanding. The kinetic theory of gases was the example given earlier, but the practice is the same in the social and behavioral sciences. Tversky and Kahneman's (1974) representativeness heuristic offers a unified explanation of the gambler's fallacy, errors concerning regression, the tendency to ignore base rates, and the tendency to draw overly strong conclusions from small data sets, the so-called law of small numbers. Their work sparked more excitement because their notion of heuristics brought probability judgments under the umbrella of perceptual psychology, including perceptual illusions.

Green and Shapiro offer rational choice theorists two alternative concepts of explanation to cite in justifying their work (pp. 30-2, 188). One is Milton Friedman's (1953) instrumentalist position, and the other is Hempel and Oppenheim's (1948) covering-law model. The correct answer is neither of the above. Friedman's idea seems to make a theory untestable; here, I agree with Green and Shapiro. On the other hand, the fathers of the covering-law model, Hempel and Oppenheim, specified that their explication applied only to explaining particular events (n. 33; Salmon 1988). A historian would be interested in why two specific countries did or did not fight at a certain time, but to a political scientist, this account would be a step toward understanding why democracies seldom fight each other. Browsing through political science journals, one finds that the bulk of the articles investigate general regularities, not particular events.

Social scientists need some concept of explanation other than the covering-law one. The appropriate concept for theoretical explanation involves unification (Friedman 1974; Kitcher 1989; Gemes 1994). When the authors of Pathologies of Rational Choice Theory claim all explanations are causal (p. 187), this position underlines the difference between the covering-law and unification approaches: the kinetic theory of gases explains Boyle's law but does not cause it. The authors' insistence on putting prior domain restrictions on a theory also clashes with the idea of unification. For that purpose, one states a theory, then investigates to see how far its boundaries extend. As a research goal, unification is different from universalism; it is more limited and realistic. The phenomena that one tries to embrace with one's theory can be selected judiciously, under no pressure to extend it where it cannot go.


On the dimension of unification, game and decision models have a great advantage over most other approaches. In combination with other social science theories, they offer the possibility of bringing together very different kinds of behavior. How far that potential can be realized is yet to be revealed, but recognizing what the models are about is the first step toward a balanced discussion.

REFERENCES

Awtry, T. 1994. Is it fact, theory or hypothesis? Baton Rouge Advocate, 15 December, 14B.

Berger, J., and M. Delampady. 1987. Testing precise hypotheses. Statistical Science 2:317-52.

Brush, S. 1989. Prediction and theory evaluation: The case of light bending. Science 246:1124-9.

Cohen, J. 1990. Things I have learned (so far). American Psychologist 45:1304-12.

Cohen, J. 1994. The earth is round (p < .05). American Psychologist 49:997-1012.

Critical Review. 1995. Vol. 18 (winter/spring).

Denzin, N. K. 1990. Reading rational choice theory. Rationality and Society 2:172-89.

Einhorn, H., and R. Hogarth. 1988. In Decision making: Descriptive, normative and prescriptive interactions, edited by D. Bell, H. Raiffa, and A. Tversky, 113-46. New York: Cambridge University Press.

England, P., and B. Kilbourne. 1990. Feminist critiques of the separative model of self. Rationality and Society 2:156-71.

Friedman, Michael. 1974. Explanation and scientific understanding. Journal of Philosophy 71:5-19.

Friedman, Milton. 1953. Essays in positive economics. Chicago: University of Chicago Press.

Gemes, K. 1994. Explanation, unification and content. Nous 28:225-40.

Gigerenzer, G. 1993. The superego, the ego and the id in statistical reasoning. In Handbook for data analysis in the behavioral sciences: Methodological issues, edited by G. Keren and C. Lewis, 311-9. Hillsdale, NJ: Lawrence Erlbaum.

Greenwald, A. 1975. Consequences of prejudice against the null hypothesis. Psychological Bulletin 82:1-20.

Grunbaum, A. 1976. Ad hoc auxiliary hypotheses and falsificationism. British Journal for the Philosophy of Science 27:329-62.

Hempel, C., and P. Oppenheim. 1948. Studies in the logic of explanation. Philosophy of Science 15:135-75.

Hofmann, J. 1988. Ampere's electrodynamics and the acceptability of guiding assumptions. In Scrutinizing science, edited by A. Donovan, L. Laudan, and R. Laudan, 201-18. Boston: Kluwer.

Jeffrey, R. 1975. Probability and falsification: Critique of the Popper program. Synthese 30:95-117.

Jervis, R., R. N. Lebow, and J. Snyder, eds. 1985. Psychology and deterrence. Baltimore: Johns Hopkins University Press.

Kahneman, D., and A. Tversky. 1979. Prospect theory: An analysis of decision under risk. Econometrica 47:263-91.

Keohane, R., G. King, and S. Verba. 1994. Designing social inquiry. Princeton, NJ: Princeton University Press.

Kitcher, P. 1989. Explanatory unification and causal structure. In Minnesota studies in the philosophy of science, vol. 13, edited by W. Salmon and P. Kitcher, 410-505. Minneapolis: University of Minnesota Press.

Klamer, A., D. McCloskey, and R. Solow. 1988. The consequences of economic rhetoric. Cambridge: Cambridge University Press.

McCloskey, D. 1985. The rhetoric of economics. Madison: University of Wisconsin Press.

Meehl, P. 1967. Theory-testing in psychology and physics: A methodological paradox. Philosophy of Science 34:103-15.

Meehl, P. 1990. Why summaries of research on psychological theories are often uninterpretable. Psychological Reports 66:195-244.

Morrison, D., and R. Henkel. 1970. The significance test controversy. Chicago: Aldine.

Nelson, J. 1992. Gender metaphor and the definition of economics. Economics and Philosophy 8:103-25.

Numbers, R. 1992. The creationists. New York: Knopf.

Olson, C. 1976. Some apparent violations of the representativeness heuristic in human judgment. Journal of Experimental Psychology: Human Perception and Performance 2:599-608.

O'Neill, B. 1987. Nonmetric test of the minimax theory of two-person zerosum games. Proceedings of the National Academy of Sciences 84:2106-9.

O'Neill, B. 1991. Comments on Brown and Rosenthal's reexamination. Econometrica 59:503-7.

Pollard, P., and J. Richardson. 1987. On the probability of making Type I errors. Psychological Bulletin 102:159-63.

Popper, K. 1965. Conjectures and refutations. London: Kegan Paul.

Popper, K. 1972. Objective knowledge. Oxford: Clarendon.

Popper, K. 1974a. Autobiography. In The philosophy of Karl Popper, edited by P. Schilpp, 3-184. La Salle, IL: Open Court.

Popper, K. 1974b. Replies to my critics. In The philosophy of Karl Popper, edited by P. Schilpp, 961-1197. La Salle, IL: Open Court.

Popper, K. 1978. Natural selection and the emergence of the mind. Dialectica 32:343-55.

Putnam, H. 1974. The "corroboration" of theories. In The philosophy of Karl Popper, edited by P. Schilpp, 221-40. La Salle, IL: Open Court.

Riker, W. 1990. Political science and rational choice. In Perspectives on positive political economy, edited by J. Alt and K. Shepsle, 163-81. Cambridge: Cambridge University Press.

Rosenkrantz, R. 1983. Why Glymour is a Bayesian. In Minnesota studies in the philosophy of science, vol. 10, edited by J. Earman, 65-90. Minneapolis: University of Minnesota Press.

Salmon, W. 1988. Four decades of scientific explanation. Minneapolis: University of Minnesota Press.

Scerri, E. 1994. Prediction of the nature of hafnium from chemistry, Bohr's theory, and quantum theory. Annals of Science 51:137-50.

Serlin, R., and D. Lapsley. 1993. Rational appraisal of psychological research and the good-enough principle. In Handbook for data analysis in the behavioral sciences: Methodological issues, edited by G. Keren and C. Lewis, 199-228. Hillsdale, NJ: Lawrence Erlbaum.

Tukey, J. 1969. Analyzing data: Sanctification or detective work? American Psychologist 24:83-91.

Tversky, A., and D. Kahneman. 1974. Judgment under uncertainty: Heuristics and biases. Science 185:1124-31.

Worrall, J. 1988. Fresnel, Poisson and the white spot: The role of scientific prediction in the acceptance of scientific theories. In The uses of experiment, edited by D. Gooding, T. Pinch, and S. Schaffer, 135-57. Cambridge: Cambridge University Press.

Zandvoort, H. 1988a. Nuclear magnetic resonance and the acceptability of guiding assumptions. In Scrutinizing science, edited by A. Donovan, L. Laudan, and R. Laudan, 337-58. Boston: Kluwer.

Zandvoort, H. 1988b. Macromolecules, dogmatism, and scientific change. Studies in History and Philosophy of Science 19:489-515.
