
Psychological Bulletin, 1983, Vol. 94, No. 1, 18-38. Copyright 1983 by the American Psychological Association, Inc.

Behavioral Development and Construct Validity: The Principle of Aggregation

J. Philippe Rushton, Charles J. Brainerd, and Michael Pressley
University of Western Ontario, London, Ontario, Canada

Many important variables in behavioral development are presumed to be unrelated because of repeated failures to obtain substantial correlations. In this article, we explore the possibility that such null findings have often been due to failures to aggregate. The principle of aggregation states that the sum of a set of multiple measurements is a more stable and representative estimator than any single measurement. This greater representation occurs because there is inevitably some error associated with measurement. By combining numerous exemplars, such errors of measurement are averaged out, leaving a clearer view of underlying relationships. We illustrate the usefulness of this principle in 12 major areas of developmental research in which the issue of negligible correlations figures prominently: (a) the validity of judges' ratings, (b) the cross-situational consistency of moral character and personality, (c) the longitudinal stability of personality, (d) the coherence of stages of cognitive development, (e) metacognition, (f) the attitude-behavior relationship, (g) the personality-behavior relationship, (h) the role-taking/altruism relationship, (i) the moral-judgment/altruism relationship, (j) the legitimacy of the construct of attachment, (k) the existence of sex differences, and (l) the assessment of emotionality in animals. In a final section, we also discuss the implications of the principle of aggregation for conducting experimental research.

Imagine how unreliable assessing undergraduates' course performance with a single multiple-choice item would be, or measuring a person's IQ with a single item from a standardized mental test, or making inferences about a population on the basis of results from a single subject. Yet these methods are not unlike what is done in much construct-validation research, especially within the study of behavioral development.

Generally speaking, this article is a methodological critique of the way certain relationships have been investigated in developmental research. Our specific concern is to argue that some lines of developmental research have been hampered by persistent failures to take account of the principle of aggregation. We begin with a brief discussion of this principle in qualitative language. We then review 12 influential areas of developmental research from the perspective of aggregation. Finally, we consider the implication of the principle of aggregation for experimental research. The areas of research we cover are (a) the validity of judges' ratings, (b) the cross-situational consistency of moral character and personality, (c) the longitudinal stability of personality, (d) the coherence of stages of cognitive development, (e) metacognition, (f) the attitude-behavior relationship, (g) the personality-behavior relationship, (h) the role-taking/altruism relationship, (i) the moral-judgment/altruism relationship, (j) the legitimacy of the construct of attachment, (k) the existence of sex differences, and (l) the assessment of emotionality in animals. Evidence is presented that the weak statistical relationships routinely observed in these literatures may be consequences of failures to aggregate.

A preliminary version of this paper was presented at the Merrill-Palmer Society, Detroit, May 1982. We would like to thank Jack Block, John Borkowski, Douglas Jackson, and Linda Siegel for discussions of the issues involved in this paper.

Requests for reprints should be sent to J. Philippe Rushton, Department of Psychology, University of Western Ontario, London, Ontario, Canada N6A 5C2.

Aggregation

According to the principle of aggregation, the sum of a set of multiple measurements is a more stable and unbiased estimator than any single measurement from the set. One reason is that there is always error associated with measurement. When several measurements are combined, these errors tend to average out, thereby providing a more accurate picture of relationships in the population. Perhaps the most familiar illustration of this effect is the rule in educational and personality testing that the reliability of an instrument increases as the number of items increases (e.g., Gulliksen, 1950; Lord & Novick, 1968). For example, single items on the Stanford-Binet IQ test only correlate about .15; subtests based on 4 or 5 items correlate around .3 or .4, but the aggregated battery of items that make up the Performance subscale correlates around .8 with the battery of items that make up the Verbal subscale.
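The rule that reliability grows with the number of items is usually stated through the Spearman-Brown formula, which the text above does not name. The following is a minimal sketch under that assumption: it treats items as parallel measures and takes the single-item figure of .15 quoted above as the average inter-item correlation. The function name `spearman_brown` and the item counts are illustrative choices, not values from the studies cited.

```python
def spearman_brown(r_single: float, k: float) -> float:
    """Predicted reliability of a composite of k parallel items.

    r_single: reliability of one item (here, the average inter-item correlation).
    k:        number of items aggregated.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return k * r_single / (1.0 + (k - 1.0) * r_single)


if __name__ == "__main__":
    # .15 is the single-item figure quoted in the text; the item counts
    # below are hypothetical and chosen only to show the trend.
    for k in (1, 5, 20, 60):
        print(f"{k:>3} items -> predicted reliability {spearman_brown(0.15, k):.2f}")
```

Under these assumptions the predicted reliability rises steeply over the first few items and then levels off below 1.0, which is the qualitative pattern described for single items, short subtests, and full subscales.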

One of the earliest illustrations of the principle of aggregation is the so-called "personal equation" in astronomy. In 1795, Maskelyne, the head of the Greenwich observatory, discharged an otherwise capable assistant because he recorded transits of stars across a vertical hairline in the telescope about half a second "too late." Maskelyne estimated the error of his assistant's measurements by comparing them with his own observations, which he naturally assumed to be correct. An account of these facts in a Greenwich observatory report was noted by a German astronomer, Bessel, some decades later, and led him to test astronomers against each other, with the result that no two agreed precisely on the time of a transit. Clearly, the only sensible estimate of a star's transit across the hairline was some average of many observations, not one.

Personal-equation phenomena were well known in psychology during the nineteenth century. Both psychophysicists and early students of reaction time took considerable pains to remove measurement errors attributable to idiosyncratic differences between subjects. Many early averaging techniques were formulated for this express purpose (Stevens, 1951; Woodworth & Schlosberg, 1939). In everyday life, similar averaging techniques are used in subjective decision-making situations. For example, the reliability of decisions about whom to award prizes for cooking, handicrafts, wine making, physical beauty, and so on is enhanced by averaging the decisions of several judges. This procedure is also routine in forms of athletic competition where performance criteria are partially subjective (e.g., diving, gymnastics). When gradations in qualities to be discriminated are fine, the only fair procedure is to obtain many judgments.

Researchers in the psychometric tradition have made similar points. For example, an early paper by Spearman (1910) on the proper use of correlation coefficients contains the following observations:

It is the superposed accident (measurement error) that the present paper attempts to eliminate, herein following the custom of all sciences, one that appears to be an indispensable preliminary to getting at nature's laws. This elimination of the accidents is quite analogous to, and serves just the same purpose as, the ordinary process of "taking means" or "smoothing curves."

The method is as follows. Let each individual be measured several times with regard to any characteristic to be compared with another. (pp. 273-274)

Unfortunately, Spearman's advice has rarely been taken in many types of developmental research. Although it has been known for decades in both the experimental and differential "halves" of psychology (Cronbach, 1957), it is lost on much of contemporary developmental psychology. Psychologists interested in behavioral development have often assessed constructs such as altruism, role-taking ability, stage of cognitive development, metamemory, attachment, sex differences, and many others using only a single measure. It is not surprising, therefore, that relationships involving these constructs have been weak. When multiple measures of each construct are used, relationships become more substantial.

The Utility of Judges' Ratings

One traditionally important source of data in developmental psychology has been the judgments and ratings of children made by their teachers and peers. In recent years, judges' ratings have been much maligned on the ground that they are little more than "erroneous constructions of the perceiver" (e.g., Kenrick & Stringfield, 1980, p. 88). This view has led to much disenchantment with the use of ratings. The main empirical reason that is cited for rejecting rating methods is that judges' ratings only correlate, on the average, .20 to .30. However, it is questionable that correlations between any two judges' ratings are stable and representative. The validity of judgments is likely to increase as the number of judges becomes larger.

One early demonstration of the principle of aggregation as it applies to judges' ratings was provided by Knight (1921). Knight's subjects estimated the temperature of a classroom. Although individual estimates were reasonably accurate, ranging from 60° to 80°, the mean rating of 72.4° was virtually the same as the temperature indicated by a thermometer (72°). Shortly thereafter, Gordon (1924) had subjects rank order a series of objects by weight. The rank orderings were then correlated with true ranks, using average rank orderings from different numbers of subjects as the predictor variable. When the number of subjects increased from 1 to 5 to 50, the corresponding validity correlations increased from .41 to .68 to .94.
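The pooling effect described for Knight's and Gordon's subjects can be reproduced in a small simulation. This is a sketch under stated assumptions rather than a reconstruction of either study: each hypothetical judge reports the true value of an object plus independent Gaussian error, and the names `pooled_validity` and `mean_validity`, the noise level, and the number of objects are all illustrative.

```python
import random
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pooled_validity(n_judges, true_values, noise_sd, rng):
    """Correlate the mean of n_judges noisy estimates with the true values."""
    pooled = []
    for t in true_values:
        estimates = [t + rng.gauss(0.0, noise_sd) for _ in range(n_judges)]
        pooled.append(sum(estimates) / n_judges)
    return pearson(pooled, true_values)


def mean_validity(n_judges, n_objects=10, noise_sd=2.0, trials=300, seed=0):
    """Average pooled validity over many simulated panels of judges."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Hypothetical "true" values of the objects being judged.
        true_values = [rng.gauss(0.0, 1.0) for _ in range(n_objects)]
        total += pooled_validity(n_judges, true_values, noise_sd, rng)
    return total / trials


if __name__ == "__main__":
    for k in (1, 5, 50):
        print(f"{k:>2} judges -> mean validity {mean_validity(k):.2f}")
```

With error that is large relative to the true differences between objects, a single simulated judge is only modestly valid, whereas the pooled judgment of many judges tracks the true values closely. The exact figures depend entirely on the assumed noise level.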

Similar findings are available in connection with judges' ratings of social variables (Stevens, 1972). For example, Eysenck (1939) gathered judges' estimates of both the weights of objects and the aesthetic value of pictures. He showed that the more estimates that were averaged, the higher was the correlation with the true score for both variables. As Figure 1 illustrates, the function relating judgmental validity to number of judges is essentially the same for highly subjective estimates of the aesthetic value of pictures using an independent criterion (Curve A) and judgments of the weight of objects, for which there was an objective criterion (Curve B).

Figure 1. Relation between number of judges (square root) and correlation of their pooled judgments with independent criterion. (A = aesthetic judgment; B = weight judgments. After Eysenck, 1939.) [Plot not reproduced; horizontal axis: number of judges.]

Figure 2. Convergent validity coefficients for dominance based on varying numbers of weeks of observations and varying numbers of raters. (After Moskowitz & Schwarz, 1982.) [Plot not reproduced; horizontal axis: number of raters.]

Another study demonstrating the aggregation effect for judges' ratings of behavioral variables was reported by Moskowitz and Schwarz (1982). These authors varied the number of raters judging a target and the number of observation periods allowed prior to the prediction of actual behavior counts of dominant behaviors (Figure 2). The findings are clear: validity increases directly as the number of raters increases and as the number of groups of observations increases.

Judges' ratings of various dimensions, when assessed reliably by aggregating across judges, have considerable predictive utility. For example, Eron (1980) found that average peer ratings of aggressiveness at age 8 years correlated .43 with the average of a different set of peer ratings of aggressiveness at age 19. Also, those who had been rated as aggressive at age 8 were three times as likely to have police records by age 19 as those not so rated. This suggests that perceptions of personality can be both stable over time and predictive of behavior. Another example is a study by Rushton, Murray, and Paunonen (1983), in which it was found that ratings of university professors, made by their faculty peers on 29 different personality traits, agreed an average of .56 with an independent set of ratings made by the professors' students. Of even more importance is that these ratings of personality traits allowed significant prediction of the professors' research and teaching effectiveness as measured by independent criteria. Effective researchers, for example, were found to be ambitious, enduring, seeking definiteness, dominant, showing leadership, aggressive, independent, nonmeek, and nonsupportive. Effective teachers, on the other hand, were found to be liberal, sociable, showing leadership, extraverted, nonanxious, and supporting. In summary, if judges' ratings are aggregated over numerous judges, the aggregated score is predictive of behavior and, presumably, more reflective of psychological reality.

Moral Character in Childhood: The Question of Cross-Situational Consistency in Personality Development

For several decades, there have been two opposing viewpoints on the question of whether human behavior is consistent across situations. The classic study of this problem is the enormous "character education inquiry" carried out by Hartshorne and May in the 1920s and published in three books (Hartshorne & May, 1928; Hartshorne, May, & Maller, 1929; Hartshorne, May, & Shuttleworth, 1930). These investigators gave 11,000 elementary and high school students some 33 different behavioral tests of altruism (referred to as the "service" tests), self-control, and honesty in home, classroom, church, play, and athletic contexts. Concurrently, ratings of the children's reputations with teachers and classmates were obtained. Altogether more than 170,000 observations were collected. Scores on the various tests were correlated to discover whether behavior is specific to situations or consistent across them. This study will be discussed in some detail because it is the largest study of the question ever undertaken, it raises most of the major points of interest, and it has been seriously misinterpreted by many investigators, as noted by Burton (1976), Eysenck (1970), and Rushton (1980). The various tests administered to the children are summarized in Table 1.

We first consider the results based on the measures of altruism. Any one behavioral test of altruism correlated, on the average, only .20 with any other test. But when the five behavioral measures were aggregated into a battery, they correlated a much higher .61 with the measures of the child's altruistic reputation among his or her teachers and classmates. Furthermore, the teachers' and peers' perceptions of the students' altruism were in close agreement (r = .80). These latter results indicate a considerable degree of consistency in altruistic behavior. In this regard Hartshorne et al. (1929) wrote:

The correlation between the total service score and the total reputation score is .61. . . . Although this seems low, it should be borne in mind that the correlations between test scores and ratings for intelligence seldom run higher than .50. (p. 107)

Similar results were obtained for the measures of honesty and self-control. Any one behavioral test correlated, on average, only .20 with any other test. If, however, the measures were aggregated into batteries, then much higher relationships were found either with other combined behavioral measures, with teachers' ratings of the children, or with the children's moral knowledge scores. Often, these correlations were on the order of .50 to .60. For example, the battery of tests measuring cheating by copying correlated .52 with another battery of tests measuring other types of classroom cheating. (See, for example, Vol. 1, Book 2, pp. 130-133, Tables 97-99; Vol. 2, Book 1, p. 104, Table 20; Vol. 2, Book 2, pp. 351-352; Vol. 3, p. 166, Table 19.) Thus, depending on whether the focus is on the relationship between individual measures or on the more representative relationship between averaged groups of behaviors, the notions of situational specificity and situational consistency are both supported. Which of these two conclusions is more accurate?

Hartshorne and colleagues (1928, 1929, 1930) focused on the small correlations of .20 and .30. Consequently, they argued for a doctrine of specificity:

Neither deceit nor its opposite, 'honesty' are unified character traits, but rather specific functions of life situations. Most children will deceive in certain situations and not in others. Lying, cheating, and stealing as measured by the test situations used in these studies are only very loosely related. (1928, p. 411)

Table 1
Some of the Measures Used in the Studies in the Nature of Character Investigation

Test: Nature and scoring of the task

Service tests
  Self-or-class test: Whether the student chose to enter a competition to benefit him- or herself or the class.
  Money voting test: Whether the student voted to spend class money on him- or herself or charity.
  Learning exercises: Whether the student learned material when performance increments led to money going to the Red Cross.
  School-kit test: Number of items donated to charity from a pencil case given to child.
  Envelopes test: Number of jokes, pictures, etc. collected for sick children in an envelope provided.

Honesty tests
  Copying technique: Whether student cheated on a test by copying answers from the person next to him or her.
  Duplicating technique: Whether student cheated on a test by altering answers after his or her paper had been duplicated without his or her knowledge.
  Improbable achievement: Whether student cheated as indicated by an improbably high level of performance on a task.
  Double testing technique: Whether students' scores on an unsupervised test (e.g., number of pushups) decreased when a retest was supervised.
  Stealing: Whether students stole money from a puzzle-box.
  Lying: Whether students admitted to having cheated on any of the tasks.

Self-control tests
  Story resistance tests: Time students persisted in trying to read the climax of an exciting story when words ran into each other.
  Puzzle mastery tests: Time spent persisting at difficult puzzles.
  Candy test: The number of pieces of candy not eaten in a "resisting temptation" paradigm.
  Tickle test: The ability to keep a "wooden face" while being tickled by a feather.
  Bad odor test: The ability to keep a "wooden face" while having a bad odor placed under the nose.
  Bad taste test: The ability to keep a "wooden face" while tasting unrefined cod liver oil.

Knowledge of moral rules
  Cause-effect test: Agreement to items such as "Good marks are chiefly a matter of luck."
  Recognitions test: Agreement that items such as "Copying composition out of a book but changing some of the words" constituted cheating.
  Social-ethical vocabulary: Picking the best definition of words denoting moral virtue (e.g. bravery, malice).
  Free-response foresight test: Students wrote out consequences for transgressions such as "John accidentally broke a street lamp with a snowball."
  Probability test: Students ranked the probability of various outcomes for such behaviors as "John started across the street without looking both ways."

Reputational ratings
  Recording of helpful acts: For 6 months teachers recorded helpful acts performed by students.
  The "guess who" test: Children wrote names of classmates who fitted very short descriptions (e.g., Here is someone who is kind to younger children . . .).
  Check list: Teachers rated each child on adjectives such as kind, considerate, and stingy.

Their conclusions and data have often been cited in the subsequent literature as supporting situational specificity. For example, Mischel (1968), in an influential review, argued for specificity on the ground that the average correlation between behavioral instances of a "trait" is .20 to .30. According to Mischel (1973), people exhibit "discriminative facility" between situations.

The specificity hypothesis has been useful because it emphasizes that contexts are important and that people have different methods of dealing with different situations. Unfortunately, it has sometimes been interpreted as meaning that cross-situational consistency does not exist, or at least that it does not exist in sufficient quantity to make the concept of traits very useful. This, however, is quite wrong. By focusing on correlations of .20 and .30 between any two measures, a misleading impression is created. A more accurate picture is obtained by using the principle of aggregation and examining the predictability achieved from a number of measures. To reiterate, this effect occurs because the randomness in any one measure (error variance) is averaged out over several measures, leaving a clearer view of what a person's true behavior is like. Correlations of .50 and .60 based on aggregated measures support the view that there is cross-situational consistency in altruistic and honest behavior.
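The gain in predictability from combining measures follows from the standard formula for the correlation between a sum of parallel measures and an outside criterion. The sketch below assumes that every test in a battery correlates equally with the criterion and equally with every other test, which is an idealization and not a claim about the Hartshorne and May tests. The function name `composite_validity` is illustrative; the inter-test value of .20 is the figure quoted above, and the single-test validity of .30 is hypothetical.

```python
from math import sqrt


def composite_validity(r_xy: float, r_xx: float, k: int) -> float:
    """Correlation between the sum of k parallel measures and a criterion.

    r_xy: correlation of any single measure with the criterion.
    r_xx: average correlation between any two of the measures.
    k:    number of measures in the battery.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Cov(sum, criterion) = k * r_xy and Var(sum) = k + k * (k - 1) * r_xx
    # for standardized measures, which simplifies to the expression below.
    return r_xy * sqrt(k) / sqrt(1.0 + (k - 1) * r_xx)


if __name__ == "__main__":
    # r_xx = .20 is the inter-test figure from the text; r_xy = .30 is a
    # hypothetical single-test validity used only for illustration.
    for k in (1, 2, 5, 10):
        print(f"battery of {k:>2} tests -> r = {composite_validity(0.30, 0.20, k):.2f}")
```

Under these assumptions a battery of five tests that each correlate .30 with the criterion yields a composite correlation of .50, which is the order of magnitude reported for the aggregated batteries.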

Further evidence for this conclusion is found in Hartshorne and May's data. Examination of the relationships between the battery of altruism tests and batteries concerned with honesty, self-control, persistence, and moral knowledge suggests there may be a general moral character factor (see, e.g., Hartshorne et al., 1930, p. 230, Table 32). Maller (1934) was one of the first to note this. Using Spearman's tetrad difference technique, Maller isolated a common factor in the intercorrelations of the character tests of honesty, altruism, self-control, and persistence. Subsequently, Burton (1963) reanalyzed the Hartshorne and May data and found a general factor that accounted for 35%-40% of the common variance.

Since the pioneering work of Hartshorne and colleagues (1928, 1929, 1930), many other studies have provided data that speak directly to the specificity versus consistency of moral behavior. As other reviewers have noted (Burton, 1976; Rushton, 1980), the typical correlation between any two behavioral indices is about .30. Combining measures, on the other hand, normally leads to greater predictability. Failures to take account of this fact have led to the widespread and erroneous view that moral behavior is almost completely situation specific. This, in turn, has led students of moral development to neglect research concerned with the origins of general moral "traits." The fact that, judging from the aggregated correlational data, such traits exist, and, moreover, that they seem to appear early in life, poses a considerable challenge to moral development research (Rushton, in press). The argument presented for the existence of moral traits, of course, also applies to the existence of other personality traits (Epstein, 1979, 1980; Eysenck, 1970, 1981; Rushton, Jackson, & Paunonen, 1981).

Longitudinal Stability of Personality

Block (1981) has pointed out that the question of cross-situational consistency becomes a question about longitudinal consistency when the time dimension is introduced. To what extent, over both time and situation, do a person's behaviors stem from enduring traits of character? Currently, this is a controversial question (Brim & Kagan, 1980; Rubin, 1981). A prerequisite for answering it is reliable and valid measurement. When studies measure personality at different ages by aggregating over many different assessments, longitudinal stability is usually found. But when single measurements or other less reliable techniques are used, the longitudinal stability of personality is less marked (Block, 1971; Rubin, 1981). In Block's (1971, 1981) work, for example, where the principle of aggregation has been strictly adhered to, substantial coherence of personality has been found over several decades.

Block analyzed extensive personality data from about 170 individuals. Data were first obtained in the 1930s when the subjects were in junior high school. Further data were gathered when the subjects were in their late teens, in their mid-30s, and in their mid-40s. The archival data so generated were enormously wide-ranging and often not in a form permitting direct quantification. To systematize the data, Block employed clinical psychologists to study individual dossiers and to rate the subject's personality using the Q-sort procedure, a set of descriptive statements such as "is anxious," which can be sorted into piles that indicate how representative the statement is of the subject (Block, 1961). To ensure independence, the materials for each subject were carefully segregated by age level, and no psychologist rated the materials for the same subject at more than one time period. The assessments by the different raters (usually three for each dossier) were found to agree with one another to a significant degree, and they were averaged to form an overall description of the subject at that age.

Using this careful procedure of aggregated ratings based on diverse items of information, Block (1971, 1981) found personality stability across the ages tested. Even the simple correlations between Q-sort items over the 30 years between early adolescence and the mid-40s provided evidence for stability. Correlations indicating stability were, for example, for the male sample: "genuinely values intellectual and cognitive matters," .58; "is self-defeating," .46; "has a high aspiration level," .45; and "has fluctuating moods," .40; for the female sample, "is an interesting, arresting person," .44; "pushes and tries to stretch limits to see what she can get away with," .43; "esthetically reactive," .41; and "is cheerful," .36. When the whole range of variables for each individual was correlated over 30 years, the mean correlation was .31. These are lower bound estimates, uncorrected for the inevitable presence of unreliability of measurement. Even more substantial relationships could be expected to occur if individual items were aggregated to create typologies (see Block, 1971, 1981). Block's (1981) results, therefore, demonstrate that when personality is measured adequately, longitudinal stability is found. Moreover, this stability has important implications for other psychological phenomena (Eichorn, Clausen, Haan, Honzik, & Mussen, 1981). As Block (1981) points out, the problem is to identify conditions that lead to consistency and conditions that lead to change. This, like the previous section on moral character, poses a considerable challenge to current theorizing (Rushton, in press), a problem that does not even exist unless aggregated measures are used.
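The remark that the observed correlations are lower bound estimates, uncorrected for unreliability, can be made concrete with Spearman's classical correction for attenuation. The sketch below is illustrative only: the text reports no reliabilities for the Q-sort items, so the value of .60 used here is hypothetical, and the function name `disattenuate` is an assumption of this example. Only the observed mean correlation of .31 comes from the text.

```python
from math import sqrt


def disattenuate(r_observed: float, rel_x: float, rel_y: float) -> float:
    """Spearman's correction for attenuation.

    r_observed: correlation actually obtained between two measures.
    rel_x:      reliability of the first measure (0 < rel_x <= 1).
    rel_y:      reliability of the second measure (0 < rel_y <= 1).
    Returns the estimated correlation between the underlying true scores.
    """
    if not (0.0 < rel_x <= 1.0 and 0.0 < rel_y <= 1.0):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_observed / sqrt(rel_x * rel_y)


if __name__ == "__main__":
    # .31 is the mean 30-year correlation reported in the text; the
    # reliability of .60 at each age is a hypothetical value.
    print(f"{disattenuate(0.31, 0.60, 0.60):.2f}")
```

With a hypothetical reliability of .60 at both ages, the observed .31 corresponds to an estimated true-score correlation of about .52. With perfectly reliable measures the correction leaves the observed value unchanged, which is why the uncorrected figure is a lower bound.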

Stages of Cognitive Development

In Piagetian theory, cognitive development is described in terms of a sequence of qualitatively distinct states called stages. Each stage is defined by a unique set of cognitive structures, the so-called structures d'ensemble principle. The prototypical conceptual skills of each stage are said to be generated by the structures in a manner that is not unlike the way that a mathematician derives theorems from a set of axioms (Brainerd, 1978a). In the case of the concrete-operational stage, for example, such familiar concepts as conservation, seriation, classification, horizontality, and the like are all thought to be generated by eight quasi-algebraic entities known as grouping structures (Piaget, 1941, 1949).

To students of cognitive development, it has seemed almost self-evidently true that these assumptions demand strong correlations between same-stage concepts. The rationale is straightforward. If two same-stage concepts are both monotonically related to the same structure, then they should be monotonically related to each other. This prediction was explored in several early studies concerned with the concrete-operational stage (e.g., Dodwell, 1961; Elkind, 1961). As a rule, the design of these studies consisted of administering tests for a few concrete-operational concepts plus a standardized ability test of some sort (e.g., Stanford-Binet Intelligence Scale). The general idea was that correlations between tests for same-stage concepts, which share a common conceptual base (e.g., grouping structures), should be higher than correlations between these tests and standardized measures. This pattern failed to emerge. Instead, the typical result was that all correlations, whether between same-stage concept tests or between these tests and standardized measures, were, as Elkind remarked in connection with one of his studies, "sometimes significant, and usually low" (1961, p. 44). It was even found in some studies that same-stage measures correlated better with standardized ability tests than with each other (e.g., Dodwell, 1961). The bulk of this early stage-validation literature was reviewed by Flavell (1963, Chapter 11).

The situation has not changed appreciably during the intervening years. For example, Berzonsky (1971) reported a study in which three types of measures were administered to a sample of elementary school students: (a) tests of various concrete-operational concepts, (b) tests of various formal-operational concepts, and (c) a standardized intelligence test. The obvious predictions from Piagetian theory are, first, that the measures in Category a should correlate more highly with each other than with the measures in Categories b and c, and second, that the measures in Category b should correlate more highly with each other than with the measures in Categories a and c. In the event, however, the correlations within category were generally low. Moreover, when the data were subjected to factor analysis, a five-factor solution was obtained that was not related to the test's stage classifications in any sensible way.

More recently, Ford (1979) has challenged the validity of the egocentrism construct (concrete-operational stage) on the basis of observed correlations between different measures of egocentrism. Ford pointed out that correlations among tests for three types of egocentrism (affective, cognitive, and spatial) have usually been "low and often nonsignificant" (p. 1169). In place of Piagetian stages and structures, Ford argued that such small correlations "are more parsimoniously interpreted as the result of other explanatory constructs, such as those referring to the general level of cognitive, perceptual, or linguistic development of the child" (p. 1185). The results of a subsequent study of this problem by Scheffman (1981) appeared to bear out Ford's argument. Scheffman administered tests of four types of egocentrism (spatial, cognitive, affective, and communicative) to a sample of 3- to 5-year-olds. Scheffman's data are unpublished; we reproduce them in Table 2. The correlations range from a low of -.34 to a high of .72, with the mean correlation being .06. At first glance, these results seem to argue strongly against the position that the tests measure a common theoretical construct.

Table 2
Correlations Based on Nonaggregated Measures of Four Types of Egocentrism (After Scheffman, 1981)

                     Spatial         Cognitive   Affective     Communicative
Variable             1     2     3     4     5     6     7     8     9    10
Spatial
  1                       41    17    01    05    04    22   -16    05    12
  2                             16    18    21    12   -05   -30   -19   -17
Cognitive
  3                                   11    14   -17    14   -15    10   -12
  4                                         24    03   -11    02   -03   -04
  5                                               00   -01   -34   -04   -16
Affective
  6                                                     05   -04   -26    22
  7                                                           29    29    31
Communicative
  8                                                                 05    10
  9                                                                       72
 10

Note. Decimal points have been omitted.

Table 3
Correlations Based on Aggregated Measures of Four Types of Egocentrism (After Scheffman, 1981)

Egocentrism
variable          Cognitive   Affective   Communicative   Total scale
Spatial              .22         .16          -.24            .58
Cognitive                        .13           .12            .49
Affective                                      .34            .61
Communicative                                                 .05

In accord with the view being presented here, however, there is considerable motivation for reexamining the Piagetian stage literature in the light of the principle of aggregation. The typical design has involved administering a single test for each target concept rather than multiple tests for each. If, for example, a hypothetical study is concerned with transitivity, seriation, and conservation (concrete-operational stage), then the standard design would consist of administering one transitivity test, one seriation test, and one conservation test to a sample of children. Worse, following the Piagetian tradition of assigning children to stages, performance on individual tests has simply been scored pass/fail in most studies (cf. Brainerd, 1977a, 1977b). From the standpoint of the aggregation principle, the chances of obtaining high correlations between single tests scored as pass or fail are not promising. In fact, the low correlations that have been reported in most articles are about all that one has a right to expect.

Do same-stage correlations increase when test scores are aggregated across two or more measures of individual concepts? Unfortunately, it is not possible to make confident statements on the basis of the extant literature because, as we mentioned, only one test per concept has been administered in most studies. Moreover, of those few studies with multiple measures per concept (e.g., Hooper, Swinton, & Sipple, 1979; Hooper, Toniolo, & Sipple, 1978), aggregate correlations have typically not been reported in published versions of the research. Nonetheless, there is some evidence available that suggests that same-stage correlations may be much larger for aggregated measures than for individual ones. We present two illustrations.

First, we reconsider the Scheffman (1981) study of egocentrism. In this study, two tests of spatial egocentrism, three tests of cognitive egocentrism, two tests of affective egocentrism, and three tests of communicative egocentrism were administered. Aggregate correlations for these measures appear in Table 3. These aggregate correlations are of two types. First, correlations between the combined scores for tests of particular forms of egocentrism are reported in the first three columns. Note that these correlations are two to five times larger than the .06 average in Table 2. Second, correlations between combined scores for each form of egocentrism and combined scores for all 10 tests are reported in the fourth column. Here, the aggregation effect is quite dramatic: The correlations for three types of egocentrism (spatial, cognitive, affective) account for a quarter to a third of the variation. The important point is that the theoretical conclusions that emerge from the individual correlations (Table 2) and the aggregate correlations (Table 3) are completely different. On the basis of Table 2, one would be likely to conclude, in line with Ford's (1979) review, that these egocentrism tests do not measure any common ability. But on the basis of the fourth column of Table 3, one would probably conclude that tests for three types of egocentrism tap a common ability, whereas tests for the fourth type of egocentrism (communicative) tap some unrelated ability.
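The "quarter to a third of the variation" figure can be checked by squaring the total-scale correlations reported in the fourth column of Table 3. The subscripts below are labels introduced here for illustration only:

```latex
\begin{align*}
r_{\text{spatial,total}}^{2}       &= (.58)^{2} \approx .34 \\
r_{\text{cognitive,total}}^{2}     &= (.49)^{2} \approx .24 \\
r_{\text{affective,total}}^{2}     &= (.61)^{2} \approx .37 \\
r_{\text{communicative,total}}^{2} &= (.05)^{2} \approx .003
\end{align*}
```

The first three values fall between roughly one quarter and a little over one third, whereas the communicative scale shares essentially no variance with the total.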

Second, we consider some data from a longitudinal, construct-validation study of the concrete-operational stage by Hooper and his associates (e.g., Hooper, Brainerd, & Sipple, 1975; Hooper & Toniolo, 1974). In this study, tests designed to measure the composition and reversibility operations of Piaget's eight grouping structures were administered to subjects in the kindergarten to Grade 6 range. The test for each grouping structure was composed of eight subtests, four subtests measuring the relevant composition operation and four subtests measuring the relevant reversibility operation, each of which was scored pass/fail. A key prediction under investigation was that the subtests in each block should be highly correlated because they presuppose the same structure. However, low positive correlations were observed in most cases. The picture was different when performance on individual subtests was correlated with the aggregated scores for their structure. The average same-structure correlation rose to .55 for Piaget's four classificatory structures (grouping 1-4) and to .58 for his four relational structures (grouping 5-8). Once again, the message from the aggregate correlations is that tests that do not correlate very well with each other nevertheless may measure common traits.

To avoid misinterpretation, we would like to stress that we are not disposed to use these aggregation effects as grounds for arguing that Piaget's stage construct ought to be resurrected. The list of logical and empirical objections to stages is far too long to be dismissed this easily (see papers by various commentators in Brainerd, 1978b, 1979). Our only contention is that, by and large, the prediction that most investigators regard as the minimum requirement of stage models, high covariation between same-stage traits, has not been fairly tested in the case of Piaget's theory.

The Relationship Between Cognition and Behavior

A great deal of research in psychology has sought to predict behavior from knowledge of hypothesized processes thought to be operating within the organism. Five current examples are metacognition, attitudes, personality traits, role taking ability, and level of moral judgment. In the main, accurate behavioral prediction from measures of internal constructs has rarely been achieved. We believe that this lack of success may be due to a failure to measure a wide enough (aggregated) sampling of either the internal predictor variables or the behavioral criteria. We review literature that supports this contention.

Metacognition as an Explanatory Construct

Metamemory is usually defined as everything that one might know or come to know about memory, including what one can do to enhance memory, characteristics of people who are good and poor memorizers, and what task variables enhance or inhibit memory (e.g., Flavell & Wellman, 1977). A primary hypothesis of metamemory theory is that metamemory may mediate memory behavior (Flavell & Wellman, 1977). In more familiar information processing terms, metamemory has been hypothesized to be the "executive" that directs memory functioning. Most investigators would probably agree that before causally oriented research on metamemory–memory connections can be carried out, substantial correlations between metamemory and memory behaviors must be demonstrated. The search for such correlations initially produced little success (Pressley, Borkowski, & O'Sullivan, in press). Our review of the research indicates that the aggregation principle was largely ignored in early metamemory research and that this neglect may have accounted for the lack of metamemory–memory correlations in those studies.

The standard design was to administer one or a very few metamemory measures and correlate performance on these measures with a single memory measure, such as strategy usage or amount remembered. Not surprisingly, from our perspective, when these single metamemory measures of low reliability (Kurtz, Reid, Borkowski, & Cavanaugh, 1982) were used to predict single memory measures (of unknown reliability), few substantial correlations occurred (e.g., Brown & Campione, 1977; Kelly, Scholnick, Travers, & Johnson, 1976; Salatas & Flavell, 1976).

Even when multiple measures of metamemory have been available, researchers have chosen to analyze individual rather than composite measures. One study is particularly notable in this connection. Cavanaugh and Borkowski (1980) administered a metamemory test originally devised by Kreutzer, Leonard, and Flavell (1975) that was composed of 13 items. The elementary school children in the study were subsequently administered three memory tasks. Performance on each of the 13 metamemory items was separately correlated with performance on each of the memory measures. The mean correlation produced in this analysis was .1, with the highest correlation being .38. We have no way of knowing what the relationship would have been had a metamemory aggregate measure and/or a memory–behavior aggregate measure been used in the analysis because no such analysis was conducted. What is known is that the reliabilities of the individual metamemory items were quite low: The mean reliability of individual items was found to be .38 in a later study (Kurtz et al., 1982). Borkowski (Note 1) reported that by aggregating several metamemory variables, higher metamemory–memory correlations are obtained than those reported by Cavanaugh and Borkowski (1980). In the only published paper in which this was done (Kurtz et al., 1982), the two metamemory–memory behavior correlations were .39 and .26, which are not large but are comparable to the highest correlations obtained in the Cavanaugh and Borkowski (1980) study.
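One way to see why item reliabilities this low cap the correlations that can be obtained is the classical attenuation relation from test theory. The relation is not stated in the studies cited; it is offered here only as an illustrative sketch. The symbols are introduced for this purpose: r_xx and r_yy denote the reliabilities of the metamemory and memory measures, and rho denotes the correlation between their true scores.

```latex
\begin{align*}
r_{xy} &= \rho_{T_x T_y}\,\sqrt{r_{xx}\,r_{yy}} \;\le\; \sqrt{r_{xx}\,r_{yy}} \\[4pt]
\text{with } r_{xx} = .38,\; r_{yy} = 1.00 &: \quad r_{xy} \le \sqrt{.38} \approx .62 \\
\text{with } r_{xx} = .38,\; r_{yy} = .38  &: \quad r_{xy} \le \sqrt{.38 \times .38} = .38
\end{align*}
```

The value r_yy = .38 in the last line is a hypothetical choice, because the reliability of the memory measures was unknown. Under these assumptions, even a perfect relationship between the underlying constructs could not produce an observed single-item correlation much above the values actually reported.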

As alluded to earlier, research on metacognitive development encompasses more than metamemory. One recent study of metacognitive reading skills provides compelling support for our aggregation arguments. Forrest (1980) administered several measures of knowledge of reading and language to elementary schoolers. When individual measures were correlated with actual reading scores, the correlations were very low with a mean of .17. Forrest also aggregated her scores into two categories (e.g., knowledge of decoding, knowledge of comprehension skills) and correlated aggregated scores with actual reading achievement. The correlations were impressively larger (M = .40), and thus, Forrest concluded that a relationship between metacognition and reading exists.

One argument against combining metamemory tests into composite measures has been that composite measures are not conceptually meaningful; the aggregate is a hodgepodge of diverse intellectual processes (see Cavanaugh & Borkowski, 1980). However, this argument does not preclude the construction of tests consisting of several items measuring individual metamemory skills. It is to be hoped that some attention will be given to the development of such tests. At a minimum, however, the search for metamemory–memory correlations must be better informed about the dependence of such correlations on aggregation effects. Forrest's (1980) work on metacognition and reading provides a model of how multi-item metacognitive scales for particular cognitive processes can be constructed and related to actual behavior, a model that metamemory researchers should find instructive.

Attitude–Behavior Relationships

In a well-known review of attitudes and attitude change, McGuire (1969) concluded that "the person's verbal report of his attitude has a rather low correlation with his actual behavior toward the object of the attitude" (p. 156). Fishbein and Ajzen (1974) subsequently explained this result in terms of the principle of aggregation. They pointed out that although a great deal of effort had gone into refining techniques for measuring attitudes, relatively little consideration had been given to the adequacy of measurements on the behavioral end of the attitude–behavior relationship. Hence, whereas attitudes were often measured by multi-item scales, the behavior to be predicted was usually a single act.

Fishbein and Ajzen (1974) proposed that multiple-act criteria be used on the behavioral side to see whether, under these circumstances, attitudes would predict behavior better. Using a variety of attitude scales to measure religious attitudes and a multiple-item self-report behavior scale to measure religious behaviors, they found that attitudes were related to multiple-act criteria but had no consistent relationship to single-act criteria. Whereas the various attitude scales had a mean correlation with single behaviors ranging from .14 to .19, their correlations with aggregated behavioral measures ranged from .70 to .90. Fishbein and Ajzen's (1974) paper provides another dramatic example of the principle of aggregation in action.
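The contrast between single-act and multiple-act criteria can be reproduced with simulated data. The sketch below is not a reanalysis of Fishbein and Ajzen's data. It assumes a perfectly measured attitude score, 20 acts that each correlate about .2 with that score, and independent act-specific error; the function names and all parameter values are hypothetical.

```python
import random
import statistics


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(n_people=500, n_acts=20, loading=0.2, seed=0):
    """Return (mean single-act r, r with the sum of all acts).

    Each act = loading * attitude + independent noise (assumed model).
    """
    rng = random.Random(seed)
    attitude = [rng.gauss(0, 1) for _ in range(n_people)]
    noise_sd = (1 - loading ** 2) ** 0.5  # keeps each act at unit variance
    acts = [
        [loading * a + rng.gauss(0, noise_sd) for a in attitude]
        for _ in range(n_acts)
    ]
    single_rs = [pearson(attitude, act) for act in acts]
    multiple_act = [sum(person) for person in zip(*acts)]  # aggregate criterion
    return statistics.fmean(single_rs), pearson(attitude, multiple_act)


if __name__ == "__main__":
    single_r, aggregate_r = simulate()
    print(f"mean single-act r: {single_r:.2f}")
    print(f"multiple-act r:    {aggregate_r:.2f}")
```

Under this model the expected single-act correlation equals the loading (.2), and the expected correlation with the 20-act sum is .2 × √20 / √(1 + 19 × .04), or about .67. The jump comes entirely from averaging out act-specific error, not from any change in the underlying attitude–behavior relationship.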

Predicting Social Behavior from Personality Traits

In a paper similar to Fishbein and Ajzen's, Jaccard (1974) examined why personality traits rarely correlate better than .3 with social behavior. He suggested that, as in the attitude–behavior literature, whereas much thought had been given to the creation of multiple-item personality scales, relatively little attention had been given to measurement of the to-be-predicted behaviors. Usually, it was a case of a highly reliable, multiple-item personality scale being correlated with a single item of behavior of unknown reliability and validity. Jaccard (1974) therefore carried out an investigation to determine whether the dominance scales from the California Psychological Inventory (Gough, 1957) and the Personality Research Form (Jackson, 1967) would predict self-reported dominance behaviors better in the aggregate than they would at the single-item level. The results were in accord with expectations. Whereas both personality scales had a mean correlation of .20 with individual behaviors, the aggregated correlations were .58 and .64.

The Role-Taking/Altruistic-Behavior Relationship

Earlier, we mentioned research on the construct of egocentrism (Piaget, 1932). A topic that is closely related to egocentrism is the notion of role taking ability. Essentially, it has been argued that around the age of 7 children "decenter" and henceforth are able to "take the role of another." There has been extensive research on the relationship between role taking and social behaviors such as altruism. Typically, the correlations have been low (e.g., .2 or .3), and specific relationships have often proved unreplicable. This has led reviewers to question both the adequacy of the egocentrism construct (Ford, 1979) and the adequacy of the methods used to measure it (Krebs & Russell, 1981; Rubin, 1978).

In Ford's (1979) review, as we have already discussed, it was noted that role taking tasks have had either low or nonsignificant correlations with each other. This led Ford to conclude that there was little or no support for the construct validity of egocentrism. He proposed, instead, an alternative interpretation based on task-specific and response-specific cognitive constructs. In Krebs and Russell's (1981) review of the same literature, a different conclusion was advanced. These authors proposed that an improved theory of egocentricity was needed. They stated, "It is largely due to the absence of an overriding theory, we contend, that studies have failed to conduct valid tests of the relationship among measures" (p. 148). They implied that with a better theory, improved measurement techniques would follow.

We suggest a third and more straightforward interpretation. When using role taking as a predictor variable, only one item has usually been administered. These items vary widely in reliability (Enright & Lapsley, 1980). Moreover, one item has normally been used for each to-be-predicted variable. Under such circumstances, it would be surprising to find correlations much higher than .2. But suppose that the role taking tasks were combined into a battery and an aggregated, composite score of role taking ability were used. A study of this sort has been reported by Elder (1982).

Elder administered batteries of four role taking tasks, five prosocial moral judgments, and several indices of altruism (including 5 laboratory measures) to 89 seven-year-old children. Teacher ratings and measures of play behavior were also obtained. The important finding for our purposes is that whereas the various correlations of each role taking task with each laboratory measure of altruism had a mean of .10, aggregated role taking scores correlated an average of .45 with an aggregated altruism score. Once again, spurious and misleading conclusions flow from a literature of single-measurement studies.

Additional support for our interpretation comes from the results of a meta-analysis of this literature by Underwood and Moore (1982). Meta-analysis refers to aggregating over independent studies rather than, as we have discussed so far, over independent measures within the same study. The basic principle, of course, is the same (i.e., that errors associated with any one measurement average out when measurements are combined). Underwood and Moore (1982) used a meta-analytical technique to assign exact probabilities to the results of a series of studies. On this basis, they concluded that reliable relationships are to be found between role taking and altruism.
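The specific technique that Underwood and Moore used is not described here. Purely to illustrate how probabilities from independent studies can be combined, the sketch below applies Stouffer's Z method, which is a substitute chosen for simplicity and not necessarily theirs. The p values are invented for the example.

```python
from statistics import NormalDist


def stouffer_combined_p(one_tailed_ps):
    """Combine independent one-tailed p values with Stouffer's Z method.

    Each p must lie strictly between 0 and 1 and refer to the same
    predicted direction. Returns (combined z, combined one-tailed p).
    """
    nd = NormalDist()
    zs = [nd.inv_cdf(1 - p) for p in one_tailed_ps]  # p -> standard normal deviate
    z = sum(zs) / len(zs) ** 0.5                     # unweighted combination
    return z, 1 - nd.cdf(z)


if __name__ == "__main__":
    # Hypothetical results: five studies, none significant on its own.
    ps = [0.20, 0.15, 0.30, 0.10, 0.25]
    z, p = stouffer_combined_p(ps)
    print(f"combined z = {z:.2f}, combined one-tailed p = {p:.3f}")
```

In this made-up case no single study reaches the .05 level, yet the combined probability does, which parallels the within-study aggregation effects described above.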

The Moral-Judgment/Altruistic-Behavior Relationship

Finally, we might note that whereas the studies concerned with the relationship between role taking and altruism present a confused picture, those concerned with the relationship between moral judgment and altruism appear to be a little more reliable (Blasi, 1980; Rushton, 1980). It also happens that the researchers who examine the relationship between moral judgment and altruism have typically combined responses to the several moral measures into an overall aggregate score.

Attachment

In the early 1970s there was great pessimism about whether there is a unique type of bond between mothers and their infants, that is, whether the intuitively appealing construct of infant–mother attachment was valid. This pessimism grew out of the results of studies that included a number of measures presumed to assess mother–infant attachment (e.g., visual regard toward mother by infant, vocalizing by infant to mother, touching of infant by mother). The general finding was that these "attachment" behaviors were not stable and did not correlate well with each other (e.g., Maccoby & Masters, 1970; Masters & Wellman, 1974). The conclusion that was drawn was that there is little evidence for attachment as a construct. The results were also interpreted as consistent with social learning theories of socialization. According to the social learning framework, individual attachment behaviors develop in interaction with specific environmental contingencies, contingencies that vary from behavior to behavior and from mother–infant pair to mother–infant pair. Thus, neither great consistency within mother–infant pairs nor between mother–infant pairs would be expected.

In contrast to this earlier research, recent studies of mother–infant attachment have relied on composite, categorical measures of mother–infant bonding (e.g., Ainsworth, Blehar, Waters, & Wall, 1978; Waters, 1978, 1981). Categories that have been used include proximity and contact seeking, contact maintaining, resistance, avoidance, search, and distance interaction, with each category composed of measures on several variables. For instance, the resistance category has included measures of "pushing away, throwing away, dropping, batting away, hitting, kicking, squirming to be put down, jerking away, stepping angrily, and resistance to being picked up or moved or restrained" (Ainsworth et al., 1978, p. 350). The reliabilities of these aggregate categorical variables are quite high. For example, in a comparison of the correlations between 12- and 18-month time samples of discrete mother–infant attachment behaviors versus the correlations between 12- and 18-month ratings on the mother–infant interactive categories, Waters (1978) reported a mean correlation of .12 for the discrete behaviors and a mean of .44 for the aggregated behavior categories.

By examining different patterns of outcomes, it has been possible to assign infants to reliable attachment categories. For example, a securely attached infant is high in proximity seeking, high in contact maintaining, low in proximity avoiding, and low in contact resisting. The development of this reliable classification scheme has greatly increased the volume of attachment research, including work on neonatal differences and subsequent attachment (e.g., Crockenberg, 1981; Waters, Vaughn, & Egeland, 1980), attachment and early maltreatment (e.g., Egeland & Sroufe, 1981), effects of day care on attachment (e.g., Anderson, Nagle, Roberts, & Smith, 1981; Vaughn, Gove, & Egeland, 1980), and the relationship between attachment and behavioral competence in later childhood (e.g., Arend, Gove, & Sroufe, 1979; Cohen & Beckwith, 1979; Matas, Arend, & Sroufe, 1978; Waters, Wippman, & Sroufe, 1979).

Substantial relationships have been reported in all of these investigations. However, the Arend et al. (1979) and Waters et al. (1979) studies are particularly noteworthy in the present context. In both of those studies, attachment classifications obtained during infancy were correlated with ego measures (interpersonal conduct scales) taken later in development. The ego scales consisted of multiple items with aggregate scores derived following the methodological recommendations discussed here. Waters et al. (1979) found substantial relationships between quality of attachment during infancy and interpersonal competence at age 3 1/2 years. Arend et al. (1979) also found relationships between attachment classification and behaviors of children 3 years later (at age 4 to 5 years). These results are especially striking when contrasted with the consistent finding of instability in attachment behaviors that emerged from research based on individual measures. As Masters and Wellman (1974) summarized, there was little stability whether "the intervening time was three minutes, one day, three months, four months, or longer" (p. 224).

Sex Differences: An Example of Aggregating Over Studies

In a major review of the sex difference literature, Maccoby and Jacklin (1974) concluded that the only sex differences that are fairly well established are (a) girls excel in verbal ability, (b) boys excel in visual-spatial ability, (c) boys are superior in mathematical ability, and (d) males are more aggressive. However, Block (1976) subsequently argued that this review was biased against finding sex differences due to inappropriate methods of combining data. Specifically, Block argued that many of the individual studies reviewed used single-item dependent variables of unknown reliability, and hence, they were potentially insensitive to sex differences. To examine this possibility, Block, after specifying the units to be combined, aggregated over studies to determine the proportion favoring males or females in higher mean score on each dimension. We present here a new tabulation based on Block's (1976) reanalysis (see Table 4).

Block's meta-analysis led her to rather different conclusions from Maccoby and Jacklin's: Block (1976) concluded that males are not only higher on spatial and quantitative abilities and aggressiveness, but also are

better on insight problems requiring restructuring, and more dominant and have a stronger, more potent self-concept, are more curious and exploring, more active, and more impulsive. (p. 307)

In addition, she suggested that females not only score higher on tests of verbal ability but also

express more fear, are more susceptible to anxiety, are more lacking in task confidence, seek more help and reassurance, maintain greater proximity to friends, score higher on social desirability, and, at the younger ages at which compliance has been studied, are more compliant with adults. (p. 307)

Table 4
Proportions of Studies Demonstrating Sex Differences Based on Block's (1976) Reanalysis of Maccoby and Jacklin's (1974) Literature Review

Ratio of significant comparisons to total number of comparisons

                                                    Girls and women         Boys and men
                                                  significantly higher   significantly higher
Behavior assessed                                  Ratio    Proportion    Ratio    Proportion

Cognitive dimensions
  Verbal abilities                                 45/160      .28        18/160      .09
  Spatial abilities                                 5/100      .05        35/100      .35
  Quantitative abilities                            6/35       .17        14/35       .40
  Analytic impulsivity                              6/80       .08        22/80       .28
  Breaking set: responses to "insight" problems     0/14       .00        12/14       .86
  Anagrams: breaking up words to form new words     4/10       .40         0/10       .00
  Descriptive, analytic sorting style               0/6        .00         1/6        .17
  Auditorially oriented                             6/26       .23         2/26       .08

Social dimensions
  Aggressiveness                                    5/94       .05        52/94       .55
  Fear, timidity, anxiety                          36/79       .46         0/79       .00
  Activity level                                    6/109      .06        39/109      .36
  Competitiveness                                   6/50       .12        14/50       .28
  Dominance                                         4/89       .05        35/89       .39
  Compliance & rule following                      26/51       .51         1/51       .02
  Nurturance, maternal behavior, helping,
    donating and sharing                           10/58       .17         7/58       .12
  Sociability                                      60/215      .28        36/215      .17
  Suggestibility                                   36/125      .29         8/125      .06
  Achievement orientation                           5/23       .22         4/23       .17
  Dependency                                       28/88       .32        10/88       .11
  Curiosity and exploration                         8/50       .16        20/50       .40
  Social desirability                               7/9        .78         0/9        .00

Self-concept
  Strength and potency of self concept              0/8        .00         7/8        .88
  Low self-esteem                                  20/84       .24        13/84       .16
  Confidence on task performance                    0/33       .00        25/33       .76

Other
  Tactile sensitivity                               5/13       .38         0/13       .00


Other investigators have also aggregated studies of sex differences (Cooper, 1979; Hall, 1978; Hoffman, 1977). For example, Hoffman (1977) investigated whether females demonstrate greater empathic responsivity than males to the suffering of another person. He examined 16 such studies. Although only a few of the 16 studies reported a statistically significant sex difference, 16 of the 16 found a mean difference in favor of greater female responsivity, a result highly improbable by chance. In Cooper's (1979) meta-analysis, males appeared to conform less to social pressure than females.
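How improbable is 16 out of 16? A sign test gives a rough answer. The test is our illustration and not necessarily the analysis Hoffman reported. It assumes that the 16 studies are independent and that, if there were no true sex difference, each would be equally likely to favor either sex.

```python
from math import comb


def sign_test_p(k, n):
    """One-tailed probability of k or more results out of n falling in the
    predicted direction when each direction has probability .5."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


if __name__ == "__main__":
    p = sign_test_p(16, 16)
    print(f"P(16 of 16 in the same predicted direction) = {p:.7f}")
```

Under these assumptions the probability is 1/65,536, or about .000015, even though few of the individual studies were significant on their own.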

These conclusions have implications for both theory and research in child development (Rushton, in press). Once again, the appropriate use of aggregation (this time over separate studies) establishes clear relationships otherwise obscured by single measurement methodology. Similar conclusions have recently been advanced by authors of meta-analyses of experimenter-expectancy research (Rosenthal, 1978) and psychotherapy research (Smith & Glass, 1977).

The Assessment of Emotionality in Animals

Many psychologists working on the causes of emotional development have studied laboratory animals because of the control this gave over experimental variables. Classic among these studies are those examining the effects of early experiences (Scott, Stewart, & De Ghett, 1974). Levine (1969), for example, examined the effects on later behavior of differing amounts and types of early stimulation. At first he simply handled newborn rats briefly during their first days and noticed that such early handling subsequently led the animals to be less emotionally reactive and to engage in more exploratory behavior than the nonhandled animals. In subsequent studies, Levine discovered that rats given mild electric shock early in their lives were less emotionally disrupted when subsequently exposed to stressors than were rats not given shock; the shocked rats even proved more resistant to disease.

The research strategy used by Levine involves "behavior matching," in which it is assumed that direct parallels exist between the behavior of other animals and that of humans. Specifically, situations are studied that elicit emotional behaviors assumed to be "the same" (i.e., under the control of the same underlying mechanism) in both animal and human subjects. One advantage of such a strategy is that a match can be made between the verbal reports provided by humans and the cytological, endocrinological, and neurological data obtained from animal subjects (Gray, 1982; Panksepp, 1982; Plutchik, 1980). The ultimate fruitfulness of such a strategy is, of course, open to dispute (Vanderwolf & Goodale, 1982). For our purposes, however, the interesting thing about such research is that it illustrates that it is as important to take account of aggregation when dependent variables are behavioral and the subjects are animals as it is in paper and pencil research with humans.

The emotional trait most often researched in the rat, for example, is fear, and the most commonly used measures of fear include defecation, ambulation (a measure of exploratory behavior presumed to be negatively related to defecation), and rearing (also a measure of exploratory behavior). If an emotional trait of fear exists in rats, then correlations across individual differences on these measures should be high (Gray, 1971). Some researchers, however, have challenged both the usefulness of the concept of fear in the rat and the particular indices used, on the ground that the different assessments only correlate around .2 or .3. Ossenkopp and Mazmanian (Note 2), however, have recently demonstrated that quite substantial correlations among measures ensue if the indices are aggregated over time.

In Ossenkopp and Mazmanian's study, ambulatory, rearing, and defecating behaviors of rats were measured in the Open Field Test over different time periods: 1-minute periods (4 minutes per test session), and sessions (four sessions at 48-hour intervals). The results showed that whereas the mean correlations across measures (in absolute value) on a minute by minute basis averaged .17, the mean correlation was .37 on a session by session basis, and the correlation was .55 over the total test period (see Table 5).
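The growth of these correlations with the length of the aggregation period is what the standard formula for the correlation between two sums would lead one to expect. The sketch below is illustrative only. It assumes that every pairing of a single observation of one measure with a single observation of the other has the same average correlation, r_xy, and that repeated observations of the same measure intercorrelate r_xx and r_yy on average. The function name is ours, and the within-measure value of .20 is hypothetical, because those correlations are not reported here.

```python
def composite_r(k, r_xy, r_xx, r_yy):
    """Correlation between two sums of k standardized observations each.

    r_xy: mean correlation between a single x and a single y observation
    r_xx, r_yy: mean correlation among repeated observations of x (or y)
    """
    denominator = ((1 + (k - 1) * r_xx) * (1 + (k - 1) * r_yy)) ** 0.5
    return k * r_xy / denominator


if __name__ == "__main__":
    # 16 one-minute observations (4 minutes x 4 sessions), a minute-level
    # correlation of -.17 as in Table 5, and an assumed within-measure
    # correlation of .20 (hypothetical).
    for k in (1, 4, 16):
        print(k, round(composite_r(k, -0.17, 0.20, 0.20), 2))
```

With these made-up inputs the correlation rises in magnitude from .17 for single minutes to about .68 for 16 aggregated observations. This is the same qualitative pattern as in Table 5, although the exact values depend on the assumed within-measure correlations.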


Experimental Studies of Social and Intellectual Development

Finally, we consider an illustration of how failures to aggregate are capable of producing conclusions about the relative contributions of learning to different areas of development that may be incorrect. The areas in question are social development, on the one hand, and intellectual development, on the other. With respect to social development, it is considered well established that observational learning from models has powerful effects on both the development of aggression (Bandura, 1973) and the development of altruism (Rushton, 1980). These findings, but especially the former, have prompted governmental concern about possible inadvertent learning from television (United States Department of Health and Human Services, 1982). Concerning intellectual development, it is equally well known that intervention programs designed to boost children's intelligence, some of them employing observational learning, have achieved only modest success (e.g., Detterman & Sternberg, 1982).

The apparent difference in the relative malleability of social and intellectual development has been explained in various ways by theoreticians. One leading interpretation is that intellectual development is controlled by variables that are "structural" and, therefore, minimally susceptible to learning, whereas social development is controlled by variables that are "motivational" and, therefore, more susceptible to learning (Bandura, 1977; Endler, 1981; Mischel, 1968; Rushton & Endler, 1977). An analysis of the dependent variables used in the two types of studies, however, suggests a much simpler interpretation based on the aggregation principle.

In modeling studies of children's aggression and altruism, a single dependent variable is typically used to measure the behavior, for example, the number of punches delivered to a Bobo doll in the case of aggression (e.g., Bandura, 1973) or the number of tokens donated to a charity in the case of altruism (e.g., Rushton, 1980). In intellectual training studies, however, multiple-item dependent variables such as standardized educational tests, standardized achievement tests, and standardized intelligence tests are typically used (Detterman & Sternberg, 1982). Throughout this article we have stressed that the low reliability of nonaggregated measures can mask strong underlying relationships between variables. In the case of learning studies, it can have essentially the opposite effect: It is always easier to produce a change in some trait as a consequence of learning when a single, less stable measure of the trait is taken than when more stable, multiple measures are taken. This fact may explain why modeling studies of social development have generally been more successful than training studies of intellectual development.

Table 5
Effects of Aggregating Open Field Scores for Ambulation, Defecation, and Rearing in Rats (After Ossenkopp & Mazmanian, Note 2)

Open field measure              Mean r    Range

Ambulation with defecation
  Minutes                        -.17      .01 to -.31
  Sessions                       -.39      .01 to -.56
  Total                          -.59
Rearing with defecation
  Minutes                        -.16     -.01 to -.32
  Sessions                       -.35     -.01 to -.63
  Total                          -.49

It is important to add here that we are not claiming either (a) that social and intellectual development are equally malleable or (b) that observational learning does not have a major impact on social development. Concerning the former, our contention is merely that no conclusion is possible until the relative degrees of aggregation in the two types of studies have been more precisely equated. Concerning the latter, there is no denying the importance of observational learning for social development in an absolute sense (Bandura, 1977). The effects of television, for example, have now been demonstrated from several methodological perspectives (USDHHS, 1982).

The implications of the principle of aggregation for experimental research are as important as those for correlational studies. The nongeneralizability of experimental findings to similar situations and, sometimes, their nonreplicability have been sources of concern (Greenwald, 1975). We suggest that if the principle of aggregation is attended to in the construction of dependent variables, stronger empirical generalizations will occur. The necessity to sample a variety of items in the dependent variables of experiments is an argument made long ago by Brunswik (1947) and reiterated more recently by Epstein (1980). Recently, some investigators have taken to gathering multiple dependent variables and then analyzing them using multivariate analysis of variance (MANOVA) statistical designs. This, however, should not be confused with what we recommend. We are suggesting instead that researchers aggregate dependent variables prior to conducting their statistical analyses.
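The recommendation to aggregate dependent variables before analysis can be sketched in code. The following is a minimal illustration, not a procedure prescribed in this article: it assumes each dependent variable is a list of scores in the same subject order, standardizes each variable, and averages the z scores with unit weights into one composite per subject. The function names and the toy data are hypothetical.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a list of scores to mean 0 and (population) SD 1."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        # A constant variable carries no information; contribute zeros.
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


def aggregate_dependent_variables(columns):
    """Combine several dependent variables into one composite score.

    columns: dict mapping a variable name to a list of scores, with all
    lists in the same subject order. Unit weighting of z scores is an
    assumption of this sketch, not the only defensible choice.
    """
    standardized = [zscores(col) for col in columns.values()]
    n_subjects = len(standardized[0])
    return [mean(col[i] for col in standardized) for i in range(n_subjects)]


if __name__ == "__main__":
    # Hypothetical scores for five children on three altruism measures.
    data = {
        "tokens_donated": [0, 2, 3, 5, 6],
        "minutes_helping": [1, 1, 4, 3, 7],
        "items_shared": [2, 3, 3, 6, 5],
    }
    print(aggregate_dependent_variables(data))
```

The composite would then be submitted to a single univariate analysis (for example, a t test or ANOVA on the composite scores), in contrast to a MANOVA, which keeps the individual variables separate and unaggregated.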

Concluding Remarks

We have reviewed 12 influential areas of developmental research in which there is a substantial but erroneous volume of opinion either that there is little consistency in target behavior or that hypothesized mediating constructs do not predict behavior. On the basis of our review, we conclude that these opinions often turn out to be too pessimistic when the focus is shifted from single dependent variables to more reliable aggregated measures.

Fortunately, there seems to be a growing realization that the principle of aggregation applies to behavioral measures as well as to paper-and-pencil tests. Others who have recently noted that single behavioral measures have lower predictive power than the average of many include Hogan, DeSoto, and Solano (1977), Green (1978), Epstein (1979, 1980), Jackson and Paunonen (1980), Rushton, Jackson, and Paunonen (1981), and Wiggins (1981).

Despite the more optimistic picture of construct validation research in developmental psychology that emerges from this review, our position is a conservative one. Essentially, we are saying (and illustrating) only what others have said before in other contexts (e.g., Campbell & Fiske, 1959), namely that due consideration must be given to the reliability of one's measurements. Single measures are typically less reliable than multiple measures, and using less reliable measures necessarily attenuates empirical relationships. Although these points have long been recognized in paper-and-pencil research, they have generally been ignored when dependent variables are behavioral.
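The two classical psychometric results behind this argument can be written out as a worked example. The formulas are the standard attenuation and Spearman-Brown relations; the numerical values are hypothetical and chosen only for illustration. Here $\rho_{xy}$ is the correlation between true scores, $r_{xx}$ and $r_{yy}$ are the reliabilities of the two measures, $r_{11}$ is the reliability of a single measurement, and $k$ is the number of parallel measurements aggregated.

```latex
% Attenuation: unreliability shrinks the observed correlation.
r_{xy} = \rho_{xy}\,\sqrt{r_{xx}\, r_{yy}}

% Spearman-Brown: reliability of an aggregate of k parallel measurements.
r_{kk} = \frac{k\, r_{11}}{1 + (k - 1)\, r_{11}}

% Hypothetical example: r_{11} = .20 for both measures, \rho_{xy} = .60.
% Single measurements:
r_{xy} = .60 \times \sqrt{.20 \times .20} = .12
% Aggregates of k = 10 measurements each:
r_{kk} = \frac{10 \times .20}{1 + 9 \times .20} = \frac{2.0}{2.8} \approx .71,
\qquad
r_{xy} \approx .60 \times .71 \approx .43
```

Under these assumed values, a relationship that appears negligible (.12) with single measurements becomes moderate (.43) once each variable is an aggregate of 10 measurements, with no change in the underlying relationship.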

Of course, there are occasions when aggregation is unnecessary, and even harmful. In clinical contexts, for example, when highly robust phenomena such as reflexes and overlearned habits are being studied, it may often be more useful to focus on a specific response than on general characteristics of the person (Bandura, 1969; Mischel, 1968). Similarly, aggregation may not strengthen empirical relationships when potent, ego-involving events are being investigated, as in Epstein and Fenz's (1965) study of anxiety in sport parachuting. Likewise, there may be little gain from aggregating self-ratings or ratings by others when these are based on impressions gathered over several past observations (Epstein, 1980). Finally, there is inappropriate aggregation, as when one aggregates over unreliable or poor items or studies. In this respect, for example, Eysenck (1978) characterized Smith and Glass's (1977) meta-analysis of psychotherapy outcome studies as an exercise in "mega-silliness" because it aggregated over a hodge-podge of methodologically poor studies.

In closing, we reiterate that whenever there is the possibility of unreliability of measures, then aggregation becomes a desideratum. This is true whether one is measuring a construct or behaviors that might be predicted by the construct. Aggregation can be accomplished through multiple measures of the same variable or measurement of multiple variables that are believed to be related. Aggregation can also be accomplished by combining the ratings of various observers or combining the results of various behavioral or paper-and-pencil assessments, as well as by combining the results of a number of studies directed at the same issue. Examples of all of these approaches have been provided herein. Of course, not all types of aggregation are equally appropriate for all purposes. For example, if one is testing for a trait it is not sufficient to aggregate over the same item many times (as recently done by Mischel & Peake, 1982). To provide an index of a hypothesized trait, it is necessary to aggregate alternative assessments of the same underlying concept.

The question of what to aggregate is always an issue. In this review, we were limited to published data that were amenable to analysis from both single and multiple measurement perspectives. In future work, the logical step would be to move in the direction of batteries of tests for particular constructs. Pilot work concerned with the psychometric properties of measurement has to be accepted as part of development research, as it has long been in the field of testing. This requires sampling a variety of items and stimuli and then aggregating over only those that demonstrate desired properties. Among these are high test-retest reliabilities, high item-total correlations, and both convergent and divergent validity. In short, as developmental researchers, we must become more concerned with the construct validity of our measures (Anastasi, 1982; Campbell & Fiske, 1959).
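One of the screening properties named above, the item-total correlation, can be sketched as follows. This is an illustrative sketch only: it uses the corrected item-total correlation (each item against the sum of the remaining items), and the retention threshold of .30 is a hypothetical convention chosen for the example, not a value taken from this article. The function names and data are likewise hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_total(items):
    """Correlate each item with the sum of all other items.

    items: list of score lists, one list per item, all in the same
    subject order. Excluding the item from its own total avoids
    inflating the correlation.
    """
    n_subjects = len(items[0])
    correlations = []
    for i, item in enumerate(items):
        rest = [
            sum(other[s] for j, other in enumerate(items) if j != i)
            for s in range(n_subjects)
        ]
        correlations.append(pearson(item, rest))
    return correlations


def select_items(items, threshold=0.30):
    """Indices of items whose corrected item-total r meets the threshold."""
    return [i for i, r in enumerate(corrected_item_total(items)) if r >= threshold]


if __name__ == "__main__":
    # Hypothetical pilot data: four candidate items, five subjects.
    pilot = [
        [1, 2, 3, 4, 5],
        [2, 3, 4, 5, 6],
        [1, 3, 2, 5, 4],
        [5, 1, 4, 2, 3],
    ]
    print(select_items(pilot))
```

Only the retained items would then be summed or averaged into the aggregate. A fuller screening would also examine test-retest reliability and convergent and divergent validity, which this sketch does not attempt.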

Reference Notes

1. Borkowski, J. G. Personal communication to M. Pressley, January 1982.

2. Ossenkopp, K.-P., & Mazmanian, D. The principle of aggregation in psychobiological correlational research: An example from the open field test. Unpublished manuscript, University of Western Ontario, 1983. (Available from Dr. K.-Peter Ossenkopp, Department of Psychology, University of Western Ontario, London, Ontario, Canada N6A 5C2)

References

Ainsworth, M. D. S., Blehar, M. C., Waters, E., & Wall, S. Patterns of attachment. New York: Erlbaum, 1978.

Anastasi, A. Psychological testing (5th ed.). New York: Macmillan, 1982.

Anderson, C. W., Nagle, R. J., Roberts, W. A., & Smith, J. W. Attachment to substitute caregivers as a function of center quality and caregiver involvement. Child Development, 1981, 52, 53-61.

Arend, R. A., Gove, F. L., & Sroufe, L. A. Continuity of individual adaptation from infancy to kindergarten: A predictive study of ego-resiliency and curiosity in preschoolers. Child Development, 1979, 50, 950-959.

Bandura, A. Principles of behavior modification. New York: Holt, Rinehart & Winston, 1969.

Bandura, A. Aggression: A social learning analysis. Englewood Cliffs, N.J.: Prentice-Hall, 1973.

Bandura, A. Social learning theory. Englewood Cliffs, N.J.: Prentice-Hall, 1977.

Berzonsky, M. D. Interdependence of Inhelder and Piaget's model of logical thinking. Developmental Psychology, 1971, 4, 469-476.

Blasi, A. Bridging moral cognition and moral action: A critical review of the literature. Psychological Bulletin, 1980, 88, 1-45.

Block, J. The Q-sort method in personality assessment and psychiatric research. Springfield, Ill.: Charles C Thomas, 1961.

Block, J. Lives through time. Berkeley, Calif.: Bancroft Books, 1971.

Block, J. Some enduring and consequential structures of personality. In A. I. Rabin, J. Aronoff, A. M. Barclay, & R. A. Zucker (Eds.), Further explorations in personality. New York: Wiley, 1981.

Block, J. H. Issues, problems, and pitfalls in assessing sex differences: A critical review of The Psychology of Sex Differences. Merrill-Palmer Quarterly, 1976, 22, 283-308.

Brainerd, C. J. Cognitive development and concept learning: An interpretative review. Psychological Bulletin, 1977, 84, 919-939. (a)

Brainerd, C. J. Response criteria in concept development research. Child Development, 1977, 48, 360-366. (b)

Brainerd, C. J. Piaget's theory of intelligence. Englewood Cliffs, N.J.: Prentice-Hall, 1978. (a)

Brainerd, C. J. The stage question in cognitive-developmental theory. The Behavioral and Brain Sciences, 1978, 1, 173-213. (b)

Brainerd, C. J. Continuing commentary. The Behavioral and Brain Sciences, 1979, 2, 137-154.

Brim, O. G., Jr., & Kagan, J. (Eds.). Constancy and change in human development. Cambridge, Mass.: Harvard University Press, 1980.

Brown, A. L., & Campione, J. C. Training strategic study time apportionment in educable retarded children. Intelligence, 1977, 1, 94-107.

Brunswik, E. Systematic and representative design of psychological experiments. Berkeley, Calif.: University of California Press, 1947.

Burton, R. V. Generality of honesty reconsidered. Psychological Review, 1963, 70, 481-499.

Burton, R. V. Honesty and dishonesty. In T. Lickona (Ed.), Moral development and behavior. New York: Holt, Rinehart & Winston, 1976.

Campbell, D. T., & Fiske, D. W. Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 1959, 56, 81-105.

Cavanaugh, J. C., & Borkowski, J. G. Searching for metamemory-memory connections: A developmental study. Developmental Psychology, 1980, 16, 441-453.

Cohen, S. E., & Beckwith, L. Preterm infant interaction with the caregiver in the first year of life and competence at age two. Child Development, 1979, 50, 767-776.

Cooper, H. M. Statistically combining independent studies: Meta-analysis of sex differences in conformity research. Journal of Personality and Social Psychology, 1979, 37, 131-146.

Crockenberg, S. B. Infant irritability, mother responsiveness, and social support influences on the security of infant-mother attachment. Child Development, 1981, 52, 857-865.


Cronbach, L. J. The two disciplines of scientific psychology. American Psychologist, 1957, 12, 671-684.

Detterman, D. K., & Sternberg, R. J. (Eds.), How and how much can intelligence be increased? Norwood, N.J.: Ablex, 1982.

Dodwell, P. C. Children's understanding of number concepts: Characteristics of an individual and a group test. Canadian Journal of Psychology, 1961, 15, 29-36.

Egeland, B., & Sroufe, L. A. Attachment and early maltreatment. Child Development, 1981, 52, 44-52.

Eichorn, D. H., Clausen, J. A., Haan, N., Honzik, M. P., & Mussen, P. H. (Eds.), Present and past in middle life. New York: Academic Press, 1981.

Eller, J. The relationship between role-taking skills, prosocial reasoning and prosocial behavior. Unpublished doctoral dissertation, University of Western Ontario, 1982.

Elkind, D. The development of quantitative thinking: A systematic replication of Piaget's studies. Journal of Genetic Psychology, 1961, 98, 37-46.

Endler, N. S. Persons, situations, and their interactions. In A. I. Rabin, J. Aronoff, A. M. Barclay, & R. A. Zucker (Eds.), Further explorations in personality. New York: Wiley, 1981.

Enright, R. D., & Lapsley, D. K. Social role-taking: A review of the constructs, measures, and measurement properties. Review of Educational Research, 1980, 50, 647-674.

Epstein, S. The stability of behavior: I. On predicting most of the people much of the time. Journal of Personality and Social Psychology, 1979, 37, 1097-1126.

Epstein, S. The stability of behavior: II. Implications for psychological research. American Psychologist, 1980, 35, 790-806.

Epstein, S., & Fenz, W. D. Steepness of approach and avoidance gradients in humans as a function of experience. Journal of Experimental Psychology, 1965, 70, 1-12.

Eron, L. D. Prescription for reductions of aggression. American Psychologist, 1980, 35, 244-252.

Eysenck, H. J. The validity of judgments as a function of the number of judges. Journal of Experimental Psychology, 1939, 25, 650-654.

Eysenck, H. J. The structure of human personality. New York: Macmillan, 1970.

Eysenck, H. J. An exercise in mega-silliness. American Psychologist, 1978, 33, 517.

Eysenck, H. J. (Ed.), A model for personality. New York: Springer, 1981.

Fishbein, M., & Ajzen, I. Attitudes towards objects as predictors of single and multiple behavioral criteria. Psychological Review, 1974, 81, 59-74.

Flavell, J. H. The developmental psychology of Jean Piaget. Princeton, N.J.: Van Nostrand, 1963.

Flavell, J. H., & Wellman, H. M. Metamemory. In R. V. Kail & J. W. Hagen (Eds.), Perspectives on the development of memory and cognition. Hillsdale, N.J.: Erlbaum, 1977.

Ford, M. E. The construct validity of egocentrism. Psychological Bulletin, 1979, 86, 1169-1188.

Forrest, D. L. Cognitive and meta-cognitive aspects of reading (Doctoral dissertation, University of Waterloo, 1980). Dissertation Abstracts International, 1980, 41, 1134-B.

Gordon, K. Group judgments in the field of lifted weights. Journal of Experimental Psychology, 1924, 7, 398-400.

Gough, H. G. California Psychological Inventory Manual. Palo Alto, Calif.: Consulting Psychologists Press, 1957.

Gray, J. A. The psychology of fear and stress. London: Weidenfeld and Nicolson, 1971.

Gray, J. A. The neuropsychology of anxiety: An enquiry into the functions of the septo-hippocampal system. New York: Oxford University Press, 1982.

Green, B. F., Jr. In defense of measurement. American Psychologist, 1978, 33, 664-670.

Greenwald, A. G. Consequences of prejudice against the null hypothesis. Psychological Bulletin, 1975, 82, 1-20.

Gulliksen, H. Theory of mental tests. New York: Wiley, 1950.

Hall, J. A. Gender effects in decoding nonverbal cues. Psychological Bulletin, 1978, 85, 845-857.

Hartshorne, H., & May, M. A. Studies in the nature of character: Vol. 1. Studies in deceit. New York: Macmillan, 1928.

Hartshorne, H., May, M. A., & Maller, J. B. Studies in the nature of character: Vol. 2. Studies in self-control. New York: Macmillan, 1929.

Hartshorne, H., May, M. A., & Shuttleworth, F. K. Studies in the nature of character: Vol. 3. Studies in the organization of character. New York: Macmillan, 1930.

Hoffman, M. L. Sex differences in empathy and related behaviors. Psychological Bulletin, 1977, 84, 712-722.

Hogan, R., DeSoto, C. B., & Solano, C. Traits, tests, and personality research. American Psychologist, 1977, 32, 255-264.

Hooper, F. H., Brainerd, C. J., & Sipple, T. S. A representative series of Piagetian concrete operations tasks. Madison: Wisconsin Research and Development Center for Cognitive Learning, 1975.

Hooper, F. H., Swinton, S. S., & Sipple, T. S. Logical reasoning in middle childhood: A study of the Piagetian concrete operations stage. In H. J. Klausmeier & Associates (Eds.), Cognitive learning and development: Piagetian and information-processing perspectives. Cambridge, Mass.: Ballinger, 1979.

Hooper, F. H., & Toniolo, T. A. A longitudinal analysis of logical reasoning relationships: Conservation and transitive inference. Madison: Wisconsin Research and Development Center for Cognitive Learning, 1974.

Hooper, F. H., Toniolo, T. A., & Sipple, T. S. A longitudinal analysis of logical reasoning relationships: Conservation and transitive inference. Developmental Psychology, 1978, 14, 674-682.

Jaccard, J. J. Predicting social behavior from personality traits. Journal of Research in Personality, 1974, 7, 358-367.

Jackson, D. N. Personality Research Form Manual. Goshen, N.Y.: Research Psychologists Press, 1967.

Jackson, D. N., & Paunonen, S. V. Personality structure and assessment. In M. R. Rosenzweig & L. W. Porter (Eds.), Annual Review of Psychology (Vol. 31). Palo Alto, Calif.: Annual Reviews, 1980.

Kelly, M., Scholnick, E. F., Travers, S. H., & Johnson, J. W. Relations among memory, memory appraisal, and memory strategies. Child Development, 1976, 47, 648-659.

Kenrick, D. T., & Stringfield, D. O. Personality traits and the eye of the beholder: Crossing some traditional philosophical boundaries in the search for consistency in all of the people. Psychological Review, 1980, 87, 88-104.

Knight, H. C. A comparison of the reliability of group and individual judgments. Unpublished master's thesis, Columbia University, 1921.

Krebs, D. L., & Russell, C. Role-taking and altruism: When you put yourself in the shoes of another, will they take you to their owner's aid? In J. P. Rushton & R. M. Sorrentino (Eds.), Altruism and helping behavior: Social, personality and developmental perspectives. Hillsdale, N.J.: Erlbaum, 1981.

Kreutzer, M. A., Leonard, C., & Flavell, J. H. An interview study of children's knowledge about memory. Monographs of the Society for Research in Child Development, 1975, 40(1, Serial No. 159).

Kurtz, B. E., Reid, M. K., Borkowski, J. G., & Cavanaugh, J. C. On the reliability and validity of children's metamemory. Bulletin of the Psychonomic Society, 1982, 19, 137-140.

Levine, S. Infantile stimulation: A perspective. In J. A. Ambrose (Ed.), Stimulation in early infancy. New York: Academic Press, 1969.

Lord, F. M., & Novick, M. R. Statistical theories of mental test scores. Reading, Mass.: Addison-Wesley, 1968.

Maccoby, E. E., & Jacklin, C. N. The psychology of sex differences. Palo Alto, Calif.: Stanford University Press, 1974.

Maccoby, E. E., & Masters, J. C. Attachment and dependency. In P. H. Mussen (Ed.), Carmichael's manual of child psychology (Vol. 2, 3rd ed.). New York: Wiley, 1970.

Maller, J. B. General and specific factors in character. Journal of Social Psychology, 1934, 5, 97-102.

Masters, J. C., & Wellman, H. The study of human infant attachment: A procedural critique. Psychological Bulletin, 1974, 81, 218-237.

Matas, L., Arend, R. A., & Sroufe, L. A. Continuity of adaptation in the second year: The relationship between quality of attachment and later competence. Child Development, 1978, 49, 547-556.

McGuire, W. J. The nature of attitudes and attitude change. In G. Lindzey & E. Aronson (Eds.), The handbook of social psychology (Vol. 3, 2nd ed.). Reading, Mass.: Addison-Wesley, 1969.

Mischel, W. Personality and assessment. New York: Wiley, 1968.

Mischel, W. Toward a cognitive social learning reconceptualization of personality. Psychological Review, 1973, 80, 252-283.

Mischel, W., & Peake, P. K. Beyond deja vu in the search for cross-situational consistency. Psychological Review, 1982, 89, 730-755.

Moskowitz, D. S., & Schwarz, J. C. Validity comparison of behavior counts and ratings by knowledgeable informants. Journal of Personality and Social Psychology, 1982, 42, 518-528.

Panksepp, J. Toward a general psychobiological theory of emotions. The Behavioral and Brain Sciences, 1982, 5, 407-467.

Piaget, J. The moral judgment of the child. London: Routledge & Kegan Paul, 1932.

Piaget, J. Classes, relations et nombres: Essai sur les "groupements" de la logistique et la réversibilité de la pensée. Paris: Vrin, 1941.

Piaget, J. Traité de logique. Paris: Colin, 1949.

Plutchik, R. Emotion: A psychoevolutionary synthesis. New York: Harper & Row, 1980.

Pressley, M., Borkowski, J. G., & O'Sullivan, J. Children's metamemory and the teaching of memory strategies. In D. Forrest-Pressley, G. E. MacKinnon & T. G. Waller (Eds.), Metacognition, cognition, and human performance. New York: Academic Press, in press.

Rosenthal, R. Combining results of independent studies. Psychological Bulletin, 1978, 85, 185-193.

Rubin, K. H. Role-taking in childhood: Some methodological considerations. Child Development, 1978, 49, 428-433.

Rubin, Z. Does personality really change after 20? Psychology Today, 1981, 15, 18-27.

Rushton, J. P. Altruism, socialization, and society. Englewood Cliffs, N.J.: Prentice-Hall, 1980.

Rushton, J. P. Sociobiology: Toward a theory of individual and group differences in personality and social behavior. In J. R. Royce (Ed.), Annals of theoretical psychology (Vol. 2). New York: Plenum Press, in press.

Rushton, J. P., & Endler, N. S. Person by situation interactions in academic achievement. Journal of Personality, 1977, 45, 298-309.

Rushton, J. P., Jackson, D. N., & Paunonen, S. V. Personality: Nomothetic or idiographic? A response to Kenrick and Stringfield. Psychological Review, 1981, 88, 582-589.

Rushton, J. P., Murray, H. G., & Paunonen, S. V. Personality, research creativity and teaching effectiveness in university professors. Scientometrics, 1983, 5, 93-116.

Salatas, H., & Flavell, J. H. Behavioral and meta-mnemonic indicators of strategic study behavior under remember instructions in first grade. Child Development, 1976, 47, 80-89.

Scheffman, J. The effects of individual and group play experience on preschoolers' perspective taking, referential communication and number conservation abilities. Unpublished doctoral dissertation, University of Western Ontario, 1981.

Scott, J. P., Stewart, J. M., & De Ghett, V. J. Critical periods in the organization of systems. Developmental Psychobiology, 1974, 7, 489-513.

Smith, M. L., & Glass, G. V. Meta-analysis of psychotherapy outcome studies. American Psychologist, 1977, 32, 752-760.

Spearman, C. Correlation calculated from faulty data. British Journal of Psychology, 1910, 3, 271-295.

Stevens, S. S. Experimental Psychology. New York: Wiley, 1951.

Stevens, S. S. Psychophysics and social scaling. Morristown, N.J.: General Learning Press, 1972.

Underwood, B., & Moore, B. Perspective-taking and altruism. Psychological Bulletin, 1982, 91, 143-173.

United States Department of Health and Human Services. Report. Television and behavior: Ten years of progress and implications for the eighties (Vol. 1, Summary Report; Vol. 2, Technical Reviews). Washington, D.C.: United States Government Printing Office, 1982.

Vanderwolf, C. H., & Goodale, M. A. Does introspection have a role in brain-behavior research? The Behavioral and Brain Sciences, 1982, 5, 448.

Vaughn, B. E., Gove, F. L., & Egeland, B. The relationship between out-of-home care and the quality of infant-mother attachment in an economically disadvantaged population. Child Development, 1980, 51, 1203-1214.

Waters, E. The reliability and stability of individual differences in infant-mother attachment. Child Development, 1978, 49, 483-494.

Waters, E. Traits, behavioral systems, and relationships: Three models of infant-adult attachment. In K. Immelmann, G. W. Barlow, L. Petrinovich, & M. Main (Eds.), Behavioral development. London: Cambridge University Press, 1981.

Waters, E., Vaughn, B. E., & Egeland, B. R. Individual differences in infant-mother attachment relationships at age one: Antecedents in neonatal behavior in an urban, economically disadvantaged sample. Child Development, 1980, 51, 208-216.

Waters, E., Wippman, J., & Sroufe, L. A. Attachment, positive affect, and competence in the peer group: Two studies in construct validation. Child Development, 1979, 50, 821-829.

Wiggins, J. S. Clinical and statistical prediction: Where are we and where do we go from here? Clinical Psychology Review, 1981, 1, 3-18.

Woodworth, R. S., & Schlosberg, H. Experimental psychology. New York: Holt, 1939.

Received July 1, 1982
Revision received January 21, 1983
