
Evaluation and Program Planning, Vol. 2, pp. 25-32, 1979. Printed in the U.S.A. All rights reserved.

0149-7189/79/010025-08$02.00/0 Copyright © 1979 Pergamon Press Ltd

DIFFERENTIAL VALIDITY:

ANOTHER THREAT TO COMPENSATORY EDUCATION EVALUATIONS

LINDA HEATH

Northwestern University

ABSTRACT

The effects of the changing nature of intelligence test items during the follow-up period for compensatory education program evaluations on the judgements about the efficacy of such programs are examined. Data generated from Stanford-Binet test items, assuming known “true” underlying motor and verbal scores, as well as data from three compensatory education evaluations, are examined. The changing nature of the IQ test items, i.e., the differential validity of the tests, is put forth as an explanation for the often disheartening results of such program evaluations. Methods for dealing with the problem of differential validity are suggested.

Social scientists are reanalyzing the evaluations of the compensatory education programs of the 1960s, trying to understand what went wrong and looking for evidence of something that went right. Despite the wide range of goals set for the compensatory education programs, evaluators almost always chose to assess the programs’ impact on children’s scholastic skills, usually measured via standardized intelligence tests. Intelligence tests purport to tap what has been described by Spearman (1904) as g, or general factor, and by Terman (1916) as the individual’s capacity to think abstractly and use abstract symbols. Intelligence tests administered to young children commonly use primarily motor tasks, such as picture drawing, block building, and identifying parts of a doll’s face by pointing to them (e.g., the Stanford-Binet, 1974; the California Preschool Scale [Jaffa], 1934). Other childhood intelligence tests involve interviewing a parent to assess such things as the child’s ability to eat with a spoon, pedal a tricycle, count to five, and speak in sentences (e.g., Preschool Educational Attainment Scale; Hammond-Skipper Pre-school Achievement Rating Scale, cited in Johnson & Bommarito, 1971). During the period of evaluation for compensatory education programs (ages 2 to 10), intelligence test items become increasingly verbal. That is, the test items involve verbal instructions rather than physical demonstration of the task to be done and verbal rather than motor responses. Verbal items include tests of vocabulary, comprehension, and memory. This gradual motor-to-verbal shift in testing items could interact with pre-existing differences between the experimental and control groups to make the compensatory education programs look less effective than they in fact were.

The typical pattern of intelligence scores from evaluations of compensatory education programs is one of initial growth during the intervention followed by a gradual decline back to or below the original levels. This pattern of intelligence score decay has been found in compensatory education programs which have different theoretical bases and quite different intervention strategies. For example, the Infant Education Research Project (Schaefer & Aaronson, 1972) used home-based tutors to work with infants as young as fifteen months. Cognitive and conceptual abilities were stimulated through reading, pictures, games, and puzzles, in cooperation with other family members. In the Perry Preschool Project (Weikart, cited in Bronfenbrenner, 1974), on the other hand, children attended half-day classes for two years. In this program cognitive abilities were developed through a curriculum which was based on Piagetian theory. And the children in the Howard University Preschool Program (Herzog, cited in Bronfenbrenner, 1974) attended full-day classes at a nursery school and were then placed in an enriched school situation for the next three years. In spite of these different compensatory education strategies, the test scores of students in all the programs showed sharp decline following the cessation of the intervention. (See Figure 1.)

Figure 1. Patterns of results from three compensatory education evaluations: the Herzog et al., the Schaefer and Aaronson, and the Weikart et al., all cited in Bronfenbrenner (1974). The * indicates the end of treatment. (Horizontal axis: children’s ages in years.)

The author would like to thank Donald Campbell, Thomas Cook, Michael Hendricks, and Joel Moskowitz for helpful comments on earlier drafts of this paper. This research was supported in part by NSF Grant BNS76-23920. Requests for reprints should be sent to Linda Heath, Center for Urban Affairs, Northwestern University, 2046 Sheridan Road, Evanston, Illinois 60201.

This pattern of results has been interpreted as indicating that the interventions have little or only short-lived effects, either due to the nature of the programs or due to the increasing impact of environment on the children’s intelligence ratings. To these possibilities we must add that of differential validity. Differential validity refers to an instrumentation problem in which the validity with which an instrument assesses the underlying construct changes over time. The assessment of abstract reasoning through verbal rather than motor test items presents such a problem. Since the composition of intelligence tests for children shifts gradually from motor to verbal items (containing exclusively verbal components by age 10), one cannot be sure if changes in scores over this period reflect changes in underlying intelligence or merely changes in the test composition.

In the same way that the motor-to-verbal shift in intelligence test items is not abrupt, the shift is not clear-cut even within a single test item. Some early childhood items are purely motor (e.g., block building, in which the child duplicates a three-block tower constructed by the tester), and some are purely verbal (e.g., picture vocabulary, in which the child verbally identifies pictures the tester points to on a card). Others, however, fall into a classificatory gray area. Such items involve a verbal task with a motor response (e.g., facial parts identification, in which the experimenter says “Show me the dolly’s nose” and the child points to it). It is unclear whether these items assess a verbal component or some pre-verbal stimulus-response component. Additionally, a child from a minority group might not experience the same language handicaps with items requiring non-verbal responses as with items requiring verbal responses, since language processing and language generation involve separate abilities. Consequently, the pre-test would not reflect cultural bias to the same extent as the post-test. This situation biases findings against compensatory intervention programs. These gray area items have been separated from the purely verbal or motor items in the following analyses.

The Stanford-Binet (1947) is one of the most widely used intelligence tests and is often considered the benchmark by which other intelligence tests are evaluated. Table 1 presents the number of Verbal (involving both verbal instructions and responses), Motor (involving largely motor instructions and motor responses), and Mixed (as discussed above) items (including the alternate form) in each age group up to age 10 on the Stanford-Binet. As can be seen from Table 1, intelligence scores depend increasingly on verbal items as the follow-up period for compensatory education evaluations progresses. But how worrisome is this changing test composition?
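The shift described by Table 1 can be made concrete by computing the share of purely verbal items at each age level. The following is a minimal sketch, not part of the original analysis: the function name `verbal_share` and the pooling into ages 2 to 5 versus ages 6 to 10 are illustrative choices, and the age-3 verbal count is read as 2 on the assumption that this row totals seven items like every other row.

```python
# Item counts per age level as (verbal, mixed, motor), transcribed from Table 1.
# Assumption: the age-3 verbal count is taken to be 2, which gives that row the
# same seven-item total as every other row.
TABLE_1 = {
    2.0: (2, 3, 2), 2.5: (3, 3, 1), 3.0: (2, 1, 4), 3.5: (2, 4, 1),
    4.0: (5, 2, 0), 4.5: (4, 2, 1), 5.0: (2, 0, 5), 6.0: (5, 1, 1),
    7.0: (6, 0, 1), 8.0: (7, 0, 0), 9.0: (5, 0, 2), 10.0: (7, 0, 0),
}


def verbal_share(rows):
    """Fraction of all items in `rows` that are purely verbal."""
    verbal = sum(v for v, _, _ in rows)
    total = sum(v + x + m for v, x, m in rows)
    return verbal / total


# Share at each age level, then pooled over the early and late age levels.
by_age = {age: verbal_share([counts]) for age, counts in TABLE_1.items()}
preschool = verbal_share([c for age, c in TABLE_1.items() if age <= 5])
school_age = verbal_share([c for age, c in TABLE_1.items() if age >= 6])

for age, share in by_age.items():
    print(f"age {age:>4}: {share:.2f}")
print(f"ages 2-5 pooled:  {preschool:.2f}")
print(f"ages 6-10 pooled: {school_age:.2f}")
```

Under this reading of the table, purely verbal items make up 20 of 49 items at ages 2 through 5 and 30 of 35 items at ages 6 through 10. The share does not rise at every step (age 5 is mostly motor), so the pooled figures give the clearer picture.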

TABLE 1. Number of verbal, mixed, and motor items on the Stanford-Binet (1947) for each age division, 2 through 10.

Age    No. Verbal    No. Mixed    No. Motor
2      2             3            2
2½     3             3            1
3      2             1            4
3½     2             4            1
4      5             2            0
4½     4             2            1
5      2             0            5
6      5             1            1
7      6             0            1
8      7             0            0
9      5             0            2
10     7             0            0

As Bloom (1964) has cautioned, one cannot be sure measurements of a characteristic at two points in time are comparable without “engaging in similar operational definitions of the characteristic” at both times (p. 8). Hofstaetter (1954) factor analyzed the composition of intelligence tests using data from the Berkeley Growth Study to examine the operationalization of factors within the tests. He isolated three principal factors which account for the variance in intelligence scores. The first factor, a sensori-motor alertness dimension, accounts for almost all the variance in IQs prior to age 20 months and for almost none of the variance after 40 months. The second factor, somewhat the opposite of the sensori-motor dimension, called by Hofstaetter the persistence or stubbornness (Trotzalter, literally “in spite of adults”) dimension, accounts for much of the variance between 20 and 40 months. The final factor, characterized by manipulation of symbols, accounts for almost all of the variance from 48 months on. Hofstaetter (1954, p. 163) concludes: “In the second decade of life it is practically only Factor III that accounts for the variance of the respective intelligence scores. The predictive value of so-called ‘intelligence tests’ administered before an age of 4 is rather doubtful.” And, as Thorndike (1940) has cautioned, causal interpretation should not be given to changing IQ scores at young ages, since such changes could be due to changing environments or due to the fact that “the aspects of mental functioning studied by early tests are rather different from those incorporated in school-age tests” (p. 174).

But are the testing items different at early ages only because the nature and organization of intelligence are different at those ages? That the intellect manifests itself differently at different ages and needs to be measured differently does not necessarily mean that it is not the same underlying construct. In addressing this issue, Piaget traced the changing intellect through various stages of cognitive development. According to him, children below two years of age are in the sensori-motor stage, followed by the pre-operational stage until age 7 (Piaget, 1967). Piaget maintains that in neither of the first two stages is the child capable of abstract reasoning, which is what intelligence tests are trying to measure. A young child’s intelligence must therefore be measured by proxy, using motor skills to gauge cognitive development. Granting the different nature of intelligence at very early ages, one must still establish the link between that form of intelligence measured by motor tasks and the mature form of intelligence which will influence school performance and adult functioning. Compensatory education programs are interested in improving scholastic skills, which means that early childhood intelligence scores are valuable only as they relate to later intelligence.

The literature gives very little evidence that early childhood intelligence tests do tap the same dimension as later intelligence tests. If early childhood IQ tests did tap the same dimension as later IQ tests, they should correlate with IQs derived from later tests, at least to a degree, assuming that intelligence ratings reflect a constant, underlying attribute. Of five different longitudinal studies reported in Bloom (1964), none found strong correlations between early IQs and IQ at age 10 until around age 4 or 5, which coincides with the increasing verbal component of the IQ tests. Bayley (1954) correlates early IQs with IQ at age 17 and finds correlations ranging from -.12 at 6 months to correlations consistently above .80 after age 7. Obviously some of the low correlation at the early ages is due to temporal erosion, but the increase in correlation corresponds surprisingly with the increasing verbal content of the tests used to assess IQ. The extremely low correlations between results from the California Preschool Test (used by Bayley) and results from adult intelligence measures are explained in Buros’ Mental Measurement Yearbook (1941, p. 1382) as follows:

The probable explanation appears to be that the content of the tests included in the infant scales may be psychologically so dissimilar from that of tests used with older children that no clear-cut relationship could be expected . . . [the test items are largely motor and perceptual] of a kind that have never been found to correlate highly with intelligence at any age.

Similarly, if early tests were tapping the same dimension as later IQ tests, and if IQ has any genetic or environmental base, one would expect children’s IQs to correlate somewhat with their parents’ IQs and socioeconomic standing. Studies correlating children’s IQs with their parents’ intelligence (Honzik, 1957) and with their parents’ educational levels (Honzik, 1957; Bayley, 1954; Bayley & Jones, 1937) show extremely low correlations prior to age four (r = .10 to .20). Around age four, the correlations rise to .40 to .50, peaking around .55 at age 16. The increase in the correlations coincides with the period of transition from motor to verbal items on the IQ tests. Likewise, children’s IQs and socio-economic measures show little correlation up to age two years (r = .10 to .15), then increasing correlation up to 6 years (r = .41) (Bayley & Jones, 1937). Finer measures of home environment, such as those developed by Hanson (1975) and Elardo, Bradley, and Caldwell (1975), also show low correlations with intelligence measures for very young children (age 6 months to 3 years). As Loevinger (1940, pp. 201-202) pointed out:

From birth until about 18 months the relation between developmental measures and socio-economic measures is slightly negative or zero. The increase in relation of the two types of measures from 18 months until some time between 3 and 5 years accompanies the increasingly verbal content of the mental tests.

How are motor tests related to socio-economic measures? They aren’t. Motor or performance measures show no consistent pattern of correlations with socio-economic measures, ranging from .27 at three months to .01 at five years (Bayley & Jones, 1937). Furthermore, motor tests show consistently lower correlations with socio-economic measures than verbal tests do (Estes, 1953; Hopkins & Bracht, 1975; Ramey, Campbell, & Nicholson, 1973). Logically, if early IQ tests are motor tests which show no consistent or sizable correlation with SES and later tests are verbal tests which do correlate with SES, one cannot be sure whether declining intelligence scores from disadvantaged children reflect declining intellectual abilities or merely reflect more accurately the previous underlying differences. Therefore, the decline of intelligence scores following the end of a compensatory education effort should not necessarily be considered indicative either of the short-term effects of the program or of the inability of such compensatory efforts to improve the lot of disadvantaged children.

The problems of inference associated with differential validity are exacerbated by the use of the non-equivalent control group design in compensatory education evaluations. The non-equivalent control group design (Campbell & Stanley, 1963) allows the experimental and control groups to be assigned non-randomly from different subsets of what is ostensibly the same population, so the groups can differ in systematic yet undefined ways prior to and in spite of any intervention attempts. The non-equivalent control group design was often chosen for compensatory education evaluations because the most disadvantaged children were considered most deserving of the limited resources. They were therefore chosen for the experimental group, and children from another, less disadvantaged group were chosen to be in the control group. Such selection procedures usually result in the control group systematically out-scoring the experimental group on initial measures of intelligence, SES, and other factors relating to their “advantagedness.” When this situation is coupled with a test which reflects underlying differences with increasing accuracy, the result is to make the compensatory efforts look ineffective or actually harmful. Campbell and Boruch (1975) discuss several problems with trying to draw inferences in the educational setting from the non-equivalent control group designs. Such issues as differential growth rates, differential reliability, and test-floor effects make drawing conclusions from such designs extremely precarious. In the preschool compensatory education program, we must add yet another to their list of problems with the non-equivalent control group design: that of differential validity.

To illustrate the problems of differential validity, let us consider what happens to intelligence scores from groups differing in underlying verbal abilities in the absence of intervention. Assume that the control group is selected from a more advantaged population than the experimental group, as is usually the case in evaluations of compensatory education programs. We would expect no difference between the groups on tests of their motor skills, since motor skills and SES do not correlate significantly (Estes, 1953; Hopkins & Bracht, 1975; Ramey et al., 1973). However, we would expect the groups to differ on verbal tests, since that ability does correlate with SES. Let us further assume that we know the true underlying verbal and motor scores for the two groups. (Of course, in reality we would never know these.) Finally, let us assume that the underlying true scores are stable throughout the period of evaluation, i.e., there is no increasing effect of environmental or other factors. If the experimental group has a true Verbal score of 85, a true Motor score of 100, and a Mixed score of 92.5 (from those vague items which defy classification as either pure verbal or pure motor items) while the control group, a more advantaged one, has a true Verbal score of 105, a Motor score of 100, and a Mixed score of 102.5, the pattern of results in Figure 2 is obtained. Without any intervention or increasing impact of the environment, the difference between the mean intelligence scores of the two groups will increase by 20 points. Figure 2 illustrates this instance and instances in which the initial differences are not so large. Obviously, we cannot determine what the true underlying verbal and motor parameters are for the groups, but if the experimental and control groups differ on the verbal component more than they do on the motor component of the intelligence tests, as previous studies would lead us to expect, the gap between the groups will increase with age even in the absence of any other factors operating on the scores. This could appear as a short-term effect or as a program’s having been ineffective, when in fact the overall score might have been raised above non-intervention levels.
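The thought experiment above can be sketched in a few lines. The article does not state how the scores plotted in Figure 2 were computed, so this sketch makes a loud assumption: the observed score at each age level is taken to be the item-weighted mean of the true Verbal, Mixed, and Motor scores, with weights from Table 1 (the age-3 verbal count is read as 2 so that the row totals seven items) and the Mixed score set midway between Verbal and Motor, as in the 92.5 and 102.5 values given in the text. The names `composite_score` and `GROUPS` are illustrative, and the resulting numbers need not match the figure.

```python
# Item counts per age level as (verbal, mixed, motor), transcribed from Table 1.
# Assumption: the age-3 verbal count is taken to be 2 (row total of seven items).
TABLE_1 = {
    2.0: (2, 3, 2), 2.5: (3, 3, 1), 3.0: (2, 1, 4), 3.5: (2, 4, 1),
    4.0: (5, 2, 0), 4.5: (4, 2, 1), 5.0: (2, 0, 5), 6.0: (5, 1, 1),
    7.0: (6, 0, 1), 8.0: (7, 0, 0), 9.0: (5, 0, 2), 10.0: (7, 0, 0),
}

# True verbal scores for the four groups in Figure 2; true motor score is 100 for all.
GROUPS = {"Exp": 85.0, "CV90": 90.0, "CV95": 95.0, "CV105": 105.0}
TRUE_MOTOR = 100.0


def composite_score(counts, verbal, motor=TRUE_MOTOR):
    """Item-weighted mean of the true component scores at one age level.

    Assumed scoring rule (not taken from the article): every item counts
    equally, and a Mixed item reflects the midpoint of Verbal and Motor.
    """
    n_verbal, n_mixed, n_motor = counts
    mixed = (verbal + motor) / 2
    total = n_verbal + n_mixed + n_motor
    return (n_verbal * verbal + n_mixed * mixed + n_motor * motor) / total


# Observed score for each group at each age, with constant true scores throughout.
scores = {
    name: {age: composite_score(counts, verbal) for age, counts in TABLE_1.items()}
    for name, verbal in GROUPS.items()
}

# Gap between the most advantaged control group and the experimental group.
gap = {age: scores["CV105"][age] - scores["Exp"][age] for age in TABLE_1}

for age in TABLE_1:
    print(f"age {age:>4}: Exp {scores['Exp'][age]:6.2f}  "
          f"CV105 {scores['CV105'][age]:6.2f}  gap {gap[age]:5.2f}")
```

Under these assumptions the observed gap changes with the item mix alone: it equals the full 20-point verbal difference wherever the items are all verbal (ages 8 and 10) and is smaller wherever motor and mixed items dilute it, even though no true score has moved.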

Figure 2. IQ scores generated from Stanford-Binet test items, assuming the true verbal scores for the groups are as follows: Experimental (Exp) = 85, Control 1 (CV90) = 90, Control 2 (CV95) = 95, Control 3 (CV105) = 105. For each group the true motor score is assumed to be 100. (Horizontal axis: children’s ages in years.)

Examining an evaluation of a compensatory education effort which used the non-equivalent control group design illustrates the problem further. The Infant Education Research Project in Washington, D.C. (Schaefer, 1972; Schaefer & Aaronson, 1972) assigned trained tutors to work with 15-month-old Black males from low socio-economic neighborhoods in Washington, D.C. Tutors worked on the development of verbal and conceptual abilities in the child’s home one hour daily, five days per week, until the child reached three years of age. The control group was chosen from different neighborhoods, and comparisons of the samples revealed “only small differences, many of which favored the control group on the family variables that might be expected to influence the child’s intellectual development” (Bronfenbrenner, 1974, p. 6). Examination of the longitudinal comparisons between the experimental and control groups reveals precisely the pattern one would expect were the differential validity problem operative. The increase in verbal items corresponds to an increase in the mean IQ score for the control group, while it corresponds to a decrease in mean IQ score for the experimental group.

Figure 3. Results from the Schaefer and Aaronson program evaluation cited in Bronfenbrenner (1974). The * indicates the end of the treatment intervention. (Horizontal axis: children’s ages in years.)

The question now arises: how many of the compensatory education programs which were declared ineffective by initial evaluators were actually victims of the differential validity of IQ tests? This question cannot be answered with any degree of certainty, since the true verbal and motor IQ scores underlying different populations may vary greatly or may not vary at all. Correlational studies indicate that the verbal scores from different SES groups vary more than the motor scores, but we cannot assign a particular value to the differences. However, researchers evaluating compensatory education programs must be alert to changes in the nature of their instruments masquerading as effects (or lack of effects) of the programs they are evaluating. Researchers could do item-by-item analysis of the intelligence tests used in their evaluations to see if the verbal items are contributing disproportionately to group differences. Analyses including only the verbal items could be conducted, though the small item pool at younger ages could lead to problems of differential reliability. Using randomized experiments would allow the researcher more surety in drawing conclusions about program effectiveness. Though the differential validity problem could still make the program effects seem to die out, the presence of a true control group would allow the researcher to assess relative gains made by participants in the experimental program. Only by long-term follow-ups, keeping in mind the vicissitudes of intelligence tests, can researchers adequately assess the impact of compensatory education programs.
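One way the suggested item-by-item analysis might be organized is sketched below. Everything in it is hypothetical: the pass rates in `ITEMS` are invented toy numbers used only to show the bookkeeping, and the function name `gap_by_item_type` is an illustrative choice, not a procedure taken from the article or from any test manual.

```python
from collections import defaultdict

# Hypothetical toy data, invented for illustration only:
# (item type, pass rate in control group, pass rate in experimental group).
ITEMS = [
    ("verbal", 0.80, 0.60),
    ("verbal", 0.70, 0.55),
    ("mixed", 0.75, 0.70),
    ("motor", 0.85, 0.85),
    ("motor", 0.60, 0.62),
]


def gap_by_item_type(items):
    """Return {item type: (summed control-minus-experimental gap, share of total gap)}."""
    sums = defaultdict(float)
    for item_type, control, experimental in items:
        sums[item_type] += control - experimental
    total = sum(sums.values())
    return {
        item_type: (gap, gap / total if total else float("nan"))
        for item_type, gap in sums.items()
    }


result = gap_by_item_type(ITEMS)
for item_type, (gap, share) in result.items():
    print(f"{item_type:>6}: summed gap {gap:+.2f}, share of total gap {share:+.2f}")
```

If most of the group difference sits in the verbal items, as it does in this toy data, a widening gap at later ages may reflect the changing item mix rather than a fading program effect. With only a handful of items per type at the younger ages, such shares would be unstable, which is the differential reliability caveat raised above.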

REFERENCES

BAYLEY, N. Some increasing parent-child similarities during the growth of children. Journal of Educational Psychology, 1954, 45, 1-21.

BAYLEY, N., & JONES, H. Environmental correlates of mental and motor development: A cumulative study from infancy to six years. Child Development, 1937, 8, 329-341.

BLOOM, B. S. Stability and change in human characteristics. New York: John Wiley, 1964.

BRONFENBRENNER, U. A report on longitudinal evaluations of preschool programs: Volume 2. Is early intervention effective? Washington, D.C.: Department of Health, Education, and Welfare, 1974.

BUROS, O. K. Mental measurement yearbook (2nd Ed.). Highland Park: Gryphon Press, 1941.

CAMPBELL, D. T., & BORUCH, R. Making the case for randomized assignment to treatments by considering the alternatives: Six ways in which quasi-experimental evaluations in compensatory education tend to underestimate effects. In C. A. Bennett & A. Lumsdaine (Eds.), Central issues in social program evaluation. New York: Academic Press, 1975.

CAMPBELL, D. T., & STANLEY, J. Experimental and quasi-experimental designs for research. Chicago: Rand McNally, 1963.

ELARDO, R., BRADLEY, R., & CALDWELL, B. The relation of infants’ home environment to mental test performance from six to thirty-six months: A longitudinal analysis. Child Development, 1975, 46, 71-76.

ESTES, B. Influence of socioeconomic status on Wechsler Intelligence Scale for Children: An exploratory study. Journal of Consulting Psychology, 1953, 17, 58-62.

HANSON, R. Consistency and stability of home environment measures related to IQ. Child Development, 1975, 46, 470-480.

HOFSTAETTER, P. The changing composition of “intelligence”: A study in T-technique. Journal of Genetic Psychology, 1954, 85, 159-164.

HONZIK, M. Developmental studies of parent-child resemblance in intelligence. Child Development, 1957, 28, 215-228.

HOPKINS, K., & BRACHT, G. Ten-year stability of verbal and nonverbal IQ scores. American Educational Research Journal, 1975, 12, 469-477.

JAFFA, A. S. The California Preschool Mental Scale: Form A. Berkeley: University of California Press, 1934.

JOHNSON, O. G., & BOMMARITO, J. Tests and measurements in child development: A handbook. San Francisco: Jossey-Bass, 1971.

LOEVINGER, J. Intelligence as related to socioeconomic factors. In National Society for the Study of Education, 39th Yearbook. Bloomington, Illinois: Public School, 1940.

RAMEY, C., CAMPBELL, F., & NICHOLSON, J. The predictive power of the Bayley Scales of Infant Development & the Stanford-Binet Intelligence Test in a relatively constant environment. Child Development, 1973, 44, 790-795.

SCHAEFER, E. S. Parents as educators: Evidence from cross-sectional, longitudinal and intervention research. Young Children, 1972, 27, 227-239.

SCHAEFER, E. S., & AARONSON, M. Infant education research project: Implementation and implications of the home-tutoring program. In R. K. Parker (Ed.), The preschool in action. Boston: Allyn & Bacon, 1972.

SPEARMAN, C. General intelligence: Objectively determined and measured. American Journal of Psychology, 1904, 15, 201-292.

TERMAN, L. M. The measure of intelligence: An explanation of and a complete guide for the Stanford Revision and Extension of the Binet-Simon Intelligence Scale. Boston: Houghton Mifflin, 1916.

THORNDIKE, R. L. Constancy of IQ. Psychological Bulletin, 1940, 37, 167-186.