Meta-Analysis Using Multilevel Models with an Application to the Study of Class Size Effects


Author(s): Harvey Goldstein, Min Yang, Rumana Omar, Rebecca Turner, Simon Thompson
Source: Journal of the Royal Statistical Society. Series C (Applied Statistics), Vol. 49, No. 3 (2000), pp. 399-412
Published by: Blackwell Publishing for the Royal Statistical Society
Stable URL: http://www.jstor.org/stable/2680773


Appl. Statist. (2000) 49, Part 3, pp. 399-412

Harvey Goldstein and Min Yang
Institute of Education, London, UK

and Rumana Omar, Rebecca Turner and Simon Thompson
Imperial College School of Medicine, London, UK

[Received September 1998. Final revision September 1999]

Summary. Meta-analysis is formulated as a special case of a multilevel (hierarchical data) model in which the highest level is that of the study and the lowest level that of an observation on an individual respondent. Studies can be combined within a single model where the responses occur at different levels of the data hierarchy and efficient estimates are obtained. An example is given from studies of class sizes and achievement in schools, where study data are available at the aggregate level in terms of overall mean values for classes of different sizes, and also at the student level.

Keywords: Class size research; Meta-analysis; Multilevel modelling

1. Introduction

The effects of class size on achievement have been studied since the 1920s quantitatively and qualitatively, and have certainly been debated for much longer. There is a large number of existing studies, including observational surveys, matched designs and randomized controlled trials (RCTs). Despite the number of studies, the results are often inconclusive. Glass and Smith (1979) first applied a meta-analysis to 77 studies based on 70 years' research in more than a dozen countries. They concluded that there were positive effects for class sizes of less than 20, based on 14 of these studies which were considered to be 'well controlled'. Their quantitative synthesis method has been followed by many more meta-analyses on the same topic (Carlberg and Kavale, 1980; Hedges and Olkin, 1985; Slavin, 1986, 1990; McGiverin et al., 1989).

Slavin (1990) argued that Glass's positive finding was based on only a small number of studies and the results were largely affected by one extreme case (Verducci, 1969). On reanalysis Slavin reported an effect that was much smaller than that of Glass. He also conducted an analysis of nine randomized or matched studies. Among these studies some were used by Glass and Smith in 1979 but most of them were new studies selected according to strict inclusion criteria. The large scale Tennessee student/teacher achievement ratio (STAR) RCT study (Word et al., 1990) was included. Slavin suggested a moderate effect size of 0.17 standard deviation (SD) units of achievement score comparing smaller classes of 15 or 16 with larger classes of 25-30.

The use of random-effect models in meta-analysis has been suggested by several researchers (Hedges and Olkin, 1985; Raudenbush and Bryk, 1985; Hardy and Thompson, 1996; Erez et al., 1996; Cleary and Casella, 1997). The present paper focuses more on the methodology of meta-analyses than on the substantive issue of class size per se. For a more detailed discussion of the latter and a consideration of the role of RCTs in such studies see Goldstein and Blatchford (1998).

In this paper we tackle the problem of how to compare data from different studies with varying summary measures by using multilevel models (Goldstein, 1995). We also develop multilevel models to combine study level data and individual level data. This provides a statistically efficient method for the situation in which some studies have individual level data but others have only summary statistics available (e.g. means and standard errors from published papers). We first describe, in Section 2, the studies included and data available for addressing the issue of class size effects. Section 3 introduces a multilevel model for meta-analysis, focusing on aggregate level data, and Section 4 describes how the model can be extended to combine both aggregate level and individual level data in the same analysis.

Address for correspondence: Harvey Goldstein, Department of Mathematical Sciences, Institute of Education, University of London, 20 Bedford Way, London, WC1H 0AL, UK.
E-mail: [email protected]
© 2000 Royal Statistical Society 0035-9254/00/49399

2. Sources of data

2.1. Criteria for inclusion in the study

We restrict ourselves to those studies which meet the following inclusion criteria.

(a) The study is an RCT or has a matched design where there is an attempt to match smaller and larger classes initially by using school or student level criteria.
(b) The study outcomes are achievement scores, e.g. standardized test scores or rating scales.
(c) The study is longitudinal with initial and final achievement measures and at least one school year period for both larger and smaller classes.
(d) The smaller class is not less than 15 and the larger class is not more than 40.

These inclusion criteria are similar to those that Slavin (1990) set out for his analysis and the range of class sizes matches that found in educational systems of industrialized countries.

2.2. Scope and strategy of literature search

Several databases were searched using the keywords class size, longitudinal study, school achievement: the ERIC database from 1961 to 1997, the British Education Index (1954-1996, covering 00 journals of education), the Canadian Education Index (1976-1996 coverage) and the Australian Education Index (1978-1996 coverage). Psychological Abstracts was searched (1985-1996) using the subject titles class size, classroom, group size, academic achievement and meta-analysis.

Nine studies met our criteria, among which seven studies were used by Slavin (1990). Two studies used by Slavin could not be traced through our database search, or by an additional Internet search for the authors' names. The data on these, as presented by Slavin, are not sufficiently detailed for use in our analysis. Two new studies that were not used by Slavin were added to our collection. Only one study, the STAR study, provides individual level data. In the next section we list some basic information about the studies selected.

2.3. Studies selected

A summary of the statistical information is given in Table 1.

Table 1. Raw and adjusted data of each study for reading scores

Study  Grade  Class   Number of Mean ± SD        Adjusted  Pooled SD, Standardized   Effect size,
k      j      size h  pupils,   reported,        mean,     SD_jk      adjusted mean, y_S.jk - y_L.jk
                      n_hjk     x_hjk            mu_hjk               y_h.jk
1      1      15      251       50.9             50.9      12.01†     0.125
       1      30      744       48.9             48.9      12.01      -0.042         0.17
       3      15      656       248.9            248.9     12.37      0.012
       3      30      602       245.6            248.6‡    12.37      -0.013         0.02
2      4      16.6    256       0.00 ± 0.30      0.00      0.275      -0.012
       4      23.7    368       -0.04 ± 0.30     -0.04     0.275      -0.157         0.15
       4      30.3    450       0.02 ± 0.27      0.02      0.275      0.061          -0.07
       4      35.7    555       0.02 ± 0.25      0.02      0.275      0.061          -0.07
3      2      15      78        2.39 ± 0.809     2.67‡     0.620      0.282
       2      30      542       2.52 ± 0.895     2.47      0.620      -0.041         0.32
       3      15      156       3.16 ± 0.954     3.49‡     0.588      0.212
       3      30      555       3.42 ± 1.074     3.33      0.588      -0.061         0.27
       4      15      57        4.38 ± 1.181     4.37‡     0.661      0.188
       4      30      441       4.23 ± 1.400     4.23      0.661      -0.024         0.21
       5      15      43        5.40 ± 1.534     5.55‡     0.665      0.395
       5      30      413       5.22 ± 1.680     5.21      0.665      -0.041         0.44
       6      15      63        5.69 ± 1.510     6.28‡     0.700      0.320
       6      30      374       6.19 ± 1.925     6.10      0.700      -0.037         0.36
4      1      15      1127      49.7 ± 14.45     53.4§     9.01       0.098
       2      25      516       50.6 ± 16.12     50.6      9.01       -0.213         0.31
5      2      15      57        52.0 ± 9.93      52.0      8.379      0.198
       2      25      55        48.7 ± 7.25      48.7      8.379      -0.205         0.39
6      1      19      368       43.2             43.2      10.46†     -0.085
       1      31      646       44.6             44.6      10.46      0.049          -0.13
7      1      20      371       523.8 ± 88.7     523.8     137.6      0.191
       1      27      350       469.8 ± 175.4    469.8     137.6      -0.176         0.39
       2      20      309       590.2 ± 49.6     590.2     50.41      0.115
       2      27      313       578.7 ± 51.2     578.7     50.41      -0.114         0.23
8      9      20      2819      70.6 ± 11.2§§    70.6      13.3       0.300
       9      30      2543      62.6 ± 15.4§§    62.6      13.3       -0.301         0.60
9      1      15      2644      531.0 ± 57.1     529.1*    37.23      0.070
       1      24      1414      520.0 ± 54.4     516.1*    37.23      -0.167         0.24
       2      15      3112      591.0 ± 45.6     591.1*    28.98      0.020
       2      24      1482      583.0 ± 45.4     579.2*    28.98      -0.042         0.06
       3      15      3353      619.0 ± 38.5     619.5*    21.77      0.008
       3      24      1357      615.0 ± 38.2     619.2*    21.77      -0.020         0.03

†SD derived from the F-test value reported.
‡Both the mean and the SD were adjusted for a pretreatment score based on the reported correlation coefficient between the pretreatment and post-treatment scores.
§Both the mean and the SD were adjusted for a pretreatment score assuming a correlation coefficient of 0.8.
§§Both the mean and the SD were calculated on the basis of an average measure from 0 schools available in the paper.
*Both the mean and the SD were adjusted for a pretreatment score using a three-level model with covariates.

2.3.1. Study 1: Balow (1969), California

Study 1 was an experimental (but non-randomized) study on reading achievement for students from grades 1-2 and then grades 3-4. Class sizes were defined as 15 for small and 30 for large classes. The means of the reading score at grade 1 for the two classes were reported equal so that scores at grade 2 were compared. Means at grade 4 were compared adjusting for the intake reading or pupils' intelligence quotient measured at grade 3. No standard error for any measure was reported, except for F-test values in the paper.

2.3.2. Study 2: Shapson et al. (1980), Toronto City

Study 2 was an RCT for four class size groups: 16, 23, 30 and 37. The trial period was from grade 4 to grade 5. Efforts were made to keep the same group of pupils in the same class

during the trial year. It was reported that the changes in pupils in a class were limited to within 3 by the end of the study. Measures included test scores for composition, vocabulary, reading, mathematical concepts and mathematical problem solving. Means and SDs adjusted for the year of the study and teachers' experience were reported.

2.3.3. Study 3: Doss and Holley (1982), Austin, Texas

Study 3 was a 5-year matched design study from grade 2 to grade 6 for school achievement in reading, language and mathematics. The class size was 15 for small and 30 for large classes. Initial means and SDs of test scores at the beginning of the year and those at the end of the year were reported for the five years. Correlation coefficients between the prescores and post-scores were also reported by grade and class.

2.3.4. Study 4: Wilsberg and Castiglione (1968), New York City

A total of 1127 grade 1 students from 3 schools and 516 grade 2 students from seven schools were used in study 4. Grade 1 students were in small classes of 15 and grade 2 students were in large classes of 25 and over. Both received the same materials and help for a year. The study reported means and SDs of a reading test at entry into the study, and means and SDs of vocabulary and comprehension tests were taken at the end of the study.

2.3.5. Study 5: Wagner (1981), Toledo, Ohio

Grade 2 students in one school assigned to small classes of less than 15 were compared with a matched school with large classes of 25 in study 5. This was published as a doctoral thesis.

2.3.6. Study 6: Mazareas (1981), Boston

A random sample of 1014 grade 1 pupils (368 from small classes of less than 20 and 646 from large classes of more than 30) were used in study 6. Outcomes were adjusted for covariates and F-test values were reported for five school attainment scores including reading. This was published as a doctoral thesis.

2.3.7. Study 7: Butler and Handley (1989), Mississippi

Study 7 was a matched design study of grade 1 and grade 2 students measuring reading, listening and achievement in mathematics. Outcomes for students in smaller classes (size 20) of grade 1 and grade 2 were compared with the same group of students in larger classes (size 27) followed for 2 years. Students in the smaller and larger classes were from the same school. The study matched for factors such as teachers' qualifications and an entrance test, but it did not carry out covariate adjustment. Means and SDs by subject by class group were reported.

2.3.8. Study 8: San Juan Unified School District (1991), California

A total of 2819 students from 0 high schools (grade 9) originally in large classes of 30 were assigned to reduced size classes of 20 for a year and compared with those in larger classes in study 8. The means of a reading comprehension test in grades 9 and 10 were reported.

2.3.9. Study 9: Word et al. (1990), Tennessee

The STAR project (study 9) was an RCT longitudinal study with children followed from

kindergarten to grade 3 for 4 years with measurements at 1-year intervals. Smaller classes averaged about 15 students (13-17) and larger classes about 24 (22-25). About 4000 students were available for the analysis and the initial assignment into kindergarten classes was at random.

In Table 1 the class size, number of pupils, means and SDs are taken from the published papers. The adjusted means and pooled SDs are computed by using equations (1) and (2) respectively below. The standardized adjusted means are computed by using equation (3) below.

As we can see from Table 1 several problems arise. The tests that were used to measure achievement are obviously different from study to study. Rescaling the measurements to a common scale is essential for meta-analysis. Common practice is to standardize the mean for each class group within each study by using a pooled SD. For example, the conventional effect size measure (Glass and Smith, 1979; Hedges and Olkin, 1985) is $(\bar{y}_S - \bar{y}_L)/SD_{\text{pooled}}$, where the terms $\bar{y}_S$ and $\bar{y}_L$ indicate the mean score of smaller and larger class groups respectively. For our purposes we require, as a minimum, estimates of the means and pooled SDs. Some studies did not present SDs for their achievement measures. In this case an F-test or t-test value reported by such a study had to be used to derive the pooled SD for the two groups under comparison.

Differences in the effect of class size between studies may arise from various causes. Where common data are available, e.g. on socioeconomic background, we can see whether such factors explain part of the study differences (Thompson, 1994). In the present case we have the additional problem that different achievement tests were used in each study and this will generally introduce further, unknown, variation.

A further issue is that, apart from the STAR study where student level data were available, the between-school variation within a study is not separately reported but should be included in our models.
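The conventional effect size measure described above, and the recovery of a pooled SD from a reported test statistic, can be sketched as follows. This is a minimal illustration rather than the authors' procedure: it assumes that the reported statistic is an unadjusted equal-variance two-sample t value (or the equivalent two-group F value, with F = t²), and the function names are hypothetical.

```python
import math


def effect_size(mean_small, mean_large, sd_pooled):
    """Conventional standardized effect size: (mean_S - mean_L) / SD_pooled."""
    return (mean_small - mean_large) / sd_pooled


def pooled_sd_from_t(mean_small, mean_large, n_small, n_large, t_value):
    """Back out the pooled SD from an equal-variance two-sample t statistic.

    Assumes t = (mean_S - mean_L) / (SD_pooled * sqrt(1/n_S + 1/n_L)).
    """
    if t_value == 0:
        raise ValueError("a zero t statistic carries no information about the SD")
    scale = math.sqrt(1.0 / n_small + 1.0 / n_large)
    return abs(mean_small - mean_large) / (abs(t_value) * scale)


def pooled_sd_from_f(mean_small, mean_large, n_small, n_large, f_value):
    """Same recovery from a two-group F statistic, using F = t**2."""
    return pooled_sd_from_t(mean_small, mean_large, n_small, n_large,
                            math.sqrt(f_value))


# Study 1, grade 1 row of Table 1: means 50.9 and 48.9, pooled SD 12.01.
print(round(effect_size(50.9, 48.9, 12.01), 2))  # 0.17
```

The printed value agrees with the effect size of 0.17 shown for study 1, grade 1, in the last column of Table 1.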

2.4. Adjusting for pretreatment score

Our inclusion criteria for non-RCT studies to be matched on student or class factors imply that for each study we can adjust for initial achievement. This is important for non-randomized studies to allow for any association between initial achievement and allocation to classes of different sizes. In randomized studies it will generally increase precision as well as potentially helping to correct for any problems with the randomization procedure.

Given the means and SDs for both pretreatment and post-treatment as well as the within-group correlation coefficient $r(\text{pre}, \text{post})$ between the pretreatment and post-treatment test scores, we adjust the post-treatment mean of the small and large classes by equation (1) to obtain estimates of the adjusted means $\hat{\mu}_{S,\text{post}}$ and $\hat{\mu}_{L,\text{post}}$. This is equivalent to applying an analysis of covariance to the two class groups with the pretreatment score as the covariate.

$$\hat{\mu}_{h,\text{post}} = \bar{x}_{h,\text{post}} - \frac{\sigma_{\text{post}}}{\sigma_{\text{pre}}}\, r(\text{pre}, \text{post})\, (\bar{x}_{h,\text{pre}} - \bar{x}_{\text{pre}}) \qquad (1)$$

where h indexes the class size and $\bar{x}_{\text{pre}}$ is the overall pretest mean. The symbols $\sigma$ and $\bar{x}$ refer to the pooled between-subject SD and treatment means respectively. If the correlation coefficient between the pretreatment and post-treatment scores is not provided, an estimate may be available from other studies (for example see the footnote to Table 1).

2.5. Adjusting and pooling standard deviations

Given the residual sum of squares of the post-treatment score adjusted for the pretreatment score for each class size group separately, say $SS_S$ and $SS_L$, a pooled SD is calculated as

$$SD_{\text{pooled}} = \sqrt{\frac{SS_S + SS_L}{D_S + D_L}} \qquad (2)$$

where D refers to the degrees of freedom used for each class size group.

The final summary statistics are the adjusted means and pooled SDs in Table 1. The standardized adjusted means are computed by calculating, for each grade in each study, the mean over all class sizes weighted by the numbers of students, subtracting this from each adjusted mean and dividing by the pooled SD, namely

$$y_{h.jk} = \frac{\hat{\mu}_{hjk} - \sum_h n_{hjk}\hat{\mu}_{hjk} \big/ \sum_h n_{hjk}}{SD_{jk}} \qquad (3a)$$

(h = 1 for a large class; h = 2 for a small class). The standardized adjusted means are the responses used for the aggregate level data.
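Equations (2) and (3a) can be checked numerically against Table 1. The following sketch, with hypothetical function names, computes the pooled SD from residual sums of squares and the standardized adjusted means from the adjusted means, pupil numbers and pooled SD. The inputs in the example are the study 1, grade 1 entries of Table 1.

```python
import math


def pooled_sd(ss_small, ss_large, df_small, df_large):
    """Equation (2): pooled SD from residual sums of squares and their df."""
    return math.sqrt((ss_small + ss_large) / (df_small + df_large))


def standardized_adjusted_means(adjusted_means, n_pupils, sd_pooled):
    """Equation (3a): centre on the pupil-weighted mean, then divide by the SD."""
    total = sum(n_pupils)
    weighted_mean = sum(n * m for n, m in zip(n_pupils, adjusted_means)) / total
    return [(m - weighted_mean) / sd_pooled for m in adjusted_means]


# Study 1, grade 1: small class (15) and large class (30) rows of Table 1.
y_small, y_large = standardized_adjusted_means([50.9, 48.9], [251, 744], 12.01)
print(round(y_small, 3), round(y_large, 3))  # 0.125 -0.042
print(round(y_small - y_large, 2))           # 0.17
```

The two standardized adjusted means and their difference reproduce the values 0.125, -0.042 and 0.17 given for this grade in Table 1.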

On the basis of these the conventional effect size can be estimated as in the last column of Table 1 by using

$$y_{S.jk} - y_{L.jk} = (\hat{\mu}_{Sjk} - \hat{\mu}_{Ljk})/SD_{jk}. \qquad (3b)$$

The homogeneity test (Hedges and Olkin, 1985) for the weighted and bias-corrected effect size estimates for the eight studies with aggregate level data indicates significant heterogeneity between them (X25 255.5; p

$$y_{ij} = (X\beta)_{ij} + \alpha t_{ij} + u_j + e_{ij}, \qquad j = 1, \ldots, J, \quad i = 1, \ldots, n_j,$$
$$u_j \sim N(0, \sigma_u^2), \qquad e_{ij} \sim N(0, \sigma_e^2).$$

It is also possible to allow the variances within and between studies to be different for each treatment or to vary with the value of a continuous treatment variable, leading to complex variance structures (Goldstein (1995), chapter 3). We can also introduce covariates where data are available and appropriate, and interactions between treatments and covariates. For example, a particular treatment contrast may differ according to the covariate values. We may also relax the normality assumption of the level 1 residuals, e.g. if fitting a generalized linear multilevel model (Goldstein, 1995; Turner et al., 1999).

3.1. Aggregate level data

Consider now the case where model (4) is the underlying model but we only have data by treatment group at the study level. Aggregating to this level we write the mean response as

$$y_{h.j} = (X\beta)_{.j} + \alpha_h + u_{hj} + e_{h.j} \qquad (5)$$

where the dot notation denotes the mean for study j. This implies particular constraints, e.g. $\text{var}(e_{h.j}) = \text{var}(e_{hij})/n_{hj}$. A difficulty may arise with the first term in equation (5) since this implies that the mean of the covariate function $(X\beta)_{ij}$ for each study is available. The corresponding model for the case of a continuous treatment variable is

$$y_{.j} = (X\beta)_{.j} + \alpha t_{.j} + u_j + e_{.j}.$$

3.2. The two-treatment case

Consider the special case of two treatments, h = 1, 2. We collapse equation (5) and, using an obvious notation, rewrite to give

$$y_j = y_{1.j} - y_{2.j} = \alpha + u_j + e_j, \qquad \alpha = \alpha_1 - \alpha_2. \qquad (6)$$

This implies the constraint $\text{var}(u_j) = \text{var}(u_{1j}) + \text{var}(u_{2j}) - 2\,\text{cov}(u_{1j}, u_{2j})$. We can combine equations (5) and (6) into a single model for the case where some aggregated responses are in terms of separate treatment groups and some are in terms of contrasts of groups.
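The variance constraint attached to equation (6) is the usual variance of a difference. As a small illustration, the sketch below evaluates it for the study level random effects and, separately, gives the sampling variance of the contrast of two group means. The second function assumes that the level 1 residuals of the two treatment groups are independent, which is an assumption made here for illustration; the function names and the numerical values are hypothetical.

```python
def contrast_variance(var_u1, var_u2, cov_u12):
    """var(u_j) = var(u_1j) + var(u_2j) - 2 cov(u_1j, u_2j) for u_j = u_1j - u_2j."""
    return var_u1 + var_u2 - 2.0 * cov_u12


def contrast_sampling_variance(var_e1, var_e2, n1, n2):
    """Level 1 variance of the difference of two group means.

    Assumes independent level 1 residuals in the two treatment groups, so that
    var(e_1.j - e_2.j) = var(e_1ij)/n_1j + var(e_2ij)/n_2j.
    """
    return var_e1 / n1 + var_e2 / n2


# Hypothetical values: positively correlated study effects shrink the variance
# of the treatment contrast relative to the sum of the two variances.
print(round(contrast_variance(0.04, 0.09, 0.03), 4))
```

When the two study effects are perfectly correlated with equal variances the contrast variance is zero, which corresponds to a treatment difference that is constant across studies.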

3.3. Defining origin and scale

When combining data from aggregate level studies it is necessary to ensure that the response variable scales are the same and that there is a common origin. In traditional two-treatment meta-analyses the treatment difference is divided by a suitable (pooled) within-treatment SD as described earlier. In our general model, likewise, the response variable in each study can be scaled by dividing it by an estimate of the level 1 SD. Where individual data are available we may use an estimate of the level 1 SD from a preliminary analysis and for aggregate data we may derive this from reported summary information, if this is available.

In situations where the same response variable is used in each study, and scaling has been carried out, we can apply equations (4) and (5) directly. In many cases, however, different response variables are used. For example, in class size studies different reading tests are used. In this case we would not generally expect the means for corresponding treatments to be identical. One procedure for dealing with this is to choose one treatment as a reference treatment (or control) and in each study to subtract its mean from the values of the other treatments and to work with these differences. This is the standard approach in two-treatment studies. Thus we chose one treatment described by an intercept term with dummy variables for the remainder. The coefficients of the intercept and of these dummy variables would generally be modelled as random at the study level. In the two-treatment case this leads to model (6).

Where we have a study with individual data we likewise subtracted the mean of the reference treatment group from the response variable. In the fixed part of the model, for the level 1 units with that treatment, the intercept term (and other treatment dummy variables) will be 0.

3.4. Variance information

We may have additional information about variances from studies, e.g. information from other meta-analysis studies about between- or within-study variation. Suppose, for example, that in model (4) we have an external estimate, say $r^2_{hje}$, of $\sigma^2_{uh} + a^2\sigma^2_{eh}$ where we might have $a^2 = 1/n_{hj}$. If we write an additional component to the model as an extra level 2 unit

$$r_{hje} = u_{hj} + a e_{h.j} \qquad (7)$$

where the fixed part is identically 0 and we have additional constraints imposed as above, this information is then incorporated into the estimation. We note, however, that this extra level 2 unit is given the same weight as every other level 2 unit in the model, and we may wish to assign different weight depending on the accuracy of the information obtained. Weighting is discussed in the next section.

3.5. Weighting units

We shall consider only weighting of the level 2 units, although extensions to differential weighting of level 1 units are possible. Suppose that the jth level 2 unit is assigned weight $w_j$. These weights may reflect information about the quality of the study or possibly non-response. Such an analysis might be undertaken as a sensitivity analysis to complement an unweighted analysis. Note that sample size weighting is already incorporated in the estimation via equation (5). Assuming that the weights are uncorrelated with the random effects, we rewrite model (4) to include the vector of the inverses of the square roots of the weights as the explanatory variable for the level 2 random effects. This gives

$$y_{hij} = (X\beta)_{ij} + \alpha_h t_{hij} + u_{hj} w_j^{-1/2} + e_{hij} \qquad (8)$$

and we can carry out the standard estimation for this model. This procedure for carrying out a weighted multilevel analysis is discussed in Pfeffermann et al. (1997) and is equivalent to their 'step A only' method. They also discussed the case where the weights are correlated with the random effects.

3.6. Modelling class size

In our analysis class size is treated as a continuous variable centred at a value of 15. In all the studies, as is clear from Table 1, only the average class sizes for 'small' or 'large' classes are reported. These values are therefore the values used in the analysis.

One of our aggregate level studies (Doss and Holley, 1982) sampled separate grades within schools. In principle this provides a further level between the class and the school. A preliminary analysis, however, detected variation at this level only for the simplest model, so we do not include it in further models, although grade level itself is incorporated as a fixed factor.
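Returning to the weighting device of Section 3.5, the explanatory variable attached to the level 2 random effect in equation (8) is simply the inverse square root of each study weight, so that a study with weight $w_j$ contributes a level 2 variance of $\sigma_u^2 / w_j$. The sketch below illustrates this arithmetic only; it is not a model-fitting routine, and the function names and the weights are hypothetical.

```python
def weight_design_values(weights):
    """Return w_j ** -0.5, the explanatory variable for the level 2 random effect."""
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be strictly positive")
    return [w ** -0.5 for w in weights]


def implied_level2_variances(weights, sigma2_u):
    """Variance of u_j * w_j ** -0.5 for each unit, i.e. sigma2_u / w_j."""
    return [sigma2_u * z * z for z in weight_design_values(weights)]


# Hypothetical weights: the third study is judged four times as reliable.
z = weight_design_values([1.0, 1.0, 4.0])
print(z)
print(implied_level2_variances([1.0, 1.0, 4.0], 0.06))
```

A larger weight therefore shrinks the random departure allowed for that study, which is the sense in which the weighted analysis can serve as a sensitivity check on the unweighted one.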

3.7. Aggregate level models for class size data

For the aggregate level studies we can write a basic model as

$$y_{.jk} = \alpha_{0jk} + \alpha_{1k} C_{jk} + \sum_l \beta_l G_{l,jk} + e_{.jk},$$
$$\alpha_{0jk} = \alpha_0 + u_{0jk} + v_{0k}, \qquad \alpha_{1k} = \alpha_1 + v_{1k}, \qquad (9)$$
$$u_{0jk} \sim N(0, \sigma^2_{u0}), \qquad e_{.jk} \sim N(0, \sigma^2_e / n_{jk}),$$
$$v_{0k} \sim N(0, \sigma^2_{v0}), \qquad v_{1k} \sim N(0, \sigma^2_{v1}), \qquad \text{cov}(v_{0k}, v_{1k}) = \sigma_{v01},$$

where j and k now index the grade and study respectively. The parameter $\alpha_0$ estimates the mean score for a class size of 15. The term $u_{0jk}$ is the random departure (residual) of the jth-grade mean from the kth study and $\beta_l$ the fixed effect for grade l, with the $G_{l,jk}$ being grade dummy variables, these being covariates in the model as described in equation (5). The term $v_{0k}$ is the residual for the kth study. The variable $C_{jk}$ is the class size and the parameter $\alpha_1$ estimates the overall class size effect per additional student. The term $v_{1k}$ estimates the additional random departure for the kth study of the overall class size effect. Further covariates could of course be added, if available.

Not all the studies sampled more than one grade level and in some studies several grades are sampled within each school, whereas in others different grades are sampled in different schools. In the latter case grade differences are confounded with school differences so an interpretation of between-grade variation is difficult. For this reason we do not fit grade as a level in the following analysis, although we do study fixed grade effects.
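One consequence of model (9) is that the between-study variance is not constant but depends on class size: the study level part of the response is $v_{0k} + v_{1k} C_{jk}$, whose variance is $\sigma^2_{v0} + 2 C_{jk} \sigma_{v01} + C_{jk}^2 \sigma^2_{v1}$. The sketch below evaluates this quadratic function with class size centred at 15, as in the analysis. The function name and the parameter values are hypothetical and are chosen only to illustrate the calculation.

```python
def between_study_variance(class_size, var_v0, var_v1, cov_v01, centre=15.0):
    """Variance of v_0k + v_1k * C, where C = class_size - centre."""
    c = class_size - centre
    return var_v0 + 2.0 * c * cov_v01 + c * c * var_v1


# Hypothetical variance parameters, not estimates from the paper.
for size in (15, 20, 30):
    print(size, round(between_study_variance(size, 0.05, 0.001, -0.005), 4))
```

With a negative covariance, as in this example, the between-study variance first falls and then rises again as class size moves away from the centring value.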

Since all our data have been standardized, the underlying level 1 variance is equal to 1. We therefore define the explanatory variable $z_{jk} = 1/\sqrt{n_{jk}}$ and we can write the first line of model (9) for the aggregated model as

$$y_{.jk} = \alpha_{0jk} + \alpha_{1k} C_{jk} + \sum_l \beta_l G_{l,jk} + w_{jk} z_{jk}, \qquad w_{jk} \sim N(0, 1). \qquad (10)$$

In practice, for classes of a given size in a study, typically we only have available the mean over all classes, so, although the contribution to the variance from these classes for the kth study is $\sum_j n_{jk}^{-1}$, the data that are available provide only the value of $(\sum_j n_{jk})^{-1}$. When these class sizes are constant, however, the first expression can be obtained from the second where the number of classes is known.

3.8. Results

We first present results for the aggregate level studies only and follow this with results from both the individual level study and the combined individual and aggregate level studies.

Table 2 presents the results of fitting models (9) and (10) for the aggregate data studies (numbers 1-8 in Table 1), using maximum likelihood estimation for three models as shown, together with 95% confidence intervals for the estimates based on a parametric bootstrap with 1000 replications (Goldstein, 1995).

Table 2. Model estimates for the aggregated study data using model (9)†

Parameter                        Model A                   Model B                   Model C
Fixed effects
  Intercept                      0.163 (0.028, 0.308)      0.207 (0.149, 0.261)      0.224 (0.053, 0.393)
  Class size, linear             -0.020 (-0.036, -0.004)   -0.022 (-0.025, -0.019)   -0.048 (-0.072, -0.025)
  Class size, quadratic                                                              0.002 (0.001, 0.003)
Random (between-study) effects
  $\sigma^2_{v0}$                0.060 (0.0, 0.101)        0.004 (0.0, 0.014)        0.067 (0, 0.135)
  $\sigma_{v01}$                 -0.006 (-0.010, -0.001)                             -0.006 (-0.013, 0.004)
  $\sigma^2_{v1}$                0.0006 (0.0, 0.0010)                                0.0006 (0, 0.0010)
-2 log-likelihood                -46.1                     266.3                     -54.1

†The constrained parameter at class level is omitted. 95% bootstrap intervals are given in parentheses.

Model A allows the class size effect to vary across studies, model B allows no such variation and model C includes a quadratic effect of class size. As can be seen from the log-likelihoods, model A fits the data substantially better than model B, so there is substantial evidence of heterogeneity in the class size effect across studies. Model A estimates the effect on reading scores as a decrease of 0.02 SD units per additional student. Over a difference of 10 students this amounts to about 0.2 SD units, which is slightly greater than the 0.17 units estimated by Slavin (1990) comparing classes of 15 or 16 with larger classes of 25-30. Model C indicates a quadratic effect of class size whereby from a class size of 15 to a size of 30 there is a continuing decrease in achievement, but an increase in achievement thereafter. This result, however, is influenced by the study with the large classes over 30. A test for equality of grade effects is not significant ($\chi^2$ = 1.8) so these have been omitted from these models. The likelihood ratio test statistic suggests that the class size effect varies across studies. However, there are only eight studies in the data set so inferences based on large sample results should be viewed with caution. Also these models ignore between-school variation within studies and between-grade variation as pointed out above. If for model A, however, we allow the level 1 variance to be estimated we obtain an estimate of 1.81 with likelihood ratio test statistic, for comparison with model A, of 3.0 with 1 degree of freedom so there is only rather weak evidence for a value different from 1.0. If we do the same for model B the level 1 variance estimate is 16.6 and the test statistic is 285.5. The analysis utilizes all the information that is available for the published aggregate studies. Since we are working with standardized data the only flexibility lies in the modelling of the class size effect and the between-study variation. In comparison with the inclusion of individual level data the analysis illustrates the limitations of using aggregate level data.

4. Models for combining individual level data with study level data

Although the STAR individual level data set has covariates available, the aggregate level data have not been adjusted for covariates in a consistent fashion, other than for class size and initial test scores as discussed above. Some studies, however, such as that of Shapson et al. (1980), reported their results adjusted for other factors, and some of the studies carried out initial matching. In the following analysis we shall ignore this variation, but it needs to be borne in mind when the results are interpreted.

The STAR study has three levels: school, class and student. Children were recruited when they entered kindergarten where they were randomly assigned to three sizes of class: a small class of 13-17, a regular class of 22-25 and a regular class of 22-25 with a teaching aide. The last two categories are combined since in the STAR study they show no differences. The students were followed for 4 years to the end of grade 3, and for the present purposes we use the reading test score data at the end of grade 1, adjusted for reading test scores at the end of kindergarten, i.e. a study extending over 1 year. The study attempted to retain the original class compositions, but this was not entirely possible. A discussion of the problems of interpreting data from this study is given by Goldstein and Blatchford (1998).

The following model is a combined model for the STAR study and the previously analysed aggregate level studies. We omit the effect of grade since this was not significant for the aggregate level analysis.

    Yijkl= (OI+ ?allCi kl) + e.jkl(l ZI) + (cvOijkl + ?V2X2ijkl + VlklCilkl)Zl,oO= oo + W01n a1ll = ?l + W1i, aVoijkl VOkl UOjkl eijkl,z, = 1 if ndividual ata study, 1 = 0 otherwise,N(O, WO), w11 N(O, uv1), cov(wO1,w1O)= Twol, (11)

    VOk, ' N(O, vo), Vlkl N(O, vi), cov(vokl, Vlkl) = Uvol,UOJkl NA(UO,vo), eijkl NO e) ejkl e(O ?njkl),whereX2ijkl is the end of kindergartencore,withthe standardassumptionthat it isindependent f therandom ffects,nd C is the class size. The parameter21 representshebetween-studyariance n the class size effect,nd a21 thebetween-school ariance n theclass size effect.This modelutilizes notation imilar o thatused before nd is now a four-levelmodelwith tudents groupedwithin lasses within choolsk within tudies .The STAR data arestandardized yusingthe residualvariancefrom preliminaryhree-level odelwithonlytheSTAR data. For the combineddata analyses n Table 3 the random-effectsarameterestimates t levels1-3 are derived rom heSTAR data and at the lass evel 2) the ggregatelevelvariance,which s notshown, s constrained o be 1. Thebetween-studyevel 4) inter-ceptand class size coefficientandomparameters re estimated rom hecomplete ata set.4. 1. ResultsIn Table 3 the evel4 (between-study)ariation s somewhat maller han hat stimated romtheaggregate ata studies nly.We see thatthe class size effect or he STAR data and thecombined stimate s little ifferentrom hat n theanalysisusingonly ggregateevel data(Table 2) and thequadraticeffects nownegligible.n fact he inearclass size effectn thecombinedmodel is less precisethanfor the STAR study lone because of thesubstantialheterogeneityetween tudies ntheclass sizeeffect. he STAR data showonly small andnotsignificantX2= 1.5) variationnthe class size effect etween chools. n factGoldsteinand Blatchford1998) show that for mathematicsest scores there s a marked variationbetween chools.A studyof the shrunken) stimated esiduals t thestudy evel does notrevealanyoutliers.5. DiscussionWe have shown how a seriesof studies,with results eported t differentevels of aggre-gation,can be combinedefficientlyithin singlemultilevelmodel to provideeffect izeestimates. 
    Since the analysis is based on maximum likelihood estimation within an explicit model it can be expected to yield more efficient estimates than traditional approaches to meta-analysis. These traditional models also have been unable to combine studies with both individual and aggregate level responses. Our approach does not require balanced data, but it does require that the reporting of studies for inclusion in the model conforms to certain minimum requirements. As we have illustrated, these requirements are such that it should be possible to carry out a suitable standardization for means and variances, after adjusting for relevant covariates. One of the problems with observational studies, especially those involving institutions such as schools, is that (multilevel) modelling incorporating institutional (and other) differences is absent and this can result in biased inferences. In the present case (Table 3) the intraclass and intraschool level correlations are sizable which implies that some of the inferences from the aggregate level studies may overestimate statistical significance. The estimates themselves, however, should be relatively unaffected, and this is consistent with our analysis.

    Table 3. Parameter estimates for model (11)†

    Parameter                          Estimates for the following types of data:
                                       STAR data only     Combined data (linear)   Combined data (quadratic)

    Fixed effects
      α_0                              0.078              0.184                    0.175
      α_1 (class size, linear)         -0.024 (0.006)     -0.022 (0.007)           -0.017 (0.011)
      α_3 (class size, quadratic)                                                  -0.0003 (0.0006)
      α_2 (pretest)                    0.907 (0.018)      0.907 (0.018)            0.907 (0.018)

    Random effects
      Level 4 (between study)
        σ²_w0                                             0.038 (0.020)            0.037 (0.019)
        σ_w01                                             -0.004 (0.002)           -0.004 (0.002)
        σ²_w1                                             0.0004 (0.0002)          0.0004 (0.0002)
      Level 3 (between school)
        σ²_v0                          0.305 (0.064)      0.305 (0.064)            0.305 (0.064)
        σ_v01                          0.00014 (0.004)    0.00012 (0.004)          0.00013 (0.004)
        σ²_v1                          0.0006 (0.0006)    0.0006 (0.0006)          0.0006 (0.0006)
      Level 2 (between class)
        σ²_u0                          0.139 (0.023)      0.138 (0.023)            0.138 (0.023)
      Level 1 (between student)
        σ²_e                           1.000 (0.023)      1.000 (0.023)            1.000 (0.023)

    -2 log-likelihood                  11996.5            11948.3                  11948.1

    †Standard errors are given in parentheses.
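The intraclass and intraschool correlations referred to in the Discussion can be recovered from the level 1-3 variance estimates in Table 3. The following is a minimal sketch, not a calculation reported in the paper: it uses the STAR-only estimates, which are residual variances conditional on the pretest and class size, and it adds the standard Kish approximation $1 + (m - 1)\rho$ for the variance inflation caused by ignoring clustering. The cluster size of 20 and all function names are hypothetical, chosen only for illustration.

```python
# Variance component estimates for the STAR-only fit, copied from Table 3.
VAR_SCHOOL = 0.305   # level 3, between schools
VAR_CLASS = 0.139    # level 2, between classes
VAR_STUDENT = 1.000  # level 1, between students


def intra_unit_correlations(var_school, var_class, var_student):
    """Return (intraschool, intraclass) correlations from variance components.

    intraschool: correlation between two students in the same school
                 but in different classes.
    intraclass:  correlation between two students in the same class
                 (and therefore also in the same school).
    """
    total = var_school + var_class + var_student
    rho_school = var_school / total
    rho_class = (var_school + var_class) / total
    return rho_school, rho_class


def design_effect(cluster_size, rho):
    """Kish approximation to the variance inflation for equal-sized clusters."""
    return 1.0 + (cluster_size - 1) * rho


rho_school, rho_class = intra_unit_correlations(VAR_SCHOOL, VAR_CLASS, VAR_STUDENT)

# Hypothetical number of students per class; not a value taken from the paper.
HYPOTHETICAL_CLASS_SIZE = 20
deff = design_effect(HYPOTHETICAL_CLASS_SIZE, rho_class)

print(f"intraschool correlation: {rho_school:.3f}")
print(f"intraclass correlation:  {rho_class:.3f}")
print(f"design effect (m = {HYPOTHETICAL_CLASS_SIZE}): {deff:.2f}")
```

With the Table 3 values the two correlations come out at roughly 0.21 and 0.31. A design effect well above 1 means that standard errors computed as if students were independent would be too small, which is the sense in which analyses that ignore the hierarchy may overstate statistical significance.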

    A remaining problem which we have not investigated in detail occurs where studies adjust effects by using different sets of explanatory variables. In the normal distribution case, if information is available about the covariance matrix of all such covariates then for the aggregate level studies common adjustments can be carried out as we have done in model (1). The model can be extended readily to the multivariate case where more than one outcome is considered, e.g. in the bivariate analysis of mathematics and reading achievement scores. This approach can also be used where not all studies measure all responses so the joint analysis within a single model will provide more efficient estimates than analysing each response separately.

    Since we have adopted a model-based approach it is possible in principle to incorporate further model components. An important component is the modelling of publication bias (Copas, 1999), although such models may not lead to improved estimates unless the bias is large (Hedges and Vevea, 1996). In the present case we would argue that publication bias may


    not be a serious issue. The criteria for study selection have been quite stringent so the relevant studies are carefully executed long-term studies which are unlikely to remain unpublished.

    It should be noted that in combining studies for modelling purposes we are making an assumption that the responses used in the various studies are indeed measuring the same thing. In social science applications of meta-analysis this is more problematic than in, say, clinical trials and needs to be borne in mind when interpreting results.

    Finally, although the thrust of this paper is methodological, it is of interest that the one large RCT gives an estimate for the class size effect which is very similar to that from the observational studies. This point is pursued further by Goldstein and Blatchford (1998) who also discuss the usefulness of RCTs in this kind of research.

    Acknowledgements

    The research was funded by the Economic and Social Research Council under the programme for the Analysis of Large and Complex Datasets. We are most grateful to the referees and the Joint Editor for their helpful comments.

    References

    Balow, I. H. (1969) A longitudinal evaluation of reading achievement in small classes. Elemen. Engl., 46, 184-187.
    Butler, J. M. and Handley, H. M. (1989) Differences in achievement for first and second graders associated with reduction in class size. 18th Mid-south Educational Research Association A. Conf., Little Rock, Nov. 8th-10th.
    Carlberg, C. and Kavale, K. (1980) The efficacy of special versus regular class placement for exceptional children: a meta-analysis. J. Specl Educ., 14, 295-309.
    Cleary, R. and Casella, G. (1997) An application of Gibbs sampling to estimation in meta-analysis: accounting for publication bias. J. Educ. Behav. Statist., 22, 141-154.
    Copas, J. (1999) What works?: selectivity models and meta-analysis. J. R. Statist. Soc. A, 162, 95-109.
    Doss, D. and Holley, F. (1982) A Cause for National Pause: Title I Schoolwide Projects. Austin: Office of Research and Evaluation.
    Erez, A., Bloom, M. C. and Wells, M. T. (1996) Using random rather than fixed effects models in meta-analysis: implications for situational specificity and validity generalisation. Persnl Psychol., 6, 277-306.
    Glass, G. V. and Smith, M. L. (1979) Meta-analysis of research on class size and achievement. Educ. Evaln Poly Anal., 1, 2-16.
    Goldstein, H. (1995) Multilevel Statistical Models. London: Arnold.
    Goldstein, H. and Blatchford, P. (1998) Class size and educational achievement: a review of methodology with particular reference to study design. Br. Educ. Res. J., 24, 255-268.
    Hardy, R. J. and Thompson, S. G. (1996) A likelihood approach to meta-analysis with random effects. Statist. Med., 15, 619-629.
    Hedges, L. and Olkin, I. (1985) Statistical Methods for Meta-analysis. Orlando: Academic Press.
    Hedges, L. and Vevea, J. (1996) Estimating effect size under publication bias: small sample properties and robustness of a random effects selection model. J. Educ. Behav. Statist., 21, 299-332.
    Mazareas, J. (1981) Effects of class size on the achievement of first grade pupils. Doctoral Dissertation. Boston University, Boston.
    McGiverin, J., Gilman, D. and Tillitski, C. (1989) A meta-analysis of the relation between class size and achievement. Elem. School J., 89, 47-56.
    Pfeffermann, D., Skinner, C. J., Holmes, D., Goldstein, H. and Rasbash, J. (1997) Weighting for unequal selection probabilities in multilevel models. J. R. Statist. Soc. B, 60, 23-40.
    Raudenbush, S. and Bryk, A. S. (1985) Empirical Bayes meta-analysis. J. Educ. Statist., 10, 75-98.
    San Juan Unified School District (1991) Class size reduction evaluation: freshman English, spring 1991. Research Report. San Juan Unified School District, San Juan.
    Shapson, S. M., Wright, E. N., Eason, G. and Fitzgerald, J. (1980) An experimental study of the effects of class size. Am. Educ. Res. J., 17, 141-152.
    Slavin, R. (1986) Best-evidence synthesis: an alternative to meta-analytic and traditional reviews. Educ. Res., 15, 5-11.
    Slavin, R. (1990) Class size and student achievement: is smaller better? Contemp. Educ., 62, 6-12.
    Thompson, S. G. (1994) Why sources of heterogeneity in meta-analysis should be investigated. Br. Med. J., 309, 1351-1355.


    Turner, R. M., Rumana, R. Z., Yang, M., Goldstein, H. and Thompson, S. G. (1999) Multilevel models for meta-analysis of clinical trials with binary outcomes. Statist. Med., to be published.
    Verducci, F. (1969) Effects of class size on the learning of a motor skill. Res. Q., 40, 391-395.
    Wagner, E. D. (1981) The effects of reduced class size upon the acquisition of reading skills in grade two. Doctoral Dissertation. University of Toledo, Toledo.
    Wilsberg, M. and Castiglione, L. V. (1968) The Reduction of Pupil-Teacher Ratios in Grades 1 and 2 and the Provision of Additional Materials: a Program to Strengthen Early Childhood Education in Poverty Schools, New York, NY. New York: New York City Board of Education.
    Word, E. R., Johnston, J., Bain, H. P., Fulton, B. D., Zaharias, J. B., Achilles, C. M., Lintz, M. N., Folger, J. and Breda, C. (1990) The state of Tennessee's student/teacher achievement ratio (STAR) project. Technical Report 1985-90. Nashville: Tennessee State University.