
Measurement Options for Development of Sustainable Development Goal Indicator 4.2.1

Memo, Global Alliance to Monitor Learning, Taskforce on Target 4.2

Prepared by Hirokazu Yoshikawa, Abbie Raikes, and Alice Wuermli¹

June 2017 DRAFT

In this memo we describe options for developing a measurement strategy at the global level for Target 4.2, and specifically Indicator 4.2.1 (“Proportion of children under 5 years of age who are developmentally on track in health, learning and psychosocial well-being, by sex”). Goals for such a strategy include development of a measure that is appropriate for cross-country comparison of levels of development in each of these major domains (health, learning and its subdomains, and psychosocial well-being); feasible to use in national monitoring and evaluation, in terms of cost and human capital resources (e.g., training intensity and ease in achieving reliability); vertically equitable in its units² across the span of birth through 60 months; and psychometrically sound in reliability and validity from standpoints of both classical test theory and other approaches such as IRT.

Our memo has three parts: 1) three options for a global measurement strategy; 2) weighing options at the intersection of validity and use; and 3) options for next steps.

A. Three Options for a Global Measurement Strategy.

Option 1. An existing measure could be chosen without adaptation as a single global measure for Indicator 4.2.1.

The SDGs whenever possible aim to achieve global indicators that are comparable across countries. In contrast to the MDGs, the countries in question include all UN member nations, and thus cut across high-, middle- and low-income countries. The choice to select a single global indicator for 4.2.1 is thus quite formidable; and setting a goal by 2030 of a single assessment, drawn from previous learnings across other measures that have been analyzed for validity across multiple countries, is important to weigh carefully.

What domains of child development should such a measure assess? The language for Indicator 4.2.1 reflects a global consensus in the field of early childhood development regarding the multi-domain nature of development in the first years of life. Domains of physical, cognitive, language, numeracy, and socio-emotional development are typically inter-related, yet distinct, within the age range covered (birth through 60 months). Although great variability occurs in the nature of behaviors and skills in these overall domains of development, both within and across countries, the consensus concerning the meaningfulness of these domains in contexts of national ECD policy and planning has been shown across multiple regions and nations (e.g., Kagan & Britto, 2005).

¹ Yoshikawa and Wuermli, NYU Global TIES for Children Center; Raikes, University of Nebraska Medical Center. We thank Elise Legault, Baela Raza Jamil, select members of the Taskforce on Target 4.2 of GAML, and Ivelina Borisova for comments on previous drafts.

² I.e., for a particular construct, a unit at one point of the scale’s distribution is comparable to a unit at another point of the scale’s distribution.

GAML4/REF/14

The inclusion of multiple domains including physical, cognitive, learning (e.g., language and numeracy) and socio-emotional domains represents a relatively strong consensus in measures that have recently been assessed across multiple nations (Raikes & Anderson, 2017). These include instruments based on adult/caregiver report, such as the EDI (Janus & Offord, 2007), the CREDI (McCoy et al., 2017) and the UNICEF Early Childhood Development Index (Bornstein et al., 2012; McCoy et al., 2016); as well as instruments that directly assess children, such as the PRIDI (Verdisco, Cueto, & Thompson, 2016), IDELA (Wolf et al., 2017), EAP-ECDS (Rao et al., 2014), the MELQO MODEL measure (UNESCO, 2017), and others.

Despite this range of recent efforts to measure, in coordinated fashion, multiple domains of early childhood development, currently no consensus measure exists for Indicator 4.2.1 that is measured across a large number of countries (across, e.g., low-, middle- and high-income countries) and meets other criteria for a Tier I indicator of the UN Statistical Commission (2016, 2017). Thus, although Option 1 would be ideal if all conditions were met for feasibility, relevance and validity across countries, in the current context of the SDG indicators, there is no alternative that meets these criteria. As discussed below, Option 3 would work towards creating such a single criterion measure in the future.

Option 2. Use an existing common set of items or identify a set of anchor items to integrate into national and regional assessments.

At present, there is a range of measures that have been developed and tested within countries and regions. An overview of these measures appears in the first background paper (Anderson & Raikes, 2017). Many countries have also expressed the desire to build (either by adapting or creating new) nationally-specific measures to promote ongoing monitoring of child development in a manner that is aligned with national standards and cultural expectations. Below we present two ideas on how common item sets or anchor items could be used in global measurement.

Common Outcome Sets. In various fields, the use of Common Outcome Sets (COS’s) has been implemented to establish common sets of measures or items across a set of evaluation or other research studies (e.g., Gershon et al., 2013; Schmitt et al., 2015; Williamson et al., 2012). Across these initiatives, a typical multi-phase process includes the following. First, a consensus group of experts and practitioners/policy leaders is brought together to establish agreement on the constructs that will constitute a measurement domain. Second, criteria for measures that may be considered as candidates to contribute items or entire scales/assessments are agreed upon. Third, an inventory of measures meeting these criteria is assembled, and common items or tasks are identified. Fourth, depending on the initiative, a single “consensus” measure may be developed (some aspects of which may be newly developed, with others drawn from existing measures). Fifth, phases of pilot testing, psychometric analyses, and revision may occur iteratively until a final consensus measure is agreed upon. Finally, a measure and its guidelines for administration may be disseminated to a wide range of potential users, with continued input and refinement as the measure enters general use.

In the area of Indicator 4.2.1, a recent initiative to develop a common set of items was carried out as part of the Measuring Early Learning and Quality Outcomes project (MELQO; UNESCO, 2017). MELQO began with the intent of clarifying if one measure would be sufficient for measurement in all countries (Option 1), but moved quickly in the direction of Option 2. Option 2, finding a common item set, was desirable for two main reasons: 1) it would allow countries to build on existing measures that were already developed and validated in each region; and 2) it would allow a greater degree of flexibility to add more culturally-responsive, nationally-specific items that are not possible to include when relying on only one measure. Ultimately the common item set was distilled into a single new measure named the MODEL, covering developmental domains of social, cognitive, language and literacy, numeracy, and executive function. Cross-country analyses are underway on this measure (Raikes et al., 2017).

Measures harmonization. Multiple existing measures may be harmonized using some form of standardization to allow for scoring on a common scale. Two approaches are outlined below (but others may be relevant): 1) crosswalk samples; and 2) identification of anchor items. The process of harmonizing across measures is typically done through identifying common items that can help link the different assessments (“anchor items”) (Chan et al., 2015), and/or by administering multiple measures on the same sample to synchronize measurements and establish the basis for comparing children’s learning and development on the same scale, but with data collected through different measures (“crosswalk samples”). To investigate the feasibility of this option, either multiple datasets with a set of common items are needed (“anchor items”) or multiple measures must be administered to the same children (“crosswalk sample”), so that calibration across different measures can take place. For example, anchor items could be created from those measures used in multiple countries where certain items have shown evidence of cross-country invariance. It would then be necessary to ensure that this scale works in similar ways across countries, a related but distinct step in building international comparability.
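As a rough illustration of how anchor items might place two assessments on a common scale, the sketch below applies mean-sigma linear linking to item difficulty estimates. The item names, the difficulty values and the function names are hypothetical and are not drawn from any of the measures discussed here. A real application would first estimate difficulties with an item response model and check the anchors for cross-country invariance.

```python
from statistics import mean, pstdev


def mean_sigma_constants(ref_difficulty, new_difficulty):
    """Return (slope, intercept) mapping the new scale onto the reference scale.

    Each argument maps item id -> difficulty estimate on that measure's own
    scale. Only items present in both dictionaries act as anchors.
    """
    anchors = sorted(set(ref_difficulty) & set(new_difficulty))
    if len(anchors) < 2:
        raise ValueError("at least two shared anchor items are required")
    ref = [ref_difficulty[item] for item in anchors]
    new = [new_difficulty[item] for item in anchors]
    spread_new = pstdev(new)
    if spread_new == 0:
        raise ValueError("anchor difficulties on the new scale do not vary")
    slope = pstdev(ref) / spread_new
    intercept = mean(ref) - slope * mean(new)
    return slope, intercept


def to_reference_scale(value, slope, intercept):
    """Express a difficulty or ability from the new scale in reference units."""
    return slope * value + intercept


# Hypothetical difficulty estimates (illustrative values only).
REF = {"names_colors": -1.0, "counts_to_10": 0.0, "writes_name": 1.0}
NEW = {"names_colors": -1.5, "counts_to_10": 0.5, "writes_name": 2.5,
       "local_item": 0.2}  # national item with no counterpart on REF

slope, intercept = mean_sigma_constants(REF, NEW)
local_on_ref = to_reference_scale(NEW["local_item"], slope, intercept)
print(f"slope={slope:.3f} intercept={intercept:.3f} local_item={local_on_ref:.3f}")
```

Under these assumptions the nationally specific item (`local_item`) can be expressed in reference-scale units even though it was never administered alongside the reference measure, which is the practical appeal of the anchor-item approach.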

Crosswalk samples. Crosswalk samples (single samples that incorporate assessment of multiple instruments) are useful for facilitating decisions about what to include and exclude from particular instruments in developing a single set of common items or reduced consensus assessment. This is because in a single sample, multiple alternative measures are assessed, allowing for direct calculation of correlations among measures and of differences in predictive or other forms of validity that do not confound sample with measure. This approach has been used recently in a study that aimed to harmonize multiple measures of depression and subjective health among older adults (Gatz et al., 2015).
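To make the crosswalk idea concrete, the sketch below assumes a small hypothetical sample in which every child was assessed on two measures. It computes the correlation between the measures and a simple linear linking function (matching means and standard deviations) that converts scores on one measure to the scale of the other. The scores and function names are illustrative assumptions. Operational work would more likely use equipercentile or IRT-based linking with far larger samples.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


def linear_link(from_scores, to_scores):
    """Build a converter that maps a 'from' score onto the 'to' scale.

    Single-group linear linking: the converted scores have the same mean and
    standard deviation as the 'to' scores observed in the crosswalk sample.
    """
    m_from, m_to = mean(from_scores), mean(to_scores)
    ratio = pstdev(to_scores) / pstdev(from_scores)

    def convert(score):
        return m_to + ratio * (score - m_from)

    return convert


# Hypothetical crosswalk sample: the same six children on two measures.
MEASURE_A = [10, 12, 14, 16, 18, 20]
MEASURE_B = [30, 33, 38, 41, 44, 48]

r = pearson_r(MEASURE_A, MEASURE_B)
a_to_b = linear_link(MEASURE_A, MEASURE_B)
converted = [a_to_b(score) for score in MEASURE_A]
print(f"r={r:.3f}", [round(v, 1) for v in converted])
```

Because both measures come from the same children, any difference between the two score distributions reflects the measures rather than the samples, which is the property of crosswalk designs noted above.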

Anchor items. A challenge is how to identify anchor items in the absence of any universal measure or indicator. For example, a recent effort in the United States to harmonize state-level standardized assessments relied on an anchor measure that is administered across the country, namely the National Assessment of Educational Progress (NAEP). Based on the distribution of districts on that national assessment, a standardization procedure was used to link the state-level assessments (Reardon, Kalogrides, & Ho, 2016). In the absence of a criterion or audit test such as the NAEP in the U.S. example, a mix of conceptual and traditional empirical criteria from both classical test theory and other approaches such as item-response analysis may be utilized. However, variation in task requirements, item language, assessors across datasets, order of items, and response categories all create daunting differences in sources of measurement error even when considered within domain (e.g., language or numeracy).

Option 3. Create a new universal “criterion” scale of child development against which many other possible measures could be placed.

The development of a new universal criterion scale could proceed following established procedures somewhat similar to the ones that led to the MELQO, IDELA or regional measures, but with learnings synthesized from all of them. It could start with the two leading initiatives in this field, which are the UNICEF ECDI (longstanding, for 3-6 year olds) and the WHO consortium instrument for 0-3 year olds (more recently developed). The first step of pooling items from measures used in multiple countries, categorizing them by outcome domain, and compiling information on validity studies, samples, and countries, was also recently completed in the first phase of work of the MELQO project (UNESCO, 2017). Across these initiatives and others, ECD measurement analyses have advanced to cross-country invariance analyses on some existing measures. Thus some information is emerging both on the psychometric structure of multi-domain assessments of child development within countries, as well as whether these measures function well across countries in order to permit comparisons. Such information could be used in the process of creating a new “criterion” scale across the full range from birth to age 5 or 6 that would advance the field towards testing a single measure.

Two examples can illustrate the efforts to create a single criterion scale. WHO is leading a consortium to create a single criterion scale for 0-3 year olds. UNICEF, with its infrastructure for conducting multiple nationally representative samples in a sustained manner across years, could adapt and extend its ECDI measure (thus far for 3-6 year olds) with input from the initiatives of the past decade that have led the field towards cross-country comparable measures. Such an effort could integrate the WHO scale on some constructs that may be suitable for measurement across the 0-3 and 3-6 year old age ranges. The advantages of this process include the leveraging of resources for large samples at the country level for many countries that may not otherwise currently have these resources; experience in a range of regions; and the large number of LMICs within which the MICS is currently fielded.

We note that the process of creating a single criterion measure could also benefit from the experience of the creation and revision of the PISA cross-national assessments or others such as TIMSS, PIRLS, etc. (and currently the supplementation of the PISA with the PISA for Development, aimed for LMIC use). This is particularly relevant to the extension to rich countries required in any development of a new criterion measure. In the PISA development process, for example, initial wide-ranging expert consensus on a measurement framework at the level of constructs occurred, followed by convening panels of experts to develop items in specific domains; phases of pre-piloting in multiple countries with relatively small samples to ascertain meaning of items and variation in response to tasks, assessors, and administration formats; more systematic piloting of items across countries; item revision and selection for large-scale piloting; and finally nationally representative administration across countries (OECD, 2000a, 2000b). Many of these steps have been carried out in the ECD work of the MELQO initiative and others, but not all.

However, there are challenges facing such an effort as well, which should be taken into account. They include:

1) The need to continue to support country-level adaptation processes. For country-level use, the stakeholder process to build consensus towards national measurement of early childhood assessment for monitoring purposes and to inform policies in areas such as quality improvement and teaching and learning can and should be comprehensive. A single criterion measure can be used but could also be supplemented, for example, in particular countries with culturally specific constructs of child development that are relevant to goals for children’s learning, behavior and development. Some countries may choose to use their own measures, and this is supported within the SDG process.

2) The definition of “on track” and “off track.” No current widely used early childhood development measure among those mentioned in this document has established cutoffs for on vs off track. This is in part because these are not designed as screening measures. However, the development of national norms can be done without expectation of a cross-country, uniform definition of on and off track. Technical work to establish a consensus on this process within and across countries is necessary in the field of ECD.

3) The unprecedented range of country contexts. None of the current initiatives or existing measures in the field of ECD have been widely administered or analyzed across both LMIC and rich-country contexts. Should a single measure be developed from the basis of the UNICEF ECDI and other existing measures with cross-country data, it will be vital to consider cross-country measures that have been fielded in rich countries, including current initiatives of the OECD, the EU, and other entities. Some of the ECD measures recently fielded in multiple LMICs are starting to be applied in rich countries; these learnings should also be integrated.

4) Need to include both caregiver/adult report and direct child assessment. A consensus is building in the ECD field that measurement of some domains of development benefits from the integration of information from adults who spend substantial time with children (caregivers/parents; teachers) and direct child assessment. For example, adults familiar with children’s behaviors in home and/or care settings may be in a better position to observe low-frequency behaviors such as aggression than can be assessed in an assessor-administered task. Conversely, direct assessments may be more appropriate when certain skills are not ones that adults in children’s lives are used to noticing, but may nevertheless be predictive of later outcomes (e.g., aspects of executive function), or when complex skills benefit from standard stimuli (e.g., comprehension of a sentence). It is undeniable that direct child assessment is more costly, with training to reliability more difficult at scale than with more straightforward survey-based measures. However, options such as random subsamples, within a larger nationally representative sample, for direct child assessment modules should be considered. This approach may reduce the overall costs of adding a direct child assessment portion to an adult-reported measure in a nationally representative sample.

5) A measure that vertically equates some domains of development across wider age ranges than 0-3 and 3-6. The vertical equating required to achieve measures of some areas of development that are meaningful to measure across the full age span is very challenging, given the rapidity of development in these years and the qualitative, not just quantitative, changes in skills that occur. Yet existing measures collected across countries have for the most part been restricted to either the 0-3 or the 3-6 year old age range. The integration of the WHO consortium on 0-3 measurement and current efforts in the 3-6 year old age range would be critical for this effort. It is likely that only some constructs of development are suited to integration across 0-3 and 3-6.

6) Alignment with later learning targets and indicators in SDG 4. The continuum of learning and development stretches from birth to adulthood. Alignment of 4.2.1 with indicator 4.1.1, in particular (and especially the grade 2 or 3 indicator), is important to enable nations to track how learning unfolds in the first 8 years of life.

7) Alignment with other SDG targets and indicators. The alignment of SDG 4.2.1 with other goals, targets and indicators in the areas of health, mental health, nutrition, and child protection is signaled in the wording of 4.2.1, which (unlike the primary schooling indicators) integrates health and psychosocial well-being. Such alignment can move beyond health and psychosocial well-being to consider relationships with other SDG indicators outside of Goal 4.

8) Integration of member nation input into development of a criterion measure. The input of UN member nations into the development of the SDGs, including Target 4.2, was unprecedented in history. Continued input into the development of a criterion measure for 4.2.1 is vital for ultimate use of the measure at the country level and the global processes to track SDG 4.
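The reporting step implied by the second challenge above can be sketched in a few lines once a national norm exists. The example below assumes a purely hypothetical rule in which a child counts as “on track” when a development score is at or above a cutoff for the child’s age band. The cutoff values, age bands, field names and data are invented for illustration, and no existing measure is claimed to use them. Optional survey weights are included because nationally representative surveys typically require them.

```python
from collections import defaultdict

# Hypothetical national-norm cutoffs keyed by age band in months
# (lower bound inclusive, upper bound exclusive). Illustrative values only.
CUTOFFS = {(0, 36): 10.0, (36, 60): 20.0}


def cutoff_for(age_months):
    """Look up the on-track cutoff for a child under 60 months of age."""
    for (low, high), cutoff in CUTOFFS.items():
        if low <= age_months < high:
            return cutoff
    raise ValueError(f"age {age_months} months is outside the indicator range")


def proportion_on_track(children):
    """Weighted proportion of children at or above their age-band cutoff, by sex."""
    total = defaultdict(float)
    on_track = defaultdict(float)
    for child in children:
        weight = child.get("weight", 1.0)  # survey weight, defaults to 1
        total[child["sex"]] += weight
        if child["score"] >= cutoff_for(child["age_months"]):
            on_track[child["sex"]] += weight
    return {sex: on_track[sex] / total[sex] for sex in total}


# Hypothetical records.
SAMPLE = [
    {"sex": "F", "age_months": 24, "score": 12.0},
    {"sex": "F", "age_months": 48, "score": 18.0},
    {"sex": "M", "age_months": 30, "score": 9.0},
    {"sex": "M", "age_months": 54, "score": 25.0},
    {"sex": "M", "age_months": 40, "score": 21.0},
]

result = proportion_on_track(SAMPLE)
print(result)
```

The arithmetic is trivial. The substantive difficulty, as noted above, lies in agreeing on how the cutoffs themselves are derived within and across countries.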

B. Assessing our options: Which measurement strategy maximizes both validity and use?

A single organizing question to help in assessing these options might be: What approach maximizes validity, feasibility and productive national and cross-national use to inform policy and practice? To answer such a question, agreement on the meaning and evidence to support validity must be a starting point.

What does “validity” mean in cross-national measurement of early childhood development? A central goal of assessment in the fields of child development and education is to achieve measurement with evidence of validity. Current notions of validity consider it a unitary construct supported by evidence in the context of use. With the new demand for policy-relevant data, the types of evidence that should be weighed in assessing validity include but go beyond older conceptualizations of validity (for example, the trio of content, criterion-related, and construct validity and their subtypes; Cronbach & Meehl, 1955). “Validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests” (AERA, APA, & NCME, 1999). Note that this overall single conception of validity supersedes the subtypes of content, criterion-related and construct validity.

Five kinds of evidence have been put forward to support validity according to this definition (Goodwin & Leech, 2003). These include many of the traditional subtypes of content, criterion-related and construct validity.

First, evidence based on test content requires that expert consensus be achieved on the match between item and task content and the construct that is being measured, and whether the content reflects bias or differential match between content and construct for particular groups (e.g., as defined by gender, language, culture, etc.). In order to maximize this kind of evidence, the “group consensus” approach in test development and refinement, as well as testing explicitly for bias through qualitative research, cognitive testing, understanding of variation in experience of the testing context, analyses to test for measurement invariance across diverse populations and other methods can be employed.

Second, evidence based on responses of test takers examines potential unintended responses such as those based on social desirability or unfamiliarity with the administration approach or testing context. This form of evidence in the case of measures of Indicator 4.2.1 includes, for example, analyzing reasons for non-response that go beyond lack of underlying ability, to discomfort, anxiety, or response to the assessor and setting.

Third, evidence on internal structure taps whether the components of a measure adequately reflect the subdomains of a construct. In the case of the multi-domain measures of early childhood development, this includes whether the instrument adequately reflects physical, learning, and psychosocial domains. The learning domain is often considered to include potential subdomains such as language/early literacy; numeracy, spatial and quantitative skills; and executive function or approaches to learning. This kind of evidence is most often analyzed through exploratory and confirmatory factor analyses.

Fourth, evidence based on relations to other measures includes traditional forms of criterion-related and construct validity, such as concurrent, predictive, convergent, and discriminant validity. Such evidence also includes the important criterion of sensitivity to intervention. This is particularly important in the SDG 4.2 context, as the overall target links early childhood development to the quality of policies and programs that support it.

Finally, evidence based on the consequences of testing focuses centrally on uses of the assessment. The ability of a measure to inform practice and policy involves not only feasibility of administration at large scale with regular periodicity, but also the links to practice and policy decision-making. In this regard, the MELQO initiative, building on prior efforts, proposed that both measurement of quality of early learning environments and measurement of early childhood development outcomes were necessary to most powerfully inform policy and practice (UNESCO, 2017).
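One common way to test explicitly for item bias across groups, as called for under evidence based on test content, is a differential item functioning (DIF) check. The sketch below computes the Mantel-Haenszel common odds ratio for a single item, comparing two groups of children matched on a total-score stratum. The group labels, strata and counts are hypothetical, and this is only one of several possible approaches. Measurement invariance is often examined instead with multi-group factor or IRT models.

```python
import math
from collections import defaultdict


def mantel_haenszel_odds_ratio(records, reference, focal):
    """Mantel-Haenszel common odds ratio for one item.

    records: iterable of (group, stratum, correct) tuples, where stratum is a
    matching variable such as a total-score band. A value near 1 suggests the
    item behaves similarly in both groups once overall ability is matched.
    """
    # Per stratum: [ref correct, ref incorrect, focal correct, focal incorrect]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for group, stratum, correct in records:
        cell = tables[stratum]
        if group == reference:
            cell[0 if correct else 1] += 1
        elif group == focal:
            cell[2 if correct else 3] += 1
    numerator = denominator = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        if n:
            numerator += a * d / n
            denominator += b * c / n
    if denominator == 0:
        raise ValueError("odds ratio is undefined for these data")
    return numerator / denominator


def mh_delta(odds_ratio):
    """Express the odds ratio on the ETS delta scale (-2.35 * ln(alpha))."""
    return -2.35 * math.log(odds_ratio)


def expand(group, stratum, n_correct, n_incorrect):
    """Helper that turns counts into individual response records."""
    return ([(group, stratum, True)] * n_correct
            + [(group, stratum, False)] * n_incorrect)


# Hypothetical counts for one item in two countries, two score strata.
RECORDS = (expand("country_a", "low", 20, 20) + expand("country_b", "low", 10, 30)
           + expand("country_a", "high", 30, 10) + expand("country_b", "high", 20, 20))

alpha = mantel_haenszel_odds_ratio(RECORDS, reference="country_a", focal="country_b")
print(f"MH odds ratio={alpha:.2f}, delta={mh_delta(alpha):.2f}")
```

In this invented example the odds ratio is well above 1, meaning the item is easier for children in the reference country than for equally able children in the focal country. An item with that pattern would be a poor candidate for an anchor or common item.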

C. Next Steps and Questions to Guide Discussion

The SDGs provide a unique opportunity for building a global ECD measurement strategy that significantly enhances the reliability, feasibility and comparability of existing ECD data. Moving forward on any of the strategies outlined above will require a greater degree of systematic data collection, coordination among measures developers and experts, and input from stakeholders than has taken place in the past. Below we have outlined options for pursuing consensus on these strategies:

- Next Step 1. Building on prior consensus work, begin by agreeing on a general conceptual framework (e.g., of early childhood development domains) to guide measurement. This will serve as the groundwork for the next steps of clarifying what could be measured across countries, and where existing data may be available to help inform decisions on constructs and items. Several such consensus meetings have occurred recently; however, there are still areas of lack of consensus such as some of the eight areas of challenges listed above.

- Next Step 2. Address questions of validity in light of the strong policy emphasis of SDG-related data, the unique aspects of early childhood development relative to later phases of learning, and the need to ensure equity and cultural relevance across a range of countries. A convening on technical standards and guidelines for use, synthesis and harmonization of data from existing measures; standards for cross-country comparability of the option of a single criterion measure; and the criteria for dimensions of validity in such an effort is critical. Engaging a wide range of psychometricians, researchers and other expert stakeholders who can help to assess the feasibility of each approach, including the cost and coordination requirements for pursuing each of the options, is required. No convening to date, for example, has brought together the small group of psychometric experts who have worked on cross-country analyses of existing ECD measures. Building a technical consensus for the next phase of work would be an important step in the next advances towards global measurement of SDG Indicator 4.2.1.

Questions to Guide Discussion:

1) Among Options 1, 2 and 3, which seem feasible in the short run (next 12-18 months)? In the medium run (1.5-3 years)? In the longer run (3-5 years)?

2) What is the best plan for making progress on the Next Steps noted immediately above in this section? Are any important Next Steps missing?

REFERENCES

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. American Educational Research Association.

Anderson, K., & Raikes, A. (2017). Key measurement questions for Indicator 4.2.1 (Discussion paper for GAML Taskforce 4.2).

Bornstein, M. H., Britto, P. R., Nonoyama-Tarumi, Y., Ota, Y., Petrovic, O., & Putnick, D. L. (2012). Child development in developing countries: introduction and methods. Child Development, 83(1), 16-31.

Chan, K. S., Gross, A. L., Pezzin, L. E., Brandt, J., & Kasper, J. D. (2015). Harmonizing measures of cognitive performance across international surveys of aging using item response theory. Journal of Aging and Health, 27(8), 1392-1414.

Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52(4), 281.

Gatz, M., Reynolds, C. A., Finkel, D., Hahn, C. J., Zhou, Y., & Zavala, C. (2015). Data harmonization in aging research: Not so fast. Experimental Aging Research, 41(5), 475-495.

Gershon, R. C., Wagster, M. V., Hendrie, H. C., Fox, N. A., Cook, K. F., & Nowinski, C. J. (2013). NIH toolbox for assessment of neurological and behavioral function. Neurology, 80(11 Supplement 3), S2-S6.

Goodwin, L. D., & Leech, N. L. (2003). The meaning of validity in the new standards for educational and psychological testing. Measurement and Evaluation in Counseling and Development, 36(3), 181-192.

Jacobusse, G., Van Buuren, S., & Verkerk, P. H. (2006). An interval scale for development of children aged 0–2 years. Statistics in Medicine, 25(13), 2272-2283.

Janus, M., & Offord, D. R. (2007). Development and psychometric properties of the Early Development Instrument (EDI): A measure of children's school readiness. Canadian Journal of Behavioural Science, 39(1), 1-22.

Kagan, S. L., & Britto, P. R. (2005). Going global with indicators of child development. New York: UNICEF.

McCoy, D. C., Peet, E. D., Ezzati, M., Danaei, G., Black, M. M., Sudfeld, C. R., ... & Fink, G. (2016). Early childhood developmental status in low- and middle-income countries: national, regional, and global prevalence estimates using predictive modeling. PLoS Med, 13(6), e1002034.

McCoy, D. C., Sudfeld, C. R., Bellinger, D. C., Muhihi, A., Ashery, G., Weary, T. E., ... & Fink, G. (2017). Development and validation of an early childhood development scale for use in low-resourced settings. Population Health Metrics, 15(1), 3-.

OECD (2000a). Measuring student knowledge and skills: The PISA 2000 assessment of reading, mathematical and scientific literacy. Paris: Author.

OECD (2000b). PISA 2000: Technical Report. Paris: Author.

Raikes, A., & Anderson, K. L. (2017). Key measurement questions for SDG 4.2.1 (discussion paper). Montreal: UNESCO Institute for Statistics, Global Alliance to Monitor Learning, Target 4.2 Task Force.

Rao, N., Sun, J., Ng, M., Becher, Y., Lee, D., Ip, P., & Bacon-Shone, J. (2014). Validation, Finalization and Adoption of the East Asia-Pacific Early Child Development Scales (EAP-ECDS). UNICEF, East and Pacific Regional Office.

Ravens-Sieberer, U., Erhart, M., Rajmil, L., Herdman, M., Auquier, P., Bruil, J., ... & Mazur, J. (2010). Reliability, construct and criterion validity of the KIDSCREEN-10 score: a short measure for children and adolescents’ well-being and health-related quality of life. Quality of Life Research, 19(10), 1487-1500.

Reardon, S. F., Kalogrides, D., & Ho, A. D. (2016). Linking U.S. school district test score distributions to a common scale, 2009-2013. Palo Alto, CA: Stanford Center for Education Policy Analysis.

Schmitt, J., Apfelbacher, C., Spuls, P. I., Thomas, K. S., Simpson, E. L., Furue, M., ... & Williams, H. C. (2015). The Harmonizing Outcome Measures for Eczema (HOME) roadmap: a methodological framework to develop core sets of outcome measurements in dermatology. Journal of Investigative Dermatology, 135(1), 24-30.

UN Statistical Commission (2016). Provisional proposed tiers for global SDG indicators. New York: Author.

UN Statistical Commission (2017). Revised list of global Sustainable Development Goal indicators. New York: Author.

UNESCO (2017). Overview: Measuring Early Learning and Quality Outcomes (MELQO). Paris: Author.

Verdisco, A., Cueto, S., & Thompson, J. (2016). Early Childhood Development: Wealth, the Nurturing Environment and Inequality First Results from the PRIDI Database. Inter-American Development Bank.

Williamson, P. R., Altman, D. G., Blazeby, J. M., Clarke, M., Devane, D., Gargon, E., & Tugwell, P. (2012). Developing core outcome sets for clinical trials: issues to consider. Trials, 13(1), 132.

Wolf, S., Halpin, P., Yoshikawa, H., Pisani, L., Dowd, A. J., & Borisova, I. (2017). Assessing the construct validity of Save the Children’s International Development and Early Learning Assessment (IDELA). Manuscript under review.
