validationofthenihtoolboxcognitivebattery in intellectual disability · ipad-based battery of brief...

13
ARTICLE OPEN ACCESS Validation of the NIH Toolbox Cognitive Battery in intellectual disability Rebecca H. Shields, MS, Aaron J. Kaat, PhD, Forrest J. McKenzie, BS, Andrea Drayton, BS, Stephanie M. Sansone, PhD, Jeanine Coleman, PhD, Claire Michalak, BS, Karen Riley, PhD, Elizabeth Berry-Kravis, MD, PhD, Richard C. Gershon, PhD, Keith F. Widaman, PhD, and David Hessl, PhD Neurology ® 2020;94:e1229-e1240. doi:10.1212/WNL.0000000000009131 Correspondence Dr. Hessl [email protected] Abstract Objective To advance the science of cognitive outcome measurement for individuals with intellectual disability (ID), we established administration guidelines and evaluated the psychometric properties of the NIH-Toolbox Cognitive Battery (NIHTB-CB) for use in clinical research. Methods We assessed feasibility, test-retest reliability, and convergent validity of the NIHTB-CB (measuring executive function, processing speed, memory, and language) by assessing 242 individuals with fragile X syndrome (FXS), Down syndrome (DS), and other ID, ages 6 through 25 years, with retesting completed after 1 month. To facilitate accessibility and measurement accuracy, we developed accommodations and standard assessment guidelines, documented in an e-manual. Finally, we assessed the sensitivity of the battery to expected syndrome-specic cognitive phenotypes. Results Above a mental age of 5.0 years, all tests had excellent feasibility. More varied feasibility across tests was seen between mental ages of 3 and 4 years. Reliability and convergent validity ranged from moderate to strong. Each test and the Crystallized and Fluid Composite scores correlated moderately to strongly with IQ, and the Crystallized Composite had modest correlations with adaptive behavior. The NIHTB-CB showed known-groups validity by detecting expected executive function decits in FXS and a receptive language decit in DS. Conclusion The NIHTB-CB is a reliable and valid test battery for children and young adults with ID with a mental age of 5 years and above. Adaptations for very low-functioning or younger children with ID are needed for some subtests to expand the developmental range of the battery. Studies examining sensitivity to developmental and treatment changes are now warranted. RELATED ARTICLE Editorial Finding a common path to the assessment of persons with intellectual development disorders Page 507 From the University of California Davis Medical Center (R.H.S.); Human Development Graduate Group (R.H.S.), University of California Davis, Sacramento; Feinberg School of Medicine (A.J.K., R.C.G.), Northwestern University, Chicago, IL; MIND Institute and Department of Psychiatry and Behavioral Sciences (F.J.M., A.D., S.M.S., D.H.), University of California Davis Medical Center, Sacramento; Morgridge College of Education (J.C., K.R.), University of Denver, CO; Departments of Pediatrics (C.M., E.B.-K.) Neurological Sciences (E.B.-K.), and Biochemistry (E.B.-K.), Rush University Medical Center, Chicago, IL; and Graduate School of Education (K.F.W.), University of California, Riverside. Go to Neurology.org/N for full disclosures. Funding information and disclosures deemed relevant by the authors, if any, are provided at the end of the article. The Article Processing Charge was funded by authors. This is an open access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND), which permits downloading and sharing the work provided it is properly cited. The work cannot be changed in any way or used commercially without permission from the journal. Copyright © 2020 The Author(s). Published by Wolters Kluwer Health, Inc. on behalf of the American Academy of Neurology. e1229

Upload: others

Post on 19-Jun-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: ValidationoftheNIHToolboxCognitiveBattery in intellectual disability · iPad-based battery of brief memory, executive function (EF), processingspeed,andlanguagetests,wasdevelopedwithinthe

ARTICLE OPEN ACCESS

Validation of the NIH Toolbox Cognitive Batteryin intellectual disabilityRebecca H Shields MS Aaron J Kaat PhD Forrest J McKenzie BS Andrea Drayton BS

Stephanie M Sansone PhD Jeanine Coleman PhD Claire Michalak BS Karen Riley PhD

Elizabeth Berry-Kravis MD PhD Richard C Gershon PhD Keith F Widaman PhD and David Hessl PhD

Neurologyreg 202094e1229-e1240 doi101212WNL0000000000009131

Correspondence

Dr Hessl

drhesslucdavisedu

AbstractObjectiveTo advance the science of cognitive outcome measurement for individuals with intellectualdisability (ID) we established administration guidelines and evaluated the psychometricproperties of the NIH-Toolbox Cognitive Battery (NIHTB-CB) for use in clinical research

MethodsWe assessed feasibility test-retest reliability and convergent validity of the NIHTB-CB(measuring executive function processing speed memory and language) by assessing 242individuals with fragile X syndrome (FXS) Down syndrome (DS) and other ID ages 6through 25 years with retesting completed after 1 month To facilitate accessibility andmeasurement accuracy we developed accommodations and standard assessment guidelinesdocumented in an e-manual Finally we assessed the sensitivity of the battery to expectedsyndrome-specific cognitive phenotypes

ResultsAbove a mental age of 50 years all tests had excellent feasibility More varied feasibility acrosstests was seen between mental ages of 3 and 4 years Reliability and convergent validity rangedfrom moderate to strong Each test and the Crystallized and Fluid Composite scores correlatedmoderately to strongly with IQ and the Crystallized Composite had modest correlations withadaptive behavior The NIHTB-CB showed known-groups validity by detecting expectedexecutive function deficits in FXS and a receptive language deficit in DS

ConclusionThe NIHTB-CB is a reliable and valid test battery for children and young adults with ID witha mental age of asymp5 years and above Adaptations for very low-functioning or younger childrenwith ID are needed for some subtests to expand the developmental range of the battery Studiesexamining sensitivity to developmental and treatment changes are now warranted

RELATED ARTICLE

EditorialFinding a common path tothe assessment of personswith intellectualdevelopment disorders

Page 507

From the University of California Davis Medical Center (RHS) Human Development Graduate Group (RHS) University of California Davis Sacramento Feinberg School ofMedicine (AJK RCG) Northwestern University Chicago IL MIND Institute and Department of Psychiatry and Behavioral Sciences (FJM AD SMS DH) University of CaliforniaDavis Medical Center Sacramento Morgridge College of Education (JC KR) University of Denver CO Departments of Pediatrics (CM EB-K) Neurological Sciences (EB-K) andBiochemistry (EB-K) Rush University Medical Center Chicago IL and Graduate School of Education (KFW) University of California Riverside

Go to NeurologyorgN for full disclosures Funding information and disclosures deemed relevant by the authors if any are provided at the end of the article

The Article Processing Charge was funded by authors

This is an open access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License 40 (CC BY-NC-ND) which permits downloadingand sharing the work provided it is properly cited The work cannot be changed in any way or used commercially without permission from the journal

Copyright copy 2020 The Author(s) Published by Wolters Kluwer Health Inc on behalf of the American Academy of Neurology e1229

Approximately 20 of the global population has an intellectualdisability (ID) an understudied condition with lifelong effectson academic vocational and personal functioning1 As theetiologies and mechanisms underlying specific forms of ID arediscovered targeted treatments and human clinical trials soonfollow in the translational process23 raising the potential formedical remediation of disability The preclinical developmentof promising targeted treatments for ID-associated disordershas not been followed by successes in human trials Un-fortunately testing accessibility issues pervasive floor effectsand lack of consensus on acceptable cognitive endpoints havebeen obstacles in the field Although other barriers such aslimitations of animal models are involved it is apparent that theIDs continue to lag behind other neurologic or psychiatricconditions on scalable psychometrically supported andbroadly accepted endpoints45

The NIH Toolbox Cognitive Battery6ndash8 (NIHTB-CB) aniPad-based battery of brief memory executive function (EF)processing speed and language tests was developed within theNIH Blueprint for Neuroscience Research The NIHTB-CBhas the potential to provide a highly standardized objectiveand scalable tool for use across laboratories and clinical trialsites As an extension of our pilot work9 the present studyreflects progress made over a 4-year period to empirically val-idate and refine the NIHTB-CB for ID (aim 1) includingstandardized administration guidelines required for this chal-lenging population with the goal of supporting its use as a set ofoutcome measures for clinical trials and other clinical researchThe second aim was to measure the sensitivity of the battery todetect known cognitive phenotypes in 2 ID-associated syn-dromes that are a focus of translational research Down syn-drome5 (DS) and fragile X syndrome10 (FXS) On the basis ofprior research we hypothesized EF deficits in FXS and DS andepisodic memory deficits in DS (relative to a heterogeneousother-ID group) a relative language strength in FXS and nogroup differences on visual processing speed

MethodsStandard protocol approvals registrationsand patient consentsInstitutional Review Board approval was obtained at each site(University of California Davis MIND Institute University ofDenver and Rush University Medical Center) before study

initiation Written consent was obtained from each guardian-participant pair according to Institutional Review Boardrequirements

ParticipantsBecause of the study aim to measure the sensitivity of thebattery to syndrome-specific cognitive phenotypes 3 groupswere recruited Two ID-associated syndromes were chosen DS(affecting asymp1 in 700)11 and FXS (affecting asymp1 in 7000 malesand 1 in 11000 females)12 Individuals with ID of other orunknown cause (OID) were also recruited to evaluate theNIHTB-CB within a more heterogeneous group and to serveas comparison to FXS and DS Eligible participants met thefollowing criteria chronologic age of 6 through 25 years full-scale IQ (FSIQ) lt80 on the Stanford-Binet 5th edition (SB-5)13 mental age of at least 30 years on the SB-5 (in concor-dance with the lowest chronologic age limit of the NIHTB-CB) adaptive behavior deficits as measured by the VinelandAdaptive Behavior Scales 3rd edition Comprehensive In-terview14 (VABS-3) speech of at least short phrases English asfirst language and stable medication and intervention regimenfor 6 weeks before enrollment Exclusion criteria were un-corrected vision impairment uncontrolled seizures motorimpairment affecting touchscreen use and a history of headtrauma brain infection or stroke

In all 288 participants consented to the study After comple-tion of the SB-5 45 participants were ineligible 16 with anFSIQ gt80 and 29 with a mental age lt30 years One participantdiscontinued due to behavioral noncompliance Across sites242 participants completed initial neuropsychological testingwith 228 completing retesting of the NIHTB-CB asymp1 monthlater to examine test reliability This retest duration was se-lected to evaluate reliability within a typical time interval used inclinical trials Participants included 91 with DS 75 with FXSand 76 with OID A subset of 21 participants with OID hada diagnosed ID-associated syndrome the represented syn-dromes were 16p112 deletion (1) 22q112 deletion (1)Bannayan Riley Ruvalcaba (1) cri-du-chat (1) fetal alcohol(5) Floating-Harbor (1) Kleefstra (1) mitochondrial disease(1) mosaic trisomy 8 (1) neurofibromatosis type 1 (2)Phelan-McDermid (1) Potocki-Lupski (4) and Williams (1)

ProtocolThe NIHTB-CB and all convergent validity measures (seebelow) were completed at visit 1 across 2 days After

GlossaryCAT = computerized adaptive testing DCCS = Dimensional Change Card Sort DEXT = Developmental Extension DS =Down syndrome EF = executive function Flanker = Flanker Inhibitory Control and Attention FM = ForwardMemory FXS =fragile X syndrome FSIQ = full-scale IQ ICC = intraclass correlation ID = intellectual disability LS = List Sorting WorkingMemory NEPSY-In = NEPSY Inhibition subtest NIHTB-CB = NIH Toolbox Cognitive Battery OID = ID of other orunknown cause OR = Oral Reading Recognition PC = Pattern Comparison Processing Speed PSM = Picture SequenceMemory PPVT-4 = Peabody Picture Vocabulary Test 4th edition PVT = Picture Vocabulary SB-5 = Stanford-Binet 5thedition VABS-3 = Vineland Adaptive Behavior Scales 3rd edition

e1230 Neurology | Volume 94 Number 12 | March 24 2020 NeurologyorgN

completion of the SB-5 the order of remaining assessments wasrandomized with the exception of the NIHTB-CB which wasthe first assessment of day 2 The order of the NIHTB-CB testswas randomized for each participant At visit 2 the NIHTB-CBwas readministered with the same test order within participants

The NIHTB-CBThe NIHTB-CB is a computerized assessment validated inages 3 to 85 years in the general population The batteryincludes 7 tests Dimensional Change Card Sort (DCCS)Flanker Inhibitory Control and Attention (Flanker) ListSorting Working Memory (LS) Pattern Comparison Pro-cessing Speed (PC) Picture Sequence Memory (PSM) Pic-ture Vocabulary (PVT) and Oral Reading Recognition(OR)915 The NIHTB-CB provided experimental De-velopmental Extension (DEXT) versions of DCCS andFlanker designed to be more accessible to lower-functioning orvery young participants Because tests have multiple age ver-sions the participantrsquos mental age derived from the SB-5 wasused to select test versions allowing for a starting point ofreasonable difficulty thereby reducing frustration and im-proving compliance The DEXT versions were used for par-ticipants in the 3- to 7-year mental age range For PVT andORthere is 1 computerized adaptive testing (CAT) version andthe start point is typically based on age (children) or education(adults) Instead we used the education override feature en-tering the grade equivalent of the mental age as the start point

In addition PSM has multiple forms available for each testversion Pilot reliability results suggested nonequivalence offorms To assess PSM reliability Form A was used at visit 1and for half of participants at visit 2 (PSM A-A) the other halfreceived Form B at visit 2 (PSM A-B) For LS pilot studiesdemonstrated that additional teaching items improved feasi-bility For the current study PowerPoint slides of theseteaching items were used before test items on LS Age 3ndash6The NIHTB-CB developers then released the LS Age 3ndash6Experimental version with these extended instructions duringthe study which was subsequently used

Convergent validity testsSix tests were preselected as convergent validity measures forthe NIHTB-CB The NEPSY Inhibition subtest16 (NEPSY-In) iPad version was used as the convergent measure forDCCS The NEPSY-In measures cognitive flexibility and in-hibitory control From piloting the NEPSY-In we found thatparticipants could rarely do the most difficult level (Switching)We thus administered only theNaming and Inhibition portionsand created a prorated score indicating the number of correctitems per minute For Flanker we used the Conners KiddieContinuous Performance Test 2nd Edition17 administered ona computer with a spacebar as the response button The hitreaction time SD was used as the convergent validity variableFor LS convergent validity the SB-5 verbal working memoryraw score was used PC validity was measured with theWechsler Preschool and Primary Scale of Intelligence 4thEdition18 Bug Search from the number of correct items per

minute The Leiter International Performance Scale 3rd Edi-tion19 Forward Memory (FM) subtest assesses sequentialmemory span The raw score was the convergent variable forPSM The Peabody Picture Vocabulary Test 4th edition20

(PPVT-4) measures receptive vocabulary The raw score fromthe PPVT-4 iPad version was used for PVT For OR theWoodcock Johnson 4th Edition21 Letter-Word Identificationwas used which measures letter recognition and single wordreading Discriminant measures for NIHTB-CB tests were se-lected out of these measures by choosing a feasible measure ofa different construct than the NIHTB-CB test

AdaptationsTo increase feasibility and to improve reliability and validity inID we developed a manual of standardized procedures re-garding the test environment and NIHTB-CB administrationthe ldquoNIH Toolbox Cognitive Battery Supplemental Admin-istratorrsquos Manual for Intellectual and Developmental Dis-abilitiesrdquo (e-Manual hereafter Supplemental Manual linkslwwcomWNLB58)22 Strategies to proactively improvefeasibility and to reduce participant stress included usinga visual schedule before the visit a caregiver questionnaire onbehaviors and potential reinforcement rewards and a visualtoken board of the NIHTB-CB for the participant to check offduring testing Best practices in administering standardizedassessments with appropriate accommodations for ID wereused23 Test-specific guidelines are available in the manual toaid future users of the NIHTB-CB in standard administrationand feasibility specifically for the ID population In additionthe Supplemental Manual includes the Administration Formthat we developed to document test environment behavioralresponses and validity of tests for each participant

Data cleaning and scoringAfter every administration of the NIHTB-CB the Adminis-tration Form was used to record whether each test was con-sidered valid for the participant All analyses used only validscores The most common reasons for invalid scores were aninvalid response pattern refusal and excessive prompting(30 078 and 072 of all scores respectively)

Before conducting analyses we visually inspected all data andbivariate correlations for normality and the presence of outliersAll NIHTB-CB tests were examined for floor or ceiling issuesOnly 2 tests had such issues At visit 1 21 individuals receiveda score at the floor on LS Age 3ndash6 33 received a score at theceiling on PSM Age 3ndash4 and 4 received a score at the floor onPSMAge 5ndash6 After a thorough review of administration detailsand reliability and validity analyses LS floored scores were keptin the analyses Because PSM has multiple age versions andthese participants likely should have received a harder or easierversion PSM scores at floor or ceiling were excluded

Data analysisSAS version 94 (SAS Institute Inc Cary NC) and R version360 (R Foundation for Statistical Computing Vienna Aus-tria) were used for analyses Visit 1 data were used for

NeurologyorgN Neurology | Volume 94 Number 12 | March 24 2020 e1231

feasibility and validity analyses For validity and reliabilityNIHTB-CB raw scores were used computed score on DCCSFlanker and PC theta score on PSM PVT and OR raw scoreonLS and FlankerDEXT and percent correct onDCCSDEXTBecause DEXT scores are currently on a different scale thanstandard Flanker and DCCS scores DEXT results are presentedseparately For test-retest reliability single-score intraclass cor-relations (ICCs) were used The Cohen d was used to evaluatepotential practice effects with paired-sample t tests to measurethe significance of change Convergent discriminant and eco-logic validities were measured with Pearson correlations

Our prior work on IQmeasurement demonstrated the utility ofdeviation-based scoring to deal with problematic floor effects inID24 We used this method in the current study in place ofNIHTB-CB standard scores to circumvent imposed flooredscores (eg the current NIHTB-CB winsorizes age-correctedstandard scores at 54) We created z scores by transformingparticipant raw scores on each NIHTB-CB test using normativemeans and SDs for their chronologic age band The z scoreswere used to create deviation-based composites following thepreviously defined criteria25 For a Crystallized Composite 1 of2 valid scores on PVT and OR was required for a FluidComposite 4 of 5 valid scores on Flanker DCCS LS PC andPSM were required The Crystallized and Fluid z scores wereused to create a Cognitive Function Composite used in eco-logic validity analyses Deviation scores were also created forFSIQ on the SB-524 and were used in FSIQ analyses Groupcomparisons to examine known-groups validity were assessedwith a 2-way mixed-model analysis of variance on NIHTB-CB zscores Significant results were followed up with the Tukeyhonest significant difference tests to examine group differences

Data availabilityOn request to the corresponding author anonymized data areavailable to share

ResultsDescriptive statisticsTable 1 presents descriptive statistics by diagnostic group andoverall Groups did not differ significantly by chronologic age(F2238 = 091 p = 041) or by VABS-3 Adaptive BehaviorComposite (F2227 = 150 p = 023) However FSIQ differedsignificantly by group (F2238 = 316 p lt 0001) with FSIQhigher in OID than in both DS [t(238) = 757 p lt 0001] andFXS [t(238) = 605 p lt 0001] FSIQ did not significantly differbetweenDS and FXS [t(238) = 123 p = 022] Similarly mentalage was significantly different by group (F2238 = 253 p lt 0001)withmental age higher inOID than in bothDS [t(238) = 676 plt 0001] and FXS [t(238) = 540 p lt 0001] Mental age did notdiffer significantly between DS and FXS [t(238) = 110 p =027] Figure 1A shows the distribution of the NIHTB-CBCognitive Function Composite age-adjusted standard scores ofthe sample without the current imposed floor illustrating thevariability of the sample below this floor and the benefit ofdeviation-based composites (used in all analyses)

FeasibilityFeasibility data are provided in table 2 as the percentage ofparticipants with valid scores on each test Feasibility overall wassimilar to the normative 3- to 15-year-old sample26 with DCCSPC and LS having slightly lower feasibility and Flanker PSMPVT and OR having similar or higher feasibility rates than thenormative sample Even down to a mental age of 3 years PSMPVT and OR feasibility was very good The feasibility of theremaining tests improved particularly at 5 years All tests werefeasible for nearly every participant with amental age ofge6 years

Test-retest reliabilityTest-retest reliability was assessed after asymp1 month (mean = 317days SD = 63 days) (table 3) ICCs on each test were moderateto strong with the exception of DCCS DEXT and PSM A-Awhichwere in the high 040s All composites had strong reliabilityThree tests had small but significant visit 1 to visit 2 effect sizesreflecting amodest increase in performance Flanker PC and thePSM A-B group However the PSM A-B increase likely reflectsnonequivalent forms rather than a practice effect because thePSM A-A group had no practice effect The Fluid and CognitiveFunction composites also had small but significant increasessuggestive of small practice effects Within groups reliabilitycoefficients were mostly moderate to strong some DEXT andPSM reliabilities in small sample sizes were not significantFlanker DEXT in OID (ICC = 046 p = 014) and both PSMgroups in DS (A-A ICC = 024 p = 015 A-B ICC = 033 p =005) In FXS DCCS reliability (ICC = 041) was notably lowerthan in the total sample (ICC = 071) but FXS Flanker reliability(ICC= 084) was stronger than in the total sample (ICC= 074)

Construct validityConvergent and discriminant validity results are presented intable 4 Convergent correlations ranged from moderate tostrong with the exception of Flanker DEXT (n = 29 r =minus026 p = 017) Group results generally reflect validitysimilar to that of the overall results Of note PSM was onlyweakly correlated with Leiter-FM in DS (r = 033 p = 001)and in OID (r = 032 p = 001)

In DCCS for a subgroup of participants below a raw score of188 there appeared to be no association with the NEPSY-Inin this subgroup DCCS score was not significantly correlatedwith NEPSY-In score (r = 029 p = 016) This DCCS scorerepresents participants who did not pass the introductoryswitching portion of the test When this subgroup was re-moved validity improved (r = 057 p lt 0001)

Ecologic validityTable 5 provides the ecologic validity of NIHTB-CB tests andcomposites The composites each had moderate to strongcorrelations with FSIQ (figure 1B) as did all NIHTB-CB testscores other than Flanker DEXT VABS-3 Adaptive BehaviorComposite had small but significant correlations with severaltests (Flanker PC PSM PVT and OR) and with the Crys-tallized and Cognitive Function composites with better per-formance associated with higher levels of adaptive behavior

e1232 Neurology | Volume 94 Number 12 | March 24 2020 NeurologyorgN

Syndrome-specific comparisons (known-groups validity)To examine the specificity of the NIHTB-CB to detectsyndrome-specific performance a 2-way mixed-model analysisof variance was conducted on NIHTB-CB test z scores withgroup as a between-participants factor and NIHTB-CB test asa within-participants factor we also examined their interaction(figure 2) IQ was included as a repeated-measures varyingcovariate and an IQ-by-test interaction term was included toallow the effect of IQ to vary by test Because groups differed onFSIQ covarying IQ aimed to clarify whether group results re-flect phenotype-specific impairments or if they simply reflectglobally poorer performance due to overall level of cognitive

functioning To avoid overcontrolling for the domain ofinterest (because NIHTB-CB domains overlap with com-ponents of IQ) verbal IQ was used as the covariate for thefluid NIHTB-CB test outcomes (DCCS Flanker LS PCand PSM) and nonverbal IQ was used as the covariate forcrystallized NIHTB-CB tests (PVT and OR) To obtaineffect sizes for pairwise comparisons the Cohen d was cal-culated from the estimated marginal means from the modelto account for the effects of IQ

There was a main effect of group on NIHTB-CB z scores(F2238 = 490 p = 0008) as well as a main effect of test on zscores (F61067 = 1726 p lt 0001) These were qualified by

Table 1 Participant descriptive information

Total(n = 242)

Down syndrome(n = 91)

Fragile X syndrome(n = 75)

Other ID(n = 76)

Race n

American IndianAlaskaNative

3 1 1 1

Asian 6 2 2 2

Black 24 4 8 12

Native HawaiianPacificIslander

3 1 1 1

White 171 70 60 41

gt1 26 10 2 14

Ethnicity Hispanic orLatino

199 187 80 329

Sex male 593 451 733 632

Primary caregiver 4-ydegree

618 648 627 566

ASD diagnosis (parentreport)

353(8 unknown)

66(0 unknown)

520(6 unknown)

526(1 unknown)

M SD M SD M SD M SD

Chronologic age y 1571 515 1588 517 1616 492 1505 535

Mental age (SB-5) y 520 153 467 120 491 135 612 164

Full-scale IQ (SB-5) 5375 1587 4765 1290 5048 1513 6424 1469

Vineland-3 ABC score 5259 1711 5493 1623 5024 1863 5193 1654

DCCS computed score 381 251 333 224 320 226 468 271

Flanker computed score 492 236 454 223 446 244 571 228

LS raw score 630 441 430 363 602 363 834 475

PC computed score 3307 1593 2592 1374 3310 1408 3984 1674

PSM theta score minus158 090 minus186 076 minus174 087 minus109 088

PVT theta score minus291 280 minus379 273 minus270 257 minus209 283

OR theta score minus584 496 minus662 529 minus621 453 minus460 478

Abbreviations ABC = Adaptive Behavior Composite ASD = autism spectrum disorder DCCS = Dimensional Change Card Sort Flanker = Flanker InhibitoryControl and Attention ID = intellectual disability LS = List Sorting WorkingMemory M =mean OR = Oral Reading and Recognition PC = Pattern ComparisonProcessing Speed PSM = Picture Sequence Memory PVT = Picture Vocabulary SB-5 = Stanford-Binet 5th edition

NeurologyorgN Neurology | Volume 94 Number 12 | March 24 2020 e1233

a significant group times test interaction (F121067 = 368 p lt0001) and a significant IQ times test interaction (F61067 = 672 plt 0001) Follow up Tukey tests showed that FXS performedworse onDCCS thanOID [t(238) = 402 p lt 0001 d = 052]and worse than DS [t(238) = minus304 p = 0007 d = 039] OnFlanker FXS performed worse than OID [t(238) = 345 p =0002 d = 045] and worse than DS [t(238) = minus285 p = 001d = 037] These 2 EF test results supported the hypothesizedEF impairment in FXS although the DS impairment relativeto OID was not supported On PVT FXS performed betterthan DS fitting with expected language strength in FXS[t(238) = minus277 p = 002 d = 036] On PC DS showeda poorer performance than OID which approached

significance [t(238) = 224 p = 007 d = 029] There were nosignificant group differences on LS PSM or OR

DiscussionThis study provides the first comprehensive examination of thepsychometric properties and feasibility of the NIHTB-CB forindividuals with ID with an initial focus on 2 of the mostcommon genetic causes with robust translational researchprograms FXS and DS Overall the NIHTB-CB has demon-strated strong potential for use as an objective standardizedoutcome measure that can be confidently used in ID trials withparticipants with amental age of 5 years or higher Results of thestudy demonstrate very strong psychometrics for the Crystal-lized reasoning tests (PVT and OR) and good to excellentperformance of Fluid reasoning tests withmore variation acrossFXS DS and OID groups for some measures (eg strongreliability for Flanker in FXS compared toDS and vice versa forDCCS) Indeed the Fluid Composite appeared to have moreconsistently strong reliability across conditions and a solidconvergent association with FSIQ Thus it may be a goodcandidate outcome measure for studies seeking to examinebroad nonverbal cognitive changes for individuals with a mentalage of ge5 years Below a mental age of 5 years feasibility wasmore variable across tests indicating the need for furtheradaptations scoring algorithms on developmental extensionsor new tests targeting these lower-functioning individuals

The Supplemental Manual was compiled after hundreds ofadministrations of the NIHTB-CB to individuals with ID andthe study results on feasibility reliability and validity supportits use We encourage researchers and examiners planning touse the NIHTB-CB to follow these guidelines for the IDpopulation

Group comparisons demonstrated that the NIHTB-CB issensitive to substantial EF deficits among individuals with IDin that all 3 groups performed relatively poorly on Flanker andDCCS In particular participants with FXS showed weaknessin inhibitory control and attention and cognitive flexibility inexcess of their general cognitive level and compared to con-trols with other forms of ID This aligns with previous re-search showing that boys with FXS are impaired in inhibitorycontrol set shifting and planning relative to mental agendashmatched controls27 and that boys with FXS have impairmentsin inhibition and attention relative to mental agendashmatchedcontrols and relative to children with DS28 The hypothesizedEF weakness in DS compared to OID (to a lesser extent thanFXS) was not found however there have been some mixedresults on EF in children with DS compared to children withother IDs2930 The low-scoring subgroup on DCCS are thosewho failed introduction to the switching portion of the testfor participants who cannot perform switching at the earliestlevel the test may be less sensitive to variation in EF Re-moving this subgroup from analyses strengthened the con-vergent validity correlation This suggests that these

Figure 1 NIHTB-CB cognitive function composite scoredistribution and association with FSIQ

(A) Histogram showing the true distribution of cognitive function compositescores in the full sample of individuals with intellectual disability (ID) TheNIH Toolbox Cognitive Battery (NIHTB-CB) current floor (winsorized at 54vertical blue line) excludedmore than half of the present sample frommoreaccurate measurement below that level (B) Association between the de-viation-based cognitive function composite and deviation-based full-scaleIQ (FSIQ) by group

e1234 Neurology | Volume 94 Number 12 | March 24 2020 NeurologyorgN

individualsrsquo level or lack of cognitive flexibility may not becaptured by the introductory switching portion of DCCS Anextension of DCCS that does capture this construct in veryyoung or low-functioning persons would have much valueThe lower reliability of DCCS in the FXS group may alsoreflect this limitation in that participants with FXS overallscored very low on this test Because anxiety and hyperactivityare common in FXS it is possible that individual state inter-acts with the task complexity and cognitive flexibility to resultin more variable performance over time in this group Whenparticipants needed Flanker DEXT or DCCS DEXT theywere almost always able to perform the tests however a fewissues remain to be worked out regarding these experimentalversions notably the ability to interpret these scores relativeto the standard Flanker and DCCS scores This study high-lighted other concerns with the DEXT measures (eg diffi-culty may be ordered incorrectly or portions are overlyburdensome) Our results provide clear evidence that DEXTlevels are necessary (and extremely feasible) especially inthose with FXS and DS but that further refinements andmodifications are necessary

To improve feasibility of LS extra instructions and practiceitems were developed and used The LS Age 3ndash6 experimentalversion with these additions is now available Feasibility didimprove after our initial pilot studies however the testremains challenging for this population and some participantspass practice but get no test sequences completely correctThe limited feasibility and floored or low variability in rawscores suggest that a lower range of LS is necessary A po-tential approach may be to give some credit on earlysequences that are partially recalled or items recalled out ofsequence because the construct of working memory builds onmore basic short-term memory (eg SB-5 Working Memoryindex and Wechsler Working Memory indices)1331

Both language tests PVT and OR had excellent performancein our samples of individuals with ID Both demonstratedstrong reliability and clear domain specificity with a muchhigher convergent than discriminant correlation TheNIHTB-CB was sensitive to expected language characteristicsin that DS was impaired relative to FXS on PVT32 These testshave an advantage over other language measures such as thePPVT-4 in that PVT and OR are brief (asymp3 minutes each) butaccurate owing to CAT the ability to obtain results witha brief assessment is especially important in individuals withfrequent behavior or attention issues such as in those with ID

Episodic memory is relevant to clinical trials particularly forDS in which memory impairments are well documented33

and FXS in which memory of sequential information is es-pecially impaired34 It is important to emphasize that despitethese known weaknesses PSM was among the highest of thetest scores in each group (figure 2) suggesting that individualsmay have compensatory strategies for performing well per-haps such as use of the contextual information in the storiesReliability was moderate and lower than that of the normative3- 15-year-old study although improved from our pilot studyThe significant effect size in the PSM A-B group suggestsnonequivalence of Forms A and B Notably in DS ICCs weresmall and nonsignificant suggesting that in its present formPSM appears contraindicated as a separate outcome measurefor this population

The high rate of ceiling scores on PSM 3ndash4 suggests thatmental age may not be a good indicator of start point on PSMin ID at least at this mental age level It is also possible thatthis age version is too easy in comparison to PSM scores onolder versions perhaps a personrsquos score on 1 PSM version isnot equivalent to that personrsquos score on another De-velopment of a CAT PSM test with the full range of version

Table 2 At visit 1 proportion of participants who received a valid scorea

Total n () Down syndrome n () Fragile X syndrome n () Other ID n ()

Flanker 195 (806) 78 (867) 50 (667) 67 (882)

Flanker (with DEXT)b 239 (988) 89 (989) 75 (987) 75 (987)

DCCS 152 (636) 55 (611) 40 (541) 57 (760)

DCCS (with DEXT)b 232 (971) 87 (967) 72 (973) 73 (973)

LS 164 (689) 57 (648) 45 (600) 62 (827)

PC 179 (752) 59 (648) 58 (801) 62 (827)

PSM 220 (921) 85 (944) 65 (867) 70 (933)

PVT 237 (983) 88 (978) 73 (973) 76 (1000)

OR 233 (979) 87 (978) 72 (973) 74 (987)

Abbreviations DCCS = Dimensional Change Card Sort DEXT = Developmental Extension Flanker = Flanker Inhibitory Control and Attention ID = intellectualdisability LS = List Sorting Working Memory OR = Oral Reading and Recognition PC = Pattern Comparison Processing Speed PSM = Picture SequenceMemory PVT = Picture Vocabularya Technical difficulties and administration errors are excludedb On Flanker 274 of the sample needed the DEXT portion On DCCS 531 needed DEXT

NeurologyorgN Neurology | Volume 94 Number 12 | March 24 2020 e1235

difficulties would likely simplify the testing process and yieldmore comparable and reliable scores

PC showed adequate feasibility overall and good feasibility inFXS and OID with sufficient feasibility at a mental age of ge4years While the ICC was excellent there was a small butsignificant practice effect although smaller than that found inthe normative sample35 The most common reason for lack offeasibility on PC was an invalid alternating response patternespecially common in DS The task has an inherent challengeof understanding ldquosamerdquo and ldquodifferentrdquo while mapping thischoice onto smiley face and frown face options For partic-ipants without invalid response patterns PC performs well inID On the basis of the feasibility challenges we developeda new processing speed task Speeded Matching now avail-able as an experimental version in the app The task is to selectthe animal face among 3 foils that matches a target image

Future psychometric studies will provide more informationabout its performance

The Fluid Crystallized and Cognitive Function compositesdemonstrated reliability results similar to those of the nor-mative age 3 to 6 sample with small practice effects in Fluidand Cognitive Function composites (although smaller than inthe normative sample) Each composite was well correlatedwith FSIQ Although not all participants were able to receivea composite (due to missing valid scores on some tests) whencomplete the composites appear to perform well The de-viation method used to create these composites has a clearadvantage over the current age-corrected standard scores onwhich more than half of our sample obtained scores at thelower limit currently set at 54 We are conducting analyses toidentify the best option for composite scores below this floor(deviation approach vs extension of existing age-adjusted

Table 3 Test-retest reliability and examination of practice effects

Total Down syndrome Fragile X syndrome Other ID

No ICC (95 CI)Cohend No ICC (95 CI)

Cohend No ICC (95 CI)

Cohend No ICC (95 CI)

Cohend

Flanker 144 074(065ndash081)

009a 56 063(044ndash076)

020 37 084(070ndash091)

minus011 51 069(051ndash081)

015

FlankerDEXT

36 060(033ndash077)

minus001 8 073(018ndash094)

minus033 23 062(028ndash082)

minus005 5 046(minus037ndash092)

056

DCCS 97 071(060ndash080)

013 32 078(058ndash088)

018 23 041(001ndash069)

024 42 082(068ndash090)

003

DCCS DEXT 73 049(030ndash065)

minus016 32 046(014ndash069)

minus017 27 050(016ndash074)

minus013 14 053(003ndash082)

minus021

LS 113 074(064ndash081)

014 37 075(056ndash086)

024 32 065(040ndash081)

minus011 43 069(049ndash082)

022

PC 133 077(062ndash085)

029b 45 074(048ndash086)

039b 40 071(050ndash084)

021a 48 075(054ndash086)

035c

PSM A-A 60 047(025ndash064)

004 22 024(minus022ndash060)

minus014 15 054(007ndash082)

027 23 040(001ndash069)

011

PSM A-B 60 055(034ndash071)

029a 22 033(minus006ndash065)

022 18 054(014ndash080)

025 20 040(minus003ndash071)

034

PVT 186 085(081ndash089)

003 69 087(080ndash092)

008 57 079(066ndash087)

001 60 086(078ndash092)

minus002

OR 183 096(095ndash097)

002 70 095(092ndash097)

001 56 096(093ndash098)

minus001 57 098(097ndash099)

004

FC 97 083(067ndash091)

033b 28 084(052ndash093)

040b 27 079(056ndash090)

034a 42 077(055ndash088)

033c

CC 191 093(091ndash095)

000 73 092(087ndash095)

002 58 094(090ndash096)

minus003 60 091(085ndash094)

minus000

CFC 97 092(085ndash096)

022b 28 093(079ndash097)

023c 27 091(079ndash096)

027a 42 089(078ndash094)

019a

Abbreviations A-A = Form A at visit 1 and Form A at visit 2 A-B = Form A at visit 1 and Form B at visit 2 CC = Crystallized Composite CFC = Cognitive FunctionComposite CI = confidence interval DCCS = Dimensional Change Card Sort DEXT = Developmental Extension FC = Fluid Composite Flanker = FlankerInhibitory Control and Attention ICC = intraclass correlation ID = intellectual disability LS = List SortingWorkingMemory OR =Oral Reading and RecognitionPC = Pattern Comparison Processing Speed PSM = Picture Sequence Memory PVT = Picture Vocabularya p lt 005b p lt 0001c p lt 001

e1236 Neurology | Volume 94 Number 12 | March 24 2020 NeurologyorgN

Table 4 Convergent and discriminant validity of NIHTB-CB tests (Pearson r)

Total Down syndrome Fragile X syndrome Other ID

No Conv Validity No Disc Validity No Conv Validity No Disc Validity No Conv Validity No Disc Validity No Conv Validity No Disc Validity

Flanker 144 minus052a 193 053a 52 minus056a 78 048a 36 minus044b 48 049a 56 minus044a 67 056a

Flanker DEXT 29 minus026 61 036b 7 004 15 051 15 minus031 32 050b 7 minus023 14 034

DCCSc 109 048a 151 046a 34 055a 55 056a 30 033 39 007 45 036d 58 054a

DCCS DEXT 43 042b 113 037a 16 034 48 045b 11 033 40 043b 16 048 25 011

LS 165 065a 164 049a 57 054a 57 047a 46 050a 45 038d 62 066a 62 052a

PC 171 066a 178 045a 54 060a 59 058a 56 048a 57 042b 61 069a 62 036b

PSM 175 047a 179 050a 69 034b 71 050a 49 064a 51 052a 57 033d 58 033d

PVT 235 083a 232 047a 88 075a 86 054a 71 085a 70 050a 76 088a 76 053a

OR 230 092a 227 058a 86 092a 84 062a 70 089a 69 061a 74 095a 74 059a

FC 151 060a 152 061a 53 049a 53 043b 40 047b 40 057a 58 047a 76 071a

CC 237 075a 237 068a 89 068a 89 064a 72 080a 73 068a 59 055a 75 064a

Abbreviations CC = Crystallized Composite Conv = convergent DCCS = Dimensional Change Card Sort DEXT = Developmental Extension Disc = discriminant FC = Fluid Composite Flanker = Flanker Inhibitory Control andAttention ID = intellectual disability LS = List SortingWorkingMemory NEPSY-In =NEPSY Inhibition subtest NIHTB-CB =NIH Toolbox Cognitive Battery OR =Oral Reading and Recognition PC = Pattern Comparison ProcessingSpeed PSM = Picture Sequence Memory PVT = Picture VocabularyConvergent discriminant measure for each test Flanker Conners Kiddie Continuous Performance Test 2nd Edition (KCPT2) Woodcock Johnson 4th Edition Letter-Word Identification (WJ-LW) Flanker DEXT KCPT2 WJ-LWDCCS NEPSY-In WJ-LW DCCS DEXT NEPSY-In WJ-LW LS Stanford-Binet 5th edition (SB-5) Verbal WorkingMemory WJ-LW PC Wechsler Preschool and Primary Scale of Intelligence 4th Edition Bug Search WJ-LW PSM LeiterInternational Performance Scale 3rd Edition Forward Memory (Leiter-FM) WJ-LW PVT Peabody Picture Vocabulary Test 4th edition Leiter-FM OR WJ-LW Leiter-FM Fluid Composite SB-5 Fluid Reasoning IQ SB-5 Verbal IQCrystallized Composite SB-5 Verbal IQ SB-5 Fluid Reasoning IQa p lt 0001b p lt 001c When the DCCS low subgroup (n = 26) was removed in fragile X syndrome the convergent correlation was significant (n = 21 r = 049 p = 002)d p lt 005

Neurolo

gyorgN

Neurology

|Volum

e94N

umber

12|

March

242020e1237

standard scores) These findings provide further evidence thattest developers should consider and address the range andsensitivity of tests and scores for individuals with moderate tosevere ID2436

This study has some important limitations that warrant con-sideration Construct validity challenges are inherent with theID population in that fully adequate convergent validitymeasures are not always available or they present their ownfeasibility and psychometric limitations The lack of cleardiscriminant validity in most measures is another challengeWhile discriminant correlations are generally desired to bemarkedly lower than the convergent correlations in earlydevelopment domains of cognition (especially EFs) arethought to be unidimensional with increasing differentiationof constructs occurring through early adulthood37 In IDthere is likely even less differentiation between domains thanin typically developing children Our discriminant validityresults are similar to normative 3- to 6-year-old results38

Therefore the Fluid Composite may be a good outcomemeasure choice on the basis of its psychometric performanceand some limitations of subtest construct differentiation inthis population The study was also limited by sample size inevaluations of group-specific results Future work with largersamples should provide more clarity about reliability andvalidity within individual ID subgroups

The NIHTB-CB is a promising outcome measure for IDclinical trials and for many types of nonintervention obser-vational studies Although not originally intended for clinicaluse or for special education purposes ongoing and futureresearch may be done to explore such applications23 such as inthe school psychology setting for accurate and feasible as-sessment of students with IDs In addition the SupplementalManual developed from this study provides key guidance toexaminers and researchers working with ID populations theprocedures on administration test environment fidelity andscoring not only improve examiner familiarity and comfortbut more important extend accessibility of the NIHTB-CBto more cognitively impaired or behaviorally challengedindividuals

Besides evaluating the NIHTB-CB as an appropriate assess-ment for ID in general the present results demonstrate thesensitivity of the battery to known syndrome-specific cogni-tive phenotypes such as the impairment of EFs in FXS relativeto other ID and to DS A critical remaining question is thedegree to which the battery is sensitive to change especially toeffects of intervention As an initial test of sensitivity tochange we are currently collecting longitudinal data from

Table 5 Ecologic validity (Pearson r)

Chronologicage

Mentalage FSIQ

VABS-3ABC

Flanker 025a 058a 050a 018b

FlankerDEXT

025 037c 018 010

DCCS 030a 063a 056a 014

DCCS DEXT 014 066a 049a 017

LS 020b 070a 066a 016b

PC 010 059a 057a 019b

PSM 004 058a 058a 027a

PVT 035a 076a 074a 039a

OR 017c 068a 069a 039a

FC mdash mdash 066a 015

CC mdash mdash 076a 044a

CFC mdash mdash 078a 025c

Abbreviations CC = Crystallized Composite CFC = Cognitive FunctionComposite DCCS = Dimensional Change Card Sort DEXT = DevelopmentalExtension FC = Fluid Composite Flanker = Flanker Inhibitory Control andAttention FSIQ = full-scale IQ LS = List Sorting Working Memory OR = OralReading and Recognition PC = Pattern Comparison Processing Speed PSM= Picture Sequence Memory PVT = Picture Vocabulary VABS-3 ABC =Vineland-3 Adaptive Behavior Compositea p lt 0001b p lt 005c p lt 001

Figure 2 Profile plot of NIH Toolbox Cognitive Battery test zscores by group

The z scores on each test (representing group performance relative to thegeneral population average performance) are shown derived from themixed-model analysis of variance adjusted for IQ A z score of 0 (horizontalline at top) represents the average performance in the general populationnormative sample The z scores lt0 represent the number of SDs below thegeneral population average for the chronologic age band Error bars rep-resent 95 confidence intervals Comparison between fragile X syndrome(FXS) and intellectual disability (ID) of other or unknown cause daggerComparisonbetween FXS and Down syndrome p lt 005 p lt 001 p lt 0001 daggerp lt005 daggerdaggerp lt 001 daggerdaggerdaggerp lt 0001 DCCS = Dimensional Change Card Sort LS =List Sorting Working Memory OR = Oral Reading and Recognition PC =Pattern Comparison Processing Speed PSM = Picture Sequence MemoryPVT = Picture Vocabulary

e1238 Neurology | Volume 94 Number 12 | March 24 2020 NeurologyorgN

study participants to explore natural developmental changeswithin each NIHTB-CB test and the composites relative tomeasures already established as change sensitive such as theSB-5 and Vineland The present results warrant the next stepof evaluating the NIHTB-CB in ID for individuals down toa mental age of 5 years to demonstrate the treatment-specificsensitivity of the battery and to determine the degree to whichmeasure gains reflect functional improvements in daily lifeBelow a mental age of 5 years the NIHTB-CB performs morevariably and adaptations to the lower test ranges or scoringadaptations are needed for some measurement domainsStudies of the performance of the battery in older adults withID are needed especially focusing on those experiencingcognitive decline or dementia Overall the present validationresults represent an important step toward providing an ob-jective scalable and standardized method for successfullymeasuring cognition and tracking cognitive changes in ID

AcknowledgmentThe authors extend their appreciation to the families whogave their time and effort to participate in the study and toLeonard Abbeduto LeAnn Baer Ruth McClure Barnes KyleBersted Mikayla Brown Ana Candelaria Erin CarmodyDarian Crowley Suzanne Delap Randi Hagerman AnneHoffmann Londi Howard Paige Landau Caroline LeonczykMichael Nelson Jacklyn Perales Lacey Pomerantz ShanelleRodriguez Melanie Rothfuss Ryan Shickman AndreaSchneider Haleigh Scott Laurel Snider Rachel Teune TaliaThompson Denny Tran and Jamie Woods for theircontributions to the study

Study fundingFunding provided by the National Institute of Child Healthand Human Development (R01HD076189) Health andHuman Services Administration of Developmental Dis-abilities (90DD0596) the MIND Institute Intellectual andDevelopmental Disabilities Research Center (U54HD079125) and the National Center for Advancing Trans-lational Sciences NIH through grant UL1 TR000002

DisclosureR Shields A Kaat F McKenzie A Drayton S Sansone JColeman and C Michalak report no disclosures K Riley hasreceived compensation for consulting to Novartis regardingFXS clinical trials E Berry-Kravis has received funding (allfunding including consulting goes to Rush University MedicalCenter) from Seaside Therapeutics Novartis Roche AlcobraNeuren Cydan Fulcrum GW Neurotrope Marinus AcadiaZynerba BioMarin Ovid Acadia Yamo Ionis Lumos Ultra-genyx Vtesse Sucampo and Mallinkcrodt Pharmaceuticals toconsult on trial design or development strategies andor toconduct clinical trials in FXS or other neurogenetic disordersand from Asuragen Inc to develop testing standards and to dovalidation testing for FMR1 and SMN testing R Gershon andK Widaman report no disclosures D Hessl has receivedfunding for consulting from Novartis Roche Zynerba

Autifony and Ovid pharmaceutical companies regarding FXSclinical trials Go to NeurologyorgN for full disclosures

Publication historyReceived by Neurology August 14 2019 Accepted in final formOctober 31 2019

References1 Olusanya BO Davis AC Wertlieb D et al Developmental disabilities among children

younger than 5 years in 195 countries and territories 1990-2016 a systematic analysis forthe Global Burden of Disease Study 2016 Lancet Glob Health 20186e1100ndashe1121

2 Lee AW Ventola P Budimirovic D Berry-Kravis E Visootsak J Clinical developmentof targeted fragile X syndrome treatments an industry perspective Brain Sci 20188E214

3 Picker JD Walsh CA New innovations therapeutic opportunities for intellectualdisabilities Ann Neurol 201374382ndash390

4 Budimirovic DB Berry-Kravis E Erickson CA et al Updated report on tools tomeasure outcomes of clinical trials in fragile X syndrome J Neurodev Disord 2017914

5 Esbensen AJ Hooper SR Fidler D et al Outcome measures for clinical trials in Downsyndrome Am J Intellect Dev Disabil 2017122247ndash281

Appendix Authors

Name Location Contribution

Rebecca HShields MS

MIND InstituteSacramento CA

Authored the manuscriptperformed the statisticalanalysis and coordinated thestudy

Aaron J KaatPhD

NorthwesternUniversityChicago IL

Directed NIHTB-CB activities forthe study advised on protocoland analysis and authoredportions of the manuscript

Forrest JMcKenzie BS

MIND InstituteSacramento CA

Authored portions of themanuscript

AndreaDrayton BS

MIND InstituteSacramento CA

Authored portions of themanuscript

Stephanie MSansone PhD

MIND InstituteSacramento CA

Assisted in developing protocoland coordinated the study

JeanineColeman PhD

University ofDenver CO

Coordinated assessments at theUniversity of Denver and criticallyreviewed the manuscript

ClaireMichalak BS

Rush UniversityMedical CenterChicago IL

Coordinated assessments andadministered NIHTB-CB at RushUniversity

Richard CGershon PhD

NorthwesternUniversityChicago IL

PI and director of the NIHToolbox and critically reviewedthe manuscript

Karen RileyPhD

University ofDenver CO

PI at the University of Denver andcritically reviewed themanuscript

ElizabethBerry-KravisMD PhD

Rush UniversityMedical CenterChicago IL

PI at Rush University and criticallyreviewed the manuscript

Keith FWidamanPhD

University ofCaliforniaRiverside

Supervised analysis planperformed the statisticalanalysis and critically reviewedthe manuscript

David HesslPhD

MIND InstituteSacramento CA

Designed the study obtainedfunding directed the multisitestudy and authored portions ofthe manuscript

NeurologyorgN Neurology | Volume 94 Number 12 | March 24 2020 e1239

6 GershonRCWagsterMVHendrieHC FoxNA CookKFNowinski CJ NIHToolboxfor assessment of neurological and behavioral function Neurology 201380S2ndashS6

7 Weintraub S Bauer PJ Zelazo PD et al I NIH toolbox cognition battery (CB)introduction and pediatric data Monogr Soc Res Child Dev 2013781ndash15

8 Casaletto KB Umlauf A Beaumont J et al Demographically corrected normativestandards for the English version of the NIH Toolbox Cognition Battery J IntNeuropsychol Soc 201521378ndash391

9 Hessl D Sansone SM Berry-Kravis E et al The NIH Toolbox Cognitive Battery forintellectual disabilities three preliminary studies and future directions J NeurodevDisord 2016835

10 Berry-Kravis E Lindemann L Jonch AE et al Drug development for neuro-developmental disorders lessons learned from fragile X syndrome Nat Rev DrugDiscov 201817280ndash299

11 Parker SE Mai CT Canfield MA et al Updated national birth prevalence estimatesfor selected birth defects in the United States 2004-2006 Birth Defects Res A ClinMol Teratol 2010881008ndash1016

12 Hunter J Rivero-Arias O Angelov A Kim E Fotheringham I Leal J Epidemiology offragile X syndrome a systematic review and meta-analysis Am J Med Genet A 2014164A1648ndash1658

13 Roid GMiller L Stanford-Binet Intelligence Scales 5th ed San Antonio Pearson 200314 Sparrow S Cicchetti D Saulnier C Vineland Adaptive Behavior Scales 3rd ed San

Antonio Pearson 201615 Weintraub S Dikmen SS Heaton RK et al Cognition assessment using the NIH

Toolbox Neurology 201380S54ndashS6416 Korkman M Kirk U Kemp S NEPSY-II A Developmental Neuropsychological

Assessment San Antonio Pearson 200717 Conners C Conners Kiddie Continuous Performance Test 2nd ed Toronto Multi-

Health Systems Inc 201518 Wechsler D Wechsler Preschool and Primary Scale of Intelligence 4th ed San

Antonio Pearson 201219 Roid G Miller L Pomplum M Koch C Leiter International Performance Scale 3rd

ed Wood Dale Stoelting Co 201320 Dunn LDunnD Peabody Picture Vocabulary Test 4th ed San Antonio Pearson 200721 Woodcock R Johnston M Woodcock-Johnson Tests of Achievement 4th ed Itasca

Riverside Publishing 201422 McKenzie FJ Drayton A Shields R et al NIH Toolbox Cognitive Battery supple-

mental administratorrsquos manual for intellectual and developmental disabilities [online]Available at nihtoolboxdeskcomcustomerportalarticles2981718 AccessedSeptember 30 2019

23 Thompson T Coleman JM Riley K et al Standardized assessment accommodationsfor individuals with intellectual disability Contemp Sch Psychol 201822443ndash457

24 Sansone SM Schneider A Bickel E Berry-Kravis E Prescott C Hessl D ImprovingIQ measurement in intellectual disabilities using true deviation from populationnorms J Neurodev Disord 2014616

25 Akshoomoff N Beaumont JL Bauer PJ et al VIII NIH Toolbox Cognition Battery(CB) composite scores of crystallized fluid and overall cognition Monogr Soc ResChild Dev 201378119ndash132

26 Zelazo P Bauer P National Institutes of Health Toolbox cognition battery (NIHToolbox CB) validation for children between 3 and 15 years Monogr Soc Res ChildDev 2013781ndash172

27 Hooper SR Hatton D Sideris J et al Executive functions in young males with fragileX syndrome in comparison to mental age-matched controls baseline findings froma longitudinal study Neuropsychology 20082236ndash47

28 Wilding J Cornish K Munir F Further delineation of the executive deficit in maleswith fragile-X syndrome Neuropsychologia 2002401343ndash1349

29 Pennington BF Moon J Edgin J Stedron J Nadel L The neuropsychology of Downsyndrome evidence for hippocampal dysfunction Child Dev 20037475ndash93

30 Vicari S Bellucci S Carlesimo GA Implicit and explicit memory a functional dis-sociation in persons with Down syndrome Neuropsychologia 200038240ndash251

31 Wechsler D Weschler Intelligence Scale for Children 5th ed Bloomington Pearson2014

32 Abbeduto L Murphy MM Cawthon SW et al Receptive language skills of adoles-cents and young adults with down or fragile X syndrome Am J Ment Retard 2003108149ndash160

33 Godfrey M Lee NR Memory profiles in Down syndrome across developmenta review of memory abilities through the lifespan J Neurodev Disord 2018105

34 Dykens EM Hodapp RM Leckman JF Strengths and weaknesses in the intellectualfunctioning of males with fragile X syndrome Am J Ment Defic 198792234ndash236

35 Carlozzi NE Tulsky DS Kail RV Beaumont JL VI NIH Toolbox Cognition Battery(CB) measuring processing speed Monogr Soc Res Child Dev 20137888ndash102

36 Hessl D Nguyen DV Green C et al A solution to limitations of cognitive testing inchildren with intellectual disabilities the case of fragile X syndrome J NeurodevDisord 2009133ndash45

37 Wiebe SA Espy KA Charak D Using confirmatory factor analysis to understandexecutive control in preschool children I latent structure Dev Psychol 200844575ndash587

38 Mungas D Widaman K Zelazo PD et al VII NIH Toolbox Cognition Battery (CB)factor structure for 3 to 15 year olds Monogr Soc Res Child Dev 201378103ndash118

e1240 Neurology | Volume 94 Number 12 | March 24 2020 NeurologyorgN

DOI 101212WNL0000000000009131202094e1229-e1240 Published Online before print February 24 2020Neurology

Rebecca H Shields Aaron J Kaat Forrest J McKenzie et al Validation of the NIH Toolbox Cognitive Battery in intellectual disability

This information is current as of February 24 2020

ServicesUpdated Information amp

httpnneurologyorgcontent9412e1229fullincluding high resolution figures can be found at

References httpnneurologyorgcontent9412e1229fullref-list-1

This article cites 28 articles 2 of which you can access for free at

Citations httpnneurologyorgcontent9412e1229fullotherarticles

This article has been cited by 1 HighWire-hosted articles

Subspecialty Collections

httpnneurologyorgcgicollectionexecutive_functionExecutive function

httpnneurologyorgcgicollectiondevelopmental_disordersDevelopmental disorders

httpnneurologyorgcgicollectionautismAutismfollowing collection(s) This article along with others on similar topics appears in the

Permissions amp Licensing

httpwwwneurologyorgaboutabout_the_journalpermissionsits entirety can be found online atInformation about reproducing this article in parts (figurestables) or in

Reprints

httpnneurologyorgsubscribersadvertiseInformation about ordering reprints can be found online

ISSN 0028-3878 Online ISSN 1526-632XWolters Kluwer Health Inc on behalf of the American Academy of Neurology All rights reserved Print1951 it is now a weekly with 48 issues per year Copyright Copyright copy 2020 The Author(s) Published by

reg is the official journal of the American Academy of Neurology Published continuously sinceNeurology

Page 2: ValidationoftheNIHToolboxCognitiveBattery in intellectual disability · iPad-based battery of brief memory, executive function (EF), processingspeed,andlanguagetests,wasdevelopedwithinthe

Approximately 20 of the global population has an intellectualdisability (ID) an understudied condition with lifelong effectson academic vocational and personal functioning1 As theetiologies and mechanisms underlying specific forms of ID arediscovered targeted treatments and human clinical trials soonfollow in the translational process23 raising the potential formedical remediation of disability The preclinical developmentof promising targeted treatments for ID-associated disordershas not been followed by successes in human trials Un-fortunately testing accessibility issues pervasive floor effectsand lack of consensus on acceptable cognitive endpoints havebeen obstacles in the field Although other barriers such aslimitations of animal models are involved it is apparent that theIDs continue to lag behind other neurologic or psychiatricconditions on scalable psychometrically supported andbroadly accepted endpoints45

The NIH Toolbox Cognitive Battery6ndash8 (NIHTB-CB) aniPad-based battery of brief memory executive function (EF)processing speed and language tests was developed within theNIH Blueprint for Neuroscience Research The NIHTB-CBhas the potential to provide a highly standardized objectiveand scalable tool for use across laboratories and clinical trialsites As an extension of our pilot work9 the present studyreflects progress made over a 4-year period to empirically val-idate and refine the NIHTB-CB for ID (aim 1) includingstandardized administration guidelines required for this chal-lenging population with the goal of supporting its use as a set ofoutcome measures for clinical trials and other clinical researchThe second aim was to measure the sensitivity of the battery todetect known cognitive phenotypes in 2 ID-associated syn-dromes that are a focus of translational research Down syn-drome5 (DS) and fragile X syndrome10 (FXS) On the basis ofprior research we hypothesized EF deficits in FXS and DS andepisodic memory deficits in DS (relative to a heterogeneousother-ID group) a relative language strength in FXS and nogroup differences on visual processing speed

MethodsStandard protocol approvals registrationsand patient consentsInstitutional Review Board approval was obtained at each site(University of California Davis MIND Institute University ofDenver and Rush University Medical Center) before study

initiation Written consent was obtained from each guardian-participant pair according to Institutional Review Boardrequirements

ParticipantsBecause of the study aim to measure the sensitivity of thebattery to syndrome-specific cognitive phenotypes 3 groupswere recruited Two ID-associated syndromes were chosen DS(affecting asymp1 in 700)11 and FXS (affecting asymp1 in 7000 malesand 1 in 11000 females)12 Individuals with ID of other orunknown cause (OID) were also recruited to evaluate theNIHTB-CB within a more heterogeneous group and to serveas comparison to FXS and DS Eligible participants met thefollowing criteria chronologic age of 6 through 25 years full-scale IQ (FSIQ) lt80 on the Stanford-Binet 5th edition (SB-5)13 mental age of at least 30 years on the SB-5 (in concor-dance with the lowest chronologic age limit of the NIHTB-CB) adaptive behavior deficits as measured by the VinelandAdaptive Behavior Scales 3rd edition Comprehensive In-terview14 (VABS-3) speech of at least short phrases English asfirst language and stable medication and intervention regimenfor 6 weeks before enrollment Exclusion criteria were un-corrected vision impairment uncontrolled seizures motorimpairment affecting touchscreen use and a history of headtrauma brain infection or stroke

In all 288 participants consented to the study After comple-tion of the SB-5 45 participants were ineligible 16 with anFSIQ gt80 and 29 with a mental age lt30 years One participantdiscontinued due to behavioral noncompliance Across sites242 participants completed initial neuropsychological testingwith 228 completing retesting of the NIHTB-CB asymp1 monthlater to examine test reliability This retest duration was se-lected to evaluate reliability within a typical time interval used inclinical trials Participants included 91 with DS 75 with FXSand 76 with OID A subset of 21 participants with OID hada diagnosed ID-associated syndrome the represented syn-dromes were 16p112 deletion (1) 22q112 deletion (1)Bannayan Riley Ruvalcaba (1) cri-du-chat (1) fetal alcohol(5) Floating-Harbor (1) Kleefstra (1) mitochondrial disease(1) mosaic trisomy 8 (1) neurofibromatosis type 1 (2)Phelan-McDermid (1) Potocki-Lupski (4) and Williams (1)

ProtocolThe NIHTB-CB and all convergent validity measures (seebelow) were completed at visit 1 across 2 days After

GlossaryCAT = computerized adaptive testing DCCS = Dimensional Change Card Sort DEXT = Developmental Extension DS =Down syndrome EF = executive function Flanker = Flanker Inhibitory Control and Attention FM = ForwardMemory FXS =fragile X syndrome FSIQ = full-scale IQ ICC = intraclass correlation ID = intellectual disability LS = List Sorting WorkingMemory NEPSY-In = NEPSY Inhibition subtest NIHTB-CB = NIH Toolbox Cognitive Battery OID = ID of other orunknown cause OR = Oral Reading Recognition PC = Pattern Comparison Processing Speed PSM = Picture SequenceMemory PPVT-4 = Peabody Picture Vocabulary Test 4th edition PVT = Picture Vocabulary SB-5 = Stanford-Binet 5thedition VABS-3 = Vineland Adaptive Behavior Scales 3rd edition

e1230 Neurology | Volume 94 Number 12 | March 24 2020 NeurologyorgN

completion of the SB-5 the order of remaining assessments wasrandomized with the exception of the NIHTB-CB which wasthe first assessment of day 2 The order of the NIHTB-CB testswas randomized for each participant At visit 2 the NIHTB-CBwas readministered with the same test order within participants

The NIHTB-CBThe NIHTB-CB is a computerized assessment validated inages 3 to 85 years in the general population The batteryincludes 7 tests Dimensional Change Card Sort (DCCS)Flanker Inhibitory Control and Attention (Flanker) ListSorting Working Memory (LS) Pattern Comparison Pro-cessing Speed (PC) Picture Sequence Memory (PSM) Pic-ture Vocabulary (PVT) and Oral Reading Recognition(OR)915 The NIHTB-CB provided experimental De-velopmental Extension (DEXT) versions of DCCS andFlanker designed to be more accessible to lower-functioning orvery young participants Because tests have multiple age ver-sions the participantrsquos mental age derived from the SB-5 wasused to select test versions allowing for a starting point ofreasonable difficulty thereby reducing frustration and im-proving compliance The DEXT versions were used for par-ticipants in the 3- to 7-year mental age range For PVT andORthere is 1 computerized adaptive testing (CAT) version andthe start point is typically based on age (children) or education(adults) Instead we used the education override feature en-tering the grade equivalent of the mental age as the start point

In addition PSM has multiple forms available for each testversion Pilot reliability results suggested nonequivalence offorms To assess PSM reliability Form A was used at visit 1and for half of participants at visit 2 (PSM A-A) the other halfreceived Form B at visit 2 (PSM A-B) For LS pilot studiesdemonstrated that additional teaching items improved feasi-bility For the current study PowerPoint slides of theseteaching items were used before test items on LS Age 3ndash6The NIHTB-CB developers then released the LS Age 3ndash6Experimental version with these extended instructions duringthe study which was subsequently used

Convergent validity testsSix tests were preselected as convergent validity measures forthe NIHTB-CB The NEPSY Inhibition subtest16 (NEPSY-In) iPad version was used as the convergent measure forDCCS The NEPSY-In measures cognitive flexibility and in-hibitory control From piloting the NEPSY-In we found thatparticipants could rarely do the most difficult level (Switching)We thus administered only theNaming and Inhibition portionsand created a prorated score indicating the number of correctitems per minute For Flanker we used the Conners KiddieContinuous Performance Test 2nd Edition17 administered ona computer with a spacebar as the response button The hitreaction time SD was used as the convergent validity variableFor LS convergent validity the SB-5 verbal working memoryraw score was used PC validity was measured with theWechsler Preschool and Primary Scale of Intelligence 4thEdition18 Bug Search from the number of correct items per

minute The Leiter International Performance Scale 3rd Edi-tion19 Forward Memory (FM) subtest assesses sequentialmemory span The raw score was the convergent variable forPSM The Peabody Picture Vocabulary Test 4th edition20

(PPVT-4) measures receptive vocabulary The raw score fromthe PPVT-4 iPad version was used for PVT For OR theWoodcock Johnson 4th Edition21 Letter-Word Identificationwas used which measures letter recognition and single wordreading Discriminant measures for NIHTB-CB tests were se-lected out of these measures by choosing a feasible measure ofa different construct than the NIHTB-CB test

AdaptationsTo increase feasibility and to improve reliability and validity inID we developed a manual of standardized procedures re-garding the test environment and NIHTB-CB administrationthe ldquoNIH Toolbox Cognitive Battery Supplemental Admin-istratorrsquos Manual for Intellectual and Developmental Dis-abilitiesrdquo (e-Manual hereafter Supplemental Manual linkslwwcomWNLB58)22 Strategies to proactively improvefeasibility and to reduce participant stress included usinga visual schedule before the visit a caregiver questionnaire onbehaviors and potential reinforcement rewards and a visualtoken board of the NIHTB-CB for the participant to check offduring testing Best practices in administering standardizedassessments with appropriate accommodations for ID wereused23 Test-specific guidelines are available in the manual toaid future users of the NIHTB-CB in standard administrationand feasibility specifically for the ID population In additionthe Supplemental Manual includes the Administration Formthat we developed to document test environment behavioralresponses and validity of tests for each participant

Data cleaning and scoringAfter every administration of the NIHTB-CB the Adminis-tration Form was used to record whether each test was con-sidered valid for the participant All analyses used only validscores The most common reasons for invalid scores were aninvalid response pattern refusal and excessive prompting(30 078 and 072 of all scores respectively)

Before conducting analyses we visually inspected all data andbivariate correlations for normality and the presence of outliersAll NIHTB-CB tests were examined for floor or ceiling issuesOnly 2 tests had such issues At visit 1 21 individuals receiveda score at the floor on LS Age 3ndash6 33 received a score at theceiling on PSM Age 3ndash4 and 4 received a score at the floor onPSMAge 5ndash6 After a thorough review of administration detailsand reliability and validity analyses LS floored scores were keptin the analyses Because PSM has multiple age versions andthese participants likely should have received a harder or easierversion PSM scores at floor or ceiling were excluded

Data analysisSAS version 94 (SAS Institute Inc Cary NC) and R version360 (R Foundation for Statistical Computing Vienna Aus-tria) were used for analyses Visit 1 data were used for

NeurologyorgN Neurology | Volume 94 Number 12 | March 24 2020 e1231

feasibility and validity analyses For validity and reliabilityNIHTB-CB raw scores were used computed score on DCCSFlanker and PC theta score on PSM PVT and OR raw scoreonLS and FlankerDEXT and percent correct onDCCSDEXTBecause DEXT scores are currently on a different scale thanstandard Flanker and DCCS scores DEXT results are presentedseparately For test-retest reliability single-score intraclass cor-relations (ICCs) were used The Cohen d was used to evaluatepotential practice effects with paired-sample t tests to measurethe significance of change Convergent discriminant and eco-logic validities were measured with Pearson correlations

Our prior work on IQmeasurement demonstrated the utility ofdeviation-based scoring to deal with problematic floor effects inID24 We used this method in the current study in place ofNIHTB-CB standard scores to circumvent imposed flooredscores (eg the current NIHTB-CB winsorizes age-correctedstandard scores at 54) We created z scores by transformingparticipant raw scores on each NIHTB-CB test using normativemeans and SDs for their chronologic age band The z scoreswere used to create deviation-based composites following thepreviously defined criteria25 For a Crystallized Composite 1 of2 valid scores on PVT and OR was required for a FluidComposite 4 of 5 valid scores on Flanker DCCS LS PC andPSM were required The Crystallized and Fluid z scores wereused to create a Cognitive Function Composite used in eco-logic validity analyses Deviation scores were also created forFSIQ on the SB-524 and were used in FSIQ analyses Groupcomparisons to examine known-groups validity were assessedwith a 2-way mixed-model analysis of variance on NIHTB-CB zscores Significant results were followed up with the Tukeyhonest significant difference tests to examine group differences

Data availabilityOn request to the corresponding author anonymized data areavailable to share

ResultsDescriptive statisticsTable 1 presents descriptive statistics by diagnostic group andoverall Groups did not differ significantly by chronologic age(F2238 = 091 p = 041) or by VABS-3 Adaptive BehaviorComposite (F2227 = 150 p = 023) However FSIQ differedsignificantly by group (F2238 = 316 p lt 0001) with FSIQhigher in OID than in both DS [t(238) = 757 p lt 0001] andFXS [t(238) = 605 p lt 0001] FSIQ did not significantly differbetweenDS and FXS [t(238) = 123 p = 022] Similarly mentalage was significantly different by group (F2238 = 253 p lt 0001)withmental age higher inOID than in bothDS [t(238) = 676 plt 0001] and FXS [t(238) = 540 p lt 0001] Mental age did notdiffer significantly between DS and FXS [t(238) = 110 p =027] Figure 1A shows the distribution of the NIHTB-CBCognitive Function Composite age-adjusted standard scores ofthe sample without the current imposed floor illustrating thevariability of the sample below this floor and the benefit ofdeviation-based composites (used in all analyses)

FeasibilityFeasibility data are provided in table 2 as the percentage ofparticipants with valid scores on each test Feasibility overall wassimilar to the normative 3- to 15-year-old sample26 with DCCSPC and LS having slightly lower feasibility and Flanker PSMPVT and OR having similar or higher feasibility rates than thenormative sample Even down to a mental age of 3 years PSMPVT and OR feasibility was very good The feasibility of theremaining tests improved particularly at 5 years All tests werefeasible for nearly every participant with amental age ofge6 years

Test-retest reliabilityTest-retest reliability was assessed after asymp1 month (mean = 317days SD = 63 days) (table 3) ICCs on each test were moderateto strong with the exception of DCCS DEXT and PSM A-Awhichwere in the high 040s All composites had strong reliabilityThree tests had small but significant visit 1 to visit 2 effect sizesreflecting amodest increase in performance Flanker PC and thePSM A-B group However the PSM A-B increase likely reflectsnonequivalent forms rather than a practice effect because thePSM A-A group had no practice effect The Fluid and CognitiveFunction composites also had small but significant increasessuggestive of small practice effects Within groups reliabilitycoefficients were mostly moderate to strong some DEXT andPSM reliabilities in small sample sizes were not significantFlanker DEXT in OID (ICC = 046 p = 014) and both PSMgroups in DS (A-A ICC = 024 p = 015 A-B ICC = 033 p =005) In FXS DCCS reliability (ICC = 041) was notably lowerthan in the total sample (ICC = 071) but FXS Flanker reliability(ICC= 084) was stronger than in the total sample (ICC= 074)

Construct validityConvergent and discriminant validity results are presented intable 4 Convergent correlations ranged from moderate tostrong with the exception of Flanker DEXT (n = 29 r =minus026 p = 017) Group results generally reflect validitysimilar to that of the overall results Of note PSM was onlyweakly correlated with Leiter-FM in DS (r = 033 p = 001)and in OID (r = 032 p = 001)

In DCCS for a subgroup of participants below a raw score of188 there appeared to be no association with the NEPSY-Inin this subgroup DCCS score was not significantly correlatedwith NEPSY-In score (r = 029 p = 016) This DCCS scorerepresents participants who did not pass the introductoryswitching portion of the test When this subgroup was re-moved validity improved (r = 057 p lt 0001)

Ecologic validityTable 5 provides the ecologic validity of NIHTB-CB tests andcomposites The composites each had moderate to strongcorrelations with FSIQ (figure 1B) as did all NIHTB-CB testscores other than Flanker DEXT VABS-3 Adaptive BehaviorComposite had small but significant correlations with severaltests (Flanker PC PSM PVT and OR) and with the Crys-tallized and Cognitive Function composites with better per-formance associated with higher levels of adaptive behavior

e1232 Neurology | Volume 94 Number 12 | March 24 2020 NeurologyorgN

Syndrome-specific comparisons (known-groups validity)To examine the specificity of the NIHTB-CB to detectsyndrome-specific performance a 2-way mixed-model analysisof variance was conducted on NIHTB-CB test z scores withgroup as a between-participants factor and NIHTB-CB test asa within-participants factor we also examined their interaction(figure 2) IQ was included as a repeated-measures varyingcovariate and an IQ-by-test interaction term was included toallow the effect of IQ to vary by test Because groups differed onFSIQ covarying IQ aimed to clarify whether group results re-flect phenotype-specific impairments or if they simply reflectglobally poorer performance due to overall level of cognitive

functioning To avoid overcontrolling for the domain ofinterest (because NIHTB-CB domains overlap with com-ponents of IQ) verbal IQ was used as the covariate for thefluid NIHTB-CB test outcomes (DCCS Flanker LS PCand PSM) and nonverbal IQ was used as the covariate forcrystallized NIHTB-CB tests (PVT and OR) To obtaineffect sizes for pairwise comparisons the Cohen d was cal-culated from the estimated marginal means from the modelto account for the effects of IQ

There was a main effect of group on NIHTB-CB z scores(F2238 = 490 p = 0008) as well as a main effect of test on zscores (F61067 = 1726 p lt 0001) These were qualified by

Table 1 Participant descriptive information

Total(n = 242)

Down syndrome(n = 91)

Fragile X syndrome(n = 75)

Other ID(n = 76)

Race n

American IndianAlaskaNative

3 1 1 1

Asian 6 2 2 2

Black 24 4 8 12

Native HawaiianPacificIslander

3 1 1 1

White 171 70 60 41

gt1 26 10 2 14

Ethnicity Hispanic orLatino

199 187 80 329

Sex male 593 451 733 632

Primary caregiver 4-ydegree

618 648 627 566

ASD diagnosis (parentreport)

353(8 unknown)

66(0 unknown)

520(6 unknown)

526(1 unknown)

M SD M SD M SD M SD

Chronologic age y 1571 515 1588 517 1616 492 1505 535

Mental age (SB-5) y 520 153 467 120 491 135 612 164

Full-scale IQ (SB-5) 5375 1587 4765 1290 5048 1513 6424 1469

Vineland-3 ABC score 5259 1711 5493 1623 5024 1863 5193 1654

DCCS computed score 381 251 333 224 320 226 468 271

Flanker computed score 492 236 454 223 446 244 571 228

LS raw score 630 441 430 363 602 363 834 475

PC computed score 3307 1593 2592 1374 3310 1408 3984 1674

PSM theta score minus158 090 minus186 076 minus174 087 minus109 088

PVT theta score minus291 280 minus379 273 minus270 257 minus209 283

OR theta score minus584 496 minus662 529 minus621 453 minus460 478

Abbreviations ABC = Adaptive Behavior Composite ASD = autism spectrum disorder DCCS = Dimensional Change Card Sort Flanker = Flanker InhibitoryControl and Attention ID = intellectual disability LS = List Sorting WorkingMemory M =mean OR = Oral Reading and Recognition PC = Pattern ComparisonProcessing Speed PSM = Picture Sequence Memory PVT = Picture Vocabulary SB-5 = Stanford-Binet 5th edition

NeurologyorgN Neurology | Volume 94 Number 12 | March 24 2020 e1233

a significant group times test interaction (F121067 = 368 p lt0001) and a significant IQ times test interaction (F61067 = 672 plt 0001) Follow up Tukey tests showed that FXS performedworse onDCCS thanOID [t(238) = 402 p lt 0001 d = 052]and worse than DS [t(238) = minus304 p = 0007 d = 039] OnFlanker FXS performed worse than OID [t(238) = 345 p =0002 d = 045] and worse than DS [t(238) = minus285 p = 001d = 037] These 2 EF test results supported the hypothesizedEF impairment in FXS although the DS impairment relativeto OID was not supported On PVT FXS performed betterthan DS fitting with expected language strength in FXS[t(238) = minus277 p = 002 d = 036] On PC DS showeda poorer performance than OID which approached

significance [t(238) = 224 p = 007 d = 029] There were nosignificant group differences on LS PSM or OR

DiscussionThis study provides the first comprehensive examination of thepsychometric properties and feasibility of the NIHTB-CB forindividuals with ID with an initial focus on 2 of the mostcommon genetic causes with robust translational researchprograms FXS and DS Overall the NIHTB-CB has demon-strated strong potential for use as an objective standardizedoutcome measure that can be confidently used in ID trials withparticipants with amental age of 5 years or higher Results of thestudy demonstrate very strong psychometrics for the Crystal-lized reasoning tests (PVT and OR) and good to excellentperformance of Fluid reasoning tests withmore variation acrossFXS DS and OID groups for some measures (eg strongreliability for Flanker in FXS compared toDS and vice versa forDCCS) Indeed the Fluid Composite appeared to have moreconsistently strong reliability across conditions and a solidconvergent association with FSIQ Thus it may be a goodcandidate outcome measure for studies seeking to examinebroad nonverbal cognitive changes for individuals with a mentalage of ge5 years Below a mental age of 5 years feasibility wasmore variable across tests indicating the need for furtheradaptations scoring algorithms on developmental extensionsor new tests targeting these lower-functioning individuals

The Supplemental Manual was compiled after hundreds ofadministrations of the NIHTB-CB to individuals with ID andthe study results on feasibility reliability and validity supportits use We encourage researchers and examiners planning touse the NIHTB-CB to follow these guidelines for the IDpopulation

Group comparisons demonstrated that the NIHTB-CB issensitive to substantial EF deficits among individuals with IDin that all 3 groups performed relatively poorly on Flanker andDCCS In particular participants with FXS showed weaknessin inhibitory control and attention and cognitive flexibility inexcess of their general cognitive level and compared to con-trols with other forms of ID This aligns with previous re-search showing that boys with FXS are impaired in inhibitorycontrol set shifting and planning relative to mental agendashmatched controls27 and that boys with FXS have impairmentsin inhibition and attention relative to mental agendashmatchedcontrols and relative to children with DS28 The hypothesizedEF weakness in DS compared to OID (to a lesser extent thanFXS) was not found however there have been some mixedresults on EF in children with DS compared to children withother IDs2930 The low-scoring subgroup on DCCS are thosewho failed introduction to the switching portion of the testfor participants who cannot perform switching at the earliestlevel the test may be less sensitive to variation in EF Re-moving this subgroup from analyses strengthened the con-vergent validity correlation This suggests that these

Figure 1 NIHTB-CB cognitive function composite scoredistribution and association with FSIQ

(A) Histogram showing the true distribution of cognitive function compositescores in the full sample of individuals with intellectual disability (ID) TheNIH Toolbox Cognitive Battery (NIHTB-CB) current floor (winsorized at 54vertical blue line) excludedmore than half of the present sample frommoreaccurate measurement below that level (B) Association between the de-viation-based cognitive function composite and deviation-based full-scaleIQ (FSIQ) by group

e1234 Neurology | Volume 94 Number 12 | March 24 2020 NeurologyorgN

individualsrsquo level or lack of cognitive flexibility may not becaptured by the introductory switching portion of DCCS Anextension of DCCS that does capture this construct in veryyoung or low-functioning persons would have much valueThe lower reliability of DCCS in the FXS group may alsoreflect this limitation in that participants with FXS overallscored very low on this test Because anxiety and hyperactivityare common in FXS it is possible that individual state inter-acts with the task complexity and cognitive flexibility to resultin more variable performance over time in this group Whenparticipants needed Flanker DEXT or DCCS DEXT theywere almost always able to perform the tests however a fewissues remain to be worked out regarding these experimentalversions notably the ability to interpret these scores relativeto the standard Flanker and DCCS scores This study high-lighted other concerns with the DEXT measures (eg diffi-culty may be ordered incorrectly or portions are overlyburdensome) Our results provide clear evidence that DEXTlevels are necessary (and extremely feasible) especially inthose with FXS and DS but that further refinements andmodifications are necessary

To improve feasibility of LS extra instructions and practiceitems were developed and used The LS Age 3ndash6 experimentalversion with these additions is now available Feasibility didimprove after our initial pilot studies however the testremains challenging for this population and some participantspass practice but get no test sequences completely correctThe limited feasibility and floored or low variability in rawscores suggest that a lower range of LS is necessary A po-tential approach may be to give some credit on earlysequences that are partially recalled or items recalled out ofsequence because the construct of working memory builds onmore basic short-term memory (eg SB-5 Working Memoryindex and Wechsler Working Memory indices)1331

Both language tests PVT and OR had excellent performancein our samples of individuals with ID Both demonstratedstrong reliability and clear domain specificity with a muchhigher convergent than discriminant correlation TheNIHTB-CB was sensitive to expected language characteristicsin that DS was impaired relative to FXS on PVT32 These testshave an advantage over other language measures such as thePPVT-4 in that PVT and OR are brief (asymp3 minutes each) butaccurate owing to CAT the ability to obtain results witha brief assessment is especially important in individuals withfrequent behavior or attention issues such as in those with ID

Episodic memory is relevant to clinical trials particularly forDS in which memory impairments are well documented33

and FXS in which memory of sequential information is es-pecially impaired34 It is important to emphasize that despitethese known weaknesses PSM was among the highest of thetest scores in each group (figure 2) suggesting that individualsmay have compensatory strategies for performing well per-haps such as use of the contextual information in the storiesReliability was moderate and lower than that of the normative3- 15-year-old study although improved from our pilot studyThe significant effect size in the PSM A-B group suggestsnonequivalence of Forms A and B Notably in DS ICCs weresmall and nonsignificant suggesting that in its present formPSM appears contraindicated as a separate outcome measurefor this population

The high rate of ceiling scores on PSM 3ndash4 suggests thatmental age may not be a good indicator of start point on PSMin ID at least at this mental age level It is also possible thatthis age version is too easy in comparison to PSM scores onolder versions perhaps a personrsquos score on 1 PSM version isnot equivalent to that personrsquos score on another De-velopment of a CAT PSM test with the full range of version

Table 2 At visit 1 proportion of participants who received a valid scorea

Total n () Down syndrome n () Fragile X syndrome n () Other ID n ()

Flanker 195 (806) 78 (867) 50 (667) 67 (882)

Flanker (with DEXT)b 239 (988) 89 (989) 75 (987) 75 (987)

DCCS 152 (636) 55 (611) 40 (541) 57 (760)

DCCS (with DEXT)b 232 (971) 87 (967) 72 (973) 73 (973)

LS 164 (689) 57 (648) 45 (600) 62 (827)

PC 179 (752) 59 (648) 58 (801) 62 (827)

PSM 220 (921) 85 (944) 65 (867) 70 (933)

PVT 237 (983) 88 (978) 73 (973) 76 (1000)

OR 233 (979) 87 (978) 72 (973) 74 (987)

Abbreviations DCCS = Dimensional Change Card Sort DEXT = Developmental Extension Flanker = Flanker Inhibitory Control and Attention ID = intellectualdisability LS = List Sorting Working Memory OR = Oral Reading and Recognition PC = Pattern Comparison Processing Speed PSM = Picture SequenceMemory PVT = Picture Vocabularya Technical difficulties and administration errors are excludedb On Flanker 274 of the sample needed the DEXT portion On DCCS 531 needed DEXT

NeurologyorgN Neurology | Volume 94 Number 12 | March 24 2020 e1235

difficulties would likely simplify the testing process and yieldmore comparable and reliable scores

PC showed adequate feasibility overall and good feasibility inFXS and OID with sufficient feasibility at a mental age of ge4years While the ICC was excellent there was a small butsignificant practice effect although smaller than that found inthe normative sample35 The most common reason for lack offeasibility on PC was an invalid alternating response patternespecially common in DS The task has an inherent challengeof understanding ldquosamerdquo and ldquodifferentrdquo while mapping thischoice onto smiley face and frown face options For partic-ipants without invalid response patterns PC performs well inID On the basis of the feasibility challenges we developeda new processing speed task Speeded Matching now avail-able as an experimental version in the app The task is to selectthe animal face among 3 foils that matches a target image

Future psychometric studies will provide more informationabout its performance

The Fluid Crystallized and Cognitive Function compositesdemonstrated reliability results similar to those of the nor-mative age 3 to 6 sample with small practice effects in Fluidand Cognitive Function composites (although smaller than inthe normative sample) Each composite was well correlatedwith FSIQ Although not all participants were able to receivea composite (due to missing valid scores on some tests) whencomplete the composites appear to perform well The de-viation method used to create these composites has a clearadvantage over the current age-corrected standard scores onwhich more than half of our sample obtained scores at thelower limit currently set at 54 We are conducting analyses toidentify the best option for composite scores below this floor(deviation approach vs extension of existing age-adjusted

Table 3 Test-retest reliability and examination of practice effects

Total Down syndrome Fragile X syndrome Other ID

No ICC (95 CI)Cohend No ICC (95 CI)

Cohend No ICC (95 CI)

Cohend No ICC (95 CI)

Cohend

Flanker 144 074(065ndash081)

009a 56 063(044ndash076)

020 37 084(070ndash091)

minus011 51 069(051ndash081)

015

FlankerDEXT

36 060(033ndash077)

minus001 8 073(018ndash094)

minus033 23 062(028ndash082)

minus005 5 046(minus037ndash092)

056

DCCS 97 071(060ndash080)

013 32 078(058ndash088)

018 23 041(001ndash069)

024 42 082(068ndash090)

003

DCCS DEXT 73 049(030ndash065)

minus016 32 046(014ndash069)

minus017 27 050(016ndash074)

minus013 14 053(003ndash082)

minus021

LS 113 074(064ndash081)

014 37 075(056ndash086)

024 32 065(040ndash081)

minus011 43 069(049ndash082)

022

PC 133 077(062ndash085)

029b 45 074(048ndash086)

039b 40 071(050ndash084)

021a 48 075(054ndash086)

035c

PSM A-A 60 047(025ndash064)

004 22 024(minus022ndash060)

minus014 15 054(007ndash082)

027 23 040(001ndash069)

011

PSM A-B 60 055(034ndash071)

029a 22 033(minus006ndash065)

022 18 054(014ndash080)

025 20 040(minus003ndash071)

034

PVT 186 085(081ndash089)

003 69 087(080ndash092)

008 57 079(066ndash087)

001 60 086(078ndash092)

minus002

OR 183 096(095ndash097)

002 70 095(092ndash097)

001 56 096(093ndash098)

minus001 57 098(097ndash099)

004

FC 97 083(067ndash091)

033b 28 084(052ndash093)

040b 27 079(056ndash090)

034a 42 077(055ndash088)

033c

CC 191 093(091ndash095)

000 73 092(087ndash095)

002 58 094(090ndash096)

minus003 60 091(085ndash094)

minus000

CFC 97 092(085ndash096)

022b 28 093(079ndash097)

023c 27 091(079ndash096)

027a 42 089(078ndash094)

019a

Abbreviations A-A = Form A at visit 1 and Form A at visit 2 A-B = Form A at visit 1 and Form B at visit 2 CC = Crystallized Composite CFC = Cognitive FunctionComposite CI = confidence interval DCCS = Dimensional Change Card Sort DEXT = Developmental Extension FC = Fluid Composite Flanker = FlankerInhibitory Control and Attention ICC = intraclass correlation ID = intellectual disability LS = List SortingWorkingMemory OR =Oral Reading and RecognitionPC = Pattern Comparison Processing Speed PSM = Picture Sequence Memory PVT = Picture Vocabularya p lt 005b p lt 0001c p lt 001

e1236 Neurology | Volume 94 Number 12 | March 24 2020 NeurologyorgN

Table 4 Convergent and discriminant validity of NIHTB-CB tests (Pearson r)

Total Down syndrome Fragile X syndrome Other ID

No Conv Validity No Disc Validity No Conv Validity No Disc Validity No Conv Validity No Disc Validity No Conv Validity No Disc Validity

Flanker 144 minus052a 193 053a 52 minus056a 78 048a 36 minus044b 48 049a 56 minus044a 67 056a

Flanker DEXT 29 minus026 61 036b 7 004 15 051 15 minus031 32 050b 7 minus023 14 034

DCCSc 109 048a 151 046a 34 055a 55 056a 30 033 39 007 45 036d 58 054a

DCCS DEXT 43 042b 113 037a 16 034 48 045b 11 033 40 043b 16 048 25 011

LS 165 065a 164 049a 57 054a 57 047a 46 050a 45 038d 62 066a 62 052a

PC 171 066a 178 045a 54 060a 59 058a 56 048a 57 042b 61 069a 62 036b

PSM 175 047a 179 050a 69 034b 71 050a 49 064a 51 052a 57 033d 58 033d

PVT 235 083a 232 047a 88 075a 86 054a 71 085a 70 050a 76 088a 76 053a

OR 230 092a 227 058a 86 092a 84 062a 70 089a 69 061a 74 095a 74 059a

FC 151 060a 152 061a 53 049a 53 043b 40 047b 40 057a 58 047a 76 071a

CC 237 075a 237 068a 89 068a 89 064a 72 080a 73 068a 59 055a 75 064a

Abbreviations CC = Crystallized Composite Conv = convergent DCCS = Dimensional Change Card Sort DEXT = Developmental Extension Disc = discriminant FC = Fluid Composite Flanker = Flanker Inhibitory Control andAttention ID = intellectual disability LS = List SortingWorkingMemory NEPSY-In =NEPSY Inhibition subtest NIHTB-CB =NIH Toolbox Cognitive Battery OR =Oral Reading and Recognition PC = Pattern Comparison ProcessingSpeed PSM = Picture Sequence Memory PVT = Picture VocabularyConvergent discriminant measure for each test Flanker Conners Kiddie Continuous Performance Test 2nd Edition (KCPT2) Woodcock Johnson 4th Edition Letter-Word Identification (WJ-LW) Flanker DEXT KCPT2 WJ-LWDCCS NEPSY-In WJ-LW DCCS DEXT NEPSY-In WJ-LW LS Stanford-Binet 5th edition (SB-5) Verbal WorkingMemory WJ-LW PC Wechsler Preschool and Primary Scale of Intelligence 4th Edition Bug Search WJ-LW PSM LeiterInternational Performance Scale 3rd Edition Forward Memory (Leiter-FM) WJ-LW PVT Peabody Picture Vocabulary Test 4th edition Leiter-FM OR WJ-LW Leiter-FM Fluid Composite SB-5 Fluid Reasoning IQ SB-5 Verbal IQCrystallized Composite SB-5 Verbal IQ SB-5 Fluid Reasoning IQa p lt 0001b p lt 001c When the DCCS low subgroup (n = 26) was removed in fragile X syndrome the convergent correlation was significant (n = 21 r = 049 p = 002)d p lt 005

Neurolo

gyorgN

Neurology

|Volum

e94N

umber

12|

March

242020e1237

standard scores) These findings provide further evidence thattest developers should consider and address the range andsensitivity of tests and scores for individuals with moderate tosevere ID2436

This study has some important limitations that warrant con-sideration Construct validity challenges are inherent with theID population in that fully adequate convergent validitymeasures are not always available or they present their ownfeasibility and psychometric limitations The lack of cleardiscriminant validity in most measures is another challengeWhile discriminant correlations are generally desired to bemarkedly lower than the convergent correlations in earlydevelopment domains of cognition (especially EFs) arethought to be unidimensional with increasing differentiationof constructs occurring through early adulthood37 In IDthere is likely even less differentiation between domains thanin typically developing children Our discriminant validityresults are similar to normative 3- to 6-year-old results38

Therefore the Fluid Composite may be a good outcomemeasure choice on the basis of its psychometric performanceand some limitations of subtest construct differentiation inthis population The study was also limited by sample size inevaluations of group-specific results Future work with largersamples should provide more clarity about reliability andvalidity within individual ID subgroups

The NIHTB-CB is a promising outcome measure for IDclinical trials and for many types of nonintervention obser-vational studies Although not originally intended for clinicaluse or for special education purposes ongoing and futureresearch may be done to explore such applications23 such as inthe school psychology setting for accurate and feasible as-sessment of students with IDs In addition the SupplementalManual developed from this study provides key guidance toexaminers and researchers working with ID populations theprocedures on administration test environment fidelity andscoring not only improve examiner familiarity and comfortbut more important extend accessibility of the NIHTB-CBto more cognitively impaired or behaviorally challengedindividuals

Besides evaluating the NIHTB-CB as an appropriate assess-ment for ID in general the present results demonstrate thesensitivity of the battery to known syndrome-specific cogni-tive phenotypes such as the impairment of EFs in FXS relativeto other ID and to DS A critical remaining question is thedegree to which the battery is sensitive to change especially toeffects of intervention As an initial test of sensitivity tochange we are currently collecting longitudinal data from

Table 5 Ecologic validity (Pearson r)

Chronologicage

Mentalage FSIQ

VABS-3ABC

Flanker 025a 058a 050a 018b

FlankerDEXT

025 037c 018 010

DCCS 030a 063a 056a 014

DCCS DEXT 014 066a 049a 017

LS 020b 070a 066a 016b

PC 010 059a 057a 019b

PSM 004 058a 058a 027a

PVT 035a 076a 074a 039a

OR 017c 068a 069a 039a

FC mdash mdash 066a 015

CC mdash mdash 076a 044a

CFC mdash mdash 078a 025c

Abbreviations CC = Crystallized Composite CFC = Cognitive FunctionComposite DCCS = Dimensional Change Card Sort DEXT = DevelopmentalExtension FC = Fluid Composite Flanker = Flanker Inhibitory Control andAttention FSIQ = full-scale IQ LS = List Sorting Working Memory OR = OralReading and Recognition PC = Pattern Comparison Processing Speed PSM= Picture Sequence Memory PVT = Picture Vocabulary VABS-3 ABC =Vineland-3 Adaptive Behavior Compositea p lt 0001b p lt 005c p lt 001

Figure 2 Profile plot of NIH Toolbox Cognitive Battery test zscores by group

The z scores on each test (representing group performance relative to thegeneral population average performance) are shown derived from themixed-model analysis of variance adjusted for IQ A z score of 0 (horizontalline at top) represents the average performance in the general populationnormative sample The z scores lt0 represent the number of SDs below thegeneral population average for the chronologic age band Error bars rep-resent 95 confidence intervals Comparison between fragile X syndrome(FXS) and intellectual disability (ID) of other or unknown cause daggerComparisonbetween FXS and Down syndrome p lt 005 p lt 001 p lt 0001 daggerp lt005 daggerdaggerp lt 001 daggerdaggerdaggerp lt 0001 DCCS = Dimensional Change Card Sort LS =List Sorting Working Memory OR = Oral Reading and Recognition PC =Pattern Comparison Processing Speed PSM = Picture Sequence MemoryPVT = Picture Vocabulary

e1238 Neurology | Volume 94 Number 12 | March 24 2020 NeurologyorgN

study participants to explore natural developmental changeswithin each NIHTB-CB test and the composites relative tomeasures already established as change sensitive such as theSB-5 and Vineland The present results warrant the next stepof evaluating the NIHTB-CB in ID for individuals down toa mental age of 5 years to demonstrate the treatment-specificsensitivity of the battery and to determine the degree to whichmeasure gains reflect functional improvements in daily lifeBelow a mental age of 5 years the NIHTB-CB performs morevariably and adaptations to the lower test ranges or scoringadaptations are needed for some measurement domainsStudies of the performance of the battery in older adults withID are needed especially focusing on those experiencingcognitive decline or dementia Overall the present validationresults represent an important step toward providing an ob-jective scalable and standardized method for successfullymeasuring cognition and tracking cognitive changes in ID

AcknowledgmentThe authors extend their appreciation to the families whogave their time and effort to participate in the study and toLeonard Abbeduto LeAnn Baer Ruth McClure Barnes KyleBersted Mikayla Brown Ana Candelaria Erin CarmodyDarian Crowley Suzanne Delap Randi Hagerman AnneHoffmann Londi Howard Paige Landau Caroline LeonczykMichael Nelson Jacklyn Perales Lacey Pomerantz ShanelleRodriguez Melanie Rothfuss Ryan Shickman AndreaSchneider Haleigh Scott Laurel Snider Rachel Teune TaliaThompson Denny Tran and Jamie Woods for theircontributions to the study

Study fundingFunding provided by the National Institute of Child Healthand Human Development (R01HD076189) Health andHuman Services Administration of Developmental Dis-abilities (90DD0596) the MIND Institute Intellectual andDevelopmental Disabilities Research Center (U54HD079125) and the National Center for Advancing Trans-lational Sciences NIH through grant UL1 TR000002

DisclosureR Shields A Kaat F McKenzie A Drayton S Sansone JColeman and C Michalak report no disclosures K Riley hasreceived compensation for consulting to Novartis regardingFXS clinical trials E Berry-Kravis has received funding (allfunding including consulting goes to Rush University MedicalCenter) from Seaside Therapeutics Novartis Roche AlcobraNeuren Cydan Fulcrum GW Neurotrope Marinus AcadiaZynerba BioMarin Ovid Acadia Yamo Ionis Lumos Ultra-genyx Vtesse Sucampo and Mallinkcrodt Pharmaceuticals toconsult on trial design or development strategies andor toconduct clinical trials in FXS or other neurogenetic disordersand from Asuragen Inc to develop testing standards and to dovalidation testing for FMR1 and SMN testing R Gershon andK Widaman report no disclosures D Hessl has receivedfunding for consulting from Novartis Roche Zynerba

Autifony and Ovid pharmaceutical companies regarding FXSclinical trials Go to NeurologyorgN for full disclosures

Publication historyReceived by Neurology August 14 2019 Accepted in final formOctober 31 2019

References1 Olusanya BO Davis AC Wertlieb D et al Developmental disabilities among children

younger than 5 years in 195 countries and territories 1990-2016 a systematic analysis forthe Global Burden of Disease Study 2016 Lancet Glob Health 20186e1100ndashe1121

2 Lee AW Ventola P Budimirovic D Berry-Kravis E Visootsak J Clinical developmentof targeted fragile X syndrome treatments an industry perspective Brain Sci 20188E214

3 Picker JD Walsh CA New innovations therapeutic opportunities for intellectualdisabilities Ann Neurol 201374382ndash390

4 Budimirovic DB Berry-Kravis E Erickson CA et al Updated report on tools tomeasure outcomes of clinical trials in fragile X syndrome J Neurodev Disord 2017914

5 Esbensen AJ Hooper SR Fidler D et al Outcome measures for clinical trials in Downsyndrome Am J Intellect Dev Disabil 2017122247ndash281

Appendix Authors

Name Location Contribution

Rebecca HShields MS

MIND InstituteSacramento CA

Authored the manuscriptperformed the statisticalanalysis and coordinated thestudy

Aaron J KaatPhD

NorthwesternUniversityChicago IL

Directed NIHTB-CB activities forthe study advised on protocoland analysis and authoredportions of the manuscript

Forrest JMcKenzie BS

MIND InstituteSacramento CA

Authored portions of themanuscript

AndreaDrayton BS

MIND InstituteSacramento CA

Authored portions of themanuscript

Stephanie MSansone PhD

MIND InstituteSacramento CA

Assisted in developing protocoland coordinated the study

JeanineColeman PhD

University ofDenver CO

Coordinated assessments at theUniversity of Denver and criticallyreviewed the manuscript

ClaireMichalak BS

Rush UniversityMedical CenterChicago IL

Coordinated assessments andadministered NIHTB-CB at RushUniversity

Richard CGershon PhD

NorthwesternUniversityChicago IL

PI and director of the NIHToolbox and critically reviewedthe manuscript

Karen RileyPhD

University ofDenver CO

PI at the University of Denver andcritically reviewed themanuscript

ElizabethBerry-KravisMD PhD

Rush UniversityMedical CenterChicago IL

PI at Rush University and criticallyreviewed the manuscript

Keith FWidamanPhD

University ofCaliforniaRiverside

Supervised analysis planperformed the statisticalanalysis and critically reviewedthe manuscript

David HesslPhD

MIND InstituteSacramento CA

Designed the study obtainedfunding directed the multisitestudy and authored portions ofthe manuscript

NeurologyorgN Neurology | Volume 94 Number 12 | March 24 2020 e1239

6 GershonRCWagsterMVHendrieHC FoxNA CookKFNowinski CJ NIHToolboxfor assessment of neurological and behavioral function Neurology 201380S2ndashS6

7 Weintraub S Bauer PJ Zelazo PD et al I NIH toolbox cognition battery (CB)introduction and pediatric data Monogr Soc Res Child Dev 2013781ndash15

8 Casaletto KB Umlauf A Beaumont J et al Demographically corrected normativestandards for the English version of the NIH Toolbox Cognition Battery J IntNeuropsychol Soc 201521378ndash391

9 Hessl D Sansone SM Berry-Kravis E et al The NIH Toolbox Cognitive Battery forintellectual disabilities three preliminary studies and future directions J NeurodevDisord 2016835

10 Berry-Kravis E Lindemann L Jonch AE et al Drug development for neuro-developmental disorders lessons learned from fragile X syndrome Nat Rev DrugDiscov 201817280ndash299

11 Parker SE Mai CT Canfield MA et al Updated national birth prevalence estimatesfor selected birth defects in the United States 2004-2006 Birth Defects Res A ClinMol Teratol 2010881008ndash1016

12 Hunter J Rivero-Arias O Angelov A Kim E Fotheringham I Leal J Epidemiology offragile X syndrome a systematic review and meta-analysis Am J Med Genet A 2014164A1648ndash1658

13 Roid GMiller L Stanford-Binet Intelligence Scales 5th ed San Antonio Pearson 200314 Sparrow S Cicchetti D Saulnier C Vineland Adaptive Behavior Scales 3rd ed San

Antonio Pearson 201615 Weintraub S Dikmen SS Heaton RK et al Cognition assessment using the NIH

Toolbox Neurology 201380S54ndashS6416 Korkman M Kirk U Kemp S NEPSY-II A Developmental Neuropsychological

Assessment San Antonio Pearson 200717 Conners C Conners Kiddie Continuous Performance Test 2nd ed Toronto Multi-

Health Systems Inc 201518 Wechsler D Wechsler Preschool and Primary Scale of Intelligence 4th ed San

Antonio Pearson 201219 Roid G Miller L Pomplum M Koch C Leiter International Performance Scale 3rd

ed Wood Dale Stoelting Co 201320 Dunn LDunnD Peabody Picture Vocabulary Test 4th ed San Antonio Pearson 200721 Woodcock R Johnston M Woodcock-Johnson Tests of Achievement 4th ed Itasca

Riverside Publishing 201422 McKenzie FJ Drayton A Shields R et al NIH Toolbox Cognitive Battery supple-

mental administratorrsquos manual for intellectual and developmental disabilities [online]Available at nihtoolboxdeskcomcustomerportalarticles2981718 AccessedSeptember 30 2019

23 Thompson T Coleman JM Riley K et al Standardized assessment accommodationsfor individuals with intellectual disability Contemp Sch Psychol 201822443ndash457

24 Sansone SM Schneider A Bickel E Berry-Kravis E Prescott C Hessl D ImprovingIQ measurement in intellectual disabilities using true deviation from populationnorms J Neurodev Disord 2014616

25 Akshoomoff N Beaumont JL Bauer PJ et al VIII NIH Toolbox Cognition Battery(CB) composite scores of crystallized fluid and overall cognition Monogr Soc ResChild Dev 201378119ndash132

26 Zelazo P Bauer P National Institutes of Health Toolbox cognition battery (NIHToolbox CB) validation for children between 3 and 15 years Monogr Soc Res ChildDev 2013781ndash172

27 Hooper SR Hatton D Sideris J et al Executive functions in young males with fragileX syndrome in comparison to mental age-matched controls baseline findings froma longitudinal study Neuropsychology 20082236ndash47

28 Wilding J Cornish K Munir F Further delineation of the executive deficit in maleswith fragile-X syndrome Neuropsychologia 2002401343ndash1349

29 Pennington BF Moon J Edgin J Stedron J Nadel L The neuropsychology of Downsyndrome evidence for hippocampal dysfunction Child Dev 20037475ndash93

30 Vicari S Bellucci S Carlesimo GA Implicit and explicit memory a functional dis-sociation in persons with Down syndrome Neuropsychologia 200038240ndash251

31 Wechsler D Weschler Intelligence Scale for Children 5th ed Bloomington Pearson2014

32 Abbeduto L Murphy MM Cawthon SW et al Receptive language skills of adoles-cents and young adults with down or fragile X syndrome Am J Ment Retard 2003108149ndash160

33 Godfrey M Lee NR Memory profiles in Down syndrome across developmenta review of memory abilities through the lifespan J Neurodev Disord 2018105

34 Dykens EM Hodapp RM Leckman JF Strengths and weaknesses in the intellectualfunctioning of males with fragile X syndrome Am J Ment Defic 198792234ndash236

35 Carlozzi NE Tulsky DS Kail RV Beaumont JL VI NIH Toolbox Cognition Battery(CB) measuring processing speed Monogr Soc Res Child Dev 20137888ndash102

36 Hessl D Nguyen DV Green C et al A solution to limitations of cognitive testing inchildren with intellectual disabilities the case of fragile X syndrome J NeurodevDisord 2009133ndash45

37 Wiebe SA Espy KA Charak D Using confirmatory factor analysis to understandexecutive control in preschool children I latent structure Dev Psychol 200844575ndash587

38 Mungas D Widaman K Zelazo PD et al VII NIH Toolbox Cognition Battery (CB)factor structure for 3 to 15 year olds Monogr Soc Res Child Dev 201378103ndash118

e1240 Neurology | Volume 94 Number 12 | March 24 2020 NeurologyorgN

DOI 101212WNL0000000000009131202094e1229-e1240 Published Online before print February 24 2020Neurology

Rebecca H Shields Aaron J Kaat Forrest J McKenzie et al Validation of the NIH Toolbox Cognitive Battery in intellectual disability

This information is current as of February 24 2020

ServicesUpdated Information amp

httpnneurologyorgcontent9412e1229fullincluding high resolution figures can be found at

References httpnneurologyorgcontent9412e1229fullref-list-1

This article cites 28 articles 2 of which you can access for free at

Citations httpnneurologyorgcontent9412e1229fullotherarticles

This article has been cited by 1 HighWire-hosted articles

Subspecialty Collections

httpnneurologyorgcgicollectionexecutive_functionExecutive function

httpnneurologyorgcgicollectiondevelopmental_disordersDevelopmental disorders

httpnneurologyorgcgicollectionautismAutismfollowing collection(s) This article along with others on similar topics appears in the

Permissions amp Licensing

httpwwwneurologyorgaboutabout_the_journalpermissionsits entirety can be found online atInformation about reproducing this article in parts (figurestables) or in

Reprints

httpnneurologyorgsubscribersadvertiseInformation about ordering reprints can be found online

ISSN 0028-3878 Online ISSN 1526-632XWolters Kluwer Health Inc on behalf of the American Academy of Neurology All rights reserved Print1951 it is now a weekly with 48 issues per year Copyright Copyright copy 2020 The Author(s) Published by

reg is the official journal of the American Academy of Neurology Published continuously sinceNeurology

Page 3: ValidationoftheNIHToolboxCognitiveBattery in intellectual disability · iPad-based battery of brief memory, executive function (EF), processingspeed,andlanguagetests,wasdevelopedwithinthe

completion of the SB-5 the order of remaining assessments wasrandomized with the exception of the NIHTB-CB which wasthe first assessment of day 2 The order of the NIHTB-CB testswas randomized for each participant At visit 2 the NIHTB-CBwas readministered with the same test order within participants

The NIHTB-CBThe NIHTB-CB is a computerized assessment validated inages 3 to 85 years in the general population The batteryincludes 7 tests Dimensional Change Card Sort (DCCS)Flanker Inhibitory Control and Attention (Flanker) ListSorting Working Memory (LS) Pattern Comparison Pro-cessing Speed (PC) Picture Sequence Memory (PSM) Pic-ture Vocabulary (PVT) and Oral Reading Recognition(OR)915 The NIHTB-CB provided experimental De-velopmental Extension (DEXT) versions of DCCS andFlanker designed to be more accessible to lower-functioning orvery young participants Because tests have multiple age ver-sions the participantrsquos mental age derived from the SB-5 wasused to select test versions allowing for a starting point ofreasonable difficulty thereby reducing frustration and im-proving compliance The DEXT versions were used for par-ticipants in the 3- to 7-year mental age range For PVT andORthere is 1 computerized adaptive testing (CAT) version andthe start point is typically based on age (children) or education(adults) Instead we used the education override feature en-tering the grade equivalent of the mental age as the start point

In addition PSM has multiple forms available for each testversion Pilot reliability results suggested nonequivalence offorms To assess PSM reliability Form A was used at visit 1and for half of participants at visit 2 (PSM A-A) the other halfreceived Form B at visit 2 (PSM A-B) For LS pilot studiesdemonstrated that additional teaching items improved feasi-bility For the current study PowerPoint slides of theseteaching items were used before test items on LS Age 3ndash6The NIHTB-CB developers then released the LS Age 3ndash6Experimental version with these extended instructions duringthe study which was subsequently used

Convergent validity testsSix tests were preselected as convergent validity measures forthe NIHTB-CB The NEPSY Inhibition subtest16 (NEPSY-In) iPad version was used as the convergent measure forDCCS The NEPSY-In measures cognitive flexibility and in-hibitory control From piloting the NEPSY-In we found thatparticipants could rarely do the most difficult level (Switching)We thus administered only theNaming and Inhibition portionsand created a prorated score indicating the number of correctitems per minute For Flanker we used the Conners KiddieContinuous Performance Test 2nd Edition17 administered ona computer with a spacebar as the response button The hitreaction time SD was used as the convergent validity variableFor LS convergent validity the SB-5 verbal working memoryraw score was used PC validity was measured with theWechsler Preschool and Primary Scale of Intelligence 4thEdition18 Bug Search from the number of correct items per

minute The Leiter International Performance Scale 3rd Edi-tion19 Forward Memory (FM) subtest assesses sequentialmemory span The raw score was the convergent variable forPSM The Peabody Picture Vocabulary Test 4th edition20

(PPVT-4) measures receptive vocabulary The raw score fromthe PPVT-4 iPad version was used for PVT For OR theWoodcock Johnson 4th Edition21 Letter-Word Identificationwas used which measures letter recognition and single wordreading Discriminant measures for NIHTB-CB tests were se-lected out of these measures by choosing a feasible measure ofa different construct than the NIHTB-CB test

AdaptationsTo increase feasibility and to improve reliability and validity inID we developed a manual of standardized procedures re-garding the test environment and NIHTB-CB administrationthe ldquoNIH Toolbox Cognitive Battery Supplemental Admin-istratorrsquos Manual for Intellectual and Developmental Dis-abilitiesrdquo (e-Manual hereafter Supplemental Manual linkslwwcomWNLB58)22 Strategies to proactively improvefeasibility and to reduce participant stress included usinga visual schedule before the visit a caregiver questionnaire onbehaviors and potential reinforcement rewards and a visualtoken board of the NIHTB-CB for the participant to check offduring testing Best practices in administering standardizedassessments with appropriate accommodations for ID wereused23 Test-specific guidelines are available in the manual toaid future users of the NIHTB-CB in standard administrationand feasibility specifically for the ID population In additionthe Supplemental Manual includes the Administration Formthat we developed to document test environment behavioralresponses and validity of tests for each participant

Data cleaning and scoringAfter every administration of the NIHTB-CB the Adminis-tration Form was used to record whether each test was con-sidered valid for the participant All analyses used only validscores The most common reasons for invalid scores were aninvalid response pattern refusal and excessive prompting(30 078 and 072 of all scores respectively)

Before conducting analyses we visually inspected all data andbivariate correlations for normality and the presence of outliersAll NIHTB-CB tests were examined for floor or ceiling issuesOnly 2 tests had such issues At visit 1 21 individuals receiveda score at the floor on LS Age 3ndash6 33 received a score at theceiling on PSM Age 3ndash4 and 4 received a score at the floor onPSMAge 5ndash6 After a thorough review of administration detailsand reliability and validity analyses LS floored scores were keptin the analyses Because PSM has multiple age versions andthese participants likely should have received a harder or easierversion PSM scores at floor or ceiling were excluded

Data analysisSAS version 94 (SAS Institute Inc Cary NC) and R version360 (R Foundation for Statistical Computing Vienna Aus-tria) were used for analyses Visit 1 data were used for

NeurologyorgN Neurology | Volume 94 Number 12 | March 24 2020 e1231

feasibility and validity analyses For validity and reliabilityNIHTB-CB raw scores were used computed score on DCCSFlanker and PC theta score on PSM PVT and OR raw scoreonLS and FlankerDEXT and percent correct onDCCSDEXTBecause DEXT scores are currently on a different scale thanstandard Flanker and DCCS scores DEXT results are presentedseparately For test-retest reliability single-score intraclass cor-relations (ICCs) were used The Cohen d was used to evaluatepotential practice effects with paired-sample t tests to measurethe significance of change Convergent discriminant and eco-logic validities were measured with Pearson correlations

Our prior work on IQmeasurement demonstrated the utility ofdeviation-based scoring to deal with problematic floor effects inID24 We used this method in the current study in place ofNIHTB-CB standard scores to circumvent imposed flooredscores (eg the current NIHTB-CB winsorizes age-correctedstandard scores at 54) We created z scores by transformingparticipant raw scores on each NIHTB-CB test using normativemeans and SDs for their chronologic age band The z scoreswere used to create deviation-based composites following thepreviously defined criteria25 For a Crystallized Composite 1 of2 valid scores on PVT and OR was required for a FluidComposite 4 of 5 valid scores on Flanker DCCS LS PC andPSM were required The Crystallized and Fluid z scores wereused to create a Cognitive Function Composite used in eco-logic validity analyses Deviation scores were also created forFSIQ on the SB-524 and were used in FSIQ analyses Groupcomparisons to examine known-groups validity were assessedwith a 2-way mixed-model analysis of variance on NIHTB-CB zscores Significant results were followed up with the Tukeyhonest significant difference tests to examine group differences

Data availabilityOn request to the corresponding author anonymized data areavailable to share

ResultsDescriptive statisticsTable 1 presents descriptive statistics by diagnostic group andoverall Groups did not differ significantly by chronologic age(F2238 = 091 p = 041) or by VABS-3 Adaptive BehaviorComposite (F2227 = 150 p = 023) However FSIQ differedsignificantly by group (F2238 = 316 p lt 0001) with FSIQhigher in OID than in both DS [t(238) = 757 p lt 0001] andFXS [t(238) = 605 p lt 0001] FSIQ did not significantly differbetweenDS and FXS [t(238) = 123 p = 022] Similarly mentalage was significantly different by group (F2238 = 253 p lt 0001)withmental age higher inOID than in bothDS [t(238) = 676 plt 0001] and FXS [t(238) = 540 p lt 0001] Mental age did notdiffer significantly between DS and FXS [t(238) = 110 p =027] Figure 1A shows the distribution of the NIHTB-CBCognitive Function Composite age-adjusted standard scores ofthe sample without the current imposed floor illustrating thevariability of the sample below this floor and the benefit ofdeviation-based composites (used in all analyses)

FeasibilityFeasibility data are provided in table 2 as the percentage ofparticipants with valid scores on each test Feasibility overall wassimilar to the normative 3- to 15-year-old sample26 with DCCSPC and LS having slightly lower feasibility and Flanker PSMPVT and OR having similar or higher feasibility rates than thenormative sample Even down to a mental age of 3 years PSMPVT and OR feasibility was very good The feasibility of theremaining tests improved particularly at 5 years All tests werefeasible for nearly every participant with amental age ofge6 years

Test-retest reliabilityTest-retest reliability was assessed after asymp1 month (mean = 317days SD = 63 days) (table 3) ICCs on each test were moderateto strong with the exception of DCCS DEXT and PSM A-Awhichwere in the high 040s All composites had strong reliabilityThree tests had small but significant visit 1 to visit 2 effect sizesreflecting amodest increase in performance Flanker PC and thePSM A-B group However the PSM A-B increase likely reflectsnonequivalent forms rather than a practice effect because thePSM A-A group had no practice effect The Fluid and CognitiveFunction composites also had small but significant increasessuggestive of small practice effects Within groups reliabilitycoefficients were mostly moderate to strong some DEXT andPSM reliabilities in small sample sizes were not significantFlanker DEXT in OID (ICC = 046 p = 014) and both PSMgroups in DS (A-A ICC = 024 p = 015 A-B ICC = 033 p =005) In FXS DCCS reliability (ICC = 041) was notably lowerthan in the total sample (ICC = 071) but FXS Flanker reliability(ICC= 084) was stronger than in the total sample (ICC= 074)

Construct validityConvergent and discriminant validity results are presented intable 4 Convergent correlations ranged from moderate tostrong with the exception of Flanker DEXT (n = 29 r =minus026 p = 017) Group results generally reflect validitysimilar to that of the overall results Of note PSM was onlyweakly correlated with Leiter-FM in DS (r = 033 p = 001)and in OID (r = 032 p = 001)

In DCCS for a subgroup of participants below a raw score of188 there appeared to be no association with the NEPSY-Inin this subgroup DCCS score was not significantly correlatedwith NEPSY-In score (r = 029 p = 016) This DCCS scorerepresents participants who did not pass the introductoryswitching portion of the test When this subgroup was re-moved validity improved (r = 057 p lt 0001)

Ecologic validityTable 5 provides the ecologic validity of NIHTB-CB tests andcomposites The composites each had moderate to strongcorrelations with FSIQ (figure 1B) as did all NIHTB-CB testscores other than Flanker DEXT VABS-3 Adaptive BehaviorComposite had small but significant correlations with severaltests (Flanker PC PSM PVT and OR) and with the Crys-tallized and Cognitive Function composites with better per-formance associated with higher levels of adaptive behavior

e1232 Neurology | Volume 94 Number 12 | March 24 2020 NeurologyorgN

Syndrome-specific comparisons (known-groups validity)To examine the specificity of the NIHTB-CB to detectsyndrome-specific performance a 2-way mixed-model analysisof variance was conducted on NIHTB-CB test z scores withgroup as a between-participants factor and NIHTB-CB test asa within-participants factor we also examined their interaction(figure 2) IQ was included as a repeated-measures varyingcovariate and an IQ-by-test interaction term was included toallow the effect of IQ to vary by test Because groups differed onFSIQ covarying IQ aimed to clarify whether group results re-flect phenotype-specific impairments or if they simply reflectglobally poorer performance due to overall level of cognitive

functioning To avoid overcontrolling for the domain ofinterest (because NIHTB-CB domains overlap with com-ponents of IQ) verbal IQ was used as the covariate for thefluid NIHTB-CB test outcomes (DCCS Flanker LS PCand PSM) and nonverbal IQ was used as the covariate forcrystallized NIHTB-CB tests (PVT and OR) To obtaineffect sizes for pairwise comparisons the Cohen d was cal-culated from the estimated marginal means from the modelto account for the effects of IQ

There was a main effect of group on NIHTB-CB z scores(F2238 = 490 p = 0008) as well as a main effect of test on zscores (F61067 = 1726 p lt 0001) These were qualified by

Table 1 Participant descriptive information

Total(n = 242)

Down syndrome(n = 91)

Fragile X syndrome(n = 75)

Other ID(n = 76)

Race n

American IndianAlaskaNative

3 1 1 1

Asian 6 2 2 2

Black 24 4 8 12

Native HawaiianPacificIslander

3 1 1 1

White 171 70 60 41

gt1 26 10 2 14

Ethnicity Hispanic orLatino

199 187 80 329

Sex male 593 451 733 632

Primary caregiver 4-ydegree

618 648 627 566

ASD diagnosis (parentreport)

353(8 unknown)

66(0 unknown)

520(6 unknown)

526(1 unknown)

M SD M SD M SD M SD

Chronologic age y 1571 515 1588 517 1616 492 1505 535

Mental age (SB-5) y 520 153 467 120 491 135 612 164

Full-scale IQ (SB-5) 5375 1587 4765 1290 5048 1513 6424 1469

Vineland-3 ABC score 5259 1711 5493 1623 5024 1863 5193 1654

DCCS computed score 381 251 333 224 320 226 468 271

Flanker computed score 492 236 454 223 446 244 571 228

LS raw score 630 441 430 363 602 363 834 475

PC computed score 3307 1593 2592 1374 3310 1408 3984 1674

PSM theta score minus158 090 minus186 076 minus174 087 minus109 088

PVT theta score minus291 280 minus379 273 minus270 257 minus209 283

OR theta score minus584 496 minus662 529 minus621 453 minus460 478

Abbreviations ABC = Adaptive Behavior Composite ASD = autism spectrum disorder DCCS = Dimensional Change Card Sort Flanker = Flanker InhibitoryControl and Attention ID = intellectual disability LS = List Sorting WorkingMemory M =mean OR = Oral Reading and Recognition PC = Pattern ComparisonProcessing Speed PSM = Picture Sequence Memory PVT = Picture Vocabulary SB-5 = Stanford-Binet 5th edition

NeurologyorgN Neurology | Volume 94 Number 12 | March 24 2020 e1233

a significant group times test interaction (F121067 = 368 p lt0001) and a significant IQ times test interaction (F61067 = 672 plt 0001) Follow up Tukey tests showed that FXS performedworse onDCCS thanOID [t(238) = 402 p lt 0001 d = 052]and worse than DS [t(238) = minus304 p = 0007 d = 039] OnFlanker FXS performed worse than OID [t(238) = 345 p =0002 d = 045] and worse than DS [t(238) = minus285 p = 001d = 037] These 2 EF test results supported the hypothesizedEF impairment in FXS although the DS impairment relativeto OID was not supported On PVT FXS performed betterthan DS fitting with expected language strength in FXS[t(238) = minus277 p = 002 d = 036] On PC DS showeda poorer performance than OID which approached

significance [t(238) = 224 p = 007 d = 029] There were nosignificant group differences on LS PSM or OR

DiscussionThis study provides the first comprehensive examination of thepsychometric properties and feasibility of the NIHTB-CB forindividuals with ID with an initial focus on 2 of the mostcommon genetic causes with robust translational researchprograms FXS and DS Overall the NIHTB-CB has demon-strated strong potential for use as an objective standardizedoutcome measure that can be confidently used in ID trials withparticipants with amental age of 5 years or higher Results of thestudy demonstrate very strong psychometrics for the Crystal-lized reasoning tests (PVT and OR) and good to excellentperformance of Fluid reasoning tests withmore variation acrossFXS DS and OID groups for some measures (eg strongreliability for Flanker in FXS compared toDS and vice versa forDCCS) Indeed the Fluid Composite appeared to have moreconsistently strong reliability across conditions and a solidconvergent association with FSIQ Thus it may be a goodcandidate outcome measure for studies seeking to examinebroad nonverbal cognitive changes for individuals with a mentalage of ge5 years Below a mental age of 5 years feasibility wasmore variable across tests indicating the need for furtheradaptations scoring algorithms on developmental extensionsor new tests targeting these lower-functioning individuals

The Supplemental Manual was compiled after hundreds ofadministrations of the NIHTB-CB to individuals with ID andthe study results on feasibility reliability and validity supportits use We encourage researchers and examiners planning touse the NIHTB-CB to follow these guidelines for the IDpopulation

Group comparisons demonstrated that the NIHTB-CB issensitive to substantial EF deficits among individuals with IDin that all 3 groups performed relatively poorly on Flanker andDCCS In particular participants with FXS showed weaknessin inhibitory control and attention and cognitive flexibility inexcess of their general cognitive level and compared to con-trols with other forms of ID This aligns with previous re-search showing that boys with FXS are impaired in inhibitorycontrol set shifting and planning relative to mental agendashmatched controls27 and that boys with FXS have impairmentsin inhibition and attention relative to mental agendashmatchedcontrols and relative to children with DS28 The hypothesizedEF weakness in DS compared to OID (to a lesser extent thanFXS) was not found however there have been some mixedresults on EF in children with DS compared to children withother IDs2930 The low-scoring subgroup on DCCS are thosewho failed introduction to the switching portion of the testfor participants who cannot perform switching at the earliestlevel the test may be less sensitive to variation in EF Re-moving this subgroup from analyses strengthened the con-vergent validity correlation This suggests that these

Figure 1 NIHTB-CB cognitive function composite scoredistribution and association with FSIQ

(A) Histogram showing the true distribution of cognitive function compositescores in the full sample of individuals with intellectual disability (ID) TheNIH Toolbox Cognitive Battery (NIHTB-CB) current floor (winsorized at 54vertical blue line) excludedmore than half of the present sample frommoreaccurate measurement below that level (B) Association between the de-viation-based cognitive function composite and deviation-based full-scaleIQ (FSIQ) by group

e1234 Neurology | Volume 94 Number 12 | March 24 2020 NeurologyorgN

individualsrsquo level or lack of cognitive flexibility may not becaptured by the introductory switching portion of DCCS Anextension of DCCS that does capture this construct in veryyoung or low-functioning persons would have much valueThe lower reliability of DCCS in the FXS group may alsoreflect this limitation in that participants with FXS overallscored very low on this test Because anxiety and hyperactivityare common in FXS it is possible that individual state inter-acts with the task complexity and cognitive flexibility to resultin more variable performance over time in this group Whenparticipants needed Flanker DEXT or DCCS DEXT theywere almost always able to perform the tests however a fewissues remain to be worked out regarding these experimentalversions notably the ability to interpret these scores relativeto the standard Flanker and DCCS scores This study high-lighted other concerns with the DEXT measures (eg diffi-culty may be ordered incorrectly or portions are overlyburdensome) Our results provide clear evidence that DEXTlevels are necessary (and extremely feasible) especially inthose with FXS and DS but that further refinements andmodifications are necessary

To improve feasibility of LS extra instructions and practiceitems were developed and used The LS Age 3ndash6 experimentalversion with these additions is now available Feasibility didimprove after our initial pilot studies however the testremains challenging for this population and some participantspass practice but get no test sequences completely correctThe limited feasibility and floored or low variability in rawscores suggest that a lower range of LS is necessary A po-tential approach may be to give some credit on earlysequences that are partially recalled or items recalled out ofsequence because the construct of working memory builds onmore basic short-term memory (eg SB-5 Working Memoryindex and Wechsler Working Memory indices)1331

Both language tests PVT and OR had excellent performancein our samples of individuals with ID Both demonstratedstrong reliability and clear domain specificity with a muchhigher convergent than discriminant correlation TheNIHTB-CB was sensitive to expected language characteristicsin that DS was impaired relative to FXS on PVT32 These testshave an advantage over other language measures such as thePPVT-4 in that PVT and OR are brief (asymp3 minutes each) butaccurate owing to CAT the ability to obtain results witha brief assessment is especially important in individuals withfrequent behavior or attention issues such as in those with ID

Episodic memory is relevant to clinical trials particularly forDS in which memory impairments are well documented33

and FXS in which memory of sequential information is es-pecially impaired34 It is important to emphasize that despitethese known weaknesses PSM was among the highest of thetest scores in each group (figure 2) suggesting that individualsmay have compensatory strategies for performing well per-haps such as use of the contextual information in the storiesReliability was moderate and lower than that of the normative3- 15-year-old study although improved from our pilot studyThe significant effect size in the PSM A-B group suggestsnonequivalence of Forms A and B Notably in DS ICCs weresmall and nonsignificant suggesting that in its present formPSM appears contraindicated as a separate outcome measurefor this population

The high rate of ceiling scores on PSM 3ndash4 suggests thatmental age may not be a good indicator of start point on PSMin ID at least at this mental age level It is also possible thatthis age version is too easy in comparison to PSM scores onolder versions perhaps a personrsquos score on 1 PSM version isnot equivalent to that personrsquos score on another De-velopment of a CAT PSM test with the full range of version

Table 2 At visit 1 proportion of participants who received a valid scorea

Total n () Down syndrome n () Fragile X syndrome n () Other ID n ()

Flanker 195 (806) 78 (867) 50 (667) 67 (882)

Flanker (with DEXT)b 239 (988) 89 (989) 75 (987) 75 (987)

DCCS 152 (636) 55 (611) 40 (541) 57 (760)

DCCS (with DEXT)b 232 (971) 87 (967) 72 (973) 73 (973)

LS 164 (689) 57 (648) 45 (600) 62 (827)

PC 179 (752) 59 (648) 58 (801) 62 (827)

PSM 220 (921) 85 (944) 65 (867) 70 (933)

PVT 237 (983) 88 (978) 73 (973) 76 (1000)

OR 233 (979) 87 (978) 72 (973) 74 (987)

Abbreviations DCCS = Dimensional Change Card Sort DEXT = Developmental Extension Flanker = Flanker Inhibitory Control and Attention ID = intellectualdisability LS = List Sorting Working Memory OR = Oral Reading and Recognition PC = Pattern Comparison Processing Speed PSM = Picture SequenceMemory PVT = Picture Vocabularya Technical difficulties and administration errors are excludedb On Flanker 274 of the sample needed the DEXT portion On DCCS 531 needed DEXT

NeurologyorgN Neurology | Volume 94 Number 12 | March 24 2020 e1235

difficulties would likely simplify the testing process and yieldmore comparable and reliable scores

PC showed adequate feasibility overall and good feasibility inFXS and OID with sufficient feasibility at a mental age of ge4years While the ICC was excellent there was a small butsignificant practice effect although smaller than that found inthe normative sample35 The most common reason for lack offeasibility on PC was an invalid alternating response patternespecially common in DS The task has an inherent challengeof understanding ldquosamerdquo and ldquodifferentrdquo while mapping thischoice onto smiley face and frown face options For partic-ipants without invalid response patterns PC performs well inID On the basis of the feasibility challenges we developeda new processing speed task Speeded Matching now avail-able as an experimental version in the app The task is to selectthe animal face among 3 foils that matches a target image

Future psychometric studies will provide more informationabout its performance

The Fluid Crystallized and Cognitive Function compositesdemonstrated reliability results similar to those of the nor-mative age 3 to 6 sample with small practice effects in Fluidand Cognitive Function composites (although smaller than inthe normative sample) Each composite was well correlatedwith FSIQ Although not all participants were able to receivea composite (due to missing valid scores on some tests) whencomplete the composites appear to perform well The de-viation method used to create these composites has a clearadvantage over the current age-corrected standard scores onwhich more than half of our sample obtained scores at thelower limit currently set at 54 We are conducting analyses toidentify the best option for composite scores below this floor(deviation approach vs extension of existing age-adjusted

Table 3 Test-retest reliability and examination of practice effects

Total Down syndrome Fragile X syndrome Other ID

No ICC (95 CI)Cohend No ICC (95 CI)

Cohend No ICC (95 CI)

Cohend No ICC (95 CI)

Cohend

Flanker 144 074(065ndash081)

009a 56 063(044ndash076)

020 37 084(070ndash091)

minus011 51 069(051ndash081)

015

FlankerDEXT

36 060(033ndash077)

minus001 8 073(018ndash094)

minus033 23 062(028ndash082)

minus005 5 046(minus037ndash092)

056

DCCS 97 071(060ndash080)

013 32 078(058ndash088)

018 23 041(001ndash069)

024 42 082(068ndash090)

003

DCCS DEXT 73 049(030ndash065)

minus016 32 046(014ndash069)

minus017 27 050(016ndash074)

minus013 14 053(003ndash082)

minus021

LS 113 074(064ndash081)

014 37 075(056ndash086)

024 32 065(040ndash081)

minus011 43 069(049ndash082)

022

PC 133 077(062ndash085)

029b 45 074(048ndash086)

039b 40 071(050ndash084)

021a 48 075(054ndash086)

035c

PSM A-A 60 047(025ndash064)

004 22 024(minus022ndash060)

minus014 15 054(007ndash082)

027 23 040(001ndash069)

011

PSM A-B 60 055(034ndash071)

029a 22 033(minus006ndash065)

022 18 054(014ndash080)

025 20 040(minus003ndash071)

034

PVT 186 085(081ndash089)

003 69 087(080ndash092)

008 57 079(066ndash087)

001 60 086(078ndash092)

minus002

OR 183 096(095ndash097)

002 70 095(092ndash097)

001 56 096(093ndash098)

minus001 57 098(097ndash099)

004

FC 97 083(067ndash091)

033b 28 084(052ndash093)

040b 27 079(056ndash090)

034a 42 077(055ndash088)

033c

CC 191 093(091ndash095)

000 73 092(087ndash095)

002 58 094(090ndash096)

minus003 60 091(085ndash094)

minus000

CFC 97 092(085ndash096)

022b 28 093(079ndash097)

023c 27 091(079ndash096)

027a 42 089(078ndash094)

019a

Abbreviations A-A = Form A at visit 1 and Form A at visit 2 A-B = Form A at visit 1 and Form B at visit 2 CC = Crystallized Composite CFC = Cognitive FunctionComposite CI = confidence interval DCCS = Dimensional Change Card Sort DEXT = Developmental Extension FC = Fluid Composite Flanker = FlankerInhibitory Control and Attention ICC = intraclass correlation ID = intellectual disability LS = List SortingWorkingMemory OR =Oral Reading and RecognitionPC = Pattern Comparison Processing Speed PSM = Picture Sequence Memory PVT = Picture Vocabularya p lt 005b p lt 0001c p lt 001

e1236 Neurology | Volume 94 Number 12 | March 24 2020 NeurologyorgN

Table 4 Convergent and discriminant validity of NIHTB-CB tests (Pearson r)

Total Down syndrome Fragile X syndrome Other ID

No Conv Validity No Disc Validity No Conv Validity No Disc Validity No Conv Validity No Disc Validity No Conv Validity No Disc Validity

Flanker 144 minus052a 193 053a 52 minus056a 78 048a 36 minus044b 48 049a 56 minus044a 67 056a

Flanker DEXT 29 minus026 61 036b 7 004 15 051 15 minus031 32 050b 7 minus023 14 034

DCCSc 109 048a 151 046a 34 055a 55 056a 30 033 39 007 45 036d 58 054a

DCCS DEXT 43 042b 113 037a 16 034 48 045b 11 033 40 043b 16 048 25 011

LS 165 065a 164 049a 57 054a 57 047a 46 050a 45 038d 62 066a 62 052a

PC 171 066a 178 045a 54 060a 59 058a 56 048a 57 042b 61 069a 62 036b

PSM 175 047a 179 050a 69 034b 71 050a 49 064a 51 052a 57 033d 58 033d

PVT 235 083a 232 047a 88 075a 86 054a 71 085a 70 050a 76 088a 76 053a

OR 230 092a 227 058a 86 092a 84 062a 70 089a 69 061a 74 095a 74 059a

FC 151 060a 152 061a 53 049a 53 043b 40 047b 40 057a 58 047a 76 071a

CC 237 075a 237 068a 89 068a 89 064a 72 080a 73 068a 59 055a 75 064a

Abbreviations CC = Crystallized Composite Conv = convergent DCCS = Dimensional Change Card Sort DEXT = Developmental Extension Disc = discriminant FC = Fluid Composite Flanker = Flanker Inhibitory Control andAttention ID = intellectual disability LS = List SortingWorkingMemory NEPSY-In =NEPSY Inhibition subtest NIHTB-CB =NIH Toolbox Cognitive Battery OR =Oral Reading and Recognition PC = Pattern Comparison ProcessingSpeed PSM = Picture Sequence Memory PVT = Picture VocabularyConvergent discriminant measure for each test Flanker Conners Kiddie Continuous Performance Test 2nd Edition (KCPT2) Woodcock Johnson 4th Edition Letter-Word Identification (WJ-LW) Flanker DEXT KCPT2 WJ-LWDCCS NEPSY-In WJ-LW DCCS DEXT NEPSY-In WJ-LW LS Stanford-Binet 5th edition (SB-5) Verbal WorkingMemory WJ-LW PC Wechsler Preschool and Primary Scale of Intelligence 4th Edition Bug Search WJ-LW PSM LeiterInternational Performance Scale 3rd Edition Forward Memory (Leiter-FM) WJ-LW PVT Peabody Picture Vocabulary Test 4th edition Leiter-FM OR WJ-LW Leiter-FM Fluid Composite SB-5 Fluid Reasoning IQ SB-5 Verbal IQCrystallized Composite SB-5 Verbal IQ SB-5 Fluid Reasoning IQa p lt 0001b p lt 001c When the DCCS low subgroup (n = 26) was removed in fragile X syndrome the convergent correlation was significant (n = 21 r = 049 p = 002)d p lt 005

Neurolo

gyorgN

Neurology

|Volum

e94N

umber

12|

March

242020e1237

standard scores) These findings provide further evidence thattest developers should consider and address the range andsensitivity of tests and scores for individuals with moderate tosevere ID2436

This study has some important limitations that warrant con-sideration Construct validity challenges are inherent with theID population in that fully adequate convergent validitymeasures are not always available or they present their ownfeasibility and psychometric limitations The lack of cleardiscriminant validity in most measures is another challengeWhile discriminant correlations are generally desired to bemarkedly lower than the convergent correlations in earlydevelopment domains of cognition (especially EFs) arethought to be unidimensional with increasing differentiationof constructs occurring through early adulthood37 In IDthere is likely even less differentiation between domains thanin typically developing children Our discriminant validityresults are similar to normative 3- to 6-year-old results38

Therefore the Fluid Composite may be a good outcomemeasure choice on the basis of its psychometric performanceand some limitations of subtest construct differentiation inthis population The study was also limited by sample size inevaluations of group-specific results Future work with largersamples should provide more clarity about reliability andvalidity within individual ID subgroups

The NIHTB-CB is a promising outcome measure for IDclinical trials and for many types of nonintervention obser-vational studies Although not originally intended for clinicaluse or for special education purposes ongoing and futureresearch may be done to explore such applications23 such as inthe school psychology setting for accurate and feasible as-sessment of students with IDs In addition the SupplementalManual developed from this study provides key guidance toexaminers and researchers working with ID populations theprocedures on administration test environment fidelity andscoring not only improve examiner familiarity and comfortbut more important extend accessibility of the NIHTB-CBto more cognitively impaired or behaviorally challengedindividuals

Besides evaluating the NIHTB-CB as an appropriate assess-ment for ID in general the present results demonstrate thesensitivity of the battery to known syndrome-specific cogni-tive phenotypes such as the impairment of EFs in FXS relativeto other ID and to DS A critical remaining question is thedegree to which the battery is sensitive to change especially toeffects of intervention As an initial test of sensitivity tochange we are currently collecting longitudinal data from

Table 5 Ecologic validity (Pearson r)

Chronologicage

Mentalage FSIQ

VABS-3ABC

Flanker 025a 058a 050a 018b

FlankerDEXT

025 037c 018 010

DCCS 030a 063a 056a 014

DCCS DEXT 014 066a 049a 017

LS 020b 070a 066a 016b

PC 010 059a 057a 019b

PSM 004 058a 058a 027a

PVT 035a 076a 074a 039a

OR 017c 068a 069a 039a

FC mdash mdash 066a 015

CC mdash mdash 076a 044a

CFC mdash mdash 078a 025c

Abbreviations CC = Crystallized Composite CFC = Cognitive FunctionComposite DCCS = Dimensional Change Card Sort DEXT = DevelopmentalExtension FC = Fluid Composite Flanker = Flanker Inhibitory Control andAttention FSIQ = full-scale IQ LS = List Sorting Working Memory OR = OralReading and Recognition PC = Pattern Comparison Processing Speed PSM= Picture Sequence Memory PVT = Picture Vocabulary VABS-3 ABC =Vineland-3 Adaptive Behavior Compositea p lt 0001b p lt 005c p lt 001

Figure 2 Profile plot of NIH Toolbox Cognitive Battery test zscores by group

The z scores on each test (representing group performance relative to thegeneral population average performance) are shown derived from themixed-model analysis of variance adjusted for IQ A z score of 0 (horizontalline at top) represents the average performance in the general populationnormative sample The z scores lt0 represent the number of SDs below thegeneral population average for the chronologic age band Error bars rep-resent 95 confidence intervals Comparison between fragile X syndrome(FXS) and intellectual disability (ID) of other or unknown cause daggerComparisonbetween FXS and Down syndrome p lt 005 p lt 001 p lt 0001 daggerp lt005 daggerdaggerp lt 001 daggerdaggerdaggerp lt 0001 DCCS = Dimensional Change Card Sort LS =List Sorting Working Memory OR = Oral Reading and Recognition PC =Pattern Comparison Processing Speed PSM = Picture Sequence MemoryPVT = Picture Vocabulary

e1238 Neurology | Volume 94 Number 12 | March 24 2020 NeurologyorgN

study participants to explore natural developmental changeswithin each NIHTB-CB test and the composites relative tomeasures already established as change sensitive such as theSB-5 and Vineland The present results warrant the next stepof evaluating the NIHTB-CB in ID for individuals down toa mental age of 5 years to demonstrate the treatment-specificsensitivity of the battery and to determine the degree to whichmeasure gains reflect functional improvements in daily lifeBelow a mental age of 5 years the NIHTB-CB performs morevariably and adaptations to the lower test ranges or scoringadaptations are needed for some measurement domainsStudies of the performance of the battery in older adults withID are needed especially focusing on those experiencingcognitive decline or dementia Overall the present validationresults represent an important step toward providing an ob-jective scalable and standardized method for successfullymeasuring cognition and tracking cognitive changes in ID

AcknowledgmentThe authors extend their appreciation to the families whogave their time and effort to participate in the study and toLeonard Abbeduto LeAnn Baer Ruth McClure Barnes KyleBersted Mikayla Brown Ana Candelaria Erin CarmodyDarian Crowley Suzanne Delap Randi Hagerman AnneHoffmann Londi Howard Paige Landau Caroline LeonczykMichael Nelson Jacklyn Perales Lacey Pomerantz ShanelleRodriguez Melanie Rothfuss Ryan Shickman AndreaSchneider Haleigh Scott Laurel Snider Rachel Teune TaliaThompson Denny Tran and Jamie Woods for theircontributions to the study

Study fundingFunding provided by the National Institute of Child Healthand Human Development (R01HD076189) Health andHuman Services Administration of Developmental Dis-abilities (90DD0596) the MIND Institute Intellectual andDevelopmental Disabilities Research Center (U54HD079125) and the National Center for Advancing Trans-lational Sciences NIH through grant UL1 TR000002

DisclosureR Shields A Kaat F McKenzie A Drayton S Sansone JColeman and C Michalak report no disclosures K Riley hasreceived compensation for consulting to Novartis regardingFXS clinical trials E Berry-Kravis has received funding (allfunding including consulting goes to Rush University MedicalCenter) from Seaside Therapeutics Novartis Roche AlcobraNeuren Cydan Fulcrum GW Neurotrope Marinus AcadiaZynerba BioMarin Ovid Acadia Yamo Ionis Lumos Ultra-genyx Vtesse Sucampo and Mallinkcrodt Pharmaceuticals toconsult on trial design or development strategies andor toconduct clinical trials in FXS or other neurogenetic disordersand from Asuragen Inc to develop testing standards and to dovalidation testing for FMR1 and SMN testing R Gershon andK Widaman report no disclosures D Hessl has receivedfunding for consulting from Novartis Roche Zynerba

Autifony and Ovid pharmaceutical companies regarding FXSclinical trials Go to NeurologyorgN for full disclosures

Publication historyReceived by Neurology August 14 2019 Accepted in final formOctober 31 2019

References1 Olusanya BO Davis AC Wertlieb D et al Developmental disabilities among children

younger than 5 years in 195 countries and territories 1990-2016 a systematic analysis forthe Global Burden of Disease Study 2016 Lancet Glob Health 20186e1100ndashe1121

2 Lee AW Ventola P Budimirovic D Berry-Kravis E Visootsak J Clinical developmentof targeted fragile X syndrome treatments an industry perspective Brain Sci 20188E214

3 Picker JD Walsh CA New innovations therapeutic opportunities for intellectualdisabilities Ann Neurol 201374382ndash390

4 Budimirovic DB Berry-Kravis E Erickson CA et al Updated report on tools tomeasure outcomes of clinical trials in fragile X syndrome J Neurodev Disord 2017914

5 Esbensen AJ Hooper SR Fidler D et al Outcome measures for clinical trials in Downsyndrome Am J Intellect Dev Disabil 2017122247ndash281

Appendix Authors

Name Location Contribution

Rebecca HShields MS

MIND InstituteSacramento CA

Authored the manuscriptperformed the statisticalanalysis and coordinated thestudy

Aaron J KaatPhD

NorthwesternUniversityChicago IL

Directed NIHTB-CB activities forthe study advised on protocoland analysis and authoredportions of the manuscript

Forrest JMcKenzie BS

MIND InstituteSacramento CA

Authored portions of themanuscript

AndreaDrayton BS

MIND InstituteSacramento CA

Authored portions of themanuscript

Stephanie MSansone PhD

MIND InstituteSacramento CA

Assisted in developing protocoland coordinated the study

JeanineColeman PhD

University ofDenver CO

Coordinated assessments at theUniversity of Denver and criticallyreviewed the manuscript

ClaireMichalak BS

Rush UniversityMedical CenterChicago IL

Coordinated assessments andadministered NIHTB-CB at RushUniversity

Richard CGershon PhD

NorthwesternUniversityChicago IL

PI and director of the NIHToolbox and critically reviewedthe manuscript

Karen RileyPhD

University ofDenver CO

PI at the University of Denver andcritically reviewed themanuscript

ElizabethBerry-KravisMD PhD

Rush UniversityMedical CenterChicago IL

PI at Rush University and criticallyreviewed the manuscript

Keith FWidamanPhD

University ofCaliforniaRiverside

Supervised analysis planperformed the statisticalanalysis and critically reviewedthe manuscript

David HesslPhD

MIND InstituteSacramento CA

Designed the study obtainedfunding directed the multisitestudy and authored portions ofthe manuscript

NeurologyorgN Neurology | Volume 94 Number 12 | March 24 2020 e1239

6 GershonRCWagsterMVHendrieHC FoxNA CookKFNowinski CJ NIHToolboxfor assessment of neurological and behavioral function Neurology 201380S2ndashS6

7 Weintraub S Bauer PJ Zelazo PD et al I NIH toolbox cognition battery (CB)introduction and pediatric data Monogr Soc Res Child Dev 2013781ndash15

8 Casaletto KB Umlauf A Beaumont J et al Demographically corrected normativestandards for the English version of the NIH Toolbox Cognition Battery J IntNeuropsychol Soc 201521378ndash391

9 Hessl D Sansone SM Berry-Kravis E et al The NIH Toolbox Cognitive Battery forintellectual disabilities three preliminary studies and future directions J NeurodevDisord 2016835

10 Berry-Kravis E Lindemann L Jonch AE et al Drug development for neuro-developmental disorders lessons learned from fragile X syndrome Nat Rev DrugDiscov 201817280ndash299

11 Parker SE Mai CT Canfield MA et al Updated national birth prevalence estimatesfor selected birth defects in the United States 2004-2006 Birth Defects Res A ClinMol Teratol 2010881008ndash1016

12 Hunter J Rivero-Arias O Angelov A Kim E Fotheringham I Leal J Epidemiology offragile X syndrome a systematic review and meta-analysis Am J Med Genet A 2014164A1648ndash1658

13 Roid GMiller L Stanford-Binet Intelligence Scales 5th ed San Antonio Pearson 200314 Sparrow S Cicchetti D Saulnier C Vineland Adaptive Behavior Scales 3rd ed San

Antonio Pearson 201615 Weintraub S Dikmen SS Heaton RK et al Cognition assessment using the NIH

Toolbox Neurology 201380S54ndashS6416 Korkman M Kirk U Kemp S NEPSY-II A Developmental Neuropsychological

Assessment San Antonio Pearson 200717 Conners C Conners Kiddie Continuous Performance Test 2nd ed Toronto Multi-

Health Systems Inc 201518 Wechsler D Wechsler Preschool and Primary Scale of Intelligence 4th ed San

Antonio Pearson 201219 Roid G Miller L Pomplum M Koch C Leiter International Performance Scale 3rd

ed Wood Dale Stoelting Co 201320 Dunn LDunnD Peabody Picture Vocabulary Test 4th ed San Antonio Pearson 200721 Woodcock R Johnston M Woodcock-Johnson Tests of Achievement 4th ed Itasca

Riverside Publishing 201422 McKenzie FJ Drayton A Shields R et al NIH Toolbox Cognitive Battery supple-

mental administratorrsquos manual for intellectual and developmental disabilities [online]Available at nihtoolboxdeskcomcustomerportalarticles2981718 AccessedSeptember 30 2019

23 Thompson T Coleman JM Riley K et al Standardized assessment accommodationsfor individuals with intellectual disability Contemp Sch Psychol 201822443ndash457

24 Sansone SM Schneider A Bickel E Berry-Kravis E Prescott C Hessl D ImprovingIQ measurement in intellectual disabilities using true deviation from populationnorms J Neurodev Disord 2014616

25 Akshoomoff N Beaumont JL Bauer PJ et al VIII NIH Toolbox Cognition Battery(CB) composite scores of crystallized fluid and overall cognition Monogr Soc ResChild Dev 201378119ndash132

26 Zelazo P Bauer P National Institutes of Health Toolbox cognition battery (NIHToolbox CB) validation for children between 3 and 15 years Monogr Soc Res ChildDev 2013781ndash172

27 Hooper SR Hatton D Sideris J et al Executive functions in young males with fragileX syndrome in comparison to mental age-matched controls baseline findings froma longitudinal study Neuropsychology 20082236ndash47

28 Wilding J Cornish K Munir F Further delineation of the executive deficit in maleswith fragile-X syndrome Neuropsychologia 2002401343ndash1349

29 Pennington BF Moon J Edgin J Stedron J Nadel L The neuropsychology of Downsyndrome evidence for hippocampal dysfunction Child Dev 20037475ndash93

30 Vicari S Bellucci S Carlesimo GA Implicit and explicit memory a functional dis-sociation in persons with Down syndrome Neuropsychologia 200038240ndash251

31 Wechsler D Weschler Intelligence Scale for Children 5th ed Bloomington Pearson2014

32 Abbeduto L Murphy MM Cawthon SW et al Receptive language skills of adoles-cents and young adults with down or fragile X syndrome Am J Ment Retard 2003108149ndash160

33 Godfrey M Lee NR Memory profiles in Down syndrome across developmenta review of memory abilities through the lifespan J Neurodev Disord 2018105

34 Dykens EM Hodapp RM Leckman JF Strengths and weaknesses in the intellectualfunctioning of males with fragile X syndrome Am J Ment Defic 198792234ndash236

35 Carlozzi NE Tulsky DS Kail RV Beaumont JL VI NIH Toolbox Cognition Battery(CB) measuring processing speed Monogr Soc Res Child Dev 20137888ndash102

36 Hessl D Nguyen DV Green C et al A solution to limitations of cognitive testing inchildren with intellectual disabilities the case of fragile X syndrome J NeurodevDisord 2009133ndash45

37 Wiebe SA Espy KA Charak D Using confirmatory factor analysis to understandexecutive control in preschool children I latent structure Dev Psychol 200844575ndash587

38 Mungas D Widaman K Zelazo PD et al VII NIH Toolbox Cognition Battery (CB)factor structure for 3 to 15 year olds Monogr Soc Res Child Dev 201378103ndash118

e1240 Neurology | Volume 94 Number 12 | March 24 2020 NeurologyorgN

DOI 101212WNL0000000000009131202094e1229-e1240 Published Online before print February 24 2020Neurology

Rebecca H Shields Aaron J Kaat Forrest J McKenzie et al Validation of the NIH Toolbox Cognitive Battery in intellectual disability

This information is current as of February 24 2020

ServicesUpdated Information amp

httpnneurologyorgcontent9412e1229fullincluding high resolution figures can be found at

References httpnneurologyorgcontent9412e1229fullref-list-1

This article cites 28 articles 2 of which you can access for free at

Citations httpnneurologyorgcontent9412e1229fullotherarticles

This article has been cited by 1 HighWire-hosted articles

Subspecialty Collections

httpnneurologyorgcgicollectionexecutive_functionExecutive function

httpnneurologyorgcgicollectiondevelopmental_disordersDevelopmental disorders

httpnneurologyorgcgicollectionautismAutismfollowing collection(s) This article along with others on similar topics appears in the

Permissions amp Licensing

httpwwwneurologyorgaboutabout_the_journalpermissionsits entirety can be found online atInformation about reproducing this article in parts (figurestables) or in

Reprints

httpnneurologyorgsubscribersadvertiseInformation about ordering reprints can be found online

ISSN 0028-3878 Online ISSN 1526-632XWolters Kluwer Health Inc on behalf of the American Academy of Neurology All rights reserved Print1951 it is now a weekly with 48 issues per year Copyright Copyright copy 2020 The Author(s) Published by

reg is the official journal of the American Academy of Neurology Published continuously sinceNeurology

Page 4: ValidationoftheNIHToolboxCognitiveBattery in intellectual disability · iPad-based battery of brief memory, executive function (EF), processingspeed,andlanguagetests,wasdevelopedwithinthe

feasibility and validity analyses For validity and reliabilityNIHTB-CB raw scores were used computed score on DCCSFlanker and PC theta score on PSM PVT and OR raw scoreonLS and FlankerDEXT and percent correct onDCCSDEXTBecause DEXT scores are currently on a different scale thanstandard Flanker and DCCS scores DEXT results are presentedseparately For test-retest reliability single-score intraclass cor-relations (ICCs) were used The Cohen d was used to evaluatepotential practice effects with paired-sample t tests to measurethe significance of change Convergent discriminant and eco-logic validities were measured with Pearson correlations

Our prior work on IQmeasurement demonstrated the utility ofdeviation-based scoring to deal with problematic floor effects inID24 We used this method in the current study in place ofNIHTB-CB standard scores to circumvent imposed flooredscores (eg the current NIHTB-CB winsorizes age-correctedstandard scores at 54) We created z scores by transformingparticipant raw scores on each NIHTB-CB test using normativemeans and SDs for their chronologic age band The z scoreswere used to create deviation-based composites following thepreviously defined criteria25 For a Crystallized Composite 1 of2 valid scores on PVT and OR was required for a FluidComposite 4 of 5 valid scores on Flanker DCCS LS PC andPSM were required The Crystallized and Fluid z scores wereused to create a Cognitive Function Composite used in eco-logic validity analyses Deviation scores were also created forFSIQ on the SB-524 and were used in FSIQ analyses Groupcomparisons to examine known-groups validity were assessedwith a 2-way mixed-model analysis of variance on NIHTB-CB zscores Significant results were followed up with the Tukeyhonest significant difference tests to examine group differences

Data availabilityOn request to the corresponding author anonymized data areavailable to share

ResultsDescriptive statisticsTable 1 presents descriptive statistics by diagnostic group andoverall Groups did not differ significantly by chronologic age(F2238 = 091 p = 041) or by VABS-3 Adaptive BehaviorComposite (F2227 = 150 p = 023) However FSIQ differedsignificantly by group (F2238 = 316 p lt 0001) with FSIQhigher in OID than in both DS [t(238) = 757 p lt 0001] andFXS [t(238) = 605 p lt 0001] FSIQ did not significantly differbetweenDS and FXS [t(238) = 123 p = 022] Similarly mentalage was significantly different by group (F2238 = 253 p lt 0001)withmental age higher inOID than in bothDS [t(238) = 676 plt 0001] and FXS [t(238) = 540 p lt 0001] Mental age did notdiffer significantly between DS and FXS [t(238) = 110 p =027] Figure 1A shows the distribution of the NIHTB-CBCognitive Function Composite age-adjusted standard scores ofthe sample without the current imposed floor illustrating thevariability of the sample below this floor and the benefit ofdeviation-based composites (used in all analyses)

FeasibilityFeasibility data are provided in table 2 as the percentage ofparticipants with valid scores on each test Feasibility overall wassimilar to the normative 3- to 15-year-old sample26 with DCCSPC and LS having slightly lower feasibility and Flanker PSMPVT and OR having similar or higher feasibility rates than thenormative sample Even down to a mental age of 3 years PSMPVT and OR feasibility was very good The feasibility of theremaining tests improved particularly at 5 years All tests werefeasible for nearly every participant with amental age ofge6 years

Test-retest reliabilityTest-retest reliability was assessed after asymp1 month (mean = 317days SD = 63 days) (table 3) ICCs on each test were moderateto strong with the exception of DCCS DEXT and PSM A-Awhichwere in the high 040s All composites had strong reliabilityThree tests had small but significant visit 1 to visit 2 effect sizesreflecting amodest increase in performance Flanker PC and thePSM A-B group However the PSM A-B increase likely reflectsnonequivalent forms rather than a practice effect because thePSM A-A group had no practice effect The Fluid and CognitiveFunction composites also had small but significant increasessuggestive of small practice effects Within groups reliabilitycoefficients were mostly moderate to strong some DEXT andPSM reliabilities in small sample sizes were not significantFlanker DEXT in OID (ICC = 046 p = 014) and both PSMgroups in DS (A-A ICC = 024 p = 015 A-B ICC = 033 p =005) In FXS DCCS reliability (ICC = 041) was notably lowerthan in the total sample (ICC = 071) but FXS Flanker reliability(ICC= 084) was stronger than in the total sample (ICC= 074)

Construct validityConvergent and discriminant validity results are presented intable 4 Convergent correlations ranged from moderate tostrong with the exception of Flanker DEXT (n = 29 r =minus026 p = 017) Group results generally reflect validitysimilar to that of the overall results Of note PSM was onlyweakly correlated with Leiter-FM in DS (r = 033 p = 001)and in OID (r = 032 p = 001)

In DCCS for a subgroup of participants below a raw score of188 there appeared to be no association with the NEPSY-Inin this subgroup DCCS score was not significantly correlatedwith NEPSY-In score (r = 029 p = 016) This DCCS scorerepresents participants who did not pass the introductoryswitching portion of the test When this subgroup was re-moved validity improved (r = 057 p lt 0001)

Ecologic validityTable 5 provides the ecologic validity of NIHTB-CB tests andcomposites The composites each had moderate to strongcorrelations with FSIQ (figure 1B) as did all NIHTB-CB testscores other than Flanker DEXT VABS-3 Adaptive BehaviorComposite had small but significant correlations with severaltests (Flanker PC PSM PVT and OR) and with the Crys-tallized and Cognitive Function composites with better per-formance associated with higher levels of adaptive behavior

e1232 Neurology | Volume 94 Number 12 | March 24 2020 NeurologyorgN

Syndrome-specific comparisons (known-groups validity)To examine the specificity of the NIHTB-CB to detectsyndrome-specific performance a 2-way mixed-model analysisof variance was conducted on NIHTB-CB test z scores withgroup as a between-participants factor and NIHTB-CB test asa within-participants factor we also examined their interaction(figure 2) IQ was included as a repeated-measures varyingcovariate and an IQ-by-test interaction term was included toallow the effect of IQ to vary by test Because groups differed onFSIQ covarying IQ aimed to clarify whether group results re-flect phenotype-specific impairments or if they simply reflectglobally poorer performance due to overall level of cognitive

functioning To avoid overcontrolling for the domain ofinterest (because NIHTB-CB domains overlap with com-ponents of IQ) verbal IQ was used as the covariate for thefluid NIHTB-CB test outcomes (DCCS Flanker LS PCand PSM) and nonverbal IQ was used as the covariate forcrystallized NIHTB-CB tests (PVT and OR) To obtaineffect sizes for pairwise comparisons the Cohen d was cal-culated from the estimated marginal means from the modelto account for the effects of IQ

There was a main effect of group on NIHTB-CB z scores(F2238 = 490 p = 0008) as well as a main effect of test on zscores (F61067 = 1726 p lt 0001) These were qualified by

Table 1 Participant descriptive information

Total(n = 242)

Down syndrome(n = 91)

Fragile X syndrome(n = 75)

Other ID(n = 76)

Race n

American IndianAlaskaNative

3 1 1 1

Asian 6 2 2 2

Black 24 4 8 12

Native HawaiianPacificIslander

3 1 1 1

White 171 70 60 41

gt1 26 10 2 14

Ethnicity Hispanic orLatino

199 187 80 329

Sex male 593 451 733 632

Primary caregiver 4-ydegree

618 648 627 566

ASD diagnosis (parentreport)

353(8 unknown)

66(0 unknown)

520(6 unknown)

526(1 unknown)

M SD M SD M SD M SD

Chronologic age y 1571 515 1588 517 1616 492 1505 535

Mental age (SB-5) y 520 153 467 120 491 135 612 164

Full-scale IQ (SB-5) 5375 1587 4765 1290 5048 1513 6424 1469

Vineland-3 ABC score 5259 1711 5493 1623 5024 1863 5193 1654

DCCS computed score 381 251 333 224 320 226 468 271

Flanker computed score 492 236 454 223 446 244 571 228

LS raw score 630 441 430 363 602 363 834 475

PC computed score 3307 1593 2592 1374 3310 1408 3984 1674

PSM theta score minus158 090 minus186 076 minus174 087 minus109 088

PVT theta score minus291 280 minus379 273 minus270 257 minus209 283

OR theta score minus584 496 minus662 529 minus621 453 minus460 478

Abbreviations ABC = Adaptive Behavior Composite ASD = autism spectrum disorder DCCS = Dimensional Change Card Sort Flanker = Flanker InhibitoryControl and Attention ID = intellectual disability LS = List Sorting WorkingMemory M =mean OR = Oral Reading and Recognition PC = Pattern ComparisonProcessing Speed PSM = Picture Sequence Memory PVT = Picture Vocabulary SB-5 = Stanford-Binet 5th edition

NeurologyorgN Neurology | Volume 94 Number 12 | March 24 2020 e1233

a significant group times test interaction (F121067 = 368 p lt0001) and a significant IQ times test interaction (F61067 = 672 plt 0001) Follow up Tukey tests showed that FXS performedworse onDCCS thanOID [t(238) = 402 p lt 0001 d = 052]and worse than DS [t(238) = minus304 p = 0007 d = 039] OnFlanker FXS performed worse than OID [t(238) = 345 p =0002 d = 045] and worse than DS [t(238) = minus285 p = 001d = 037] These 2 EF test results supported the hypothesizedEF impairment in FXS although the DS impairment relativeto OID was not supported On PVT FXS performed betterthan DS fitting with expected language strength in FXS[t(238) = minus277 p = 002 d = 036] On PC DS showeda poorer performance than OID which approached

significance [t(238) = 224 p = 007 d = 029] There were nosignificant group differences on LS PSM or OR

DiscussionThis study provides the first comprehensive examination of thepsychometric properties and feasibility of the NIHTB-CB forindividuals with ID with an initial focus on 2 of the mostcommon genetic causes with robust translational researchprograms FXS and DS Overall the NIHTB-CB has demon-strated strong potential for use as an objective standardizedoutcome measure that can be confidently used in ID trials withparticipants with amental age of 5 years or higher Results of thestudy demonstrate very strong psychometrics for the Crystal-lized reasoning tests (PVT and OR) and good to excellentperformance of Fluid reasoning tests withmore variation acrossFXS DS and OID groups for some measures (eg strongreliability for Flanker in FXS compared toDS and vice versa forDCCS) Indeed the Fluid Composite appeared to have moreconsistently strong reliability across conditions and a solidconvergent association with FSIQ Thus it may be a goodcandidate outcome measure for studies seeking to examinebroad nonverbal cognitive changes for individuals with a mentalage of ge5 years Below a mental age of 5 years feasibility wasmore variable across tests indicating the need for furtheradaptations scoring algorithms on developmental extensionsor new tests targeting these lower-functioning individuals

The Supplemental Manual was compiled after hundreds ofadministrations of the NIHTB-CB to individuals with ID andthe study results on feasibility reliability and validity supportits use We encourage researchers and examiners planning touse the NIHTB-CB to follow these guidelines for the IDpopulation

Group comparisons demonstrated that the NIHTB-CB issensitive to substantial EF deficits among individuals with IDin that all 3 groups performed relatively poorly on Flanker andDCCS In particular participants with FXS showed weaknessin inhibitory control and attention and cognitive flexibility inexcess of their general cognitive level and compared to con-trols with other forms of ID This aligns with previous re-search showing that boys with FXS are impaired in inhibitorycontrol set shifting and planning relative to mental agendashmatched controls27 and that boys with FXS have impairmentsin inhibition and attention relative to mental agendashmatchedcontrols and relative to children with DS28 The hypothesizedEF weakness in DS compared to OID (to a lesser extent thanFXS) was not found however there have been some mixedresults on EF in children with DS compared to children withother IDs2930 The low-scoring subgroup on DCCS are thosewho failed introduction to the switching portion of the testfor participants who cannot perform switching at the earliestlevel the test may be less sensitive to variation in EF Re-moving this subgroup from analyses strengthened the con-vergent validity correlation This suggests that these

Figure 1 NIHTB-CB cognitive function composite scoredistribution and association with FSIQ

(A) Histogram showing the true distribution of cognitive function compositescores in the full sample of individuals with intellectual disability (ID) TheNIH Toolbox Cognitive Battery (NIHTB-CB) current floor (winsorized at 54vertical blue line) excludedmore than half of the present sample frommoreaccurate measurement below that level (B) Association between the de-viation-based cognitive function composite and deviation-based full-scaleIQ (FSIQ) by group

e1234 Neurology | Volume 94 Number 12 | March 24 2020 NeurologyorgN

individualsrsquo level or lack of cognitive flexibility may not becaptured by the introductory switching portion of DCCS Anextension of DCCS that does capture this construct in veryyoung or low-functioning persons would have much valueThe lower reliability of DCCS in the FXS group may alsoreflect this limitation in that participants with FXS overallscored very low on this test Because anxiety and hyperactivityare common in FXS it is possible that individual state inter-acts with the task complexity and cognitive flexibility to resultin more variable performance over time in this group Whenparticipants needed Flanker DEXT or DCCS DEXT theywere almost always able to perform the tests however a fewissues remain to be worked out regarding these experimentalversions notably the ability to interpret these scores relativeto the standard Flanker and DCCS scores This study high-lighted other concerns with the DEXT measures (eg diffi-culty may be ordered incorrectly or portions are overlyburdensome) Our results provide clear evidence that DEXTlevels are necessary (and extremely feasible) especially inthose with FXS and DS but that further refinements andmodifications are necessary

To improve feasibility of LS extra instructions and practiceitems were developed and used The LS Age 3ndash6 experimentalversion with these additions is now available Feasibility didimprove after our initial pilot studies however the testremains challenging for this population and some participantspass practice but get no test sequences completely correctThe limited feasibility and floored or low variability in rawscores suggest that a lower range of LS is necessary A po-tential approach may be to give some credit on earlysequences that are partially recalled or items recalled out ofsequence because the construct of working memory builds onmore basic short-term memory (eg SB-5 Working Memoryindex and Wechsler Working Memory indices)1331

Both language tests PVT and OR had excellent performancein our samples of individuals with ID Both demonstratedstrong reliability and clear domain specificity with a muchhigher convergent than discriminant correlation TheNIHTB-CB was sensitive to expected language characteristicsin that DS was impaired relative to FXS on PVT32 These testshave an advantage over other language measures such as thePPVT-4 in that PVT and OR are brief (asymp3 minutes each) butaccurate owing to CAT the ability to obtain results witha brief assessment is especially important in individuals withfrequent behavior or attention issues such as in those with ID

Episodic memory is relevant to clinical trials particularly forDS in which memory impairments are well documented33

and FXS in which memory of sequential information is es-pecially impaired34 It is important to emphasize that despitethese known weaknesses PSM was among the highest of thetest scores in each group (figure 2) suggesting that individualsmay have compensatory strategies for performing well per-haps such as use of the contextual information in the storiesReliability was moderate and lower than that of the normative3- 15-year-old study although improved from our pilot studyThe significant effect size in the PSM A-B group suggestsnonequivalence of Forms A and B Notably in DS ICCs weresmall and nonsignificant suggesting that in its present formPSM appears contraindicated as a separate outcome measurefor this population

The high rate of ceiling scores on PSM 3ndash4 suggests thatmental age may not be a good indicator of start point on PSMin ID at least at this mental age level It is also possible thatthis age version is too easy in comparison to PSM scores onolder versions perhaps a personrsquos score on 1 PSM version isnot equivalent to that personrsquos score on another De-velopment of a CAT PSM test with the full range of version

Table 2 At visit 1 proportion of participants who received a valid scorea

Total n () Down syndrome n () Fragile X syndrome n () Other ID n ()

Flanker 195 (806) 78 (867) 50 (667) 67 (882)

Flanker (with DEXT)b 239 (988) 89 (989) 75 (987) 75 (987)

DCCS 152 (636) 55 (611) 40 (541) 57 (760)

DCCS (with DEXT)b 232 (971) 87 (967) 72 (973) 73 (973)

LS 164 (689) 57 (648) 45 (600) 62 (827)

PC 179 (752) 59 (648) 58 (801) 62 (827)

PSM 220 (921) 85 (944) 65 (867) 70 (933)

PVT 237 (983) 88 (978) 73 (973) 76 (1000)

OR 233 (979) 87 (978) 72 (973) 74 (987)

Abbreviations DCCS = Dimensional Change Card Sort DEXT = Developmental Extension Flanker = Flanker Inhibitory Control and Attention ID = intellectualdisability LS = List Sorting Working Memory OR = Oral Reading and Recognition PC = Pattern Comparison Processing Speed PSM = Picture SequenceMemory PVT = Picture Vocabularya Technical difficulties and administration errors are excludedb On Flanker 274 of the sample needed the DEXT portion On DCCS 531 needed DEXT

NeurologyorgN Neurology | Volume 94 Number 12 | March 24 2020 e1235

difficulties would likely simplify the testing process and yieldmore comparable and reliable scores

PC showed adequate feasibility overall and good feasibility inFXS and OID with sufficient feasibility at a mental age of ge4years While the ICC was excellent there was a small butsignificant practice effect although smaller than that found inthe normative sample35 The most common reason for lack offeasibility on PC was an invalid alternating response patternespecially common in DS The task has an inherent challengeof understanding ldquosamerdquo and ldquodifferentrdquo while mapping thischoice onto smiley face and frown face options For partic-ipants without invalid response patterns PC performs well inID On the basis of the feasibility challenges we developeda new processing speed task Speeded Matching now avail-able as an experimental version in the app The task is to selectthe animal face among 3 foils that matches a target image

Future psychometric studies will provide more informationabout its performance

The Fluid Crystallized and Cognitive Function compositesdemonstrated reliability results similar to those of the nor-mative age 3 to 6 sample with small practice effects in Fluidand Cognitive Function composites (although smaller than inthe normative sample) Each composite was well correlatedwith FSIQ Although not all participants were able to receivea composite (due to missing valid scores on some tests) whencomplete the composites appear to perform well The de-viation method used to create these composites has a clearadvantage over the current age-corrected standard scores onwhich more than half of our sample obtained scores at thelower limit currently set at 54 We are conducting analyses toidentify the best option for composite scores below this floor(deviation approach vs extension of existing age-adjusted

Table 3 Test-retest reliability and examination of practice effects

Total Down syndrome Fragile X syndrome Other ID

No ICC (95 CI)Cohend No ICC (95 CI)

Cohend No ICC (95 CI)

Cohend No ICC (95 CI)

Cohend

Flanker 144 074(065ndash081)

009a 56 063(044ndash076)

020 37 084(070ndash091)

minus011 51 069(051ndash081)

015

FlankerDEXT

36 060(033ndash077)

minus001 8 073(018ndash094)

minus033 23 062(028ndash082)

minus005 5 046(minus037ndash092)

056

DCCS 97 071(060ndash080)

013 32 078(058ndash088)

018 23 041(001ndash069)

024 42 082(068ndash090)

003

DCCS DEXT 73 049(030ndash065)

minus016 32 046(014ndash069)

minus017 27 050(016ndash074)

minus013 14 053(003ndash082)

minus021

LS 113 074(064ndash081)

014 37 075(056ndash086)

024 32 065(040ndash081)

minus011 43 069(049ndash082)

022

PC 133 077(062ndash085)

029b 45 074(048ndash086)

039b 40 071(050ndash084)

021a 48 075(054ndash086)

035c

PSM A-A 60 047(025ndash064)

004 22 024(minus022ndash060)

minus014 15 054(007ndash082)

027 23 040(001ndash069)

011

PSM A-B 60 055(034ndash071)

029a 22 033(minus006ndash065)

022 18 054(014ndash080)

025 20 040(minus003ndash071)

034

PVT 186 085(081ndash089)

003 69 087(080ndash092)

008 57 079(066ndash087)

001 60 086(078ndash092)

minus002

OR 183 096(095ndash097)

002 70 095(092ndash097)

001 56 096(093ndash098)

minus001 57 098(097ndash099)

004

FC 97 083(067ndash091)

033b 28 084(052ndash093)

040b 27 079(056ndash090)

034a 42 077(055ndash088)

033c

CC 191 093(091ndash095)

000 73 092(087ndash095)

002 58 094(090ndash096)

minus003 60 091(085ndash094)

minus000

CFC 97 092(085ndash096)

022b 28 093(079ndash097)

023c 27 091(079ndash096)

027a 42 089(078ndash094)

019a

Abbreviations A-A = Form A at visit 1 and Form A at visit 2 A-B = Form A at visit 1 and Form B at visit 2 CC = Crystallized Composite CFC = Cognitive FunctionComposite CI = confidence interval DCCS = Dimensional Change Card Sort DEXT = Developmental Extension FC = Fluid Composite Flanker = FlankerInhibitory Control and Attention ICC = intraclass correlation ID = intellectual disability LS = List SortingWorkingMemory OR =Oral Reading and RecognitionPC = Pattern Comparison Processing Speed PSM = Picture Sequence Memory PVT = Picture Vocabularya p lt 005b p lt 0001c p lt 001

e1236 Neurology | Volume 94 Number 12 | March 24 2020 NeurologyorgN

Table 4 Convergent and discriminant validity of NIHTB-CB tests (Pearson r)

Total Down syndrome Fragile X syndrome Other ID

No Conv Validity No Disc Validity No Conv Validity No Disc Validity No Conv Validity No Disc Validity No Conv Validity No Disc Validity

Flanker 144 minus052a 193 053a 52 minus056a 78 048a 36 minus044b 48 049a 56 minus044a 67 056a

Flanker DEXT 29 minus026 61 036b 7 004 15 051 15 minus031 32 050b 7 minus023 14 034

DCCSc 109 048a 151 046a 34 055a 55 056a 30 033 39 007 45 036d 58 054a

DCCS DEXT 43 042b 113 037a 16 034 48 045b 11 033 40 043b 16 048 25 011

LS 165 065a 164 049a 57 054a 57 047a 46 050a 45 038d 62 066a 62 052a

PC 171 066a 178 045a 54 060a 59 058a 56 048a 57 042b 61 069a 62 036b

PSM 175 047a 179 050a 69 034b 71 050a 49 064a 51 052a 57 033d 58 033d

PVT 235 083a 232 047a 88 075a 86 054a 71 085a 70 050a 76 088a 76 053a

OR 230 092a 227 058a 86 092a 84 062a 70 089a 69 061a 74 095a 74 059a

FC 151 060a 152 061a 53 049a 53 043b 40 047b 40 057a 58 047a 76 071a

CC 237 075a 237 068a 89 068a 89 064a 72 080a 73 068a 59 055a 75 064a

Abbreviations CC = Crystallized Composite Conv = convergent DCCS = Dimensional Change Card Sort DEXT = Developmental Extension Disc = discriminant FC = Fluid Composite Flanker = Flanker Inhibitory Control andAttention ID = intellectual disability LS = List SortingWorkingMemory NEPSY-In =NEPSY Inhibition subtest NIHTB-CB =NIH Toolbox Cognitive Battery OR =Oral Reading and Recognition PC = Pattern Comparison ProcessingSpeed PSM = Picture Sequence Memory PVT = Picture VocabularyConvergent discriminant measure for each test Flanker Conners Kiddie Continuous Performance Test 2nd Edition (KCPT2) Woodcock Johnson 4th Edition Letter-Word Identification (WJ-LW) Flanker DEXT KCPT2 WJ-LWDCCS NEPSY-In WJ-LW DCCS DEXT NEPSY-In WJ-LW LS Stanford-Binet 5th edition (SB-5) Verbal WorkingMemory WJ-LW PC Wechsler Preschool and Primary Scale of Intelligence 4th Edition Bug Search WJ-LW PSM LeiterInternational Performance Scale 3rd Edition Forward Memory (Leiter-FM) WJ-LW PVT Peabody Picture Vocabulary Test 4th edition Leiter-FM OR WJ-LW Leiter-FM Fluid Composite SB-5 Fluid Reasoning IQ SB-5 Verbal IQCrystallized Composite SB-5 Verbal IQ SB-5 Fluid Reasoning IQa p lt 0001b p lt 001c When the DCCS low subgroup (n = 26) was removed in fragile X syndrome the convergent correlation was significant (n = 21 r = 049 p = 002)d p lt 005

Neurolo

gyorgN

Neurology

|Volum

e94N

umber

12|

March

242020e1237

standard scores) These findings provide further evidence thattest developers should consider and address the range andsensitivity of tests and scores for individuals with moderate tosevere ID2436

This study has some important limitations that warrant con-sideration Construct validity challenges are inherent with theID population in that fully adequate convergent validitymeasures are not always available or they present their ownfeasibility and psychometric limitations The lack of cleardiscriminant validity in most measures is another challengeWhile discriminant correlations are generally desired to bemarkedly lower than the convergent correlations in earlydevelopment domains of cognition (especially EFs) arethought to be unidimensional with increasing differentiationof constructs occurring through early adulthood37 In IDthere is likely even less differentiation between domains thanin typically developing children Our discriminant validityresults are similar to normative 3- to 6-year-old results38

Therefore the Fluid Composite may be a good outcomemeasure choice on the basis of its psychometric performanceand some limitations of subtest construct differentiation inthis population The study was also limited by sample size inevaluations of group-specific results Future work with largersamples should provide more clarity about reliability andvalidity within individual ID subgroups

The NIHTB-CB is a promising outcome measure for IDclinical trials and for many types of nonintervention obser-vational studies Although not originally intended for clinicaluse or for special education purposes ongoing and futureresearch may be done to explore such applications23 such as inthe school psychology setting for accurate and feasible as-sessment of students with IDs In addition the SupplementalManual developed from this study provides key guidance toexaminers and researchers working with ID populations theprocedures on administration test environment fidelity andscoring not only improve examiner familiarity and comfortbut more important extend accessibility of the NIHTB-CBto more cognitively impaired or behaviorally challengedindividuals

Besides evaluating the NIHTB-CB as an appropriate assess-ment for ID in general the present results demonstrate thesensitivity of the battery to known syndrome-specific cogni-tive phenotypes such as the impairment of EFs in FXS relativeto other ID and to DS A critical remaining question is thedegree to which the battery is sensitive to change especially toeffects of intervention As an initial test of sensitivity tochange we are currently collecting longitudinal data from

Table 5 Ecologic validity (Pearson r)

Chronologicage

Mentalage FSIQ

VABS-3ABC

Flanker 025a 058a 050a 018b

FlankerDEXT

025 037c 018 010

DCCS 030a 063a 056a 014

DCCS DEXT 014 066a 049a 017

LS 020b 070a 066a 016b

PC 010 059a 057a 019b

PSM 004 058a 058a 027a

PVT 035a 076a 074a 039a

OR 017c 068a 069a 039a

FC mdash mdash 066a 015

CC mdash mdash 076a 044a

CFC mdash mdash 078a 025c

Abbreviations CC = Crystallized Composite CFC = Cognitive FunctionComposite DCCS = Dimensional Change Card Sort DEXT = DevelopmentalExtension FC = Fluid Composite Flanker = Flanker Inhibitory Control andAttention FSIQ = full-scale IQ LS = List Sorting Working Memory OR = OralReading and Recognition PC = Pattern Comparison Processing Speed PSM= Picture Sequence Memory PVT = Picture Vocabulary VABS-3 ABC =Vineland-3 Adaptive Behavior Compositea p lt 0001b p lt 005c p lt 001

Figure 2 Profile plot of NIH Toolbox Cognitive Battery test zscores by group

The z scores on each test (representing group performance relative to thegeneral population average performance) are shown derived from themixed-model analysis of variance adjusted for IQ A z score of 0 (horizontalline at top) represents the average performance in the general populationnormative sample The z scores lt0 represent the number of SDs below thegeneral population average for the chronologic age band Error bars rep-resent 95 confidence intervals Comparison between fragile X syndrome(FXS) and intellectual disability (ID) of other or unknown cause daggerComparisonbetween FXS and Down syndrome p lt 005 p lt 001 p lt 0001 daggerp lt005 daggerdaggerp lt 001 daggerdaggerdaggerp lt 0001 DCCS = Dimensional Change Card Sort LS =List Sorting Working Memory OR = Oral Reading and Recognition PC =Pattern Comparison Processing Speed PSM = Picture Sequence MemoryPVT = Picture Vocabulary

e1238 Neurology | Volume 94 Number 12 | March 24 2020 NeurologyorgN

study participants to explore natural developmental changeswithin each NIHTB-CB test and the composites relative tomeasures already established as change sensitive such as theSB-5 and Vineland The present results warrant the next stepof evaluating the NIHTB-CB in ID for individuals down toa mental age of 5 years to demonstrate the treatment-specificsensitivity of the battery and to determine the degree to whichmeasure gains reflect functional improvements in daily lifeBelow a mental age of 5 years the NIHTB-CB performs morevariably and adaptations to the lower test ranges or scoringadaptations are needed for some measurement domainsStudies of the performance of the battery in older adults withID are needed especially focusing on those experiencingcognitive decline or dementia Overall the present validationresults represent an important step toward providing an ob-jective scalable and standardized method for successfullymeasuring cognition and tracking cognitive changes in ID

AcknowledgmentThe authors extend their appreciation to the families whogave their time and effort to participate in the study and toLeonard Abbeduto LeAnn Baer Ruth McClure Barnes KyleBersted Mikayla Brown Ana Candelaria Erin CarmodyDarian Crowley Suzanne Delap Randi Hagerman AnneHoffmann Londi Howard Paige Landau Caroline LeonczykMichael Nelson Jacklyn Perales Lacey Pomerantz ShanelleRodriguez Melanie Rothfuss Ryan Shickman AndreaSchneider Haleigh Scott Laurel Snider Rachel Teune TaliaThompson Denny Tran and Jamie Woods for theircontributions to the study

Study fundingFunding provided by the National Institute of Child Healthand Human Development (R01HD076189) Health andHuman Services Administration of Developmental Dis-abilities (90DD0596) the MIND Institute Intellectual andDevelopmental Disabilities Research Center (U54HD079125) and the National Center for Advancing Trans-lational Sciences NIH through grant UL1 TR000002

DisclosureR Shields A Kaat F McKenzie A Drayton S Sansone JColeman and C Michalak report no disclosures K Riley hasreceived compensation for consulting to Novartis regardingFXS clinical trials E Berry-Kravis has received funding (allfunding including consulting goes to Rush University MedicalCenter) from Seaside Therapeutics Novartis Roche AlcobraNeuren Cydan Fulcrum GW Neurotrope Marinus AcadiaZynerba BioMarin Ovid Acadia Yamo Ionis Lumos Ultra-genyx Vtesse Sucampo and Mallinkcrodt Pharmaceuticals toconsult on trial design or development strategies andor toconduct clinical trials in FXS or other neurogenetic disordersand from Asuragen Inc to develop testing standards and to dovalidation testing for FMR1 and SMN testing R Gershon andK Widaman report no disclosures D Hessl has receivedfunding for consulting from Novartis Roche Zynerba

Autifony and Ovid pharmaceutical companies regarding FXSclinical trials Go to NeurologyorgN for full disclosures

Publication historyReceived by Neurology August 14 2019 Accepted in final formOctober 31 2019

References1 Olusanya BO Davis AC Wertlieb D et al Developmental disabilities among children

younger than 5 years in 195 countries and territories 1990-2016 a systematic analysis forthe Global Burden of Disease Study 2016 Lancet Glob Health 20186e1100ndashe1121

2 Lee AW Ventola P Budimirovic D Berry-Kravis E Visootsak J Clinical developmentof targeted fragile X syndrome treatments an industry perspective Brain Sci 20188E214

3 Picker JD Walsh CA New innovations therapeutic opportunities for intellectualdisabilities Ann Neurol 201374382ndash390

4 Budimirovic DB Berry-Kravis E Erickson CA et al Updated report on tools tomeasure outcomes of clinical trials in fragile X syndrome J Neurodev Disord 2017914

5 Esbensen AJ Hooper SR Fidler D et al Outcome measures for clinical trials in Downsyndrome Am J Intellect Dev Disabil 2017122247ndash281

Appendix Authors

Name Location Contribution

Rebecca HShields MS

MIND InstituteSacramento CA

Authored the manuscriptperformed the statisticalanalysis and coordinated thestudy

Aaron J KaatPhD

NorthwesternUniversityChicago IL

Directed NIHTB-CB activities forthe study advised on protocoland analysis and authoredportions of the manuscript

Forrest JMcKenzie BS

MIND InstituteSacramento CA

Authored portions of themanuscript

AndreaDrayton BS

MIND InstituteSacramento CA

Authored portions of themanuscript

Stephanie MSansone PhD

MIND InstituteSacramento CA

Assisted in developing protocoland coordinated the study

JeanineColeman PhD

University ofDenver CO

Coordinated assessments at theUniversity of Denver and criticallyreviewed the manuscript

ClaireMichalak BS

Rush UniversityMedical CenterChicago IL

Coordinated assessments andadministered NIHTB-CB at RushUniversity

Richard CGershon PhD

NorthwesternUniversityChicago IL

PI and director of the NIHToolbox and critically reviewedthe manuscript

Karen RileyPhD

University ofDenver CO

PI at the University of Denver andcritically reviewed themanuscript

ElizabethBerry-KravisMD PhD

Rush UniversityMedical CenterChicago IL

PI at Rush University and criticallyreviewed the manuscript

Keith FWidamanPhD

University ofCaliforniaRiverside

Supervised analysis planperformed the statisticalanalysis and critically reviewedthe manuscript

David HesslPhD

MIND InstituteSacramento CA

Designed the study obtainedfunding directed the multisitestudy and authored portions ofthe manuscript

NeurologyorgN Neurology | Volume 94 Number 12 | March 24 2020 e1239

6 GershonRCWagsterMVHendrieHC FoxNA CookKFNowinski CJ NIHToolboxfor assessment of neurological and behavioral function Neurology 201380S2ndashS6

7 Weintraub S Bauer PJ Zelazo PD et al I NIH toolbox cognition battery (CB)introduction and pediatric data Monogr Soc Res Child Dev 2013781ndash15

8 Casaletto KB Umlauf A Beaumont J et al Demographically corrected normativestandards for the English version of the NIH Toolbox Cognition Battery J IntNeuropsychol Soc 201521378ndash391

9 Hessl D Sansone SM Berry-Kravis E et al The NIH Toolbox Cognitive Battery forintellectual disabilities three preliminary studies and future directions J NeurodevDisord 2016835

10 Berry-Kravis E Lindemann L Jonch AE et al Drug development for neuro-developmental disorders lessons learned from fragile X syndrome Nat Rev DrugDiscov 201817280ndash299

11 Parker SE Mai CT Canfield MA et al Updated national birth prevalence estimatesfor selected birth defects in the United States 2004-2006 Birth Defects Res A ClinMol Teratol 2010881008ndash1016

12 Hunter J Rivero-Arias O Angelov A Kim E Fotheringham I Leal J Epidemiology offragile X syndrome a systematic review and meta-analysis Am J Med Genet A 2014164A1648ndash1658

13 Roid GMiller L Stanford-Binet Intelligence Scales 5th ed San Antonio Pearson 200314 Sparrow S Cicchetti D Saulnier C Vineland Adaptive Behavior Scales 3rd ed San

Antonio Pearson 201615 Weintraub S Dikmen SS Heaton RK et al Cognition assessment using the NIH

Toolbox Neurology 201380S54ndashS6416 Korkman M Kirk U Kemp S NEPSY-II A Developmental Neuropsychological

Assessment San Antonio Pearson 200717 Conners C Conners Kiddie Continuous Performance Test 2nd ed Toronto Multi-

Health Systems Inc 201518 Wechsler D Wechsler Preschool and Primary Scale of Intelligence 4th ed San

Antonio Pearson 201219 Roid G Miller L Pomplum M Koch C Leiter International Performance Scale 3rd

ed Wood Dale Stoelting Co 201320 Dunn LDunnD Peabody Picture Vocabulary Test 4th ed San Antonio Pearson 200721 Woodcock R Johnston M Woodcock-Johnson Tests of Achievement 4th ed Itasca

Riverside Publishing 201422 McKenzie FJ Drayton A Shields R et al NIH Toolbox Cognitive Battery supple-

mental administratorrsquos manual for intellectual and developmental disabilities [online]Available at nihtoolboxdeskcomcustomerportalarticles2981718 AccessedSeptember 30 2019

23 Thompson T Coleman JM Riley K et al Standardized assessment accommodationsfor individuals with intellectual disability Contemp Sch Psychol 201822443ndash457

24 Sansone SM Schneider A Bickel E Berry-Kravis E Prescott C Hessl D ImprovingIQ measurement in intellectual disabilities using true deviation from populationnorms J Neurodev Disord 2014616

25 Akshoomoff N Beaumont JL Bauer PJ et al VIII NIH Toolbox Cognition Battery(CB) composite scores of crystallized fluid and overall cognition Monogr Soc ResChild Dev 201378119ndash132

26 Zelazo P Bauer P National Institutes of Health Toolbox cognition battery (NIHToolbox CB) validation for children between 3 and 15 years Monogr Soc Res ChildDev 2013781ndash172

27 Hooper SR Hatton D Sideris J et al Executive functions in young males with fragileX syndrome in comparison to mental age-matched controls baseline findings froma longitudinal study Neuropsychology 20082236ndash47

28 Wilding J Cornish K Munir F Further delineation of the executive deficit in maleswith fragile-X syndrome Neuropsychologia 2002401343ndash1349

29 Pennington BF Moon J Edgin J Stedron J Nadel L The neuropsychology of Downsyndrome evidence for hippocampal dysfunction Child Dev 20037475ndash93

30 Vicari S Bellucci S Carlesimo GA Implicit and explicit memory a functional dis-sociation in persons with Down syndrome Neuropsychologia 200038240ndash251

31 Wechsler D Weschler Intelligence Scale for Children 5th ed Bloomington Pearson2014

32 Abbeduto L Murphy MM Cawthon SW et al Receptive language skills of adoles-cents and young adults with down or fragile X syndrome Am J Ment Retard 2003108149ndash160

33 Godfrey M Lee NR Memory profiles in Down syndrome across developmenta review of memory abilities through the lifespan J Neurodev Disord 2018105

34 Dykens EM Hodapp RM Leckman JF Strengths and weaknesses in the intellectualfunctioning of males with fragile X syndrome Am J Ment Defic 198792234ndash236

35 Carlozzi NE Tulsky DS Kail RV Beaumont JL VI NIH Toolbox Cognition Battery(CB) measuring processing speed Monogr Soc Res Child Dev 20137888ndash102

36 Hessl D Nguyen DV Green C et al A solution to limitations of cognitive testing inchildren with intellectual disabilities the case of fragile X syndrome J NeurodevDisord 2009133ndash45

37 Wiebe SA Espy KA Charak D Using confirmatory factor analysis to understandexecutive control in preschool children I latent structure Dev Psychol 200844575ndash587

38 Mungas D Widaman K Zelazo PD et al VII NIH Toolbox Cognition Battery (CB)factor structure for 3 to 15 year olds Monogr Soc Res Child Dev 201378103ndash118

e1240 Neurology | Volume 94 Number 12 | March 24 2020 NeurologyorgN

DOI 101212WNL0000000000009131202094e1229-e1240 Published Online before print February 24 2020Neurology

Rebecca H Shields Aaron J Kaat Forrest J McKenzie et al Validation of the NIH Toolbox Cognitive Battery in intellectual disability

This information is current as of February 24 2020

ServicesUpdated Information amp

httpnneurologyorgcontent9412e1229fullincluding high resolution figures can be found at

References httpnneurologyorgcontent9412e1229fullref-list-1

This article cites 28 articles 2 of which you can access for free at

Citations httpnneurologyorgcontent9412e1229fullotherarticles

This article has been cited by 1 HighWire-hosted articles

Subspecialty Collections

httpnneurologyorgcgicollectionexecutive_functionExecutive function

httpnneurologyorgcgicollectiondevelopmental_disordersDevelopmental disorders

httpnneurologyorgcgicollectionautismAutismfollowing collection(s) This article along with others on similar topics appears in the

Permissions amp Licensing

httpwwwneurologyorgaboutabout_the_journalpermissionsits entirety can be found online atInformation about reproducing this article in parts (figurestables) or in

Reprints

httpnneurologyorgsubscribersadvertiseInformation about ordering reprints can be found online

ISSN 0028-3878 Online ISSN 1526-632XWolters Kluwer Health Inc on behalf of the American Academy of Neurology All rights reserved Print1951 it is now a weekly with 48 issues per year Copyright Copyright copy 2020 The Author(s) Published by

reg is the official journal of the American Academy of Neurology Published continuously sinceNeurology

Page 5: ValidationoftheNIHToolboxCognitiveBattery in intellectual disability · iPad-based battery of brief memory, executive function (EF), processingspeed,andlanguagetests,wasdevelopedwithinthe

Syndrome-specific comparisons (known-groups validity)To examine the specificity of the NIHTB-CB to detectsyndrome-specific performance a 2-way mixed-model analysisof variance was conducted on NIHTB-CB test z scores withgroup as a between-participants factor and NIHTB-CB test asa within-participants factor we also examined their interaction(figure 2) IQ was included as a repeated-measures varyingcovariate and an IQ-by-test interaction term was included toallow the effect of IQ to vary by test Because groups differed onFSIQ covarying IQ aimed to clarify whether group results re-flect phenotype-specific impairments or if they simply reflectglobally poorer performance due to overall level of cognitive

functioning To avoid overcontrolling for the domain ofinterest (because NIHTB-CB domains overlap with com-ponents of IQ) verbal IQ was used as the covariate for thefluid NIHTB-CB test outcomes (DCCS Flanker LS PCand PSM) and nonverbal IQ was used as the covariate forcrystallized NIHTB-CB tests (PVT and OR) To obtaineffect sizes for pairwise comparisons the Cohen d was cal-culated from the estimated marginal means from the modelto account for the effects of IQ

There was a main effect of group on NIHTB-CB z scores(F2238 = 490 p = 0008) as well as a main effect of test on zscores (F61067 = 1726 p lt 0001) These were qualified by

Table 1 Participant descriptive information

Total(n = 242)

Down syndrome(n = 91)

Fragile X syndrome(n = 75)

Other ID(n = 76)

Race n

American IndianAlaskaNative

3 1 1 1

Asian 6 2 2 2

Black 24 4 8 12

Native HawaiianPacificIslander

3 1 1 1

White 171 70 60 41

gt1 26 10 2 14

Ethnicity Hispanic orLatino

199 187 80 329

Sex male 593 451 733 632

Primary caregiver 4-ydegree

618 648 627 566

ASD diagnosis (parentreport)

353(8 unknown)

66(0 unknown)

520(6 unknown)

526(1 unknown)

M SD M SD M SD M SD

Chronologic age y 1571 515 1588 517 1616 492 1505 535

Mental age (SB-5) y 520 153 467 120 491 135 612 164

Full-scale IQ (SB-5) 5375 1587 4765 1290 5048 1513 6424 1469

Vineland-3 ABC score 5259 1711 5493 1623 5024 1863 5193 1654

DCCS computed score 381 251 333 224 320 226 468 271

Flanker computed score 492 236 454 223 446 244 571 228

LS raw score 630 441 430 363 602 363 834 475

PC computed score 3307 1593 2592 1374 3310 1408 3984 1674

PSM theta score minus158 090 minus186 076 minus174 087 minus109 088

PVT theta score minus291 280 minus379 273 minus270 257 minus209 283

OR theta score minus584 496 minus662 529 minus621 453 minus460 478

Abbreviations ABC = Adaptive Behavior Composite ASD = autism spectrum disorder DCCS = Dimensional Change Card Sort Flanker = Flanker InhibitoryControl and Attention ID = intellectual disability LS = List Sorting WorkingMemory M =mean OR = Oral Reading and Recognition PC = Pattern ComparisonProcessing Speed PSM = Picture Sequence Memory PVT = Picture Vocabulary SB-5 = Stanford-Binet 5th edition

NeurologyorgN Neurology | Volume 94 Number 12 | March 24 2020 e1233

a significant group times test interaction (F121067 = 368 p lt0001) and a significant IQ times test interaction (F61067 = 672 plt 0001) Follow up Tukey tests showed that FXS performedworse onDCCS thanOID [t(238) = 402 p lt 0001 d = 052]and worse than DS [t(238) = minus304 p = 0007 d = 039] OnFlanker FXS performed worse than OID [t(238) = 345 p =0002 d = 045] and worse than DS [t(238) = minus285 p = 001d = 037] These 2 EF test results supported the hypothesizedEF impairment in FXS although the DS impairment relativeto OID was not supported On PVT FXS performed betterthan DS fitting with expected language strength in FXS[t(238) = minus277 p = 002 d = 036] On PC DS showeda poorer performance than OID which approached

significance [t(238) = 224 p = 007 d = 029] There were nosignificant group differences on LS PSM or OR

DiscussionThis study provides the first comprehensive examination of thepsychometric properties and feasibility of the NIHTB-CB forindividuals with ID with an initial focus on 2 of the mostcommon genetic causes with robust translational researchprograms FXS and DS Overall the NIHTB-CB has demon-strated strong potential for use as an objective standardizedoutcome measure that can be confidently used in ID trials withparticipants with amental age of 5 years or higher Results of thestudy demonstrate very strong psychometrics for the Crystal-lized reasoning tests (PVT and OR) and good to excellentperformance of Fluid reasoning tests withmore variation acrossFXS DS and OID groups for some measures (eg strongreliability for Flanker in FXS compared toDS and vice versa forDCCS) Indeed the Fluid Composite appeared to have moreconsistently strong reliability across conditions and a solidconvergent association with FSIQ Thus it may be a goodcandidate outcome measure for studies seeking to examinebroad nonverbal cognitive changes for individuals with a mentalage of ge5 years Below a mental age of 5 years feasibility wasmore variable across tests indicating the need for furtheradaptations scoring algorithms on developmental extensionsor new tests targeting these lower-functioning individuals

The Supplemental Manual was compiled after hundreds ofadministrations of the NIHTB-CB to individuals with ID andthe study results on feasibility reliability and validity supportits use We encourage researchers and examiners planning touse the NIHTB-CB to follow these guidelines for the IDpopulation

Group comparisons demonstrated that the NIHTB-CB issensitive to substantial EF deficits among individuals with IDin that all 3 groups performed relatively poorly on Flanker andDCCS In particular participants with FXS showed weaknessin inhibitory control and attention and cognitive flexibility inexcess of their general cognitive level and compared to con-trols with other forms of ID This aligns with previous re-search showing that boys with FXS are impaired in inhibitorycontrol set shifting and planning relative to mental agendashmatched controls27 and that boys with FXS have impairmentsin inhibition and attention relative to mental agendashmatchedcontrols and relative to children with DS28 The hypothesizedEF weakness in DS compared to OID (to a lesser extent thanFXS) was not found however there have been some mixedresults on EF in children with DS compared to children withother IDs2930 The low-scoring subgroup on DCCS are thosewho failed introduction to the switching portion of the testfor participants who cannot perform switching at the earliestlevel the test may be less sensitive to variation in EF Re-moving this subgroup from analyses strengthened the con-vergent validity correlation This suggests that these

Figure 1 NIHTB-CB cognitive function composite scoredistribution and association with FSIQ

(A) Histogram showing the true distribution of cognitive function compositescores in the full sample of individuals with intellectual disability (ID) TheNIH Toolbox Cognitive Battery (NIHTB-CB) current floor (winsorized at 54vertical blue line) excludedmore than half of the present sample frommoreaccurate measurement below that level (B) Association between the de-viation-based cognitive function composite and deviation-based full-scaleIQ (FSIQ) by group

e1234 Neurology | Volume 94 Number 12 | March 24 2020 NeurologyorgN

individualsrsquo level or lack of cognitive flexibility may not becaptured by the introductory switching portion of DCCS Anextension of DCCS that does capture this construct in veryyoung or low-functioning persons would have much valueThe lower reliability of DCCS in the FXS group may alsoreflect this limitation in that participants with FXS overallscored very low on this test Because anxiety and hyperactivityare common in FXS it is possible that individual state inter-acts with the task complexity and cognitive flexibility to resultin more variable performance over time in this group Whenparticipants needed Flanker DEXT or DCCS DEXT theywere almost always able to perform the tests however a fewissues remain to be worked out regarding these experimentalversions notably the ability to interpret these scores relativeto the standard Flanker and DCCS scores This study high-lighted other concerns with the DEXT measures (eg diffi-culty may be ordered incorrectly or portions are overlyburdensome) Our results provide clear evidence that DEXTlevels are necessary (and extremely feasible) especially inthose with FXS and DS but that further refinements andmodifications are necessary

To improve feasibility of LS extra instructions and practiceitems were developed and used The LS Age 3ndash6 experimentalversion with these additions is now available Feasibility didimprove after our initial pilot studies however the testremains challenging for this population and some participantspass practice but get no test sequences completely correctThe limited feasibility and floored or low variability in rawscores suggest that a lower range of LS is necessary A po-tential approach may be to give some credit on earlysequences that are partially recalled or items recalled out ofsequence because the construct of working memory builds onmore basic short-term memory (eg SB-5 Working Memoryindex and Wechsler Working Memory indices)1331

Both language tests PVT and OR had excellent performancein our samples of individuals with ID Both demonstratedstrong reliability and clear domain specificity with a muchhigher convergent than discriminant correlation TheNIHTB-CB was sensitive to expected language characteristicsin that DS was impaired relative to FXS on PVT32 These testshave an advantage over other language measures such as thePPVT-4 in that PVT and OR are brief (asymp3 minutes each) butaccurate owing to CAT the ability to obtain results witha brief assessment is especially important in individuals withfrequent behavior or attention issues such as in those with ID

Episodic memory is relevant to clinical trials particularly forDS in which memory impairments are well documented33

and FXS in which memory of sequential information is es-pecially impaired34 It is important to emphasize that despitethese known weaknesses PSM was among the highest of thetest scores in each group (figure 2) suggesting that individualsmay have compensatory strategies for performing well per-haps such as use of the contextual information in the storiesReliability was moderate and lower than that of the normative3- 15-year-old study although improved from our pilot studyThe significant effect size in the PSM A-B group suggestsnonequivalence of Forms A and B Notably in DS ICCs weresmall and nonsignificant suggesting that in its present formPSM appears contraindicated as a separate outcome measurefor this population

The high rate of ceiling scores on PSM 3ndash4 suggests thatmental age may not be a good indicator of start point on PSMin ID at least at this mental age level It is also possible thatthis age version is too easy in comparison to PSM scores onolder versions perhaps a personrsquos score on 1 PSM version isnot equivalent to that personrsquos score on another De-velopment of a CAT PSM test with the full range of version

Table 2 At visit 1 proportion of participants who received a valid scorea

Total n () Down syndrome n () Fragile X syndrome n () Other ID n ()

Flanker 195 (806) 78 (867) 50 (667) 67 (882)

Flanker (with DEXT)b 239 (988) 89 (989) 75 (987) 75 (987)

DCCS 152 (636) 55 (611) 40 (541) 57 (760)

DCCS (with DEXT)b 232 (971) 87 (967) 72 (973) 73 (973)

LS 164 (689) 57 (648) 45 (600) 62 (827)

PC 179 (752) 59 (648) 58 (801) 62 (827)

PSM 220 (921) 85 (944) 65 (867) 70 (933)

PVT 237 (983) 88 (978) 73 (973) 76 (1000)

OR 233 (979) 87 (978) 72 (973) 74 (987)

Abbreviations DCCS = Dimensional Change Card Sort DEXT = Developmental Extension Flanker = Flanker Inhibitory Control and Attention ID = intellectualdisability LS = List Sorting Working Memory OR = Oral Reading and Recognition PC = Pattern Comparison Processing Speed PSM = Picture SequenceMemory PVT = Picture Vocabularya Technical difficulties and administration errors are excludedb On Flanker 274 of the sample needed the DEXT portion On DCCS 531 needed DEXT

NeurologyorgN Neurology | Volume 94 Number 12 | March 24 2020 e1235

difficulties would likely simplify the testing process and yieldmore comparable and reliable scores

PC showed adequate feasibility overall and good feasibility inFXS and OID with sufficient feasibility at a mental age of ge4years While the ICC was excellent there was a small butsignificant practice effect although smaller than that found inthe normative sample35 The most common reason for lack offeasibility on PC was an invalid alternating response patternespecially common in DS The task has an inherent challengeof understanding ldquosamerdquo and ldquodifferentrdquo while mapping thischoice onto smiley face and frown face options For partic-ipants without invalid response patterns PC performs well inID On the basis of the feasibility challenges we developeda new processing speed task Speeded Matching now avail-able as an experimental version in the app The task is to selectthe animal face among 3 foils that matches a target image

Future psychometric studies will provide more informationabout its performance

The Fluid Crystallized and Cognitive Function compositesdemonstrated reliability results similar to those of the nor-mative age 3 to 6 sample with small practice effects in Fluidand Cognitive Function composites (although smaller than inthe normative sample) Each composite was well correlatedwith FSIQ Although not all participants were able to receivea composite (due to missing valid scores on some tests) whencomplete the composites appear to perform well The de-viation method used to create these composites has a clearadvantage over the current age-corrected standard scores onwhich more than half of our sample obtained scores at thelower limit currently set at 54 We are conducting analyses toidentify the best option for composite scores below this floor(deviation approach vs extension of existing age-adjusted

Table 3 Test-retest reliability and examination of practice effects

Total Down syndrome Fragile X syndrome Other ID

No ICC (95 CI)Cohend No ICC (95 CI)

Cohend No ICC (95 CI)

Cohend No ICC (95 CI)

Cohend

Flanker 144 074(065ndash081)

009a 56 063(044ndash076)

020 37 084(070ndash091)

minus011 51 069(051ndash081)

015

FlankerDEXT

36 060(033ndash077)

minus001 8 073(018ndash094)

minus033 23 062(028ndash082)

minus005 5 046(minus037ndash092)

056

DCCS 97 071(060ndash080)

013 32 078(058ndash088)

018 23 041(001ndash069)

024 42 082(068ndash090)

003

DCCS DEXT 73 049(030ndash065)

minus016 32 046(014ndash069)

minus017 27 050(016ndash074)

minus013 14 053(003ndash082)

minus021

LS 113 074(064ndash081)

014 37 075(056ndash086)

024 32 065(040ndash081)

minus011 43 069(049ndash082)

022

PC 133 077(062ndash085)

029b 45 074(048ndash086)

039b 40 071(050ndash084)

021a 48 075(054ndash086)

035c

PSM A-A 60 047(025ndash064)

004 22 024(minus022ndash060)

minus014 15 054(007ndash082)

027 23 040(001ndash069)

011

PSM A-B 60 055(034ndash071)

029a 22 033(minus006ndash065)

022 18 054(014ndash080)

025 20 040(minus003ndash071)

034

PVT 186 085(081ndash089)

003 69 087(080ndash092)

008 57 079(066ndash087)

001 60 086(078ndash092)

minus002

OR 183 096(095ndash097)

002 70 095(092ndash097)

001 56 096(093ndash098)

minus001 57 098(097ndash099)

004

FC 97 083(067ndash091)

033b 28 084(052ndash093)

040b 27 079(056ndash090)

034a 42 077(055ndash088)

033c

CC 191 093(091ndash095)

000 73 092(087ndash095)

002 58 094(090ndash096)

minus003 60 091(085ndash094)

minus000

CFC 97 092(085ndash096)

022b 28 093(079ndash097)

023c 27 091(079ndash096)

027a 42 089(078ndash094)

019a

Abbreviations A-A = Form A at visit 1 and Form A at visit 2 A-B = Form A at visit 1 and Form B at visit 2 CC = Crystallized Composite CFC = Cognitive FunctionComposite CI = confidence interval DCCS = Dimensional Change Card Sort DEXT = Developmental Extension FC = Fluid Composite Flanker = FlankerInhibitory Control and Attention ICC = intraclass correlation ID = intellectual disability LS = List SortingWorkingMemory OR =Oral Reading and RecognitionPC = Pattern Comparison Processing Speed PSM = Picture Sequence Memory PVT = Picture Vocabularya p lt 005b p lt 0001c p lt 001

e1236 Neurology | Volume 94 Number 12 | March 24 2020 NeurologyorgN

Table 4 Convergent and discriminant validity of NIHTB-CB tests (Pearson r)

Total Down syndrome Fragile X syndrome Other ID

No Conv Validity No Disc Validity No Conv Validity No Disc Validity No Conv Validity No Disc Validity No Conv Validity No Disc Validity

Flanker 144 minus052a 193 053a 52 minus056a 78 048a 36 minus044b 48 049a 56 minus044a 67 056a

Flanker DEXT 29 minus026 61 036b 7 004 15 051 15 minus031 32 050b 7 minus023 14 034

DCCSc 109 048a 151 046a 34 055a 55 056a 30 033 39 007 45 036d 58 054a

DCCS DEXT 43 042b 113 037a 16 034 48 045b 11 033 40 043b 16 048 25 011

LS 165 065a 164 049a 57 054a 57 047a 46 050a 45 038d 62 066a 62 052a

PC 171 066a 178 045a 54 060a 59 058a 56 048a 57 042b 61 069a 62 036b

PSM 175 047a 179 050a 69 034b 71 050a 49 064a 51 052a 57 033d 58 033d

PVT 235 083a 232 047a 88 075a 86 054a 71 085a 70 050a 76 088a 76 053a

OR 230 092a 227 058a 86 092a 84 062a 70 089a 69 061a 74 095a 74 059a

FC 151 060a 152 061a 53 049a 53 043b 40 047b 40 057a 58 047a 76 071a

CC 237 075a 237 068a 89 068a 89 064a 72 080a 73 068a 59 055a 75 064a

Abbreviations CC = Crystallized Composite Conv = convergent DCCS = Dimensional Change Card Sort DEXT = Developmental Extension Disc = discriminant FC = Fluid Composite Flanker = Flanker Inhibitory Control andAttention ID = intellectual disability LS = List SortingWorkingMemory NEPSY-In =NEPSY Inhibition subtest NIHTB-CB =NIH Toolbox Cognitive Battery OR =Oral Reading and Recognition PC = Pattern Comparison ProcessingSpeed PSM = Picture Sequence Memory PVT = Picture VocabularyConvergent discriminant measure for each test Flanker Conners Kiddie Continuous Performance Test 2nd Edition (KCPT2) Woodcock Johnson 4th Edition Letter-Word Identification (WJ-LW) Flanker DEXT KCPT2 WJ-LWDCCS NEPSY-In WJ-LW DCCS DEXT NEPSY-In WJ-LW LS Stanford-Binet 5th edition (SB-5) Verbal WorkingMemory WJ-LW PC Wechsler Preschool and Primary Scale of Intelligence 4th Edition Bug Search WJ-LW PSM LeiterInternational Performance Scale 3rd Edition Forward Memory (Leiter-FM) WJ-LW PVT Peabody Picture Vocabulary Test 4th edition Leiter-FM OR WJ-LW Leiter-FM Fluid Composite SB-5 Fluid Reasoning IQ SB-5 Verbal IQCrystallized Composite SB-5 Verbal IQ SB-5 Fluid Reasoning IQa p lt 0001b p lt 001c When the DCCS low subgroup (n = 26) was removed in fragile X syndrome the convergent correlation was significant (n = 21 r = 049 p = 002)d p lt 005

Neurolo

gyorgN

Neurology

|Volum

e94N

umber

12|

March

242020e1237

standard scores) These findings provide further evidence thattest developers should consider and address the range andsensitivity of tests and scores for individuals with moderate tosevere ID2436

This study has some important limitations that warrant con-sideration Construct validity challenges are inherent with theID population in that fully adequate convergent validitymeasures are not always available or they present their ownfeasibility and psychometric limitations The lack of cleardiscriminant validity in most measures is another challengeWhile discriminant correlations are generally desired to bemarkedly lower than the convergent correlations in earlydevelopment domains of cognition (especially EFs) arethought to be unidimensional with increasing differentiationof constructs occurring through early adulthood37 In IDthere is likely even less differentiation between domains thanin typically developing children Our discriminant validityresults are similar to normative 3- to 6-year-old results38

Therefore the Fluid Composite may be a good outcomemeasure choice on the basis of its psychometric performanceand some limitations of subtest construct differentiation inthis population The study was also limited by sample size inevaluations of group-specific results Future work with largersamples should provide more clarity about reliability andvalidity within individual ID subgroups

The NIHTB-CB is a promising outcome measure for IDclinical trials and for many types of nonintervention obser-vational studies Although not originally intended for clinicaluse or for special education purposes ongoing and futureresearch may be done to explore such applications23 such as inthe school psychology setting for accurate and feasible as-sessment of students with IDs In addition the SupplementalManual developed from this study provides key guidance toexaminers and researchers working with ID populations theprocedures on administration test environment fidelity andscoring not only improve examiner familiarity and comfortbut more important extend accessibility of the NIHTB-CBto more cognitively impaired or behaviorally challengedindividuals

Besides evaluating the NIHTB-CB as an appropriate assess-ment for ID in general the present results demonstrate thesensitivity of the battery to known syndrome-specific cogni-tive phenotypes such as the impairment of EFs in FXS relativeto other ID and to DS A critical remaining question is thedegree to which the battery is sensitive to change especially toeffects of intervention As an initial test of sensitivity tochange we are currently collecting longitudinal data from

Table 5 Ecologic validity (Pearson r)

Chronologicage

Mentalage FSIQ

VABS-3ABC

Flanker 025a 058a 050a 018b

FlankerDEXT

025 037c 018 010

DCCS 030a 063a 056a 014

DCCS DEXT 014 066a 049a 017

LS 020b 070a 066a 016b

PC 010 059a 057a 019b

PSM 004 058a 058a 027a

PVT 035a 076a 074a 039a

OR 017c 068a 069a 039a

FC mdash mdash 066a 015

CC mdash mdash 076a 044a

CFC mdash mdash 078a 025c

Abbreviations CC = Crystallized Composite CFC = Cognitive FunctionComposite DCCS = Dimensional Change Card Sort DEXT = DevelopmentalExtension FC = Fluid Composite Flanker = Flanker Inhibitory Control andAttention FSIQ = full-scale IQ LS = List Sorting Working Memory OR = OralReading and Recognition PC = Pattern Comparison Processing Speed PSM= Picture Sequence Memory PVT = Picture Vocabulary VABS-3 ABC =Vineland-3 Adaptive Behavior Compositea p lt 0001b p lt 005c p lt 001

Figure 2 Profile plot of NIH Toolbox Cognitive Battery test zscores by group

The z scores on each test (representing group performance relative to thegeneral population average performance) are shown derived from themixed-model analysis of variance adjusted for IQ A z score of 0 (horizontalline at top) represents the average performance in the general populationnormative sample The z scores lt0 represent the number of SDs below thegeneral population average for the chronologic age band Error bars rep-resent 95 confidence intervals Comparison between fragile X syndrome(FXS) and intellectual disability (ID) of other or unknown cause daggerComparisonbetween FXS and Down syndrome p lt 005 p lt 001 p lt 0001 daggerp lt005 daggerdaggerp lt 001 daggerdaggerdaggerp lt 0001 DCCS = Dimensional Change Card Sort LS =List Sorting Working Memory OR = Oral Reading and Recognition PC =Pattern Comparison Processing Speed PSM = Picture Sequence MemoryPVT = Picture Vocabulary

e1238 Neurology | Volume 94 Number 12 | March 24 2020 NeurologyorgN

study participants to explore natural developmental changeswithin each NIHTB-CB test and the composites relative tomeasures already established as change sensitive such as theSB-5 and Vineland The present results warrant the next stepof evaluating the NIHTB-CB in ID for individuals down toa mental age of 5 years to demonstrate the treatment-specificsensitivity of the battery and to determine the degree to whichmeasure gains reflect functional improvements in daily lifeBelow a mental age of 5 years the NIHTB-CB performs morevariably and adaptations to the lower test ranges or scoringadaptations are needed for some measurement domainsStudies of the performance of the battery in older adults withID are needed especially focusing on those experiencingcognitive decline or dementia Overall the present validationresults represent an important step toward providing an ob-jective scalable and standardized method for successfullymeasuring cognition and tracking cognitive changes in ID

AcknowledgmentThe authors extend their appreciation to the families whogave their time and effort to participate in the study and toLeonard Abbeduto LeAnn Baer Ruth McClure Barnes KyleBersted Mikayla Brown Ana Candelaria Erin CarmodyDarian Crowley Suzanne Delap Randi Hagerman AnneHoffmann Londi Howard Paige Landau Caroline LeonczykMichael Nelson Jacklyn Perales Lacey Pomerantz ShanelleRodriguez Melanie Rothfuss Ryan Shickman AndreaSchneider Haleigh Scott Laurel Snider Rachel Teune TaliaThompson Denny Tran and Jamie Woods for theircontributions to the study

Study fundingFunding provided by the National Institute of Child Healthand Human Development (R01HD076189) Health andHuman Services Administration of Developmental Dis-abilities (90DD0596) the MIND Institute Intellectual andDevelopmental Disabilities Research Center (U54HD079125) and the National Center for Advancing Trans-lational Sciences NIH through grant UL1 TR000002

DisclosureR Shields A Kaat F McKenzie A Drayton S Sansone JColeman and C Michalak report no disclosures K Riley hasreceived compensation for consulting to Novartis regardingFXS clinical trials E Berry-Kravis has received funding (allfunding including consulting goes to Rush University MedicalCenter) from Seaside Therapeutics Novartis Roche AlcobraNeuren Cydan Fulcrum GW Neurotrope Marinus AcadiaZynerba BioMarin Ovid Acadia Yamo Ionis Lumos Ultra-genyx Vtesse Sucampo and Mallinkcrodt Pharmaceuticals toconsult on trial design or development strategies andor toconduct clinical trials in FXS or other neurogenetic disordersand from Asuragen Inc to develop testing standards and to dovalidation testing for FMR1 and SMN testing R Gershon andK Widaman report no disclosures D Hessl has receivedfunding for consulting from Novartis Roche Zynerba

Autifony and Ovid pharmaceutical companies regarding FXSclinical trials Go to NeurologyorgN for full disclosures

Publication historyReceived by Neurology August 14 2019 Accepted in final formOctober 31 2019

References1 Olusanya BO Davis AC Wertlieb D et al Developmental disabilities among children

younger than 5 years in 195 countries and territories 1990-2016 a systematic analysis forthe Global Burden of Disease Study 2016 Lancet Glob Health 20186e1100ndashe1121

2 Lee AW Ventola P Budimirovic D Berry-Kravis E Visootsak J Clinical developmentof targeted fragile X syndrome treatments an industry perspective Brain Sci 20188E214

3 Picker JD Walsh CA New innovations therapeutic opportunities for intellectualdisabilities Ann Neurol 201374382ndash390

4 Budimirovic DB Berry-Kravis E Erickson CA et al Updated report on tools tomeasure outcomes of clinical trials in fragile X syndrome J Neurodev Disord 2017914

5 Esbensen AJ Hooper SR Fidler D et al Outcome measures for clinical trials in Downsyndrome Am J Intellect Dev Disabil 2017122247ndash281

Appendix Authors

Name Location Contribution

Rebecca HShields MS

MIND InstituteSacramento CA

Authored the manuscriptperformed the statisticalanalysis and coordinated thestudy

Aaron J KaatPhD

NorthwesternUniversityChicago IL

Directed NIHTB-CB activities forthe study advised on protocoland analysis and authoredportions of the manuscript

Forrest JMcKenzie BS

MIND InstituteSacramento CA

Authored portions of themanuscript

AndreaDrayton BS

MIND InstituteSacramento CA

Authored portions of themanuscript

Stephanie MSansone PhD

MIND InstituteSacramento CA

Assisted in developing protocoland coordinated the study

JeanineColeman PhD

University ofDenver CO

Coordinated assessments at theUniversity of Denver and criticallyreviewed the manuscript

ClaireMichalak BS

Rush UniversityMedical CenterChicago IL

Coordinated assessments andadministered NIHTB-CB at RushUniversity

Richard CGershon PhD

NorthwesternUniversityChicago IL

PI and director of the NIHToolbox and critically reviewedthe manuscript

Karen RileyPhD

University ofDenver CO

PI at the University of Denver andcritically reviewed themanuscript

ElizabethBerry-KravisMD PhD

Rush UniversityMedical CenterChicago IL

PI at Rush University and criticallyreviewed the manuscript

Keith FWidamanPhD

University ofCaliforniaRiverside

Supervised analysis planperformed the statisticalanalysis and critically reviewedthe manuscript

David HesslPhD

MIND InstituteSacramento CA

Designed the study obtainedfunding directed the multisitestudy and authored portions ofthe manuscript

NeurologyorgN Neurology | Volume 94 Number 12 | March 24 2020 e1239

6 GershonRCWagsterMVHendrieHC FoxNA CookKFNowinski CJ NIHToolboxfor assessment of neurological and behavioral function Neurology 201380S2ndashS6

7 Weintraub S Bauer PJ Zelazo PD et al I NIH toolbox cognition battery (CB)introduction and pediatric data Monogr Soc Res Child Dev 2013781ndash15

8 Casaletto KB Umlauf A Beaumont J et al Demographically corrected normativestandards for the English version of the NIH Toolbox Cognition Battery J IntNeuropsychol Soc 201521378ndash391

9 Hessl D Sansone SM Berry-Kravis E et al The NIH Toolbox Cognitive Battery forintellectual disabilities three preliminary studies and future directions J NeurodevDisord 2016835

10 Berry-Kravis E Lindemann L Jonch AE et al Drug development for neuro-developmental disorders lessons learned from fragile X syndrome Nat Rev DrugDiscov 201817280ndash299

11 Parker SE Mai CT Canfield MA et al Updated national birth prevalence estimatesfor selected birth defects in the United States 2004-2006 Birth Defects Res A ClinMol Teratol 2010881008ndash1016

12 Hunter J Rivero-Arias O Angelov A Kim E Fotheringham I Leal J Epidemiology offragile X syndrome a systematic review and meta-analysis Am J Med Genet A 2014164A1648ndash1658

13 Roid GMiller L Stanford-Binet Intelligence Scales 5th ed San Antonio Pearson 200314 Sparrow S Cicchetti D Saulnier C Vineland Adaptive Behavior Scales 3rd ed San

Antonio Pearson 201615 Weintraub S Dikmen SS Heaton RK et al Cognition assessment using the NIH

Toolbox Neurology 201380S54ndashS6416 Korkman M Kirk U Kemp S NEPSY-II A Developmental Neuropsychological

Assessment San Antonio Pearson 200717 Conners C Conners Kiddie Continuous Performance Test 2nd ed Toronto Multi-

Health Systems Inc 201518 Wechsler D Wechsler Preschool and Primary Scale of Intelligence 4th ed San

Antonio Pearson 201219 Roid G Miller L Pomplum M Koch C Leiter International Performance Scale 3rd

ed Wood Dale Stoelting Co 201320 Dunn LDunnD Peabody Picture Vocabulary Test 4th ed San Antonio Pearson 200721 Woodcock R Johnston M Woodcock-Johnson Tests of Achievement 4th ed Itasca

Riverside Publishing 201422 McKenzie FJ Drayton A Shields R et al NIH Toolbox Cognitive Battery supple-

mental administratorrsquos manual for intellectual and developmental disabilities [online]Available at nihtoolboxdeskcomcustomerportalarticles2981718 AccessedSeptember 30 2019

23 Thompson T Coleman JM Riley K et al Standardized assessment accommodationsfor individuals with intellectual disability Contemp Sch Psychol 201822443ndash457

24 Sansone SM Schneider A Bickel E Berry-Kravis E Prescott C Hessl D ImprovingIQ measurement in intellectual disabilities using true deviation from populationnorms J Neurodev Disord 2014616

25 Akshoomoff N Beaumont JL Bauer PJ et al VIII NIH Toolbox Cognition Battery(CB) composite scores of crystallized fluid and overall cognition Monogr Soc ResChild Dev 201378119ndash132

26 Zelazo P Bauer P National Institutes of Health Toolbox cognition battery (NIHToolbox CB) validation for children between 3 and 15 years Monogr Soc Res ChildDev 2013781ndash172

27 Hooper SR Hatton D Sideris J et al Executive functions in young males with fragileX syndrome in comparison to mental age-matched controls baseline findings froma longitudinal study Neuropsychology 20082236ndash47

28 Wilding J Cornish K Munir F Further delineation of the executive deficit in maleswith fragile-X syndrome Neuropsychologia 2002401343ndash1349

29 Pennington BF Moon J Edgin J Stedron J Nadel L The neuropsychology of Downsyndrome evidence for hippocampal dysfunction Child Dev 20037475ndash93

30 Vicari S Bellucci S Carlesimo GA Implicit and explicit memory a functional dis-sociation in persons with Down syndrome Neuropsychologia 200038240ndash251

31 Wechsler D Weschler Intelligence Scale for Children 5th ed Bloomington Pearson2014

32 Abbeduto L Murphy MM Cawthon SW et al Receptive language skills of adoles-cents and young adults with down or fragile X syndrome Am J Ment Retard 2003108149ndash160

33 Godfrey M Lee NR Memory profiles in Down syndrome across developmenta review of memory abilities through the lifespan J Neurodev Disord 2018105

34 Dykens EM Hodapp RM Leckman JF Strengths and weaknesses in the intellectualfunctioning of males with fragile X syndrome Am J Ment Defic 198792234ndash236

35 Carlozzi NE Tulsky DS Kail RV Beaumont JL VI NIH Toolbox Cognition Battery(CB) measuring processing speed Monogr Soc Res Child Dev 20137888ndash102

36 Hessl D Nguyen DV Green C et al A solution to limitations of cognitive testing inchildren with intellectual disabilities the case of fragile X syndrome J NeurodevDisord 2009133ndash45

37 Wiebe SA Espy KA Charak D Using confirmatory factor analysis to understandexecutive control in preschool children I latent structure Dev Psychol 200844575ndash587

38 Mungas D Widaman K Zelazo PD et al VII NIH Toolbox Cognition Battery (CB)factor structure for 3 to 15 year olds Monogr Soc Res Child Dev 201378103ndash118

e1240 Neurology | Volume 94 Number 12 | March 24 2020 NeurologyorgN

DOI 101212WNL0000000000009131202094e1229-e1240 Published Online before print February 24 2020Neurology

Rebecca H Shields Aaron J Kaat Forrest J McKenzie et al Validation of the NIH Toolbox Cognitive Battery in intellectual disability

This information is current as of February 24 2020

ServicesUpdated Information amp

httpnneurologyorgcontent9412e1229fullincluding high resolution figures can be found at

References httpnneurologyorgcontent9412e1229fullref-list-1

This article cites 28 articles 2 of which you can access for free at

Citations httpnneurologyorgcontent9412e1229fullotherarticles

This article has been cited by 1 HighWire-hosted articles

Subspecialty Collections

httpnneurologyorgcgicollectionexecutive_functionExecutive function

httpnneurologyorgcgicollectiondevelopmental_disordersDevelopmental disorders

httpnneurologyorgcgicollectionautismAutismfollowing collection(s) This article along with others on similar topics appears in the

Permissions amp Licensing

httpwwwneurologyorgaboutabout_the_journalpermissionsits entirety can be found online atInformation about reproducing this article in parts (figurestables) or in

Reprints

httpnneurologyorgsubscribersadvertiseInformation about ordering reprints can be found online

ISSN 0028-3878 Online ISSN 1526-632XWolters Kluwer Health Inc on behalf of the American Academy of Neurology All rights reserved Print1951 it is now a weekly with 48 issues per year Copyright Copyright copy 2020 The Author(s) Published by

reg is the official journal of the American Academy of Neurology Published continuously sinceNeurology

Page 6: ValidationoftheNIHToolboxCognitiveBattery in intellectual disability · iPad-based battery of brief memory, executive function (EF), processingspeed,andlanguagetests,wasdevelopedwithinthe

a significant group times test interaction (F121067 = 368 p lt0001) and a significant IQ times test interaction (F61067 = 672 plt 0001) Follow up Tukey tests showed that FXS performedworse onDCCS thanOID [t(238) = 402 p lt 0001 d = 052]and worse than DS [t(238) = minus304 p = 0007 d = 039] OnFlanker FXS performed worse than OID [t(238) = 345 p =0002 d = 045] and worse than DS [t(238) = minus285 p = 001d = 037] These 2 EF test results supported the hypothesizedEF impairment in FXS although the DS impairment relativeto OID was not supported On PVT FXS performed betterthan DS fitting with expected language strength in FXS[t(238) = minus277 p = 002 d = 036] On PC DS showeda poorer performance than OID which approached

significance [t(238) = 224 p = 007 d = 029] There were nosignificant group differences on LS PSM or OR

DiscussionThis study provides the first comprehensive examination of thepsychometric properties and feasibility of the NIHTB-CB forindividuals with ID with an initial focus on 2 of the mostcommon genetic causes with robust translational researchprograms FXS and DS Overall the NIHTB-CB has demon-strated strong potential for use as an objective standardizedoutcome measure that can be confidently used in ID trials withparticipants with amental age of 5 years or higher Results of thestudy demonstrate very strong psychometrics for the Crystal-lized reasoning tests (PVT and OR) and good to excellentperformance of Fluid reasoning tests withmore variation acrossFXS DS and OID groups for some measures (eg strongreliability for Flanker in FXS compared toDS and vice versa forDCCS) Indeed the Fluid Composite appeared to have moreconsistently strong reliability across conditions and a solidconvergent association with FSIQ Thus it may be a goodcandidate outcome measure for studies seeking to examinebroad nonverbal cognitive changes for individuals with a mentalage of ge5 years Below a mental age of 5 years feasibility wasmore variable across tests indicating the need for furtheradaptations scoring algorithms on developmental extensionsor new tests targeting these lower-functioning individuals

The Supplemental Manual was compiled after hundreds ofadministrations of the NIHTB-CB to individuals with ID andthe study results on feasibility reliability and validity supportits use We encourage researchers and examiners planning touse the NIHTB-CB to follow these guidelines for the IDpopulation

Group comparisons demonstrated that the NIHTB-CB issensitive to substantial EF deficits among individuals with IDin that all 3 groups performed relatively poorly on Flanker andDCCS In particular participants with FXS showed weaknessin inhibitory control and attention and cognitive flexibility inexcess of their general cognitive level and compared to con-trols with other forms of ID This aligns with previous re-search showing that boys with FXS are impaired in inhibitorycontrol set shifting and planning relative to mental agendashmatched controls27 and that boys with FXS have impairmentsin inhibition and attention relative to mental agendashmatchedcontrols and relative to children with DS28 The hypothesizedEF weakness in DS compared to OID (to a lesser extent thanFXS) was not found however there have been some mixedresults on EF in children with DS compared to children withother IDs2930 The low-scoring subgroup on DCCS are thosewho failed introduction to the switching portion of the testfor participants who cannot perform switching at the earliestlevel the test may be less sensitive to variation in EF Re-moving this subgroup from analyses strengthened the con-vergent validity correlation This suggests that these

Figure 1 NIHTB-CB cognitive function composite scoredistribution and association with FSIQ

(A) Histogram showing the true distribution of cognitive function compositescores in the full sample of individuals with intellectual disability (ID) TheNIH Toolbox Cognitive Battery (NIHTB-CB) current floor (winsorized at 54vertical blue line) excludedmore than half of the present sample frommoreaccurate measurement below that level (B) Association between the de-viation-based cognitive function composite and deviation-based full-scaleIQ (FSIQ) by group

e1234 Neurology | Volume 94 Number 12 | March 24 2020 NeurologyorgN

individualsrsquo level or lack of cognitive flexibility may not becaptured by the introductory switching portion of DCCS Anextension of DCCS that does capture this construct in veryyoung or low-functioning persons would have much valueThe lower reliability of DCCS in the FXS group may alsoreflect this limitation in that participants with FXS overallscored very low on this test Because anxiety and hyperactivityare common in FXS it is possible that individual state inter-acts with the task complexity and cognitive flexibility to resultin more variable performance over time in this group Whenparticipants needed Flanker DEXT or DCCS DEXT theywere almost always able to perform the tests however a fewissues remain to be worked out regarding these experimentalversions notably the ability to interpret these scores relativeto the standard Flanker and DCCS scores This study high-lighted other concerns with the DEXT measures (eg diffi-culty may be ordered incorrectly or portions are overlyburdensome) Our results provide clear evidence that DEXTlevels are necessary (and extremely feasible) especially inthose with FXS and DS but that further refinements andmodifications are necessary

To improve feasibility of LS extra instructions and practiceitems were developed and used The LS Age 3ndash6 experimentalversion with these additions is now available Feasibility didimprove after our initial pilot studies however the testremains challenging for this population and some participantspass practice but get no test sequences completely correctThe limited feasibility and floored or low variability in rawscores suggest that a lower range of LS is necessary A po-tential approach may be to give some credit on earlysequences that are partially recalled or items recalled out ofsequence because the construct of working memory builds onmore basic short-term memory (eg SB-5 Working Memoryindex and Wechsler Working Memory indices)1331

Both language tests PVT and OR had excellent performancein our samples of individuals with ID Both demonstratedstrong reliability and clear domain specificity with a muchhigher convergent than discriminant correlation TheNIHTB-CB was sensitive to expected language characteristicsin that DS was impaired relative to FXS on PVT32 These testshave an advantage over other language measures such as thePPVT-4 in that PVT and OR are brief (asymp3 minutes each) butaccurate owing to CAT the ability to obtain results witha brief assessment is especially important in individuals withfrequent behavior or attention issues such as in those with ID

Episodic memory is relevant to clinical trials particularly forDS in which memory impairments are well documented33

and FXS in which memory of sequential information is es-pecially impaired34 It is important to emphasize that despitethese known weaknesses PSM was among the highest of thetest scores in each group (figure 2) suggesting that individualsmay have compensatory strategies for performing well per-haps such as use of the contextual information in the storiesReliability was moderate and lower than that of the normative3- 15-year-old study although improved from our pilot studyThe significant effect size in the PSM A-B group suggestsnonequivalence of Forms A and B Notably in DS ICCs weresmall and nonsignificant suggesting that in its present formPSM appears contraindicated as a separate outcome measurefor this population

The high rate of ceiling scores on PSM 3ndash4 suggests thatmental age may not be a good indicator of start point on PSMin ID at least at this mental age level It is also possible thatthis age version is too easy in comparison to PSM scores onolder versions perhaps a personrsquos score on 1 PSM version isnot equivalent to that personrsquos score on another De-velopment of a CAT PSM test with the full range of version

Table 2 At visit 1 proportion of participants who received a valid scorea

Total n () Down syndrome n () Fragile X syndrome n () Other ID n ()

Flanker 195 (806) 78 (867) 50 (667) 67 (882)

Flanker (with DEXT)b 239 (988) 89 (989) 75 (987) 75 (987)

DCCS 152 (636) 55 (611) 40 (541) 57 (760)

DCCS (with DEXT)b 232 (971) 87 (967) 72 (973) 73 (973)

LS 164 (689) 57 (648) 45 (600) 62 (827)

PC 179 (752) 59 (648) 58 (801) 62 (827)

PSM 220 (921) 85 (944) 65 (867) 70 (933)

PVT 237 (983) 88 (978) 73 (973) 76 (1000)

OR 233 (979) 87 (978) 72 (973) 74 (987)

Abbreviations DCCS = Dimensional Change Card Sort DEXT = Developmental Extension Flanker = Flanker Inhibitory Control and Attention ID = intellectualdisability LS = List Sorting Working Memory OR = Oral Reading and Recognition PC = Pattern Comparison Processing Speed PSM = Picture SequenceMemory PVT = Picture Vocabularya Technical difficulties and administration errors are excludedb On Flanker 274 of the sample needed the DEXT portion On DCCS 531 needed DEXT

NeurologyorgN Neurology | Volume 94 Number 12 | March 24 2020 e1235

difficulties would likely simplify the testing process and yieldmore comparable and reliable scores

PC showed adequate feasibility overall and good feasibility inFXS and OID with sufficient feasibility at a mental age of ge4years While the ICC was excellent there was a small butsignificant practice effect although smaller than that found inthe normative sample35 The most common reason for lack offeasibility on PC was an invalid alternating response patternespecially common in DS The task has an inherent challengeof understanding ldquosamerdquo and ldquodifferentrdquo while mapping thischoice onto smiley face and frown face options For partic-ipants without invalid response patterns PC performs well inID On the basis of the feasibility challenges we developeda new processing speed task Speeded Matching now avail-able as an experimental version in the app The task is to selectthe animal face among 3 foils that matches a target image

Future psychometric studies will provide more informationabout its performance

The Fluid Crystallized and Cognitive Function compositesdemonstrated reliability results similar to those of the nor-mative age 3 to 6 sample with small practice effects in Fluidand Cognitive Function composites (although smaller than inthe normative sample) Each composite was well correlatedwith FSIQ Although not all participants were able to receivea composite (due to missing valid scores on some tests) whencomplete the composites appear to perform well The de-viation method used to create these composites has a clearadvantage over the current age-corrected standard scores onwhich more than half of our sample obtained scores at thelower limit currently set at 54 We are conducting analyses toidentify the best option for composite scores below this floor(deviation approach vs extension of existing age-adjusted

Table 3 Test-retest reliability and examination of practice effects

Total Down syndrome Fragile X syndrome Other ID

No ICC (95 CI)Cohend No ICC (95 CI)

Cohend No ICC (95 CI)

Cohend No ICC (95 CI)

Cohend

Flanker 144 074(065ndash081)

009a 56 063(044ndash076)

020 37 084(070ndash091)

minus011 51 069(051ndash081)

015

FlankerDEXT

36 060(033ndash077)

minus001 8 073(018ndash094)

minus033 23 062(028ndash082)

minus005 5 046(minus037ndash092)

056

DCCS 97 071(060ndash080)

013 32 078(058ndash088)

018 23 041(001ndash069)

024 42 082(068ndash090)

003

DCCS DEXT 73 049(030ndash065)

minus016 32 046(014ndash069)

minus017 27 050(016ndash074)

minus013 14 053(003ndash082)

minus021

LS 113 074(064ndash081)

014 37 075(056ndash086)

024 32 065(040ndash081)

minus011 43 069(049ndash082)

022

PC 133 077(062ndash085)

029b 45 074(048ndash086)

039b 40 071(050ndash084)

021a 48 075(054ndash086)

035c

PSM A-A 60 047(025ndash064)

004 22 024(minus022ndash060)

minus014 15 054(007ndash082)

027 23 040(001ndash069)

011

PSM A-B 60 055(034ndash071)

029a 22 033(minus006ndash065)

022 18 054(014ndash080)

025 20 040(minus003ndash071)

034

PVT 186 085(081ndash089)

003 69 087(080ndash092)

008 57 079(066ndash087)

001 60 086(078ndash092)

minus002

OR 183 096(095ndash097)

002 70 095(092ndash097)

001 56 096(093ndash098)

minus001 57 098(097ndash099)

004

FC 97 083(067ndash091)

033b 28 084(052ndash093)

040b 27 079(056ndash090)

034a 42 077(055ndash088)

033c

CC 191 093(091ndash095)

000 73 092(087ndash095)

002 58 094(090ndash096)

minus003 60 091(085ndash094)

minus000

CFC 97 092(085ndash096)

022b 28 093(079ndash097)

023c 27 091(079ndash096)

027a 42 089(078ndash094)

019a

Abbreviations A-A = Form A at visit 1 and Form A at visit 2 A-B = Form A at visit 1 and Form B at visit 2 CC = Crystallized Composite CFC = Cognitive FunctionComposite CI = confidence interval DCCS = Dimensional Change Card Sort DEXT = Developmental Extension FC = Fluid Composite Flanker = FlankerInhibitory Control and Attention ICC = intraclass correlation ID = intellectual disability LS = List SortingWorkingMemory OR =Oral Reading and RecognitionPC = Pattern Comparison Processing Speed PSM = Picture Sequence Memory PVT = Picture Vocabularya p lt 005b p lt 0001c p lt 001

e1236 Neurology | Volume 94 Number 12 | March 24 2020 NeurologyorgN

Table 4 Convergent and discriminant validity of NIHTB-CB tests (Pearson r)

Total Down syndrome Fragile X syndrome Other ID

No Conv Validity No Disc Validity No Conv Validity No Disc Validity No Conv Validity No Disc Validity No Conv Validity No Disc Validity

Flanker 144 minus052a 193 053a 52 minus056a 78 048a 36 minus044b 48 049a 56 minus044a 67 056a

Flanker DEXT 29 minus026 61 036b 7 004 15 051 15 minus031 32 050b 7 minus023 14 034

DCCSc 109 048a 151 046a 34 055a 55 056a 30 033 39 007 45 036d 58 054a

DCCS DEXT 43 042b 113 037a 16 034 48 045b 11 033 40 043b 16 048 25 011

LS 165 065a 164 049a 57 054a 57 047a 46 050a 45 038d 62 066a 62 052a

PC 171 066a 178 045a 54 060a 59 058a 56 048a 57 042b 61 069a 62 036b

PSM 175 047a 179 050a 69 034b 71 050a 49 064a 51 052a 57 033d 58 033d

PVT 235 083a 232 047a 88 075a 86 054a 71 085a 70 050a 76 088a 76 053a

OR 230 092a 227 058a 86 092a 84 062a 70 089a 69 061a 74 095a 74 059a

FC 151 060a 152 061a 53 049a 53 043b 40 047b 40 057a 58 047a 76 071a

CC 237 075a 237 068a 89 068a 89 064a 72 080a 73 068a 59 055a 75 064a

Abbreviations CC = Crystallized Composite Conv = convergent DCCS = Dimensional Change Card Sort DEXT = Developmental Extension Disc = discriminant FC = Fluid Composite Flanker = Flanker Inhibitory Control andAttention ID = intellectual disability LS = List SortingWorkingMemory NEPSY-In =NEPSY Inhibition subtest NIHTB-CB =NIH Toolbox Cognitive Battery OR =Oral Reading and Recognition PC = Pattern Comparison ProcessingSpeed PSM = Picture Sequence Memory PVT = Picture VocabularyConvergent discriminant measure for each test Flanker Conners Kiddie Continuous Performance Test 2nd Edition (KCPT2) Woodcock Johnson 4th Edition Letter-Word Identification (WJ-LW) Flanker DEXT KCPT2 WJ-LWDCCS NEPSY-In WJ-LW DCCS DEXT NEPSY-In WJ-LW LS Stanford-Binet 5th edition (SB-5) Verbal WorkingMemory WJ-LW PC Wechsler Preschool and Primary Scale of Intelligence 4th Edition Bug Search WJ-LW PSM LeiterInternational Performance Scale 3rd Edition Forward Memory (Leiter-FM) WJ-LW PVT Peabody Picture Vocabulary Test 4th edition Leiter-FM OR WJ-LW Leiter-FM Fluid Composite SB-5 Fluid Reasoning IQ SB-5 Verbal IQCrystallized Composite SB-5 Verbal IQ SB-5 Fluid Reasoning IQa p lt 0001b p lt 001c When the DCCS low subgroup (n = 26) was removed in fragile X syndrome the convergent correlation was significant (n = 21 r = 049 p = 002)d p lt 005

Neurolo

gyorgN

Neurology

|Volum

e94N

umber

12|

March

242020e1237

standard scores) These findings provide further evidence thattest developers should consider and address the range andsensitivity of tests and scores for individuals with moderate tosevere ID2436

This study has some important limitations that warrant con-sideration Construct validity challenges are inherent with theID population in that fully adequate convergent validitymeasures are not always available or they present their ownfeasibility and psychometric limitations The lack of cleardiscriminant validity in most measures is another challengeWhile discriminant correlations are generally desired to bemarkedly lower than the convergent correlations in earlydevelopment domains of cognition (especially EFs) arethought to be unidimensional with increasing differentiationof constructs occurring through early adulthood37 In IDthere is likely even less differentiation between domains thanin typically developing children Our discriminant validityresults are similar to normative 3- to 6-year-old results38

Therefore the Fluid Composite may be a good outcomemeasure choice on the basis of its psychometric performanceand some limitations of subtest construct differentiation inthis population The study was also limited by sample size inevaluations of group-specific results Future work with largersamples should provide more clarity about reliability andvalidity within individual ID subgroups

The NIHTB-CB is a promising outcome measure for IDclinical trials and for many types of nonintervention obser-vational studies Although not originally intended for clinicaluse or for special education purposes ongoing and futureresearch may be done to explore such applications23 such as inthe school psychology setting for accurate and feasible as-sessment of students with IDs In addition the SupplementalManual developed from this study provides key guidance toexaminers and researchers working with ID populations theprocedures on administration test environment fidelity andscoring not only improve examiner familiarity and comfortbut more important extend accessibility of the NIHTB-CBto more cognitively impaired or behaviorally challengedindividuals

Besides evaluating the NIHTB-CB as an appropriate assess-ment for ID in general the present results demonstrate thesensitivity of the battery to known syndrome-specific cogni-tive phenotypes such as the impairment of EFs in FXS relativeto other ID and to DS A critical remaining question is thedegree to which the battery is sensitive to change especially toeffects of intervention As an initial test of sensitivity tochange we are currently collecting longitudinal data from

Table 5 Ecologic validity (Pearson r)

Chronologicage

Mentalage FSIQ

VABS-3ABC

Flanker 025a 058a 050a 018b

FlankerDEXT

025 037c 018 010

DCCS 030a 063a 056a 014

DCCS DEXT 014 066a 049a 017

LS 020b 070a 066a 016b

PC 010 059a 057a 019b

PSM 004 058a 058a 027a

PVT 035a 076a 074a 039a

OR 017c 068a 069a 039a

FC mdash mdash 066a 015

CC mdash mdash 076a 044a

CFC mdash mdash 078a 025c

Abbreviations CC = Crystallized Composite CFC = Cognitive FunctionComposite DCCS = Dimensional Change Card Sort DEXT = DevelopmentalExtension FC = Fluid Composite Flanker = Flanker Inhibitory Control andAttention FSIQ = full-scale IQ LS = List Sorting Working Memory OR = OralReading and Recognition PC = Pattern Comparison Processing Speed PSM= Picture Sequence Memory PVT = Picture Vocabulary VABS-3 ABC =Vineland-3 Adaptive Behavior Compositea p lt 0001b p lt 005c p lt 001

Figure 2 Profile plot of NIH Toolbox Cognitive Battery test zscores by group

The z scores on each test (representing group performance relative to thegeneral population average performance) are shown derived from themixed-model analysis of variance adjusted for IQ A z score of 0 (horizontalline at top) represents the average performance in the general populationnormative sample The z scores lt0 represent the number of SDs below thegeneral population average for the chronologic age band Error bars rep-resent 95 confidence intervals Comparison between fragile X syndrome(FXS) and intellectual disability (ID) of other or unknown cause daggerComparisonbetween FXS and Down syndrome p lt 005 p lt 001 p lt 0001 daggerp lt005 daggerdaggerp lt 001 daggerdaggerdaggerp lt 0001 DCCS = Dimensional Change Card Sort LS =List Sorting Working Memory OR = Oral Reading and Recognition PC =Pattern Comparison Processing Speed PSM = Picture Sequence MemoryPVT = Picture Vocabulary

e1238 Neurology | Volume 94 Number 12 | March 24 2020 NeurologyorgN

study participants to explore natural developmental changeswithin each NIHTB-CB test and the composites relative tomeasures already established as change sensitive such as theSB-5 and Vineland The present results warrant the next stepof evaluating the NIHTB-CB in ID for individuals down toa mental age of 5 years to demonstrate the treatment-specificsensitivity of the battery and to determine the degree to whichmeasure gains reflect functional improvements in daily lifeBelow a mental age of 5 years the NIHTB-CB performs morevariably and adaptations to the lower test ranges or scoringadaptations are needed for some measurement domainsStudies of the performance of the battery in older adults withID are needed especially focusing on those experiencingcognitive decline or dementia Overall the present validationresults represent an important step toward providing an ob-jective scalable and standardized method for successfullymeasuring cognition and tracking cognitive changes in ID

AcknowledgmentThe authors extend their appreciation to the families whogave their time and effort to participate in the study and toLeonard Abbeduto LeAnn Baer Ruth McClure Barnes KyleBersted Mikayla Brown Ana Candelaria Erin CarmodyDarian Crowley Suzanne Delap Randi Hagerman AnneHoffmann Londi Howard Paige Landau Caroline LeonczykMichael Nelson Jacklyn Perales Lacey Pomerantz ShanelleRodriguez Melanie Rothfuss Ryan Shickman AndreaSchneider Haleigh Scott Laurel Snider Rachel Teune TaliaThompson Denny Tran and Jamie Woods for theircontributions to the study

Study fundingFunding provided by the National Institute of Child Healthand Human Development (R01HD076189) Health andHuman Services Administration of Developmental Dis-abilities (90DD0596) the MIND Institute Intellectual andDevelopmental Disabilities Research Center (U54HD079125) and the National Center for Advancing Trans-lational Sciences NIH through grant UL1 TR000002

DisclosureR Shields A Kaat F McKenzie A Drayton S Sansone JColeman and C Michalak report no disclosures K Riley hasreceived compensation for consulting to Novartis regardingFXS clinical trials E Berry-Kravis has received funding (allfunding including consulting goes to Rush University MedicalCenter) from Seaside Therapeutics Novartis Roche AlcobraNeuren Cydan Fulcrum GW Neurotrope Marinus AcadiaZynerba BioMarin Ovid Acadia Yamo Ionis Lumos Ultra-genyx Vtesse Sucampo and Mallinkcrodt Pharmaceuticals toconsult on trial design or development strategies andor toconduct clinical trials in FXS or other neurogenetic disordersand from Asuragen Inc to develop testing standards and to dovalidation testing for FMR1 and SMN testing R Gershon andK Widaman report no disclosures D Hessl has receivedfunding for consulting from Novartis Roche Zynerba

Autifony and Ovid pharmaceutical companies regarding FXSclinical trials Go to NeurologyorgN for full disclosures

Publication historyReceived by Neurology August 14 2019 Accepted in final formOctober 31 2019

References1 Olusanya BO Davis AC Wertlieb D et al Developmental disabilities among children

younger than 5 years in 195 countries and territories 1990-2016 a systematic analysis forthe Global Burden of Disease Study 2016 Lancet Glob Health 20186e1100ndashe1121

2 Lee AW Ventola P Budimirovic D Berry-Kravis E Visootsak J Clinical developmentof targeted fragile X syndrome treatments an industry perspective Brain Sci 20188E214

3 Picker JD Walsh CA New innovations therapeutic opportunities for intellectualdisabilities Ann Neurol 201374382ndash390

4 Budimirovic DB Berry-Kravis E Erickson CA et al Updated report on tools tomeasure outcomes of clinical trials in fragile X syndrome J Neurodev Disord 2017914

5 Esbensen AJ Hooper SR Fidler D et al Outcome measures for clinical trials in Downsyndrome Am J Intellect Dev Disabil 2017122247ndash281

Appendix Authors

Name Location Contribution

Rebecca HShields MS

MIND InstituteSacramento CA

Authored the manuscriptperformed the statisticalanalysis and coordinated thestudy

Aaron J KaatPhD

NorthwesternUniversityChicago IL

Directed NIHTB-CB activities forthe study advised on protocoland analysis and authoredportions of the manuscript

Forrest JMcKenzie BS

MIND InstituteSacramento CA

Authored portions of themanuscript

AndreaDrayton BS

MIND InstituteSacramento CA

Authored portions of themanuscript

Stephanie MSansone PhD

MIND InstituteSacramento CA

Assisted in developing protocoland coordinated the study

JeanineColeman PhD

University ofDenver CO

Coordinated assessments at theUniversity of Denver and criticallyreviewed the manuscript

ClaireMichalak BS

Rush UniversityMedical CenterChicago IL

Coordinated assessments andadministered NIHTB-CB at RushUniversity

Richard CGershon PhD

NorthwesternUniversityChicago IL

PI and director of the NIHToolbox and critically reviewedthe manuscript

Karen RileyPhD

University ofDenver CO

PI at the University of Denver andcritically reviewed themanuscript

ElizabethBerry-KravisMD PhD

Rush UniversityMedical CenterChicago IL

PI at Rush University and criticallyreviewed the manuscript

Keith FWidamanPhD

University ofCaliforniaRiverside

Supervised analysis planperformed the statisticalanalysis and critically reviewedthe manuscript

David HesslPhD

MIND InstituteSacramento CA

Designed the study obtainedfunding directed the multisitestudy and authored portions ofthe manuscript

NeurologyorgN Neurology | Volume 94 Number 12 | March 24 2020 e1239

6 GershonRCWagsterMVHendrieHC FoxNA CookKFNowinski CJ NIHToolboxfor assessment of neurological and behavioral function Neurology 201380S2ndashS6

7 Weintraub S Bauer PJ Zelazo PD et al I NIH toolbox cognition battery (CB)introduction and pediatric data Monogr Soc Res Child Dev 2013781ndash15

8 Casaletto KB Umlauf A Beaumont J et al Demographically corrected normativestandards for the English version of the NIH Toolbox Cognition Battery J IntNeuropsychol Soc 201521378ndash391

9 Hessl D Sansone SM Berry-Kravis E et al The NIH Toolbox Cognitive Battery forintellectual disabilities three preliminary studies and future directions J NeurodevDisord 2016835

10 Berry-Kravis E Lindemann L Jonch AE et al Drug development for neuro-developmental disorders lessons learned from fragile X syndrome Nat Rev DrugDiscov 201817280ndash299

11 Parker SE Mai CT Canfield MA et al Updated national birth prevalence estimatesfor selected birth defects in the United States 2004-2006 Birth Defects Res A ClinMol Teratol 2010881008ndash1016

12 Hunter J Rivero-Arias O Angelov A Kim E Fotheringham I Leal J Epidemiology offragile X syndrome a systematic review and meta-analysis Am J Med Genet A 2014164A1648ndash1658

13 Roid GMiller L Stanford-Binet Intelligence Scales 5th ed San Antonio Pearson 200314 Sparrow S Cicchetti D Saulnier C Vineland Adaptive Behavior Scales 3rd ed San

Antonio Pearson 201615 Weintraub S Dikmen SS Heaton RK et al Cognition assessment using the NIH

Toolbox Neurology 201380S54ndashS6416 Korkman M Kirk U Kemp S NEPSY-II A Developmental Neuropsychological

Assessment San Antonio Pearson 200717 Conners C Conners Kiddie Continuous Performance Test 2nd ed Toronto Multi-

Health Systems Inc 201518 Wechsler D Wechsler Preschool and Primary Scale of Intelligence 4th ed San

Antonio Pearson 201219 Roid G Miller L Pomplum M Koch C Leiter International Performance Scale 3rd

ed Wood Dale Stoelting Co 201320 Dunn LDunnD Peabody Picture Vocabulary Test 4th ed San Antonio Pearson 200721 Woodcock R Johnston M Woodcock-Johnson Tests of Achievement 4th ed Itasca

Riverside Publishing 201422 McKenzie FJ Drayton A Shields R et al NIH Toolbox Cognitive Battery supple-

mental administratorrsquos manual for intellectual and developmental disabilities [online]Available at nihtoolboxdeskcomcustomerportalarticles2981718 AccessedSeptember 30 2019

23 Thompson T Coleman JM Riley K et al Standardized assessment accommodationsfor individuals with intellectual disability Contemp Sch Psychol 201822443ndash457

24 Sansone SM Schneider A Bickel E Berry-Kravis E Prescott C Hessl D ImprovingIQ measurement in intellectual disabilities using true deviation from populationnorms J Neurodev Disord 2014616

25 Akshoomoff N Beaumont JL Bauer PJ et al VIII NIH Toolbox Cognition Battery(CB) composite scores of crystallized fluid and overall cognition Monogr Soc ResChild Dev 201378119ndash132

26 Zelazo P Bauer P National Institutes of Health Toolbox cognition battery (NIHToolbox CB) validation for children between 3 and 15 years Monogr Soc Res ChildDev 2013781ndash172

27 Hooper SR Hatton D Sideris J et al Executive functions in young males with fragileX syndrome in comparison to mental age-matched controls baseline findings froma longitudinal study Neuropsychology 20082236ndash47

28 Wilding J Cornish K Munir F Further delineation of the executive deficit in maleswith fragile-X syndrome Neuropsychologia 2002401343ndash1349

29 Pennington BF Moon J Edgin J Stedron J Nadel L The neuropsychology of Downsyndrome evidence for hippocampal dysfunction Child Dev 20037475ndash93

30 Vicari S Bellucci S Carlesimo GA Implicit and explicit memory a functional dis-sociation in persons with Down syndrome Neuropsychologia 200038240ndash251

31 Wechsler D Weschler Intelligence Scale for Children 5th ed Bloomington Pearson2014

32 Abbeduto L Murphy MM Cawthon SW et al Receptive language skills of adoles-cents and young adults with down or fragile X syndrome Am J Ment Retard 2003108149ndash160

33 Godfrey M Lee NR Memory profiles in Down syndrome across developmenta review of memory abilities through the lifespan J Neurodev Disord 2018105

34 Dykens EM Hodapp RM Leckman JF Strengths and weaknesses in the intellectualfunctioning of males with fragile X syndrome Am J Ment Defic 198792234ndash236

35 Carlozzi NE Tulsky DS Kail RV Beaumont JL VI NIH Toolbox Cognition Battery(CB) measuring processing speed Monogr Soc Res Child Dev 20137888ndash102

36 Hessl D Nguyen DV Green C et al A solution to limitations of cognitive testing inchildren with intellectual disabilities the case of fragile X syndrome J NeurodevDisord 2009133ndash45

37 Wiebe SA Espy KA Charak D Using confirmatory factor analysis to understandexecutive control in preschool children I latent structure Dev Psychol 200844575ndash587

38 Mungas D Widaman K Zelazo PD et al VII NIH Toolbox Cognition Battery (CB)factor structure for 3 to 15 year olds Monogr Soc Res Child Dev 201378103ndash118

e1240 Neurology | Volume 94 Number 12 | March 24 2020 NeurologyorgN

DOI 101212WNL0000000000009131202094e1229-e1240 Published Online before print February 24 2020Neurology

Rebecca H Shields Aaron J Kaat Forrest J McKenzie et al Validation of the NIH Toolbox Cognitive Battery in intellectual disability

This information is current as of February 24 2020

ServicesUpdated Information amp

httpnneurologyorgcontent9412e1229fullincluding high resolution figures can be found at

References httpnneurologyorgcontent9412e1229fullref-list-1

This article cites 28 articles 2 of which you can access for free at

Citations httpnneurologyorgcontent9412e1229fullotherarticles

This article has been cited by 1 HighWire-hosted articles

Subspecialty Collections

httpnneurologyorgcgicollectionexecutive_functionExecutive function

httpnneurologyorgcgicollectiondevelopmental_disordersDevelopmental disorders

httpnneurologyorgcgicollectionautismAutismfollowing collection(s) This article along with others on similar topics appears in the

Permissions amp Licensing

httpwwwneurologyorgaboutabout_the_journalpermissionsits entirety can be found online atInformation about reproducing this article in parts (figurestables) or in

Reprints

httpnneurologyorgsubscribersadvertiseInformation about ordering reprints can be found online

ISSN 0028-3878 Online ISSN 1526-632XWolters Kluwer Health Inc on behalf of the American Academy of Neurology All rights reserved Print1951 it is now a weekly with 48 issues per year Copyright Copyright copy 2020 The Author(s) Published by

reg is the official journal of the American Academy of Neurology Published continuously sinceNeurology

Page 7: ValidationoftheNIHToolboxCognitiveBattery in intellectual disability · iPad-based battery of brief memory, executive function (EF), processingspeed,andlanguagetests,wasdevelopedwithinthe

individualsrsquo level or lack of cognitive flexibility may not becaptured by the introductory switching portion of DCCS Anextension of DCCS that does capture this construct in veryyoung or low-functioning persons would have much valueThe lower reliability of DCCS in the FXS group may alsoreflect this limitation in that participants with FXS overallscored very low on this test Because anxiety and hyperactivityare common in FXS it is possible that individual state inter-acts with the task complexity and cognitive flexibility to resultin more variable performance over time in this group Whenparticipants needed Flanker DEXT or DCCS DEXT theywere almost always able to perform the tests however a fewissues remain to be worked out regarding these experimentalversions notably the ability to interpret these scores relativeto the standard Flanker and DCCS scores This study high-lighted other concerns with the DEXT measures (eg diffi-culty may be ordered incorrectly or portions are overlyburdensome) Our results provide clear evidence that DEXTlevels are necessary (and extremely feasible) especially inthose with FXS and DS but that further refinements andmodifications are necessary

To improve feasibility of LS extra instructions and practiceitems were developed and used The LS Age 3ndash6 experimentalversion with these additions is now available Feasibility didimprove after our initial pilot studies however the testremains challenging for this population and some participantspass practice but get no test sequences completely correctThe limited feasibility and floored or low variability in rawscores suggest that a lower range of LS is necessary A po-tential approach may be to give some credit on earlysequences that are partially recalled or items recalled out ofsequence because the construct of working memory builds onmore basic short-term memory (eg SB-5 Working Memoryindex and Wechsler Working Memory indices)1331

Both language tests PVT and OR had excellent performancein our samples of individuals with ID Both demonstratedstrong reliability and clear domain specificity with a muchhigher convergent than discriminant correlation TheNIHTB-CB was sensitive to expected language characteristicsin that DS was impaired relative to FXS on PVT32 These testshave an advantage over other language measures such as thePPVT-4 in that PVT and OR are brief (asymp3 minutes each) butaccurate owing to CAT the ability to obtain results witha brief assessment is especially important in individuals withfrequent behavior or attention issues such as in those with ID

Episodic memory is relevant to clinical trials particularly forDS in which memory impairments are well documented33

and FXS in which memory of sequential information is es-pecially impaired34 It is important to emphasize that despitethese known weaknesses PSM was among the highest of thetest scores in each group (figure 2) suggesting that individualsmay have compensatory strategies for performing well per-haps such as use of the contextual information in the storiesReliability was moderate and lower than that of the normative3- 15-year-old study although improved from our pilot studyThe significant effect size in the PSM A-B group suggestsnonequivalence of Forms A and B Notably in DS ICCs weresmall and nonsignificant suggesting that in its present formPSM appears contraindicated as a separate outcome measurefor this population

The high rate of ceiling scores on PSM 3ndash4 suggests thatmental age may not be a good indicator of start point on PSMin ID at least at this mental age level It is also possible thatthis age version is too easy in comparison to PSM scores onolder versions perhaps a personrsquos score on 1 PSM version isnot equivalent to that personrsquos score on another De-velopment of a CAT PSM test with the full range of version

Table 2 At visit 1 proportion of participants who received a valid scorea

Total n () Down syndrome n () Fragile X syndrome n () Other ID n ()

Flanker 195 (806) 78 (867) 50 (667) 67 (882)

Flanker (with DEXT)b 239 (988) 89 (989) 75 (987) 75 (987)

DCCS 152 (636) 55 (611) 40 (541) 57 (760)

DCCS (with DEXT)b 232 (971) 87 (967) 72 (973) 73 (973)

LS 164 (689) 57 (648) 45 (600) 62 (827)

PC 179 (752) 59 (648) 58 (801) 62 (827)

PSM 220 (921) 85 (944) 65 (867) 70 (933)

PVT 237 (983) 88 (978) 73 (973) 76 (1000)

OR 233 (979) 87 (978) 72 (973) 74 (987)

Abbreviations DCCS = Dimensional Change Card Sort DEXT = Developmental Extension Flanker = Flanker Inhibitory Control and Attention ID = intellectualdisability LS = List Sorting Working Memory OR = Oral Reading and Recognition PC = Pattern Comparison Processing Speed PSM = Picture SequenceMemory PVT = Picture Vocabularya Technical difficulties and administration errors are excludedb On Flanker 274 of the sample needed the DEXT portion On DCCS 531 needed DEXT

NeurologyorgN Neurology | Volume 94 Number 12 | March 24 2020 e1235

difficulties would likely simplify the testing process and yieldmore comparable and reliable scores

PC showed adequate feasibility overall and good feasibility inFXS and OID with sufficient feasibility at a mental age of ge4years While the ICC was excellent there was a small butsignificant practice effect although smaller than that found inthe normative sample35 The most common reason for lack offeasibility on PC was an invalid alternating response patternespecially common in DS The task has an inherent challengeof understanding ldquosamerdquo and ldquodifferentrdquo while mapping thischoice onto smiley face and frown face options For partic-ipants without invalid response patterns PC performs well inID On the basis of the feasibility challenges we developeda new processing speed task Speeded Matching now avail-able as an experimental version in the app The task is to selectthe animal face among 3 foils that matches a target image

Future psychometric studies will provide more informationabout its performance

The Fluid Crystallized and Cognitive Function compositesdemonstrated reliability results similar to those of the nor-mative age 3 to 6 sample with small practice effects in Fluidand Cognitive Function composites (although smaller than inthe normative sample) Each composite was well correlatedwith FSIQ Although not all participants were able to receivea composite (due to missing valid scores on some tests) whencomplete the composites appear to perform well The de-viation method used to create these composites has a clearadvantage over the current age-corrected standard scores onwhich more than half of our sample obtained scores at thelower limit currently set at 54 We are conducting analyses toidentify the best option for composite scores below this floor(deviation approach vs extension of existing age-adjusted

Table 3 Test-retest reliability and examination of practice effects

Total Down syndrome Fragile X syndrome Other ID

No ICC (95 CI)Cohend No ICC (95 CI)

Cohend No ICC (95 CI)

Cohend No ICC (95 CI)

Cohend

Flanker 144 074(065ndash081)

009a 56 063(044ndash076)

020 37 084(070ndash091)

minus011 51 069(051ndash081)

015

FlankerDEXT

36 060(033ndash077)

minus001 8 073(018ndash094)

minus033 23 062(028ndash082)

minus005 5 046(minus037ndash092)

056

DCCS 97 071(060ndash080)

013 32 078(058ndash088)

018 23 041(001ndash069)

024 42 082(068ndash090)

003

DCCS DEXT 73 049(030ndash065)

minus016 32 046(014ndash069)

minus017 27 050(016ndash074)

minus013 14 053(003ndash082)

minus021

LS 113 074(064ndash081)

014 37 075(056ndash086)

024 32 065(040ndash081)

minus011 43 069(049ndash082)

022

PC 133 077(062ndash085)

029b 45 074(048ndash086)

039b 40 071(050ndash084)

021a 48 075(054ndash086)

035c

PSM A-A 60 047(025ndash064)

004 22 024(minus022ndash060)

minus014 15 054(007ndash082)

027 23 040(001ndash069)

011

PSM A-B 60 055(034ndash071)

029a 22 033(minus006ndash065)

022 18 054(014ndash080)

025 20 040(minus003ndash071)

034

PVT 186 085(081ndash089)

003 69 087(080ndash092)

008 57 079(066ndash087)

001 60 086(078ndash092)

minus002

OR 183 096(095ndash097)

002 70 095(092ndash097)

001 56 096(093ndash098)

minus001 57 098(097ndash099)

004

FC 97 083(067ndash091)

033b 28 084(052ndash093)

040b 27 079(056ndash090)

034a 42 077(055ndash088)

033c

CC 191 093(091ndash095)

000 73 092(087ndash095)

002 58 094(090ndash096)

minus003 60 091(085ndash094)

minus000

CFC 97 092(085ndash096)

022b 28 093(079ndash097)

023c 27 091(079ndash096)

027a 42 089(078ndash094)

019a

Abbreviations A-A = Form A at visit 1 and Form A at visit 2 A-B = Form A at visit 1 and Form B at visit 2 CC = Crystallized Composite CFC = Cognitive FunctionComposite CI = confidence interval DCCS = Dimensional Change Card Sort DEXT = Developmental Extension FC = Fluid Composite Flanker = FlankerInhibitory Control and Attention ICC = intraclass correlation ID = intellectual disability LS = List SortingWorkingMemory OR =Oral Reading and RecognitionPC = Pattern Comparison Processing Speed PSM = Picture Sequence Memory PVT = Picture Vocabularya p lt 005b p lt 0001c p lt 001

e1236 Neurology | Volume 94 Number 12 | March 24 2020 NeurologyorgN

Table 4 Convergent and discriminant validity of NIHTB-CB tests (Pearson r)

Total Down syndrome Fragile X syndrome Other ID

No Conv Validity No Disc Validity No Conv Validity No Disc Validity No Conv Validity No Disc Validity No Conv Validity No Disc Validity

Flanker 144 minus052a 193 053a 52 minus056a 78 048a 36 minus044b 48 049a 56 minus044a 67 056a

Flanker DEXT 29 minus026 61 036b 7 004 15 051 15 minus031 32 050b 7 minus023 14 034

DCCSc 109 048a 151 046a 34 055a 55 056a 30 033 39 007 45 036d 58 054a

DCCS DEXT 43 042b 113 037a 16 034 48 045b 11 033 40 043b 16 048 25 011

LS 165 065a 164 049a 57 054a 57 047a 46 050a 45 038d 62 066a 62 052a

PC 171 066a 178 045a 54 060a 59 058a 56 048a 57 042b 61 069a 62 036b

PSM 175 047a 179 050a 69 034b 71 050a 49 064a 51 052a 57 033d 58 033d

PVT 235 083a 232 047a 88 075a 86 054a 71 085a 70 050a 76 088a 76 053a

OR 230 092a 227 058a 86 092a 84 062a 70 089a 69 061a 74 095a 74 059a

FC 151 060a 152 061a 53 049a 53 043b 40 047b 40 057a 58 047a 76 071a

CC 237 075a 237 068a 89 068a 89 064a 72 080a 73 068a 59 055a 75 064a

Abbreviations CC = Crystallized Composite Conv = convergent DCCS = Dimensional Change Card Sort DEXT = Developmental Extension Disc = discriminant FC = Fluid Composite Flanker = Flanker Inhibitory Control andAttention ID = intellectual disability LS = List SortingWorkingMemory NEPSY-In =NEPSY Inhibition subtest NIHTB-CB =NIH Toolbox Cognitive Battery OR =Oral Reading and Recognition PC = Pattern Comparison ProcessingSpeed PSM = Picture Sequence Memory PVT = Picture VocabularyConvergent discriminant measure for each test Flanker Conners Kiddie Continuous Performance Test 2nd Edition (KCPT2) Woodcock Johnson 4th Edition Letter-Word Identification (WJ-LW) Flanker DEXT KCPT2 WJ-LWDCCS NEPSY-In WJ-LW DCCS DEXT NEPSY-In WJ-LW LS Stanford-Binet 5th edition (SB-5) Verbal WorkingMemory WJ-LW PC Wechsler Preschool and Primary Scale of Intelligence 4th Edition Bug Search WJ-LW PSM LeiterInternational Performance Scale 3rd Edition Forward Memory (Leiter-FM) WJ-LW PVT Peabody Picture Vocabulary Test 4th edition Leiter-FM OR WJ-LW Leiter-FM Fluid Composite SB-5 Fluid Reasoning IQ SB-5 Verbal IQCrystallized Composite SB-5 Verbal IQ SB-5 Fluid Reasoning IQa p lt 0001b p lt 001c When the DCCS low subgroup (n = 26) was removed in fragile X syndrome the convergent correlation was significant (n = 21 r = 049 p = 002)d p lt 005

Neurolo

gyorgN

Neurology

|Volum

e94N

umber

12|

March

242020e1237

standard scores) These findings provide further evidence thattest developers should consider and address the range andsensitivity of tests and scores for individuals with moderate tosevere ID2436

This study has some important limitations that warrant con-sideration Construct validity challenges are inherent with theID population in that fully adequate convergent validitymeasures are not always available or they present their ownfeasibility and psychometric limitations The lack of cleardiscriminant validity in most measures is another challengeWhile discriminant correlations are generally desired to bemarkedly lower than the convergent correlations in earlydevelopment domains of cognition (especially EFs) arethought to be unidimensional with increasing differentiationof constructs occurring through early adulthood37 In IDthere is likely even less differentiation between domains thanin typically developing children Our discriminant validityresults are similar to normative 3- to 6-year-old results38

Therefore the Fluid Composite may be a good outcomemeasure choice on the basis of its psychometric performanceand some limitations of subtest construct differentiation inthis population The study was also limited by sample size inevaluations of group-specific results Future work with largersamples should provide more clarity about reliability andvalidity within individual ID subgroups

The NIHTB-CB is a promising outcome measure for IDclinical trials and for many types of nonintervention obser-vational studies Although not originally intended for clinicaluse or for special education purposes ongoing and futureresearch may be done to explore such applications23 such as inthe school psychology setting for accurate and feasible as-sessment of students with IDs In addition the SupplementalManual developed from this study provides key guidance toexaminers and researchers working with ID populations theprocedures on administration test environment fidelity andscoring not only improve examiner familiarity and comfortbut more important extend accessibility of the NIHTB-CBto more cognitively impaired or behaviorally challengedindividuals

Besides evaluating the NIHTB-CB as an appropriate assess-ment for ID in general the present results demonstrate thesensitivity of the battery to known syndrome-specific cogni-tive phenotypes such as the impairment of EFs in FXS relativeto other ID and to DS A critical remaining question is thedegree to which the battery is sensitive to change especially toeffects of intervention As an initial test of sensitivity tochange we are currently collecting longitudinal data from

Table 5 Ecologic validity (Pearson r)

Chronologicage

Mentalage FSIQ

VABS-3ABC

Flanker 025a 058a 050a 018b

FlankerDEXT

025 037c 018 010

DCCS 030a 063a 056a 014

DCCS DEXT 014 066a 049a 017

LS 020b 070a 066a 016b

PC 010 059a 057a 019b

PSM 004 058a 058a 027a

PVT 035a 076a 074a 039a

OR 017c 068a 069a 039a

FC mdash mdash 066a 015

CC mdash mdash 076a 044a

CFC mdash mdash 078a 025c

Abbreviations CC = Crystallized Composite CFC = Cognitive FunctionComposite DCCS = Dimensional Change Card Sort DEXT = DevelopmentalExtension FC = Fluid Composite Flanker = Flanker Inhibitory Control andAttention FSIQ = full-scale IQ LS = List Sorting Working Memory OR = OralReading and Recognition PC = Pattern Comparison Processing Speed PSM= Picture Sequence Memory PVT = Picture Vocabulary VABS-3 ABC =Vineland-3 Adaptive Behavior Compositea p lt 0001b p lt 005c p lt 001

Figure 2 Profile plot of NIH Toolbox Cognitive Battery test zscores by group

The z scores on each test (representing group performance relative to thegeneral population average performance) are shown derived from themixed-model analysis of variance adjusted for IQ A z score of 0 (horizontalline at top) represents the average performance in the general populationnormative sample The z scores lt0 represent the number of SDs below thegeneral population average for the chronologic age band Error bars rep-resent 95 confidence intervals Comparison between fragile X syndrome(FXS) and intellectual disability (ID) of other or unknown cause daggerComparisonbetween FXS and Down syndrome p lt 005 p lt 001 p lt 0001 daggerp lt005 daggerdaggerp lt 001 daggerdaggerdaggerp lt 0001 DCCS = Dimensional Change Card Sort LS =List Sorting Working Memory OR = Oral Reading and Recognition PC =Pattern Comparison Processing Speed PSM = Picture Sequence MemoryPVT = Picture Vocabulary

e1238 Neurology | Volume 94 Number 12 | March 24 2020 NeurologyorgN

study participants to explore natural developmental changeswithin each NIHTB-CB test and the composites relative tomeasures already established as change sensitive such as theSB-5 and Vineland The present results warrant the next stepof evaluating the NIHTB-CB in ID for individuals down toa mental age of 5 years to demonstrate the treatment-specificsensitivity of the battery and to determine the degree to whichmeasure gains reflect functional improvements in daily lifeBelow a mental age of 5 years the NIHTB-CB performs morevariably and adaptations to the lower test ranges or scoringadaptations are needed for some measurement domainsStudies of the performance of the battery in older adults withID are needed especially focusing on those experiencingcognitive decline or dementia Overall the present validationresults represent an important step toward providing an ob-jective scalable and standardized method for successfullymeasuring cognition and tracking cognitive changes in ID

AcknowledgmentThe authors extend their appreciation to the families whogave their time and effort to participate in the study and toLeonard Abbeduto LeAnn Baer Ruth McClure Barnes KyleBersted Mikayla Brown Ana Candelaria Erin CarmodyDarian Crowley Suzanne Delap Randi Hagerman AnneHoffmann Londi Howard Paige Landau Caroline LeonczykMichael Nelson Jacklyn Perales Lacey Pomerantz ShanelleRodriguez Melanie Rothfuss Ryan Shickman AndreaSchneider Haleigh Scott Laurel Snider Rachel Teune TaliaThompson Denny Tran and Jamie Woods for theircontributions to the study

Study fundingFunding provided by the National Institute of Child Healthand Human Development (R01HD076189) Health andHuman Services Administration of Developmental Dis-abilities (90DD0596) the MIND Institute Intellectual andDevelopmental Disabilities Research Center (U54HD079125) and the National Center for Advancing Trans-lational Sciences NIH through grant UL1 TR000002

DisclosureR Shields A Kaat F McKenzie A Drayton S Sansone JColeman and C Michalak report no disclosures K Riley hasreceived compensation for consulting to Novartis regardingFXS clinical trials E Berry-Kravis has received funding (allfunding including consulting goes to Rush University MedicalCenter) from Seaside Therapeutics Novartis Roche AlcobraNeuren Cydan Fulcrum GW Neurotrope Marinus AcadiaZynerba BioMarin Ovid Acadia Yamo Ionis Lumos Ultra-genyx Vtesse Sucampo and Mallinkcrodt Pharmaceuticals toconsult on trial design or development strategies andor toconduct clinical trials in FXS or other neurogenetic disordersand from Asuragen Inc to develop testing standards and to dovalidation testing for FMR1 and SMN testing R Gershon andK Widaman report no disclosures D Hessl has receivedfunding for consulting from Novartis Roche Zynerba

Autifony and Ovid pharmaceutical companies regarding FXSclinical trials Go to NeurologyorgN for full disclosures

Publication historyReceived by Neurology August 14 2019 Accepted in final formOctober 31 2019

References1 Olusanya BO Davis AC Wertlieb D et al Developmental disabilities among children

younger than 5 years in 195 countries and territories 1990-2016 a systematic analysis forthe Global Burden of Disease Study 2016 Lancet Glob Health 20186e1100ndashe1121

2 Lee AW Ventola P Budimirovic D Berry-Kravis E Visootsak J Clinical developmentof targeted fragile X syndrome treatments an industry perspective Brain Sci 20188E214

3 Picker JD Walsh CA New innovations therapeutic opportunities for intellectualdisabilities Ann Neurol 201374382ndash390

4 Budimirovic DB Berry-Kravis E Erickson CA et al Updated report on tools tomeasure outcomes of clinical trials in fragile X syndrome J Neurodev Disord 2017914

5 Esbensen AJ Hooper SR Fidler D et al Outcome measures for clinical trials in Downsyndrome Am J Intellect Dev Disabil 2017122247ndash281

Appendix Authors

Name Location Contribution

Rebecca HShields MS

MIND InstituteSacramento CA

Authored the manuscriptperformed the statisticalanalysis and coordinated thestudy

Aaron J KaatPhD

NorthwesternUniversityChicago IL

Directed NIHTB-CB activities forthe study advised on protocoland analysis and authoredportions of the manuscript

Forrest JMcKenzie BS

MIND InstituteSacramento CA

Authored portions of themanuscript

AndreaDrayton BS

MIND InstituteSacramento CA

Authored portions of themanuscript

Stephanie MSansone PhD

MIND InstituteSacramento CA

Assisted in developing protocoland coordinated the study

JeanineColeman PhD

University ofDenver CO

Coordinated assessments at theUniversity of Denver and criticallyreviewed the manuscript

ClaireMichalak BS

Rush UniversityMedical CenterChicago IL

Coordinated assessments andadministered NIHTB-CB at RushUniversity

Richard CGershon PhD

NorthwesternUniversityChicago IL

PI and director of the NIHToolbox and critically reviewedthe manuscript

Karen RileyPhD

University ofDenver CO

PI at the University of Denver andcritically reviewed themanuscript

ElizabethBerry-KravisMD PhD

Rush UniversityMedical CenterChicago IL

PI at Rush University and criticallyreviewed the manuscript

Keith FWidamanPhD

University ofCaliforniaRiverside

Supervised analysis planperformed the statisticalanalysis and critically reviewedthe manuscript

David HesslPhD

MIND InstituteSacramento CA

Designed the study obtainedfunding directed the multisitestudy and authored portions ofthe manuscript

NeurologyorgN Neurology | Volume 94 Number 12 | March 24 2020 e1239

6 GershonRCWagsterMVHendrieHC FoxNA CookKFNowinski CJ NIHToolboxfor assessment of neurological and behavioral function Neurology 201380S2ndashS6

7 Weintraub S Bauer PJ Zelazo PD et al I NIH toolbox cognition battery (CB)introduction and pediatric data Monogr Soc Res Child Dev 2013781ndash15

8 Casaletto KB Umlauf A Beaumont J et al Demographically corrected normativestandards for the English version of the NIH Toolbox Cognition Battery J IntNeuropsychol Soc 201521378ndash391

9 Hessl D Sansone SM Berry-Kravis E et al The NIH Toolbox Cognitive Battery forintellectual disabilities three preliminary studies and future directions J NeurodevDisord 2016835

10 Berry-Kravis E Lindemann L Jonch AE et al Drug development for neuro-developmental disorders lessons learned from fragile X syndrome Nat Rev DrugDiscov 201817280ndash299

11 Parker SE Mai CT Canfield MA et al Updated national birth prevalence estimatesfor selected birth defects in the United States 2004-2006 Birth Defects Res A ClinMol Teratol 2010881008ndash1016

12 Hunter J Rivero-Arias O Angelov A Kim E Fotheringham I Leal J Epidemiology offragile X syndrome a systematic review and meta-analysis Am J Med Genet A 2014164A1648ndash1658

13 Roid GMiller L Stanford-Binet Intelligence Scales 5th ed San Antonio Pearson 200314 Sparrow S Cicchetti D Saulnier C Vineland Adaptive Behavior Scales 3rd ed San

Antonio Pearson 201615 Weintraub S Dikmen SS Heaton RK et al Cognition assessment using the NIH

Toolbox Neurology 201380S54ndashS6416 Korkman M Kirk U Kemp S NEPSY-II A Developmental Neuropsychological

Assessment San Antonio Pearson 200717 Conners C Conners Kiddie Continuous Performance Test 2nd ed Toronto Multi-

Health Systems Inc 201518 Wechsler D Wechsler Preschool and Primary Scale of Intelligence 4th ed San

Antonio Pearson 201219 Roid G Miller L Pomplum M Koch C Leiter International Performance Scale 3rd

ed Wood Dale Stoelting Co 201320 Dunn LDunnD Peabody Picture Vocabulary Test 4th ed San Antonio Pearson 200721 Woodcock R Johnston M Woodcock-Johnson Tests of Achievement 4th ed Itasca

Riverside Publishing 201422 McKenzie FJ Drayton A Shields R et al NIH Toolbox Cognitive Battery supple-

mental administratorrsquos manual for intellectual and developmental disabilities [online]Available at nihtoolboxdeskcomcustomerportalarticles2981718 AccessedSeptember 30 2019

23 Thompson T Coleman JM Riley K et al Standardized assessment accommodationsfor individuals with intellectual disability Contemp Sch Psychol 201822443ndash457

24 Sansone SM Schneider A Bickel E Berry-Kravis E Prescott C Hessl D ImprovingIQ measurement in intellectual disabilities using true deviation from populationnorms J Neurodev Disord 2014616

25 Akshoomoff N Beaumont JL Bauer PJ et al VIII NIH Toolbox Cognition Battery(CB) composite scores of crystallized fluid and overall cognition Monogr Soc ResChild Dev 201378119ndash132

26 Zelazo P Bauer P National Institutes of Health Toolbox cognition battery (NIHToolbox CB) validation for children between 3 and 15 years Monogr Soc Res ChildDev 2013781ndash172

27 Hooper SR Hatton D Sideris J et al Executive functions in young males with fragileX syndrome in comparison to mental age-matched controls baseline findings froma longitudinal study Neuropsychology 20082236ndash47

28 Wilding J Cornish K Munir F Further delineation of the executive deficit in maleswith fragile-X syndrome Neuropsychologia 2002401343ndash1349

29 Pennington BF Moon J Edgin J Stedron J Nadel L The neuropsychology of Downsyndrome evidence for hippocampal dysfunction Child Dev 20037475ndash93

30 Vicari S Bellucci S Carlesimo GA Implicit and explicit memory a functional dis-sociation in persons with Down syndrome Neuropsychologia 200038240ndash251

31 Wechsler D Weschler Intelligence Scale for Children 5th ed Bloomington Pearson2014

32 Abbeduto L Murphy MM Cawthon SW et al Receptive language skills of adoles-cents and young adults with down or fragile X syndrome Am J Ment Retard 2003108149ndash160

33 Godfrey M Lee NR Memory profiles in Down syndrome across developmenta review of memory abilities through the lifespan J Neurodev Disord 2018105

34 Dykens EM Hodapp RM Leckman JF Strengths and weaknesses in the intellectualfunctioning of males with fragile X syndrome Am J Ment Defic 198792234ndash236

35 Carlozzi NE Tulsky DS Kail RV Beaumont JL VI NIH Toolbox Cognition Battery(CB) measuring processing speed Monogr Soc Res Child Dev 20137888ndash102

36 Hessl D Nguyen DV Green C et al A solution to limitations of cognitive testing inchildren with intellectual disabilities the case of fragile X syndrome J NeurodevDisord 2009133ndash45

37 Wiebe SA Espy KA Charak D Using confirmatory factor analysis to understandexecutive control in preschool children I latent structure Dev Psychol 200844575ndash587

38 Mungas D Widaman K Zelazo PD et al VII NIH Toolbox Cognition Battery (CB)factor structure for 3 to 15 year olds Monogr Soc Res Child Dev 201378103ndash118

e1240 Neurology | Volume 94 Number 12 | March 24 2020 NeurologyorgN

DOI 101212WNL0000000000009131202094e1229-e1240 Published Online before print February 24 2020Neurology

Rebecca H Shields Aaron J Kaat Forrest J McKenzie et al Validation of the NIH Toolbox Cognitive Battery in intellectual disability

This information is current as of February 24 2020

ServicesUpdated Information amp

httpnneurologyorgcontent9412e1229fullincluding high resolution figures can be found at

References httpnneurologyorgcontent9412e1229fullref-list-1

This article cites 28 articles 2 of which you can access for free at

Citations httpnneurologyorgcontent9412e1229fullotherarticles

This article has been cited by 1 HighWire-hosted articles

Subspecialty Collections

httpnneurologyorgcgicollectionexecutive_functionExecutive function

httpnneurologyorgcgicollectiondevelopmental_disordersDevelopmental disorders

httpnneurologyorgcgicollectionautismAutismfollowing collection(s) This article along with others on similar topics appears in the

Permissions amp Licensing

httpwwwneurologyorgaboutabout_the_journalpermissionsits entirety can be found online atInformation about reproducing this article in parts (figurestables) or in

Reprints

httpnneurologyorgsubscribersadvertiseInformation about ordering reprints can be found online

ISSN 0028-3878 Online ISSN 1526-632XWolters Kluwer Health Inc on behalf of the American Academy of Neurology All rights reserved Print1951 it is now a weekly with 48 issues per year Copyright Copyright copy 2020 The Author(s) Published by

reg is the official journal of the American Academy of Neurology Published continuously sinceNeurology

Page 8: ValidationoftheNIHToolboxCognitiveBattery in intellectual disability · iPad-based battery of brief memory, executive function (EF), processingspeed,andlanguagetests,wasdevelopedwithinthe

difficulties would likely simplify the testing process and yieldmore comparable and reliable scores

PC showed adequate feasibility overall and good feasibility inFXS and OID with sufficient feasibility at a mental age of ge4years While the ICC was excellent there was a small butsignificant practice effect although smaller than that found inthe normative sample35 The most common reason for lack offeasibility on PC was an invalid alternating response patternespecially common in DS The task has an inherent challengeof understanding ldquosamerdquo and ldquodifferentrdquo while mapping thischoice onto smiley face and frown face options For partic-ipants without invalid response patterns PC performs well inID On the basis of the feasibility challenges we developeda new processing speed task Speeded Matching now avail-able as an experimental version in the app The task is to selectthe animal face among 3 foils that matches a target image

Future psychometric studies will provide more informationabout its performance

The Fluid Crystallized and Cognitive Function compositesdemonstrated reliability results similar to those of the nor-mative age 3 to 6 sample with small practice effects in Fluidand Cognitive Function composites (although smaller than inthe normative sample) Each composite was well correlatedwith FSIQ Although not all participants were able to receivea composite (due to missing valid scores on some tests) whencomplete the composites appear to perform well The de-viation method used to create these composites has a clearadvantage over the current age-corrected standard scores onwhich more than half of our sample obtained scores at thelower limit currently set at 54 We are conducting analyses toidentify the best option for composite scores below this floor(deviation approach vs extension of existing age-adjusted

Table 3 Test-retest reliability and examination of practice effects

Total Down syndrome Fragile X syndrome Other ID

No ICC (95 CI)Cohend No ICC (95 CI)

Cohend No ICC (95 CI)

Cohend No ICC (95 CI)

Cohend

Flanker 144 074(065ndash081)

009a 56 063(044ndash076)

020 37 084(070ndash091)

minus011 51 069(051ndash081)

015

FlankerDEXT

36 060(033ndash077)

minus001 8 073(018ndash094)

minus033 23 062(028ndash082)

minus005 5 046(minus037ndash092)

056

DCCS 97 071(060ndash080)

013 32 078(058ndash088)

018 23 041(001ndash069)

024 42 082(068ndash090)

003

DCCS DEXT 73 049(030ndash065)

minus016 32 046(014ndash069)

minus017 27 050(016ndash074)

minus013 14 053(003ndash082)

minus021

LS 113 074(064ndash081)

014 37 075(056ndash086)

024 32 065(040ndash081)

minus011 43 069(049ndash082)

022

PC 133 077(062ndash085)

029b 45 074(048ndash086)

039b 40 071(050ndash084)

021a 48 075(054ndash086)

035c

PSM A-A 60 047(025ndash064)

004 22 024(minus022ndash060)

minus014 15 054(007ndash082)

027 23 040(001ndash069)

011

PSM A-B 60 055(034ndash071)

029a 22 033(minus006ndash065)

022 18 054(014ndash080)

025 20 040(minus003ndash071)

034

PVT 186 085(081ndash089)

003 69 087(080ndash092)

008 57 079(066ndash087)

001 60 086(078ndash092)

minus002

OR 183 096(095ndash097)

002 70 095(092ndash097)

001 56 096(093ndash098)

minus001 57 098(097ndash099)

004

FC 97 083(067ndash091)

033b 28 084(052ndash093)

040b 27 079(056ndash090)

034a 42 077(055ndash088)

033c

CC 191 093(091ndash095)

000 73 092(087ndash095)

002 58 094(090ndash096)

minus003 60 091(085ndash094)

minus000

CFC 97 092(085ndash096)

022b 28 093(079ndash097)

023c 27 091(079ndash096)

027a 42 089(078ndash094)

019a

Abbreviations A-A = Form A at visit 1 and Form A at visit 2 A-B = Form A at visit 1 and Form B at visit 2 CC = Crystallized Composite CFC = Cognitive FunctionComposite CI = confidence interval DCCS = Dimensional Change Card Sort DEXT = Developmental Extension FC = Fluid Composite Flanker = FlankerInhibitory Control and Attention ICC = intraclass correlation ID = intellectual disability LS = List SortingWorkingMemory OR =Oral Reading and RecognitionPC = Pattern Comparison Processing Speed PSM = Picture Sequence Memory PVT = Picture Vocabularya p lt 005b p lt 0001c p lt 001

e1236 Neurology | Volume 94 Number 12 | March 24 2020 NeurologyorgN

Table 4 Convergent and discriminant validity of NIHTB-CB tests (Pearson r)

Total Down syndrome Fragile X syndrome Other ID

No Conv Validity No Disc Validity No Conv Validity No Disc Validity No Conv Validity No Disc Validity No Conv Validity No Disc Validity

Flanker 144 minus052a 193 053a 52 minus056a 78 048a 36 minus044b 48 049a 56 minus044a 67 056a

Flanker DEXT 29 minus026 61 036b 7 004 15 051 15 minus031 32 050b 7 minus023 14 034

DCCSc 109 048a 151 046a 34 055a 55 056a 30 033 39 007 45 036d 58 054a

DCCS DEXT 43 042b 113 037a 16 034 48 045b 11 033 40 043b 16 048 25 011

LS 165 065a 164 049a 57 054a 57 047a 46 050a 45 038d 62 066a 62 052a

PC 171 066a 178 045a 54 060a 59 058a 56 048a 57 042b 61 069a 62 036b

PSM 175 047a 179 050a 69 034b 71 050a 49 064a 51 052a 57 033d 58 033d

PVT 235 083a 232 047a 88 075a 86 054a 71 085a 70 050a 76 088a 76 053a

OR 230 092a 227 058a 86 092a 84 062a 70 089a 69 061a 74 095a 74 059a

FC 151 060a 152 061a 53 049a 53 043b 40 047b 40 057a 58 047a 76 071a

CC 237 075a 237 068a 89 068a 89 064a 72 080a 73 068a 59 055a 75 064a

Abbreviations CC = Crystallized Composite Conv = convergent DCCS = Dimensional Change Card Sort DEXT = Developmental Extension Disc = discriminant FC = Fluid Composite Flanker = Flanker Inhibitory Control andAttention ID = intellectual disability LS = List SortingWorkingMemory NEPSY-In =NEPSY Inhibition subtest NIHTB-CB =NIH Toolbox Cognitive Battery OR =Oral Reading and Recognition PC = Pattern Comparison ProcessingSpeed PSM = Picture Sequence Memory PVT = Picture VocabularyConvergent discriminant measure for each test Flanker Conners Kiddie Continuous Performance Test 2nd Edition (KCPT2) Woodcock Johnson 4th Edition Letter-Word Identification (WJ-LW) Flanker DEXT KCPT2 WJ-LWDCCS NEPSY-In WJ-LW DCCS DEXT NEPSY-In WJ-LW LS Stanford-Binet 5th edition (SB-5) Verbal WorkingMemory WJ-LW PC Wechsler Preschool and Primary Scale of Intelligence 4th Edition Bug Search WJ-LW PSM LeiterInternational Performance Scale 3rd Edition Forward Memory (Leiter-FM) WJ-LW PVT Peabody Picture Vocabulary Test 4th edition Leiter-FM OR WJ-LW Leiter-FM Fluid Composite SB-5 Fluid Reasoning IQ SB-5 Verbal IQCrystallized Composite SB-5 Verbal IQ SB-5 Fluid Reasoning IQa p lt 0001b p lt 001c When the DCCS low subgroup (n = 26) was removed in fragile X syndrome the convergent correlation was significant (n = 21 r = 049 p = 002)d p lt 005

Neurolo

gyorgN

Neurology

|Volum

e94N

umber

12|

March

242020e1237

standard scores) These findings provide further evidence thattest developers should consider and address the range andsensitivity of tests and scores for individuals with moderate tosevere ID2436

This study has some important limitations that warrant con-sideration Construct validity challenges are inherent with theID population in that fully adequate convergent validitymeasures are not always available or they present their ownfeasibility and psychometric limitations The lack of cleardiscriminant validity in most measures is another challengeWhile discriminant correlations are generally desired to bemarkedly lower than the convergent correlations in earlydevelopment domains of cognition (especially EFs) arethought to be unidimensional with increasing differentiationof constructs occurring through early adulthood37 In IDthere is likely even less differentiation between domains thanin typically developing children Our discriminant validityresults are similar to normative 3- to 6-year-old results38

Therefore the Fluid Composite may be a good outcomemeasure choice on the basis of its psychometric performanceand some limitations of subtest construct differentiation inthis population The study was also limited by sample size inevaluations of group-specific results Future work with largersamples should provide more clarity about reliability andvalidity within individual ID subgroups

The NIHTB-CB is a promising outcome measure for IDclinical trials and for many types of nonintervention obser-vational studies Although not originally intended for clinicaluse or for special education purposes ongoing and futureresearch may be done to explore such applications23 such as inthe school psychology setting for accurate and feasible as-sessment of students with IDs In addition the SupplementalManual developed from this study provides key guidance toexaminers and researchers working with ID populations theprocedures on administration test environment fidelity andscoring not only improve examiner familiarity and comfortbut more important extend accessibility of the NIHTB-CBto more cognitively impaired or behaviorally challengedindividuals

Besides evaluating the NIHTB-CB as an appropriate assess-ment for ID in general the present results demonstrate thesensitivity of the battery to known syndrome-specific cogni-tive phenotypes such as the impairment of EFs in FXS relativeto other ID and to DS A critical remaining question is thedegree to which the battery is sensitive to change especially toeffects of intervention As an initial test of sensitivity tochange we are currently collecting longitudinal data from

Table 5 Ecologic validity (Pearson r)

Chronologicage

Mentalage FSIQ

VABS-3ABC

Flanker 025a 058a 050a 018b

FlankerDEXT

025 037c 018 010

DCCS 030a 063a 056a 014

DCCS DEXT 014 066a 049a 017

LS 020b 070a 066a 016b

PC 010 059a 057a 019b

PSM 004 058a 058a 027a

PVT 035a 076a 074a 039a

OR 017c 068a 069a 039a

FC mdash mdash 066a 015

CC mdash mdash 076a 044a

CFC mdash mdash 078a 025c

Abbreviations CC = Crystallized Composite CFC = Cognitive FunctionComposite DCCS = Dimensional Change Card Sort DEXT = DevelopmentalExtension FC = Fluid Composite Flanker = Flanker Inhibitory Control andAttention FSIQ = full-scale IQ LS = List Sorting Working Memory OR = OralReading and Recognition PC = Pattern Comparison Processing Speed PSM= Picture Sequence Memory PVT = Picture Vocabulary VABS-3 ABC =Vineland-3 Adaptive Behavior Compositea p lt 0001b p lt 005c p lt 001

Figure 2 Profile plot of NIH Toolbox Cognitive Battery test zscores by group

The z scores on each test (representing group performance relative to thegeneral population average performance) are shown derived from themixed-model analysis of variance adjusted for IQ A z score of 0 (horizontalline at top) represents the average performance in the general populationnormative sample The z scores lt0 represent the number of SDs below thegeneral population average for the chronologic age band Error bars rep-resent 95 confidence intervals Comparison between fragile X syndrome(FXS) and intellectual disability (ID) of other or unknown cause daggerComparisonbetween FXS and Down syndrome p lt 005 p lt 001 p lt 0001 daggerp lt005 daggerdaggerp lt 001 daggerdaggerdaggerp lt 0001 DCCS = Dimensional Change Card Sort LS =List Sorting Working Memory OR = Oral Reading and Recognition PC =Pattern Comparison Processing Speed PSM = Picture Sequence MemoryPVT = Picture Vocabulary

e1238 Neurology | Volume 94 Number 12 | March 24 2020 NeurologyorgN

study participants to explore natural developmental changeswithin each NIHTB-CB test and the composites relative tomeasures already established as change sensitive such as theSB-5 and Vineland The present results warrant the next stepof evaluating the NIHTB-CB in ID for individuals down toa mental age of 5 years to demonstrate the treatment-specificsensitivity of the battery and to determine the degree to whichmeasure gains reflect functional improvements in daily lifeBelow a mental age of 5 years the NIHTB-CB performs morevariably and adaptations to the lower test ranges or scoringadaptations are needed for some measurement domainsStudies of the performance of the battery in older adults withID are needed especially focusing on those experiencingcognitive decline or dementia Overall the present validationresults represent an important step toward providing an ob-jective scalable and standardized method for successfullymeasuring cognition and tracking cognitive changes in ID

AcknowledgmentThe authors extend their appreciation to the families whogave their time and effort to participate in the study and toLeonard Abbeduto LeAnn Baer Ruth McClure Barnes KyleBersted Mikayla Brown Ana Candelaria Erin CarmodyDarian Crowley Suzanne Delap Randi Hagerman AnneHoffmann Londi Howard Paige Landau Caroline LeonczykMichael Nelson Jacklyn Perales Lacey Pomerantz ShanelleRodriguez Melanie Rothfuss Ryan Shickman AndreaSchneider Haleigh Scott Laurel Snider Rachel Teune TaliaThompson Denny Tran and Jamie Woods for theircontributions to the study

Study fundingFunding provided by the National Institute of Child Healthand Human Development (R01HD076189) Health andHuman Services Administration of Developmental Dis-abilities (90DD0596) the MIND Institute Intellectual andDevelopmental Disabilities Research Center (U54HD079125) and the National Center for Advancing Trans-lational Sciences NIH through grant UL1 TR000002

DisclosureR Shields A Kaat F McKenzie A Drayton S Sansone JColeman and C Michalak report no disclosures K Riley hasreceived compensation for consulting to Novartis regardingFXS clinical trials E Berry-Kravis has received funding (allfunding including consulting goes to Rush University MedicalCenter) from Seaside Therapeutics Novartis Roche AlcobraNeuren Cydan Fulcrum GW Neurotrope Marinus AcadiaZynerba BioMarin Ovid Acadia Yamo Ionis Lumos Ultra-genyx Vtesse Sucampo and Mallinkcrodt Pharmaceuticals toconsult on trial design or development strategies andor toconduct clinical trials in FXS or other neurogenetic disordersand from Asuragen Inc to develop testing standards and to dovalidation testing for FMR1 and SMN testing R Gershon andK Widaman report no disclosures D Hessl has receivedfunding for consulting from Novartis Roche Zynerba

Autifony and Ovid pharmaceutical companies regarding FXSclinical trials Go to NeurologyorgN for full disclosures

Publication historyReceived by Neurology August 14 2019 Accepted in final formOctober 31 2019

References1 Olusanya BO Davis AC Wertlieb D et al Developmental disabilities among children

younger than 5 years in 195 countries and territories 1990-2016 a systematic analysis forthe Global Burden of Disease Study 2016 Lancet Glob Health 20186e1100ndashe1121

2 Lee AW Ventola P Budimirovic D Berry-Kravis E Visootsak J Clinical developmentof targeted fragile X syndrome treatments an industry perspective Brain Sci 20188E214

3 Picker JD Walsh CA New innovations therapeutic opportunities for intellectualdisabilities Ann Neurol 201374382ndash390

4 Budimirovic DB Berry-Kravis E Erickson CA et al Updated report on tools tomeasure outcomes of clinical trials in fragile X syndrome J Neurodev Disord 2017914

5 Esbensen AJ Hooper SR Fidler D et al Outcome measures for clinical trials in Downsyndrome Am J Intellect Dev Disabil 2017122247ndash281

Appendix Authors

Name Location Contribution

Rebecca HShields MS

MIND InstituteSacramento CA

Authored the manuscriptperformed the statisticalanalysis and coordinated thestudy

Aaron J KaatPhD

NorthwesternUniversityChicago IL

Directed NIHTB-CB activities forthe study advised on protocoland analysis and authoredportions of the manuscript

Forrest JMcKenzie BS

MIND InstituteSacramento CA

Authored portions of themanuscript

AndreaDrayton BS

MIND InstituteSacramento CA

Authored portions of themanuscript

Stephanie MSansone PhD

MIND InstituteSacramento CA

Assisted in developing protocoland coordinated the study

JeanineColeman PhD

University ofDenver CO

Coordinated assessments at theUniversity of Denver and criticallyreviewed the manuscript

ClaireMichalak BS

Rush UniversityMedical CenterChicago IL

Coordinated assessments andadministered NIHTB-CB at RushUniversity

Richard CGershon PhD

NorthwesternUniversityChicago IL

PI and director of the NIHToolbox and critically reviewedthe manuscript

Karen RileyPhD

University ofDenver CO

PI at the University of Denver andcritically reviewed themanuscript

ElizabethBerry-KravisMD PhD

Rush UniversityMedical CenterChicago IL

PI at Rush University and criticallyreviewed the manuscript

Keith FWidamanPhD

University ofCaliforniaRiverside

Supervised analysis planperformed the statisticalanalysis and critically reviewedthe manuscript

David HesslPhD

MIND InstituteSacramento CA

Designed the study obtainedfunding directed the multisitestudy and authored portions ofthe manuscript

NeurologyorgN Neurology | Volume 94 Number 12 | March 24 2020 e1239

6 GershonRCWagsterMVHendrieHC FoxNA CookKFNowinski CJ NIHToolboxfor assessment of neurological and behavioral function Neurology 201380S2ndashS6

7 Weintraub S Bauer PJ Zelazo PD et al I NIH toolbox cognition battery (CB)introduction and pediatric data Monogr Soc Res Child Dev 2013781ndash15

8 Casaletto KB Umlauf A Beaumont J et al Demographically corrected normativestandards for the English version of the NIH Toolbox Cognition Battery J IntNeuropsychol Soc 201521378ndash391

9 Hessl D Sansone SM Berry-Kravis E et al The NIH Toolbox Cognitive Battery forintellectual disabilities three preliminary studies and future directions J NeurodevDisord 2016835

10 Berry-Kravis E Lindemann L Jonch AE et al Drug development for neuro-developmental disorders lessons learned from fragile X syndrome Nat Rev DrugDiscov 201817280ndash299

11 Parker SE Mai CT Canfield MA et al Updated national birth prevalence estimatesfor selected birth defects in the United States 2004-2006 Birth Defects Res A ClinMol Teratol 2010881008ndash1016

12 Hunter J Rivero-Arias O Angelov A Kim E Fotheringham I Leal J Epidemiology offragile X syndrome a systematic review and meta-analysis Am J Med Genet A 2014164A1648ndash1658

13 Roid GMiller L Stanford-Binet Intelligence Scales 5th ed San Antonio Pearson 200314 Sparrow S Cicchetti D Saulnier C Vineland Adaptive Behavior Scales 3rd ed San

Antonio Pearson 201615 Weintraub S Dikmen SS Heaton RK et al Cognition assessment using the NIH

Toolbox Neurology 201380S54ndashS6416 Korkman M Kirk U Kemp S NEPSY-II A Developmental Neuropsychological

Assessment San Antonio Pearson 200717 Conners C Conners Kiddie Continuous Performance Test 2nd ed Toronto Multi-

Health Systems Inc 201518 Wechsler D Wechsler Preschool and Primary Scale of Intelligence 4th ed San

Antonio Pearson 201219 Roid G Miller L Pomplum M Koch C Leiter International Performance Scale 3rd

ed Wood Dale Stoelting Co 201320 Dunn LDunnD Peabody Picture Vocabulary Test 4th ed San Antonio Pearson 200721 Woodcock R Johnston M Woodcock-Johnson Tests of Achievement 4th ed Itasca

Riverside Publishing 201422 McKenzie FJ Drayton A Shields R et al NIH Toolbox Cognitive Battery supple-

mental administratorrsquos manual for intellectual and developmental disabilities [online]Available at nihtoolboxdeskcomcustomerportalarticles2981718 AccessedSeptember 30 2019

23 Thompson T Coleman JM Riley K et al Standardized assessment accommodationsfor individuals with intellectual disability Contemp Sch Psychol 201822443ndash457

24 Sansone SM Schneider A Bickel E Berry-Kravis E Prescott C Hessl D ImprovingIQ measurement in intellectual disabilities using true deviation from populationnorms J Neurodev Disord 2014616

25 Akshoomoff N Beaumont JL Bauer PJ et al VIII NIH Toolbox Cognition Battery(CB) composite scores of crystallized fluid and overall cognition Monogr Soc ResChild Dev 201378119ndash132

26 Zelazo P Bauer P National Institutes of Health Toolbox cognition battery (NIHToolbox CB) validation for children between 3 and 15 years Monogr Soc Res ChildDev 2013781ndash172

27 Hooper SR Hatton D Sideris J et al Executive functions in young males with fragileX syndrome in comparison to mental age-matched controls baseline findings froma longitudinal study Neuropsychology 20082236ndash47

28 Wilding J Cornish K Munir F Further delineation of the executive deficit in maleswith fragile-X syndrome Neuropsychologia 2002401343ndash1349

29 Pennington BF Moon J Edgin J Stedron J Nadel L The neuropsychology of Downsyndrome evidence for hippocampal dysfunction Child Dev 20037475ndash93

30 Vicari S Bellucci S Carlesimo GA Implicit and explicit memory a functional dis-sociation in persons with Down syndrome Neuropsychologia 200038240ndash251

31 Wechsler D Weschler Intelligence Scale for Children 5th ed Bloomington Pearson2014

32 Abbeduto L Murphy MM Cawthon SW et al Receptive language skills of adoles-cents and young adults with down or fragile X syndrome Am J Ment Retard 2003108149ndash160

33 Godfrey M Lee NR Memory profiles in Down syndrome across developmenta review of memory abilities through the lifespan J Neurodev Disord 2018105

34 Dykens EM Hodapp RM Leckman JF Strengths and weaknesses in the intellectualfunctioning of males with fragile X syndrome Am J Ment Defic 198792234ndash236

35 Carlozzi NE Tulsky DS Kail RV Beaumont JL VI NIH Toolbox Cognition Battery(CB) measuring processing speed Monogr Soc Res Child Dev 20137888ndash102

36 Hessl D Nguyen DV Green C et al A solution to limitations of cognitive testing inchildren with intellectual disabilities the case of fragile X syndrome J NeurodevDisord 2009133ndash45

37 Wiebe SA Espy KA Charak D Using confirmatory factor analysis to understandexecutive control in preschool children I latent structure Dev Psychol 200844575ndash587

38 Mungas D Widaman K Zelazo PD et al VII NIH Toolbox Cognition Battery (CB)factor structure for 3 to 15 year olds Monogr Soc Res Child Dev 201378103ndash118

e1240 Neurology | Volume 94 Number 12 | March 24 2020 NeurologyorgN

DOI 101212WNL0000000000009131202094e1229-e1240 Published Online before print February 24 2020Neurology

Rebecca H Shields Aaron J Kaat Forrest J McKenzie et al Validation of the NIH Toolbox Cognitive Battery in intellectual disability

This information is current as of February 24 2020

ServicesUpdated Information amp

httpnneurologyorgcontent9412e1229fullincluding high resolution figures can be found at

References httpnneurologyorgcontent9412e1229fullref-list-1

This article cites 28 articles 2 of which you can access for free at

Citations httpnneurologyorgcontent9412e1229fullotherarticles

This article has been cited by 1 HighWire-hosted articles

Subspecialty Collections

httpnneurologyorgcgicollectionexecutive_functionExecutive function

httpnneurologyorgcgicollectiondevelopmental_disordersDevelopmental disorders

httpnneurologyorgcgicollectionautismAutismfollowing collection(s) This article along with others on similar topics appears in the

Permissions amp Licensing

httpwwwneurologyorgaboutabout_the_journalpermissionsits entirety can be found online atInformation about reproducing this article in parts (figurestables) or in

Reprints

httpnneurologyorgsubscribersadvertiseInformation about ordering reprints can be found online

ISSN 0028-3878 Online ISSN 1526-632XWolters Kluwer Health Inc on behalf of the American Academy of Neurology All rights reserved Print1951 it is now a weekly with 48 issues per year Copyright Copyright copy 2020 The Author(s) Published by

reg is the official journal of the American Academy of Neurology Published continuously sinceNeurology

Page 9: ValidationoftheNIHToolboxCognitiveBattery in intellectual disability · iPad-based battery of brief memory, executive function (EF), processingspeed,andlanguagetests,wasdevelopedwithinthe

Table 4 Convergent and discriminant validity of NIHTB-CB tests (Pearson r)

Total Down syndrome Fragile X syndrome Other ID

No Conv Validity No Disc Validity No Conv Validity No Disc Validity No Conv Validity No Disc Validity No Conv Validity No Disc Validity

Flanker 144 minus052a 193 053a 52 minus056a 78 048a 36 minus044b 48 049a 56 minus044a 67 056a

Flanker DEXT 29 minus026 61 036b 7 004 15 051 15 minus031 32 050b 7 minus023 14 034

DCCSc 109 048a 151 046a 34 055a 55 056a 30 033 39 007 45 036d 58 054a

DCCS DEXT 43 042b 113 037a 16 034 48 045b 11 033 40 043b 16 048 25 011

LS 165 065a 164 049a 57 054a 57 047a 46 050a 45 038d 62 066a 62 052a

PC 171 066a 178 045a 54 060a 59 058a 56 048a 57 042b 61 069a 62 036b

PSM 175 047a 179 050a 69 034b 71 050a 49 064a 51 052a 57 033d 58 033d

PVT 235 083a 232 047a 88 075a 86 054a 71 085a 70 050a 76 088a 76 053a

OR 230 092a 227 058a 86 092a 84 062a 70 089a 69 061a 74 095a 74 059a

FC 151 060a 152 061a 53 049a 53 043b 40 047b 40 057a 58 047a 76 071a

CC 237 075a 237 068a 89 068a 89 064a 72 080a 73 068a 59 055a 75 064a

Abbreviations CC = Crystallized Composite Conv = convergent DCCS = Dimensional Change Card Sort DEXT = Developmental Extension Disc = discriminant FC = Fluid Composite Flanker = Flanker Inhibitory Control andAttention ID = intellectual disability LS = List SortingWorkingMemory NEPSY-In =NEPSY Inhibition subtest NIHTB-CB =NIH Toolbox Cognitive Battery OR =Oral Reading and Recognition PC = Pattern Comparison ProcessingSpeed PSM = Picture Sequence Memory PVT = Picture VocabularyConvergent discriminant measure for each test Flanker Conners Kiddie Continuous Performance Test 2nd Edition (KCPT2) Woodcock Johnson 4th Edition Letter-Word Identification (WJ-LW) Flanker DEXT KCPT2 WJ-LWDCCS NEPSY-In WJ-LW DCCS DEXT NEPSY-In WJ-LW LS Stanford-Binet 5th edition (SB-5) Verbal WorkingMemory WJ-LW PC Wechsler Preschool and Primary Scale of Intelligence 4th Edition Bug Search WJ-LW PSM LeiterInternational Performance Scale 3rd Edition Forward Memory (Leiter-FM) WJ-LW PVT Peabody Picture Vocabulary Test 4th edition Leiter-FM OR WJ-LW Leiter-FM Fluid Composite SB-5 Fluid Reasoning IQ SB-5 Verbal IQCrystallized Composite SB-5 Verbal IQ SB-5 Fluid Reasoning IQa p lt 0001b p lt 001c When the DCCS low subgroup (n = 26) was removed in fragile X syndrome the convergent correlation was significant (n = 21 r = 049 p = 002)d p lt 005

Neurolo

gyorgN

Neurology

|Volum

e94N

umber

12|

March

242020e1237

standard scores) These findings provide further evidence thattest developers should consider and address the range andsensitivity of tests and scores for individuals with moderate tosevere ID2436

This study has some important limitations that warrant con-sideration Construct validity challenges are inherent with theID population in that fully adequate convergent validitymeasures are not always available or they present their ownfeasibility and psychometric limitations The lack of cleardiscriminant validity in most measures is another challengeWhile discriminant correlations are generally desired to bemarkedly lower than the convergent correlations in earlydevelopment domains of cognition (especially EFs) arethought to be unidimensional with increasing differentiationof constructs occurring through early adulthood37 In IDthere is likely even less differentiation between domains thanin typically developing children Our discriminant validityresults are similar to normative 3- to 6-year-old results38

Therefore the Fluid Composite may be a good outcomemeasure choice on the basis of its psychometric performanceand some limitations of subtest construct differentiation inthis population The study was also limited by sample size inevaluations of group-specific results Future work with largersamples should provide more clarity about reliability andvalidity within individual ID subgroups

The NIHTB-CB is a promising outcome measure for IDclinical trials and for many types of nonintervention obser-vational studies Although not originally intended for clinicaluse or for special education purposes ongoing and futureresearch may be done to explore such applications23 such as inthe school psychology setting for accurate and feasible as-sessment of students with IDs In addition the SupplementalManual developed from this study provides key guidance toexaminers and researchers working with ID populations theprocedures on administration test environment fidelity andscoring not only improve examiner familiarity and comfortbut more important extend accessibility of the NIHTB-CBto more cognitively impaired or behaviorally challengedindividuals

Besides evaluating the NIHTB-CB as an appropriate assess-ment for ID in general the present results demonstrate thesensitivity of the battery to known syndrome-specific cogni-tive phenotypes such as the impairment of EFs in FXS relativeto other ID and to DS A critical remaining question is thedegree to which the battery is sensitive to change especially toeffects of intervention As an initial test of sensitivity tochange we are currently collecting longitudinal data from

Table 5 Ecologic validity (Pearson r)

Chronologicage

Mentalage FSIQ

VABS-3ABC

Flanker 025a 058a 050a 018b

FlankerDEXT

025 037c 018 010

DCCS 030a 063a 056a 014

DCCS DEXT 014 066a 049a 017

LS 020b 070a 066a 016b

PC 010 059a 057a 019b

PSM 004 058a 058a 027a

PVT 035a 076a 074a 039a

OR 017c 068a 069a 039a

FC mdash mdash 066a 015

CC mdash mdash 076a 044a

CFC mdash mdash 078a 025c

Abbreviations CC = Crystallized Composite CFC = Cognitive FunctionComposite DCCS = Dimensional Change Card Sort DEXT = DevelopmentalExtension FC = Fluid Composite Flanker = Flanker Inhibitory Control andAttention FSIQ = full-scale IQ LS = List Sorting Working Memory OR = OralReading and Recognition PC = Pattern Comparison Processing Speed PSM= Picture Sequence Memory PVT = Picture Vocabulary VABS-3 ABC =Vineland-3 Adaptive Behavior Compositea p lt 0001b p lt 005c p lt 001

Figure 2 Profile plot of NIH Toolbox Cognitive Battery test zscores by group

The z scores on each test (representing group performance relative to thegeneral population average performance) are shown derived from themixed-model analysis of variance adjusted for IQ A z score of 0 (horizontalline at top) represents the average performance in the general populationnormative sample The z scores lt0 represent the number of SDs below thegeneral population average for the chronologic age band Error bars rep-resent 95 confidence intervals Comparison between fragile X syndrome(FXS) and intellectual disability (ID) of other or unknown cause daggerComparisonbetween FXS and Down syndrome p lt 005 p lt 001 p lt 0001 daggerp lt005 daggerdaggerp lt 001 daggerdaggerdaggerp lt 0001 DCCS = Dimensional Change Card Sort LS =List Sorting Working Memory OR = Oral Reading and Recognition PC =Pattern Comparison Processing Speed PSM = Picture Sequence MemoryPVT = Picture Vocabulary

e1238 Neurology | Volume 94 Number 12 | March 24 2020 NeurologyorgN

study participants to explore natural developmental changeswithin each NIHTB-CB test and the composites relative tomeasures already established as change sensitive such as theSB-5 and Vineland The present results warrant the next stepof evaluating the NIHTB-CB in ID for individuals down toa mental age of 5 years to demonstrate the treatment-specificsensitivity of the battery and to determine the degree to whichmeasure gains reflect functional improvements in daily lifeBelow a mental age of 5 years the NIHTB-CB performs morevariably and adaptations to the lower test ranges or scoringadaptations are needed for some measurement domainsStudies of the performance of the battery in older adults withID are needed especially focusing on those experiencingcognitive decline or dementia Overall the present validationresults represent an important step toward providing an ob-jective scalable and standardized method for successfullymeasuring cognition and tracking cognitive changes in ID

AcknowledgmentThe authors extend their appreciation to the families whogave their time and effort to participate in the study and toLeonard Abbeduto LeAnn Baer Ruth McClure Barnes KyleBersted Mikayla Brown Ana Candelaria Erin CarmodyDarian Crowley Suzanne Delap Randi Hagerman AnneHoffmann Londi Howard Paige Landau Caroline LeonczykMichael Nelson Jacklyn Perales Lacey Pomerantz ShanelleRodriguez Melanie Rothfuss Ryan Shickman AndreaSchneider Haleigh Scott Laurel Snider Rachel Teune TaliaThompson Denny Tran and Jamie Woods for theircontributions to the study

Study fundingFunding provided by the National Institute of Child Healthand Human Development (R01HD076189) Health andHuman Services Administration of Developmental Dis-abilities (90DD0596) the MIND Institute Intellectual andDevelopmental Disabilities Research Center (U54HD079125) and the National Center for Advancing Trans-lational Sciences NIH through grant UL1 TR000002

DisclosureR Shields A Kaat F McKenzie A Drayton S Sansone JColeman and C Michalak report no disclosures K Riley hasreceived compensation for consulting to Novartis regardingFXS clinical trials E Berry-Kravis has received funding (allfunding including consulting goes to Rush University MedicalCenter) from Seaside Therapeutics Novartis Roche AlcobraNeuren Cydan Fulcrum GW Neurotrope Marinus AcadiaZynerba BioMarin Ovid Acadia Yamo Ionis Lumos Ultra-genyx Vtesse Sucampo and Mallinkcrodt Pharmaceuticals toconsult on trial design or development strategies andor toconduct clinical trials in FXS or other neurogenetic disordersand from Asuragen Inc to develop testing standards and to dovalidation testing for FMR1 and SMN testing R Gershon andK Widaman report no disclosures D Hessl has receivedfunding for consulting from Novartis Roche Zynerba

Autifony and Ovid pharmaceutical companies regarding FXSclinical trials Go to NeurologyorgN for full disclosures

Publication historyReceived by Neurology August 14 2019 Accepted in final formOctober 31 2019

References1 Olusanya BO Davis AC Wertlieb D et al Developmental disabilities among children

younger than 5 years in 195 countries and territories 1990-2016 a systematic analysis forthe Global Burden of Disease Study 2016 Lancet Glob Health 20186e1100ndashe1121

2 Lee AW Ventola P Budimirovic D Berry-Kravis E Visootsak J Clinical developmentof targeted fragile X syndrome treatments an industry perspective Brain Sci 20188E214

3 Picker JD Walsh CA New innovations therapeutic opportunities for intellectualdisabilities Ann Neurol 201374382ndash390

4 Budimirovic DB Berry-Kravis E Erickson CA et al Updated report on tools tomeasure outcomes of clinical trials in fragile X syndrome J Neurodev Disord 2017914

5 Esbensen AJ Hooper SR Fidler D et al Outcome measures for clinical trials in Downsyndrome Am J Intellect Dev Disabil 2017122247ndash281

Appendix Authors

Name Location Contribution

Rebecca HShields MS

MIND InstituteSacramento CA

Authored the manuscriptperformed the statisticalanalysis and coordinated thestudy

Aaron J KaatPhD

NorthwesternUniversityChicago IL

Directed NIHTB-CB activities forthe study advised on protocoland analysis and authoredportions of the manuscript

Forrest JMcKenzie BS

MIND InstituteSacramento CA

Authored portions of themanuscript

AndreaDrayton BS

MIND InstituteSacramento CA

Authored portions of themanuscript

Stephanie MSansone PhD

MIND InstituteSacramento CA

Assisted in developing protocoland coordinated the study

JeanineColeman PhD

University ofDenver CO

Coordinated assessments at theUniversity of Denver and criticallyreviewed the manuscript

ClaireMichalak BS

Rush UniversityMedical CenterChicago IL

Coordinated assessments andadministered NIHTB-CB at RushUniversity

Richard CGershon PhD

NorthwesternUniversityChicago IL

PI and director of the NIHToolbox and critically reviewedthe manuscript

Karen RileyPhD

University ofDenver CO

PI at the University of Denver andcritically reviewed themanuscript

ElizabethBerry-KravisMD PhD

Rush UniversityMedical CenterChicago IL

PI at Rush University and criticallyreviewed the manuscript

Keith FWidamanPhD

University ofCaliforniaRiverside

Supervised analysis planperformed the statisticalanalysis and critically reviewedthe manuscript

David HesslPhD

MIND InstituteSacramento CA

Designed the study obtainedfunding directed the multisitestudy and authored portions ofthe manuscript

NeurologyorgN Neurology | Volume 94 Number 12 | March 24 2020 e1239

6 GershonRCWagsterMVHendrieHC FoxNA CookKFNowinski CJ NIHToolboxfor assessment of neurological and behavioral function Neurology 201380S2ndashS6

7 Weintraub S Bauer PJ Zelazo PD et al I NIH toolbox cognition battery (CB)introduction and pediatric data Monogr Soc Res Child Dev 2013781ndash15

8 Casaletto KB Umlauf A Beaumont J et al Demographically corrected normativestandards for the English version of the NIH Toolbox Cognition Battery J IntNeuropsychol Soc 201521378ndash391

9 Hessl D Sansone SM Berry-Kravis E et al The NIH Toolbox Cognitive Battery forintellectual disabilities three preliminary studies and future directions J NeurodevDisord 2016835

10 Berry-Kravis E Lindemann L Jonch AE et al Drug development for neuro-developmental disorders lessons learned from fragile X syndrome Nat Rev DrugDiscov 201817280ndash299

11 Parker SE Mai CT Canfield MA et al Updated national birth prevalence estimatesfor selected birth defects in the United States 2004-2006 Birth Defects Res A ClinMol Teratol 2010881008ndash1016

12 Hunter J Rivero-Arias O Angelov A Kim E Fotheringham I Leal J Epidemiology offragile X syndrome a systematic review and meta-analysis Am J Med Genet A 2014164A1648ndash1658

13 Roid GMiller L Stanford-Binet Intelligence Scales 5th ed San Antonio Pearson 200314 Sparrow S Cicchetti D Saulnier C Vineland Adaptive Behavior Scales 3rd ed San

Antonio Pearson 201615 Weintraub S Dikmen SS Heaton RK et al Cognition assessment using the NIH

Toolbox Neurology 201380S54ndashS6416 Korkman M Kirk U Kemp S NEPSY-II A Developmental Neuropsychological

Assessment San Antonio Pearson 200717 Conners C Conners Kiddie Continuous Performance Test 2nd ed Toronto Multi-

Health Systems Inc 201518 Wechsler D Wechsler Preschool and Primary Scale of Intelligence 4th ed San

Antonio Pearson 201219 Roid G Miller L Pomplum M Koch C Leiter International Performance Scale 3rd

ed Wood Dale Stoelting Co 201320 Dunn LDunnD Peabody Picture Vocabulary Test 4th ed San Antonio Pearson 200721 Woodcock R Johnston M Woodcock-Johnson Tests of Achievement 4th ed Itasca

Riverside Publishing 201422 McKenzie FJ Drayton A Shields R et al NIH Toolbox Cognitive Battery supple-

mental administratorrsquos manual for intellectual and developmental disabilities [online]Available at nihtoolboxdeskcomcustomerportalarticles2981718 AccessedSeptember 30 2019

23 Thompson T Coleman JM Riley K et al Standardized assessment accommodationsfor individuals with intellectual disability Contemp Sch Psychol 201822443ndash457

24 Sansone SM Schneider A Bickel E Berry-Kravis E Prescott C Hessl D ImprovingIQ measurement in intellectual disabilities using true deviation from populationnorms J Neurodev Disord 2014616

25 Akshoomoff N Beaumont JL Bauer PJ et al VIII NIH Toolbox Cognition Battery(CB) composite scores of crystallized fluid and overall cognition Monogr Soc ResChild Dev 201378119ndash132

26 Zelazo P Bauer P National Institutes of Health Toolbox cognition battery (NIHToolbox CB) validation for children between 3 and 15 years Monogr Soc Res ChildDev 2013781ndash172

27 Hooper SR Hatton D Sideris J et al Executive functions in young males with fragileX syndrome in comparison to mental age-matched controls baseline findings froma longitudinal study Neuropsychology 20082236ndash47

28 Wilding J Cornish K Munir F Further delineation of the executive deficit in maleswith fragile-X syndrome Neuropsychologia 2002401343ndash1349

29 Pennington BF Moon J Edgin J Stedron J Nadel L The neuropsychology of Downsyndrome evidence for hippocampal dysfunction Child Dev 20037475ndash93

30 Vicari S Bellucci S Carlesimo GA Implicit and explicit memory a functional dis-sociation in persons with Down syndrome Neuropsychologia 200038240ndash251

31 Wechsler D Weschler Intelligence Scale for Children 5th ed Bloomington Pearson2014

32 Abbeduto L Murphy MM Cawthon SW et al Receptive language skills of adoles-cents and young adults with down or fragile X syndrome Am J Ment Retard 2003108149ndash160

33 Godfrey M Lee NR Memory profiles in Down syndrome across developmenta review of memory abilities through the lifespan J Neurodev Disord 2018105

34 Dykens EM Hodapp RM Leckman JF Strengths and weaknesses in the intellectualfunctioning of males with fragile X syndrome Am J Ment Defic 198792234ndash236

35 Carlozzi NE Tulsky DS Kail RV Beaumont JL VI NIH Toolbox Cognition Battery(CB) measuring processing speed Monogr Soc Res Child Dev 20137888ndash102

36 Hessl D Nguyen DV Green C et al A solution to limitations of cognitive testing inchildren with intellectual disabilities the case of fragile X syndrome J NeurodevDisord 2009133ndash45

37 Wiebe SA Espy KA Charak D Using confirmatory factor analysis to understandexecutive control in preschool children I latent structure Dev Psychol 200844575ndash587

38 Mungas D Widaman K Zelazo PD et al VII NIH Toolbox Cognition Battery (CB)factor structure for 3 to 15 year olds Monogr Soc Res Child Dev 201378103ndash118

e1240 Neurology | Volume 94 Number 12 | March 24 2020 NeurologyorgN

DOI 101212WNL0000000000009131202094e1229-e1240 Published Online before print February 24 2020Neurology

Rebecca H Shields Aaron J Kaat Forrest J McKenzie et al Validation of the NIH Toolbox Cognitive Battery in intellectual disability

This information is current as of February 24 2020

ServicesUpdated Information amp

httpnneurologyorgcontent9412e1229fullincluding high resolution figures can be found at

References httpnneurologyorgcontent9412e1229fullref-list-1

This article cites 28 articles 2 of which you can access for free at

Citations httpnneurologyorgcontent9412e1229fullotherarticles

This article has been cited by 1 HighWire-hosted articles

Subspecialty Collections

httpnneurologyorgcgicollectionexecutive_functionExecutive function

httpnneurologyorgcgicollectiondevelopmental_disordersDevelopmental disorders

httpnneurologyorgcgicollectionautismAutismfollowing collection(s) This article along with others on similar topics appears in the

Permissions amp Licensing

httpwwwneurologyorgaboutabout_the_journalpermissionsits entirety can be found online atInformation about reproducing this article in parts (figurestables) or in

Reprints

httpnneurologyorgsubscribersadvertiseInformation about ordering reprints can be found online

ISSN 0028-3878 Online ISSN 1526-632XWolters Kluwer Health Inc on behalf of the American Academy of Neurology All rights reserved Print1951 it is now a weekly with 48 issues per year Copyright Copyright copy 2020 The Author(s) Published by

reg is the official journal of the American Academy of Neurology Published continuously sinceNeurology

Page 10: ValidationoftheNIHToolboxCognitiveBattery in intellectual disability · iPad-based battery of brief memory, executive function (EF), processingspeed,andlanguagetests,wasdevelopedwithinthe

standard scores) These findings provide further evidence thattest developers should consider and address the range andsensitivity of tests and scores for individuals with moderate tosevere ID2436

This study has some important limitations that warrant con-sideration Construct validity challenges are inherent with theID population in that fully adequate convergent validitymeasures are not always available or they present their ownfeasibility and psychometric limitations The lack of cleardiscriminant validity in most measures is another challengeWhile discriminant correlations are generally desired to bemarkedly lower than the convergent correlations in earlydevelopment domains of cognition (especially EFs) arethought to be unidimensional with increasing differentiationof constructs occurring through early adulthood37 In IDthere is likely even less differentiation between domains thanin typically developing children Our discriminant validityresults are similar to normative 3- to 6-year-old results38

Therefore the Fluid Composite may be a good outcomemeasure choice on the basis of its psychometric performanceand some limitations of subtest construct differentiation inthis population The study was also limited by sample size inevaluations of group-specific results Future work with largersamples should provide more clarity about reliability andvalidity within individual ID subgroups

The NIHTB-CB is a promising outcome measure for IDclinical trials and for many types of nonintervention obser-vational studies Although not originally intended for clinicaluse or for special education purposes ongoing and futureresearch may be done to explore such applications23 such as inthe school psychology setting for accurate and feasible as-sessment of students with IDs In addition the SupplementalManual developed from this study provides key guidance toexaminers and researchers working with ID populations theprocedures on administration test environment fidelity andscoring not only improve examiner familiarity and comfortbut more important extend accessibility of the NIHTB-CBto more cognitively impaired or behaviorally challengedindividuals

Besides evaluating the NIHTB-CB as an appropriate assess-ment for ID in general the present results demonstrate thesensitivity of the battery to known syndrome-specific cogni-tive phenotypes such as the impairment of EFs in FXS relativeto other ID and to DS A critical remaining question is thedegree to which the battery is sensitive to change especially toeffects of intervention As an initial test of sensitivity tochange we are currently collecting longitudinal data from

Table 5 Ecologic validity (Pearson r)

Chronologicage

Mentalage FSIQ

VABS-3ABC

Flanker 025a 058a 050a 018b

FlankerDEXT

025 037c 018 010

DCCS 030a 063a 056a 014

DCCS DEXT 014 066a 049a 017

LS 020b 070a 066a 016b

PC 010 059a 057a 019b

PSM 004 058a 058a 027a

PVT 035a 076a 074a 039a

OR 017c 068a 069a 039a

FC mdash mdash 066a 015

CC mdash mdash 076a 044a

CFC mdash mdash 078a 025c

Abbreviations CC = Crystallized Composite CFC = Cognitive FunctionComposite DCCS = Dimensional Change Card Sort DEXT = DevelopmentalExtension FC = Fluid Composite Flanker = Flanker Inhibitory Control andAttention FSIQ = full-scale IQ LS = List Sorting Working Memory OR = OralReading and Recognition PC = Pattern Comparison Processing Speed PSM= Picture Sequence Memory PVT = Picture Vocabulary VABS-3 ABC =Vineland-3 Adaptive Behavior Compositea p lt 0001b p lt 005c p lt 001

Figure 2 Profile plot of NIH Toolbox Cognitive Battery test zscores by group

The z scores on each test (representing group performance relative to thegeneral population average performance) are shown derived from themixed-model analysis of variance adjusted for IQ A z score of 0 (horizontalline at top) represents the average performance in the general populationnormative sample The z scores lt0 represent the number of SDs below thegeneral population average for the chronologic age band Error bars rep-resent 95 confidence intervals Comparison between fragile X syndrome(FXS) and intellectual disability (ID) of other or unknown cause daggerComparisonbetween FXS and Down syndrome p lt 005 p lt 001 p lt 0001 daggerp lt005 daggerdaggerp lt 001 daggerdaggerdaggerp lt 0001 DCCS = Dimensional Change Card Sort LS =List Sorting Working Memory OR = Oral Reading and Recognition PC =Pattern Comparison Processing Speed PSM = Picture Sequence MemoryPVT = Picture Vocabulary

e1238 Neurology | Volume 94 Number 12 | March 24 2020 NeurologyorgN

study participants to explore natural developmental changeswithin each NIHTB-CB test and the composites relative tomeasures already established as change sensitive such as theSB-5 and Vineland The present results warrant the next stepof evaluating the NIHTB-CB in ID for individuals down toa mental age of 5 years to demonstrate the treatment-specificsensitivity of the battery and to determine the degree to whichmeasure gains reflect functional improvements in daily lifeBelow a mental age of 5 years the NIHTB-CB performs morevariably and adaptations to the lower test ranges or scoringadaptations are needed for some measurement domainsStudies of the performance of the battery in older adults withID are needed especially focusing on those experiencingcognitive decline or dementia Overall the present validationresults represent an important step toward providing an ob-jective scalable and standardized method for successfullymeasuring cognition and tracking cognitive changes in ID

AcknowledgmentThe authors extend their appreciation to the families whogave their time and effort to participate in the study and toLeonard Abbeduto LeAnn Baer Ruth McClure Barnes KyleBersted Mikayla Brown Ana Candelaria Erin CarmodyDarian Crowley Suzanne Delap Randi Hagerman AnneHoffmann Londi Howard Paige Landau Caroline LeonczykMichael Nelson Jacklyn Perales Lacey Pomerantz ShanelleRodriguez Melanie Rothfuss Ryan Shickman AndreaSchneider Haleigh Scott Laurel Snider Rachel Teune TaliaThompson Denny Tran and Jamie Woods for theircontributions to the study

Study fundingFunding provided by the National Institute of Child Healthand Human Development (R01HD076189) Health andHuman Services Administration of Developmental Dis-abilities (90DD0596) the MIND Institute Intellectual andDevelopmental Disabilities Research Center (U54HD079125) and the National Center for Advancing Trans-lational Sciences NIH through grant UL1 TR000002

DisclosureR Shields A Kaat F McKenzie A Drayton S Sansone JColeman and C Michalak report no disclosures K Riley hasreceived compensation for consulting to Novartis regardingFXS clinical trials E Berry-Kravis has received funding (allfunding including consulting goes to Rush University MedicalCenter) from Seaside Therapeutics Novartis Roche AlcobraNeuren Cydan Fulcrum GW Neurotrope Marinus AcadiaZynerba BioMarin Ovid Acadia Yamo Ionis Lumos Ultra-genyx Vtesse Sucampo and Mallinkcrodt Pharmaceuticals toconsult on trial design or development strategies andor toconduct clinical trials in FXS or other neurogenetic disordersand from Asuragen Inc to develop testing standards and to dovalidation testing for FMR1 and SMN testing R Gershon andK Widaman report no disclosures D Hessl has receivedfunding for consulting from Novartis Roche Zynerba

Autifony and Ovid pharmaceutical companies regarding FXSclinical trials Go to NeurologyorgN for full disclosures

Publication historyReceived by Neurology August 14 2019 Accepted in final formOctober 31 2019

References1 Olusanya BO Davis AC Wertlieb D et al Developmental disabilities among children

younger than 5 years in 195 countries and territories 1990-2016 a systematic analysis forthe Global Burden of Disease Study 2016 Lancet Glob Health 20186e1100ndashe1121

2 Lee AW Ventola P Budimirovic D Berry-Kravis E Visootsak J Clinical developmentof targeted fragile X syndrome treatments an industry perspective Brain Sci 20188E214

3 Picker JD Walsh CA New innovations therapeutic opportunities for intellectualdisabilities Ann Neurol 201374382ndash390

4 Budimirovic DB Berry-Kravis E Erickson CA et al Updated report on tools tomeasure outcomes of clinical trials in fragile X syndrome J Neurodev Disord 2017914

5 Esbensen AJ Hooper SR Fidler D et al Outcome measures for clinical trials in Downsyndrome Am J Intellect Dev Disabil 2017122247ndash281

Appendix Authors

Name Location Contribution

Rebecca HShields MS

MIND InstituteSacramento CA

Authored the manuscriptperformed the statisticalanalysis and coordinated thestudy

Aaron J KaatPhD

NorthwesternUniversityChicago IL

Directed NIHTB-CB activities forthe study advised on protocoland analysis and authoredportions of the manuscript

Forrest JMcKenzie BS

MIND InstituteSacramento CA

Authored portions of themanuscript

AndreaDrayton BS

MIND InstituteSacramento CA

Authored portions of themanuscript

Stephanie MSansone PhD

MIND InstituteSacramento CA

Assisted in developing protocoland coordinated the study

JeanineColeman PhD

University ofDenver CO

Coordinated assessments at theUniversity of Denver and criticallyreviewed the manuscript

ClaireMichalak BS

Rush UniversityMedical CenterChicago IL

Coordinated assessments andadministered NIHTB-CB at RushUniversity

Richard CGershon PhD

NorthwesternUniversityChicago IL

PI and director of the NIHToolbox and critically reviewedthe manuscript

Karen RileyPhD

University ofDenver CO

PI at the University of Denver andcritically reviewed themanuscript

ElizabethBerry-KravisMD PhD

Rush UniversityMedical CenterChicago IL

PI at Rush University and criticallyreviewed the manuscript

Keith FWidamanPhD

University ofCaliforniaRiverside

Supervised analysis planperformed the statisticalanalysis and critically reviewedthe manuscript

David HesslPhD

MIND InstituteSacramento CA

Designed the study obtainedfunding directed the multisitestudy and authored portions ofthe manuscript

NeurologyorgN Neurology | Volume 94 Number 12 | March 24 2020 e1239

6 GershonRCWagsterMVHendrieHC FoxNA CookKFNowinski CJ NIHToolboxfor assessment of neurological and behavioral function Neurology 201380S2ndashS6

7 Weintraub S Bauer PJ Zelazo PD et al I NIH toolbox cognition battery (CB)introduction and pediatric data Monogr Soc Res Child Dev 2013781ndash15

8 Casaletto KB Umlauf A Beaumont J et al Demographically corrected normativestandards for the English version of the NIH Toolbox Cognition Battery J IntNeuropsychol Soc 201521378ndash391

9 Hessl D Sansone SM Berry-Kravis E et al The NIH Toolbox Cognitive Battery forintellectual disabilities three preliminary studies and future directions J NeurodevDisord 2016835

10 Berry-Kravis E Lindemann L Jonch AE et al Drug development for neuro-developmental disorders lessons learned from fragile X syndrome Nat Rev DrugDiscov 201817280ndash299

11 Parker SE Mai CT Canfield MA et al Updated national birth prevalence estimatesfor selected birth defects in the United States 2004-2006 Birth Defects Res A ClinMol Teratol 2010881008ndash1016

12 Hunter J Rivero-Arias O Angelov A Kim E Fotheringham I Leal J Epidemiology offragile X syndrome a systematic review and meta-analysis Am J Med Genet A 2014164A1648ndash1658

13 Roid GMiller L Stanford-Binet Intelligence Scales 5th ed San Antonio Pearson 200314 Sparrow S Cicchetti D Saulnier C Vineland Adaptive Behavior Scales 3rd ed San

Antonio Pearson 201615 Weintraub S Dikmen SS Heaton RK et al Cognition assessment using the NIH

Toolbox Neurology 201380S54ndashS6416 Korkman M Kirk U Kemp S NEPSY-II A Developmental Neuropsychological

Assessment San Antonio Pearson 200717 Conners C Conners Kiddie Continuous Performance Test 2nd ed Toronto Multi-

Health Systems Inc 201518 Wechsler D Wechsler Preschool and Primary Scale of Intelligence 4th ed San

Antonio Pearson 201219 Roid G Miller L Pomplum M Koch C Leiter International Performance Scale 3rd

ed Wood Dale Stoelting Co 201320 Dunn LDunnD Peabody Picture Vocabulary Test 4th ed San Antonio Pearson 200721 Woodcock R Johnston M Woodcock-Johnson Tests of Achievement 4th ed Itasca

Riverside Publishing 201422 McKenzie FJ Drayton A Shields R et al NIH Toolbox Cognitive Battery supple-

mental administratorrsquos manual for intellectual and developmental disabilities [online]Available at nihtoolboxdeskcomcustomerportalarticles2981718 AccessedSeptember 30 2019

23 Thompson T Coleman JM Riley K et al Standardized assessment accommodationsfor individuals with intellectual disability Contemp Sch Psychol 201822443ndash457

24 Sansone SM Schneider A Bickel E Berry-Kravis E Prescott C Hessl D ImprovingIQ measurement in intellectual disabilities using true deviation from populationnorms J Neurodev Disord 2014616

25 Akshoomoff N Beaumont JL Bauer PJ et al VIII NIH Toolbox Cognition Battery(CB) composite scores of crystallized fluid and overall cognition Monogr Soc ResChild Dev 201378119ndash132

26 Zelazo P Bauer P National Institutes of Health Toolbox cognition battery (NIHToolbox CB) validation for children between 3 and 15 years Monogr Soc Res ChildDev 2013781ndash172

27 Hooper SR Hatton D Sideris J et al Executive functions in young males with fragileX syndrome in comparison to mental age-matched controls baseline findings froma longitudinal study Neuropsychology 20082236ndash47

28 Wilding J Cornish K Munir F Further delineation of the executive deficit in maleswith fragile-X syndrome Neuropsychologia 2002401343ndash1349

29 Pennington BF Moon J Edgin J Stedron J Nadel L The neuropsychology of Downsyndrome evidence for hippocampal dysfunction Child Dev 20037475ndash93

30 Vicari S Bellucci S Carlesimo GA Implicit and explicit memory a functional dis-sociation in persons with Down syndrome Neuropsychologia 200038240ndash251

31 Wechsler D Weschler Intelligence Scale for Children 5th ed Bloomington Pearson2014

32 Abbeduto L Murphy MM Cawthon SW et al Receptive language skills of adoles-cents and young adults with down or fragile X syndrome Am J Ment Retard 2003108149ndash160

33 Godfrey M Lee NR Memory profiles in Down syndrome across developmenta review of memory abilities through the lifespan J Neurodev Disord 2018105

34 Dykens EM Hodapp RM Leckman JF Strengths and weaknesses in the intellectualfunctioning of males with fragile X syndrome Am J Ment Defic 198792234ndash236

35 Carlozzi NE Tulsky DS Kail RV Beaumont JL VI NIH Toolbox Cognition Battery(CB) measuring processing speed Monogr Soc Res Child Dev 20137888ndash102

36 Hessl D Nguyen DV Green C et al A solution to limitations of cognitive testing inchildren with intellectual disabilities the case of fragile X syndrome J NeurodevDisord 2009133ndash45

37 Wiebe SA Espy KA Charak D Using confirmatory factor analysis to understandexecutive control in preschool children I latent structure Dev Psychol 200844575ndash587

38 Mungas D Widaman K Zelazo PD et al VII NIH Toolbox Cognition Battery (CB)factor structure for 3 to 15 year olds Monogr Soc Res Child Dev 201378103ndash118

e1240 Neurology | Volume 94 Number 12 | March 24 2020 NeurologyorgN

DOI 101212WNL0000000000009131202094e1229-e1240 Published Online before print February 24 2020Neurology

Rebecca H Shields Aaron J Kaat Forrest J McKenzie et al Validation of the NIH Toolbox Cognitive Battery in intellectual disability

This information is current as of February 24 2020

ServicesUpdated Information amp

httpnneurologyorgcontent9412e1229fullincluding high resolution figures can be found at

References httpnneurologyorgcontent9412e1229fullref-list-1

This article cites 28 articles 2 of which you can access for free at

Citations httpnneurologyorgcontent9412e1229fullotherarticles

This article has been cited by 1 HighWire-hosted articles

Subspecialty Collections

httpnneurologyorgcgicollectionexecutive_functionExecutive function

httpnneurologyorgcgicollectiondevelopmental_disordersDevelopmental disorders

httpnneurologyorgcgicollectionautismAutismfollowing collection(s) This article along with others on similar topics appears in the

Permissions amp Licensing

httpwwwneurologyorgaboutabout_the_journalpermissionsits entirety can be found online atInformation about reproducing this article in parts (figurestables) or in

Reprints

httpnneurologyorgsubscribersadvertiseInformation about ordering reprints can be found online

ISSN 0028-3878 Online ISSN 1526-632XWolters Kluwer Health Inc on behalf of the American Academy of Neurology All rights reserved Print1951 it is now a weekly with 48 issues per year Copyright Copyright copy 2020 The Author(s) Published by

reg is the official journal of the American Academy of Neurology Published continuously sinceNeurology

Page 11: ValidationoftheNIHToolboxCognitiveBattery in intellectual disability · iPad-based battery of brief memory, executive function (EF), processingspeed,andlanguagetests,wasdevelopedwithinthe

study participants to explore natural developmental changeswithin each NIHTB-CB test and the composites relative tomeasures already established as change sensitive such as theSB-5 and Vineland The present results warrant the next stepof evaluating the NIHTB-CB in ID for individuals down toa mental age of 5 years to demonstrate the treatment-specificsensitivity of the battery and to determine the degree to whichmeasure gains reflect functional improvements in daily lifeBelow a mental age of 5 years the NIHTB-CB performs morevariably and adaptations to the lower test ranges or scoringadaptations are needed for some measurement domainsStudies of the performance of the battery in older adults withID are needed especially focusing on those experiencingcognitive decline or dementia Overall the present validationresults represent an important step toward providing an ob-jective scalable and standardized method for successfullymeasuring cognition and tracking cognitive changes in ID

AcknowledgmentThe authors extend their appreciation to the families whogave their time and effort to participate in the study and toLeonard Abbeduto LeAnn Baer Ruth McClure Barnes KyleBersted Mikayla Brown Ana Candelaria Erin CarmodyDarian Crowley Suzanne Delap Randi Hagerman AnneHoffmann Londi Howard Paige Landau Caroline LeonczykMichael Nelson Jacklyn Perales Lacey Pomerantz ShanelleRodriguez Melanie Rothfuss Ryan Shickman AndreaSchneider Haleigh Scott Laurel Snider Rachel Teune TaliaThompson Denny Tran and Jamie Woods for theircontributions to the study

Study fundingFunding provided by the National Institute of Child Healthand Human Development (R01HD076189) Health andHuman Services Administration of Developmental Dis-abilities (90DD0596) the MIND Institute Intellectual andDevelopmental Disabilities Research Center (U54HD079125) and the National Center for Advancing Trans-lational Sciences NIH through grant UL1 TR000002

DisclosureR Shields A Kaat F McKenzie A Drayton S Sansone JColeman and C Michalak report no disclosures K Riley hasreceived compensation for consulting to Novartis regardingFXS clinical trials E Berry-Kravis has received funding (allfunding including consulting goes to Rush University MedicalCenter) from Seaside Therapeutics Novartis Roche AlcobraNeuren Cydan Fulcrum GW Neurotrope Marinus AcadiaZynerba BioMarin Ovid Acadia Yamo Ionis Lumos Ultra-genyx Vtesse Sucampo and Mallinkcrodt Pharmaceuticals toconsult on trial design or development strategies andor toconduct clinical trials in FXS or other neurogenetic disordersand from Asuragen Inc to develop testing standards and to dovalidation testing for FMR1 and SMN testing R Gershon andK Widaman report no disclosures D Hessl has receivedfunding for consulting from Novartis Roche Zynerba

Autifony and Ovid pharmaceutical companies regarding FXSclinical trials Go to NeurologyorgN for full disclosures

Publication historyReceived by Neurology August 14 2019 Accepted in final formOctober 31 2019

References1 Olusanya BO Davis AC Wertlieb D et al Developmental disabilities among children

younger than 5 years in 195 countries and territories 1990-2016 a systematic analysis forthe Global Burden of Disease Study 2016 Lancet Glob Health 20186e1100ndashe1121

2 Lee AW Ventola P Budimirovic D Berry-Kravis E Visootsak J Clinical developmentof targeted fragile X syndrome treatments an industry perspective Brain Sci 20188E214

3 Picker JD Walsh CA New innovations therapeutic opportunities for intellectualdisabilities Ann Neurol 201374382ndash390

4 Budimirovic DB Berry-Kravis E Erickson CA et al Updated report on tools tomeasure outcomes of clinical trials in fragile X syndrome J Neurodev Disord 2017914

5 Esbensen AJ Hooper SR Fidler D et al Outcome measures for clinical trials in Downsyndrome Am J Intellect Dev Disabil 2017122247ndash281

Appendix Authors

Name Location Contribution

Rebecca HShields MS

MIND InstituteSacramento CA

Authored the manuscriptperformed the statisticalanalysis and coordinated thestudy

Aaron J KaatPhD

NorthwesternUniversityChicago IL

Directed NIHTB-CB activities forthe study advised on protocoland analysis and authoredportions of the manuscript

Forrest JMcKenzie BS

MIND InstituteSacramento CA

Authored portions of themanuscript

AndreaDrayton BS

MIND InstituteSacramento CA

Authored portions of themanuscript

Stephanie MSansone PhD

MIND InstituteSacramento CA

Assisted in developing protocoland coordinated the study

JeanineColeman PhD

University ofDenver CO

Coordinated assessments at theUniversity of Denver and criticallyreviewed the manuscript

ClaireMichalak BS

Rush UniversityMedical CenterChicago IL

Coordinated assessments andadministered NIHTB-CB at RushUniversity

Richard CGershon PhD

NorthwesternUniversityChicago IL

PI and director of the NIHToolbox and critically reviewedthe manuscript

Karen RileyPhD

University ofDenver CO

PI at the University of Denver andcritically reviewed themanuscript

ElizabethBerry-KravisMD PhD

Rush UniversityMedical CenterChicago IL

PI at Rush University and criticallyreviewed the manuscript

Keith FWidamanPhD

University ofCaliforniaRiverside

Supervised analysis planperformed the statisticalanalysis and critically reviewedthe manuscript

David HesslPhD

MIND InstituteSacramento CA

Designed the study obtainedfunding directed the multisitestudy and authored portions ofthe manuscript

NeurologyorgN Neurology | Volume 94 Number 12 | March 24 2020 e1239

6 GershonRCWagsterMVHendrieHC FoxNA CookKFNowinski CJ NIHToolboxfor assessment of neurological and behavioral function Neurology 201380S2ndashS6

7 Weintraub S Bauer PJ Zelazo PD et al I NIH toolbox cognition battery (CB)introduction and pediatric data Monogr Soc Res Child Dev 2013781ndash15

8 Casaletto KB Umlauf A Beaumont J et al Demographically corrected normativestandards for the English version of the NIH Toolbox Cognition Battery J IntNeuropsychol Soc 201521378ndash391

9 Hessl D Sansone SM Berry-Kravis E et al The NIH Toolbox Cognitive Battery forintellectual disabilities three preliminary studies and future directions J NeurodevDisord 2016835

10 Berry-Kravis E Lindemann L Jonch AE et al Drug development for neuro-developmental disorders lessons learned from fragile X syndrome Nat Rev DrugDiscov 201817280ndash299

11 Parker SE Mai CT Canfield MA et al Updated national birth prevalence estimatesfor selected birth defects in the United States 2004-2006 Birth Defects Res A ClinMol Teratol 2010881008ndash1016

12 Hunter J Rivero-Arias O Angelov A Kim E Fotheringham I Leal J Epidemiology offragile X syndrome a systematic review and meta-analysis Am J Med Genet A 2014164A1648ndash1658

13 Roid GMiller L Stanford-Binet Intelligence Scales 5th ed San Antonio Pearson 200314 Sparrow S Cicchetti D Saulnier C Vineland Adaptive Behavior Scales 3rd ed San

Antonio Pearson 201615 Weintraub S Dikmen SS Heaton RK et al Cognition assessment using the NIH

Toolbox Neurology 201380S54ndashS6416 Korkman M Kirk U Kemp S NEPSY-II A Developmental Neuropsychological

Assessment San Antonio Pearson 200717 Conners C Conners Kiddie Continuous Performance Test 2nd ed Toronto Multi-

Health Systems Inc 201518 Wechsler D Wechsler Preschool and Primary Scale of Intelligence 4th ed San

Antonio Pearson 201219 Roid G Miller L Pomplum M Koch C Leiter International Performance Scale 3rd

ed Wood Dale Stoelting Co 201320 Dunn LDunnD Peabody Picture Vocabulary Test 4th ed San Antonio Pearson 200721 Woodcock R Johnston M Woodcock-Johnson Tests of Achievement 4th ed Itasca

Riverside Publishing 201422 McKenzie FJ Drayton A Shields R et al NIH Toolbox Cognitive Battery supple-

mental administratorrsquos manual for intellectual and developmental disabilities [online]Available at nihtoolboxdeskcomcustomerportalarticles2981718 AccessedSeptember 30 2019

23 Thompson T Coleman JM Riley K et al Standardized assessment accommodationsfor individuals with intellectual disability Contemp Sch Psychol 201822443ndash457

24 Sansone SM Schneider A Bickel E Berry-Kravis E Prescott C Hessl D ImprovingIQ measurement in intellectual disabilities using true deviation from populationnorms J Neurodev Disord 2014616

25 Akshoomoff N Beaumont JL Bauer PJ et al VIII NIH Toolbox Cognition Battery(CB) composite scores of crystallized fluid and overall cognition Monogr Soc ResChild Dev 201378119ndash132

26 Zelazo P Bauer P National Institutes of Health Toolbox cognition battery (NIHToolbox CB) validation for children between 3 and 15 years Monogr Soc Res ChildDev 2013781ndash172

27 Hooper SR Hatton D Sideris J et al Executive functions in young males with fragileX syndrome in comparison to mental age-matched controls baseline findings froma longitudinal study Neuropsychology 20082236ndash47

28 Wilding J Cornish K Munir F Further delineation of the executive deficit in maleswith fragile-X syndrome Neuropsychologia 2002401343ndash1349

29 Pennington BF Moon J Edgin J Stedron J Nadel L The neuropsychology of Downsyndrome evidence for hippocampal dysfunction Child Dev 20037475ndash93

30 Vicari S Bellucci S Carlesimo GA Implicit and explicit memory a functional dis-sociation in persons with Down syndrome Neuropsychologia 200038240ndash251

31 Wechsler D Weschler Intelligence Scale for Children 5th ed Bloomington Pearson2014

32 Abbeduto L Murphy MM Cawthon SW et al Receptive language skills of adoles-cents and young adults with down or fragile X syndrome Am J Ment Retard 2003108149ndash160

33 Godfrey M Lee NR Memory profiles in Down syndrome across developmenta review of memory abilities through the lifespan J Neurodev Disord 2018105

34 Dykens EM Hodapp RM Leckman JF Strengths and weaknesses in the intellectualfunctioning of males with fragile X syndrome Am J Ment Defic 198792234ndash236

35 Carlozzi NE Tulsky DS Kail RV Beaumont JL VI NIH Toolbox Cognition Battery(CB) measuring processing speed Monogr Soc Res Child Dev 20137888ndash102

36 Hessl D Nguyen DV Green C et al A solution to limitations of cognitive testing inchildren with intellectual disabilities the case of fragile X syndrome J NeurodevDisord 2009133ndash45

37 Wiebe SA Espy KA Charak D Using confirmatory factor analysis to understandexecutive control in preschool children I latent structure Dev Psychol 200844575ndash587

38 Mungas D Widaman K Zelazo PD et al VII NIH Toolbox Cognition Battery (CB)factor structure for 3 to 15 year olds Monogr Soc Res Child Dev 201378103ndash118

e1240 Neurology | Volume 94 Number 12 | March 24 2020 NeurologyorgN

DOI 101212WNL0000000000009131202094e1229-e1240 Published Online before print February 24 2020Neurology

Rebecca H Shields Aaron J Kaat Forrest J McKenzie et al Validation of the NIH Toolbox Cognitive Battery in intellectual disability

This information is current as of February 24 2020

ServicesUpdated Information amp

httpnneurologyorgcontent9412e1229fullincluding high resolution figures can be found at

References httpnneurologyorgcontent9412e1229fullref-list-1

This article cites 28 articles 2 of which you can access for free at

Citations httpnneurologyorgcontent9412e1229fullotherarticles

This article has been cited by 1 HighWire-hosted articles

Subspecialty Collections

httpnneurologyorgcgicollectionexecutive_functionExecutive function

httpnneurologyorgcgicollectiondevelopmental_disordersDevelopmental disorders

httpnneurologyorgcgicollectionautismAutismfollowing collection(s) This article along with others on similar topics appears in the

Permissions amp Licensing

httpwwwneurologyorgaboutabout_the_journalpermissionsits entirety can be found online atInformation about reproducing this article in parts (figurestables) or in

Reprints

httpnneurologyorgsubscribersadvertiseInformation about ordering reprints can be found online

ISSN 0028-3878 Online ISSN 1526-632XWolters Kluwer Health Inc on behalf of the American Academy of Neurology All rights reserved Print1951 it is now a weekly with 48 issues per year Copyright Copyright copy 2020 The Author(s) Published by

reg is the official journal of the American Academy of Neurology Published continuously sinceNeurology

Page 12: ValidationoftheNIHToolboxCognitiveBattery in intellectual disability · iPad-based battery of brief memory, executive function (EF), processingspeed,andlanguagetests,wasdevelopedwithinthe

6 GershonRCWagsterMVHendrieHC FoxNA CookKFNowinski CJ NIHToolboxfor assessment of neurological and behavioral function Neurology 201380S2ndashS6

7 Weintraub S Bauer PJ Zelazo PD et al I NIH toolbox cognition battery (CB)introduction and pediatric data Monogr Soc Res Child Dev 2013781ndash15

8 Casaletto KB Umlauf A Beaumont J et al Demographically corrected normativestandards for the English version of the NIH Toolbox Cognition Battery J IntNeuropsychol Soc 201521378ndash391

9 Hessl D Sansone SM Berry-Kravis E et al The NIH Toolbox Cognitive Battery forintellectual disabilities three preliminary studies and future directions J NeurodevDisord 2016835

10 Berry-Kravis E Lindemann L Jonch AE et al Drug development for neuro-developmental disorders lessons learned from fragile X syndrome Nat Rev DrugDiscov 201817280ndash299

11 Parker SE Mai CT Canfield MA et al Updated national birth prevalence estimatesfor selected birth defects in the United States 2004-2006 Birth Defects Res A ClinMol Teratol 2010881008ndash1016

12 Hunter J Rivero-Arias O Angelov A Kim E Fotheringham I Leal J Epidemiology offragile X syndrome a systematic review and meta-analysis Am J Med Genet A 2014164A1648ndash1658

13 Roid GMiller L Stanford-Binet Intelligence Scales 5th ed San Antonio Pearson 200314 Sparrow S Cicchetti D Saulnier C Vineland Adaptive Behavior Scales 3rd ed San

Antonio Pearson 201615 Weintraub S Dikmen SS Heaton RK et al Cognition assessment using the NIH

Toolbox Neurology 201380S54ndashS6416 Korkman M Kirk U Kemp S NEPSY-II A Developmental Neuropsychological

Assessment San Antonio Pearson 200717 Conners C Conners Kiddie Continuous Performance Test 2nd ed Toronto Multi-

Health Systems Inc 201518 Wechsler D Wechsler Preschool and Primary Scale of Intelligence 4th ed San

Antonio Pearson 201219 Roid G Miller L Pomplum M Koch C Leiter International Performance Scale 3rd

ed Wood Dale Stoelting Co 201320 Dunn LDunnD Peabody Picture Vocabulary Test 4th ed San Antonio Pearson 200721 Woodcock R Johnston M Woodcock-Johnson Tests of Achievement 4th ed Itasca

Riverside Publishing 201422 McKenzie FJ Drayton A Shields R et al NIH Toolbox Cognitive Battery supple-

mental administratorrsquos manual for intellectual and developmental disabilities [online]Available at nihtoolboxdeskcomcustomerportalarticles2981718 AccessedSeptember 30 2019

23 Thompson T Coleman JM Riley K et al Standardized assessment accommodationsfor individuals with intellectual disability Contemp Sch Psychol 201822443ndash457

24 Sansone SM Schneider A Bickel E Berry-Kravis E Prescott C Hessl D ImprovingIQ measurement in intellectual disabilities using true deviation from populationnorms J Neurodev Disord 2014616

25 Akshoomoff N Beaumont JL Bauer PJ et al VIII NIH Toolbox Cognition Battery(CB) composite scores of crystallized fluid and overall cognition Monogr Soc ResChild Dev 201378119ndash132

26 Zelazo P Bauer P National Institutes of Health Toolbox cognition battery (NIHToolbox CB) validation for children between 3 and 15 years Monogr Soc Res ChildDev 2013781ndash172

27 Hooper SR Hatton D Sideris J et al Executive functions in young males with fragileX syndrome in comparison to mental age-matched controls baseline findings froma longitudinal study Neuropsychology 20082236ndash47

28 Wilding J Cornish K Munir F Further delineation of the executive deficit in maleswith fragile-X syndrome Neuropsychologia 2002401343ndash1349

29 Pennington BF Moon J Edgin J Stedron J Nadel L The neuropsychology of Downsyndrome evidence for hippocampal dysfunction Child Dev 20037475ndash93

30 Vicari S Bellucci S Carlesimo GA Implicit and explicit memory a functional dis-sociation in persons with Down syndrome Neuropsychologia 200038240ndash251

31 Wechsler D Weschler Intelligence Scale for Children 5th ed Bloomington Pearson2014

32 Abbeduto L Murphy MM Cawthon SW et al Receptive language skills of adoles-cents and young adults with down or fragile X syndrome Am J Ment Retard 2003108149ndash160

33 Godfrey M Lee NR Memory profiles in Down syndrome across developmenta review of memory abilities through the lifespan J Neurodev Disord 2018105

34 Dykens EM Hodapp RM Leckman JF Strengths and weaknesses in the intellectualfunctioning of males with fragile X syndrome Am J Ment Defic 198792234ndash236

35 Carlozzi NE Tulsky DS Kail RV Beaumont JL VI NIH Toolbox Cognition Battery(CB) measuring processing speed Monogr Soc Res Child Dev 20137888ndash102

36 Hessl D Nguyen DV Green C et al A solution to limitations of cognitive testing inchildren with intellectual disabilities the case of fragile X syndrome J NeurodevDisord 2009133ndash45

37 Wiebe SA Espy KA Charak D Using confirmatory factor analysis to understandexecutive control in preschool children I latent structure Dev Psychol 200844575ndash587

38 Mungas D Widaman K Zelazo PD et al VII NIH Toolbox Cognition Battery (CB)factor structure for 3 to 15 year olds Monogr Soc Res Child Dev 201378103ndash118

e1240 Neurology | Volume 94 Number 12 | March 24 2020 NeurologyorgN

DOI 101212WNL0000000000009131202094e1229-e1240 Published Online before print February 24 2020Neurology

Rebecca H Shields Aaron J Kaat Forrest J McKenzie et al Validation of the NIH Toolbox Cognitive Battery in intellectual disability

This information is current as of February 24 2020

ServicesUpdated Information amp

httpnneurologyorgcontent9412e1229fullincluding high resolution figures can be found at

References httpnneurologyorgcontent9412e1229fullref-list-1

This article cites 28 articles 2 of which you can access for free at

Citations httpnneurologyorgcontent9412e1229fullotherarticles

This article has been cited by 1 HighWire-hosted articles

Subspecialty Collections

httpnneurologyorgcgicollectionexecutive_functionExecutive function

httpnneurologyorgcgicollectiondevelopmental_disordersDevelopmental disorders

httpnneurologyorgcgicollectionautismAutismfollowing collection(s) This article along with others on similar topics appears in the

Permissions amp Licensing

httpwwwneurologyorgaboutabout_the_journalpermissionsits entirety can be found online atInformation about reproducing this article in parts (figurestables) or in

Reprints

httpnneurologyorgsubscribersadvertiseInformation about ordering reprints can be found online

ISSN 0028-3878 Online ISSN 1526-632XWolters Kluwer Health Inc on behalf of the American Academy of Neurology All rights reserved Print1951 it is now a weekly with 48 issues per year Copyright Copyright copy 2020 The Author(s) Published by

reg is the official journal of the American Academy of Neurology Published continuously sinceNeurology

Page 13: ValidationoftheNIHToolboxCognitiveBattery in intellectual disability · iPad-based battery of brief memory, executive function (EF), processingspeed,andlanguagetests,wasdevelopedwithinthe

DOI 101212WNL0000000000009131202094e1229-e1240 Published Online before print February 24 2020Neurology

Rebecca H Shields Aaron J Kaat Forrest J McKenzie et al Validation of the NIH Toolbox Cognitive Battery in intellectual disability

This information is current as of February 24 2020

ServicesUpdated Information amp

httpnneurologyorgcontent9412e1229fullincluding high resolution figures can be found at

References httpnneurologyorgcontent9412e1229fullref-list-1

This article cites 28 articles 2 of which you can access for free at

Citations httpnneurologyorgcontent9412e1229fullotherarticles

This article has been cited by 1 HighWire-hosted articles

Subspecialty Collections

httpnneurologyorgcgicollectionexecutive_functionExecutive function

httpnneurologyorgcgicollectiondevelopmental_disordersDevelopmental disorders

httpnneurologyorgcgicollectionautismAutismfollowing collection(s) This article along with others on similar topics appears in the

Permissions amp Licensing

httpwwwneurologyorgaboutabout_the_journalpermissionsits entirety can be found online atInformation about reproducing this article in parts (figurestables) or in

Reprints

httpnneurologyorgsubscribersadvertiseInformation about ordering reprints can be found online

ISSN 0028-3878 Online ISSN 1526-632XWolters Kluwer Health Inc on behalf of the American Academy of Neurology All rights reserved Print1951 it is now a weekly with 48 issues per year Copyright Copyright copy 2020 The Author(s) Published by

reg is the official journal of the American Academy of Neurology Published continuously sinceNeurology