NIH Toolbox for Assessment of Neurological and Behavioral Function--Cognition Battery, V3

Last Updated

August 28, 2024

Purpose

The NIHTB-CB was designed as a brief assessment tool that a variety of researchers can use to measure neurological function across the lifespan, with particular emphasis on longitudinal epidemiologic studies and intervention trials.

Link to Instrument

Acronym NIHTB-CB

Area of Assessment

Attention & Working Memory
Cognition
Executive Functioning
Language
Processing Speed

Assessment Type

Performance Measure

Administration Mode

Computer

Cost

Not Free

Actual Cost

$599.99

Cost Description

The annual subscription cost for the NIHTB (V3) is $599.99 plus the cost of equipment (iPad, Bluetooth keyboard, batteries).

The yearly subscription price covers up to 2 iPads. Additional subscription options cover up to 6 devices for $1,499.99 and up to 10 devices for $2,499.99. The subscription also includes the NIH Toolbox batteries for the emotion, motor, and sensory domains, plus access to PROMIS, Neuro-QoL, TBI-QOL, SCI-QOL, SCI-FI, and ECOG. The subscription is auto-renewing. There is a free 14-day trial period, but data cannot be saved and reports cannot be exported during the trial.

The cost of the public V2 apps will increase to $749.99 with anticipated annual increases of $250 on June 1 from 2025 through 2027.
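
As a rough illustration of the V3 subscription tiers listed above, the sketch below picks the smallest tier that covers a given number of iPads. The function name and the choice to raise an error beyond 10 devices are assumptions made for illustration; equipment costs are not included.

```python
# Annual V3 subscription tiers quoted above: (max devices, price in USD).
TIERS = [(2, 599.99), (6, 1499.99), (10, 2499.99)]

def annual_subscription_cost(devices: int) -> float:
    """Return the price of the smallest listed tier covering `devices` iPads.

    Hypothetical helper: raises ValueError outside the 1-10 range because
    this entry lists no tier beyond 10 devices.
    """
    if devices < 1:
        raise ValueError("at least one device is required")
    for max_devices, price in TIERS:
        if devices <= max_devices:
            return price
    raise ValueError("no listed tier covers more than 10 devices")

print(annual_subscription_cost(2))  # 599.99
print(annual_subscription_cost(5))  # 1499.99
```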

Neurological Disorders

Stroke

Multiple Sclerosis

Mixed Populations

Intellectual Disability

Brain Injury

Brief measures designed to give researchers a common currency for assessing the cognitive domains that are important for school and work.
The NIH Toolbox Version 3 (V3) update was released in 2023 with an enhanced user interface, advanced data management capabilities, and expanded features including addition of new and improved tests with updated normed scores. The NIH Toolbox V2 will be retired, with annual subscriptions for both the English and Spanish versions no longer sold after June 30, 2027 and the last subscription expiring on June 30, 2028. Official support for the V2 app will cease after August 2028.
Seven main subtests that cover attention, executive functioning, language, memory, and processing speed.
A new normative sample was collected for the NIH Toolbox Cognition Tests and Standing Balance Test (Motor Domain) as part of the 2023 Version 3 update with sample demographics representative of the 2020 U.S. Census.
For those ages 4-6 there is an Early Childhood Composite score derived from the scores for the Dimensional Change Card Sort, Flanker, Picture Sequence Memory, Picture Vocabulary, and Speeded Matching tests.
For those ages 7+ there is a Total Cognition Composite score that is the sum of the Crystallized Composite (Picture Vocabulary and Oral Reading Recognition tests) and Fluid Composite (Dimensional Change Card Sort, Flanker, Picture Sequence Memory, List Sorting, and Pattern Comparison tests) scores.
Supplemental tests not used in the calculation of the composite scores can be used to better understand the participant's cognitive functioning.
Results can be accessed through a Score Report or .CSV file once the assessment has been completed.
The NIHTB-CB was designed to be quick, reliable, and easy to administer.
The instrument uses item response theory (IRT) and computerized adaptive testing (CAT) to assess constructs quickly.

Number of Items

- NIH Toolbox Picture Vocabulary Test: 25 items
- NIH Toolbox Flanker Inhibitory Control and Attention Test: 20 items
- NIH Toolbox List Sorting Working Memory Test: 12 items
- NIH Toolbox Dimensional Change Card Sort Test: 30 mixed items
- NIH Toolbox Pattern Comparison Processing Speed Test: 130 items or 85 seconds
- NIH Toolbox Picture Sequence Memory Test: 2 test sequences
- NIH Toolbox Oral Reading Recognition Test: 25 items
- *NIH Toolbox Oral Symbol Digit Test: 144 items
- *NIH Auditory Verbal Learning Test: 15 items, 3 trials

*Supplemental tests that can be used to better understand the participant's cognitive functioning.

NIH Toolbox App from iTunes
11 inch iPad Air or iPad Pro
Bluetooth keyboard (including batteries)
Laminated sheet containing key and nine practice items on one side and the key and test items on the other side for oral symbol digit test
Home base (downloaded from NIH toolbox website)
Pronunciation guide
NIH Toolbox Oral Reading Recognition Test Training and Certification Materials (contact: cognition@nihtoolbox.org)

Up to 32 minutes

NIHTB (V3)-CB

Administering the complete NIH Toolbox (V3) Cognition Battery takes up to 32 minutes; however, the time may vary because of individual differences among test takers, test administrators, and other unknowns. The times for the individual tests are:
- NIH Toolbox Picture Vocabulary Test: 3 minutes
- NIH Toolbox Flanker Inhibitory Control and Attention Test: 3 minutes
- NIH Toolbox List Sorting Working Memory Test: 7 minutes
- NIH Toolbox Dimensional Change Card Sort Test: 4 minutes
- NIH Toolbox Pattern Comparison Processing Speed Test: 4 minutes
- NIH Toolbox Picture Sequence Memory Test: 7 minutes
- NIH Toolbox Oral Reading Recognition Test: 4 minutes
- *NIH Oral Symbol Digit Test: 3 minutes
- *NIH Rey Auditory Verbal Learning Test: 4 minutes
- *NIH Visual Reasoning Test: 7 minutes
- *NIH Face Name Associative Memory Exam Test: 7 minutes
- *NIH Speeded Matching Test: 3 minutes

*Supplemental tests can be used to better understand the participant's cognitive functioning and are not included in the overall time estimate.

NIHTB (V2)-CB

Administering the complete NIH Toolbox (V2) Cognition Battery takes up to 31 minutes; however, the time may vary because of individual differences among test takers, test administrators, and other unknowns. The times for the individual tests are:
- NIH Toolbox Picture Vocabulary Test: 4 minutes
- NIH Toolbox Flanker Inhibitory Control and Attention Test: 3 minutes
- NIH Toolbox List Sorting Working Memory Test: 7 minutes
- NIH Toolbox Dimensional Change Card Sort Test: 4 minutes
- NIH Toolbox Pattern Comparison Processing Speed Test: 3 minutes
- NIH Toolbox Picture Sequence Memory Test: 7 minutes
- NIH Toolbox Oral Reading Recognition Test: 3 minutes
- *NIH Oral Symbol Digit Test: 3 minutes
- *NIH Auditory Verbal Learning Test: 3 minutes

*Supplemental tests can be used to better understand the participant's cognitive functioning and are not included in the overall time estimate.
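
The overall time estimates follow directly from the per-test times for the seven core tests. A minimal sketch of that arithmetic is shown here; the dictionary names are illustrative, and the supplemental tests are left out because they are excluded from the totals.

```python
# Per-test administration times in minutes, copied from the V3 and V2 lists above.
V3_MINUTES = {
    "Picture Vocabulary": 3,
    "Flanker Inhibitory Control and Attention": 3,
    "List Sorting Working Memory": 7,
    "Dimensional Change Card Sort": 4,
    "Pattern Comparison Processing Speed": 4,
    "Picture Sequence Memory": 7,
    "Oral Reading Recognition": 4,
}
# V2 differs from V3 on three tests only.
V2_MINUTES = {
    **V3_MINUTES,
    "Picture Vocabulary": 4,
    "Pattern Comparison Processing Speed": 3,
    "Oral Reading Recognition": 3,
}

v3_total = sum(V3_MINUTES.values())
v2_total = sum(V2_MINUTES.values())
print(v3_total, v2_total)  # 32 31
```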

Required Training

Training Course

Required Training Description

There are two avenues for NIH Toolbox training: eLearning and workshops.

Links to online videos for the administration of the NIH Cognition Battery, "How To" videos, and virtual conference and workshop recordings may be found here: https://www.healthmeasures.net/NIH_Toolbox_iPad_e-learning/story_html5.html.

In-person workshops last a day and a half and are scheduled throughout the year at various locations. Upcoming scheduled workshops are listed here: https://www.healthmeasures.net/index.php?option=com_content&view=category&layout=blog&id=128&Itemid=934

For the cognition battery, the test administrator is required to have a level C classification.

Preschool Children

3 - 5

years

Child

6 - 12

years

Adolescent

13 - 17

years

Adult

18 - 64

years

Elderly Adult

65 +

years

Instrument Reviewers

Constance Richard, MS, CRC, University of Wisconsin-Madison doctoral student under the direction of Timothy Tansey, PhD, Rehabilitation Psychology & Special Education Department, School of Education, University of Wisconsin-Madison

Kevin Fearn, MS, Shirley Ryan AbilityLab

ICF Domain

Body Function
Body Structure
Activity
Participation

Measurement Domain

Cognition

Professional Association Recommendation

None found - last searched 8/28/2024

Considerations

An NIH Infant and Toddler "Baby" Toolbox (ages 0-42 months) containing more than 30 assessments of Cognition, Motor, and Social-Emotional domains in one iPad app is expected to be released in the second half of 2024.

Normative Data

Mild & Moderate/Severe Stroke: (Carlozzi et al, 2017a; n = 131 (n = 71 mild stroke; n = 60 moderate/severe stroke: 54 moderate and 6 severe); mean age = 57.5 (12.6) years; age range = 22-83 years; male = 51%; median time post CVA = 29.0 months (range = 12.5-87.3 months); mean = 31.5 (11.8) months)

National Institutes of Health (NIH) Toolbox (NIHTB) - Cognition Battery T Scores for Individuals with Mild vs Moderate/Severe Stroke

NIHTB scores	n	Mild stroke Mean (SD)	n	Moderate/severe stroke Mean (SD)
Composite scores*
Fluid	71	42.71 (12.64)	42	34.00 (9.57)
Crystallized	75	50.54 (11.73)	50	45.72 (10.85)
Subtest scores
Picture vocabulary	77	50.65 (12.77)	51	47.08 (11.38)
Oral reading recognition	75	50.10 (10.18)	50	45.22 (10.76)
Picture sequence memory test	73	45.86 (12.96)	45	35.98 (11.34)
Pattern comparison	74	45.10 (11.25)	49	38.05 (9.13)
List sorting	74	45.70 (10.26)	45	42.21 (10.86)
Flanker	77	44.74 (10.89)	50	37.76 (9.53)
DCCS	77	44.31 (10.11)	49	38.94 (8.83)

*Fluid cognitive composite score combines Dimensional Change Card Sort (DCCS) Test, Flanker Test of Executive Function Inhibitory Control and Attention, Picture Sequence Memory Test of episodic memory, List Sorting Working Memory Test, and Pattern Comparison Processing Speed Test. Crystallized cognitive composite score includes Picture Vocabulary and Oral Reading scores.

Stroke: (Carlozzi et al, 2017b; n = 211; mean age = 56.13 (12.97); female = 50.2%, mean time post CVA = 2.74 (2.46) years)

NIHTB Cognition Battery T Scores for Stroke Participants^a

NIHTB scores	n	Mean (SD)	% Impaired^b
Composite scores
Fluid	176	40.51 (11.59)	49.2
Crystallized	176	49.54 (10.85)	19.2
Subtest scores
Picture vocabulary	201	49.08 (12.06)	22.3
Oral reading recognition	201	48.34 (10.58)	23.3
Picture sequence memory test	175	42.64 (12.38)	43.8
Pattern comparison	175	43.13 (10.74)	39.2
List sorting	175	43.98 (9.96)	40.3
Flanker	175	42.08 (10.58)	40.3
DCCS	175	42.46 (9.47)	40.3

^aT scores are demographically adjusted for age, sex, education, and race/ethnicity.

^b% impairment reflects individuals with scores >1 SD beyond the mean on a given test in the negative direction.
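
Footnote b defines impairment as a score more than 1 SD below the mean. On the T-score metric (mean 50, SD 10) that corresponds to T < 40. The sketch below applies that rule to a small list of scores; the function name and the sample values are hypothetical and are not study data.

```python
T_MEAN, T_SD = 50.0, 10.0  # standard T-score metric

def percent_impaired(t_scores, n_sd=1.0):
    """Percentage of scores more than `n_sd` SDs below the T-score mean."""
    cutoff = T_MEAN - n_sd * T_SD  # 40.0 for the 1 SD rule
    impaired = sum(1 for t in t_scores if t < cutoff)
    return 100.0 * impaired / len(t_scores)

# Hypothetical T scores for illustration only (not study data).
sample = [38.0, 52.5, 41.0, 29.5, 47.0, 39.9, 55.0, 44.0]
print(percent_impaired(sample))  # 37.5
```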

Construct Validity

Convergent validity:

Stroke: (Carlozzi et al, 2017a; all correlations significant at p < 0.01)

Adequate convergent validity between Flanker and Delis Kaplan Executive Functioning System (DKEFS) Interference (r = 0.46; r = 0.50 with motor function included as a covariate)
Adequate convergent validity between Dimensional Change Card Sort (DCCS) and DKEFS Interference (r = 0.54; r = 0.40 with motor function included as a covariate)
Excellent convergent validity between List Sorting Working Memory and Wechsler Adult Intelligence Scale Letter Number Sequencing, Fourth Edition (WAIS-IV LN) (r = 0.66)
Adequate convergent validity between Pattern Comparison and WAIS-IV Coding (CD) (r = 0.59; r = 0.58 with motor function included as a covariate)
Excellent convergent validity between Pattern Comparison and WAIS-IV Symbol Search (SS) (r = 0.67; r = 0.66 with motor function included as a covariate)
Adequate convergent validity between Picture Sequence Memory and Auditory Verbal Learning Test (Rey) (AVLT) (r = 0.52)
Excellent convergent validity between Picture Sequence Memory and Brief Visuospatial Memory Test-Revised (BVMT-R) (r = 0.65)
Excellent convergent validity between Picture Vocabulary and Peabody Picture Vocabulary Test, Fourth Edition (PPVT-IV) (r = 0.87)
Excellent convergent validity between Oral Reading Recognition and Wide Range Achievement Test-Fourth Edition (WRAT-IV) (r = 0.88)
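
The descriptive labels in the list above appear consistent with cutoffs of roughly |r| >= 0.60 for Excellent, 0.31 to 0.59 for Adequate, and 0.30 or below for Poor. Those thresholds are inferred from the values reported in this entry rather than quoted from it, so the sketch below should be read as an assumption about the labeling rule, not a statement of it.

```python
def convergent_label(r: float) -> str:
    """Label a convergent-validity correlation by its absolute size.

    The thresholds are an assumption inferred from how this entry labels
    its reported correlations.
    """
    size = abs(r)
    if size >= 0.60:
        return "Excellent"
    if size > 0.30:
        return "Adequate"
    return "Poor"

# Correlations taken from the stroke list above (Carlozzi et al, 2017a).
for name, r in [("Flanker vs DKEFS Interference", 0.46),
                ("Picture Vocabulary vs PPVT-IV", 0.87)]:
    print(name, convergent_label(r))
```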

Discriminant validity

Stroke: (Carlozzi et al, 2017a)

Convergent and Discriminant Validity for NIHTB Scores for Combined Stroke Sample (n = 131)*

NIHTB Scores	AVLT	BVMT	WRAT	PPVT	WAIS CD	WAIS SS	WAIS LN	DKEFS
PS	0.45	0.62	0.29	0.36	0.50	0.54	0.38	0.30
OR	0.32	0.54	0.88	0.77	0.47	0.48	0.66	0.48
PV	0.40	0.61	0.74	0.87	0.47	0.52	0.57	0.48
PC	0.28 (0.23)	0.38 (0.36)	0.25 (0.24)	0.38 (0.38)	0.59 (0.57)	0.67 (0.66)	0.35 (0.32)	0.47 (0.44)
LS	0.40	0.60	0.61	0.58	0.50 (0.49)	0.54 (0.54)	0.64	0.47
DCCS	0.45 (0.41)	0.57 (0.55)	0.51 (0.50)	0.55 (0.55)	0.60 (0.57)	0.65 (0.64)	0.49 (0.47)	0.54 (0.51)
Flanker	0.33 (0.28)	0.50 (0.48)	0.37 (0.36)	0.50 (0.50)	0.61 (0.58)	0.61 (0.59)	0.41 (0.38)	0.46 (0.43)

*All correlations p < .01

Note: NIHTB scores: PS = Picture sequencing; OR = Oral reading recognition; PV = Picture vocabulary; PC = Pattern comparison; LS = List sorting; DCCS = Dimensional Change Card Sort; Flanker = Flanker inhibitory control and attention; AVLT = Auditory Verbal Learning Test (Rey); BVMT = Brief Visuospatial Memory Test-Revised; WRAT = Wide Range Achievement Test; PPVT = Peabody Picture Vocabulary Test, Fourth Edition; WAIS CD = Wechsler Adult Intelligence Scale - Digit Symbol Coding; WAIS SS = Wechsler Adult Intelligence Scale - Symbol Search; WAIS LN = Wechsler Adult Intelligence Scale - Letter Number Sequencing, Fourth Edition; DKEFS = Delis Kaplan Executive Functioning System

NIHTB scores discriminated between mild and moderate/severe stroke (see Normative Data for mean T scores):
- Individuals with moderate/severe stroke performed significantly worse on the Fluid composite score and subtests: Picture sequence memory test, Pattern comparison, Flanker, and DCCS (p <0.01)
- Persons with moderate/severe stroke also performed significantly worse on the Crystallized composite score and subtest for Oral reading recognition (p < 0.05)

Content Validity

The Cognition Battery (CB) team selected the subdomains of attention, executive function, episodic memory, language, working memory, and processing speed based on two Requests for Information (RFIs) that were solicited online to obtain the input from 293 experts. The RFIs were then followed-up by telephone interviews of a subset of 44 experts. After selecting the specific tests for the subdomains, the NIH Toolbox was presented to an expert advisory panel who provided written critiques of the subdomains and instruments that were reviewed and addressed by the CB team. Lastly, several conference calls were conducted with 16 expert consultants to present the version of the CB created for validation testing and to invite feedback prior to initiating the validation study (Weintraub et al., 2013a).

Responsiveness

Stroke: (Carlozzi et al, 2017a)

Effect sizes (Cohen's d) by stroke severity for NIHTB - Cognition Battery composite and subtest scores

NIHTB scores	Mild stroke	Moderate/severe stroke
Composite scores
Fluid	-0.64	-1.64
Crystallized	0.05	-0.41
Subtest scores
Picture vocabulary	0.06	-0.27
Oral reading recognition	0.01	-0.46
Picture sequence memory test	-0.36	-0.98
Pattern comparison	-0.46	-1.25
List sorting	-0.42	-0.75
Flanker	-0.50	-1.25
DCCS	-0.57	-1.18
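
The source does not state how these effect sizes were computed. One reading that approximately reproduces them is to compare each group's mean T score with the normative T-score metric (mean 50, SD 10) and divide by an equal-weight pooled standard deviation. Both the reference values and the pooling rule are assumptions in the sketch below.

```python
import math

NORM_MEAN, NORM_SD = 50.0, 10.0  # assumed normative T-score reference

def cohens_d_vs_norm(group_mean: float, group_sd: float) -> float:
    """Cohen's d of a group against the assumed normative T-score metric,
    using an equal-weight pooled standard deviation."""
    pooled_sd = math.sqrt((group_sd ** 2 + NORM_SD ** 2) / 2)
    return (group_mean - NORM_MEAN) / pooled_sd

# Fluid composite means and SDs from the stroke Normative Data table
# (Carlozzi et al, 2017a).
print(round(cohens_d_vs_norm(42.71, 12.64), 2))  # mild stroke: -0.64
print(round(cohens_d_vs_norm(34.00, 9.57), 2))   # moderate/severe stroke: -1.63
```

Under these assumptions the mild stroke value matches the tabled -0.64, and the moderate/severe value lands within 0.01 of the tabled -1.64, so small differences in rounding or in the pooling rule may be involved.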

Cut-Off Scores

Brain Injury: (Tulsky et al, 2017; n = 182 (mild or moderate, n = 83; severe, n = 99); mean age = 38.6 (17.4) years; mean time since TBI = 5.8 (5.6) years).

The 16th percentile (T score ≤ 40) is the cutoff for a low score. In the complicated mild/moderate TBI and severe TBI groups, 59% and 75% of individuals, respectively, had one or more low scores. Because the normative base rate of one or more low scores is 46%, specificity is poor (0.54).
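
Two pieces of arithmetic sit behind this paragraph. A T score of 40 is 1 SD below the mean, which falls at about the 16th percentile if scores are normally distributed (an assumption made here), and a specificity of 0.54 is one minus the 46% normative base rate. A minimal sketch:

```python
from statistics import NormalDist

# T scores are scaled to mean 50, SD 10; normality is assumed here.
t_dist = NormalDist(mu=50, sigma=10)
percentile_at_40 = 100 * t_dist.cdf(40)
print(round(percentile_at_40, 1))  # 15.9

# Specificity = share of the normative sample WITHOUT any low score.
normative_base_rate = 0.46
specificity = 1 - normative_base_rate
print(round(specificity, 2))  # 0.54
```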

Normative Data

Brain Injury: (Tulsky et al, 2017)

National Institutes of Health (NIH) Toolbox (NIHTB) Demographically Adjusted T Scores for Individuals with Complicated Mild/Moderate TBI, Severe TBI, and Matched Controls

NIHTB scores	Complicated mild/moderate TBI Mean (SD) (n = 74)	Severe TBI Mean (SD) (n = 84)	Control Mean (SD) (n = 158)
Composite scores*
Fluid	92.2 (18.7)	82.4 (18.4)	101.6 (16.0)
Crystallized	103.2 (15.9)	96.2 (14.7)	101.7 (15.2)
Subtest scores
Picture vocabulary	102.9 (15.4)	97.1 (14.2)	101.5 (15.4)
Oral reading recognition	103.2 (15.8)	96.4 (15.3)	102.0 (15.5)
Picture sequence memory test	92.5 (16.2)	84.2 (15.5)	100.2 (14.3)
Pattern comparison	96.7 (15.9)	89.8 (17.2)	101.8 (15.9)
List sorting	96.3 (15.5)	91.2 (15.0)	102.1 (15.3)
Flanker	94.0 (17.2)	86.2 (16.4)	100.8 (14.6)
DCCS	94.9 (16.7)	90.1 (16.6)	101.1 (15.5)

Brain Injury: (Carlozzi et al, 2017; n = 184; mean age = 39.83 (17.4) years; male = 64.1%; mean time since TBI = 5.95 (5.54) years)

NIHTB Cognition T Scores for TBI Participants^a

NIHTB scores	n	Mean (SD)	% Impaired^b
Composite scores
Fluid	163	41.75 (13.00)	41.1
Crystallized	163	49.79 (9.54)	16.6
Subtest scores
Picture vocabulary	177	49.41 (9.60)	15.8
Oral reading recognition	177	49.86 (10.82)	16.9
Picture sequence memory test	161	42.35 (10.88)	43.5
Pattern comparison	161	45.44 (10.86)	28.0
List sorting	161	44.75 (10.55)	32.3
Flanker	161	43.21 (11.10)	38.5
DCCS	161	44.42 (10.49)	33.5

^aT scores are demographically adjusted for age, sex, education, and race/ethnicity.

^b% impairment reflects individuals with scores >1 SD beyond the mean on a given test in the negative direction.

Construct Validity

Convergent validity:

Brain Injury: (Tulsky et al., 2017)

Excellent correlation between Oral Reading Recognition and Wide Range Achievement Test, 4th Edition (r = 0.83)
Excellent correlation between Picture Vocabulary and Peabody Picture Vocabulary Test, 4th Edition (r = 0.80)
Adequate correlation between List Sorting and WAIS-IV Letter Number Sequencing (r = 0.56)
Excellent correlation between Picture Sequence Memory and Brief Visuospatial Memory Test-Revised (r = 0.68)
Excellent correlation between Pattern Comparison and WAIS-IV Coding (r = 0.69)
Adequate correlation between Flanker Inhibitory Control and DKEFS Color-Word Interference-Inhibition (r = -0.46)
Adequate correlation between Dimensional Change Card Sort and Wisconsin Card Sort Test (r = -0.42)

Discriminant validity:

Repetitive Head Impact (RHI): (Amadon et al., 2023; n = 176, mean age = 21.19 (1.63), female = 60 (34%) (n = 115 in contact sport, mean age = 21.46 (1.65), female = 20 (17%); and 61 in non-contact sport, mean age = 20.67 (1.45), female = 40 (66%))

Significant discriminative ability across the whole sample of the Picture Sequence Memory Test (episodic memory) to distinguish between contact sport and non-contact sport athletes (F(1,171) = 7.16, p = 0.008), years of exposure (F(1,171) = 4.19, p = 0.042), and extensive vs. non-extensive RHI exposure based on traumatic encephalopathy syndrome (TES) criteria (F(1,171) = 5.12, p = 0.025)
- A similar, although not significant, effect was observed for age of first exposure (AFE), with athletes with AFE < 12 having better episodic memory than those with AFE 12+/none (F(1,171) = 3.88, p = 0.05)
Significant discriminative ability among females of the Picture Sequence Memory Test (episodic memory) to distinguish between contact sport and non-contact sport athletes (F(1,56) = 5.05, p = 0.03), years of exposure (F(1,56) = 4.54, p = 0.04), athletes with AFE < 12 vs. those with AFE 12+/none (F(1,56) = 4.76, p = 0.03), and extensive vs. non-extensive RHI exposure based on TES criteria (F(1,56) = 4.99, p = 0.03)
Although the direction of the relationships was the same as those seen in women, among males there were no significant relationships between RHI measures and episodic memory performance.

Content Validity

Test/Retest Reliability

Intellectual Disability: (Shields et al., 2020; n = 242, mean age = 15.71 (5.15) years (Down syndrome = 91, mean age = 15.88 (5.17) years; Fragile X syndrome = 75, mean age = 16.16 (4.92) years; Other ID = 76, mean age = 15.05 (5.35) years); chronological age from 6 through 25 years; full-scale IQ <80 on Stanford-Binet, 5th edition (SB-5); mental age of at least 3.0 years on the SB-5)

Acceptable Total test-retest reliability for Flanker Inhibitory Control and Attention (ICC = 0.74)
Acceptable test-retest reliability for Dimensional Change Card Sort (ICC = 0.71)
Acceptable Total test-retest reliability for List Sorting Working Memory (ICC = 0.74)
Acceptable Total test-retest reliability for Pattern Comparison (ICC = 0.77)
Poor Total test-retest reliability for Picture Sequence Memory (ICC = 0.47 with form A-A and ICC = 0.55 with form A-B)
Acceptable Total test-retest reliability for Picture Vocabulary (ICC = 0.85)
Excellent Total test-retest reliability for Oral Reading and Recognition (ICC = 0.96)
Acceptable Total test-retest reliability for Fluid Composite (ICC = 0.83)
Excellent Total test-retest reliability for Crystallized Composite (ICC = 0.93)
Excellent Total test-retest reliability for Cognitive Function Composite (ICC = 0.92)

Intellectual Disability: (Hessl et al., 2016; Fragile X Syndrome: n = 63, mean age = 19.3 (8.3) years, mean mental age = 5.3 (1.6) years)

Acceptable test-retest reliability for Flanker Inhibitory Control and Attention (ICC = 0.75)
Acceptable test-retest reliability for Dimensional Change Card Sort (ICC = 0.88)
Acceptable test-retest reliability for List Sorting Working Memory (ICC = 0.84)
Excellent test-retest reliability for Pattern Comparison (ICC = 0.90)
Acceptable test-retest reliability for Picture Sequence Memory (ICC = 0.76)
Acceptable test-retest reliability for Picture Vocabulary (ICC = 0.77)
Excellent test-retest reliability for Oral Reading and Recognition (ICC = 0.99)

Construct Validity

Convergent validity:

Intellectual Disability: (Shields et al., 2020)

Adequate convergent validity of Flanker Inhibitory Control and Attention with Conners Kiddie Continuous Performance Test, 2nd Edition (r = -0.52*)
Adequate convergent validity of Dimensional Change Card Sort with NEPSY Inhibition (NEPSY-IN) subtest (r = 0.48*)
Excellent convergent validity of List Sorting Working Memory with Stanford-Binet, 5th Edition (SB-5) Verbal Working Memory (r = 0.65*)
Excellent convergent validity of Pattern Comparison with Wechsler Preschool and Primary Scale of Intelligence, 4th Edition Bug Search (r = 0.66*)
Adequate convergent validity of Picture Sequence Memory with Leiter International Performance Scale, 3rd Edition Forward Memory (Leiter-FM) (r = 0.47*)
Excellent convergent validity of Picture Vocabulary Test with Peabody Picture Vocabulary Test, 4th Edition (r = 0.83*)
Excellent convergent validity of Oral Reading and Recognition with Woodcock Johnson 4th Edition Letter-Word Identification (WJ-LW) (r = 0.92*)
Excellent convergent validity of Fluid Composite with SB-5 Fluid Reasoning IQ (r = 0.60*)
Excellent convergent validity of Crystallized Composite with SB-5 Verbal IQ (r = 0.75*)

*Significant at p < 0.001

Discriminant validity:

Intellectual Disability: (Shields, et al, 2020)

Adequate discriminant validity of Flanker Inhibitory Control and Attention with WJ-LW (r = -0.53*)
Adequate discriminant validity of Dimensional Change Card Sort with WJ-LW (r = 0.36*)
Adequate discriminant validity of List Sorting Working Memory with WJ-LW (r = 0.49*)
Adequate discriminant validity of Pattern Comparison with WJ-LW (r = 0.45*)
Adequate discriminant validity of Picture Sequence Memory with WJ-LW (r = 0.50*)
Adequate discriminant validity of Picture Vocabulary Test with Leiter-FM (r = 0.47*)
Adequate discriminant validity of Oral Reading and Recognition with Leiter-FM (r = 0.58*)
Poor discriminant validity of Fluid Composite with SB-5 Verbal IQ (r = 0.61*)
Poor discriminant validity of Crystallized Composite with SB-5 Fluid Reasoning IQ (r = 0.68*)

*Significant at p < 0.001

Content Validity

Test/Retest Reliability

Mixed populations: (Weintraub et al, 2013b; n = 476, age range = 3-85 years, female = 53.1%, stratified sample of community-dwelling individuals)

Excellent test-retest reliability for Flanker Inhibitory Control and Attention (ICC = 0.96)
Excellent test-retest reliability for Dimensional Change Card Sort (ICC = 0.94)
Acceptable test-retest reliability for List Sorting Working Memory (ICC = 0.89)
Acceptable test-retest reliability for Pattern Comparison (ICC = 0.82)
Acceptable test-retest reliability for Picture Sequence Memory (ICC = 0.78)
Excellent test-retest reliability for Picture Vocabulary (ICC = 0.94)
Excellent test-retest reliability for Oral Reading and Recognition (ICC = 0.99)

Construct Validity

Convergent validity:

Mixed populations: (Weintraub et al., 2013b, age range = 8-85)

Adequate convergent validity of Flanker Inhibitory Control and Attention with WISC-IV/WAIS-IV Letter-Number Sequencing, Coding, Symbol Search average^a (r = -0.48^b)
Adequate convergent validity of Dimensional Change Card Sort with D-KEFS Inhibition (r = -0.51^b)
Adequate convergent validity of List Sorting Working Memory with WISC-IV/WAIS-IV Letter-Number Sequencing^a/Paced Auditory Serial Addition Test average (r = 0.58^b)
Adequate convergent validity of Pattern Comparison with WISC-IV/WAIS-IV Coding/Symbol Search average^a (r = 0.49^b)
Excellent convergent validity of Picture Sequence Memory with BVMT-R/Rey Auditory Verbal Learning Test (RAVLT) average^a (r = 0.69^b)
Excellent convergent validity of Picture Vocabulary with PPVT-4 (r = 0.78^b)
Excellent convergent validity of Oral Reading and Recognition with Wide Range Achievement Test, 4th Edition (WRAT-4) (r = 0.93^b)

Discriminant validity:

Mixed populations: (Weintraub et al., 2013b, age range = 8-85)

Excellent discriminant validity of Flanker Inhibitory Control and Attention with PPVT-4 (r = 0.15^c)
Excellent discriminant validity of Dimensional Change Card Sort with PPVT-4 (r = 0.14^d)
Excellent discriminant validity of List Sorting Working Memory with PPVT-4 (r = 0.30^b)
Excellent discriminant validity of Pattern Comparison with PPVT-4 (r = 0.12^d)
Excellent discriminant validity of Picture Sequence Memory with PPVT-4 (r = -0.08)
Excellent discriminant validity of Picture Vocabulary with BVMT-R/RAVLT average^a (r = 0.08)
Excellent discriminant validity of Oral Reading and Recognition with BVMT-R/RAVLT average^a (r = 0.19^b)

^aMeasure used dependent upon subjects' age.

^bp < 0.001

^cp < 0.01

^dp < 0.05

Content Validity

Normative Data

Multiple Sclerosis: (Manglani et al., 2022; n = 87; age range = 30-59; subjects diagnosed with relapsing-remitting MS; participating in a randomized control trial comparing effects of a physical activity intervention with a water consumption intervention on cognition)

Unadjusted standard scores on NIH Toolbox Cognition Battery

Cognitive Test	Mean	Std. Dev.
Pattern Comparison	98.5	16.8
List Sorting	104	9.58
Picture Sequence Memory	105	15.0
DCCS^a	104	7.79
Flanker	97.5	6.52
Oral Reading	108	5.76
Picture vocabulary^b	111	7.86
NIH Fluid Cognition	102	10.5

^aDimensional Change Card Sort

^bDiscriminant validity measure

Construct Validity

Convergent validity:

Multiple Sclerosis: (Manglani et al., 2022)

Adequate convergent validity of the pattern comparison (Processing Speed) subtest with the WAIS-IV Cancellation Test (mean concordance correlation coefficient (Mccc) = 0.46)
Adequate convergent validity of the list sorting (Working Memory) subtest with the WAIS-IV Digit Span Test (Mccc = 0.34)
Adequate convergent validity of the picture sequence (Episodic Memory) subtest with the California Verbal Learning Test, Second Edition (Mccc = 0.33)
Poor convergent validity of the DCCS and the Flanker Test (Mccc = 0.20) (both Executive Function) with the Delis-Kaplan Executive Function System Sorting Test (DKEFS)

Discriminant validity:

Multiple Sclerosis: (Manglani et al., 2022)

Excellent discriminant validity of the pattern comparison subtest (Processing Speed) with the Picture Vocabulary Test (concordance correlation coefficient (CCC) = 0.071)
Excellent discriminant validity of the list sorting (Working Memory) subtest with the Picture Vocabulary Test (CCC = 0.29)
Excellent discriminant validity of the picture sequence memory subtest with the Picture Vocabulary Test (CCC = 0.19)
Excellent discriminant validity of the DCCS (CCC = 0.16) and the Flanker Test (CCC = 0.11) (both Executive Function) with the Picture Vocabulary Test

Content Validity

Normative Data

Pediatric Epilepsy: (Matuska et al., 2024; n = 47; male = 55%; mean age = 10.0 (3.7) years; age range = 4.2-18.3 years; mean age of seizure onset = 4.9 (3.2) years; number of antiseizure medications at testing: 0 = 8.5%, 1 = 36.2%, 2 = 31.9%, 3 = 14.9%, 4 = 8.5%; history of Electrical Status Epilepticus in Sleep (ESES) = 46.8%)

Average performance of pediatric epilepsy patients on two NIH Toolbox Cognition Battery tests (n = 46)

Cognitive Test	Mean (SD)	Range
Flanker Inhibitory	87.2 (15.1)	54-124
Pattern Comparison	81.0 (24.6)	33-163

Criterion Validity (Predictive/Concurrent)

Concurrent validity:

Pediatric Epilepsy: (Matuska et al., 2024)

Adequate concurrent validity between Flanker Inhibitory Test and Pattern Comparison Test (r = 0.514, p < 0.001)
Adequate concurrent validity between Pattern Comparison Test and Test of Everyday Attention for Children (TEA-Ch) (Sky Search - Accuracy) (r = 0.459, p < 0.05)
Adequate concurrent validity between Flanker Inhibitory Test and Wechsler Intelligence Scale for Children (WISC) Working Memory Index (WMI) (r = 0.455, p = 0.009)
Adequate concurrent validity between Pattern Comparison Test and WISC WMI (r = 0.381, p = 0.035)
Adequate concurrent validity between Flanker Inhibitory Test and Semantic Fluency (age-appropriate version of A Developmental NEuroPSYchological Assessment, Second Edition (NEPSY-II) or the Delis-Kaplan Executive Function System (D-KEFS)) (r = 0.373, p = 0.023)
Adequate concurrent validity between Pattern Comparison Test and Semantic Fluency (age-appropriate version of NEPSY-II or D-KEFS) (r = 0.365, p = 0.028)
Adequate concurrent validity between Flanker Inhibitory Test and Comprehensive Test of Phonological Processing, Second Edition (CTOPP-2) (r = 0.577, p = 0.001)
Adequate concurrent validity between Flanker Inhibitory Test and WISC Processing Speed Index (PSI) (r = 0.378, p = 0.030)
Adequate concurrent validity between Flanker Inhibitory Test and Grooved Pegboard w/dominant hand (r = 0.554, p < 0.001) and non-dominant hand (r = 0.420, p = 0.008)
Adequate concurrent validity between Flanker Inhibitory Test and Verbal IQ (r = 0.458, p = 0.002)
Adequate concurrent validity between Flanker Inhibitory Test and Nonverbal IQ (r = 0.495, p = 0.002)
Adequate concurrent validity between Pattern Comparison Test and Nonverbal IQ (r = 0.483, p = 0.002)
Agreement statistics conducted between the two NIHTB-CB tasks and the above clinical measures found that overall agreement ranged from 57 to 74% for Flanker and from 52 to 72% for Pattern Comparison.
- Sensitivity was lower for Flanker (42-67%) than for Pattern Comparison (45-83%)
- Specificity was higher for Flanker (70-83%) than for Pattern Comparison (48-83%)
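
Overall agreement, sensitivity, and specificity of this kind are typically derived from a 2x2 table that crosses impairment on the clinical measure with impairment on the NIHTB-CB task. The sketch below shows that calculation; the function name, the cell definitions, and the counts are hypothetical and are not taken from Matuska et al. (2024).

```python
def agreement_stats(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Overall agreement, sensitivity, and specificity from a 2x2 table.

    tp/fn: impaired on the clinical measure and flagged / not flagged
    by the NIHTB-CB task; fp/tn: unimpaired and flagged / not flagged.
    """
    total = tp + fp + fn + tn
    return {
        "agreement": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical counts for illustration only (not study data).
stats = agreement_stats(tp=10, fp=5, fn=8, tn=23)
print({name: round(value, 2) for name, value in stats.items()})
```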

Content Validity

Bibliography

Amadon, G. K., Goeckner, B. D., Brett, B. L., & Meier, T. B. (2023). Comparison of various metrics of repetitive head impact exposure and their associations with neurocognition in college-aged athletes. Archives of Clinical Neuropsychology, 38(5), 714-723.

Carlozzi, N. E., Tulsky, D. S., Wolf, T. J., et al. (2017a). Construct validity of the NIH Toolbox Cognition Battery in individuals with stroke. Rehabilitation Psychology, 62(4), 443-454.

Carlozzi, N. E., Goodnight, S., Casaletto, K. B., et al. (2017b). Validation of the NIH Toolbox in Individuals with Neurologic Disorders. Archives of Clinical Neuropsychology, 32(5), 555-573.

Hessl, D., Sansone, S. M., Berry-Kravis, E., et al. (2016). The NIH Toolbox Cognitive Battery for intellectual disabilities: three preliminary studies and future directions. Journal of Neurodevelopmental Disorders, 8, 35.

Holdnack, J. A., Iverson, G. L., Silverberg, N. D., et al. (2017). NIH Toolbox Cognition Tests Following Traumatic Brain Injury: Frequency of Low Scores. Rehabilitation Psychology, 62(4), 474-484.

Manglani, H. R., Fisher, M. E., Duraney, E. J., et al. (2022). A promising cognitive screener in multiple sclerosis: The NIH toolbox cognition battery concords with gold standard neuropsychological measures. Multiple Sclerosis Journal, 28(11): 1762-1772.

Matuska, E., Carney, A., Sepeta, L. N., et al. (2024, April). Clinical validation of selected NIH Cognitive Toolbox tasks in pediatric epilepsy. Epilepsy & Behavior, 153, 1-9.

Tulsky, D. S., Carlozzi, N. E., Holdnack J., et al. (2017). Using the NIH Toolbox Cognition Battery in individuals with traumatic brain injury. Rehabilitation Psychology, 62(4):413-424.

Shields, R. H., Kaat, A. J., McKenzie, F. J., et al. (2020). Validation of the NIH Toolbox Cognition Battery in intellectual disability. Neurology, 94(12):1229-1240.

Weintraub, S., Bauer, P. J., Zelazo, P. D., et al. (2013a). NIH Toolbox Cognition Battery (CB): Introduction and Pediatric Data. Monogr Soc Res Child Dev., 78(4): 1-15.

Weintraub, S., Dikmen, S. S., Heaton, R. K., et al. (2013b). Cognition assessment using the NIH Toolbox. Neurology, 80(11, Supplement 3):s54-64.

