Bayley Scales of Infant and Toddler Development, Fourth Edition

Link to Instrument

Acronym Bayley-4

Area of Assessment

Infant & Child Development

Assessment Type


Administration Mode

Paper & Pencil


Not Free

Actual Cost


Cost Description

$1,191.60 for the Bayley-4 Complete Kit (Digital); $1,325.20 for the Bayley-4 Complete Kit (Print); $1,499.60 for the Bayley-4 Complete Kit (Print) & Bayley-4 Screening Test Complete Kit (Print). Additional test booklets, manuals, etc. may be purchased for an additional cost.

CDE Status

The Bayley-4 was not listed as a CDE--last checked on 11/14/2023. The following recommendations were provided for the Bayley-III:

Supplemental - Highly Recommended: Congenital Muscular Dystrophy (CMD):
  • Highly recommended for developmental, psychological, and neuropsychological studies of infants and toddlers up to 42 months old.
  • Highly recommended as a means of characterizing study participants.
Supplemental - Highly Recommended: Cerebral Palsy (CP)
Supplemental: Epilepsy, Mitochondrial Disease (Mito), Neuromuscular Disease (NMD), Spinal Muscular Atrophy (SMA), Stroke, and Traumatic Brain Injury (TBI).
Exploratory: SCI-Pediatric (age 1 month-42 months)

Key Descriptions

  • The Bayley-4 consists of 419 items across 5 scales (subtests), each of which can be administered separately:
  • Cognitive Scale (CG): 81 items
  • Language Scale (LANG): 79 items
    ○ Receptive Communication (RC): 42 items
    ○ Expressive Communication (EC): 37 items
  • Motor Scale (MOT): 104 items
    ○ Fine Motor (FM): 46 items
    ○ Gross Motor (GM): 58 items
  • Social-Emotional Scale (SOEM): 35 items
    ○ Social-Emotional (SE)
    ○ Social-Emotional Sensory Processing (SP)
  • Adaptive Behavior Scale (ADBE): 120 items
    ○ Receptive Communication (REC): 23 items
    ○ Expressive Communication (EXP): 28 items
    ○ Daily Living Skills - Personal (PER): 30 items
    ○ Socialization - Interpersonal Relationships (IPR): 20 items
    ○ Socialization - Play and Leisure (PLA): 19 items
  • Cognitive, Language, and Motor subscales include structured items administered to the child, either across a table or on the floor, with the caregiver actively engaged to support as needed. Social-Emotional and Adaptive Behavior subscales are caregiver questionnaires pertaining to observations of the child’s abilities.
  • For standardized administration, the first three scales must be completed in person. The Social-Emotional and Adaptive Behavior scales may be administered remotely online via telehealth or emailed to caregivers as a secure link.
  • Items are scored from 0-2:
    ● 0 = not present
    ● 1 = emerging
    ● 2 = mastery
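
The item counts and the 0-2 scoring scale above can be cross-checked with a short tally. This is a minimal sketch, not the publisher's scoring procedure: the dictionary layout and the helper names (`scale_totals`, `raw_score`) are hypothetical, and treating a raw score as a plain sum of item scores is a simplifying assumption that ignores any administration rules in the manual.

```python
# Item counts per scale and subtest, copied from the list above.
# The SE/SP split of the Social-Emotional Scale is not given in the text,
# so that scale is recorded as a single entry.
ITEM_COUNTS = {
    "CG": {"CG": 81},
    "LANG": {"RC": 42, "EC": 37},
    "MOT": {"FM": 46, "GM": 58},
    "SOEM": {"SOEM": 35},
    "ADBE": {"REC": 23, "EXP": 28, "PER": 30, "IPR": 20, "PLA": 19},
}

# Polytomous item scores as described above.
ITEM_SCORE_LABELS = {0: "not present", 1: "emerging", 2: "mastery"}


def scale_totals(counts):
    """Return the number of items in each scale by summing its subtests."""
    return {scale: sum(subtests.values()) for scale, subtests in counts.items()}


def raw_score(item_scores):
    """Sum item scores after checking each one is 0, 1, or 2.

    Assumption: a simple sum; the manual's administration rules are not modeled.
    """
    for score in item_scores:
        if score not in ITEM_SCORE_LABELS:
            raise ValueError(f"invalid item score: {score!r}")
    return sum(item_scores)


if __name__ == "__main__":
    totals = scale_totals(ITEM_COUNTS)
    print(totals)
    print("total items:", sum(totals.values()))
```

The subtest counts add up to the scale totals listed above, and the scale totals add up to 419.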

Number of Items

419

Equipment Required

  • Administration manual or device for digital administration/scoring
  • Stimulus book
  • Cognitive, Language, and Motor record forms
  • Motor-Response booklet
  • Social-Emotional and Adaptive-Behavior questionnaires
  • Caregiver reports
  • Observational checklist
  • Proprietary manipulatives set

Time to Administer

30-70 minutes

Time dependent on child's age.

Required Training

Reading an Article/Manual

Required Training Description

Master's- or doctoral-level degree and appropriate training are required. Optional online training is included with purchase of the test kit.

Age Ranges

Infant (days to months)

16 - 23


Preschool Children (months)

24 - 42


Instrument Reviewers

Reviewed by University of Washington Master of Occupational Therapy Students Gwen Drolet, Hamdi Hassan, Nico Lis, and Aga Przysucha under the supervision of Danbi Lee, PhD, OTD, OTR/L (faculty mentor); Division of Occupational Therapy, Department of Rehabilitation Medicine, University of Washington

Body Part

Upper Extremity
Lower Extremity

ICF Domain

Body Structure
Body Function

Measurement Domain

General Health

Professional Association Recommendation

No recommendations found--last checked 11/14/2023.


  • The Bayley-4 normative sample was stratified according to the 2017 census data by age, sex, race/ethnicity, and parent education level. The normative sample was also validated using average zip code income, which was close to the national average income in 2017 (Aylward & Zhu, 2019).
  • Except for 21 children diagnosed with Down syndrome (1.2%), children identified as at-risk were excluded from the Bayley-4 normative sample (Aylward & Zhu, 2019). Inclusion of children with Down syndrome increased the reliability of the Cognitive, Language, and Motor Scales (Bayley & Aylward, 2019). The inclusion of children with Down syndrome in the normative sample did not inflate norms, which was a concern raised with the Bayley-III (Aylward & Zhu, 2019).
  • The means of clinical groups (children with Down syndrome, autism spectrum disorder, language delay, language impairment, developmental delay, motor impairment, children born moderate/late premature and very/extremely premature, and with prenatal alcohol and drug exposure) were significantly below the matched Bayley-4 normative controls (Aylward & Zhu, 2019).
  • Unlike the Bayley-III, the Bayley-4 uses a polytomous scoring approach (i.e., 2, 1, 0) where 2 = Mastery, 1 = Emerging, 0 = Not Present.
  • The Bayley-4 Social-Emotional Scale is based on the Greenspan Social-Emotional Growth Chart and is completely consistent with the Bayley-III (Bayley & Aylward, 2019).
  • Unlike in the Bayley-III, items in the Adaptive Behavior Scale are derived from the Vineland Adaptive Behavior Scales, Third Edition (Vineland-3) (Pearson Education, 2019).
  • The Social-Emotional and Adaptive Behavior questionnaires can be administered remotely via Q-global. Q-global offers web-based administration, scoring, and reporting for the Bayley-4.
  • Adjustment for prematurity is recommended to age 3 for language and motor composites, regardless of the degree of prematurity. For cognitive composite, correction for prematurity is needed over the first 24 months. At the most extreme level of prematurity, correction is needed at 3 years of age "when the uncorrected cognitive score is 0.33 to 0.47 SD below the baseline score" (Aylward, 2020).
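
The prematurity guidance above can be illustrated with a small age-correction sketch. The entry does not spell out the arithmetic, so this assumes the conventional approach of subtracting the time born early, relative to a 40-week term, from chronological age. All names are hypothetical. Treating the age limits as inclusive and as applying to chronological age is an assumption, and the exception for the most extreme prematurity at 3 years is not modeled.

```python
FULL_TERM_WEEKS = 40          # assumption: conventional full-term gestation
DAYS_PER_MONTH = 365.25 / 12  # assumption: average calendar month length

# Upper age limits (months) for applying the correction, read from the
# considerations above; treating them as inclusive is an assumption.
CORRECTION_LIMIT_MONTHS = {"LANG": 36, "MOT": 36, "CG": 24}


def corrected_age_months(chronological_months, gestational_weeks):
    """Subtract the time born early from chronological age (in months)."""
    weeks_early = max(0, FULL_TERM_WEEKS - gestational_weeks)
    return chronological_months - weeks_early * 7 / DAYS_PER_MONTH


def age_for_scoring(composite, chronological_months, gestational_weeks):
    """Use corrected age only while the child is within the composite's limit."""
    if chronological_months <= CORRECTION_LIMIT_MONTHS[composite]:
        return corrected_age_months(chronological_months, gestational_weeks)
    return chronological_months


if __name__ == "__main__":
    # A 30-month-old born at 28 weeks: corrected for language, not for cognitive.
    print(round(age_for_scoring("LANG", 30, 28), 1))
    print(age_for_scoring("CG", 30, 28))
```

For actual scoring, the age calculation rules in the technical manual take precedence over this sketch.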

Mixed Populations

Standard Error of Measurement (SEM)

Pediatric Normative Sample (Bayley, 2019; n = 1,700; age range = 16-42 months)

  • Cognitive (CG), Language (LANG), Motor (MOT) Scales (n = 1,700; Down syndrome, n = 21)
    • SEM for CG: 0.69
    • SEM for RC: 0.96
    • SEM for EC: 0.89
    • SEM for FM: 0.83
    • SEM for GM: 0.80
  • Social-Emotional Scale (n = 320)
    • SEM for SE: 0.97
    • SEM for SP: 1.72
  • Adaptive Behavior Scale (n = 750)
    • SEM for REC: 0.77
    • SEM for EXP: 0.56
    • SEM for PER: 0.78
    • SEM for IPR: 0.89
    • SEM for PLA: 0.80
    • SEM for COM: 2.60
    • SEM for DLS: 3.90
    • SEM for SOC: 3.28
    • SEM for ADBE: 2.12

Note: Age-based SEM available in the manual.

Minimal Detectable Change (MDC)

Pediatric Normative Sample (MDC calculated from Bayley, 2019) 

  • Cognitive, Language, and Motor Scales
    • MDC for CG: 1.91
    • MDC for RC: 2.66
    • MDC for EC: 2.47
    • MDC for FM: 2.30
    • MDC for GM: 2.22
  • Social-Emotional Scale
    • MDC for SE: 2.69
    • MDC for SP: 4.77
  • Adaptive Behavior Scale
    • MDC for REC: 2.13
    • MDC for EXP: 1.55
    • MDC for PER: 2.16
    • MDC for IPR: 2.47
    • MDC for PLA: 2.22
    • MDC for COM: 7.21
    • MDC for DLS: 10.81
    • MDC for SOC: 9.09
    • MDC for ADBE: 5.88 
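
The MDC values above are consistent with the conventional 95% formula MDC = 1.96 × √2 × SEM applied to the SEM values listed earlier for the normative sample. The entry does not state which formula was used, so treat that as an assumption. The sketch below recomputes each MDC from its SEM. The function name `mdc95` and the dictionary names are hypothetical.

```python
import math

Z_95 = 1.96  # two-sided 95% confidence multiplier


def mdc95(sem):
    """Minimal detectable change at 95% confidence from a standard error of measurement."""
    return Z_95 * math.sqrt(2) * sem


# SEM values for the pediatric normative sample, copied from the SEM list above.
SEM = {
    "CG": 0.69, "RC": 0.96, "EC": 0.89, "FM": 0.83, "GM": 0.80,
    "SE": 0.97, "SP": 1.72,
    "REC": 0.77, "EXP": 0.56, "PER": 0.78, "IPR": 0.89, "PLA": 0.80,
    "COM": 2.60, "DLS": 3.90, "SOC": 3.28, "ADBE": 2.12,
}

# MDC values as reported in the list above.
REPORTED_MDC = {
    "CG": 1.91, "RC": 2.66, "EC": 2.47, "FM": 2.30, "GM": 2.22,
    "SE": 2.69, "SP": 4.77,
    "REC": 2.13, "EXP": 1.55, "PER": 2.16, "IPR": 2.47, "PLA": 2.22,
    "COM": 7.21, "DLS": 10.81, "SOC": 9.09, "ADBE": 5.88,
}

if __name__ == "__main__":
    for name, sem in SEM.items():
        computed = round(mdc95(sem), 2)
        print(f"{name}: computed {computed:.2f}, reported {REPORTED_MDC[name]:.2f}")
```

Each recomputed value matches the reported MDC to two decimal places.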

Normative Data

Pediatric Normative Sample: (Bayley, 2019; subgroup n = 184; mean age = 19.2 (11.8) months)

  • Mean Standard Scores:
    • CG: 10.4
    • RC: 10.3
    • EC: 10.3
    • FM: 10.1
    • GM: 10.3
    • LANG: 101.9
    • MOT: 101.0

Test/Retest Reliability

Pediatric Normative Sample: (Bayley, 2019; subgroup n = 152; mean age = 17.0 (13.2) months; 1-43 days between assessments, mean = 11.8 days)

  • Acceptable test-retest reliability for CG, LANG, and MOT scales (r = 0.81-0.85)

Pediatric Normative Sample: (Bayley, 2019; n = 67; mean age = 1.5 (1.0) years; 12-35 days between assessments, mean = 20.2 days)

  • Adaptive Behavior Scale (with caregivers)
    • Acceptable test-retest reliability for all subscales (REC, EXP, PER, IPR, PLA, COM, DLS, SOC, ADBE) (r = 0.72-0.87)

Interrater/Intrarater Reliability

Pediatric Normative Sample (Bayley, 2019; Adaptive Behavior Scale (with caregivers), n = 47, mean age = 1.6 (1.0) years)

  • Adequate to Excellent inter-rater reliability for all subscales:
    • REC corrected r = 0.81
    • EXP corrected r = 0.80
    • PER corrected r = 0.73
    • IPR corrected r = 0.67
    • PLA corrected r = 0.72
    • COM corrected r = 0.78
    • DLS corrected r = 0.74
    • SOC corrected r = 0.70
    • ADBE corrected r = 0.79

Internal Consistency

Pediatric Normative Sample (Bayley, 2019)

  • Excellent: Reliability coefficients between 0.93 and 0.96 for CG, LANG, and MOT Scales

Criterion Validity (Predictive/Concurrent)

Concurrent validity:

Pediatric Normative Sample: (Bayley, 2019; subgroup n = 184; mean age = 19.2 (11.8) months)

  • Excellent correlation with the Bayley-III for Cognitive, Language and Motor Scales (r = 0.69-0.75)

Construct Validity

Convergent validity:

Pediatric Normative Sample: (Bayley, 2019; subgroup n = 104; mean age = 36.6 (3.7) months)

  • Adequate to excellent convergent validity with Wechsler Preschool and Primary Scale of Intelligence (4th edition) (WPPSI-IV) Full Scale IQ (r = 0.45-0.79)

Pediatric Normative Sample: (Bayley, 2019; subgroup n = 100; mean age = 18.6 (12.1) months)

  • Poor to Excellent construct validity with Peabody Developmental Motor Scales (2nd edition) (PDMS-2) Total Motor Scale (r = 0.19-0.66)

Content Validity

Content validity of the Bayley-4 was determined by ensuring that the items and subtests of the Bayley-4 adequately measure what they intend to measure. Literature and expert reviews were conducted to examine the content of the Bayley-4 and to evaluate any proposed new item designed to improve content coverage and relevance. The test also provided an adequate number of tasks across the entire age range, with tasks relevant to the constructs measured (Bayley, 2019, p. 37).

Developmental staff and an advisory panel reviewed the items to confirm that a child engaged in each item and subtest would respond through the expected cognitive process:

  • The task focused on the intended skill (e.g., expressive responses required either an imitative or answer-the-question response).
  • The task did not require skills (e.g., pointing to a picture) that were not acquired by children at the target ages.
  • The task included supports to minimize confounding processes (e.g., picture supports were provided to minimize auditory memory load).

The content of the task focused on child-friendly and child-familiar themes.


Pediatric Disorders

Standard Error of Measurement (SEM)

Autism Spectrum Disorders: (SEM calculated from Bayley, 2019; n = 31; mean age = 36.1 (4.8) months)

  • SEM for CG: 0.34
  • SEM for RC: 0.48
  • SEM for EC: 0.54
  • SEM for FM: 1.05
  • SEM for GM: 0.99 

Developmental Delay: (SEM calculated from Bayley, 2019; n = 57; mean age = 23.2 (11.5) months)

  • SEM for CG: 0.32
  • SEM for RC: 0.55
  • SEM for EC: 0.30
  • SEM for FM: 0.60
  • SEM for GM: 0.30

Language Delay: (SEM calculated from Bayley, 2019; n = 25; mean age = 26.4 (4.2) months)

  • SEM for CG: 0.41
  • SEM for RC (n = 24): 0.78 
  • SEM for EC: 0.54
  • SEM for FM: 0.97
  • SEM for GM: 0.63

Language Impairment: (SEM calculated from Bayley, 2019; n = 25; mean age = 39 (2.2) months)

  • SEM for CG: 0.39
  • SEM for RC: 0.78
  • SEM for EC: 0.38
  • SEM for FM: 0.89
  • SEM for GM: 1.20

Motor Impairment: (SEM calculated from Bayley, 2019; n = 40; mean age = 26.6 (10.1) months; includes cerebral palsy and developmental coordination disorder diagnoses)

  • SEM for CG: 0.41
  • SEM for RC: 0.58
  • SEM for EC: 0.43
  • SEM for FM: 0.51
  • SEM for GM: 0.29

Prenatal Drug/Alcohol Exposure: (SEM calculated from Bayley, 2019; n = 44; mean age = 24.4 (12.4) months)

  • SEM for CG: 0.41
  • SEM for RC: 0.52
  • SEM for EC: 0.37
  • SEM for FM: 0.35
  • SEM for GM: 0.37

Moderate/Late Premature: (SEM calculated from Bayley, 2019; n = 70; mean age = 19.6 (12.1) months)

  • SEM for CG: 0.34
  • SEM for RC: 0.33
  • SEM for EC: 0.34
  • SEM for FM: 0.40
  • SEM for GM: 0.31

Very/Extremely Premature: (SEM calculated from Bayley, 2019; n = 66; mean age = 20.6 (11.8) months)

  • SEM for CG: 0.41
  • SEM for RC: 0.33
  • SEM for EC: 0.32
  • SEM for FM: 0.34

Minimal Detectable Change (MDC)

Autism Spectrum Disorders: (MDC calculated from Bayley, 2019) 

  • MDC for CG: 0.94
  • MDC for RC: 1.33
  • MDC for EC: 1.50
  • MDC for FM: 2.91 
  • MDC for GM: 2.74

Developmental Delay: (MDC calculated from Bayley, 2019) 

  • MDC for CG: 0.89
  • MDC for RC: 1.52
  • MDC for EC: 0.83
  • MDC for FM: 1.66
  • MDC for GM: 0.83

Language Delay: (MDC calculated from Bayley, 2019)

  • MDC for CG: 1.14
  • MDC for RC (n = 24): 2.16 
  • MDC for EC: 1.50
  • MDC for FM: 2.69
  • MDC for GM: 1.75

Language Impairment: (MDC calculated from Bayley, 2019)

  • MDC for CG: 1.08
  • MDC for RC: 2.16
  • MDC for EC: 1.05
  • MDC for FM: 2.47
  • MDC for GM: 3.33

Motor Impairment: (MDC calculated from Bayley, 2019)

  • MDC for CG: 1.14
  • MDC for RC: 1.61
  • MDC for EC: 1.19
  • MDC for FM: 1.41
  • MDC for GM: 0.80

Prenatal Drug/Alcohol Exposure: (MDC calculated from Bayley, 2019)

  • MDC for CG: 1.14
  • MDC for RC: 1.44
  • MDC for EC: 1.03
  • MDC for FM: 0.97
  • MDC for GM: 1.03

Moderate/Late Premature: (MDC calculated from Bayley, 2019)

  • MDC for CG: 0.94
  • MDC for RC: 0.91
  • MDC for EC: 0.94
  • MDC for FM: 1.11
  • MDC for GM: 0.86

Very/Extremely Premature: (MDC calculated from Bayley, 2019)  

  • MDC for CG: 1.14
  • MDC for RC: 0.91
  • MDC for EC: 0.89
  • MDC for FM: 0.94
  • MDC for GM: 0.86

Normative Data

Autism Spectrum Disorders: (Bayley, 2019)

  • Mean Standard Scores:
    • CG: 4.4
    • RC: 3.4
    • EC: 3.7
    • FM: 4.6
    • GM: 5.5
    • LANG: 62.8
    • MOT: 71.4

Developmental Delay (Bayley, 2019)

  • Mean Standard Scores:
    • CG: 6.5
    • RC: 6.5
    • EC: 6.1
    • FM: 7.0
    • GM: 6.7
    • LANG: 78.8
    • MOT: 81.5

Language Delay (Bayley, 2019)

  • Mean Standard Scores:
    • CG: 8.4
    • RC: 7.1
    • EC: 5.5
    • FM: 8.7
    • GM: 7.8
    • LANG: 79.9
    • MOT: 90.0

Language Impairment (Bayley, 2019)

  • Mean Standard Scores:
    • CG: 7.6
    • RC: 7.3
    • EC: 6.3
    • FM: 7.8
    • GM: 7.2
    • LANG: 81.8
    • MOT: 85.2

Motor Impairment (Bayley, 2019)

  • Mean Standard Scores:
    • CG: 5.5
    • RC: 6.0
    • EC: 6.5
    • FM: 6.5
    • GM: 4.2
    • LANG: 78.4
    • MOT: 72.9

Prenatal Drug/Alcohol Exposure (Bayley, 2019)

  • Mean Standard Scores:
    • CG: 5.2 
    • RC: 6.4
    • EC: 6.5
    • FM: 7.2
    • GM: 7.0
    • LANG: 80.5
    • MOT: 83.4
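
To read the clinical-group means above against the norm, each mean can be expressed as a distance from the normative mean in standard deviation units. The sketch below assumes the usual Bayley score metrics, which this entry does not state explicitly: subtest scaled scores with mean 10 and SD 3, and composite standard scores with mean 100 and SD 15. The function and constant names are hypothetical.

```python
# Assumed score metrics (not stated in this entry):
SCALED_MEAN, SCALED_SD = 10.0, 3.0          # subtest scaled scores
COMPOSITE_MEAN, COMPOSITE_SD = 100.0, 15.0  # composite standard scores
COMPOSITES = {"LANG", "MOT"}


def sd_from_norm(scale, mean_score):
    """Express a group mean as standard deviations from the normative mean."""
    if scale in COMPOSITES:
        return (mean_score - COMPOSITE_MEAN) / COMPOSITE_SD
    return (mean_score - SCALED_MEAN) / SCALED_SD


# Autism Spectrum Disorders group means, copied from the list above.
ASD_MEANS = {
    "CG": 4.4, "RC": 3.4, "EC": 3.7, "FM": 4.6, "GM": 5.5,
    "LANG": 62.8, "MOT": 71.4,
}

if __name__ == "__main__":
    for scale, mean_score in ASD_MEANS.items():
        print(f"{scale}: {sd_from_norm(scale, mean_score):+.2f} SD")
```

Negative values indicate a group mean below the normative mean. The same function applies to the other clinical groups listed above.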

Internal Consistency

Autism Spectrum Disorders: (Bayley, 2019)

  • Excellent: Reliability coefficients range from 0.92-0.99 for CG, LANG, and MOT Scales

Developmental Delay: (Bayley, 2019)

  • Excellent: Reliability coefficients range from 0.96-0.99 for CG, LANG, and MOT Scales

Language Delay: (Bayley, 2019)

  • Excellent: Reliability coefficients range from 0.88-0.98 for CG, LANG, and MOT Scales

Language Impairment: (Bayley, 2019)

  • Excellent: Reliability coefficients range from 0.90-0.99 for CG, LANG, and MOT Scales 

Motor Impairment: (Bayley, 2019)

  • Excellent: Reliability coefficients range from 0.98-0.99 for CG, LANG, and MOT Scales

Prenatal Drug/Alcohol Exposure: (Bayley, 2019)

  • Excellent: Reliability coefficients range from 0.98-0.99 for CG, LANG, and MOT Scales

Moderate/Late Premature: (Bayley, 2019)

  • Excellent: Reliability coefficients range from 0.98-0.99 for CG, LANG, and MOT Scales

Intellectual Disability

Standard Error of Measurement (SEM)

Children with Down Syndrome: (SEM calculated from Bayley, 2019; n = 54; mean age = 20.3 (11.0) months)

  • SEM for CG: 0.41
  • SEM for RC: 0.64
  • SEM for EC: 0.52
  • SEM for FM: 0.31
  • SEM for GM: 0.34

Minimal Detectable Change (MDC)

Children with Down Syndrome: (MDC calculated from Bayley, 2019)

  • MDC for CG: 1.14
  • MDC for RC: 1.77
  • MDC for EC: 1.44
  • MDC for FM: 0.86
  • MDC for GM: 0.94

Normative Data

Children with Down Syndrome: (Bayley, 2019)

  • Mean Standard Scores:
    • CG: 4.0
    • RC: 4.0
    • EC: 4.5
    • FM: 4.6
    • GM: 3.3
    • LANG: 67.6
    • MOT: 65.1

Internal Consistency

Children with Down Syndrome: (Bayley, 2019)

  • Excellent: Reliability coefficients range from 0.94-0.98 for CG, LANG, and MOT Scales 


Aylward, G. P. (2020). Is it correct to correct for prematurity? Theoretic analysis of the Bayley-4 normative data. Journal of Developmental & Behavioral Pediatrics, 41(2), 128-133.

Aylward, G. P., & Zhu, J. (2019). The Bayley Scales: Clarification for Clinicians and Researchers. NCS Pearson.

Bayley, N., & Aylward, G. (2019). Bayley Scales of Infant and Toddler Development: Technical manual (4th ed.). NCS Pearson.

Pearson Education. (2019). Bayley-III vs. Bayley-4: What’s changed? Pearson.