Background: Extremely-low-birth-weight (ELBW; ≤1,000 g) infants are at high risk for neurodevelopmental impairments. Conventional brain MRI at term-equivalent age is increasingly used for prediction of outcomes. However, optimal prediction models remain to be determined, especially for cognitive outcomes. Objective: The aim was to evaluate the accuracy of a data-driven MRI scoring system to predict neurodevelopmental impairments. Methods: 122 ELBW infants had a brain MRI performed at term-equivalent age. Conventional MRI findings were scored with a standardized algorithm and tested using a multivariable regression model to predict neurodevelopmental impairment, defined as one or more of the following at 18-24 months' corrected age: cerebral palsy, bilateral blindness, bilateral deafness requiring amplification, and/or cognitive/language delay. Results were compared with a commonly cited scoring system. Results: In multivariable analyses, only moderate-to-severe gyral maturational delay was a significant predictor of overall neurodevelopmental impairment (OR: 12.6, 95% CI: 2.6, 62.0; p < 0.001). Moderate-to-severe gyral maturational delay also predicted cognitive delay, cognitive delay/death, and neurodevelopmental impairment/death. Diffuse cystic abnormality was a significant predictor of cerebral palsy (OR: 33.6, 95% CI: 4.9, 229.7; p < 0.001). These predictors exhibited high specificity (range: 94-99%) but low sensitivity (30-67%) for the above outcomes. White or gray matter scores, determined using a commonly cited scoring system, did not show significant association with neurodevelopmental impairment. Conclusions: In our cohort, conventional MRI at term-equivalent age exhibited high specificity in predicting neurodevelopmental outcomes. However, sensitivity was suboptimal, suggesting additional clinical factors and biomarkers are needed to enable accurate prognostication.
Premature infants, especially those born with extremely low birth weight (ELBW; ≤1,000 g), are at high risk for developmental deficits. Neurodevelopmental impairment (NDI), a composite outcome that includes significant motor, cognitive, and neurosensory impairments, occurs in approximately 25-55% of these infants [1,2,3], with the most substantial contribution to NDI made by cognitive deficits. Cranial ultrasound, though used extensively, exhibits poor sensitivity for prediction of NDI, especially cognitive outcomes [4,5], with the exception of a reasonable ability to predict cerebral palsy (CP). Not surprisingly, brain MRI has become increasingly utilized in this population. Early evidence with conventional MRI at term-equivalent age (TEA) suggests that MRI may improve prognostication and identification of those at highest risk for impairments when compared to cranial ultrasound [4,6,7,8,9,10]. Yet, more objective and accurate models for predicting NDI, especially cognitive outcomes, remain to be determined.
Specific MRI findings at TEA such as focal parenchymal injury (particularly cystic periventricular leukomalacia) strongly correlate with motor impairments, especially CP [4,7]. More recent investigations have linked white matter (WM) and gray matter (GM) abnormalities with cognitive deficits [6,9,11,12]. Prior MRI prediction studies have subjectively weighted the importance of different structural MRI findings for outcome prediction [4,9]. This could result in suboptimal prediction accuracy. Our objective was to assess the accuracy of a data-driven MRI scoring system using multivariable statistical modeling to predict NDI in a cohort of ELBW infants.
The Children's Memorial Hermann Hospital and University of Texas Medical School at Houston joint Institutional Review Board approved the study. The need for written informed consent was waived because MRI scans and standardized neurodevelopmental testing were already part of usual clinical care for all ELBW infants.
Between August 2005 and November 2007, 202 ELBW infants were born at or transferred to the Children's Memorial Hermann Hospital NICU within 14 days of birth. Of these, 140 ELBW infants who survived to 36 weeks' postmenstrual age (PMA) were screened for eligibility. Six infants with major congenital anomalies and 1 infant discharged without an MRI scan were excluded. An additional 11 infants with poor-quality MRI scans, largely due to motion artifacts, were also excluded, resulting in a final study population of 122 infants. Normative values for the corpus callosum, lateral ventricles, and subarachnoid space were obtained from MRI scans performed in 13 healthy full-term newborns (see online suppl. Methods and suppl. References; for all online suppl. material, see www.karger.com/doi/10.1159/000444179). No other data were used from this control population.
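The enrollment counts above can be checked with a few lines of arithmetic. This is only an illustrative sketch; the variable names are hypothetical and the numbers are those stated in the paragraph.

```python
# Enrollment accounting using the counts reported in the text.
screened = 140  # ELBW infants surviving to 36 weeks' PMA

# Exclusion categories and counts as stated in the text
# (dictionary keys are illustrative labels, not study terminology).
exclusions = {
    "major congenital anomalies": 6,
    "discharged without an MRI scan": 1,
    "poor-quality MRI scan": 11,
}

final_cohort = screened - sum(exclusions.values())
print(final_cohort)  # 122
```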
Structural MRI was routinely performed on all ELBW infants at approximately 38 weeks' PMA or prior to discharge for infants discharged before then. A large majority of the MRI scans were obtained under natural sleep conditions, with infants previously fed and swaddled and outfitted with silicone ear plugs. MRI was performed on a 1.5-tesla GE-LX scanner. Basic scan parameters were as follows: axial proton density/T2: TE 15/175 ms, TR 10,000 ms, voxel size 0.36H × 0.36W × 1.98D mm; axial SPGR: TE/TR 2/20 ms, matrix 256 × 256, slice thickness 1 mm with no gap; DWI (diffusion-weighted imaging): TE/TR 83/8,000 ms, matrix 128 × 128, slice thickness 4 mm, 0 gap. Field of view was 18 × 18 cm for all the sequences. See online supplementary Methods for details on imaging parameters for additional sequences.
One of two dedicated neuroradiologists read all studies using a standardized MRI scoring form (see online suppl. fig. 1 and Methods for details). Measured radiographic variables included normal or abnormal WM maturation, normal or abnormal gyral/GM maturation, number and type of focal or multifocal WM signal abnormalities, number and type of GM signal abnormalities, cystic changes (focal or diffuse), lateral ventricle size, subarachnoid space size, corpus callosum size, and presence of overall volume loss/atrophy (online suppl. fig. 2). White or gray matter maturation was defined as abnormal if there was a 2- to 4-week or greater delay in maturation and categorized as ‘moderate-to-severe’ delay if there was a more than 4-week delay in development [9,14,15,16,17]. Please see figure 1, online supplementary eMethods and online supplementary figures 3-5 for further definitions and examples of WM and GM abnormalities.
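The maturation categories described above can be expressed as a simple rule. The sketch below is one possible reading, not the scoring form itself: the function name, the label "mild" for a 2- to 4-week delay, and the handling of delays of exactly 2 or 4 weeks are all assumptions made for illustration.

```python
def classify_maturation_delay(delay_weeks: float) -> str:
    """Map an estimated maturational delay (in weeks) to a category.

    Assumed thresholds: under 2 weeks is normal, 2 to 4 weeks is
    abnormal ("mild" is a hypothetical label), and more than 4 weeks
    is moderate-to-severe.
    """
    if delay_weeks < 0:
        raise ValueError("delay_weeks must be non-negative")
    if delay_weeks > 4:
        return "moderate-to-severe"
    if delay_weeks >= 2:
        return "mild"
    return "normal"


print(classify_maturation_delay(5))
```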
At approximately 18-24 months of corrected age, all ELBW infants were tested using the Bayley Scales of Infant and Toddler Development III by a certified examiner blinded to the MRI results. All patients also had a standardized neurologic examination to ascertain the presence of CP. NDI was defined as one or more of the following: CP, cognitive delay (Bayley Mental score <80), bilateral blindness, and/or bilateral hearing loss requiring amplification (see online supplementary material for additional detail on definitions of these outcome components). To facilitate comparison with the study by Woodward et al. [9], which reported Bayley II scores, we calculated a ‘Bayley Mental’ score, defined as the average of the Bayley III Cognitive and Language scores. One infant was uncooperative for cognitive testing and 3 were uncooperative for language testing. For these 4 infants we imputed the score for the missing test using the score from the test they completed (Language or Cognitive, respectively) to calculate their Bayley Mental score. Mental delay was defined as a Bayley Mental score <80. This cutoff corresponds to the traditional definition of severe mental delay/deficit on the Bayley II of <70, which averaged approximately 10 points lower in a comparison study.
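The Bayley Mental score and the mental delay cutoff described above can be sketched as follows. The function and constant names are hypothetical. Substituting the completed test's score for a missing one and then averaging is equivalent to returning the completed score, which is how the sketch handles it.

```python
from typing import Optional

MENTAL_DELAY_CUTOFF = 80  # Bayley Mental score below this is mental delay


def bayley_mental_score(cognitive: Optional[float],
                        language: Optional[float]) -> float:
    """Average of the Bayley III Cognitive and Language scores.

    If one score is missing, the available score stands in for it,
    so the average equals the available score.
    """
    if cognitive is None and language is None:
        raise ValueError("at least one score is required")
    if cognitive is None:
        return float(language)
    if language is None:
        return float(cognitive)
    return (cognitive + language) / 2


def has_mental_delay(mental_score: float) -> bool:
    return mental_score < MENTAL_DELAY_CUTOFF


print(bayley_mental_score(90, 80))  # 85.0
```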
We used one-way ANOVA for continuous outcomes and χ² analysis or Fisher's exact test for categorical variables as appropriate. All imaging variables significant at the p < 0.10 level in univariate analyses were included in the regression model, as well as any variables judged to have biologic plausibility or a known correlation with NDI. For the imaging multivariable analyses, MRI variables were eliminated in a backward stepwise fashion if they did not change the model's overall significance when dropped. Bayley scores did not fit assumptions for linear regression (non-Gaussian distribution); therefore, scores were dichotomized and models built using multivariable logistic regression. The final imaging multivariable models were adjusted for four clinical risk factors: birth gestational age, presence of bronchopulmonary dysplasia, low maternal educational level, and socioeconomic status (see online suppl. Methods for variable definitions). To facilitate comparison to a widely cited MRI scoring system, we combined variables to generate a WM and GM score as reported by Woodward et al. [9] and correlated them with our developmental outcomes. Considering the multiple comparisons made, two-sided p values of <0.01 indicated statistical significance.
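The odds ratios reported in this study come from multivariable logistic regression, which is not reproduced here. As a simpler illustration of the quantity, the sketch below computes an unadjusted odds ratio and a Wald 95% confidence interval from a 2 × 2 table. The function name and the example counts are hypothetical and are not study data.

```python
import math


def odds_ratio_wald_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: finding present, outcome present
    b: finding present, outcome absent
    c: finding absent,  outcome present
    d: finding absent,  outcome absent
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2 x 2 table.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
or_, lo, hi = odds_ratio_wald_ci(10, 5, 20, 60)
print(f"OR {or_:.1f} (95% CI {lo:.1f}, {hi:.1f})")
```

Because the interval is symmetric on the log scale, the point estimate is the geometric mean of the two limits, which is a quick consistency check for any reported odds ratio and confidence interval.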
Of our cohort of 122 ELBW infants, 12 infants died after the MRI scan in early infancy and 14 were lost to follow-up (11%). Baseline characteristics for the overall cohort and the infants lost to follow-up were similar, with two significant differences: a higher proportion of subjects who were lost to follow-up were small for gestational age, and fewer received antenatal steroids (table 1). Deceased infants had a significantly lower birth weight and higher PMA at the time of MRI.
Three of the 96 survivors who returned for follow-up at 18-24 months' corrected age could not be tested with the Bayley III due to behavioral problems. Mean age at Bayley testing was 21.8 months (SD 5.7). The mean (SD) Bayley Cognitive score was 89.4 (17.4) and the mean (SD) Language score was 82.3 (19.2). The mean (SD) Bayley Mental score was 85.7 (17.5). CP was present in 6 of 96 (6%), mental delay in 28 of 93 (30%), and NDI in 33 of 93 infants (35%; table 2).
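The follow-up denominators and outcome percentages reported above follow directly from the stated counts. The sketch below reproduces them; the variable and function names are hypothetical.

```python
# Follow-up accounting from the counts reported in the text.
cohort = 122
died = 12
lost_to_follow_up = 14
returned = cohort - died - lost_to_follow_up  # survivors seen at follow-up
untestable = 3                                # could not complete the Bayley III
tested = returned - untestable


def percent(numerator: int, denominator: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


print(returned, tested)
print(percent(lost_to_follow_up, cohort))  # lost to follow-up
print(percent(6, returned))                # cerebral palsy
print(percent(28, tested))                 # mental delay
print(percent(33, tested))                 # NDI
```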
Outcomes Prediction Based on Data-Driven MRI Predictors
Intrarater agreement for the majority of qualitatively assessed imaging variables was good to very good; however, it was only fair for WM maturation and diffuse excessive high signal intensity (see online suppl. material for details). The following imaging variables from the standardized MRI form (online suppl. fig. 1) were associated with one or more neurodevelopmental outcomes in unadjusted analyses: degree of WM maturation, focal WM signal abnormalities, diffuse WM signal abnormalities, moderate-to-severe gyral maturational delay, diffuse cystic abnormality, and corpus callosum size <25th percentile. In multivariable logistic regression prediction models, the presence of diffuse cystic abnormality (n = 12, 9.8%) was the only variable independently predictive of CP and CP/death (table 3). Moderate-to-severe gyral maturational delay was present in 19 subjects (15.5%) and was the only independent predictor for NDI, NDI/death, mental delay and mental delay/death. Mean Bayley scores for subjects with moderate-to-severe gyral maturational delay were significantly lower by approximately 20 points than those with no or mildly delayed gyral maturation (fig. 2). Both of these predictors exhibited high specificity (range: 94-99%), but low sensitivity (30-67%) in predicting neurodevelopmental outcomes (table 4). All multivariable MRI prediction models remained significant when adjusted for gestational age, bronchopulmonary dysplasia, high-risk socioeconomic status, and low maternal educational level. Notably, the odds ratios increased for 4 of the 6 prediction models after adjustment for the clinical risk factors.
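Sensitivity and specificity, as reported above, are simple ratios from a 2 × 2 table of MRI finding against outcome. The sketch below shows the calculation with hypothetical counts chosen only to illustrate a high-specificity, low-sensitivity predictor; they are not the values in table 4.

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp: finding present, outcome present
    fn: finding absent,  outcome present
    tn: finding absent,  outcome absent
    fp: finding present, outcome absent
    """
    sensitivity = tp / (tp + fn)  # share of impaired infants flagged by MRI
    specificity = tn / (tn + fp)  # share of unimpaired infants not flagged
    return sensitivity, specificity


# Hypothetical counts, for illustration only.
sens, spec = sensitivity_specificity(tp=3, fn=7, tn=95, fp=5)
print(f"sensitivity {sens:.0%}, specificity {spec:.0%}")
```

With counts like these, a positive finding is rarely a false alarm, but most infants who go on to have the outcome have a negative scan.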
Outcomes Prediction Based on a Widely Cited MRI Scoring System
Using the previously published scoring system, we found no WM abnormality in 19% of the infants, mild WM abnormality in 49%, moderate in 25%, and severe in 7%. GM abnormalities were present in 23% of infants. We found no significant associations between the severity of WM abnormality and Bayley scores (online suppl. table 1). The severity of WM abnormality also did not predict mental delay, CP, NDI, or composite end points. Inclusion of diffuse excessive high signal intensity on T2-weighted MRI, or collapsing WM abnormality into two categories (none-mild and moderate-severe), did not identify new significant relationships. GM abnormality, as previously defined, also did not correlate with neurodevelopmental outcomes (online suppl. table 2).
We used a data-driven approach to evaluate the relative importance of various conventional MRI findings at TEA for prediction of neurodevelopmental outcomes in ELBW infants. We found that moderate-to-severe gyral maturational delay emerged as a novel independent predictor for neurodevelopmental outcomes, most importantly mental delay (a combination of cognitive and language outcomes) and overall NDI.
The significant association of GM maturational delay with mental delay and NDI has been reported by only two previous studies, and most neuroradiologists do not report this finding for clinical MRI scans. Woodward et al. [9] examined gyral maturation as 1 of 3 components of their GM score. While GM score predicted severe cognitive delay in unadjusted analyses, it was no longer significant after adjustment for WM abnormalities or clinical risk factors. Rathbone et al. reported an association between delayed growth rate of cortical surface area and early childhood neurocognitive outcomes.
Our findings are consistent with several studies that reported the likelihood of developing CP to be extremely high in infants diagnosed with cystic periventricular leukomalacia [4,7]. However, our study differs from those that have found associations of WM injury on near-term MRI with cognitive outcomes. Moderate-to-severe WM injury on structural MRI was a predictor of NDI in several relatively large cohorts [6,8,9,10]. Woodward et al. [9] studied 167 very preterm infants and observed that moderate-to-severe WM abnormality predicted severe cognitive delay, severe motor delay, CP, neurosensory impairment, and NDI at 2 years' corrected age. Notably, sensitivity was 65% each for predicting severe motor delay and CP, with specificity of 85 and 84%, respectively. Sensitivity and specificity for predicting severe cognitive delay were 41 and 84%, respectively. After adjustment for neonatal risk factors, the association of moderate-to-severe WM abnormality with severe motor delay and CP remained significant, while the association with severe cognitive delay and neurosensory impairment did not. Skiold et al. found a significant association between moderate-to-severe WM abnormality on MRI at TEA and lower Bayley Cognitive and Language scores and CP at 30 months' corrected age in unadjusted analyses. Hintz et al. recently reported that a moderate or severe WM abnormality on MRI at TEA was associated with NDI or death at 18-22 months, but only in a limited prediction model that excluded late cranial ultrasound data. When ultrasound findings were added, it was no longer predictive of NDI or death.
These findings suggest that we must exercise caution in relying on MRI as a key predictor of NDI [23,24]. In our cohort, the inability of WM abnormalities to predict outcomes other than CP was true irrespective of the scoring system we employed. We speculate that defining the presence, type, and degree of WM abnormality on qualitative MRI is subjective and that this contributes to our inability to predict cognitive or language outcomes. Furthermore, conventional MRI is less sensitive in diagnosing delayed/aberrant brain development and microstructural abnormalities that are likely common contributors to cognitive and language development. Beyond conventional MRI data, advanced applications of MRI may prove to be more helpful in predicting outcomes, along with other forms of neurologic assessment.
Our study had several limitations. Due to the size of our cohort we were unable to control for additional medical or social risk factors that are also important contributors to neurodevelopmental outcomes. Our inability to use cognitive scores as continuous variables, due to a non-Gaussian distribution, may have also contributed to our inability to find an association with WM abnormalities. Our data collection tool was not set up to specifically replicate the Inder/Woodward scoring system. However, we did collect all the relevant data, some of which were quantitative, for all categories of their WM and GM scoring system. Notably, as analysis of our study data was already in progress, an updated version of the Inder/Woodward scoring system became available that may improve outcome prediction. However, this updated system does not include new or additional markers of cortical GM/gyral maturity. Intrarater reliability was only fair for WM maturation and diffuse excessive high signal intensity (DEHSI; see online suppl. Methods), suggesting that these are somewhat subjective imaging findings. Intrarater values had fairly wide confidence intervals, which likely reflects the limited sample size of 25 subjects who were reassessed. Interrater variability was not assessed. Lastly, we were unable to assess the degree of myelination of the posterior limb of the internal capsule (PLIC) in a small fraction of infants who had their study MRI prior to 37-38 weeks' PMA, the expected age of PLIC myelination.
Of note, when we categorized our subjects by severity of WM abnormality as in the Inder/Woodward scoring system, there were only 4 subjects with severe WM abnormality, and the mean Bayley scores for this category were surprisingly slightly higher than other severity categories (see online suppl. table 1). This accurately reflects the data collected, as there was 1 subject in the severe WM category with exceptionally high scores on the Bayley scales (cognitive score 140). However, no statistical difference was found in Bayley scores between categories of WM severity. We attempted to compensate for low numbers in the severe category by examining a combined moderate-to-severe WM abnormality group and their outcomes (Bayley scores, CP, mental delay, NDI), but did not find any new significant associations.
A particular strength of this study was utilization of an approach that let individual MRI measures drive the development of multivariable statistical prediction models. By analyzing the contribution of individual imaging measures to outcomes, rather than using a subjectively defined and weighted composite scoring system, we were able to simplify the scoring system and identify the important contribution of delayed gyral maturation to outcome prediction. Existing neonatal MRI scoring systems have weighted the relative importance of brain metrics based on expert opinion rather than statistical correlations with neurodevelopmental deficits. Our multivariable statistical approach allowed us to eliminate several variables that did not contribute to improved predictions and highlighted those variables that are most predictive of outcome.
In our ELBW cohort, infants with moderate-to-severe gyral maturational delay exhibited the highest likelihood of developing cognitive and language deficits. Conventional MRI at TEA exhibited high specificity in predicting neurodevelopmental outcomes. However, sensitivity was suboptimal, suggesting additional clinical factors and biomarkers are needed to enable accurate prognostication.
This work was supported by the National Institute of Neurological Disorders and Stroke K23-NS048152 grant (to N.A.P.). The funding agency played no role in the design, conduct, or analysis of the trial. Dr. Parikh had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. Dr. Parikh gratefully acknowledges the significant contributions of Dr. Robert Lasky, a long-time mentor who died recently. He was instrumental in designing this study and was a continual source of encouragement and inspiration.
The authors have no conflicts of interest to declare.