Abstract
Background: Retinopathy of prematurity (ROP) is a disorder of the preterm newborn characterized by neurovascular disruption in the immature retina that may cause visual impairment and blindness. Objective: To develop a clinical screening tool for early postnatal prediction of ROP in preterm newborns based on risk information available within the first 48 h of postnatal life. Methods: Using data submitted to the Vermont Oxford Network (VON) between 1995 and 2015, we created logistic regression models based on infants born <28 completed weeks gestational age. We developed a model with 60% of the data and identified birth weight, gestational age, respiratory distress syndrome, non-Hispanic ethnicity, and multiple gestation as predictors of ROP. We tested the model in the remaining 40%, performed tenfold cross-validation, and tested the score in ELGAN study data. Results: Of the 1,052 newborns in the VON database, 627 recorded an ROP status. Forty percent had no ROP, 40% had mild ROP (stages 1 and 2), and 20% had severe ROP (stages 3-5). We created a weighted score to predict any ROP based on the multivariable regression model. A cutoff score of 5 had the best sensitivity (95%, 95% CI 93-97), while maintaining a strong positive predictive value (63%, 95% CI 57-68). When applied to the ELGAN data, sensitivity was lower (72%, 95% CI 69-75), but PPV was higher (80%, 95% CI 77-83). Conclusions: STEP-ROP is a promising screening tool. It is easy to calculate, does not rely on extensive postnatal data collection, and can be calculated early after birth. Early ROP screening may help physicians limit patient exposure to additional risk factors, and may be useful for risk stratification in clinical trials aimed at reducing ROP.
Introduction
Retinopathy of prematurity (ROP) is a disorder of the preterm newborn that can lead to visual impairment and blindness [1,2]. The retina and retinal vasculature develop during gestation, linking the extent of retinal immaturity to the extent of prematurity of the infant. As a result, extremely preterm newborns are more susceptible to ROP than moderate to late preterm infants [1,3]. The pathogenesis of ROP involves 2 distinct postnatal phases and potentially a pre-phase involving stressors within the intrauterine environment [1,4]. Exposure to perinatal infection, inflammation, and oxidative stress may influence the postnatal retinal neovascularization process [5,6].
Prominent risk factors for ROP include low birth weight, low gestational age, and exposure to supplemental oxygen during the first few weeks after birth [7,8]. Other potential antenatal and postnatal risk factors include chorioamnionitis, pre-eclampsia, placental infection, sepsis, the presence of patent ductus arteriosus, the use of surfactant and transfusions, poor postnatal weight gain, and genetic factors [9,10,11].
The current screening process recommended by the American Academy of Pediatrics suggests at least 2 dilated fundus exams for all infants with a birth weight of less than 1,500 g or born at or before 30 weeks of gestation [3]. As onset and progression of ROP are related to postmenstrual age (PMA), the recommended screening time for premature newborns is set at 31 weeks' PMA. It is not recommended to examine preterm newborns for ROP before 31 weeks' PMA, as the lens and vitreous humor are not translucent before that age. Nonetheless, extremely preterm newborns (<25 weeks' gestation) should be considered for earlier screening for serious ROP due to the high likelihood of severe comorbidities. However, beginning screening at 31 weeks' PMA for a premature infant <28 weeks' gestation may miss a critical opportunity to alter the infant's risk factors for ROP [1].
Our goal was to design a clinical screening tool to predict ROP in extremely premature infants (<28 weeks) based on risk factor information that is available within the first 48 h after birth. By excluding later risk factors (e.g., sepsis and supplemental oxygen), this score will be less accurate than scores such as ROPScore [12], WINROP [13], and CHOP ROP [14]. However, it may improve long-term outcomes by giving an opportunity to alter early risk factors and to start screening and treatment sooner. Additionally, screening examinations are time-intensive for the physician and stressful for the infant. This score could potentially reduce the total number of screening examinations by being more sensitive than birth weight and/or gestational age alone.
Methods
Patient Cohorts
We used data on infants born before 28 completed weeks' gestation between 1995 and 2015 that were collected at Tufts Medical Center, Boston, and submitted to the Vermont Oxford Network (VON).
Selection of Variables
The dependent variable for this analysis was ANYROP, defined as the occurrence of any stage of ROP (1-5). ROP status (stages 0-5) was recorded in our VON database for 627 children. Forty percent of these had no ROP, 40% had mild ROP (stages 1 and 2), and 20% had severe ROP (stages 3-5).
Information on all variables was available before 48 h after birth. Twenty-six variables were initially considered: BW, GA, head circumference, race, ethnicity, antenatal magnesium sulfate, antenatal steroids, chorioamnionitis, maternal hypertension, mode of delivery, multiple gestations, gender, 1-min APGAR, 5-min APGAR, respiratory distress syndrome (RDS), major birth defects, meconium aspiration syndrome, and temperature measured within 1 h of admission to NICU. The following data were collected during initial resuscitation in the delivery room: highest oxygen during initial resuscitation, face mask ventilation, endotracheal tube ventilation, epinephrine, cardiac compression, nasal CPAP, surfactant, and inhaled nitric oxide.
Development of the Model
All statistical analyses were conducted using the Statistical Package for the Social Sciences (SPSS) version 21 for Windows (SPSS Inc., Chicago, IL, USA). After conducting univariable logistic regression analyses with the above variables, a multivariable logistic regression model was developed to predict the presence of any ROP (stages 1-5). This most parsimonious model included 5 variables: gestational age below 25 weeks (GA25), birth weight below 750 g (BW750), RDS, non-Hispanic ethnicity (NONHISP), and multiple gestations (MULT).
Score Development and Validation
Based on the odds ratios for each variable, a weighting was assigned in order to create an additive score. The score was developed using a random 60% of the cases, and then tested on the remaining 40% of the cases. In order to assess the quality of fit, the predicted and actual risks of ROP were compared at each score value.
After developing an additive score, the model was validated on 90% of the cases selected randomly, and the predicted and actual risks were compared at each score value. This procedure was conducted 10 times with a new and randomly selected 90% of the cases.
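The repeated 90% resampling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the `(score, has_rop)` tuple layout, and the fixed random seed are hypothetical, and the sketch tabulates only the observed risk at each score value. The predicted risks would come from the fitted logistic model, which is not reproduced here.

```python
import random
from collections import defaultdict


def observed_risk_by_score(cases):
    """Map each score value to the observed fraction of cases with ROP."""
    counts = defaultdict(lambda: [0, 0])  # score -> [n_with_rop, n_total]
    for score, has_rop in cases:
        counts[score][0] += int(has_rop)
        counts[score][1] += 1
    return {score: pos / n for score, (pos, n) in sorted(counts.items())}


def repeated_subsample_risks(cases, fraction=0.9, repeats=10, seed=0):
    """Draw `repeats` random subsamples and tabulate observed risk in each."""
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    size = round(fraction * len(cases))
    return [observed_risk_by_score(rng.sample(cases, size)) for _ in range(repeats)]
```

Each returned dictionary can then be set side by side with the model's predicted risk for the same score values to judge how stable the fit is across subsamples.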
After establishing our score using VON data, we tested it using ELGAN data [15]. The ELGAN study has followed over 1,000 children born before 28 weeks' gestation between 2002 and 2004 at 14 different hospitals in 5 states in the United States, with the aim of learning whether infants with early inflammation would have developmental problems at age 2. The testing sample contained 1,242 infants, 73% of whom had any stage of ROP. Using the same model, sensitivity, specificity, positive predictive value, and negative predictive value were calculated.
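For readers who want the four measures spelled out, the following Python sketch computes them from the cells of a 2x2 table (true positives, false positives, false negatives, true negatives). The function name and the example counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not study data.

```python
def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Return SE, SP, PPV, and NPV from the four cells of a 2x2 table.

    No guard against empty rows or columns: a zero denominator raises
    ZeroDivisionError, which is acceptable for a sketch.
    """
    return {
        "sensitivity": tp / (tp + fn),  # ROP cases flagged by the score
        "specificity": tn / (tn + fp),  # non-ROP cases not flagged
        "ppv": tp / (tp + fp),          # flagged infants who have ROP
        "npv": tn / (tn + fn),          # unflagged infants without ROP
    }


# Hypothetical counts, not taken from the VON or ELGAN samples.
example = screening_metrics(tp=80, fp=20, fn=20, tn=30)
```

With these made-up counts, sensitivity and PPV are both 0.8, while specificity and NPV are both 0.6.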
Establishing Score Accuracy
Receiver operating characteristic (ROC) curves were used to identify the appropriate cutoff to maximize the sensitivity of the model (Fig. 1). As the score was designed to be a screening tool, the goal was to maximize sensitivity without sacrificing positive predictive value.
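One way to picture this trade-off is to tabulate sensitivity and positive predictive value at every candidate cutoff. The Python sketch below does so for a toy data set. It assumes that a score greater than or equal to the cutoff counts as screen-positive, which the text does not state explicitly, and the function name and the six example infants are hypothetical.

```python
def sweep_cutoffs(scores, outcomes):
    """For each candidate cutoff, report (cutoff, sensitivity, ppv).

    Assumption: score >= cutoff is treated as screen-positive.
    """
    rows = []
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, outcomes) if s >= cutoff and y)
        fp = sum(1 for s, y in zip(scores, outcomes) if s >= cutoff and not y)
        fn = sum(1 for s, y in zip(scores, outcomes) if s < cutoff and y)
        sensitivity = tp / (tp + fn) if tp + fn else float("nan")
        ppv = tp / (tp + fp) if tp + fp else float("nan")
        rows.append((cutoff, sensitivity, ppv))
    return rows


# Hypothetical toy data: six infants with their scores and ROP outcomes.
toy_scores = [0, 3, 5, 5, 8, 10]
toy_outcomes = [False, False, True, False, True, True]
table = sweep_cutoffs(toy_scores, toy_outcomes)
```

In this toy example, raising the cutoff from 0 to 5 keeps sensitivity at 1.0 while PPV rises from 0.5 to 0.75; beyond 5, sensitivity starts to fall. The chosen cutoff is the highest one reached before that drop.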
The Institutional Review Board at Tufts Medical Center approved the use of data submitted to the VON between 1995 and 2015 (n = 1,052), as all data were de-identified.
Results
Of the 1,052 cases from the VON database, ROP status (stages 0-5) was recorded for 627 children. Forty percent of these had no ROP, 40% had mild ROP (stages 1 and 2), and 20% had severe ROP (stages 3-5).
The demographic characteristics of the cases are shown in Table 1. The prevalence of mild ROP (stages 1 and 2) and severe ROP (stages 3-5) in our sample was 40% and 20%, respectively. Table 1 reports the following variables: birth weight, gestational age, head circumference, maternal ethnicity, maternal race, chorioamnionitis, maternal hypertension, multiple gestations, APGAR scores, face mask ventilation, nasal CPAP, temperature, and RDS.
The score predicting any ROP was out of 12 points (Table 2). Based on the odds ratios for each variable, a weighting was assigned in order to generate an additive score: BW750 - 2 points, GA25 - 3 points, NONHISP - 2 points, RDS - 3 points, MULT - 2 points. Possible score values were 0, 2, 3, 4, 5, 6, 7, 8, 9, 10, and 12.
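As a minimal sketch of how the additive score could be computed, the Python below encodes the weights listed above and enumerates every reachable total. The function names, the dictionary layout, and the boolean-flag interface are illustrative assumptions, not part of the published tool.

```python
from itertools import product

# Weights as listed in the text (points added when the factor is present).
WEIGHTS = {"BW750": 2, "GA25": 3, "NONHISP": 2, "RDS": 3, "MULT": 2}


def step_rop_score(**flags: bool) -> int:
    """Sum the points for every risk factor flagged True (hypothetical helper)."""
    unknown = set(flags) - set(WEIGHTS)
    if unknown:
        raise ValueError(f"unknown risk factors: {sorted(unknown)}")
    return sum(WEIGHTS[name] for name, present in flags.items() if present)


def possible_scores() -> list[int]:
    """Enumerate every total reachable from the 2**5 combinations of factors."""
    names = list(WEIGHTS)
    totals = {
        step_rop_score(**dict(zip(names, combo)))
        for combo in product([False, True], repeat=len(names))
    }
    return sorted(totals)
```

For example, `step_rop_score(GA25=True, RDS=True)` returns 6, and `possible_scores()` reproduces the list of values given above: totals of 1 and 11 cannot be formed from weights of 2 and 3 that sum to 12.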
A cutoff score of 5 had the best sensitivity (95%, 95% CI 93-97), while still maintaining a strong positive predictive value (63%, 95% CI 58-67; Table 3). When applying the same model and cutoff score to the ELGAN data, sensitivity was lower (72%, 95% CI 69-75), but PPV was much higher (80%, 95% CI 77-83; Table 3).
The area under the ROC curve (AUC) summarizes how well the score discriminates between infants with and without ROP. The AUC was 0.730 for the VON sample and 0.721 for the ELGAN sample.
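The area under the ROC curve can be read as the probability that a randomly chosen infant with ROP has a higher score than a randomly chosen infant without ROP, with ties counted as one half. The Python sketch below computes it by direct pairwise comparison. It illustrates this standard definition only; the analyses reported here were run in SPSS, and the function name is hypothetical.

```python
def auc_from_scores(scores, outcomes):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    positives = [s for s, y in zip(scores, outcomes) if y]
    negatives = [s for s, y in zip(scores, outcomes) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one case with and one without the outcome")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # case with ROP outranks case without ROP
            elif p == n:
                wins += 0.5   # tied scores carry half weight
    return wins / (len(positives) * len(negatives))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. The pairwise loop is quadratic in sample size, which is not a concern for cohorts of this size.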
Discussion
Our score correctly identified 95% of infants who later developed ROP. Birth weight and gestational age are commonly accepted as the best early predictors of ROP [7,8]; however, our model has a higher sensitivity than GA and BW alone (0.95 vs. 0.61). When a model was created using just GA25 and BW750, statistical measures were optimized with a score cutoff of 2 (SE = 0.61 [0.55-0.68], SP = 0.72 [0.65-0.79], PPV = 0.76 [0.70-0.82], and NPV = 0.56 [0.49-0.63]). Based on this comparison, it is important to include maternal ethnicity, respiratory status, and multiple gestations, as these variables add information to the model.
Along with low birth weight and gestational age, severity of illness (including RDS, sepsis, patent ductus arteriosus, intraventricular hemorrhage, requiring blood transfusions, etc.) is also a contributor to ROP risk [16]. These risk factors are controversial, however, and tend to occur more frequently in the youngest infants. It is likely that there is a degree of overlap between the effects of BW and GA and the effects of RDS. When a logistic regression is run with RDS as the only predictor of ROP, the odds ratio is high (OR 4.0, 95% CI 1.6-9.7). However, when calculated with RDS and GA25 as predictors, the odds ratio drops (OR 3.0, 95% CI 1.2-7.4). Similarly, when calculated with RDS and BW750 as predictors, the odds ratio drops as well (OR 3.0, 95% CI 1.2-7.4). These odds ratio reductions suggest that there is some overlap in the effect of RDS, GA, and BW on ANYROP. However, when all 5 variables are reentered into the model, RDS retains an odds ratio of 2.4. This is an important effect, suggesting that RDS contributes information above and beyond BW and GA. We decided to include RDS, although its 95% CI includes 1.0 (OR 2.40, 95% CI 0.94-6.12), because RDS is one of the most important proxy variables for neonatal severity of illness.
The risk of ROP may vary by race and ethnicity based on genetic differences between populations [17]. For example, high grade ROP occurs more frequently in low BW white infants than in low BW black infants with similar risk [18].
Finally, infants from multiple gestations are at increased risk of prematurity and low birth weight [19,20]. The mechanism is unclear, but multiple gestations appear to be an independent risk factor for ROP. In addition to causing decreased BW and GA, the mechanism may involve assisted pregnancy and in vitro fertilization [19,21,22,23].
Additionally, we compared those newborns with and without ROP status on the basis of the 5 variables in the model, gender, and whether or not the newborns died in the delivery room. Of those who had an ROP status (stages 0-5), 0% of those children died in the delivery room, whereas 23.1% of those without ROP status died in the delivery room. The no-ROP status survivors were otherwise very similar to the ROP status children in terms of the above variables. The survivors and non-survivors without ROP status were similar in terms of gender, MULT, and NONHISP, but not in terms of BW and GA. The survivors without ROP status had a smaller percent of children with BW <750 and GA <25, an intuitive finding given that birthweight and gestational age can be general markers of neonatal health. RDS was not recorded for the non-survivors.
The VON data used were collected over a 20-year span, during which 2 seminal papers were published in the field of ROP (STOP-ROP Trial published in 1999, SUPPORT Trial published in 2009). These papers could have influenced perinatal management of preterm newborns, so the 20-year span was broken into 4 time windows (1995-1999, 2000-2004, 2005-2009, and 2010-2015) to understand whether newborns across the time windows were similar with respect to the variables in the model, and to see whether STEP-ROP predicted ANYROP status more accurately in certain time windows. The newborns were similar in terms of BW <750, GA <25, RDS, NONHISP, MULT, and gender, with the exception of those born from 2010 to 2015 with regard to BW <750, GA <25, and RDS. These babies had higher birth weights, higher gestational ages, and a lower prevalence of RDS than newborns born from 1995 to 2009, potentially indicating an improvement in neonatal care. The score was then applied to each time window individually, and its ability to predict ANYROP remained high throughout (area under the ROC curve for the 4 time windows, in chronological order = 0.768, 0.675, 0.778, 0.715). The AUC dips lowest from 2000 to 2004, but no overall trend is observed across time windows. This suggests that despite the 20-year span and changes in neonatal management, the STEP-ROP score remains a consistent predictor of ANYROP in newborns <28 weeks' gestation.
Moons et al. [24] discussed the benefit of using regression coefficients instead of odds ratios when creating a clinical score. When a score is derived from a logistic regression model, the regression coefficients are on an additive scale, while the odds ratios are on a multiplicative scale. We tried this method using the same 5 variables from the original model. The regression coefficients were multiplied by 10 and rounded to the nearest integer. These became the new weightings for each variable. SE, SP, PPV, and NPV were examined at each score value, and a cutoff score of 15 optimized SE and PPV. However, when comparing the model using odds ratios and regression coefficients, the SE, SP, PPV, and NPV were virtually identical. SE did not change using the regression coefficient model, but SP, PPV, and NPV each decreased by 1%. We chose to continue with the model based on odds ratios for simplicity and ease of calculation.
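To make the coefficient-based weighting concrete: in logistic regression the coefficient is the natural logarithm of the odds ratio, so the rule described above amounts to points = round(10 x ln(OR)). The Python sketch below applies it to the adjusted odds ratio for RDS reported in the Discussion (2.40). The function name is hypothetical, and because the coefficient-based weights themselves are not listed in the text, the result shows only what this arithmetic yields from the rounded odds ratio.

```python
import math


def points_from_odds_ratio(odds_ratio: float, scale: int = 10) -> int:
    """Convert an odds ratio to integer points via its regression coefficient.

    The logistic regression coefficient equals ln(OR), so the weighting is
    round(scale * ln(OR)).
    """
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    return round(scale * math.log(odds_ratio))


# Adjusted odds ratio for RDS as reported in the text (OR 2.40).
rds_points = points_from_odds_ratio(2.40)
```

Here ln(2.40) is about 0.875, so RDS would receive 9 points on this scale. An odds ratio of 1.0 maps to 0 points, which reflects the additive nature of the coefficient scale.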
Other Scores
In the past, similar scores have been created to predict ROP risk in preterm infants (ROPScore [12], WINROP [13], and CHOP ROP [14]); however, these models rely on risk factors that present up to 6 weeks after birth, or only take birth weight and gestational age into account. These scores do not capture perinatal risk factors, and may be applied too late to decrease early risk of ROP. Ideally, this score could be used in conjunction with a score based on later variables to catch the most cases of ROP. We are not aware of any other ROP screening tool solely based on data available in the first 48 h after birth.
Strengths of the Score
Given the long-term visual and cognitive health implications for premature infants with ROP, effective screening and treatment are crucial [25,26]. By developing a screening tool based on variables available in the first 48 h after birth, the neonatologist can identify infants at higher risk of developing ROP early after birth. These infants can receive more targeted postnatal care with the intent of minimizing their later risk factors (O2 exposure, sepsis, etc.). A more informed approach to postnatal care for those infants with high risk of ROP might constitute an effective prevention strategy in itself [12]. Furthermore, the score is based on very simple data that do not require additional effort to collect and is very easy to calculate, making it a feasible approach for nurses, neonatologists, ophthalmologists, or supporting staff.
The STEP-ROP score could be implemented as a way to discern those preterm newborns <25 weeks' gestation at highest risk for ROP who require early screening, as the American Academy of Pediatrics advises. Of course, the use of supplemental O2 should be minimized and infection precautions taken with every child. However, we hope that our score will bring attention to the particularly high-risk patients to ensure their safety in the NICU. Furthermore, a secondary goal of this score was to identify high-risk newborns for the purposes of future intervention trials.
Weaknesses of the Score
Although this score is an effective way to screen extremely premature infants for ROP in the first 48 h after birth, using such early variables has its shortcomings. The retinal pathology develops over the course of the first few weeks of postnatal life, which are not taken into account by the score presented here. An infant with a high score may have extremely low gestational age, birth weight, be a multiple, and so on, but this does not necessarily imply that she will have a complicated hospital course in the next few weeks of life. Conversely, an infant that scores relatively low may develop complications in the first few weeks of life, which would not be captured by the score. As a clinical screening tool with high SE and PPV, but low SP and NPV, this score should not be used to rule out ROP or withhold screening that is indicated according to current AAP guidelines.
This score needs extensive, prospective validation. Validation of the score using the ELGAN cohort showed that although SE was lower, PPV increased, suggesting effective screening in an independent data set. To further test external validity, the score should be applied to new cohorts of patients in different hospitals, states, and countries. Furthermore, a larger study would likely generate more precise estimates of sensitivity and specificity. Finally, we welcome any suggestions on how the score could be improved. It is our hope that the score, or a modification of it, will become a helpful tool for those caring for the most vulnerable newborns.
Acknowledgments
This work was supported by a Williams Research Fellowship (C.A.R.). Many thanks to Elizabeth Allred at Children's Hospital and Harvard Medical School for her assistance with score validation using data from the ELGAN study. The ELGAN study was supported by the National Institutes of Health (5U01NS040069-05; 2R01NS040069-06A2; 1-R01-EY021820-01; and 5P30HD018655-28).
Disclosure Statement
None of the authors have any sponsorships, funding arrangements, or conflicts of interest to disclose.