Abstract
Introduction: Prediction of neurodevelopmental outcome in infants with hypoxic-ischemic encephalopathy remains an important challenge. Various studies have shown that the predictive ability of different modalities changed after the introduction of therapeutic hypothermia. This paper reviews the diagnostic test accuracy of the different modalities that are being used to predict neurodevelopmental outcomes following therapeutic hypothermia. Methods: A systematic literature search was performed using Embase and PubMed. Two reviewers independently included eligible studies and extracted data. The quality of the studies was assessed using the Quality in Prognosis Studies Tool. Meta-analyses were performed where possible. Results: Forty-seven articles and 3 conference abstracts were included, reporting on 3,072 infants of whom 39% died or had an adverse neurodevelopmental outcome. A meta-analysis could be performed using 37 articles on (amplitude-integrated) electroencephalography (EEG), conventional magnetic resonance imaging (MRI), diffusion-weighted imaging (DWI), and proton magnetic resonance spectroscopy (1H-MRS). Amplitude-integrated EEG (aEEG) at 24 and 72 h showed similarly high diagnostic ORs, while aEEG at 6 h and EEG performed less well, both due to a low specificity. For MRI, most studies reported scoring systems in which early (<8 days) MRI performed better than late (≥8 days) MRI. Injury to the posterior limb of the internal capsule on MRI or to the thalami on DWI were strong individual predictors, as was an increased lactate/N-acetylaspartate peak on 1H-MRS. Conclusions: In the era of therapeutic hypothermia, the different modalities remain good predictors of neurodevelopmental outcome. However, timing should be taken into account. aEEG may initially give false-positive results and becomes more reliable after 24 h. In contrast, MRI should be used during the first week, as its predictive value decreases afterwards.
Introduction
Hypoxic-ischemic encephalopathy (HIE) following perinatal asphyxia is the most common cause of acquired perinatal brain injury and may lead to death or long-term neurologic sequelae [1, 2]. During the last decade, therapeutic hypothermia (TH) has become standard treatment for infants with moderate to severe HIE and it has been shown to reduce both mortality and morbidity [1].
Early prognostication remains challenging but essential for parental counseling and intensive care management, including the use of future neuroprotective strategies. A wide variety of neurophysiologic and neuroimaging modalities are currently available to assess the degree of brain injury and predict long-term outcomes. The predictive value of these tests was first studied as part of large randomized controlled trials [3-5] studying the effect of TH on outcomes, and many studies have followed since. Following TH, the predictive abilities of some tests have been reported to have changed, although they remain useful tools [6, 7]. Various factors may have contributed to the changed predictive characteristics following TH, including differences in inclusion criteria for studies prior to and during the TH era and changes in the extent and time course of injury following TH [7].
So far, only 1 meta-analysis [8] has reported the predictive characteristics of amplitude-integrated electroencephalography (aEEG) in TH studies only, and no meta-analysis has covered all neurophysiologic and neuroimaging modalities. Therefore, we performed a systematic review and meta-analysis to provide an overview of the prognostic values of the techniques that are most commonly used in clinical practice for predicting neurodevelopmental outcomes in HIE. Specifically, electroencephalography (EEG), aEEG, near-infrared spectroscopy (NIRS), evoked potentials, different magnetic resonance imaging (MRI) modalities, and cranial ultrasound (cUS) were evaluated.
Methods
Information Sources
We conducted a systematic review and meta-analysis following the Preferred Reporting Items for Systematic Reviews and Meta-analyses guidelines [9] and the Meta-Analysis of Observational Studies in Epidemiology guidelines [10]. A systematic search was performed in PubMed and Embase by 2 authors (L.C.A.S. and S.O.) on November 29, 2018, and was based on the search by van Laerhoven et al. [11] (online suppl. Fig. 1; for all online suppl. material, see www.karger.com/doi/10.1159/000505519). The reference lists of included studies were screened in order to identify any additional relevant studies. Conference abstracts were also eligible for inclusion to minimize publication bias. When an article or conference abstract lacked the required clinical data or data to perform a meta-analysis, additional data were requested from the authors via e-mail.
Study Eligibility
Studies were selected based on the following inclusion criteria: (1) studies on term and near-term infants (gestational age ≥35 weeks) with HIE treated with TH; (2) relationship between neurodevelopmental outcomes and at least 1 of the following prognostic tests described: aEEG, EEG, NIRS, evoked potentials, MRI, and cUS; and (3) neurodevelopmental follow-up for at least 18 months. Neurodevelopmental outcomes had to be defined by at least 1 of the following criteria: (1) death, (2) development of cerebral palsy, and (3) developmental outcome using validated tools such as the Bayley Scales of Infant Development. Studies were excluded when: (1) no separate results on hypothermic infants were reported if the study contained data on both hypothermic and normothermic infants; (2) the number of infants with outcome data as described above was lower than 15; or (3) an additional treatment other than TH was investigated. When studies from the same authors reported overlapping populations for the same test, the study with fewer infants was excluded. Studies published in a language other than English, Dutch, French, Spanish, or German were also excluded. Studies that did not provide data for 2 × 2 tables after a request to the authors were included in the systematic review but not in the meta-analysis. Conference abstracts were not included in the review if insufficient clinical data were available after contacting the authors.
Data Extraction
Information regarding the study design, the setting, the number of included infants and follow-up, infants’ baseline characteristics, outcome measurements, prognostic factor methods, and results were independently extracted by 2 of the authors (L.C.A.S. and S.O.) and discussed with a third author (N.E.A.) in case of any doubt. When articles provided data on both abnormal Bayley III scores <85 and <70, we used <85 as indicative of abnormal outcomes, since Bayley III scores have been reported to be higher than Bayley II scores [12]. When articles described separate results for motor and cognitive or language impairment, we chose cognitive impairment for the meta-analysis.
Quality Assessment
The quality of the included studies was assessed using the Quality in Prognosis Studies Tool on the risk of bias in the following 6 domains: patient selection, study attrition, measurement of prognostic factors, outcome measurements, study confounding, and statistical analysis and reporting [13]. Two reviewers (S.O. and L.C.A.S.) independently rated the methodological quality of each study and in case of disagreement a third author (N.E.A.) was consulted.
Statistical Analysis
Receiver operating characteristic curve analysis using MedCalc Software (version 19.0.7; Ostend, Belgium) was used to identify the optimal cut-off value when the predictor was reported as continuous data or with more than 2 abnormal classifications (e.g., for MRI scoring methods).
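As an illustration of how a continuous or multi-category predictor can be dichotomized, the sketch below selects the cut-off that maximizes Youden's J (sensitivity + specificity − 1). This is an assumption made for illustration: the criterion applied in MedCalc is not specified above, and the function name and the example scores are hypothetical, not study data.

```python
from typing import Sequence, Tuple


def youden_cutoff(
    scores: Sequence[float], adverse: Sequence[bool]
) -> Tuple[float, float, float]:
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J.

    Assumption: a test is called positive when score >= cutoff,
    i.e., higher scores indicate more severe injury.
    """
    n_pos = sum(adverse)             # infants with an adverse outcome
    n_neg = len(adverse) - n_pos     # infants with a normal outcome
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both outcome classes are required")

    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, a in zip(scores, adverse) if a and s >= cutoff)
        tn = sum(1 for s, a in zip(scores, adverse) if not a and s < cutoff)
        sens = tp / n_pos
        spec = tn / n_neg
        j = sens + spec - 1
        # Strict comparison: on ties the lowest cut-off is kept.
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best[1], best[2], best[3]


# Hypothetical injury scores (illustrative only, not data from any study).
scores = [0, 1, 1, 2, 3, 2, 5, 6, 7, 8]
adverse = [False, False, False, False, False, True, True, True, True, True]
cutoff, sens, spec = youden_cutoff(scores, adverse)
print(f"cut-off >= {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

Other criteria, such as the point closest to the upper left corner of the curve, can select a different cut-off from the same data.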
If 3 or more studies reported a predictor of outcome, the predictor was included in the meta-analyses. Meta-analyses were performed using the Meta4Diag package in R (www.r-project.org, version 3.6.0) [14]. The Meta4Diag package allows modelling of pooled logit sensitivity and logit specificity using a bivariate random effects approach with a binomial distribution. This model has been shown to perform better in meta-analysis of diagnostic test accuracy studies, especially when including studies with smaller sample sizes [15]. The diagnostic OR (DOR) was calculated for direct comparison of the diagnostic utility of individual tests (DOR = [true positives/false positives] / [false negatives/true negatives]). In contrast to sensitivity and specificity, the DOR is less dependent on the threshold used and therefore more constant.
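The DOR definition above can be sketched for a single 2 × 2 table as follows. This is a minimal illustration and not the bivariate model fitted with the R package. The function name and the counts are hypothetical, and the 0.5 continuity correction for empty cells is an assumption that is not described in the methods.

```python
from typing import Dict


def diagnostic_accuracy(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Sensitivity, specificity, and diagnostic OR from one 2 x 2 table."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)

    # Assumption (not from the methods above): add 0.5 to every cell when
    # any cell is empty, so that the DOR stays finite and defined.
    a, b, c, d = float(tp), float(fp), float(fn), float(tn)
    if min(a, b, c, d) == 0:
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5

    # DOR = (TP / FP) / (FN / TN), as defined in the text.
    dor = (a / b) / (c / d)
    return {"sensitivity": sens, "specificity": spec, "dor": dor}


# Hypothetical counts (illustrative only, not data from any included study).
result = diagnostic_accuracy(tp=18, fp=4, fn=2, tn=26)

# Without empty cells, the DOR equals the product of the odds of
# sensitivity and the odds of specificity.
odds_product = (
    result["sensitivity"] / (1 - result["sensitivity"])
    * result["specificity"] / (1 - result["specificity"])
)
print(f"DOR = {result['dor']:.1f}")
```

The identity in the last step shows why a very low specificity pulls the DOR down even when sensitivity is high.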
As the predictive value of a modality may change over time after perinatal asphyxia, separate meta-analyses were performed for the different time points at which the modality was applied. The results of the meta-analyses were presented in forest plots. No indicator of heterogeneity was calculated, but its potential causes were explored.
Some predictors or time points were reported by only 1 or 2 studies. Those studies were not included in the meta-analyses but were reported in a separate forest plot.
Results
The search identified 1,279 articles after exclusion of duplicates. Ninety-nine potentially relevant articles and 26 conference abstracts were identified based on the abstract. After reading the full text, 47 of the 99 articles were included, 41 of which contained data for 2 × 2 tables. Additional data were sent by the contacted authors for 4 out of 16 articles and 3 out of 22 conference abstracts. Forty-seven articles and 3 conference abstracts were eventually selected for this systematic review and 37 were eligible for the meta-analysis (online suppl. Fig. 2).
The study characteristics and test classifications are shown in Tables 1 and 2, respectively. Of all of the studies, 40% had a prospective design. Most studies scored a moderate risk of bias in at least 2 of the graded categories (online suppl. Table 1). The included studies concerned 3,072 infants. The degree of HIE was reported in 30 studies and it was mild in 150, moderate in 1,182, and severe in 386 infants based on the Sarnat score. The median Thompson scores used in 4 studies ranged between 9 and 12. The age at neurodevelopmental follow-up varied between 18 and 84 months and the follow-up rate was 93%. Death or an abnormal neurodevelopmental outcome was seen in 39% of the infants.
Amplitude-Integrated Electroencephalography
Seventeen articles reported on aEEG recordings [6, 16-31]. Nine of those articles used the classification for background pattern by Hellström-Westas et al. [32], 3 used the voltage pattern of al Naqueeb et al. [33], and 3 studies used both methods. The classification by Hellström-Westas et al. [32] had higher sensitivity but lower specificity scores in studies using both methods. The predictive value of the aEEG classification was reported at different time points, including during the first 6 h after birth, throughout the 72 h of cooling and after rewarming. Ten studies had aEEG recordings during the whole cooling period [6, 17, 18, 21, 23, 24, 26-28, 30]. The meta-analysis of aEEG at different time points is shown in Figure 1, and the results of individual studies not included in the meta-analysis in Figure 2. Overall, sensitivity decreased after 36 h with an increase of specificity. This was reflected by the lowest DOR at 6 h and the highest DOR at 36 h (9 vs. 101).
Overall, a higher voltage on aEEG was associated with a better outcome. In a number of studies, quantitative analyses of the aEEG were performed. Shellhaas et al. [26] used quantified aEEG margins and found that higher values of the lower margin and of the mean aEEG voltage at 24–48 h were associated with good outcomes. Three studies quantified the duration until normalization of the aEEG, and they reported that longer times were predictive of poor outcomes [6, 24, 27].
Development of sleep-wake cycling on aEEG was reported in 5 studies, in which early development of sleep-wake cycling was significantly associated with normal outcomes [6, 17, 18, 23, 28]. Out of the 17 aEEG studies, the conference abstract by Aeby et al. [30] was the only study that reported no association between the aEEG pattern and outcomes. Aeby et al. [30] used the classification of al Naqueeb et al. [33], which was also used by Lally et al. [20] with lower reported sensitivities. The used classification pattern may therefore play a role in the observed heterogeneity for aEEG during the first 6 h after birth.
Electroencephalography
Eleven studies reported data on EEG as a predictor of neurodevelopmental outcomes [30, 34-43]. Different EEG scoring systems were used: 2 studies used the classification by Watanabe et al. [44], 2 used the classification by Murray et al. [45], 2 used the classification by Lamblin et al. [46], 1 used that of Pressler et al. [47], and 1 followed the American Clinical Neurophysiology Society guidelines [48].
Four articles reported continuous EEG recordings during the cooling period [30, 35, 42, 43]. When comparing EEG at different time points in the meta-analysis, sensitivity was comparable at 24 and 48 h after birth while specificity was slightly lower at 48 h, resulting in a lower DOR. While most studies reported EEG findings during hypothermia, 4 studies reported EEG patterns after hypothermia [36, 38, 39, 41], which were found to be predictive (Fig. 2).
Near-Infrared Spectroscopy
Four studies focused on the predictive value of NIRS [21, 23, 26, 49], but at different time points, so no meta-analysis was performed. Lemmers et al. [21] calculated the optimal cutoff and found that regional cerebral oxygenation (rScO2) values, measured with a small adult transducer, higher than 77% at 24–48 h were associated with abnormal neurodevelopmental outcomes. Similarly, Niezen et al. [23] concluded that rScO2 values above 90% at 48 h were associated with adverse outcomes. They used a pediatric NIRS sensor. The other articles concluded that rScO2 was not associated with neurodevelopmental outcomes [26, 49].
Evoked Potentials
The use of somatosensory evoked potentials (SEPs) was studied in 3 articles [41, 50, 51]. Only Nevalainen et al. [50] performed SEPs during hypothermia, and these were found to be predictive. In the 2 other studies, SEPs were performed after rewarming, but with a lower sensitivity or specificity [41, 51]. In the study by Cainelli et al. [41], both visual evoked potentials and SEPs were performed around day 7 in infants with a normal MRI. They found that visual evoked potentials were a better predictor of outcomes than SEPs.
Magnetic Resonance Imaging
Twenty-two studies reported scoring methods to assess cerebral injury using T1- and T2-weighted imaging [4, 18, 20, 23, 27, 34, 35, 38, 39, 51-63]. Thirteen of those studies also used diffusion-weighted imaging (DWI) or diffusion-tensor imaging (DTI) to assess injury [18, 23, 34, 38, 51-54, 56, 58, 61-63]. Different scoring methods were used, including the Barkovich score in 7 studies [64], the Rutherford score in 6 studies [65], the National Institute of Child Health and Human Development score in 2 studies [60], a score by van Rooij et al. [66] in 2 studies, a score by Bednarek et al. [67] in 1 study, and a new score in the study of Weeke et al. [63]. Four articles did not refer to a previously described scoring method but reported injury to the basal ganglia or thalami (BGT), the posterior limb of the internal capsule (PLIC), the cortex, and the white matter (WM) and/or the watershed regions. The postnatal age at scanning differed between studies, with median ages ranging from the first week to day 15.
The results of the meta-analysis of the studies using a scoring method are shown in Figure 3 and the data of studies not included in the meta-analysis, including those reporting injury to individual structures, are depicted in Figure 2. The MRI injury scoring methods were more predictive during the first week than in the second week, with higher sensitivity, specificity and DOR values. This was in line with 2 individual studies, which investigated the timing of conventional MRI. Charon et al. [56] performed MRI on days 4 and 11, and Rutherford et al. [4] performed it before or after day 8. Both studies reported higher predictive values for the early MRI [4, 56].
Meta-analyses could also be performed for injury to the BGT and PLIC on conventional MRI. BGT injury showed a low sensitivity during the first and second week but a high specificity in the second week. Injury to the PLIC, observed in the second week, was found to be very predictive. Only 2 studies reported injury to the PLIC during the first week in relation to outcomes, showing results similar to those reported in the second week [56, 62]. Injury to the WM, watershed areas or cortex as individual predictors was found to have a low sensitivity [20, 38, 52, 54, 62].
Three studies showed a significant association between MRI injury scores and continuous Bayley scores in cognitive, motor, and language domains [27, 57, 61], whereas Schreglmann et al. [59] found no such association.
Quantitative Analysis of Diffusion Imaging
Eleven studies performed quantitative analyses on diffusion imaging data, i.e., 5 using DWI within the first week of life [56, 58, 68-70] and 6 using DTI [20, 53, 57, 62, 71, 72]. Multiple studies reported that lower apparent diffusion coefficient (ADC) values during the first 7–10 days were associated with adverse outcomes. This included lower ADC values in the basal ganglia [69, 72], the centrum semiovale [56, 70], the caudate nucleus [53], the PLIC [53, 56, 70, 71], the frontal or parietal WM [53, 70], and the posterior WM [56]. DWI of the thalami could be included in the meta-analyses. ADC values of the thalami were especially useful for identifying those with a good outcome (showing normal ADC values) with a high specificity and a DOR of 119. Injury to the (posterior part of the) corpus callosum was related to adverse outcomes in 3 studies [53, 58, 68]. ADC values in the cerebellum and brainstem [70], as well as cortical ADC values [71], were not related to outcomes.
Al Amrani et al. [71] analyzed DTI data acquired at different time points during the first months after birth. In the adverse outcome group, ADC values in the BGT and PLIC were significantly lower on days 2–3, followed by significantly higher values on day 10. On days 6–10, fractional anisotropy values were significantly lower in the PLIC [20, 53, 62, 71], the anterior limb of the internal capsule [62], the corpus callosum [53, 57, 62], the corticospinal tract [57], the frontal WM [53, 62], and the BGT in the adverse outcome group in 2 out of 3 studies [53, 71, 72].
Proton Magnetic Resonance Spectroscopy
Six studies used proton magnetic resonance spectroscopy (1H-MRS) as a predictor of neurodevelopmental outcomes with different types of metabolites and regions of interest [20, 53, 54, 69, 72, 73]. The lactate/N-acetylaspartate (NAA) ratio in the BGT was included in the meta-analyses and it was associated with adverse outcomes with a DOR of 18. NAA/choline ratios in the BGT were not associated with outcomes in one study [69] but were moderately associated with outcomes in another study [20]. Ancora et al. [53] also studied the basal ganglia and reported lower NAA and higher lactate and myo-inositol ratios in infants with adverse outcomes. Sijens et al. [73] only found significantly lower NAA in the gray matter of infants who died. Barta et al. [54] studied 36 different metabolites in the thalamus, among which myo-inositol/NAA ratios had the strongest correlation with outcomes.
Ultrasonography
The resistive index (RI) on transfontanellar duplex ultrasonography as a predictor of outcome was studied in 2 articles [39, 74]. An RI below 0.60 prior to the start of TH was significantly associated with abnormal outcomes according to Gerner et al. [74]. In that study, the RI after cooling was only significantly positively correlated with the raw gross motor function measure score but not with abnormal outcomes. Another study demonstrated significantly lower RI values in the adverse outcome group, with an RI value <0.46 on day 3 [39]. No studies reported on the predictive value of cerebral injury observed using ultrasound.
Discussion
In this article we have reviewed, and where possible performed meta-analyses of, the currently used neurophysiologic and neuroimaging techniques for the prediction of neurodevelopmental outcomes following HIE and TH. In contrast to previous meta-analyses, we have also statistically compared the diagnostic characteristics of the different modalities. The best predictors of outcomes were aEEG at 36 h, PLIC abnormalities on MRI, ADC values of the thalamus, and MRS of the BGT. Early MRI was more predictive than MRI performed after the first week of life. This is relevant for clinical practice, since early prognostication is preferred.
According to our findings, aEEG had the highest DOR at 36 h. The prognostic utility of aEEG at 24 and 72 h was similar to that of EEG at 24 h. The least predictive were aEEG at 6 h and EEG at 48 h, reflected by the lower DOR. aEEG at 6 h did have a high sensitivity, however, which may be more important than a high specificity at this early stage in order to identify infants potentially at risk for an adverse outcome.
EEG was found not to be superior to a single- or 2-channel aEEG. This could be due to differences in the length of recording, as we found that aEEG was more often recorded continuously compared to EEG. The conference abstract by Aeby et al. [30] was the only study using both methods, in which the aEEG was not related to outcomes and EEG had sensitivity and specificity scores of 100 and 9%, respectively. However, that study included only 20 infants and not all infants had an EEG.
The American College of Obstetricians and Gynecologists guideline on HIE advises MRI between 24 and 96 h after birth to delineate the timing of the injury, whereas MRI 10 days after birth is recommended for optimal delineation of the extent of the injury [75]. While we did not study which time point is best for identifying the extent of the injury, we found that early MRI (up to day 7) had a better predictive value than late MRI (beyond the first week). As most studies used a scoring method, describing which areas are involved, this suggests that the optimal time point for delineation of the extent of injury may also be in the first week.
Many studies included DWI in their injury scoring method as it allows detection of ischemic injury during the first 7–10 days after the hypoxic event. The age at scan in the later MRI group ranged from 8 to 21 days. DWI abnormalities may still be present early in the second week, while changes in T1 and T2 will take time to evolve. The diagnostic information of an MRI on day 8 is therefore different from an MRI acquired on day 14, which was confirmed in our meta-analyses. This was also demonstrated by 2 studies that compared early and late MRI [4, 56]. It is therefore recommended to perform MRI including DWI and 1H-MRS during the first week after birth, preferably after rewarming as TH slows the evolution of diffusion abnormalities.
Injury to the PLIC diagnosed in the first or second week after birth was found to be predictive of abnormal outcomes. This was, however, partially influenced by the study of Charon et al. [56], which reported no false-negative or false-positive cases. The large study of Lally et al. [20] reported a lower sensitivity of injury to the PLIC.
DWI abnormalities were quantified using ADC values in a number of studies. Meta-analyses of decreased ADC values in the thalami resulted in a DOR above 100. Similarly, other studies reporting decreased ADC values in other structures such as the PLIC and the corpus callosum were found to be predictive (Fig. 2). It is therefore highly recommended to include DWI in the standard MRI protocol, as it can be used to assess the extent of injury in an MRI score or quantify the ADC values.
The 1H-MRS-derived lactate/NAA ratio was similarly predictive, as also reported by Alderliesten et al. [69], who compared both methods.
Meta-analyses could not be performed for studies reporting evoked potentials, NIRS, or cUS. SEPs were the most frequently reported evoked potentials and reflect the integrity of the somatosensory pathway. Outcomes may also be poor without injury to these pathways, which may explain the reported low predictive value of SEPs, especially at later time points.
The studies on NIRS reported that a high cerebral oxygenation is associated with adverse outcomes. In the studies of Lemmers et al. [21] and Niezen et al. [23], the majority of the infants with an adverse outcome died, most likely due to very severe brain injury resulting in a low cerebral metabolism and high cerebral oxygenation. The predictive value of NIRS for neurodevelopmental outcomes in those surviving the neonatal period seems limited.
We found no studies reporting the pattern of injury on cUS in relation to outcomes. As cUS can be easily used at the bedside and it can be used sequentially, it would be interesting to study its predictive value in this population.
In this review we focused on the diagnostic modalities, including neurophysiological or neuroimaging modalities, that can be used to predict outcomes. Some studies have also reported other predictors, such as biochemical biomarkers, physical examinations, or heart rate variability. We decided not to include these in the current review but, given the increasing number of papers on these predictors, they might be included in future reviews.
Comparison to Previous Reviews
Chandrasekaran et al. [8] performed a meta-analysis on aEEG in the cooling era concerning different postnatal time points and included 9 studies. They found the highest diagnostic OR (i.e., 67) at 48 h after birth. This is comparable to our results, as they only reported aEEG at 24 and 48 h and not at 36 h. Chandrasekaran et al. [8] included studies with fewer than 15 infants and a neurodevelopmental follow-up of 12 months. We only included studies with at least 18 months of follow-up, as motor deficits may be difficult to diagnose before this age [76, 77].
Del Río et al. [78] also performed a meta-analysis on aEEG including studies with and without TH and a follow-up time of 12 months. They found the optimal timing of aEEG to be at 72 h [78]. A meta-analysis on conventional MRI performed by Sánchez Fernández et al. [79] included 5 studies with TH and reported an OR of 14 for an abnormal neonatal MRI predicting unfavorable neurodevelopmental outcomes [79]. They did not, however, analyze timing of MRI. A meta-analysis on 1H-MRS by Zou et al. [80] revealed potential predictive values of NAA/creatine and NAA/choline in BGT and also myo-inositol/choline in the cerebral cortex for adverse outcomes [80]. However, the meta-analysis only included 2 studies with TH patients, which were also included in our study.
Strengths and Weaknesses
This review provides an overview of the published literature up to now. However, there are several limitations that need to be addressed. Firstly, we observed a high level of heterogeneity among the included studies. This was mainly due to the use of different outcome classifications, definitions of abnormal test measurements, and timing of the test. We therefore performed the meta-analysis on data with similar timing in an attempt to reduce heterogeneity. Another limitation is the small sample size of many studies: 28 studies had a study population of fewer than 50 infants.
Finally, the presence of bias is also a concern when performing a systematic review. To minimize bias we chose not to eliminate studies if they were not eligible for the meta-analysis, but we included them for the review. Inclusion of conference abstracts for the meta-analysis might prevent publication bias, but it also results in inclusion of abstracts with limited information and of which the study design and results have not been peer-reviewed.
Conclusions
This meta-analysis shows that aEEG at 36 h, PLIC abnormalities on MRI, ADC values of the thalamus, and MRS are most predictive of adverse (neurodevelopmental) outcomes. According to our findings, aEEG at 24–72 h and EEG at 24 h after birth were superior to aEEG at 6 h and EEG at 48 h. Early conventional MRI in the first week of life was preferred over late MRI.
Future studies might benefit from combining neurophysiological and neuroimaging modalities, such as aEEG between 24 and 48 h and early MRI, to further improve prediction of outcomes. Combining diagnostic modalities in the neonatal period with neurological examination on early follow-up has been shown to result in very accurate prediction of outcomes in other neonatal populations at risk for adverse outcomes and warrants further study in infants with HIE [81].
Statement of Ethics
This study was conducted ethically according to the World Medical Association Declaration of Helsinki.
Disclosure Statement
The authors have no conflicts of interest to declare.
Funding Sources
No funding was secured for this study.
Author Contributions
S.O. conceptualized and designed this study, collected data, and drafted the initial version of this paper. L.C.A.S. conceptualized and designed this study and collected data. N.E.A. conceptualized and designed this study, supervised data collection, performed the analysis, and reviewed and revised this paper. F.G. conceptualized and designed this study, performed the analysis, and reviewed and revised this paper. L.S.V., M.J.B., and J.D. conceptualized and designed this study and reviewed and revised this paper. All of the authors approved the final version of this paper as submitted and agree to be accountable for all aspects of this work.