Abstract
Background: Progression is believed to be a common and important complication in acute stroke, and has been associated with increased mortality and morbidity. Reliable identification of predictors of early neurological deterioration could potentially benefit routine clinical care. The aim of this study was to identify predictors of early stroke progression using two independent patient cohorts.
Methods: Two patient cohorts were used for this study – the first cohort formed the training data set, which included consecutive patients admitted to an urban teaching hospital between 2000 and 2002, and the second cohort formed the test data set, which included patients admitted to the same hospital between 2003 and 2004. A standard definition of stroke progression was used. The first cohort (n = 863) was used to develop the model. Variables that were statistically significant (p < 0.1) on univariate analysis were included in the multivariate model. Logistic regression was the technique employed using backward stepwise regression to drop the least significant variables (p > 0.1) in turn. The second cohort (n = 216) was used to test the performance of the model. The performance of the predictive model was assessed in terms of both calibration and discrimination. Multiple imputation methods were used for dealing with the missing values.
Results: Variables shown to be significant predictors of stroke progression were conscious level, history of coronary heart disease, presence of hyperosmolarity, CT lesion, living alone on admission, Oxfordshire Community Stroke Project classification, presence of pyrexia and smoking status. The model appears to have reasonable discriminative properties [the median receiver-operating characteristic curve value was 0.72 (range 0.72–0.73)] and to fit well with the observed data, which is indicated by the high goodness-of-fit p value [the median p value from the Hosmer-Lemeshow test was 0.90 (range 0.50–0.92)].
Conclusion: The predictive model developed in this study contains variables that can be easily collected in practice, therefore increasing its usability in clinical practice. Using this analysis approach, the discrimination and calibration of the predictive model appear sufficiently high to provide accurate predictions. This study also offers some discussion around the validation of predictive models for wider use in clinical practice.
Introduction
Early neurological deterioration within the first few hours or days of stroke onset has been referred to as stroke progression. It is a common event that has been reported to occur in 20–40% of acute strokes, and is associated with increased mortality and morbidity [1]. Stroke progression can be defined as a patient admitted to hospital with acute stroke whose neurological condition deteriorates over 48–72 h [2]. Stroke progression is often characterised by a fall of ≥2 points on a neurological scale measuring stroke severity, such as the National Institutes of Health Stroke Scale. An alternative, validated approach is to define progression as any significant deterioration in conscious level, speech, arm or leg power, or facial weakness [2]. It is accepted that in stroke progression, neurological worsening takes place slowly with an amplification of previous deficits or appearance of new symptoms corresponding to the same vascular territory. Progression has been shown to be a common and important complication in acute stroke, which has also been linked to poor long-term outcome.
Reliable identification of predictors of early neurological deterioration could potentially be of benefit for routine clinical care. Therefore, it is important to understand the mechanism by which progression in stroke occurs and identify the factors that may be able to predict such an occurrence. Yet, the causes of stroke progression are largely unknown and the predictors identified in previous studies vary [3,4]. A previous case-control study using 873 consecutive acute stroke admissions identified a history of diabetes and raised systolic blood pressure as predictors of stroke progression [4]. However, another prospective study of 868 patients with acute stroke found that the risk of early progression decreased as systolic blood pressure increased [5]. This same study also showed that diabetes was a risk factor for early progression and initial stroke severity for late progression. A smaller study of 152 consecutive patients with first-ever ischaemic strokes showed that high levels of glucose on admission and brain swelling on the first computed tomography (CT) were the predictors of progression [6]. A more recent study of 196 patients with ischaemic stroke identified blood urea nitrogen or creatinine as a predictor of stroke progression [7]. Another study investigating 133 lacunar stroke patients suggested that hypertriglyceridaemia may be a possible predictor of stroke progression [8]. The aim of this study was to identify reliable predictors of early neurological deterioration in stroke.
Patients and Methods
Consecutive patients admitted to an urban teaching hospital with a diagnosis of stroke between 2000 and 2002, and subsequently between 2003 and 2004, were registered to a research database. Baseline data on stroke characteristics, stroke severity and early neurological deterioration were collected prospectively during both periods. This created two patient cohorts for this study – the first cohort formed the training data set and included 1,029 consecutive patients admitted to hospital, and the second cohort formed the test data set and included 219 patients admitted to the same hospital. Only patients with complete outcome data were assessed in the analysis – 863 patients in the training data set and 216 patients in the test data set. With respect to acute medical management, patients received aspirin after CT, hypertension was not actively managed and premorbid antihypertensives were usually withheld for at least 72 h. No patients in the study group received thrombolysis.
Stroke progression was defined as a ≥2-point worsening in the Scandinavian Stroke Scale (SSS) in either conscious level, arm, leg or eye movement scores, and/or a ≥3-point SSS worsening in speech score within 72 h of hospital admission [9]. Outcome assessment was conducted 3 days after stroke.
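The progression rule above can be expressed as a short check. The following Python sketch is illustrative only: the item names and the dictionary layout are hypothetical, and it assumes that a lower SSS item score indicates a worse deficit, so that worsening is a fall in score between admission and the 72-h assessment.

```python
from typing import Mapping, Tuple

# Hypothetical labels for the SSS items named in the definition.
TWO_POINT_ITEMS: Tuple[str, ...] = ("conscious_level", "arm", "leg", "eye_movement")


def has_progressed(admission: Mapping[str, int], followup: Mapping[str, int]) -> bool:
    """Return True if the item scores meet the stated progression definition.

    Assumes higher SSS item scores mean a milder deficit, so worsening is
    computed as (admission score - follow-up score).
    """
    # A fall of 2 or more points in any one of the four listed items.
    for item in TWO_POINT_ITEMS:
        if admission[item] - followup[item] >= 2:
            return True
    # Or a fall of 3 or more points in the speech item.
    return admission["speech"] - followup["speech"] >= 3
```

For example, a patient whose speech score falls from 10 to 6 would be classed as progressing, whereas a 1-point fall in arm score alone would not.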
The training data set was used to develop the predictive model. Clinical opinion and the existing literature were used to reduce the number of variables collected at baseline to those considered potentially predictive of stroke progression. Univariate analysis was then conducted between these potentially predictive variables and the outcome variable to determine which should be included in the model [10,11]. The variables that were statistically significant (p < 0.1) on univariate analysis were included in the multivariate model. Logistic regression with backward stepwise elimination was then used, dropping the least significant variable (p > 0.1) in turn.
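The elimination loop described above can be sketched as follows. This is a minimal Python illustration, not the Stata procedure used in the study: `fit_pvalues` is a hypothetical callback that refits the logistic model on the named variables and returns one p value per variable, and the treatment of multi-level categorical variables is not addressed.

```python
from typing import Callable, Dict, List


def backward_eliminate(
    candidates: List[str],
    fit_pvalues: Callable[[List[str]], Dict[str, float]],
    threshold: float = 0.1,
) -> List[str]:
    """Drop the least significant variable in turn until all have p <= threshold.

    `fit_pvalues` is an assumed helper: given the variables currently in the
    model, it refits the model and returns a p value for each of them.
    """
    kept = list(candidates)
    while kept:
        pvalues = fit_pvalues(kept)  # refit after every removal
        worst = max(kept, key=lambda name: pvalues[name])
        if pvalues[worst] <= threshold:
            break  # every remaining variable meets the retention criterion
        kept.remove(worst)
    return kept
```

Refitting after each removal matters because the p values of the remaining variables can change once a correlated variable has been dropped.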
The performance of the model was investigated using the test data set and assessed in terms of both calibration and discrimination [12]. Discrimination was measured by the area under the receiver-operating characteristic (ROC) curve, which assesses how well the model discriminates between individuals with and without the outcome [10]. Calibration, which compares the observed proportion of events against the predicted probabilities, was tested using the Hosmer-Lemeshow test.
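To make the two performance measures concrete, the sketch below computes both from a list of observed outcomes (0/1) and predicted probabilities. It is an illustration under stated assumptions rather than the study's own Stata code: the area under the ROC curve is computed through its pairwise-comparison form, the Hosmer-Lemeshow groups are formed by ranking patients on predicted probability (10 groups by default, with groups − 2 degrees of freedom, the conventional choice), predicted probabilities are assumed to lie strictly between 0 and 1, and the chi-squared tail helper handles even degrees of freedom only.

```python
import math
from typing import Sequence


def roc_auc(y: Sequence[int], p: Sequence[float]) -> float:
    """Probability that a random event is ranked above a random non-event."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    # Ties between an event and a non-event count as half a concordant pair.
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def hosmer_lemeshow(y: Sequence[int], p: Sequence[float], groups: int = 10) -> float:
    """Hosmer-Lemeshow chi-squared statistic over risk-ordered groups."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    n = len(order)
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if not idx:
            continue
        size = len(idx)
        observed = sum(y[i] for i in idx)   # observed events in the group
        expected = sum(p[i] for i in idx)   # expected events in the group
        stat += (observed - expected) ** 2 / expected
        stat += ((size - observed) - (size - expected)) ** 2 / (size - expected)
    return stat


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-squared probability, closed form for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this helper supports positive even df only")
    half = x / 2.0
    term, total = 1.0, 0.0
    for i in range(df // 2):
        total += term
        term *= half / (i + 1)
    return math.exp(-half) * total
```

With 10 groups, a p value would be obtained as `chi2_sf_even_df(hosmer_lemeshow(y, p), 8)`; a high p value indicates no evidence of poor fit.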
Multiple imputation methods were used for dealing with the missing values in both data sets. It was assumed that the missing data on the potentially prognostic variables were missing at random. This assumes that the probability that a value is missing depends on the values of variables that were actually collected, rather than being independent of patient characteristics. Multiple imputation allows for the uncertainty about the missing data by creating 10 different plausible imputed data sets and appropriately combining results obtained from each of them [13].
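The combination step can be illustrated for a single regression coefficient. The sketch below assumes that results are pooled with Rubin's rules, the standard approach for multiply imputed data; the source does not name the rule used, and the function and argument names are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Pool one coefficient across m imputed data sets (Rubin's rules, assumed).

    estimates: the coefficient estimated in each imputed data set
    variances: the squared standard error of that coefficient in each data set
    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    pooled = statistics.mean(estimates)       # average of the m estimates
    within = statistics.mean(variances)       # average within-imputation variance
    between = statistics.variance(estimates)  # between-imputation variance (m - 1 divisor)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)
```

The between-imputation term is what carries the uncertainty about the missing values: if all imputed data sets gave the same estimate, the pooled standard error would reduce to the ordinary within-data-set value.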
Two predictive models were developed and validated using observed and imputed data sets. All analyses were carried out in Stata 11.
Results
Stroke progression was recorded in 197 (23%) and 40 patients (18.5%) in the training and test data sets, respectively. Clinical opinion identified 27 variables as potentially predictive of stroke progression (table 1). Overall, the characteristics of the patients in the two cohorts were similar (table 1). The majority of patients were previously independent, with approximately 50% of the patients having hypertension and one third of patients having had a previous stroke. Co-morbidities such as atrial fibrillation, cardiac failure and diabetes were recorded in a minority of the patients. Patients who had a haemorrhagic stroke were not recruited to the test cohort of patients.
Data on age, gender, conscious level, living alone, Oxfordshire Community Stroke Project (OCSP) classification, previous independence, type of CT lesion and the side of the lesion were complete for all patients. Of the 27 prognostic variables, 18 variables had incomplete data. Data on arm power, atrial fibrillation, abnormal physiological features, diabetes, history of coronary heart disease, hypoxia, hypertension, leg power, maximum systolic blood pressure, minimum systolic blood pressure, pyrexia, hyperglycaemia, previous stroke, previous transient ischaemic attack, smoking status, alcohol intake, osmolarity and verbal score were missing in 0.1–33.4% of the patients. The variables with the highest level of missing data were alcohol intake (33.4%), smoking status (19.0%), hyperosmolarity (12.6%) and hyperglycaemia (11.0%; table 1).
On univariate analysis, age, alcohol intake, living alone, previous independence, verbal score, arm power, leg power, conscious level, history of coronary heart disease, smoking status, CT lesion, OCSP classification, pyrexia, hyperosmolarity, one abnormal physiological sign, maximum and minimum systolic blood pressure, and side of lesion were significantly associated with stroke progression. The results of logistic regression of the imputed data set are shown in table 2. Conscious level, history of coronary heart disease, presence of hyperosmolarity, living alone on admission, OCSP classification and presence of pyrexia were associated with a significant increase in the odds of stroke progression. In particular, patients who experienced a total anterior circulation syndrome (ACS) were twice as likely to have a progression of stroke as those that had a partial ACS. In contrast, patients who had no lesion on their CT scan had significantly reduced odds of stroke progression in comparison to those with an infarction on CT scan; a similar effect was found in patients who were current or ex-smokers in comparison to those that were non-smokers.
A similar predictive model was developed, based only on the observed data in the training data set (table 2). Both models shared common independent predictors, with the exception of lesion on CT and living alone; these were not shown to be predictive in the observed data model.
When testing the performance of the predictive model based on the 10 test data sets generated by multiple imputation, the median ROC value was 0.72 (range 0.72–0.73). This implies that the model generated is moderately accurate and better than chance alone in predicting stroke progression. The median p value for the Hosmer-Lemeshow goodness-of-fit test was 0.90 (range 0.50–0.92), indicating a good fit between the model and the observed data. The observed number of patients with and without stroke progression is compared with the number expected in table 3. The performance of the predictive model was also tested using the training data set. The median ROC value of 0.73 (range 0.72–0.74) was similar to that evaluated in the test data set. However, the median p value of 0.38 (range 0.06–0.70) for the Hosmer-Lemeshow goodness-of-fit test was lower, indicating a poorer fit between the model and the observed data.
Discussion
This analysis suggests that conscious level, history of coronary heart disease, the lesion shown on CT scan, OCSP classification, living alone on admission, presence of hyperosmolarity, presence of pyrexia and smoking status are independent predictors of stroke progression. Some of the variables identified in this study have also been found to be predictive of stroke progression in other studies. However, some discrepancies do exist. For example, in this study, patients who experienced a total ACS were twice as likely to have a progression of stroke as those who experienced a partial ACS, while a previous study showed that neurological progression was observed twice as often in patients with posterior circulation infarction as in those with anterior circulation infarction [14]. Similarly, although reduced level of consciousness and a history of coronary heart disease have been associated with increased risk of stroke progression in a few other studies [2,15], in this study it was those patients who had a normal conscious level that appeared to have increased odds of stroke progression. The way in which these variables were measured clinically and how they were put into the predictive model could account for these differences. The finding that not living alone and having a normal level of consciousness were predictive of stroke progression was unexpected. It may be that not living alone is associated with a higher level of frailty. This may also be the result of over-fitting the model (including too many variables in it), resulting in the inclusion of apparently important predictors which are not actually independent predictors [16]. Stroke care may have changed since the study was conducted between 2000 and 2004. Therefore, it would be interesting to assess the performance of the model in a more recent cohort of patients. However, differences in performance may be difficult to attribute to changes in stroke care.
In terms of calibration, the model showed good agreement between the predicted probabilities of stroke progression and those actually observed. Evaluating the performance of predictive models in a new cohort of patients as opposed to the original cohort used to develop the model is an important feature when testing the validity of predictive models. In this study, the calibration properties of the model when evaluated in the test data set differed from those in the training data set. One reason for this may be differences between the cohorts, although a comparison of the demographics of the patients in the two cohorts suggests that they are similar. The test cohort contained only patients who had an ischaemic stroke and the care pathway between these two cohorts did differ. During the recruitment of the first cohort of patients, the hospital implemented a new acute stroke unit which was in place for the second cohort. This may have resulted in a different distribution of outcomes in this patient population, which may have influenced the calibration of the model. The model appeared to be able to discriminate fairly accurately between those patients likely to progress and those who were not when using the test data set. A very similar ROC value was derived when the model was tested in the training set. However, it is important to extend beyond the internal validation approach used in this study and investigate the model in not only a separate patient cohort, but a cohort that is from a different population. This would challenge the external validity and generalisability of the model as opposed to assessing internal validation, which may result in the performance of the model being overly optimistic. This reinforces the inherent difficulty with predictive modelling – how feasible is it to have one model apply to all patient subgroups across different models of care and countries [17]?
Indeed, a few studies have attempted to investigate how prognostic variables may vary across subgroups of patients or by the different causes of stroke. One study identified that different factors were associated with neurological worsening in different causes of stroke [15].
Predictive models for patients with acute stroke can be useful in informing patient management and a number of predictive models in stroke research exist. A common limitation in this area is the management of data sets with data on potential prognostic factors missing. A standard approach to manage this is to conduct a complete case analysis whereby patients that have missing data are excluded from the analysis. This could lead to the exclusion of a defined subset of patients, for example unconscious patients, where it has not been possible to ascertain their smoking habits. The alternative approach, multiple imputation, as used in this study, has not only increased the statistical power of the analysis but also reduced the bias associated with excluding patients in a complete case analysis. The increased power in the imputed data set resulted in two additional factors being found to be significant. Also, on comparing the models based on imputed and observed training data, the confidence intervals of the odds ratios are narrower in the model based on imputed data, especially for conscious level and OCSP classification. This is to be expected since these variables did not have any missing data. By using multiple imputation, a technique not yet considered standard in managing missing data, this analysis minimises the bias associated with the more commonly used complete case approach. This study also highlights some of the recognised methodological challenges in prognostic research, such as validity and credibility of models in clinical practice.
Conclusion
The predictive model developed in this study contains variables that can be easily collected in practice, therefore increasing its clinical applicability. This analysis shows that conscious level, a history of coronary heart disease, presence of hyperosmolarity, the lesion shown on CT scan, living alone on admission, OCSP classification, presence of pyrexia and smoking status are independent predictors of stroke progression. Using this analysis approach, the discrimination and calibration of the predictive model appear sufficiently high to provide accurate predictions.
Acknowledgments
L.E.C. is supported by a Stroke Association Research Fellowship. M.B. was an NHS Education for Scotland/Chief Scientist Office Clinical Research Fellow during the period of data collection for the test data set.
Disclosure Statement
The authors have no conflicts of interest.