Abstract
The present critical review was conducted to evaluate the clinimetric properties of the Charlson Comorbidity Index (CCI), an assessment tool designed specifically to predict long-term mortality, with regard to its reliability, concurrent validity, sensitivity, incremental and predictive validity. The original version of the CCI has been adapted for use with different sources of data, ICD-9 and ICD-10 codes. The inter-rater reliability of the CCI was found to be excellent, with extremely high agreement between self-report and medical charts. The CCI has also been shown either to have concurrent validity with a number of other prognostic scales or to result in concordant predictions. Importantly, the clinimetric sensitivity of the CCI has been demonstrated in a variety of medical conditions, with stepwise increases in the CCI associated with stepwise increases in mortality. The CCI is also characterized by the clinimetric property of incremental validity, whereby adding the CCI to other measures increases the overall predictive accuracy. It has been shown to predict long-term mortality in different clinical populations, including medical, surgical, intensive care unit (ICU), trauma, and cancer patients. It may also predict in-hospital mortality, although in some instances, such as ICU or trauma patients, the CCI did not perform as well as other instruments designed specifically for that purpose. The CCI thus appears to be clinically useful not only to provide a valid assessment of the patient’s unique clinical situation, but also to demarcate major diagnostic and prognostic differences among subgroups of patients sharing the same medical diagnosis.
Introduction
The term comorbidity has a Latin origin and results from the combination of two words: “co” meaning “along with” and “morbus” meaning “disease.” It was Alvan R. Feinstein who provided the first clinical definition of this concept, which refers to “any distinct additional clinical entity that has existed or that may occur during the clinical course of a disease that is under study” [1]. Further, he noted that a comorbid condition has the potential not only to impact a patient’s prognosis, but also to alter therapeutic plans and outcomes [1]. Prior to his paper, comparability of patients was judged primarily on similarities in age, gender, race, and anatomic stage, not on comorbid conditions. Because comorbid conditions then confounded outcomes, the failure to catalog comorbidity led to the exclusion from studies of patients with any disease other than the index disease.
The term “comorbidity” has been defined in many different ways [2]. The specific reason for defining a comorbid condition is crucial and there is no universally correct answer. Most agree that comorbidity is not a measure of overall health status, self-rated health, performance status (NY Heart Association criteria, Eastern Cooperative Oncology Group), psychological well-being, or stage of disease (i.e., Tumor-Node-Metastasis – TNM classification). Some measures have counted body systems involved or medications.
Central to the definition of a comorbid condition is the question: For what purpose? Co-occurrence is not sufficient to define a comorbid condition. If co-occurrence is the only criterion to define a comorbid condition, then color blindness, hangover, upper respiratory infection, an injured ankle after a car accident, pain in the thumb, and an elevated white blood cell count, would all be comorbid conditions.
The definition of a comorbid condition depends on the key questions involved: diagnostic comorbidity (conditions that confound diagnosis), treatment comorbidity (conditions that alter therapy), and prognostic comorbidity (conditions that impact outcomes). Diagnostic and treatment comorbidity must be defined in the context of a specific condition(s). In medicine, most major diseases are defined by specific criteria independent of the presence of another chronic disease, although the actual criteria may differ by groups or settings.
Diagnostic comorbidity is more complex in psychiatry than in medicine. In psychiatry, diagnostic comorbidity, even with the Diagnostic and Statistical Manual of Mental Disorders (DSM) handbook to improve inter-observer reliability, is a complex issue and considers a number of potential relationships between disorders: (a) one specific disorder preceding or increasing the risk for another one, (b) two coexisting disorders that may predispose to the development of another disease, (c) antecedent factors specific for different disorders, and (d) a complex interaction of one or more distinct antecedent factors [3]. This was shown in a 12-month study of 9,300 adults [4]; of the 2,500 adults who had one DSM-IV disorder, almost half (45%) had more than two more DSM-IV disorders [4]. This complexity resulted in the development and promulgation of psychometric methods of assessment of psychiatric scales. In psychiatry, psychometrics has dominated assessments of disease [5]. Psychometric analytics used strategies for assessment such as Cronbach’s alpha, where more items in the scale lead to higher correlations [5]. Fava and Bech [5, 6] pointed out the difference between psychometrics and clinimetrics in psychiatry, and the clinimetric properties of existing and widely used measures were examined [7]. Fava et al. [3, 8] raised the issue that patterns of symptoms, severity of illness, timing of disease, rate of progression, response to treatment or impact of conditions are often not considered in the usual taxonomy of psychiatry.
Wright and Feinstein [9] pointed out that psychometric and clinimetric scales had different purposes. For psychometric scales, a number of homogeneous items for assessing the diagnosis of a single condition may be important, but for measuring a phenomenon like change in status or more complex phenomena, the index cannot be homogeneous and redundant [9]. Clinimetrics is the term originally coined by Alvan R. Feinstein [10] to introduce an innovative approach that has been redefined as the science of clinical measurements [11]. Such a clinically based evaluation method is particularly useful for testing a number of measurement properties (e.g., inter-rater reliability, concurrent validity, sensitivity, incremental, and predictive validity). This paper will focus on the clinimetric assessment of prognostic comorbidity, which is a broader concept than diagnostic or treatment comorbidity, and specifically on chronic conditions that impact on survival outcomes, especially long-term survival. Comorbidity has also been used to predict a variety of outcomes: functional status, quality of life, complications, readmissions, and health care utilization [12]. Kaplan and Feinstein [13] proposed an innovative method for classifying and staging comorbidity in relation to long-term survival. They focused on cogent comorbid conditions excluding such conditions as varicose veins and hemorrhoids, as well as completed illnesses such as previous fracture, and evaluated hypertension, congestive heart failure (CHF), or myocardial infarction (MI), stroke, pulmonary insufficiency, renal disease, chronic liver disease, gastrointestinal bleeding, amputation, cancer, alcohol, or physical impairment on a grade 1–3 scale, assigning an overall grade of 3 to patients who had a grade of 3 in any of the areas. The 5-year mortality rates for new onset diabetics ranged from 7% for patients with grade 0 to 69% for those with grade 3 [13].
Subsequently, the Charlson Comorbidity Index (CCI) developed in 1987 became the most widely used index, and is often considered to be the gold-standard measure to assess comorbidity in clinical research [14].
The present critical review was conducted to evaluate the clinimetric properties of the CCI, including reliability, concurrent validity, sensitivity, incremental validity, and predictive validity.
Methods
In view of the amount of literature on this topic (e.g., the number of citations of the original version of the CCI exceeds 36,925 on Scopus, accessed on September 30, 2021), this review cannot be systematic. We will analyze the most relevant studies concerned with its reliability, concurrent validity, sensitivity, incremental and predictive validity.
Search Strategy
A comprehensive search of the literature was performed using the following databases: MEDLINE, Embase, PsycINFO, PubMed, ScienceDirect, Scopus, and Web of Science. Each database was searched from inception to September 30, 2021. A manual search of the literature was also conducted, and reference lists of the retrieved articles were examined for further studies not yet identified. Further, the articles citing the original study [14] in the Web of Science were also considered to identify further potentially relevant studies. The search terms used were “comorbidity index,” “reliability,” “reproducibility,” “concurrent validity,” “predictive,” and “mortality.”
Eligibility Criteria
To be included in this review, studies had to meet the following criteria: (1) English-language article published in a peer-reviewed journal; (2) the full text of the article was available; (3) the article was an original study (e.g., research article, meta-analysis); (4) the study evaluated the clinimetric properties of the CCI or used a clinimetric approach to analyze the measurement properties of its different versions.
Study Selection
Three reviewers (C.P., D.C., and M.E.C.) independently performed the search, screened titles and abstracts, selected studies, evaluated the full text of articles appearing potentially relevant, and extracted data from studies meeting the eligibility criteria. In case of disagreement, a consensus was reached through discussion.
Results
The initial search of the literature yielded a total of 36,925 articles, but only those studies which best displayed the clinimetric properties of the various versions of the CCI were included and analyzed in this critical review. The different versions of the CCI have been extensively used in a wide range of medical settings and were found to entail the clinimetric properties of reliability, concurrent validity, sensitivity, incremental and predictive validity.
The CCI
The original version of the Comorbidity Index (Table 1) developed by Mary E. Charlson consisted of 19 items corresponding to different medical comorbid conditions [14]. Each condition was assigned a clinical weight on the basis of the adjusted risk of 1-year mortality, controlling for severity of illness, in 559 patients admitted to the general internal medicine service (i.e., New York Hospital-Cornell Medical Center). The 19 conditions and associated weights, combined with age, were used to predict mortality of 685 patients with breast cancer from comorbid disease over 10 years [14]. The total score of the CCI is the simple sum of the weights, with higher scores indicating not only a greater mortality risk but also more severe comorbid conditions [14]. The CCI was developed to be used in different populations as a prognostic measure in longitudinal studies to predict mortality [14].
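The scoring rule described above (a simple sum of condition weights) can be sketched in a few lines. The condition names are labels chosen for this illustration, and the weights are those commonly reproduced from the original index; both are assumptions of the sketch and should be checked against Table 1 before any use.

```python
# Weights as commonly reproduced for the 19 original conditions.
# Assumption of this sketch: verify against Table 1 of the source index.
CCI_WEIGHTS = {
    "myocardial_infarction": 1,
    "congestive_heart_failure": 1,
    "peripheral_vascular_disease": 1,
    "cerebrovascular_disease": 1,
    "dementia": 1,
    "chronic_pulmonary_disease": 1,
    "connective_tissue_disease": 1,
    "ulcer_disease": 1,
    "mild_liver_disease": 1,
    "diabetes": 1,
    "hemiplegia": 2,
    "moderate_or_severe_renal_disease": 2,
    "diabetes_with_end_organ_damage": 2,
    "any_tumor": 2,
    "leukemia": 2,
    "lymphoma": 2,
    "moderate_or_severe_liver_disease": 3,
    "metastatic_solid_tumor": 6,
    "aids": 6,
}


def cci_score(conditions):
    """Return the sum of the weights of the distinct conditions present."""
    present = set(conditions)  # a condition recorded twice counts once
    unknown = present - set(CCI_WEIGHTS)
    if unknown:
        raise ValueError(f"unknown condition(s): {sorted(unknown)}")
    return sum(CCI_WEIGHTS[c] for c in present)
```

For example, under these assumed weights, `cci_score(["diabetes", "metastatic_solid_tumor"])` gives 7. The age component of the combined index is deliberately left out of this sketch.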
Different Adaptations of the CCI for Use with Different Data Sources
Over the years, several adaptations of the CCI for use with different sources of data have been proposed for coding medical records, electronic health records (EHR), problem lists and ICD-9 and ICD-10 data, and different versions have been developed and come into use [15-19].
Age-Comorbidity Index
The Age-Comorbidity Index (age-CCI) was designed for use in small studies and was a highly significant predictor of mortality [14, 15]. The age-CCI has been most often used in oncology. In 5,643 patients with colorectal cancer, the age-CCI predicted survival over 5 years [20], as well as perioperative and 18-month mortality in a smaller study of 279 patients [21]. The age-CCI predicted 5-year mortality in 2,257 patients with gastric cancer [22] and 379 patients with resected pancreatic cancer [23]. The age-CCI predicted 10-year mortality in 1,598 men with prostate cancer [24], 793 patients with ovarian cancer [25], and 567 patients with advanced ovarian cancer who had debulking surgery [26]. In 4,508 lung cancer patients, the age-CCI was a better predictor of 3-year mortality than either the CCI alone or the Elixhauser index [27]. In 698 patients with resected renal cell cancer, the age-CCI better predicted long-term mortality than the CCI [28]. In 1,132 women with early endometrial cancer, the age-CCI predicted 4-year survival [29].
The age-CCI has also been used in other types of patients. In 1,057 hip fracture patients, the age-CCI was the most significant predictor of 5-year survival [30] and in 142 patients undergoing revision hip arthroplasty, the age-CCI predicted 2-year survival [31]. In 515 incident dialysis patients, the age-CCI predicted 15-month mortality [32]. In 529 patients who had emergency general surgery, the age-CCI predicted 30-day survival [33].
Adaptations of the CCI Using ICD-9 Codes
Deyo et al. [16] proposed the first modified version of the CCI assessed through the application of ICD-9 diagnostic codes using fairly strict criteria. As Romano et al. [18] noted, Dartmouth-Manitoba created an ICD-9 interpretation using more ICD-9 codes. Not surprisingly, when the Deyo and Dartmouth-Manitoba versions were directly compared, the Deyo CCI scores were lower; however, the Deyo and Dartmouth-Manitoba CCI versions still had 90% agreement, and 95% agreement within one CCI point [34]. The Deyo and Romano adaptations yielded quite similar predictions of 6-month mortality in medical and surgical patients [35]. Roos et al. [36] also found that the Dartmouth-Manitoba Index was a significant predictor of 1-year mortality after coronary artery bypass graft (CABG), pacemaker, or hip fracture surgery. Along similar lines, Khan et al. [37] translated the comorbidity index into the Read/OXMIS codes used in British primary care. D’Hoore et al. [38] adapted the comorbidity index to ICD-9 using only the first three digits of the ICD-9 codes and showed it predicted inpatient death in 62,456 patients with one of four medical conditions [39]. However, the D’Hoore version did not fare as well as the Deyo or Romano versions in direct comparison of prediction of 1-year mortality in 141,161 participants enrolled in epidemiologic studies [40].
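To make the idea of a code-based adaptation concrete, the following minimal sketch maps diagnosis codes to CCI conditions by their three-character ICD-9 category. The mapping is a deliberately tiny, illustrative subset and is not the Deyo, Dartmouth-Manitoba, or D’Hoore code list; the function names and the weights shown are assumptions made for the example.

```python
# Illustrative subset only: three-digit ICD-9 categories mapped to conditions.
# NOT any of the published code lists; hypothetical names throughout.
PREFIX_TO_CONDITION = {
    "410": "myocardial_infarction",     # acute myocardial infarction
    "412": "myocardial_infarction",     # old myocardial infarction
    "428": "congestive_heart_failure",  # heart failure
    "290": "dementia",                  # dementias
}

# Assumed weights for the three conditions used in this example.
WEIGHTS = {
    "myocardial_infarction": 1,
    "congestive_heart_failure": 1,
    "dementia": 1,
}


def conditions_from_codes(codes):
    """Return the set of conditions flagged by a list of diagnosis codes."""
    found = set()
    for code in codes:
        prefix = code.strip().replace(".", "")[:3]  # keep the category only
        condition = PREFIX_TO_CONDITION.get(prefix)
        if condition is not None:
            found.add(condition)  # several codes for one condition count once
    return found


def score_from_codes(codes):
    """Sum the weights of the distinct conditions flagged by the codes."""
    return sum(WEIGHTS[c] for c in conditions_from_codes(codes))
```

A stricter or broader code list changes which records are flagged, which is one way to read the score differences between versions reported above.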
Adaptations of the CCI Using ICD-10
Sundararajan et al. [19] adapted the Deyo version of CCI for use with ICD-10 codes, classifying the CCI for more than 400,000 patients hospitalized in each of 4 years, in comparison to 2 years of ICD-9 codes, as a predictor of in-hospital death. Using the area under the receiver operating characteristic (ROC) curve as a measure of the CCI’s ability to discriminate between those subjects who experienced the outcome of interest (i.e., in-hospital mortality) and those who did not [41], Sundararajan et al. [19] showed that the area under the ROC curve for the revised CCI ranged from 0.85 to 0.86. Halfon et al. [42], using clinical judgment, also mapped the Deyo adaptation to ICD-10 codes, with codes that differ somewhat from Sundararajan’s, to study readmission of 3,473 Swiss patients (finding that comorbidity predicted readmission).
Quan et al. [43] took both the Halfon and Sundararajan codes and added a third list developed by coders to formulate a new set of ICD codes in 158,805 patients discharged in Calgary in 1 year; their ICD-10 CCI map had a c statistic of 0.845 for in-hospital mortality [43]. Translating the ICD-9 codes used by the Deyo and Dartmouth-Manitoba CCI versions to ICD-10 codes by using a World Health Organization ICD-9 to ICD-10 translator [44], Nuttall et al. [45] compared their performance in ICD-10 and found small differences in performance between Deyo and Dartmouth-Manitoba (Deyo, c = 0.71; Dartmouth-Manitoba, c = 0.73) in predicting inpatient mortality.
The Halfon, Sundararajan, and Quan ICD-10 versions of the CCI were compared in one paper across national samples ranging from 37,057 patients in Switzerland to 225,000 in Japan; the distribution of scores was similar across the different versions [46]. The area under the ROC curve for predicting in-hospital mortality was higher for the Quan and Sundararajan versions than for the Halfon version [46].
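For a binary outcome such as in-hospital death, the area under the ROC curve equals the c statistic: the probability that a randomly chosen patient who died has a higher score than a randomly chosen survivor, with ties counted as one half. The sketch below computes it directly from that definition; the function name is hypothetical, and the quadratic pairwise loop is meant for illustration rather than for cohorts of the size cited here.

```python
def c_statistic(scores, died):
    """Concordance (area under the ROC curve) of a score for a binary outcome.

    scores: numeric scores, e.g. CCI totals
    died:   truthy where the outcome occurred, falsy otherwise
    """
    with_event = [s for s, d in zip(scores, died) if d]
    without_event = [s for s, d in zip(scores, died) if not d]
    if not with_event or not without_event:
        raise ValueError("need at least one patient in each outcome group")

    concordant = 0.0
    for p in with_event:
        for n in without_event:
            if p > n:
                concordant += 1.0   # higher score in the patient who died
            elif p == n:
                concordant += 0.5   # tie counts as half
    return concordant / (len(with_event) * len(without_event))
```

A value of 0.5 corresponds to chance discrimination and 1.0 to perfect separation of the two groups.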
Inter-Rater Reliability of Data from Different Sources: Self-Report, Interview, or Medical Chart
The clinimetric property of inter-rater reliability refers to the degree of agreement or concordance between different raters assessing the same clinical phenomena using different methods [47]. Waite et al. [48] were among the first authors who evaluated the inter-rater reliability of the original version of the CCI; they found 58% absolute agreement in CCI scores across five trained raters, rising to 99% when scores that differed by a weight of only one were counted as agreeing [48]. Hall et al. [47] also evaluated inter-rater reliability of CCI assessed by four separate raters in 40 patients with head and neck cancer, finding an intra-class correlation coefficient (ICC) of 0.80. Bernardini et al. [49] showed a kappa of 0.93 for two raters of CCI in dialysis patients.
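Three of the agreement measures mentioned above can be illustrated for the simplest case of two raters scoring the same patients: absolute agreement, agreement within one point, and unweighted Cohen's kappa. This is a sketch with hypothetical function names; it does not reproduce the multi-rater designs or the ICC used in the cited studies.

```python
from collections import Counter


def _check(a, b):
    if len(a) != len(b) or not a:
        raise ValueError("both raters must score the same non-empty patient list")


def exact_agreement(a, b):
    """Proportion of patients given identical scores by both raters."""
    _check(a, b)
    return sum(x == y for x, y in zip(a, b)) / len(a)


def within_one_agreement(a, b):
    """Proportion of patients whose two scores differ by at most one point."""
    _check(a, b)
    return sum(abs(x - y) <= 1 for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Unweighted Cohen's kappa: agreement corrected for chance agreement."""
    _check(a, b)
    n = len(a)
    observed = exact_agreement(a, b)
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement from each rater's marginal score distribution.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)
```

Kappa is lower than raw agreement whenever some agreement would be expected by chance alone, which is why the two figures are reported separately.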
Self-Report versus Chart
Different self-reported versions have been compared to data obtained from medical charts [50]. Katz et al. [17] created a self-reported questionnaire that had excellent agreement with chart-extracted data in 170 patients (ICC = 0.92). Molto and Dougados [51] found high test-retest reliability (ICC = 0.94) for self-report compared to chart review. Olomu et al. [52] found that the self-report and the chart version were closely equivalent. However, in one study [53] more conditions were reported by interviewers than in charts (ICC = 0.51).
Chart Review versus Discharge Summary and Problem Lists
Medical chart data provided more complete data about the CCI than hospital discharge data; thus, discharge data underestimated total comorbidity [54]. Swartz et al. [55] also found that discharge data had only half as many chronic conditions as medical records. In contrast, discharge data were found to have more conditions and to predict mortality more accurately than Medicare-linked Surveillance, Epidemiology and End Results (SEER) data [56]. However, chart review had higher specificity but poorer sensitivity for conditions in the CCI than EHR index-based problem lists (kappa = 0.23), resulting in significant failure to accurately predict mortality in 1,596 men with prostate cancer over 15 years [57].
Self-Report versus Claims Data
The largest study of self-reported data versus claims data evaluated 7,761 patients admitted to a medical service over 4 years, and found that the area under the curve (AUC) for self-reported and ICD-9 diagnosis at the time of admission and the ICD-9 diagnoses over the year antecedent to admission were quite similar in predicting 1-year mortality (c = 0.70; c = 0.73, respectively) [58]. Zhang et al. [59] also found that both claim-based adjustment and survey-based adjustment resulted in similar predictions (c = 0.702 and c = 0.704, respectively) of 2-year mortality in 30,000 Medicare patients. On the other hand, agreement between self-report and 12 months of administrative data was poor in 520 emergency department patients [60].
Chart versus Claims Data
A review of seven papers comparing chart data with administrative data suggested that the claims data underestimated the CCI by 1 point [61]. Kieszak et al. [62] compared data extracted from charts versus ICD-9 codes for Medicare beneficiaries and found many more diagnoses on the charts, which improved prediction of inpatient and 30-day mortality. In men who had prostatectomy, claims data underestimated CCI compared to chart data [63]. Quan et al. [64] showed that, of the CCI conditions, 4 had similar prevalence in chart and administrative data, 10 had a lower prevalence in claims, and 3 had a higher prevalence. Li et al. [65] also found more comorbid illnesses by chart review than were coded in administrative data for intensive care unit (ICU) patients. Thygesen et al. [66] evaluated the positive predictive value of ICD-10 by reviewing charts, and found that overall positive predictive values were 98%.
However, there are some international differences; in 2,464,395 hospitalized Chinese adults, the discharge diagnosis-based CCI was a significantly better predictor of in-hospital mortality than the ICD-10 administrative data (kappa varied by specific condition) [67]. In 959 patients admitted to an ICU in Norway, the CCI from chart and administrative data were significantly correlated (r = 0.667), but the CCI from charts was more complete [68]. When conditions coded by the Deyo version of ICD-9 were compared to those found in 1,200 patients in a chart review, the agreement was quite high (weighted κ = 0.71), but comorbidity was underreported in administrative data [64]. Newschaffer et al. [69] (who also compared the Kaplan-Feinstein and Satariano indices) found in 404 breast cancer patients that the inter-rater agreement was high (κ = 0.945) using either records or claims, but claims had fewer data than records. In 524 hospitalized adults, van Doorn et al. [70] also showed that chart-based scores differed from scores derived from claims data. In contrast, in 890 Swiss patients administrative data were more complete than single-day chart review [71].
More Years and Reliability
Lee et al. [72] showed that in 1,808 hospitalized adults, adding prior years of administrative data on comorbidity improved prediction of 1-year mortality. Increasing the “look-back” interval for ascertaining comorbid diseases improved accuracy [55, 58, 59, 68, 72, 73]. Wang et al. [74] studied 50,000 Medicare beneficiaries for 1-year survival, contrasting baseline comorbidity, prior year comorbidity, change between baseline and prior year comorbidity, rolling comorbidity, and change in rolling comorbidity, and found that using all the comorbidity data (and not truncating it by time period) provided the best prediction. Maringe et al. [73] suggested that 6 years of prior administrative data were optimal for validity and reliability.
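The effect of the look-back interval can be pictured with a short sketch that collects the conditions recorded within a chosen window before an index date. The record layout (a date paired with a condition label) and the function name are assumptions made for this illustration, not a description of any of the cited data sets.

```python
from datetime import date, timedelta


def conditions_in_lookback(records, index_date, lookback_days):
    """Return conditions recorded within `lookback_days` before the index date.

    records: iterable of (diagnosis_date, condition_label) pairs
    Diagnoses recorded after the index date are ignored.
    """
    window_start = index_date - timedelta(days=lookback_days)
    return {
        condition
        for diagnosed_on, condition in records
        if window_start <= diagnosed_on <= index_date
    }


# Invented records for one hypothetical patient.
records = [
    (date(2015, 3, 1), "diabetes"),
    (date(2019, 11, 20), "congestive_heart_failure"),
    (date(2020, 6, 1), "dementia"),  # after the index admission
]
one_year = conditions_in_lookback(records, date(2020, 1, 15), 365)
six_years = conditions_in_lookback(records, date(2020, 1, 15), 6 * 365)
```

In this invented example the one-year window captures only heart failure, whereas the longer window also captures the earlier diabetes diagnosis, which is the mechanism by which a longer look-back yields a more complete comorbidity record.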
Types of Visits and Reliability
Using ICD-9 codes from admission, Wang et al. [75] compared all outpatient visits in the antecedent 6 months, all inpatient visits in the antecedent 6 months versus all sources in 3,994 women with breast cancer and found there was poor agreement in diagnoses between specific types of visits.
Concurrent Validity
Concurrent validity is a clinimetric property that evaluates whether two scales that ostensibly measure or predict the same outcome are significantly correlated or result in concordant predictions. Most studies that evaluated the concurrent validity of the CCI found moderate or greater correlations with other measures that predicted mortality. When comparing the CCI to the Index of Coexistent Disease (ICED) in 1,789 patients with rheumatoid arthritis and osteoarthritis followed over 10 years (with a high mortality rate of 64%), Gabriel et al. [76] showed that the two measures had a correlation of 0.58 and were independent predictors of mortality. In 7,511 breast cancer patients and 1,482 prostate cancer patients, the CCI had a similar prediction in comparison to a count of the conditions in the CCI [77, 78]. In 347 men with prostate cancer, Alibhai et al. [79] compared the CCI, ICED, disease count, and the number of medications and found that the CCI and the ICED had similar predictions of 6.5-year mortality (c = 0.61 for both). Similarly, Albertsen et al. [80] evaluated 451 men with prostate cancer followed over 20 years and found that the CCI, the ICED, and the Kaplan-Feinstein had similar predictions of mortality. In 688 older community dwellers, both CCI and ICED predicted 5-year mortality [81]. In 330 spinal cord injured patients, Rochon et al. [82] compared a count of ICD diagnostic codes and the CCI, and found that both predicted 18-month mortality, yet the Charlson Index outperformed the ICD diagnostic codes. In mitral valve patients undergoing minimally invasive surgery, the CCI and the Society of Thoracic Surgeons score were highly correlated; both predicted 1-year mortality [83].
The diagnostic cost group (DxCG), which was designed to predict cost, and the CCI were compared in 2,167 surgical patients and shown to have a correlation of 0.56 [84]. The American Society of Anesthesiologists classification had a similar but slightly better prediction of 30-day mortality compared to the CCI in 650,437 patients [85]. Ash et al. [86] predicted 1-year mortality after acute myocardial infarction in 5 different years, each with slightly over 300,000 patients, using three different scales: the DxCG with 122 variables with a vector of 118 hierarchies of seriousness, the Agency for Healthcare Research and Quality (AHRQ) Clinical Classifications Software (CCS) with 263 variables, and the CCI. Their validation analysis showed a more accurate prediction using the DxCG (c = 0.81) and the CCS (c = 0.82) than the CCI (c = 0.74) [86].
In 2,728 Medicare beneficiaries with atrial fibrillation, the Deyo and Romano versions of the CCI were highly correlated (Spearman = 0.8), but much less so with the Elixhauser Index (r = 0.37-0.44); the two versions of the CCI had a slightly higher accuracy in predicting mortality [87]. Luchtenborg et al. [88] compared the Charlson and Elixhauser Index in 233,981 patients with lung cancer and found a correlation of 0.82. In 14,313 HIV patients, the CCI and the Elixhauser Index had similar performance in predicting 1-year mortality, and better performance in predicting 2-year mortality in the validation cohort [89].
Sensitivity
Clinimetric sensitivity is demonstrated by stepwise increases of mortality with stepwise increases in CCI. In one of the largest studies, Fraccaro et al. [90] evaluated 287,459 adults in Salford, UK, and quantified CCI at baseline, 5 years and 10 years, showing that mortality increased stepwise with increasing CCI. Fraccaro et al. [90] also showed that 1 in 10 patients had an increase in CCI over 10 years (1.9% at 1 year, 10.4% at 5 years, and 15.9% at 10 years); those who had increases in CCI had a significantly higher mortality at the three specific time points. The larger the increase in CCI, the greater the increase in mortality; in addition, the higher the baseline CCI, the higher the mortality with subsequent increases in CCI over the 10 years [90]. Thus, the more increases in CCI and the more rapidly the increases occur, the higher the mortality [90].
There have been similar findings in different populations. In 1,300 acutely hospitalized older adults, 90-day, 1-year, and 5-year mortality all increased stepwise with increasing comorbidity [91]. In 385 homeless adults followed over 9 years, controlling for cognitive function, for each 1-point increase in the CCI, the risk of mortality was 9.9% higher and was consistent across ages [92]. A total of 456,263 newly diagnosed hypertensive patients were followed over 12 years showing significant increases in comorbidity over time; the percent of patients who developed at least one new comorbid condition increased from 36 to 60% at the end of follow-up [93]. A Danish study enrolled 9,329 breast cancer patients during four different 2-year periods and followed them for 10 years and showed that the CCI predicted overall survival over the 10 years [94]. Over the follow-up period, stepwise increases in CCI predicted a stepwise increase in mortality, although overall survival improved in each of the CCI ranks over the secular intervals [94].
In 1,197 breast cancer patients, compared to women with a Charlson comorbidity score of 0 (no comorbidity), patients with scores of 1, 2, and 3+ had risk ratios for 10-year mortality of 1.23 (p = 0.10), 2.58 (p < 0.001), and 3.44 (p < 0.001), respectively [95]. In 3,102 patients discharged from the emergency department, over 1-year follow-up mortality increased stepwise from 7.0% with 0; 22% with 1–2; 31% with 3–4; and 40% with ≥5 [96]. In a study of 13 geographic regions, patients with lung cancer with a lower Charlson score always had higher 1-year survival than those with a score of 3 or more, although the magnitude varied across regions [88]. Similarly, in 588 patients with non-small cell lung cancer, with increasing CCI scores there were stepwise decreases in survival at 2, 3, and 5 years [97]. In 533 patients with diabetic nephropathy, Huang et al. [98] showed stepwise increases in mortality over 5 years with increasing CCI score. In 288 incident dialysis patients, the relative risk of death over 1 year increased by 1.54 for each increase of 1 in the CCI [99]. Among 18,179 adults who had been hospitalized in the ICU, Luben et al. [100] showed that increased comorbidity predicted mortality at 5 and 10 years.
Stepwise increases also occur with in-hospital mortality. One study that involved more than 400,000 patients evaluated CCI in six separate yearly intervals and showed consistent gradients with increased in-hospital mortality as comorbidity increased (both ICD-9 and ICD-10): from 0.3% with 0; to 3% with 1; 5–6% with 2; 9–11% with 3; 13–14% with 4; 15–16% with 5, and 21–24% with ≥6 [19]. D’Hoore et al. [38] showed stepwise increases in mortality in 33,940 adults with ischemic heart disease from 3.1% inpatient death with 0; 10.1% with 1–2; 19.3% with 3–4; 32.6% with 5–6, and 37.1% with ≥6. There are similar findings in surgical populations. In 6,188 patients with radical cystectomy, postoperative mortality increased stepwise from 1.7% with 0; 3.0% with 1; 4.2% with 2; 4.3% with 3, and 12.1% with CCI ≥4 [101]. Early mortality also increased stepwise with increasing CCI in 1,062 patients with implantable defibrillators [102]. In addition, summary comorbidity measures work best; it has been shown that weighted summary measures of comorbidity provide better prediction than individual conditions evaluated separately [103].
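The stepwise pattern that defines clinimetric sensitivity can be checked mechanically: group patients by CCI stratum, compute the death rate in each stratum, and test whether the rates rise strictly from one stratum to the next. The sketch below does this with hypothetical function names; pooling all scores at or above a cap into one top stratum is a choice made for the example.

```python
def mortality_by_stratum(scores, died, cap=3):
    """Death rate per CCI stratum, pooling scores >= cap into one stratum."""
    totals, deaths = {}, {}
    for score, outcome in zip(scores, died):
        stratum = min(score, cap)
        totals[stratum] = totals.get(stratum, 0) + 1
        deaths[stratum] = deaths.get(stratum, 0) + (1 if outcome else 0)
    return {k: deaths[k] / totals[k] for k in sorted(totals)}


def is_stepwise(rates):
    """True if the rate strictly increases with each higher stratum."""
    ordered = [rates[k] for k in sorted(rates)]
    return all(lower < higher for lower, higher in zip(ordered, ordered[1:]))


# Postoperative mortality by CCI stratum as reported above for radical
# cystectomy [101]; key 4 stands for CCI >= 4.
cystectomy = {0: 0.017, 1: 0.030, 2: 0.042, 3: 0.043, 4: 0.121}
cystectomy_is_stepwise = is_stepwise(cystectomy)
```

Applied to the cystectomy figures quoted above, the check confirms a strict increase across strata, although the step between scores 2 and 3 is very small.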
Incremental Validity
Sechrest [104] originally introduced the clinimetric concept of incremental validity, which refers to the unique contribution or incremental increase in predictive power associated with the inclusion of a particular assessment instrument in the clinical decision process [7, 8, 11, 104, 105]. A large number of studies have shown that adding the CCI to the standard clinical assessment significantly increased the predictive value for many different populations and many different clinical outcomes. In 52,187 patients presenting to emergency departments for suspected infection, adding the CCI to the Sequential Organ Failure Assessment (SOFA) significantly improved the prediction of in-hospital mortality [106]. Similarly, in 6,336 patients who had major surgery, the CCI added significantly to perioperative SOFA scores in predicting 30-day mortality [107]. In 1,202 patients with acute coronary syndrome, the CCI improved the ability of the Global Registry of Acute Coronary Events (GRACE) tool to predict in-hospital mortality [108]; in addition, findings demonstrated that the CCI contributed even more to prediction of mortality after discharge [108]. In a cohort of 201 critically ill patients, comorbidity improved the Acute Physiology and Chronic Health Evaluation (APACHE) prediction of in-hospital mortality [109], while in a larger study with 14,013 ICU patients, the CCI only slightly improved the APACHE predictions [110]. In 469 ICU patients, Christensen et al. [111] found that the CCI along with demographic and key clinical variables predicted in-hospital mortality, resulting in similar predictions as physiologic variables. In 959 patients admitted to the ICU in Norway, the CCI significantly improved the ability of the Simplified Acute Physiology Score to predict both 30-day and 1-year mortality [68]. In 675 frail patients with inflammatory bowel disease, both the CCI and frailty predicted 11-year mortality [112].
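One common way to express incremental validity is the change in discrimination when the CCI is added to an existing score. The sketch below compares the c statistic of a base score with that of a combined score. The linear combination with a fixed weight is a simplification assumed for illustration (the cited studies would typically fit a regression model), and all names and numbers are invented for the example.

```python
def c_statistic(scores, died):
    """Concordance of a score for a binary outcome; ties count one half."""
    events = [s for s, d in zip(scores, died) if d]
    non_events = [s for s, d in zip(scores, died) if not d]
    if not events or not non_events:
        raise ValueError("need at least one patient in each outcome group")
    total = 0.0
    for p in events:
        for n in non_events:
            total += 1.0 if p > n else 0.5 if p == n else 0.0
    return total / (len(events) * len(non_events))


def incremental_validity(base, cci, died, cci_weight=1.0):
    """Return (c of base score, c of base + weighted CCI, difference).

    cci_weight is a hypothetical fixed coefficient, not a fitted one.
    """
    combined = [b + cci_weight * c for b, c in zip(base, cci)]
    c_base = c_statistic(base, died)
    c_combined = c_statistic(combined, died)
    return c_base, c_combined, c_combined - c_base


# Invented toy data: four patients, an acute severity score, and a CCI.
result = incremental_validity(
    base=[1, 2, 2, 3], cci=[0, 3, 0, 1], died=[0, 1, 0, 1]
)
```

A positive difference indicates that the combined score ranks deaths above survivors more often than the base score alone; whether such a gain is statistically or clinically meaningful is a separate question that this sketch does not address.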
Predictive Validity
The clinimetric property of predictive validity refers to the ability of a rating scale or index to predict clinical outcomes [9]. As Wright and Feinstein [9] noted, this is an important clinimetric property as rating scales and indices “may be used to predict a future outcome or stratify patients into distinctively different prognostic groups.” Since the CCI [14] was designed to predict mortality (not disability, major morbidity, quality of life, health care costs, and hospitalization), this analysis focuses first on the CCI’s prediction of long-term mortality for which it was designed and then on in-hospital mortality for which it has often been used.
Long-Term Mortality
We will discuss the contribution of the CCI in predicting long-term mortality among hospitalized patients, elderly patients, ICU patients, trauma, surgery and emergency patients, and cancer patients.
Hospitalized Patients
In 6,602,641 hospitalized adults in France, the CCI predicted 1-year mortality [113]. In 77,440 patients hospitalized in Finland, the CCI predicted mortality over the following 13 years [114]. Among 2,740 older women hospitalized with cardiovascular disease, the CCI predicted 15-year mortality [115]. In 712 patients who had an acute MI, the CCI predicted 18-month mortality [116], while in 29,620 patients hospitalized in Switzerland for acute coronary syndrome, the CCI predicted 1-year mortality as well as in-hospital mortality [117]. Among 32,916 German diabetic patients with incident cardiovascular events, the CCI predicted 1-year mortality [118]. In 533 patients with diabetic nephropathy, the CCI predicted increased 3-year mortality [98].
In 811 patients admitted with CHF, the CCI predicted 1-year mortality [119]; similar findings were reported in another study [120] where the CCI predicted 1-year mortality in 897 CHF patients. For 1,808 older patients after hospitalization for acute heart failure, the CCI predicted 1-year survival [72]. In 823 patients with CHF who received implantable defibrillators, the CCI also predicted 5-year mortality [121]. In 3,120 patients with infective endocarditis, CCI predicted 1-year mortality [122].
Among 4,204 patients hospitalized for their first-ever chronic obstructive pulmonary disease (COPD) exacerbation, the CCI predicted 1-year mortality as well as in-hospital mortality [123]. In 1,023 patients with pulmonary embolism, the CCI predicted 3-year mortality and in-hospital mortality [124]. The CCI also predicted 90-day mortality in 41,700 patients with venous thromboembolism [125]. In 2,131 patients with tuberculosis, mortality during treatment was predicted by the CCI [126].
In 6,988 patients with ischemic stroke, the CCI predicted 1-year and in-hospital mortality [127], 1-year and 30-day mortality in another 3,605 patients after acute stroke [128], and 1-year mortality among 1,031 patients with stroke admitted in Denmark [129]. Among 950 patients with ischemic stroke, the CCI predicted increased risk of death significantly over 9 years [130].
In 779 patients with rheumatoid arthritis, the CCI predicted 3-year survival [131]. Among 6,591 patients with rheumatoid arthritis (and 6,591 controls) evaluated over 10 years, the CCI was a significant predictor of all-cause mortality [132]. In 669 patients with systemic lupus followed for 13 years, the CCI was a predictor of long-term survival [133].
In 7,391 incident dialysis patients, the CCI predicted 1-year survival [134]. In 456 incident dialysis patients, the CCI predicted mortality over follow-up [135], and in 10,759 elderly patients with incident hemodialysis, the CCI was a greater predictor of 10-year survival [136]. In 893 hemodialysis patients, the CCI predicted 6-year mortality which increased as CCI increased [137]. In 2,086 patients on the kidney transplant waiting list, the CCI predicted mortality prior to transplant [138]. In 6,324 kidney transplant recipients, Jassal et al. [139] compared several scales and found that the CCI had the best prediction of 5-year survival, compared to indices specifically designed to predict risk in the end-stage renal disease population. The CCI predicted 1.5-year mortality in 388 patients with chronic kidney disease [140], but several studies suggested that no measure of comorbidity was superior in predicting 1-year survival [141, 142].
Elderly Patients
In 487,197 older hospitalized adults from New South Wales, the CCI predicted 30-day mortality, and adding frailty did not improve the predictions [143]. A study in 1,313 hospitalized older adults found that the CCI predicted 1- and 5-year mortality [91]. In 628 older patients who underwent percutaneous coronary interventions, CCI and frailty together improved prediction of 3-year survival [144]. In 93,295 older patients discharged from the emergency department in Denmark, 30-day mortality increased with increasing CCI [145]. Among 4,849 older adults who had trauma, CCI predicted 30-day mortality [146]. In 2,624 elderly nursing home patients, CCI predicted 6-month mortality [147]. In 50,993 patients with dementia, the CCI predicted 1- and 3-year survival [148]. In 1,001 elderly patients dependent on home care, the CCI predicted 1-year survival [149]. In 8,425 geriatric patients with gastrointestinal cancer, the CCI was the most significant predictor of 3-month survival [150].
ICU Patients
Among 280 ICU patients who survived admission for acute respiratory failure, the CCI predicted 1-year mortality [151]. In 959 patients admitted to a general ICU in Norway, the CCI predicted 1-year and 30-day mortality [68]. In 1,049 ICU patients, both the CCI and the APACHE (designed for ICU risk prediction) predicted 6-month mortality [152]. In 1,608 adults admitted to the ICU, APACHE predicted mortality, but the CCI together with age, sex, and the use of mechanical ventilation yielded predictions similar to those of APACHE for 1-year mortality as well as in-hospital mortality [153].
Trauma, Surgery and Emergency Department Patients
In 129,786 trauma patients, the CCI was the most significant predictor of 30-day mortality [154]. In 3,080 adults who had major trauma, the CCI predicted 1-year mortality [155]. Among 5,621 patients who had myocardial injury after non-cardiac surgery (i.e., increased troponin), those with higher CCI had increased 30-day mortality [156]. In 2,484 patients who had COPD exacerbation who were seen in the emergency department, the CCI predicted 1-year mortality [157].
After emergency abdominal surgery in 390 older adults, the CCI predicted 1-year survival [158]. In another group of 227 elderly surgery patients, increased CCI predicted increased 1-year and 30-day mortality [159]. In 14,522 patients who had transurethral prostate resection (TURP) for benign hyperplasia, CCI predicted 30-day mortality [160], and in 302 patients with TURP the CCI predicted 5-year mortality [161]. The CCI predicted mortality after percutaneous nephrolithotomy in 1,406 patients [162].
In 390 hip fracture patients, the CCI predicted 90-day mortality [163]; in 346 patients with hip fracture, the CCI predicted 1-year mortality [164], and in 354 hip fracture patients, the CCI predicted 2-year mortality [165]. In 485 older patients with hip fracture, the CCI predicted 30-day mortality [166]. One-year mortality after hip fracture was evaluated in 44,000 patients, comparing the 259-item Clinical Classifications Software (CCS) from AHRQ with the CCI; the c value was 0.71 for the CCI and 0.76 for the CCS [167]. In 42,354 patients with total hip replacement after femoral neck fracture, one study found that the CCI had only modest predictive power (c = 0.68) for 2- and 5-year mortality [168]. Among 276,594 patients after total knee replacement or total hip replacement drawn from the Hospital Episodes database in the UK, the study found that the database underreported comorbidity in contrast to primary care records [169]; the authors also found that neither the CCI nor the Elixhauser predicted short-term or 1-year mortality [169]. In 3,480 patients with shoulder arthroplasty, the CCI predicted 90-day mortality [144].
Cancer Patients
In 8,445 patients with breast cancer, lung cancer, or colorectal cancer, the CCI was found to be the strongest predictor of 5-year survival [170]. In 250,985 patients with small bowel adenocarcinoma and colon cancer, the CCI predicted 5-year survival [171]. The CCI predicted 1-year survival in 6,964 patients with colorectal cancer surgery [172], 1-year survival of 1,945 patients with resected colon cancer [173], and 1-year survival of 743 patients with re-operation after complications of colorectal surgery [174]. In 11,524 patients with colon cancer, the CCI predicted increased 5-year mortality [175], as well as 5-year mortality in 308 treated colon cancer patients [176]. In 2,204 patients after colorectal surgery, the CCI predicted 30-day survival [177]. In 1,665 patients with colon cancer and liver cirrhosis, the CCI predicted 5-year survival [178]. In 8,597 patients with anastomotic leak after colon cancer resection, the CCI predicted increased 30-day mortality [179]. In 8,490 patients with pancreatic cancer after surgery, the CCI predicted 90-day mortality [180]. In 1,476 patients with gastric cancer and radical gastrectomy, the CCI predicted 5-year survival [22].
In 14,052 patients with radical prostatectomy followed for more than 7 years, the CCI was a significant predictor of mortality [181]. Specifically, the CCI predicted 5-, 10-, and 15-year mortality [181]. In 542 patients after radical prostatectomy, the CCI was the most powerful predictor of 6-year survival [182]. In 1,527 men with prostate cancer, the CCI was the most significant predictor of 10-year mortality [183]. In 345 men with a new diagnosis of prostate cancer, the CCI and ICED had similar predictive accuracy for 6-year survival [79]. In 451 men with localized prostate cancer followed over 20 years, the CCI and the Kaplan-Feinstein were both significant predictors of survival [80]. In 2,425 patients with localized prostate cancer, the CCI predicted survival over an 8-year median follow-up [184]. In 1,598 men, the CCI was a significant predictor of 10-year mortality from causes other than prostate cancer [185]. In 5,207 patients with bladder cancer after cystectomy, the CCI predicted 90-day survival [186]. The CCI also predicted 4-year survival after clear cell renal carcinoma [187] and 30-day mortality in 5,768 adults who had radical nephrectomy after a diagnosis of renal cell carcinoma [188]. Among 891 patients with bladder cancer, the CCI predicted 5-year mortality [189]. In 1,337 patients who had radical cystectomy, the CCI predicted 5-year survival [190].
In 77,971 breast cancer patients, CCI predicted 1-, 3-, 5-, and 10-year survival [191]. In 1,196 black and white women after breast cancer treatment, the CCI predicted 10-year survival [95]. In 9,208 patients with radical mastectomy, the CCI predicted 30-day mortality [192].
In 9,579 lung cancer patients, the CCI predicted lung cancer-specific mortality [193]. In several studies, comorbidity predicted 5-year survival in lung cancer [194]. In a total of 9,369 patients with lung cancer treated over 12 years in Denmark, the CCI predicted both 1- and 5-year survival [195]. Another study of 4,500 surgically resected lung cancer patients showed that 3-year survival was predicted by the CCI, but not by the Elixhauser Index [27]. In 233,981 patients with lung cancer from 9 different geographic areas, findings indicated that the CCI predicted 1-year, but not 10-year mortality [88]. Comorbidity predicted 5-year survival in 433 patients with small cell lung cancer [196], and in 426 patients with surgically resected non-small cell lung cancer [197]. A study of 22,073 adults with non-small cell lung cancer showed that the CCI predicted 5-year survival [198]. In another study of 617 patients with small cell and non-small cell lung cancer, the CCI did not predict 3-year mortality [199].
In 50,668 patients with acute myeloid leukemia, the CCI predicted 30-day survival [200]. In 542 patients with multiple myeloma treated with a novel agent, the CCI predicted 1-year mortality [201]. In 2,117 older patients with Hodgkin’s lymphoma, the CCI predicted 1-year survival [202]. In 548 patients with laryngeal cancer, the CCI predicted 5-year survival [203].
In-Hospital Mortality
A variety of systems and severity measures to predict in-hospital mortality are discussed in more detail in Appendix 1. The CCI was not specifically designed to predict in-hospital mortality, even though it has often been used for this purpose. In 1,501,811 patients with CHF and COPD, the CCI predicted increased in-hospital mortality [204]. In 195,527 patients with atrial arrhythmias, the CCI was an independent predictor of in-hospital mortality [205]. The CCI predicted both in-hospital and 1-year survival in 29,620 patients admitted with acute coronary syndrome to any 1 of 69 Swiss hospitals over a 10-year period [117]. In 529 older cardiac patients, the CCI predicted in-hospital death [206]. The CCI predicted in-hospital death after ischemic stroke in elderly patients [207]. In 606 patients with COPD, the CCI predicted increased in-hospital mortality [208]. In 535 patients with community-acquired pneumonia, the CCI predicted increased in-hospital mortality [209], as it did in 488 older patients with community-acquired pneumonia [210]. In 154,378 patients admitted with hyponatremia, the CCI was the most important predictor of in-hospital death [211]. The CCI also predicted in-hospital mortality in 3,839 systemic lupus erythematosus (SLE) patients [212], and the age-CCI predicted in-hospital mortality in 847 SLE patients [27]. In 786 patients hospitalized for acute kidney injury, the CCI predicted in-hospital death [213]. In 356,425 patients after non-cardiac surgery, the age-CCI was the best predictor for inpatient mortality [214]. In 5,621 COVID patients [156], and in another study of 2,431 COVID patients [215], the CCI was found to predict in-hospital mortality.
In 5,731 CABG patients older than 80 years, the CCI predicted in-hospital mortality [216]. In 2,837 elderly trauma patients, the CCI predicted in-hospital mortality [217]. In 2,197 cirrhotic patients who had major surgery, the CCI predicted in-hospital mortality [218]. In 6,137,965 patients with hip fracture, in-hospital mortality increased with both CCI and age-CCI (ROC = 0.767) [219]. In 315,464 patients who had surgical resection for digestive cancer, the CCI predicted in-hospital mortality [220].
In 450,414 patients with pancreatic cancer, the CCI predicted in-hospital death [221]. The CCI predicted in-hospital mortality in 279 patients after surgery for colorectal carcinoma [21]. In 531 postoperative patients with oral cavity cancer, the CCI predicted in-hospital mortality [222]. In 8,080 dementia patients after hip fracture surgery, the CCI predicted in-hospital mortality especially in the oldest [223].
The predictive validity of the CCI is often compared to systems such as MEDIS Groups [224], the Disease Staging system [225], APACHE [226], and the Trauma and Injury Severity Score [227] that were developed to predict in-hospital mortality. Elixhauser created a measure of comorbidity to predict hospital-related events including in-hospital mortality, length of stay, and hospital charges from administrative data from 1,779,167 patients hospitalized in California in 1992 [228]. Elixhauser included 30 conditions, including acute problems such as blood loss and fluid and electrolyte disorders; these conditions were all equally weighted, and models with none of the conditions were compared to those with all conditions [228]. Predictions increased modestly, with R² rising from 0.06 to 0.13 for length of stay and from 0.18 to 0.26 for total charges, but with no significant impact on predicting in-hospital mortality [228].
In 25,503 trauma patients, the CCI predicted in-hospital mortality [229]. In another study with 2,819 trauma patients, the CCI did predict in-hospital death, but did not add significantly to Trauma and Injury Severity Score (TRISS) estimates [227]. In shoulder arthroplasty patients who had a very low (0.1%) risk of death, the Elixhauser was slightly better than the CCI in predicting the risk of death [230].
Quan et al. [43] compared ICD-9 and ICD-10 codes for the CCI and Elixhauser in predicting in-hospital mortality in hospitalized patients in Calgary over a 2-year period and for conditions present at admission; they showed slightly higher c statistics for Elixhauser than CCI for ICD-10 codes (Elixhauser c = 0.854 vs. CCI c = 0.845) [43]. Stukenborg et al. [231] evaluated 211,547 hospitalized patients comparing the Deyo CCI and the Elixhauser Index and found that the latter better predicted in-hospital death than the Deyo CCI. Southern et al. [232] compared the individual conditions in the CCI (without weights) to the individual conditions in the Elixhauser Index to predict in-hospital mortality in 4,833 patients hospitalized in Calgary for acute MI, finding that fluid and electrolyte disorders and other neurological disorders had odds ratios of 4 or more in predicting mortality, which is not surprising since they reflect inpatient issues; Elixhauser had a higher c value (c = 0.793; 95% CI 0.768–0.815) than the Deyo CCI (c = 0.704; 95% CI 0.686–0.732) [232]. Similar findings were reported in another study: the Elixhauser Index was a better predictor for in-hospital mortality than the CCI in 7,201,900 patients with acute coronary syndrome (Elixhauser c = 0.837 vs. CCI c = 0.822). However, over longer intervals the CCI and Elixhauser performed equally in predicting 30-day mortality among ICU patients (c = 0.65 for both) [233].
Re-Weighted “Charlson Comorbidity Indices”
Some investigators, such as the SEER program by the National Cancer Institute (NCI) Medicare team, used all the CCI codes and weights related to non-cancer comorbidities, thereby preserving the integrity of the CCI and its comparability across applications [234]. Other investigators have modified the original structure of the CCI, keeping some conditions, dropping some and adding other conditions, and re-weighting some or all conditions for their specific study, population, and outcome, often confusingly referring to it as the “Charlson Comorbidity Index.” Tables 2 and 3 show the specific conditions and weights of these re-weighted scales, which were selected because they received more than 200 citations; other less frequently cited re-weighted scales are described in Appendix 2.
Quan et al. [235] evaluated 1-year mortality in more than 55,000 patients admitted to hospital, comparing a re-weighted scale named the “Updated Charlson Comorbidity Index” and the CCI, and found a c statistic virtually identical for 1-year mortality (Quan c = 0.896 vs. Charlson c = 0.894). The Quan “Updated Charlson Comorbidity Index” dropped all cardiovascular disease, except CHF, from the model [235]. Quan et al. [235] also compared the “Quan Updated Charlson” and the CCI in predicting in-hospital mortality in 6,847,599 subjects in 6 countries, again showing extremely small differences in the c statistic, half favoring the CCI: Switzerland (Quan c = 0.869; CCI c = 0.876); France (Quan c = 0.878; CCI c = 0.882); New Zealand (Quan c = 0.831; CCI c = 0.836); Japan (Quan c = 0.727; CCI c = 0.723); Canada (Quan c = 0.828; CCI c = 0.825), and Australia (Quan c = 0.825; CCI c = 0.808).
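The six-country comparison can be tabulated directly from the c statistics quoted above. The short sketch below uses only those figures; the variable names are illustrative, and the calculation is a simple difference rather than a formal statistical test.

```python
# In-hospital mortality c statistics quoted in the text from [235]:
# country -> (Quan "Updated Charlson" c, original CCI c)
c_stats = {
    "Switzerland": (0.869, 0.876),
    "France": (0.878, 0.882),
    "New Zealand": (0.831, 0.836),
    "Japan": (0.727, 0.723),
    "Canada": (0.828, 0.825),
    "Australia": (0.825, 0.808),
}

# Positive difference: Quan version higher; negative: original CCI higher.
diffs = {
    country: round(quan - cci, 3)
    for country, (quan, cci) in c_stats.items()
}

favoring_cci = [country for country, d in diffs.items() if d < 0]
favoring_quan = [country for country, d in diffs.items() if d > 0]
largest_gap = max(abs(d) for d in diffs.values())

for country, d in diffs.items():
    print(f"{country:12s} Quan - CCI = {d:+.3f}")
print("CCI higher in:", favoring_cci)
print("Quan higher in:", favoring_quan)
```

Three countries favor each version, and the largest absolute gap is the Australian one; whether any of these differences is statistically meaningful cannot be judged from the point estimates alone.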
Klabunde et al. [236] evaluated 14,943 patients with breast cancer and 28,868 with prostate cancer through Medicare claims data to develop two new “Charlson” indices (one for breast cancer and one for prostate cancer patients) in order to predict 2-year non-cancer mortality using completely different weightings for each type of cancer, as well as hazard ratios calculated separately for inpatient and physician claims – all validated in a split half sample. There was no comparison to the CCI [236].
Ghali et al. [34] created a new comorbidity index with 7 conditions from the CCI (mainly cardiovascular) to predict in-hospital deaths in 6,326 patients who underwent CABG surgery and validated it in 6,791 patients in 1993. They found that their re-weighted index had only a slightly better performance than the CCI (CCI c = 0.70 and Ghali c = 0.74). Chaudhury et al. [58] developed study-specific weights for 1-year mortality in 7,761 patients admitted to a medical service over 4 years and included age, gender, race, and a diagnosis-related group (DRG) from admission, and the analysis resulted in the incorporation of only 4 of the 17 conditions cited by Deyo. The CCI and the study-specific index had identical prediction for 1-year mortality (c = 0.73 for both). No validation study was done [58].
Schneeweiss et al. [40] evaluated several ICD-9-based versions of the CCI (Deyo, Romano, D’Hoore, Ghali) as predictors of 1-year mortality in 141,161 adults aged over 65 years who had received angiotensin-converting enzyme inhibitors or calcium channel blockers in British Columbia and found that the Deyo and Romano versions worked similarly (c = 0.768 and c = 0.771), but the Ghali and D’Hoore versions performed less well. The authors pointed out that the Ghali Index performed better in the original CABG study, but Ghali tailored the weights for the CABG populations, leading to loss of predictive utility when applied to different populations [40]. Schneeweiss et al. [237] then evaluated 235,881 Medicare enrollees who had prescription coverage, comparing the Romano ICD-9-based version of the CCI with their new version with elderly-specific weights, which re-weighted 10 of the conditions in the index, as predictors of 1-year mortality. They also compared the Chronic Disease Score [238], the Romano CCI, and their new version in a validation population of 230,913 Medicare patients with pharmacy plans. They found that the Romano version outperformed the Chronic Disease Score and that the Romano CCI version (c = 0.757; 95% CI 0.754–0.761) and their new elderly-specific version (c = 0.765; 95% CI 0.762–0.769) were similar in predicting 1-year mortality. They also added the number of distinct prescription drugs given in the baseline year as a predictor and showed minimal improvement in prediction [237]. Other examples of re-weighted CCI evaluated on smaller numbers of patients are described in Appendix 2 and Table 3.
A variety of systems, which use a wide range of predictors including chronic conditions to predict outcomes (some of which are called “comorbidity” systems), are discussed in detail in Appendix 3. Appendix 4 describes systems that are primarily designed to predict functional outcomes but are also sometimes referred to as comorbidity systems.
Discussion
This paper has focused on the clinimetric properties of the original version of the CCI [14], which was designed specifically to predict long-term mortality. The inter-rater reliability of the CCI was found to be excellent, with extremely high agreement between self-report and medical charts. There is some underreporting of comorbidity on discharge summaries and claims data, which tend to focus on the conditions leading to hospitalization. The more years of data (i.e., the longer the look-back interval) and the more sources of data (i.e., inpatient and outpatient), the more robust the CCI predictive value.
The CCI has been shown in several studies either to correlate with other indices, such as the ICED, Kaplan-Feinstein, the DxCG, the American Society of Anesthesiologists (ASA) classification and Elixhauser, or to result in concordant predictions. Clinimetric sensitivity has been demonstrated repeatedly – with a stepwise increase in the CCI, there are stepwise increases in mortality. Importantly, patients who have increased CCI over time have increased mortality rates: the larger the increase in CCI, the larger the increase in mortality. This has been found in adults hospitalized for many reasons, homeless adults, cancer patients, patients discharged from the emergency department, diabetes patients, dialysis patients, and ICU patients after discharge. Similar increases occur with in-hospital mortality: the higher the CCI, the higher the mortality. The incremental validity of the CCI has been shown in different populations; adding the CCI to other measures (e.g., SOFA, GRACE, APACHE) increases the overall predictive accuracy.
From a prognostic viewpoint, findings indicate that the use of the Charlson Comorbidity indices, particularly of the original [239] and revised versions covering the ICD-9 codes [16] and ICD-10 [19], all were accurate predictors of long-term mortality. The predictive validity of the CCI with regard to long-term mortality has been documented in thousands of studies involving millions of patients: hospitalized patients; elderly patients; trauma, surgery, and emergency patients; all types of medical patients, including cancer patients. The CCI also predicts in-hospital mortality in diverse populations such as CHF, arrhythmias, acute coronary syndrome, stroke, COPD, SLE, COVID, cancer, and dementia, as well as patients after CABG, non-cardiac surgery, and hip fractures. However, systems designed to predict mortality in specific circumstances, such as trauma mortality where the TRISS system is focused and the ICU mortality on which APACHE is focused, usually work better for those specific outcomes, but in some instances the CCI adds to the prediction.
From a diagnostic point of view, the evaluation of pre-therapeutic comorbidity is a crucial aspect considering that, as emphasized by Feinstein [1], “a co-morbid ailment may produce manifestations that simulate those of the index disease, so that the exact pre-therapeutic state of the index disease may be difficult to identify.” The presence of comorbidity may delay the correct diagnosis and influence treatment decisions [240]. The CCI [14] and one ICD-10 version [19], in view of their clinimetric sensitivity, appear to be clinically useful not only to provide a valid assessment of the patient’s unique clinical situation, but also to demarcate major diagnostic and prognostic differences among subgroups of patients who otherwise seem deceptively similar, because they have the same diagnosis [11].
As Feinstein [1, p. 455] noted: “To compare different modes of therapy, clinicians usually assemble groups of patients in whom the results of the treatments are then observed. For the comparison to be scientifically valid, the groups of patients must be initially comparable – they must have enough resemblance, before treatment, for their outcomes to be similar if treatment was not given. Without this pre-therapeutic similarity in the groups of patients, the different treatments cannot be properly evaluated.” The evaluation of pre-therapeutic comorbidity is therefore a crucial methodological aspect, which applies not only to research studies (i.e., randomized controlled trials) but also to daily clinical practice to capture the specific comorbid disease combinations affecting the “individual” patient [241].
This review did not focus on “multimorbidity,” which has been defined in many different ways, including risk factors for medical conditions [242-245]. Some reviews define the CCI as a measure of multimorbidity [245, 246]. One definition is two or more illnesses without identifying an index disease [240]. In the majority of studies on multimorbidity, the criteria for the selection of comorbid conditions were not provided, with substantial variability across studies [244]. Further, this review did not focus on other outcomes such as treatment complications, readmissions, length of stay, or cost, none of which were part of the intended outcomes for the comorbidity index.
There are a number of different reviews on the CCI, focusing on ICD-9, ICD-10, and re-weighted versions [175, 247]. One focused on the clinimetric properties of the CCI [50] and another compared the CCI and Elixhauser [51]. Yurkovich et al. [248] reviewed indices derived from administrative data finding that diagnosis-based indices were better predictors than medication-based indices of outcomes; the review included two re-weighted CCI scales: Ghali and Quan [248]. Sharabiani et al. [249] reviewed three re-weighted versions of the CCI.
A substantial body of evidence demonstrates that the CCI is a valid and widely used measure to predict the risk of mortality. These clinimetric indices not only can be applied to predict clinical outcomes but also are highly sensitive screening instruments yielding important diagnostic and prognostic information.
Conclusions
In clinical research and practice, the time has come to focus on clinimetric indices to provide a timely and comprehensive assessment of comorbidity. The findings of this critical review indicate that the CCI is a reliable, highly sensitive, and valid index according to current clinimetric criteria [250]. Adding the Charlson Comorbidity Indices to standard diagnostic criteria may provide an innovative assessment approach that can be applied in a variety of medical settings to enable the early identification of a constellation of symptoms and syndromes in the individual patient, to improve the prognostic estimation of health risks, and also to provide a better prediction of clinical outcomes.
Appendix 1
Systems and Severity Measures Focused on Predicting In-Hospital Mortality
Systems such as MEDIS Groups assessed 260 key clinical findings at least twice during hospitalization to create groups that assessed risk of organ failure in 54,142 inpatients and showed that in-hospital death rose across the final five groups [224].
Gonnella et al. [225] took 420 diagnoses, coded them from discharge abstracts by system, etiology, and severity, and used the stages to predict length of hospital stay in 392,456 hospitalized patients. Naessens et al. [251] revised this Disease Staging System into a new scale with 16 body systems that had an increased risk of complications (rated 2 or more by Gonnella et al. [225]) and showed a relation to mortality after hospitalization.
APACHE, one of the most widely adopted systems for predicting in-hospital death for critically ill ICU patients, was heavily based on 17 acute physiologic values at admission weighted for total scores from 0 to 252 and predicted in-hospital mortality [226, 252].
Poses et al. [253] used only discharge diagnoses to code the Deyo CCI and compared it to the acute physiologic component of APACHE and to the Chronic Health Points of APACHE. They found that the comorbidity and APACHE predicted death in 201 ICU patients, but not the chronic component of APACHE [253].
Iezzoni et al. [254] compared MEDIS Groups, APACHE, and Gonnella’s Disease Staging as predictors of in-hospital death in 11,880 patients from 100 hospitals in the MEDIS Groups database using data from the discharge abstracts (including such conditions as cardiac arrest, which vastly complicates interpretation); however, those which focused on acute instability in the hospital were better predictors of in-hospital mortality [254].
The patient management category (PMC) severity scale, a seven-level ordinal scale, is based on the severity rating of 830 patient management categories developed by consensus, each rated from 1 to 4 for severity [255]; it was shown to predict in-hospital mortality in 2.3 million hospitalized patients in two different regions [255].
The TRISS (Trauma and Injury Severity Score) predicted death after immediate trauma in 2,819 patients, and adding the CCI did not improve the prediction of trauma-related mortality [227].
The comorbidity index for obstetric patients identified 20 conditions that increased obstetric risk, including pre-eclampsia, previous caesarian section, gestational diabetes or hypertension, and multiple gestations; it performed better than the CCI in predicting maternal mortality and morbidity (Bateman c = 0.617 vs. CCI c = 0.578) [256].
Incalzi et al. [257] evaluated the risk of in-hospital death in 500 geriatric patients and identified 52 conditions which increased risk, both acute (i.e., sepsis) and chronic (i.e., cancer), and showed in a validation cohort of 375 that death was predicted by malnutrition, activities of daily living (ADL), lymphocytopenia, and their own new index.
van Walraven et al. [258] reweighted the Elixhauser Index using 228,565 patients admitted in Ottawa, assigning weights from –7 to 12 to the remaining 21 conditions, to predict in-hospital mortality. van Walraven et al. [258] compared their version of the Elixhauser to the Schneeweiss weighted version of the CCI [237] and in a validation cohort of 117,230 admitted patients compared the van Walraven (c = 0.763; 95% CI 0.757–0.769) to the Schneeweiss (c = 0.742; 95% CI 0.736–0.748) [258].
Gagne et al. [259] created a “Combined Comorbidity Index” using the 17 conditions from the CCI and the 30 from van Walraven’s version of the Elixhauser Index to develop weights from logistic regression coefficients for 120,679 Medicare subjects enrolled in a pharmacy coverage program and a validation cohort of 123,955 Medicare subjects enrolled in New Jersey. They selected a total of 20 conditions, finding that the c statistic for the Romano CCI for 1-year mortality was c = 0.778 (95% CI 0.776–0.780); van Walraven/Elixhauser (c = 0.772, 95% CI 0.770–0.775) and their new Combined Comorbidity Index (c = 0.788, 95% CI 0.786–0.781) [259]. There was no validation study.
Sharma et al. [260] analyzed the Quan (“updated Charlson”) Index with 12 conditions compared to the van Walraven-Elixhauser Index versus a newly defined Swiss model as predictors of in-hospital mortality in 6.09 million adults admitted to Swiss hospitals over a 6-year period, showing that the newly derived model performed best. There are no validation data.
Overall there are differences in systems that use only discharge data versus those that capture other data [261].
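The comparisons above are expressed as c statistics. As a minimal sketch of what that number measures, and not the method used in any of the cited studies, the c statistic can be computed as the proportion of (death, survivor) pairs in which the patient who died had the higher index score, with ties counted as one half. The function name `c_statistic` and the toy data are assumptions made for illustration only.

```python
def c_statistic(scores, died):
    """Proportion of (death, survivor) pairs ranked correctly; ties count 0.5."""
    events = [s for s, d in zip(scores, died) if d]
    non_events = [s for s, d in zip(scores, died) if not d]
    if not events or not non_events:
        raise ValueError("need at least one death and one survivor")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0   # death scored higher than survivor
            elif e == n:
                concordant += 0.5   # tied scores count as half
    return concordant / (len(events) * len(non_events))


# Hypothetical comorbidity scores and outcomes, for illustration only.
toy_scores = [0, 1, 2, 3, 5]
toy_died = [False, False, True, False, True]
toy_c = c_statistic(toy_scores, toy_died)
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is why differences in the second decimal place between two indices are usually described as small.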
Appendix 2
Other Examples of Re-Weighted CCI
Armitage et al. [262] developed a Royal College of Surgeons Charlson Score by translating the CCI into ICD-10 using 3-digit codes, dropping 2 conditions, and counting up the number of 14 other conditions; the result was an ICD-10 translation that contained far fewer ICD-10 codes than other adaptations, used to predict in-hospital and 1-year mortality in four separate surgical procedures. It was not compared to any other measures, nor was it validated in a separate population [262].
Bottle and Aylin [263] re-weighted the ICD-10 CCI for 5.4 million patients in the UK, scoring the index by calculating the regression coefficient for each of the conditions and dividing it by the smallest regression coefficient in the model. This created four new weighted indices using the Charlson conditions, with four separate weights for each condition (e.g., severe liver disease was weighted 11 in one model, 9 in another, 27 in another, and 5 in another). They found a small difference in the c statistic (CCI c = 0.719 and Bottle c = 0.726) for predicting in-hospital mortality [263]. These findings were not validated in a separate population.
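The re-weighting step described above (each regression coefficient divided by the smallest coefficient in the model) can be sketched as follows. This is an illustrative sketch only: the function name `integer_weights`, the use of the smallest positive coefficient as the divisor, the rounding to whole numbers, and the coefficient values are all assumptions, not details taken from the cited study.

```python
def integer_weights(coefficients):
    """Divide each coefficient by the smallest positive one, then round.

    Rounding to integers and using the smallest *positive* coefficient
    are assumptions made for this sketch.
    """
    positive = [b for b in coefficients.values() if b > 0]
    if not positive:
        raise ValueError("need at least one positive coefficient")
    smallest = min(positive)
    return {name: round(b / smallest) for name, b in coefficients.items()}


# Hypothetical log-odds coefficients, for illustration only.
example_coefficients = {
    "condition_a": 0.25,
    "condition_b": 0.60,
    "condition_c": 1.30,
}
example_weights = integer_weights(example_coefficients)
```

Because the weights depend on the smallest coefficient in each fitted model, the same condition can receive very different weights across models, which is consistent with the spread of weights reported for severe liver disease.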
Bravo et al. [264] created a new weighted comorbidity index to predict mortality after 3 years among 291 long-term care patients, by re-weighting 5 conditions and adding 2 more conditions, and showed that the disease-specific index had higher predictive accuracy (c = 0.86) than the CCI (c = 0.79). This long-term care mortality index was not validated in a separate population.
Desai et al. [265] developed another scale focused on conditions identified as high risk for elderly patients in 524 patients. Their resultant High Risk in Elderly Scale was validated in a population of 852 and had a c statistic for 1-year mortality of 0.69 versus 0.65 for the Deyo/Charlson [265]. The Cox model showed identical relative risks (1.9, 95% CI 1.5–2.1) [265]. There is no separate validation.
Kusumastuti et al. [266] focused on 36,751 community-dwelling elderly using 7 re-weighted conditions in the CCI, predicting 1- and 3-year mortality in relation to frailty and frailty phenotype, and showed limited added value of their new estimate of comorbidity in predicting 1- and 3-year mortality. These findings were not evaluated in a separate population.
Reid et al. [267] created a disease-specific comorbidity index in 9,386 head and neck cancer patients (with 4 conditions from the CCI and conditions which were complications, like pneumonia, urinary tract infection, and electrolyte imbalance) in predicting 5-year mortality and found that the CCI and their new head and neck index had almost identical relative risks for survival: 1.5 and 1.53, respectively. The disease-specific index was not validated in a separate population.
Volk et al. [268] developed a “modified Charlson” index to predict 4-year mortality in 624 patients after liver transplant using 9 re-weighted conditions and showed increased mortality with one or more conditions from the “modified” index; this was not validated in a separate population.
Martins et al. [269] developed a new study-specific index to predict in-hospital death among 54,680 patients admitted over a 2-year period for respiratory illnesses, with 8 of the original 19 conditions re-weighted and added to another 13 conditions including symptoms to predict in-hospital death. It was validated in a separate population of 14,622 patients with respiratory illnesses (CCI original; c = 0.721; 95% CI 0.701–0.740) versus Martins (c = 0.755, 95% CI 0.730–0.774) [269].
Baldwin et al. [270] tested a colon cancer-specific Klabunde Index (no published weights) versus the Elixhauser to predict 2-year non-cancer mortality in colon cancer patients and found that neither measure was a better predictor of short-term mortality; there was no validation in a separate population.
Toson et al. [271] conducted a study of hip fracture patients that compared the CCI [19] vs. Quan Charlson [235] and showed puzzling differences in the rates of dementia (4 times higher with Quan): it was found that the CCI outperformed Quan for in-hospital mortality (c = 0.734 vs. 0.720) and Quan did slightly better for 1-year mortality (0.071 vs. 0.690) [271]. There is no separate validation.
Appendix 3
Other Systems That Have Been Set Up to Evaluate Comorbidity to Predict Survival
Most are called comorbidity indices, and many do not have validation populations [2].
Cardiac arrest: Hallstrom et al. [272] created a comorbidity assessment to predict survival after out-of-hospital ventricular fibrillation using 10 chronic conditions and 6 recent symptoms that occurred in 282 patients pre-cardiac arrest. There was no validation (111 citations).
Community dwelling elderly: Cornoni-Huntley [273, 274] created a comorbidity assessment from five specific conditions (coronary artery disease, cerebrovascular disease, cancer, hypertension, and diabetes) and used the presence of two or more of the conditions as a predictor of 6-year mortality in 4,126 community dwelling elderly. There was no validation (9,236 citations).
Breast cancer: Satariano et al. [275] totaled the number of seven specific conditions (i.e., “comorbidity” such as diabetes, gallbladder disease, cancer) to estimate the impact on 3-year survival in 963 women with breast cancer. There was no validation (823 citations).
Dialysis/transplant: Khan et al. [276] developed a 3-level scale focused on 6 specific chronic conditions (i.e., diabetes, cancer, COPD) to predict 2-year survival in 375 dialysis or transplant patients. There was no validation (320 citations).
End-stage renal disease: Davies et al. [277] picked seven disease areas (i.e., rheumatic disease, left ventricular dysfunction) in a three-level scale to predict 5-year survival in 303 peritoneal dialysis patients. There was no validation (403 citations).
End-stage renal disease: van Manen et al. [278] showed that the Deyo version was a better predictor of 2-year survival in end-stage renal disease patients than Davies or Khan; van Manen et al. [278] created their own disease-specific index, which was comprised of the β-coefficients for diseases that predicted 2-year survival in their cohort; their index had a c value of 0.75 like Deyo (c = 0.74) [278] (212 citations).
Dialysis: Liu et al. [279] tested their new comorbidity dialysis-specific index in 244,651 patients with four conditions related to renal disease and 11 other conditions all weighted to predict 3-year survival; they found that the performance in the validation study was virtually identical to the CCI (dialysis index c = 0.6698; CCI c = 0.6623) [279] (275 citations).
Adults: Rius et al. [280] selected 16 conditions (e.g., cataract, skin conditions, allergy, diabetes) and weighted them on a four-point scale differently for 6,641 men and women over 40 years old in Catalonia to predict 5-year survival in 6,600 adults. There was no validation (53 citations).
Lung cancer: Colinet et al. [281] created a 6 item weighted measure from an initial group of 735 with non-small cell cancer (i.e., tobacco = 7; diabetes = 5; and cancer = 1) and validated it as a predictor of 1-year survival in 136 patients with non-small cell cancer (190 citations).
Hospitalized patients: Sessler et al. [282] created a Risk Stratification Index, which used between 184 and 1,096 of the 16,000 ICD-9 diagnoses and 4,500 ICD-9 procedure codes to predict in-hospital mortality, 1-year mortality, and length of stay. They evaluated 35,179,507 Medicare provider analysis and review (MEDPAR) patient stay records split into a development and a validation data set, and developed tables to calculate the risks for each of the outcomes for specific individuals in studies [282]. The new index had “almost perfect” prediction of in-hospital mortality (c = 0.98) compared with the CCI (c = 0.65), which is not surprising since it was calculated from discharge diagnoses. With 1-year mortality, the new index had a slightly better c statistic (c = 0.83) than the CCI (c = 0.77) [282] (105 citations).
Holman et al. [283] developed the Multipurpose Australian Comorbidity Scoring System using 102 conditions from ICD-9 (i.e., gout and cataract) that were associated with an increased risk of 1-year mortality in 1,118,989 patients admitted for medical, procedural, or psychiatric reasons. They then tested it in five smaller groups of patients from the same cohort admitted for either asthma, MI, mastectomy, TURP, or psychiatric reasons. The c statistics for CCI versus their scoring system were quite similar for 1-year mortality in the five groups (e.g., asthma, c = 0.88 CCI and MACSS, c = 0.9) [283] but the multipurpose Australian comorbidity scoring system (MACSS) did better in predicting length of stay and 30-day re-admission rates (130 citations).
Older adults: Newman et al. [284] created a Physiologic Index of Comorbidity from carotid ultrasound, pulmonary function tests, brain magnetic resonance scans, serum cystatin-c and fasting glucose to predict 9-year mortality in 2,928 subjects enrolled in the Cardiovascular Health Study. However, the physiological index (c = 0.735) predicted mortality only slightly better than age, race, and gender (c = 0.726). There was no validation (120 citations).
Veterans: among primary care patients at Veterans Affairs centers, the Seattle Index of Comorbidity was developed to predict 2-year mortality from 24 conditions in 5,469 patients resulting in a scale containing age, six chronic conditions (cancer, CHF, diabetes, stroke, lung disease, and prior MI), one acute condition (pneumonia), and 2 variables for smoking (past and current) [285]. Validated in 5,478 patients, it predicted 2-year mortality with an explanatory power identical to the PCS and MCS in the SF-36 [285] (173 citations).
In 1,741 Australian veterans Byles et al. [286] tested an index of 25 conditions, including hearing problems, cancer, and “fits, faints, and funny turns,” rated by the patients according to severity (from 1 to 7) and then developed independently weighted scales to predict mortality and hospital admission from the derivation data (n = 869). They found that the scales did not predict both mortality and admission in the validation data set (n = 434), with and without adjustment for baseline quality of life (119 citations).
Residents of Ontario, Canada: Austin et al. [287] calculated Adjusted Clinical Groups (ACGs) after all ICD-9 or ICD-10 codes were clustered into 32 aggregated diagnostic groups (ADGs), based on each specific condition’s severity, duration, etiology, need for specialty care, and diagnostic certainty. Each person may have between 0 and 32 ADGs. The 32 ADGs are collapsed into 12 Collapsed ADGs, and then into 102 ACGs, which group patients with similar resource consumption. A total of 10,498,413 adults in Ontario were divided into a derivation and validation data set to predict 1-year mortality [287]. The final ADG model had a c statistic for 1 year of 0.917 compared to 0.906 for Charlson in the validation cohort [287] (261 citations).
Diabetes: Austin et al. [288] had similar findings with incident and prevalent diabetes showing that in 1,226,146 patients the ADG had a slightly higher c statistic (i.e., ADG c = 0.838 vs. CCI c = 0.827) in the validation sample in the larger prevalent population (21 citations).
Rheumatic disease: England et al. [289] evaluated an index designed to predict functional outcomes and 1-year mortality specifically developed in 4,765 patients with rheumatic disease consisting of 8 conditions all given a weight of two. There was no validation (139 citations).
Australian women: In 5,217 older Australian women, Tooth et al. [290] identified 7 out of 19 separate conditions, including heart disease, stroke, and low iron, each with their own weights to develop thirteen separate scales to predict mortality, physician visits, specialty visits, hospitalizations, ADL, and each of the 8 individual SF-36 subscales. They found that none of the scales predicted all of the outcomes, but their 7 conditions predicted mortality [290]. There was no validation (106 citations).
Prostate cancer: Fleming et al. [291] developed a new comorbidity index in 2,931 black men with prostate cancer to predict 5-year all-cause mortality using 20 conditions and their two-, three-, and four-way interactions. They selected the model with two-way interactions, but it had limited clarity and performance in predicting outcomes [291]. There was no validation (31 citations).
Hematopoietic cell transplant: another new comorbidity index was created for 1,055 allogeneic hematopoietic cell transplant patients, divided into a training and validation data set, using laboratory abnormalities, 8 new conditions, and re-weighted and combined 10 of the conditions in the CCI to predict 4-year survival [292, 293]; this index showed a higher c value than the CCI [292, 293]. Others created a comorbidity index for such transplants from ferritin, albumin, and platelet counts [294] (2,241 citations).
Medication-based indices
Von Korff et al. [238, 295] developed a Chronic Disease Scale from 27 different weighted classes of pharmaceuticals and validated it for its ability to predict 1-year mortality in a separate population of 122,911 adults in Group Health (654 citations).
George et al. [296] developed a Medication-Based Disease Burden Index from a consensus panel-weighted list of multiple different medications used to treat 20 chronic diseases, in patients enrolled in a randomized clinical trial of community pharmacy service, to predict death or readmission, which occurred in 24% of 317 patients within 12 weeks. There was no validation study. Prediction was greater than the Chronic Disease Score, but not as good as the CCI [296] (57 citations).
Appendix 4
Systems Described as Systems to Classify Comorbidity That Focused Primarily on Predicting Functional Outcomes
These systems were developed from assessing body systems (e.g., neurologic, cardiovascular) or the presence of specific diagnoses/conditions to predict functional outcomes, but are sometimes referred to as comorbidity measures.
To estimate the degree of physical impairment, Linn et al. [297, 298] developed the Cumulative Illness Rating Scale, which assessed 13 organ systems, weighted 0–4 for severity. However, the Cumulative Illness Rating Scale did not correlate with ADL, Instrumental Activities of Daily Living (IADL), or Eastern Cooperative Oncology Group physical function in 203 geriatric patients [299], nor with functional disability in 439 geriatric institutionalized residents [300].
The burden of disease, based on the symptoms, complications, treatment complexity, procedures, hospitalizations, and emergency room visits, was assessed by reviewers who ascertained the presence and severity of 59 chronic or acute medical conditions in 194 long-stay nursing home patients; it showed a low but significant correlation with ADL (r = 0.21) in predicting instrumental activities of daily living, and no relation with perceived health status [301].
Liu et al. [302] developed a “comorbidity” measure in 106 stroke patients admitted for in-hospital rehabilitation, which rated 130 conditions (including eczema and gallstones) from 0 to 5 based on the need for rehabilitation, and found it predicted functional impairment, as did the CCI. It was later revised in 175 stroke patients to 41 conditions leading to a 6-point scale reflecting limitations in activity and shown to predict functional impairment [303].
Groll et al. [304] developed a Functional Comorbidity Index with 7 of the conditions in the Charlson Index and 11 other conditions to predict 1-year physical function in 73 acute respiratory distress survivors and showed it predicted SF-36 physical function. Another study found that neither the Functional Comorbidity Index nor Charlson were predictors of functional status at the time of admission to acute rehabilitation for 105,441 Medicare patients who had either a stroke, joint replacement, or lower extremity fracture [202]. Another study had a similar finding that CCI did not predict SF-36 PCS scores 1 year after total joint replacement [305].
Tessier et al. [306] evaluated Groll’s Functional Comorbidity Index, the CCI, and their own stroke-specific index to predict functional recovery using a new measure that combined IADL and SF-36 physical function outcomes of stroke in 437 patients after discharge and 235 patients at 3 months, and found that the CCI had a c statistic of 0.763 at discharge and 0.714 after 3 months, greater than or equal to the other measures.
The ICED focused on ratings of severity of 14 categories of co-existent conditions, and the degree of physical impairment, which were combined into a four-tier scale that was a significant predictor of 1-year functional outcomes (i.e., IADL) [307]. The reliability of the index was moderate: agreement on the functional severity component was higher than on the disease severity component, for which agreement was low (κ = 0.4) [308]. The ICED predicted the health status in 55 patients with angina [309]. The ICED was used in several studies to predict 1-year mortality in dialysis patients [142] and long-term mortality in prostate cancer patients [80].
The Duke Severity of Illness checklist records each of 803 outpatient diagnoses (from sprains to heart disease) and rates them by symptoms and complications in the last week, treatability, and 6-month prognosis [310]. Severity of illness was related negatively to physical and emotional function [311].
The Geriatric Index of Comorbidity took the 15 conditions in the ICED and weighted them 0–4 for severity, and then re-grouped them into a 4-rank scale that was associated with disability [312]. The Geriatric Index of Comorbidity showed that by 1 year after hospital discharge, of 444 elderly, 31% were institutionalized and 22% had died. The Geriatric scale was divided into classes and compared to quartiles of other measures with p values calculated for each rank, complicating interpretation [313].
Crabtree et al. [314] developed a comorbidity symptom scale in 50 ophthalmologic patients, selecting 23 symptoms that might arise from chronic conditions (e.g., cough, urinary problems), each coded 0–5 for severity; tested in 183 patients aged over 65 years, it showed a correlation of 0.54 with activities of daily living.
In 157 Health Maintenance Organization (HMO) members, Bayliss et al. [315] assessed 25 conditions identified from the literature, weighted from 1 to 5 according to the patient’s assessment of the extent to which each condition interfered with their daily activities. They showed that the estimates of how all conditions interfered with function correlated with health status and physical function as measured by the SF-36 [315].
Verbrugge et al. [316] evaluated how the total number of chronic conditions, 13 specific conditions, and 88 pairs of conditions impacted physical disability as assessed by ADL and IADL, and found that the more conditions, the more disability.
Conflict of Interest Statement
M.E.C. reports the following grants: PCORI (IHS-2017C3-8923), NHLBI (T32 HL135465-01A), and NIMHD (5T37MD014220-02) outside of the submitted work, and Cornell University has filed a patent for the use of the enhanced comorbidity index to predict future costs. All other authors have no conflicts of interest to declare.
Funding Sources
There are no funding sources to declare.
Author Contributions
All the authors conceived the work. M.E.C., D.C., J.G., and C.P. searched, screened, and selected studies. All the authors drafted and finalized the paper.