The exponential growth of electronic medical records has catalyzed the expansion of real-world data (RWD), encompassing a wide array of patient health information routinely collected from diverse sources [1]. These data extend beyond traditional electronic medical records to include laboratory results, external hospital records, outpatient and inpatient treatments, billing information, and patient-generated data from mobile devices. Notably, a large proportion of these data are now collected in real time. Projections indicate that the global datasphere will expand to 175 zettabytes by 2025, with nearly 30% requiring real-time processing [2].
Analyzing this volume of data requires substantial computational power and analytical methods suited to it, a role increasingly filled by machine learning (ML) algorithms [3]. As a branch of artificial intelligence, ML uses algorithms and statistical models to identify patterns that can enhance diagnostic accuracy, predict clinical events, and improve therapeutic efficacy. This approach can uncover insights that may elude conventional statistical methods [4], making the application of ML techniques to predict clinical outcomes, such as mortality, a frontier in medical research.
The study by Shi et al. [5] exemplifies this cutting-edge approach by utilizing the MIMIC-IV critical care database, an open-access database containing detailed, anonymized medical records of patients hospitalized in intensive care units [6]. The researchers applied ML models to predict 28-day all-cause mortality in heart failure (HF) patients with Clostridioides difficile infection (CDI) admitted to intensive care.
CDI is a severe condition with high mortality, particularly when associated with comorbidities such as HF, stroke, and pneumonia [7, 8]. HF patients are especially vulnerable to CDI due to increased risk of nosocomial infections, higher comorbidity prevalence, frequent readmissions, prolonged hospital stays, and higher overall mortality rates [9‒11].
Given the significant impact of CDI on HF patients, there is an urgent need to identify factors that predict infection severity and mortality risk. This is where healthcare big-data analysis generates what is termed real-world evidence (RWE) [12], and the implementation of ML techniques appears to yield better results [13].
RWE studies have explored various health aspects, including epidemiology, disease burden, treatment patterns, safety, outcomes, and patient-reported measures like satisfaction and quality of life. For rare conditions where traditional randomized clinical trials may be unfeasible, RWD and RWE play crucial roles [12, 14].
In their study, Shi et al. [5] employed recursive feature elimination with cross-validation to select 18 predictor variables and applied nine different ML algorithms to construct predictive models. This comprehensive analysis identified novel predictor variables, including red blood cell distribution width, blood urea nitrogen, and Simplified Acute Physiology Score II.
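To illustrate what recursive feature elimination with cross-validation involves, the following is a minimal sketch using scikit-learn's RFECV. It is not the pipeline of Shi et al. [5]: the synthetic dataset, the logistic regression estimator, the five-fold split, and the AUC scoring metric are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Hypothetical stand-in for a clinical table: 300 patients, 20 candidate
# predictors, and an imbalanced binary outcome (about 20% events).
X, y = make_classification(
    n_samples=300,
    n_features=20,
    n_informative=5,
    n_redundant=2,
    weights=[0.8, 0.2],
    random_state=0,
)

# Repeatedly drop the weakest feature (step=1) and keep the feature count
# that gives the best cross-validated AUC.
selector = RFECV(
    estimator=LogisticRegression(max_iter=1000),  # assumed base model
    step=1,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="roc_auc",
    min_features_to_select=1,
)
selector.fit(X, y)

# Column indices of the predictors retained by the procedure.
selected = [i for i, keep in enumerate(selector.support_) if keep]
print(f"{selector.n_features_} features retained: {selected}")
```

The retained columns could then be passed to each candidate algorithm in turn, so that all models are compared on the same predictor set.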
The use of RWD, combined with the comparison of different ML algorithms, strengthens the robustness of the results. Moreover, techniques such as the synthetic minority oversampling technique (SMOTE) help reduce bias arising from imbalanced datasets [15].
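The idea behind synthetic minority oversampling can be sketched in a few lines: new minority-class samples are created by interpolating between an existing minority sample and one of its nearest minority neighbours. The function name, the toy data, and the parameter defaults below are hypothetical; this is a simplified illustration of the principle, not the reference implementation or the code used in the study discussed here.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: two scaled features for four minority-class patients.
minority = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.4), (0.9, 2.1)]
new_points = smote_sketch(minority, n_new=6)
print(new_points)
```

In practice, oversampling of this kind is applied to the training data only, so that synthetic points do not leak into the data used to evaluate the model.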
The rapid growth of this approach in medical research is noteworthy. In the past year alone, over 28,000 studies using ML were published, more than 8,000 employed RWD, and over 3,000 combined both strategies. This surge in RWE-based research reflects its growing recognition as a complement to randomized studies, potentially providing real-time, personalized, population-based medicine at lower costs [16].
However, this strategy is not without limitations. These analyses are retrospective and potentially biased, often span several years of data collection, and may have limited external validity. Despite these challenges, the combination of large-scale RWD with robust ML-based statistical analysis represents a promising frontier in medical research.
This manuscript demonstrates the potential of the RWE approach to improve clinical outcomes, particularly in complex scenarios such as patients with HF and CDI. As we move forward, it will be crucial to refine these methodologies, validate findings across diverse populations, and integrate insights gained from ML and RWE into clinical practice.
Conflict of Interest Statement
The author has no conflicts of interest to declare.
Funding Sources
The author has not received any financial support.
Author Contributions
Nicolas Vecchio: writing – original draft, review, and editing. Nicolas Vecchio was responsible for drafting the original version of the manuscript and for revising and editing it, incorporating feedback, and making necessary improvements.
Additional Information
Nicolas Vecchio: Member of the Digital Health Council of the Argentine Society of Cardiology.