The exponential growth of electronic medical records has catalyzed the expansion of real-world data (RWD), encompassing a wide array of patient health information routinely collected from diverse sources [1]. These data extend beyond traditional electronic medical records to include laboratory results, external hospital records, outpatient and inpatient treatments, billing information, and patient-generated data from mobile devices. Remarkably, much of these data are now collected in real time. Projections indicate that the global datasphere will expand to 175 zettabytes by 2025, with nearly 30% requiring real-time processing [2].

Analyzing this volume of data requires substantial computational power and analytical methods suited to it, a role increasingly filled by machine learning (ML) algorithms [3]. As a branch of artificial intelligence, ML employs algorithms and statistical models to identify patterns that can enhance diagnostic accuracy, predict clinical events, and improve therapeutic efficacy. This approach uncovers insights that may elude conventional statistical methods [4], making the application of ML techniques to predict clinical outcomes, such as mortality, a frontier in medical research.

The study by Shi et al. [5] exemplifies this cutting-edge approach by utilizing the MIMIC-IV critical care database, an open-access database containing detailed, anonymized medical records of patients hospitalized in intensive care units [6]. The researchers applied ML models to predict 28-day all-cause mortality in heart failure (HF) patients with Clostridioides difficile infection (CDI) admitted to intensive care.

CDI is a severe condition with high mortality, particularly when associated with comorbidities such as HF, stroke, and pneumonia [7, 8]. HF patients are especially vulnerable to CDI due to increased risk of nosocomial infections, higher comorbidity prevalence, frequent readmissions, prolonged hospital stays, and higher overall mortality rates [9‒11].

Given the significant impact of CDI on HF patients, there is an urgent need to identify factors that predict infection severity and mortality risk. This is where the analysis of healthcare big data generates what is termed real-world evidence (RWE) [12], and the use of ML techniques appears to yield better results [13].

RWE studies have explored various health aspects, including epidemiology, disease burden, treatment patterns, safety, outcomes, and patient-reported measures like satisfaction and quality of life. For rare conditions where traditional randomized clinical trials may be unfeasible, RWD and RWE play crucial roles [12, 14].

In their study, Shi et al. [5] employed recursive feature elimination with cross-validation to select 18 predictor variables and applied nine different ML algorithms to construct predictive models. This comprehensive analysis identified novel predictor variables, including red blood cell distribution width, blood urea nitrogen, and Simplified Acute Physiology Score II.
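
The selection step described above can be sketched in a few lines. This is a minimal, dependency-free illustration of recursive feature elimination with cross-validation, not the code used by Shi et al.: the logistic regression classifier, the accuracy metric, the five folds, and the synthetic five-column dataset (two informative columns, three noise columns) are all assumptions chosen to keep the example small.

```python
import math
import random

def fit_logistic(X, y, epochs=100, lr=0.5):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            z = max(-30.0, min(30.0, z))  # clamp so exp() stays in range
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def accuracy(w, b, X, y):
    """Fraction of rows whose predicted class (linear score > 0) matches the label."""
    hits = 0
    for xi, yi in zip(X, y):
        z = b + sum(wj * xj for wj, xj in zip(w, xi))
        hits += (z > 0) == (yi == 1)
    return hits / len(y)

def take(X, cols):
    """Keep only the listed columns of every row."""
    return [[row[c] for c in cols] for row in X]

def rfe_path(X, y):
    """Recursive elimination: refit, drop the column with the smallest |weight|, repeat.
    Returns the surviving column sets, from all columns down to a single one."""
    cols = list(range(len(X[0])))
    path = [list(cols)]
    while len(cols) > 1:
        w, _ = fit_logistic(take(X, cols), y)
        weakest = min(range(len(cols)), key=lambda j: abs(w[j]))
        cols.pop(weakest)
        path.append(list(cols))
    return path

def rfecv(X, y, folds=5):
    """Score every subset size on held-out folds, then rerun elimination on all rows."""
    n, d = len(X), len(X[0])
    scores = [0.0] * d  # scores[i]: mean accuracy when d - i columns are kept
    for k in range(folds):
        held_out = sorted(range(k, n, folds))
        train = [i for i in range(n) if i % folds != k]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        X_te, y_te = [X[i] for i in held_out], [y[i] for i in held_out]
        for i, cols in enumerate(rfe_path(X_tr, y_tr)):
            w, b = fit_logistic(take(X_tr, cols), y_tr)
            scores[i] += accuracy(w, b, take(X_te, cols), y_te) / folds
    # On ties, prefer the smaller subset (larger index = fewer columns).
    best = max(range(d), key=lambda i: (scores[i], i))
    return rfe_path(X, y)[best], scores

# Synthetic data (assumption): only columns 0 and 1 determine the outcome.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(100)]
y = [1 if row[0] + row[1] > 0 else 0 for row in X]

selected, cv_scores = rfecv(X, y)
print("selected columns:", selected)
print("mean accuracy by subset size (5 down to 1):", [round(s, 2) for s in cv_scores])
```

Ranking columns by absolute weight is only meaningful when the predictors share a common scale, which the synthetic data do by construction; clinical variables would need standardizing first. In practice, a maintained library implementation would usually be preferable to a hand-written loop like this one.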

The use of RWD, combined with the comparison of different ML algorithms, strengthens the robustness of the results. Moreover, methods such as the synthetic minority oversampling technique (SMOTE) help reduce bias in imbalanced datasets [15].
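
The core idea of synthetic minority oversampling can be illustrated with a short sketch: each new minority-class record is placed at a random point on the line between an existing minority record and one of its nearest minority neighbors. The function below is a simplified, hypothetical version written for illustration only; the function name, the neighbor count, and the four toy records (invented values for red blood cell distribution width and blood urea nitrogen) are assumptions and do not come from the study.

```python
import math
import random

def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by interpolating between a randomly
    chosen minority row and one of its k nearest minority neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base row, excluding the row itself
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic

# Invented toy rows (assumption): (red cell distribution width %, blood urea nitrogen mg/dL)
deaths = [(16.1, 42.0), (17.3, 55.0), (15.8, 38.0), (18.0, 61.0)]
new_rows = smote_sketch(deaths, n_new=6, k=2)
for row in new_rows:
    print(tuple(round(v, 1) for v in row))
```

Because every synthetic row is interpolated from real minority rows, it stays within the range of the observed values. Oversampling of this kind is normally applied to the training folds only, so that synthetic records do not leak into the data used for evaluation.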

The rapid growth of this approach in medical research is noteworthy. In the past year alone, over 28,000 studies using ML were published, more than 8,000 employed RWD, and over 3,000 combined both strategies. This surge in RWE-based research reflects its growing recognition as a complement to randomized studies, potentially providing real-time, personalized, population-based medicine at lower costs [16].

However, this strategy is not without limitations. These analyses are retrospective and potentially biased, often span several years of data collection, and may have limited external validity. Despite these challenges, the combination of large-scale RWD with robust ML-based statistical analysis represents a promising frontier in medical research.

This manuscript demonstrates the potential of the RWE approach to improve clinical outcomes, particularly in complex scenarios such as patients with HF and CDI. As we move forward, it will be crucial to refine these methodologies, validate findings across diverse populations, and integrate insights gained from ML and RWE into clinical practice.

The author has no conflicts of interest to declare.

The author has not received any financial support.

Nicolas Vecchio: writing – original draft, review, and editing. Nicolas Vecchio was responsible for drafting the original version of the manuscript and for revising and editing it, incorporating feedback, and making necessary improvements.

Additional Information

Nicolas Vecchio: Member of the Digital Health Council of the Argentine Society of Cardiology.

1. Tran TTV, Tayara H, Chong KT. Recent studies of artificial intelligence on in silico drug distribution prediction. Int J Mol Sci. 2023;24(3):1815.
2. Reinsel D, Gantz J, Rydning R. The digitization of the world, from edge to core. An IDC White Paper-#US44413318. 2018.
3. Ahmed Z, Mohamed K, Zeeshan S, Dong X. Artificial intelligence with multi-functional machine learning platform development for better healthcare and precision medicine. Database. 2020;2020:baaa010.
4. Cho B, Geng E, Arvind V, Valliani AA, Tang JE, Schwartz J, et al. Understanding artificial intelligence and predictive analytics: a clinically focused review of machine learning techniques. JBJS Rev. 2022;10(3).
5. Shi C, Jie Q, Zhang H, Zhang X, Chu W, Chen C, et al. Prediction of 28-day all-cause mortality in heart failure patients with Clostridioides difficile infection using machine learning models: evidence from the MIMIC-IV database. Cardiology. 2024;1:1.
6. Johnson AEW, Bulgarelli L, Shen L, Gayles A, Shammout A, Horng S, et al. MIMIC-IV, a freely accessible electronic health record dataset. Sci Data. 2023;10(1):1.
7. Cózar A, Ramos-Martínez A, Merino E, Martínez-García C, Shaw E, Marrodán T, et al. High delayed mortality after the first episode of Clostridium difficile infection. Anaerobe. 2019;57:93–8.
8. Czepiel J, Krutova M, Mizrahi A, Khanafer N, Enoch DA, Patyi M, et al. Mortality following Clostridioides difficile infection in Europe: a retrospective multicenter case-control study. Antibiotics. 2021;10(3):299.
9. Drobnik J, Pobrotyn P, Belovičová M, Madziarska K, Trocha M, Baran M. Mortality in Clostridioides difficile infection among patients hospitalized at the university clinical hospital in Wroclaw, Poland - a 3-year observational study. BMC Infect Dis. 2024;24(1):625.
10. Duhan S, Taha A, Keisham B, Badu I, Atti L, Hussein MH, et al. Outcomes of Clostridioides difficile infection in acute heart failure hospitalizations: insights from the National Inpatient Database. J Hosp Infect. 2024;145:129–39.
11. Mamic P, Heidenreich PA, Hedlin H, Tennakoon L, Staudenmayer KL. Hospitalized patients with heart failure and common bacterial infections: a nationwide analysis of concomitant Clostridium difficile infection rates and in-hospital mortality. J Card Fail. 2016;22(11):891–900.
12. Wu J, Wang C, Toh S, Pisa FE, Bauer L. Use of real-world evidence in regulatory decisions for rare diseases in the United States-Current status and future directions. Pharmacoepidemiol Drug Saf. 2020;29(10):1213–8.
13. Collins R, Bowman L, Landray M, Peto R. The magic of randomization versus the myth of real-world evidence. N Engl J Med. 2020;382(7):674–8.
14. Dang A. Real-world evidence: a primer. Pharmaceut Med. 2023;37(1):25–36.
15. Rudrapatna V, Butte A. Opportunities and challenges in using real-world data for health care. J Clin Invest. 2020;130(2):565–74.
16. Blagus R, Lusa L. SMOTE for high-dimensional class-imbalanced data. BMC Bioinformatics. 2013;14(1):106.