Abstract
Background: Atrial fibrillation (AF) is a prevalent cardiac arrhythmia associated with a significantly increased risk of systemic thromboembolism and stroke. Anticoagulation therapy, particularly with direct oral anticoagulants, has become the standard for stroke prevention but comes at the cost of an increased bleeding risk. With the introduction of effective alternatives to anticoagulation, such as percutaneous left atrial appendage occlusion, bleeding risk stratification has become essential to guide therapeutic decision-making. Conventional statistical methods have been used to develop bleeding risk stratification scores such as HEMORR2HAGES, HAS-BLED, and ATRIA. However, these methods may inadequately address the multifactorial nature of bleeding risk in diverse patient populations, and their overall performance has been suboptimal. Summary and Key Messages: Recent advancements in machine learning (ML) offer promising opportunities to enhance bleeding risk prediction and optimize anticoagulation therapy. This review explores ML applications in AF patients receiving anticoagulation therapy, focusing on the development and validation of ML-based bleeding risk scores. These models have demonstrated improved predictive performance compared with traditional tools, leveraging complex datasets to identify nuanced patterns and interactions. Furthermore, ML-driven tools in warfarin management, including dose prediction, optimization of time in the therapeutic range, and the identification of drug-drug interactions, show significant potential to enhance patient safety and treatment efficacy.
Introduction
Atrial fibrillation (AF) is the most common sustained cardiac arrhythmia, with its prevalence rising significantly with age [1, 2]. In the USA, the prevalence of AF is estimated to reach 4% among individuals aged 60–69 years and up to 18% among adults aged 80–89 years [2]. AF is strongly associated with an elevated risk of systemic thromboembolism, particularly stroke, which contributes to significant morbidity, mortality, and disability [3, 4]. Anticoagulation therapy has been established as the cornerstone for reducing stroke and mortality risks in these patients, making it a critical component of AF management [5, 6].
The introduction of direct oral anticoagulants (DOACs) has revolutionized anticoagulation therapy, offering superior effectiveness in stroke prevention along with an enhanced safety profile compared to warfarin in patients with AF [7‒11]. However, despite these advancements, concerns about bleeding complications associated with warfarin and DOAC use have led to hesitancy in prescribing, and underuse of, this guideline-recommended treatment in many patients [11‒15].
Over the years, several risk calculators, including HEMORR2HAGES (hepatic or renal disease, ethanol abuse, malignancy, older age, reduced platelet count or function, re-bleeding, hypertension, anemia, genetic factors, excessive fall risk, and stroke), HAS-BLED (hypertension, abnormal renal/liver function, stroke, bleeding history or predisposition, labile international normalized ratio (INR), elderly, drugs/alcohol), and ATRIA (anticoagulation and risk factors in atrial fibrillation), have been developed and widely validated to assess bleeding risk [16‒18]. These tools, however, rely on traditional statistical methods, which limit their prognostic performance. Because they are derived from historical data and clinical experience, they incorporate only established risk factors and may omit other potentially relevant predictors. Furthermore, traditional methods tend to oversimplify the complex interplay between risk factors.

Machine learning (ML), a subset of artificial intelligence (AI), has emerged as a powerful approach for identifying complex patterns and relationships within large datasets [19]. Advances in computational power and the increasing availability of standardized, structured, and digitalized datasets have facilitated the application of ML in healthcare. ML, deep learning, artificial neural networks, and convolutional neural networks are all subclasses of AI [20] that have driven significant advances in areas such as natural language processing, pattern recognition, and predictive analytics. Commonly used ML methods include regularized Cox regression (RegCox), Extreme Gradient Boosting, and Random Survival Forest. In the context of AF, ML has already been employed extensively to predict future AF from electronic health records [19, 21], to improve screening and diagnosis using 12-lead electrocardiograms [22‒25], data derived from pacemakers and other implantable devices [26, 27], and, more recently, smartwatch technology [28‒30]. ML has also enhanced stroke risk evaluation, outperforming traditional tools such as the CHA2DS2-VASc score [31‒33].

This narrative review aimed to explore the emerging role of ML in improving bleeding risk assessment for anticoagulated AF patients. Specifically, it focuses on two primary applications: (1) the performance of novel ML-derived risk scores compared with conventional models and (2) the optimization of warfarin therapy through ML-driven approaches. While many metrics are used to evaluate the performance of ML-based models, the most commonly used is the area under the receiver operating characteristic curve (AUC). An AUC of 1.0 means that the model perfectly discriminates between patients who will and will not experience the outcome, while an AUC of 0.5 means that the model performs no better than random chance.
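To make the AUC interpretation concrete, the short sketch below computes it with scikit-learn on a handful of synthetic predictions; the labels and probabilities are invented for illustration only.

```python
# Minimal AUC illustration (synthetic data, for illustration only).
from sklearn.metrics import roc_auc_score

# Hypothetical ground truth: 1 = bleeding event occurred, 0 = no event.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
# Hypothetical predicted bleeding-risk probabilities from a model.
y_score = [0.10, 0.35, 0.80, 0.25, 0.60, 0.90, 0.40, 0.30]

auc = roc_auc_score(y_true, y_score)
print(f"AUC = {auc:.3f}")  # 1.0 = perfect discrimination; 0.5 = no better than chance
```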
Traditional Bleeding Risk Score in Patients with AF
The latest American College of Cardiology and American Heart Association guidelines for bleeding risk stratification in patients with AF recommend using one of the following three well-established scoring systems [13]. The HEMORR2HAGES score [16], introduced in 2006, was developed from a cohort of 3,791 AF patients (1,604 on warfarin) to evaluate bleeding risk factors using a comprehensive classification approach. It assigns 2 points for prior bleeding and 1 point each for hepatic or renal disease, ethanol abuse, malignancy, advanced age (>75 years), reduced platelet count or function, uncontrolled hypertension, anemia, genetic factors, excessive fall risk, and prior stroke.
The HAS-BLED score [18, 34], introduced in 2011, was developed using data from a real-world cohort of 7,329 anticoagulated patients with AF (including 3,665 treated with warfarin alone, without other oral anticoagulants or antiplatelet agents). This score categorizes patients into three bleeding risk levels: low (score of 0), moderate (scores of 1–2), and high (score ≥3). Over the years, HAS-BLED has been extensively validated across various cohorts, confirming its utility in clinical practice.
The ATRIA score [17], also introduced in 2011, was derived from a real-world cohort of 9,186 AF patients treated with warfarin. The model incorporated five weighted variables: anemia (3 points), severe renal disease (3 points), age >75 years (2 points), prior bleeding (1 point), and hypertension (1 point). The score was designed to predict significant hemorrhagic events effectively.
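As a concrete illustration of how such rule-based scores translate patient characteristics into points, the sketch below implements the ATRIA weights described above; the function name and input format are hypothetical, and the example patient is invented.

```python
# Illustrative ATRIA bleeding-risk score calculation (weights as described above).
def atria_score(anemia: bool, severe_renal_disease: bool, age: int,
                prior_bleeding: bool, hypertension: bool) -> int:
    score = 0
    score += 3 if anemia else 0                # anemia: 3 points
    score += 3 if severe_renal_disease else 0  # severe renal disease: 3 points
    score += 2 if age > 75 else 0              # age >75 years: 2 points
    score += 1 if prior_bleeding else 0        # prior bleeding: 1 point
    score += 1 if hypertension else 0          # hypertension: 1 point
    return score

# Hypothetical patient: 78 years old, anemic, hypertensive, no renal disease or prior bleed.
print(atria_score(anemia=True, severe_renal_disease=False, age=78,
                  prior_bleeding=False, hypertension=True))  # -> 6
```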
Among these traditional scores, HAS-BLED has demonstrated superior predictive performance for major bleeding events in a few retrospective studies and in meta-analyses [34, 35]. It has generally been regarded as the preferred bleeding risk score and has therefore been the most commonly used comparator in the development of ML-based scores. In recent years, additional bleeding risk scores tailored for DOAC users, such as the DOAC score, have been developed using traditional statistical methods, showing modestly improved performance [36]. However, the advancement of ML has enabled the creation of novel predictive models with the potential for greater accuracy and personalized risk stratification.
Application of ML Models in Bleeding Risk Prediction
Herrin et al. [37] analyzed a large US national dataset comprising medical and pharmacy claims from privately insured patients and Medicare Advantage enrollees between 2016 and 2019. These patients had at least one of the following: AF (43.7%), ischemic heart disease (71.2%), or venous thromboembolism (19.3%), and had been prescribed anticoagulant or antiplatelet agents within the previous 12 months. The cohort included 306,463 patients (mean age 69 years [SD 12.6]), of whom 57% were prescribed oral anticoagulants (OACs), 42% antiplatelet agents, and 1% a combination of an OAC and an antiplatelet agent. Three ML models, RegCox, Extreme Gradient Boosting, and Random Survival Forest, were trained to predict gastrointestinal bleeding (GIB) at 6 and 12 months; patients with malignancy-related GIB were excluded. The RegCox and Extreme Gradient Boosting models performed similarly in the validation set, with an AUC of 0.67 at 6 months and 0.66 at 12 months. The Random Survival Forest model performed less well, with AUCs of 0.62 at 6 months and 0.60 at 12 months. All ML-developed models outperformed HAS-BLED, which achieved AUCs of 0.61 and 0.60 for GIB at 6 and 12 months, respectively. The single most influential predictor in the ML models was prior GIB (importance score 0.72). Other key predictors included a prior diagnosis of AF, ischemic heart disease, and venous thromboembolism combined (0.38) and the use of gastroprotective medications (0.32).
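The general workflow behind such models can be sketched as follows: fit a gradient-boosted classifier on a tabular feature matrix and assess discrimination for a binary 6-month GIB label with the AUC. This is a generic schematic, not the authors' actual pipeline (which used survival models such as RegCox and Random Survival Forest), and all features, coefficients, and event rates are synthetic placeholders.

```python
# Schematic gradient-boosting workflow for 6-month GIB prediction (synthetic placeholder data).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 5000
# Placeholder features: age, prior GIB, gastroprotective medication use, comorbidity count.
X = np.column_stack([
    rng.normal(69, 12.6, n),      # age
    rng.binomial(1, 0.05, n),     # prior GIB
    rng.binomial(1, 0.30, n),     # gastroprotective medication
    rng.poisson(2, n),            # number of comorbidities
])
# Placeholder outcome: GIB within 6 months (event mechanism invented for the example).
logit = -4 + 0.02 * X[:, 0] + 1.5 * X[:, 1] + 0.2 * X[:, 3]
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = GradientBoostingClassifier().fit(X_tr, y_tr)
print("Validation AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```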
Loring et al. [38] assessed data from two population-based registries: ORBIT-AF (median age 73 years, median follow-up of 520 days) and GARFIELD-AF (median age 71 years, 1 year of follow-up), which included 22,760 and 52,032 patients, respectively. In ORBIT-AF, 38%, 45%, and 40% of the study population were treated with vitamin K antagonists (VKAs), DOACs, and antiplatelet agents (with or without OAC), respectively, whereas in GARFIELD-AF the corresponding proportions were 39%, 28%, and 35%. There were 1,323 (4.7%) major bleeding events in ORBIT-AF and 349 (0.7%) in GARFIELD-AF (defined according to the International Society on Thrombosis and Hemostasis criteria). ML models, including random forests, gradient boosting, and neural networks, were compared with stepwise logistic regression for predicting 1-year major bleeding. Across all outcomes, stepwise logistic regression outperformed the more advanced ML models, with AUCs of 0.711 (95% CI: 0.707–0.715) for ORBIT-AF and 0.647 (95% CI: 0.641–0.653) for GARFIELD-AF. Performance comparisons with HAS-BLED were not reported.
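The confidence intervals reported for AUCs in these studies are often obtained by resampling the validation set. The sketch below shows one common approach, a percentile bootstrap; it is a generic illustration on synthetic data, not necessarily the method used in any of the cited papers.

```python
# Generic percentile-bootstrap 95% CI for AUC (illustrative only).
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc_ci(y_true, y_score, n_boot=2000, seed=0):
    rng = np.random.default_rng(seed)
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))  # resample with replacement
        if len(np.unique(y_true[idx])) < 2:              # skip resamples with a single class
            continue
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    return np.percentile(aucs, [2.5, 97.5])

# Synthetic example: rare outcome with moderately informative risk scores.
rng = np.random.default_rng(1)
y = rng.binomial(1, 0.05, 3000)
score = np.clip(0.05 + 0.3 * y + rng.normal(0, 0.2, 3000), 0, 1)
print(bootstrap_auc_ci(y, score))
```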
Lu et al. [39] evaluated a multi-label gradient boosting decision tree model using 25 key variables to predict all-cause mortality and major bleeding in a prospective global AF registry (GLORIA-AF) of 25,656 patients (mean age 70.3 years [SD 10.3]). Patients from 38 countries were enrolled between 2011 and 2020 and had at least 1 year of follow-up or died during the first year. Of these, 86% were anticoagulated (18.6% with warfarin and the rest with DOACs), and 405 (1.6%) had major bleeding (as defined by the International Society on Thrombosis and Hemostasis) within 1 year after AF diagnosis. The multi-label gradient boosting decision tree model achieved an optimized AUC of 0.70 (95% CI: 0.65–0.75) for predicting major bleeding, significantly outperforming HAS-BLED (AUC 0.61 [0.559–0.655], p = 0.002). Key predictors identified were age, smoking, and creatinine clearance.
Joddrell et al. [40] also used data from the GLORIA-AF registry. Their analysis cohort included 26,183 patients (mean age 70.13 years [SD 10.13]); 67.1% were treated with DOACs, 18.5% with VKAs, and 33.3% received aspirin. Overall, 873 (3.3%) individuals had major bleeding during the study period. The authors trained several ML models (logistic regression, random forest, linear discriminant analysis, naive Bayes, Extreme Gradient Boosting, and neural networks) using 23 demographic and clinical variables. The random forest and linear discriminant analysis models performed best, achieving AUCs of 0.68 (95% CI: 0.62–0.72) and 0.66 (95% CI: 0.62–0.70) for major bleeding risk at 1 and 3 years, respectively, significantly outperforming HAS-BLED (1-year AUC 0.54 [95% CI: 0.52–0.56]; 3-year AUC 0.52 [95% CI: 0.51–0.54]). Age and geographic region were the most critical predictors, and HAS-BLED showed a strong skew toward specificity in this dataset.
Bernardini et al. [41] utilized data from the START-2 database, which included 11,078 patients (median age 77 years [IQR 71–83]) initiating anticoagulation for non-valvular AF (46.4% on VKAs and the rest on DOACs). During a median follow-up of 1.5 years (IQR 1.0–2.6), 240 (2.2%) patients had major bleeding (according to the International Society on Thrombosis and Hemostasis definition). Supervised ML models, including stepwise logistic regression, gradient-boosted decision trees, and multitask neural networks, were used to predict major bleeding. The gradient-boosted decision tree model performed best for patients on DOACs, achieving an AUC of 0.71 compared with 0.59 for HAS-BLED (p < 0.001). The most significant predictors were hemoglobin, platelet count, BMI, and creatinine clearance (all with a linear inverse correlation) and age (with a linear direct correlation). For patients on VKAs, the ML models did not significantly outperform HAS-BLED (AUC 0.63 vs. 0.57, p = 0.83).
Table 1 summarizes the results of these studies. Notably, an evolving application of ML tools is the use of natural language processing for data retrieval from electronic health records. This capability enhances data extraction and has improved the performance of traditional risk scores like HAS-BLED [42, 43]. It is reasonable to expect that leveraging ML to refine the accuracy and quality of data used in developing prediction models will lead to improved performance of these models.
Table 1. Summary of studies that examined the performance of machine learning-based models compared to traditional risk scores for prediction of bleeding
| Study (ref) | Patients, n (total) | DOAC patients | Warfarin patients | Best-performing ML model | AUC of ML method | AUC of HAS-BLED | Major factors influencing the ML models |
|---|---|---|---|---|---|---|---|
| Herrin et al. [37] | 306,463 | 57% | – | RegCox; Extreme Gradient Boosting | 0.67–0.68 (both models) | 0.60–0.61 | Prior diagnosis of AF^a, ischemic heart disease^a, venous thromboembolism^a, use of gastroprotective medications^a |
| Loring et al. [38] | 74,792 | 28–45% | 38–39% | Stepwise logistic regression | 0.647–0.711 | Not reported | Age, kidney function, coronary disease^a, anemia |
| Lu et al. [39] | 25,656 | 67.4% | 18.6% | Multi-label gradient boosting decision tree | 0.70 | 0.61 | Age, smoking^a, creatinine clearance |
| Joddrell et al. [40] | 26,183 | 67.1% | 18.5% | Random forest; linear discriminant analysis | 0.68 (1 year); 0.66 (3 years) | 0.52–0.54 | Age, geographic region^a |
| Bernardini et al. [41] | 11,078 | 53.6% | 46.4% | Gradient-boosted decision trees (for DOACs) | 0.71 | 0.59 | Hemoglobin, platelet count, BMI^a, creatinine clearance, age |
DOACs, direct oral anticoagulants; ML, machine learning; AUC, area under the curve; AF, atrial fibrillation; BMI, body mass index.
^a Marks the factors that are not part of the HAS-BLED score.
ML Application for the Management of Warfarin Treatment
Warfarin has remained an important anticoagulant for some patients with AF despite the rise of DOACs. Its role is especially important among patients with significant mitral stenosis, patients with anti-phospholipid syndrome, and those with mechanical prosthetic valves, in whom DOACs were found to be inferior to warfarin [44‒47]. Despite decades of experience with warfarin, challenges in achieving optimal dosing, stemming from its narrow therapeutic index and significant interpatient variability, have fueled interest in leveraging ML to enhance safety and efficacy.
The individualized dosing of warfarin is complex, influenced by many factors such as genetic polymorphisms (e.g., CYP2C9 and VKORC1), clinical characteristics (such as age, weight, and comorbidities), vitamin K content in dietary intake, and drug-drug interactions [48‒53]. These factors contribute to substantial variability in dose requirements and a heightened risk of out-of-range INR values, which can lead to thromboembolism on one end of the spectrum or bleeding on the other. Traditionally, dose adjustments rely on clinical judgment and experience or dosing algorithms, but these approaches often fail to account for a patient’s specific profile, leading to preventable adverse events. Even a genotype-based approach has failed to show better results when used to dictate warfarin dosage [54]. ML offers transformative potential in warfarin management by excelling at analyzing complex interactions of clinical, demographic, and genetic factors, enabling individualized treatment strategies that enhance treatment effectiveness and patient safety.
Prediction of Warfarin Dosage and Warfarin Management
One of the most studied applications of ML is the prediction of warfarin dosage, a topic that has been extensively researched over the past decade. Grossi et al. [55] analyzed demographic, clinical, and genetic data from 377 patients treated with warfarin using an artificial neural network algorithm. Their model was trained and tested retrospectively on patients who had achieved stable therapeutic INR values and achieved high accuracy, with a mean absolute error (MAE) of 5.7 mg/week for the warfarin maintenance dose. In a broader comparison of ML models, Liu et al. [56] evaluated nine different algorithms to optimize warfarin dosing using the International Warfarin Pharmacogenetics Consortium dataset, an open database with pooled data on 4,798 patients who had achieved a stable therapeutic dose. The study was retrospective and employed a derivation/validation cohort design, using 80% of patients to develop the algorithms and 20% for testing. Performance was measured by the MAE and the percentage of patients whose predicted dose fell within 20% of the actual dose. Their analysis identified three advanced models, Bayesian additive regression trees, multivariate adaptive regression splines (MARS), and support vector regression (SVR), as the top-performing methods, achieving similar levels of accuracy (MAE ∼8.9 mg/week; ∼46% of patients within 20% of the actual dose). Building on these advancements, Ma et al. [57] employed a stacked generalization framework, a machine learning technique that combines multiple models to improve predictive performance. Their study also utilized the International Warfarin Pharmacogenetics Consortium dataset and demonstrated significant improvements in prediction accuracy compared with traditional regression methods. Specifically, their model achieved a 15% reduction in the MAE of the weekly dose and increased the percentage of patients with predicted doses within 20% of the actual dose. Notably, these algorithms were particularly beneficial for patients requiring low warfarin maintenance doses, in whom even small dose changes can lead to adverse clinical events such as thrombosis or bleeding.
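To make these evaluation metrics concrete, the sketch below fits a generic support vector regressor (one of the model families named above) on placeholder features and reports the MAE and the proportion of predictions within 20% of the actual weekly dose. The data, feature choices, and dose-generating formula are entirely synthetic and only illustrate how the metrics are computed.

```python
# Illustrative warfarin weekly-dose regression with MAE and "within 20%" metrics (synthetic data).
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
n = 2000
# Placeholder predictors: age, weight (kg), CYP2C9 variant carrier, VKORC1 variant carrier.
X = np.column_stack([
    rng.normal(70, 10, n),
    rng.normal(80, 15, n),
    rng.binomial(1, 0.2, n),
    rng.binomial(1, 0.4, n),
])
# Synthetic weekly dose (mg/week): lower with age and variant alleles, higher with weight.
dose = (35 - 0.15 * (X[:, 0] - 70) + 0.1 * (X[:, 1] - 80)
        - 8 * X[:, 2] - 10 * X[:, 3] + rng.normal(0, 5, n))
dose = np.clip(dose, 5, None)

X_tr, X_te, y_tr, y_te = train_test_split(X, dose, test_size=0.2, random_state=0)
model = make_pipeline(StandardScaler(), SVR()).fit(X_tr, y_tr)
pred = model.predict(X_te)

print("MAE (mg/week):", mean_absolute_error(y_te, pred))
print("Fraction within 20% of actual dose:", np.mean(np.abs(pred - y_te) <= 0.2 * y_te))
```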
ML models have also been applied to optimize the time in therapeutic range for patients receiving warfarin. In a notable study, Petch et al. [58] developed a deep Q-learning algorithm capable of dynamically adjusting warfarin doses based on INR measurements. This algorithm was trained using data from 22,502 patients in the warfarin-treated arms of pivotal randomized clinical trials of edoxaban (ENGAGE AF-TIMI 48), apixaban (ARISTOTLE), and rivaroxaban (ROCKET AF). For external validation, the model was tested on an independent dataset of 5,730 warfarin-treated patients from the dabigatran trial (RE-LY). The results demonstrated that for every 10% increase in algorithm-consistent dosing, there was a 6.78% improvement in time in therapeutic range, accompanied by significant reductions in composite adverse clinical outcomes such as stroke and major bleeding [58]. Importantly, these results were similar to those obtained with a rules-based clinical algorithm used as a benchmark. Beyond dosing prediction, other applications of ML in warfarin treatment have been demonstrated. For instance, Hu et al. [59] developed an extreme gradient boosting model to predict bleeding risk in hospitalized patients receiving warfarin after valve replacement; this model achieved an AUC of 0.882 for predicting an INR ≥4.5, demonstrating high accuracy [59]. In addition, ML may be used to identify drug-drug interactions. Hansen et al. [60] applied random forest models to identify significant drug-drug interactions affecting warfarin metabolism by analyzing the initiation of a new prescription in previously INR-stable warfarin-treated patients. Their work rediscovered known interactions and also highlighted new potential interactions. These findings are instrumental for clinicians managing complex medication regimens in patients on warfarin therapy, as they allow for tailored adjustments to minimize bleeding risk.
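The reinforcement-learning idea behind INR-guided dosing can be illustrated with a toy tabular Q-learning loop, where states are coarse INR bands and actions are dose changes. This is a didactic simplification of the deep Q-learning model in [58], not a reproduction of it; the states, actions, environment dynamics, and rewards are all invented for illustration.

```python
# Toy tabular Q-learning for warfarin dose adjustment (didactic simplification; all values invented).
import numpy as np

states = ["INR_low", "INR_therapeutic", "INR_high"]      # discretized INR bands
actions = ["decrease_dose", "keep_dose", "increase_dose"]
Q = np.zeros((len(states), len(actions)))
alpha, gamma, epsilon = 0.1, 0.9, 0.1                     # learning rate, discount, exploration
rng = np.random.default_rng(0)

def simulate_step(s, a):
    """Invented environment: the 'right' action usually returns the INR to the therapeutic band."""
    right_action = (s == 0 and a == 2) or (s == 1 and a == 1) or (s == 2 and a == 0)
    if right_action and rng.random() < 0.8:
        s_next = 1                         # back in (or staying in) the therapeutic range
    else:
        s_next = int(rng.integers(0, 3))   # otherwise drift to a random INR band
    reward = 1.0 if s_next == 1 else -1.0  # reward time spent in the therapeutic range
    return s_next, reward

s = 1
for _ in range(20000):
    a = int(rng.integers(0, 3)) if rng.random() < epsilon else int(np.argmax(Q[s]))
    s_next, r = simulate_step(s, a)
    # Q-learning update: nudge Q(s, a) toward reward plus discounted best future value.
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

print(np.round(Q, 2))  # typically learns to increase the dose when INR is low and decrease it when high
```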
Future Perspective and Conclusion
This review underscores the potential of integrating ML into bleeding risk assessment and management, as well as for reducing bleeding complications due to warfarin treatment. Although current ML-based risk scores already surpass traditional risk scores in predictive accuracy, their performance remains only moderate, with AUC values of 0.6–0.7. ML-based models are believed to perform better primarily because they capture the factors influencing bleeding risk, and the interactions among them, more effectively. Furthermore, the enhanced performance of ML-based models may also reflect changes in the clinical profiles and pharmaceutical treatments of today's AF patients compared with those used to develop the traditional risk scores. Consequently, these conventional risk scores might now be outdated and less effective at evaluating contemporary patient populations.
While current studies rely primarily on structured tabular data, limiting their scope, future research should further explore integrating large language models for free-text analysis from electronic medical records and unstructured imaging data, such as electrocardiograms and echocardiograms. This can enable a multimodal approach for more comprehensive predictions. To ensure responsible AI implementation, these models must undergo rigorous validation, including randomized controlled trials to confirm their reliability and clinical utility across diverse settings. Transparent reporting and continuous monitoring are essential to ensure these tools meet real-world clinical standards.
The incorporation of ML can significantly enhance clinical decision-making, especially in a clinical landscape with available alternative therapeutic options. For AF patients who are at high bleeding risk, alternatives such as left atrial appendage occlusion devices offer non-inferior outcomes compared to oral anticoagulation therapies, including similar rates of cardiovascular mortality and incident stroke [61, 62]. These devices provide a valuable option for individuals with elevated bleeding risk scores. Furthermore, the use of ML-based systems promotes safer treatment and better control of INR in patients treated with warfarin.
By enabling more personalized stroke prevention strategies and providing tools with greater precision than traditional risk scores, ML has the potential to promote patient-centered care. As research in this field progresses, ML-driven innovations are expected to play a pivotal role in improving outcomes and enhancing the quality of care for patients with AF requiring anticoagulation therapy.
Conflict of Interest Statement
The authors declare that they have no conflicts of interest.
Funding Sources
This study was not supported by any sponsor or funder.
Author Contributions
Conceptualization, methodology, literature search and screening, and data extraction: T.T.L. and B.F. Writing – original draft: T.T.L., B.F., and M.C.-S. Writing – review, critical revision, and editing: T.T.L., S.T., R.B., M.C.-S., R.K., R.M., D.A.N., S.H., K.S., D.E.F., and B.F. Supervision: B.F.