Abstract
Introduction: Continuous renal replacement therapy (CRRT) is a prolonged, continuous extracorporeal blood purification therapy that replaces impaired renal function. CRRT typically requires routine anticoagulation, but for patients at risk of bleeding and with contraindications to sodium citrate, anticoagulant-free dialysis therapy is necessary. However, this approach increases the risk of CRRT circuit coagulation, leading to treatment interruption and increased resource consumption. In this study, we used machine learning methods to predict the risk of CRRT circuit coagulation based on pre-CRRT treatment metrics.
Methods: We retrospectively analyzed 212 patients who underwent anticoagulant-free CRRT from October 2022 to October 2023. Patients were categorized into high-risk and low-risk groups based on CRRT circuit coagulation within 24 h. We employed eight machine learning methods to predict the risk of circuit coagulation. Model performance was evaluated using the area under the curve (AUC) of the receiver operating characteristic. 5-fold cross-validation was used to validate the machine learning models. Feature importance and SHAP plots were used to interpret the model’s performance and key drivers.
Results: We identified 88 patients (41.51%) at high risk of circuit coagulation within 24 h of CRRT. Our machine learning models showed excellent predictive performance, with ensemble learning achieving an AUC of 0.863 (95% CI: 0.860–0.868), outperforming individual algorithms. Random forest was the best single-algorithm model, with an AUC of 0.819 (95% CI: 0.814–0.823). The top three features identified as most important by the SHAP summary plot and feature importance graph were platelet count, filtration fraction (FF), and triglycerides.
Conclusion: We created a machine learning model to predict the risk of circuit coagulation during anticoagulant-free CRRT. Our model performs well (AUC 0.863) and identifies key factors such as platelets, FF, and triglycerides. This can help clinicians develop personalized treatment strategies aimed at reducing circuit coagulation risk, thereby improving patient outcomes and reducing healthcare expenses.
Introduction
Continuous renal replacement therapy (CRRT) is an important therapeutic tool in critical care medicine and is used worldwide. By replacing the kidney’s clearance function continuously and slowly, CRRT provides an effective blood purification pathway for patients who cannot maintain normal renal function because of severe disease and multiorgan failure with acute kidney injury [1]. However, anticoagulation during CRRT remains a challenge for clinicians. Anticoagulants are used during CRRT to prevent blood coagulation within the extracorporeal circuit [2]. Therefore, the selection and management of anticoagulation methods are particularly important.
In CRRT, commonly used anticoagulants include heparin, low-molecular-weight heparin, and sodium citrate [3‒5]. Although these anticoagulants are effective in preventing circuit coagulation in most cases, their use may increase risk to patients in some specific situations [6]. For patients with a high risk of bleeding, the use of heparin or low-molecular-weight heparin may exacerbate bleeding [7]. Kidney Disease: Improving Global Outcomes (KDIGO) recommends citrate anticoagulation for CRRT in patients without citrate contraindications [8]. However, in patients with liver insufficiency or elevated circulatory lactate levels during shock, the metabolism of sodium citrate may be impaired, thus increasing its accumulation and subsequent acidosis [6, 9]. Anticoagulant-free CRRT may then be necessary, with many considerations to optimize extracorporeal circuit function [10]. These can include the choice of dialysis access, selecting appropriate filters, adjusting dialysis modes, setting reasonable machine parameters, and enhancing operator training [11, 12]. The guidelines recommend CRRT without anticoagulants for patients with severe coagulation dysfunction, severe active bleeding, or contraindications to anticoagulants [9, 13]. The mechanism of circuit coagulation is complex and involves multiple factors [14‒16]. Circuit coagulation is influenced by various factors, notably filtration fraction (FF), blood flow rates, the quality of vascular access, platelet counts, triglyceride levels, blood pressure, coagulation indices (activated partial thromboplastin time, international normalized ratio [INR], prothrombin time [PT]), pH balance, ionized calcium concentrations, and patient demographics (gender, age) [1, 14, 17]. Circuit coagulation is an important factor affecting the efficacy of CRRT and may also lead to time without treatment (“downtime”) and treatment failure [18, 19]. 
Therefore, a method that predicts CRRT filter blockage could help guide interventions to prevent it. In this study, we propose a machine learning-based model to predict the risk of circuit coagulation in CRRT without anticoagulation, which may help physicians adjust the prescription and management of CRRT and prolong the service life of the extracorporeal circuit and filter when no anticoagulants are used.
Methods
Study Population
We retrospectively analyzed the medical records of 212 patients who received anticoagulant-free CRRT. Inclusion criteria: patients who underwent CRRT from October 2022 to October 2023 in the Department of Nephrology, Xinqiao Hospital; age ≥18 years; and treatment with continuous venovenous hemodiafiltration (CVVHDF) using pre-dilution fluid. Exclusion criteria: use of anticoagulants for CRRT; systemic anticoagulation, antiplatelet therapy, or thrombolysis. The study was approved by the Ethics Review Committee, and the requirement for individual patient consent was waived because the data were de-identified.
Data Collection and Data Preprocessing
We collected data on demographic characteristics, clinical conditions, clinical examination data, and laboratory test data from the electronic medical record system at Xinqiao Hospital and obtained CRRT treatment prescription information from the CRRT prescription system. Our analysis included a total of 49 variables, encompassing demographics, vascular access types, CRRT treatment parameters, vital signs, blood gas analysis, hematological parameters, renal function assessments, and coagulation indexes. All data were manually collected from medical records by Dashuang Liu and Liang Liu. The overall missing data rate was 4.99% across all variables. We used multiple imputation to complete the missing values and used density plots to compare the characteristics of the data distribution before and after imputation [20].
Definition of CRRT Circuit Coagulation Risk
Patients were categorized into high-risk and low-risk groups based on their circuit coagulation during CRRT treatment. Those who experienced CRRT circuit coagulation within 24 h of starting treatment, defined as a transmembrane pressure greater than 300 mm Hg, resulting in circuit changes or treatment termination, were classified as the high-risk group [23]. Note that elective stoppages, such as those for imaging, procedures, or due to death or other reasons, were excluded from this analysis. Those who did not experience CRRT circuit coagulation within 24 h of treatment were categorized as the low-risk group. The high-risk and low-risk groups were subsequently used to train and validate a machine learning model for predicting the risk of circuit coagulation in new CRRT patients.
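The grouping rule above can be written as a small labeling function. The following is only a minimal sketch, not the authors' code: the function name, its parameters, the inclusive 24 h boundary, and the choice to return `None` for elective stoppages are all assumptions made for illustration.

```python
from typing import Optional

TMP_THRESHOLD_MMHG = 300.0  # transmembrane pressure cut-off stated in the text
WINDOW_HOURS = 24.0         # observation window stated in the text


def risk_label(
    hours_to_stop: Optional[float],  # None if the circuit ran past the window
    peak_tmp_mmhg: float,            # highest transmembrane pressure recorded
    elective_stop: bool,             # imaging, procedure, death, or other reason
) -> Optional[str]:
    """Return 'high', 'low', or None when the session is excluded.

    Hypothetical helper: the inclusive boundary (<= 24 h) is an assumption.
    """
    if elective_stop:
        return None  # elective stoppages are not counted as coagulation events
    clotted = (
        hours_to_stop is not None
        and hours_to_stop <= WINDOW_HOURS
        and peak_tmp_mmhg > TMP_THRESHOLD_MMHG
    )
    return "high" if clotted else "low"
```

Keeping the threshold and window as named constants makes it easy to test how sensitive the group sizes are to the definition.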
Machine Learning
Machine learning is a subset of artificial intelligence that enables computers to learn from data and make predictions [24]. In this study, we applied machine learning algorithms to identify high-risk patients who are likely to experience extracorporeal circuit coagulation during CRRT treatment. All data analyses were done in R (4.3.2) and Python (3.11.5). We employed internal validation, in which 80% of the dataset was randomly selected as the training set (the subset of data used to teach the algorithm patterns and relationships) and the remaining 20% as the validation set. We used the following eight supervised machine learning methods to build predictive models: logistic regression, decision tree, support vector machine (SVM), random forest (RF), extreme gradient boosting (XGBoost), gradient boosting decision tree (GBDT), categorical boosting (CatBoost), and ensemble learning (EL) (RF + XGBoost). Logistic regression predicts categories by estimating the conditional probability of the outcome: a sigmoid function converts a linear predictor into a probability between 0 and 1 that represents the likelihood of a data point belonging to the positive class. The main idea behind SVM is to find a clear boundary between different groups of data. Imagine trying to separate different types of objects into distinct categories: SVM aims to find the dividing line that maximizes the distance between the groups, which helps reduce classification errors [25]. The decision tree method constructs a simple-to-follow, tree-shaped model for making decisions. It uses the Gini index, a measure of impurity, to pinpoint the most effective feature for splitting the data at each step, thus growing the tree. One challenge with decision trees is their tendency to adapt too closely to the training data, a phenomenon known as overfitting. 
To counteract this and enhance the tree’s ability to generalize to unseen data, a process called post-pruning is applied, which trims away unnecessary branches and nodes without sacrificing the tree’s predictive power [26]. RF, XGBoost, GBDT, and CatBoost are all tree-based algorithms. RF makes predictions by constructing and combining multiple decision trees, each trained on a random subset of the data and considering only a random subset of features; each tree contributes its own prediction. The final answer from the RF is the most common prediction among all the trees, like the majority vote of an expert panel [27]. GBDT corrects the residuals of the previous trees by continuously training new decision trees and accumulates the results of all the trees as the final prediction, which gives it strong predictive ability. GBDT operates like a team of doctors collaboratively diagnosing a complex illness. Envision a scenario where the first physician (the initial decision tree) makes an initial assessment but overlooks certain symptoms. The subsequent doctor (the next decision tree) then zeroes in on those overlooked aspects, aiming to rectify the diagnostic gaps. This sequential collaboration persists, with each new physician (tree) learning from the oversights of their predecessors and incrementally refining the collective understanding of the patient’s condition [28]. CatBoost predicts outcomes by combining many decision trees; it builds trees repeatedly, improving its predictions each time. XGBoost uses a gradient boosting algorithm to train a series of decision trees and combine them. It repeats a cycle of predicting, correcting mistakes, and improving many times, then combines the predictions into a more accurate forecast [29]. EL achieves better performance than a single learner by integrating multiple models. 
We use a soft voting strategy in ensemble learning, and the advantage of soft voting is that it can take into account the confidence level of each classifier to obtain more accurate predictions [30]. To evaluate the generalizability and robustness of our machine learning model, we employed a rigorous 5-fold cross-validation procedure. This technique divides data into 5 groups, using 4 for training and 1 for testing, and repeats this process 5 times. By doing so, it helps ensure the model performs well on unseen data [31].
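The two procedures described above, 5-fold splitting and soft voting, can be sketched in a few lines of standard-library Python. This is an illustrative sketch only: the function names are hypothetical, the folds are not stratified, and equal weighting of the two classifiers is an assumption, since the text does not state the weights used for RF and XGBoost.

```python
import random
from typing import List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle range(n) once, then return k (train, test) index pairs."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # fold sizes differ by at most one
    splits = []
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        splits.append((train, test))
    return splits


def soft_vote(prob_a: Sequence[float], prob_b: Sequence[float],
              threshold: float = 0.5) -> List[int]:
    """Average two classifiers' positive-class probabilities, then threshold.

    Equal weights are an assumption made for this sketch.
    """
    return [int((a + b) / 2 >= threshold) for a, b in zip(prob_a, prob_b)]
```

With 212 patients, `k_fold_indices(212)` yields test folds of 42 or 43 patients, and every patient appears in exactly one test fold. Unlike hard (majority) voting, `soft_vote` lets a confident classifier outweigh an uncertain one.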
We evaluated model performance using the following metrics: area under the curve (AUC) of the receiver operating characteristic (ROC) curve, accuracy, Youden index, best cutoff, sensitivity, specificity, F1 score, positive predictive value, and negative predictive value. The model was interpreted using various visualization tools, including the ROC plot and feature importance plot, to understand its performance and key drivers. Additionally, SHapley Additive exPlanation (SHAP) plots were used to explain how the model makes predictions. SHAP shows how each feature contributes to a prediction, helping us understand why the model makes certain decisions. The SHAP summary plot provides an overview of feature importance, while the SHAP dependency plot reveals how features interact with each other.
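For readers less familiar with these metrics, the sketch below shows how they follow from a confusion matrix and from ranked scores. The function names are hypothetical, and the sketch does not guard against empty classes (a zero denominator); it is meant to show the definitions, not to reproduce the study's pipeline.

```python
from typing import Dict, Sequence


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Threshold-dependent metrics derived from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)  # positive predictive value
    npv = tn / (tn + fn)  # negative predictive value
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),
        "youden": sensitivity + specificity - 1,
    }


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a positive case outranks a negative one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The best cutoff is typically the probability threshold that maximizes the Youden index, which is why the two are reported together.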
Results
Study Population
We included a total of 212 patients. There were 88 (41.51%) cases in the high-risk group of circuit coagulation and 124 (58.49%) cases in the low-risk group. Variables that exhibited significant differences between the two groups included FF, platelets, systolic blood pressure, diastolic blood pressure, estimated glomerular filtration rate, plasma PT, and INR (Table 1).
Variable | Overall | Low-risk | High-risk | p value |
---|---|---|---|---|
Patients, n | 212 | 124 | 88 | |
Female, n (%) | 74 (34.9) | 50 (40.3) | 24 (27.3) | 0.058 |
Age, years | 58.55 (15.67) | 57.99 (16.06) | 59.34 (15.16) | 0.538 |
Height, m | 1.63 (0.08) | 1.62 (0.09) | 1.64 (0.08) | 0.252 |
Weight, kg | 62.51 (12.72) | 61.53 (12.21) | 63.89 (13.35) | 0.185 |
BMI, kg/m² | 23.51 (3.63) | 23.25 (3.57) | 23.87 (3.72) | 0.224 |
Hypertension, n (%) | 107 (50.5) | 55 (44.4) | 52 (59.1) | 0.038 |
Diabetes, n (%) | 51 (24.1) | 23 (18.5) | 28 (31.8) | 0.034 |
Renal failure, n (%) | | | | 0.014 |
None | 8 (3.8) | 8 (6.5) | 0 (0.0) | |
AKI | 104 (49.1) | 64 (51.6) | 40 (45.5) | |
CKD | 100 (47.2) | 52 (41.9) | 48 (54.5) | |
CVD, n (%) | 145 (68.4) | 89 (71.8) | 56 (63.6) | 0.232 |
Smoking, n (%) | 88 (41.5) | 49 (39.5) | 39 (44.3) | 0.572 |
Drinking alcohol, n (%) | 71 (33.5) | 42 (33.9) | 29 (33.0) | 1 |
COVID-19, n (%) | 25 (11.8) | 11 (8.9) | 14 (15.9) | 0.134 |
Vascular access, n (%) | | | | 0.153 |
Arteriovenous fistula | 30 (14.2) | 12 (9.7) | 18 (20.5) | |
Right femoral vein NCC | 103 (48.6) | 63 (50.8) | 40 (45.5) | |
Left femoral vein NCC | 53 (25.0) | 34 (27.4) | 19 (21.6) | |
Right internal jugular vein NCC | 21 (9.9) | 11 (8.9) | 10 (11.4) | |
Left internal jugular vein NCC | 3 (1.4) | 3 (2.4) | 0 (0.0) | |
Tunneled cuffed catheter | 2 (0.9) | 1 (0.8) | 1 (1.1) | |
Net ultrafiltration rate, L/h | 0.21 (0.17) | 0.19 (0.17) | 0.23 (0.17) | 0.173 |
FF, % | 22.95 [19.98, 25.15] | 22.44 [17.21, 24.53] | 24.13 [21.66, 25.73] | <0.001 |
SBP, mm Hg | 123.00 [103.75, 142.00] | 119.50 [102.50, 137.25] | 131.00 [108.00, 145.75] | 0.041 |
DBP, mm Hg | 69.00 [57.75, 83.00] | 68.00 [57.00, 78.00] | 74.00 [59.50, 85.25] | 0.03 |
Heart rate, beats/min | 94.40 (21.46) | 95.32 (22.33) | 93.10 (20.22) | 0.459 |
Blood gas analysis | | | | |
pH | 7.39 (0.10) | 7.38 (0.11) | 7.39 (0.09) | 0.597 |
Partial pressure of carbon dioxide, mm Hg | 35.23 (11.22) | 34.51 (9.54) | 36.26 (13.22) | 0.265 |
Partial pressure of oxygen, mm Hg | 102.80 (39.55) | 103.27 (40.63) | 102.14 (38.19) | 0.839 |
Sodium, mmol/L | 144.54 (77.39) | 148.44 (100.66) | 139.06 (12.09) | 0.386 |
Potassium, mmol/L | 4.60 (1.00) | 4.65 (1.04) | 4.53 (0.94) | 0.403 |
Calcium, mmol/L | 1.11 (0.72) | 1.11 (0.78) | 1.11 (0.63) | 0.986 |
Blood glucose, mmol/L | 8.78 (4.48) | 8.46 (4.14) | 9.24 (4.92) | 0.209 |
Bicarbonate, mmol/L | 21.28 (6.16) | 21.03 (6.31) | 21.64 (5.95) | 0.474 |
Oxygen saturation, % | 95.80 (5.68) | 95.48 (6.60) | 96.24 (4.03) | 0.337 |
Leukocyte, ×10⁹/L | 15.17 (25.48) | 15.82 (31.66) | 14.24 (12.49) | 0.657 |
Hemoglobin, g/L | 86.24 (24.20) | 86.17 (24.69) | 86.33 (23.62) | 0.962 |
Platelet, ×10⁹/L | 82.00 [50.00, 152.75] | 58.50 [34.00, 106.75] | 138.50 [81.00, 201.75] | <0.001 |
Hct, % | 26.76 (7.12) | 26.64 (7.11) | 26.93 (7.17) | 0.772 |
Albumin, g/L | 32.90 [29.80, 35.70] | 33.35 [30.17, 36.20] | 32.55 [28.75, 34.90] | 0.085 |
Urea nitrogen, mmol/L | 27.90 (15.47) | 27.20 (14.40) | 28.89 (16.90) | 0.434 |
Serum creatinine, μmol/L | 502.87 (373.09) | 474.30 (407.40) | 543.14 (316.42) | 0.186 |
eGFR, mL/min/1.73 m² | 11.00 [6.00, 23.00] | 12.00 [6.75, 26.00] | 9.00 [6.00, 15.25] | 0.032 |
Uric acid, μmol/L | 520.08 (320.73) | 550.53 (373.04) | 477.18 (222.64) | 0.101 |
Triglycerides, mmol/L | 2.10 (1.54) | 1.96 (1.44) | 2.30 (1.67) | 0.105 |
Total cholesterol, mmol/L | 3.61 (1.35) | 3.68 (1.37) | 3.51 (1.32) | 0.373 |
HDL, mmol/L | 0.83 (0.43) | 0.85 (0.46) | 0.80 (0.38) | 0.332 |
LDL, mmol/L | 1.69 (0.85) | 1.68 (0.88) | 1.70 (0.82) | 0.841 |
PT, s | 13.25 [11.50, 18.55] | 13.90 [11.67, 20.22] | 12.90 [11.17, 15.30] | 0.009 |
INR | 1.19 [1.03, 1.66] | 1.26 [1.05, 1.78] | 1.15 [1.00, 1.37] | 0.005 |
APTT, s | 38.20 (29.10) | 37.84 (19.89) | 38.70 (38.66) | 0.831 |
TT, s | 22.88 (39.03) | 24.25 (42.31) | 20.93 (34.02) | 0.543 |
Fibrinogen, g/L | 3.35 (1.53) | 3.21 (1.46) | 3.56 (1.61) | 0.095 |
PCT, ng/mL | 10.32 (21.72) | 10.91 (24.03) | 9.49 (18.08) | 0.641 |
CRP, mg/L | 88.39 (79.94) | 89.83 (72.46) | 86.35 (89.84) | 0.756 |
Continuous variables that conform to a normal distribution are expressed as mean (standard deviation), continuous variables that are not normally distributed are expressed as median [interquartile range], and categorical variables are expressed as number (%).
BMI, body mass index; AKI, acute kidney injury; COVID-19, coronavirus disease 2019; CVD, cardiovascular disease; SBP, systolic blood pressure; DBP, diastolic blood pressure; eGFR, estimated glomerular filtration rate; HDL, high-density lipoprotein; LDL, low-density lipoprotein; PT, prothrombin time; INR, international normalized ratio; APTT, activated partial thromboplastin time; TT, thrombin time; PCT, procalcitonin; FF, filtration fraction; CRP, C-reactive protein; Hct, hematocrit; NCC, non-cuffed central venous catheter.
Machine Learning
We used eight machine learning methods (logistic regression, decision tree, SVM, RF, XGBoost, GBDT, CatBoost, and ensemble learning [RF + XGBoost]) with all the variables as inputs to predict the CRRT circuit coagulation risk. 80% of the patients were randomly selected as the training set and the remaining 20% as the validation set (online suppl. Table S1; for all online suppl. material, see https://doi.org/10.1159/000540695). Figure 1a shows the ROC curves, AUCs, and their 95% CIs for the different algorithms. RF (AUC = 0.819) showed the best predictive performance among the seven individual machine learning methods; however, ensemble learning (RF + XGBoost) (AUC = 0.863) outperformed every single algorithm (Fig. 1a). For ensemble learning, the best cutoff was 0.50, the Youden index 66.44%, sensitivity 81.25%, specificity 85.19%, F1 score 0.75, positive predictive value 0.75, negative predictive value 0.85, and accuracy 0.81 (online suppl. Table S2). The mean AUC of the 5-fold cross-validation for ensemble learning was 0.806 (Fig. 1b).
We visualize the importance of the top 20 features for the prediction results of the RF model and the direction of their influence through the SHAP summary plot (Fig. 2). The features in the SHAP summary plot are shown in order of importance. In the RF model, the higher the SHAP value of a feature, the higher the risk of circuit coagulation. The relationship between the top 4 feature values and their corresponding SHAP values is explored through a SHAP dependency plot (Fig. 3), taking into account the influence of other features. Each point corresponds to a sample’s value on that feature and its corresponding SHAP value. The distribution pattern of the points can reveal whether the relationship between feature values and model predictions is linear or otherwise complex. When the SHAP values of the features exceeded zero, there was an increased risk of circuit coagulation. Figure 4 displays the feature importance ranking of the RF model, with platelet count, FF, and triglyceride levels emerging as the top three most influential features.
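To make the meaning of a SHAP value concrete, the sketch below computes exact Shapley values for a toy risk score by averaging each feature's marginal contribution over all feature orderings. It is not the `shap` library and not the fitted RF: the scoring function `toy_risk` and its coefficients are hypothetical, and "absent" features are simply set to a baseline value, which is one common simplification. The example inputs are the overall and high-risk summary values from Table 1.

```python
from itertools import permutations
from typing import Callable, Dict

Features = Dict[str, float]


def shapley_values(model: Callable[[Features], float],
                   x: Features, baseline: Features) -> Features:
    """Exact Shapley values: mean marginal effect of each feature over all orderings."""
    names = list(x)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)     # start with every feature at its baseline
        previous = model(current)
        for name in order:
            current[name] = x[name]  # switch this feature to the patient's value
            updated = model(current)
            totals[name] += updated - previous
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_risk(f: Features) -> float:
    """Hypothetical score with one interaction term; NOT the study's model."""
    return (0.002 * f["platelet"] + 0.01 * f["ff"]
            + 0.0001 * f["platelet"] * f["ff"] + 0.05 * f["tg"])


baseline = {"platelet": 82.0, "ff": 22.95, "tg": 2.10}  # overall summary values
patient = {"platelet": 138.5, "ff": 24.13, "tg": 2.30}  # high-risk summary values
phi = shapley_values(toy_risk, patient, baseline)
```

The values in `phi` sum to `toy_risk(patient) - toy_risk(baseline)`, which is the additivity property that lets a SHAP plot split one prediction into per-feature contributions; a positive value pushes the prediction toward higher risk.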
Discussion
In this retrospective cohort study, we developed and validated machine learning algorithms for predicting the risk of CRRT treatment circuit coagulation using 49 pre-CRRT treatment and CRRT prescription features. Among the eight machine learning models we tested, RF demonstrated the best predictive performance among the single-algorithm models, while ensemble learning had a higher AUC than all other models. Of the top five features ranked in terms of feature importance, four features were responsive to patient status and one feature was a CRRT treatment-related parameter, suggesting that both patient status and CRRT treatment parameters have an impact on circuit coagulation. Our model helps personalize anticoagulation and CRRT treatment prescriptions for each patient.
This study found that FF, platelets, systolic blood pressure, diastolic blood pressure, estimated glomerular filtration rate, PT, and INR showed significant differences between the high- and low-risk groups. The top three features identified as most important by the SHAP summary plot and feature importance graph were platelet count, FF, and triglycerides. In a meta-analysis, Brain et al. [14] identified elevated platelet counts as a patient factor associated with shorter filter life. In a cardiac ICU study, antiplatelet therapy demonstrated a significant extension of CRRT filter lifespan, emphasizing the positive impact these agents have on preserving filter function. This finding underscores the crucial role of platelet activity in determining filter durability [32]. The investigation conducted by Deep et al. [33] revealed that using epoprostenol, a prostacyclin analog, as the exclusive anticoagulant in CRRT for critically ill pediatric liver disease patients resulted in a median filter life of 48 h, with 60.5% lasting up to 60 h. Their findings reinforce the efficacy of this novel anticoagulation strategy in enhancing filter longevity, further emphasizing the pivotal role platelet function plays in determining filter survival. The difference in FF between the groups reflects the influence of filtration efficiency on the coagulation risk of the circuit. SHAP summary plots indicate that increased platelet counts and FF are associated with heightened risk of circuit coagulation. From the KDIGO guidelines in 2012 to the Acute Dialysis Quality Initiative (ADQI) guidelines in 2016, the requirements for FF have become increasingly stringent, from 30% to 25%, indicating that experts believe that a low FF helps protect the extracorporeal circuit [8, 34]. Research has pinpointed triglycerides as a contributor to clogging in CRRT. 
Elevated triglyceride levels were linked to clogging incidents in COVID-19 patients undergoing CRRT, implying that hypertriglyceridemia significantly hinders the patency of extracorporeal circuits [16]. The levels of cholesterol were found to exhibit a strong correlation with significant elevations in fibrinogen and factor VIIc [35]. Additionally, other studies have demonstrated that hypertriglyceridemia is associated with abnormal coagulation factors and reduced fibrinolytic activity, thereby constituting a “hypercoagulable state” that augments the risk of cardiovascular disease [36]. These two studies suggest that distinct types of dyslipidemia may impact the clotting process through diverse mechanisms.
Current research suggests that there are many non-anticoagulant factors associated with CRRT circuit coagulation. Brain et al. [14], in a meta-analysis of non-anticoagulation factors related to filter life, suggested that patient condition, vascular access, CRRT circuit, and operator influence filter life, with a mean filter life of 21.92 h in the included studies (n = 7,502, SD = 10.89). Implementing a bubble trap can effectively capture and eliminate air bubbles, thereby minimizing bubbles in the circuit, reducing blood contact with air, and reducing the risk of coagulation [37]. Studies suggested that the filter life of CVVHD was significantly longer than that of CVVH [38, 39]. One study found that the filter lifespan in prefilter CVVH was 79% of that noted in CVVHD. The median duration until filter replacement was 21.8 (11.4–45.3) hours for prefilter CVVH, whereas it extended to 26.6 (13.0–63.5) hours for CVVHD [38]. Because of the pre-dilution in CVVH, the FF calculation formula gives different FF values for CVVH and CVVHD even when the ultrafiltration rate and plasma flow are the same. CRRT treatment mode has been shown to impact filter life. Insufficient anticoagulation, high FF, and damaged vascular access are key factors contributing to shortened filter life [1]. In the operation and management of CRRT circuits, many other factors are also considered important determinants of circuit life, including the location of vascular access and the caregiver’s ability to prepare the circuit and monitor its function [17].
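The effect of pre-dilution on FF can be illustrated with a commonly used formulation in which FF is the total ultrafiltration rate divided by the sum of plasma flow and pre-dilution flow. The sketch below assumes that formulation; it is not taken from this study, and the function name, parameter names, and example prescription are hypothetical.

```python
def filtration_fraction(
    blood_flow_ml_min: float,   # blood pump rate
    hematocrit: float,          # as a fraction, e.g. 0.27
    pre_dilution_ml_h: float,   # replacement fluid infused before the filter
    post_dilution_ml_h: float,  # replacement fluid infused after the filter
    net_uf_ml_h: float,         # net fluid removal
) -> float:
    """FF = total ultrafiltration / (plasma flow + pre-dilution flow).

    Assumed formulation; dialysate flow is excluded because it does not
    cross the membrane by convection.
    """
    plasma_flow_ml_h = blood_flow_ml_min * 60 * (1 - hematocrit)
    total_uf_ml_h = pre_dilution_ml_h + post_dilution_ml_h + net_uf_ml_h
    return total_uf_ml_h / (plasma_flow_ml_h + pre_dilution_ml_h)


# Same total ultrafiltration (1,200 mL/h) and plasma flow, different dilution site.
ff_pre = filtration_fraction(150, 0.27, 1000, 0, 200)
ff_post = filtration_fraction(150, 0.27, 0, 1000, 200)
```

Under these assumptions `ff_pre` is lower than `ff_post`, because pre-dilution fluid adds to the flow entering the filter and so dilutes the blood before ultrafiltration takes place.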
Research on predictive models for the risk of circuit coagulation in anticoagulant-free CRRT is limited. Zhang et al. [40] included 170 patients receiving CVVH without anticoagulants, including 90 patients with a filter life of less than 24 h, and found through logistic regression analysis that platelet, uric acid, D-dimer, mean arterial pressure, activated partial thromboplastin time, ALP, and urea nitrogen were correlated with filter life; their model achieved an AUC of 0.828. Logistic regression assumes a linear relationship between the predictors and the log-odds of the outcome, which limits its ability to handle nonlinear relationships. This study employs a suite of machine learning algorithms, each with unique strengths. Specifically, tree-based models like RF, XGBoost, GBDT, and CatBoost excel at uncovering nonlinear relationships and intricate feature interactions, thereby enhancing the identification of key predictors and improving model performance [41]. Our analysis covers a comprehensive set of parameters, including both patient status indicators and CRRT prescription parameters (blood flow rate, replacement fluid rate, and ultrafiltration rate), which collectively inform the calculation of the FF. Notably, the variables included in our model are routine clinical data that can be easily obtained and used to assess circuit coagulation risk prior to CRRT treatment.
There are several limitations to our study. First, the sample size we included was not large enough, so we adopted the tree-based ensemble learning method, which can achieve good generalization ability even in the case of limited data. Second, the data for the study were collected based on a retrospective cohort study, which means there could be unmeasured confounding factors or other factors that could potentially affect the results. Third, we did not adequately reduce the number of variables included, and the potential synergies emerging from the interplay of these variables remain underexplored. Future research endeavors should focus on assessing the combined impact of multiple variables, aiming to significantly boost the precision of predictive models. In addition, because the study was a single-center study, the results may not be generalizable to other healthcare facilities or different populations. Furthermore, although the model performs well, this study uses internal validation, and the validity and feasibility of the model need to be further evaluated in practical applications. Moreover, another limitation of our study is that we have not developed a user-friendly application for clinical practice. To address this, we have completed a 5-fold cross-validation of our model, affirming its internal reliability. Subsequently, we plan to conduct thorough external validation in the next phases of our project. The creation of a web-based or mobile application will proceed once the model’s robustness is confirmed through this external validation.
Conclusion
Our study developed a risk model for predicting CRRT circuit coagulation using multiple machine learning algorithms. We identified key features and demonstrated the potential of machine learning in personalizing anticoagulation and CRRT treatment prescriptions.
Statement of Ethics
Prior to the commencement of this study, it was granted approval by the Medical Ethics Committee of Xinqiao Hospital (Approval No. 2024-057-01), with the assessment that the risk posed to patients would not exceed a minimal level. Consequently, the committee authorized us to forego obtaining informed consent. Furthermore, it was ascertained that participation or nonparticipation would in no way negatively impact the personal interests of individuals involved.
Conflict of Interest Statement
There is no conflicting relationship for any author.
Funding Sources
The study received support from the National Natural Science Foundation of China (No. 82100783).
Author Contributions
Jinghong Zhao and Ting He designed the study; Liang Liu and Dashuang Liu conducted data collection; Liang Liu and Bo Liang conducted data analysis; Liang Liu wrote the manuscript; and Jinghong Zhao and Bo Liang revised the manuscript. The contributions of Liang Liu, Dashuang Liu, and Ting He to this research are equal.
Additional Information
Liang Liu, Dashuang Liu, and Ting He contributed equally to this work.
Data Availability Statement
For access to the raw data, due to ethical considerations and participant confidentiality, please send a formal request to the corresponding author.