Introduction: Prediction of outcomes following allogeneic hematopoietic cell transplantation (HCT) remains a major challenge. Machine learning (ML) is a computational approach that may facilitate the development of HCT prediction models. We sought to investigate the prognostic potential of multiple ML algorithms when applied to a large single-center allogeneic HCT database.

Methods: Our registry included 2,697 patients who underwent allogeneic HCT from January 1976 to December 2017. Forty-five pretransplant baseline variables were used to assess how well each ML algorithm predicted overall survival (OS), with performance measured by area under the curve (AUC). Pretransplant variables used in the EBMT ML study (Shouval et al., 2015) served as a benchmark for comparison.

Results: On the entire dataset, the random forest (RF) algorithm performed best (AUC = 0.71 ± 0.04), ahead of the second-best model, logistic regression (LR) (AUC = 0.69 ± 0.04; p < 0.001). Both algorithms achieved higher AUC scores using all 45 variables than using the limited set of variables examined by the EBMT study. Prediction of survival at 100 days post-HCT using RF on the full dataset separated patients into prognostic groups with different 2-year OS (p < 0.0001). We then examined the ML methods that allow identification of individually significant variables, including LR and RF. The variables identified as significant included matched related donors (HR = 0.49, p < 0.0001), increasing TBI dose (HR = 1.60, p = 0.006), increasing recipient age (HR = 1.92, p < 0.0001), higher baseline Hb (HR = 0.59, p = 0.0002), and increased baseline FEV1 (HR = 0.73, p = 0.02), among others.

Conclusion: The application of multiple ML techniques to single-center allogeneic HCT databases warrants further investigation and may provide a useful tool to identify variables with prognostic potential.
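
For readers less familiar with the metric, the AUC values reported above can be read as the probability that a model assigns a higher risk score to a randomly chosen patient who had the event than to a randomly chosen patient who did not. The following Python sketch illustrates that definition only. The function name `auc` and the toy labels and scores are hypothetical, and it is not the study's code or data.

```python
def auc(labels, scores):
    """Probability that a random positive case outranks a random negative one.

    labels: 1 for patients with the event, 0 for patients without it.
    scores: model-predicted risk, higher meaning more likely to have the event.
    Tied scores count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: four patients, two with the event (label 1).
labels = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
print(auc(labels, scores))  # 3 of the 4 positive/negative pairs are ranked correctly -> 0.75
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which gives some context for the gap between 0.71 and 0.69 reported for RF and LR.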

Styczyński J, Tridello G, Koster L, Iacobelli S, van Biezen A, van der Werf S, et al. Death after hematopoietic stem cell transplantation: changes over calendar year time, infections and associated factors. Bone Marrow Transplant. 2020;55(1):126–36.
Wingard JR, Majhail NS, Brazauskas R, Wang Z, Sobocinski KA, Jacobsohn D, et al. Long-term survival and late deaths after allogeneic hematopoietic cell transplantation. J Clin Oncol. 2011;29(16):2230–9.
Bacigalupo A, Sormani MP, Lamparelli T, Gualandi F, Occhini D, Bregante S, et al. Reducing transplant-related mortality after allogeneic hematopoietic stem cell transplantation. Haematologica. 2004;89(10):1238–47.
Gratwohl A, Stern M, Brand R, Apperley J, Baldomero H, de Witte T, et al. Risk score for outcome after allogeneic hematopoietic stem cell transplantation: a retrospective analysis. Cancer. 2009;115(20):4715–26.
Sorror ML, Maris MB, Storb R, Baron F, Sandmaier BM, Maloney DG, et al. Hematopoietic cell transplantation (HCT)-specific comorbidity index: a new tool for risk assessment before allogeneic HCT. Blood. 2005;106(8):2912–9.
Vaughn JE, Storer BE, Armand P, Raimondi R, Gibson C, Rambaldi A, et al. Design and validation of an augmented hematopoietic cell transplantation-comorbidity index comprising pretransplant ferritin, albumin, and platelet count for prediction of outcomes after allogeneic transplantation. Biol Blood Marrow Transplant. 2015;21(8):1418–24.
Sorror ML, Storb RF, Sandmaier BM, Maziarz RT, Pulsipher MA, Maris MB, et al. Comorbidity-age index: a clinical measure of biologic age before allogeneic hematopoietic cell transplantation. J Clin Oncol. 2014;32(29):3249–56.
Sorror ML, Logan BR, Zhu X, Rizzo JD, Cooke KR, McCarthy PL, et al. Prospective validation of the predictive power of the hematopoietic cell transplantation comorbidity index: a Center for International Blood and Marrow Transplant Research Study. Biol Blood Marrow Transplant. 2015;21(8):1479–87.
Wang HT, Chang YJ, Xu LP, Liu DH, Wang Y, Liu KY, et al. EBMT risk score can predict the outcome of leukaemia after unmanipulated haploidentical blood and marrow transplantation. Bone Marrow Transplant. 2014;49(7):927–33.
Versluis J, Labopin M, Niederwieser D, Socie G, Schlenk RF, Milpied N, et al. Prediction of non-relapse mortality in recipients of reduced intensity conditioning allogeneic stem cell transplantation with AML in first complete remission. Leukemia. 2015;29(1):51–7.
Nakaya A, Mori T, Tanaka M, Tomita N, Nakaseko C, Yano S, et al. Does the hematopoietic cell transplantation specific comorbidity index (HCT-CI) predict transplantation outcomes? A prospective multicenter validation study of the Kanto Study Group for Cell Therapy. Biol Blood Marrow Transplant. 2014;20(10):1553–9.
Deo RC. Machine learning in medicine. Circulation. 2015;132(20):1920–30.
Taati B, Snoek J, Aleman D, Ghavamzadeh A. Data mining in bone marrow transplant records to identify patients with high odds of survival. IEEE J Biomed Health Inform. 2014;18(1):21–7.
Koyuncugil AS, Ozgulbas N. Donor research and matching system based on data mining in organ transplantation. J Med Syst. 2010;34(3):251–9.
Shouval R, Labopin M, Bondi O, Mishan-Shamay H, Shimoni A, Ciceri F, et al. Prediction of allogeneic hematopoietic stem-cell transplantation mortality 100 days after transplantation using a machine learning algorithm: a European Group for Blood and Marrow Transplantation Acute Leukemia Working Party retrospective data mining study. J Clin Oncol. 2015;33(28):3144–51.
Gupta V, Braun TM, Chowdhury M, Tewari M, Choi SW. A systematic review of machine learning techniques in hematopoietic stem cell transplantation (HSCT). Sensors. 2020;20(21):6100.
Luft T, Benner A, Terzer T, Jodele S, Dandoy CE, Storb R, et al. EASIX and mortality after allogeneic stem cell transplantation. Bone Marrow Transplant. 2020;55(3):553–61.
Mrózek K, Harper DP, Aplan PD. Cytogenetics and molecular genetics of acute lymphoblastic leukemia. Hematol Oncol Clin North Am. 2009;23(5):991–1010.
Baliakas P, Jeromin S, Iskas M, Puiggros A, Plevova K, Nguyen-Khac F, et al. Cytogenetic complexity in chronic lymphocytic leukemia: definitions, associations, and clinical impact. Blood. 2019;133(11):1205–16.
Campbell LJ. Cancer cytogenetics: methods and protocols. Methods Mol Biol. 2011;730:1–2.
Patnaik MM, Tefferi A. Cytogenetic and molecular abnormalities in chronic myelomonocytic leukemia. Blood Cancer J. 2016;6(2):e393.
Tefferi A, Nicolosi M, Mudireddy M, Lasho TL, Gangat N, Begna KH, et al. Revised cytogenetic risk stratification in primary myelofibrosis: analysis based on 1002 informative patients. Leukemia. 2018;32(5):1189–99.
Rajan AM, Rajkumar SV. Interpretation of cytogenetic results in multiple myeloma for clinical practice. Blood Cancer J. 2015;5(10):e365.
Kluin P, Schuuring E. Molecular cytogenetics of lymphoma: where do we stand in 2010? Histopathology. 2011;58(1):128–44.
Deeg HJ, Scott BL, Fang M, Shulman HM, Gyurkocza B, Myerson D, et al. Five-group cytogenetic risk classification, monosomal karyotype, and outcome after hematopoietic cell transplantation for MDS or acute leukemia evolving from MDS. Blood. 2012;120(7):1398–408.
Moons KG, Altman DG, Reitsma JB, Ioannidis JPA, Macaskill P, Steyerberg EW, et al. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): explanation and elaboration. Ann Intern Med. 2015;162(1):W1–73.
Chao CM, Yu YW, Cheng BW, Kuo YL. Construction the model on the breast cancer survival analysis use support vector machine, logistic regression and decision tree. J Med Syst. 2014;38(10):106–7.
Kotsiantis SB, Zaharakis I, Pintelas P, et al. Supervised machine learning: a review of classification techniques. Emerging Artif Intell Appl Comput Eng. 2007;160:3–24.
Khalilia M, Chakraborty S, Popescu M. Predicting disease risks from highly imbalanced data using random forest. BMC Med Inform Decis Mak. 2011;11:51.
Chen T, He T, Benesty M, et al. Xgboost: extreme gradient boosting. R Package Version 0.4-2. 2015;1:1–4.
Bouckaert RR, Frank E. Evaluating the replicability of significance tests for comparing learning algorithms. Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer; 2004. p. 3–12.
Gratwohl A. The EBMT risk score. Bone Marrow Transplant. 2012;47(6):749–56.
Chee L, Tacey M, Lim B, Lim A, Szer J, Ritchie D. Pre-transplant ferritin, albumin and haemoglobin are predictive of survival outcome independent of disease risk index following allogeneic stem cell transplantation. Bone Marrow Transplant. 2017;52(6):870–7.
Hamadani M, Khanal M, Ahn KW, Litovich C, Chow VA, Eghtedar A, et al. Higher total body irradiation dose intensity in fludarabine/TBI-based reduced-intensity conditioning regimen is associated with inferior survival in non-Hodgkin lymphoma patients undergoing allogeneic transplantation. Biol Blood Marrow Transplant. 2020;26(6):1099–105.
Eisenberg L, Brossette C, Rauch J, Grandjean A, Ottinger H, et al.; XplOit consortium. Time-dependent prediction of mortality and cytomegalovirus reactivation after allogeneic hematopoietic cell transplantation using machine learning. Am J Hematol. 2022;97(10):1309–23.
Iwasaki M, Kanda J, Arai Y, Kondo T, Ishikawa T, Ueda Y, et al. Establishment of a predictive model for GVHD-free, relapse-free survival after allogeneic HSCT using ensemble learning. Blood Adv. 2022;6(8):2618–27.
Tang S, Chappell GT, Mazzoli A, Tewari M, Choi SW, Wiens J. Predicting acute graft-versus-host disease using machine learning and longitudinal vital sign data from electronic health records. JCO Clin Cancer Inform. 2020;4:128–35.
Arai Y, Kondo T, Fuse K, Shibasaki Y, Masuko M, Sugita J, et al. Using a machine learning algorithm to predict acute graft-versus-host disease following allogeneic transplantation. Blood Adv. 2019;3(22):3626–34.
Shouval R, Bondi O, Mishan H, Shimoni A, Unger R, Nagler A. Application of machine learning algorithms for clinical predictive modeling: a data-mining approach in SCT. Bone Marrow Transplant. 2014;49(3):332–7.