Abstract
Introduction: Machine learning can enable the development of predictive models that incorporate multiple variables for a systems approach to organ allocation. We explored the use of a Bayesian Belief Network (BBN) to determine whether a predictive model of graft survival can be derived from pretransplant variables. Our hypothesis was that pretransplant donor and recipient variables, when considered together as a network, add incremental value to the classification of graft survival.
Methods: We performed a retrospective analysis of 5,144 randomly selected patients (age ≥18, deceased-donor kidney only, first-time recipients) from the United States Renal Data System database for the years 2000 and 2001. Using this dataset, we developed a machine-learned BBN that functions as a pretransplant organ-matching tool.
Results: A network of 48 clinical variables was constructed and externally validated using an additional 2,204 patients of matching demographic characteristics. This model was able to predict graft failure within the first year and within 3 years (sensitivity 40%; specificity 80%; area under the curve, AUC, 0.63). Recipient BMI, gender, race, and donor age were amongst the pretransplant variables most strongly associated with outcome. A 10-fold internal cross-validation showed similar results for 1-year (sensitivity 24%; specificity 80%; AUC 0.59) and 3-year (sensitivity 31%; specificity 80%; AUC 0.60) graft failure.
Conclusion: We found recipient BMI, gender, race, and donor age to be influential predictors of outcome, while wait time and human leukocyte antigen matching were much less associated with outcome. The BBN approach enabled us to examine variables from a large database and develop a robust predictive model.
Introduction
Renal transplantation has emerged as the definitive treatment for end-stage renal disease, yet the waiting list continues to grow. The expansion of the wait list far exceeds the number of available donor organs, contributing to the stress on the allocation system [1]. In 2007, approximately 72,000 patients were listed with the United Network for Organ Sharing (UNOS), with only 17,513 receiving transplants, a 3% decrease from the previous year. Of those patients transplanted with deceased-donor grafts, approximately 10% of the grafts will fail in the first year, with an additional 32% failing at 5 years and 61% at 10 years, returning those patients to the wait list [2]. In an effort to bridge this gap, we are relying on extended criteria donors as well as donation after cardiac death. With the use of a greater number of these grafts, our ability to predict graft failure becomes critical to maximize donation to the most suitable recipient and to minimize the flow of patients returning to the already burdened wait list. We hypothesized, and found, that pretransplant donor and recipient variables, when considered together as a network, add incremental value to the classification of graft survival. This work gives transplant surgeons an objective tool for pairing donor organs with appropriate recipients to optimize outcomes.
As evidence-based medicine becomes the standard of care, clinicians look towards prognostic tools to assist in decision making [3]. While there are several publications on various models to predict allograft survival, these either rely on pre- and postoperative variables, or use only a handful of preoperative variables for model functionality. Some of these models include nomograms, neural networks and tree-modeling, with positive predictive values for graft survival of 43.5, 82.1 and 76%, respectively [3,4,5,6]. However, none of these has yet been implemented routinely.
Bayesian statistics is well suited to the analysis of large numbers of variables to predict outcomes. Although its foundations were laid in the 18th century, advances in computing power have made it practical today. Bayesian methodology has been used to predict survival in liver transplant patients: using pretransplant variables, the authors were able to predict 90-day survival with a positive predictive value of 91% and an area under the curve (AUC) of 0.681 [7]. This approach has not yet been applied to outcome prediction in renal allografts. Unlike traditional or frequentist statistical methods, Bayesian statistics lends itself to use with large databases, can tolerate missing values and incomplete variables, and can graphically describe the probability distributions of outcomes [8,9]. In other words, this type of statistical analysis allows for the use of a very large number of variables, and shows not only the relationship between each variable and the targeted outcome, but also the contribution of inter-variable relationships to the probability of each outcome.
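As a conceptual illustration of how such a network updates a prediction when evidence is entered, the minimal sketch below applies Bayes' rule to a single hypothetical binary pretransplant variable. The probabilities are invented for illustration only and are not taken from our model or from USRDS data.

```python
# Illustrative only: a two-node network (pretransplant variable -> outcome)
# with made-up probabilities, showing how observed evidence updates a prior.

# Hypothetical prior probability of graft failure within 1 year
p_fail = 0.10

# Hypothetical conditional probabilities of observing "donor age > 50"
p_older_given_fail = 0.60      # P(donor age > 50 | graft fails)
p_older_given_survive = 0.35   # P(donor age > 50 | graft survives)

# Bayes' rule: P(failure | older donor)
p_older = p_older_given_fail * p_fail + p_older_given_survive * (1 - p_fail)
p_fail_given_older = p_older_given_fail * p_fail / p_older

print(f"Prior P(failure) = {p_fail:.3f}")
print(f"Posterior P(failure | donor age > 50) = {p_fail_given_older:.3f}")
```

A full BBN chains many such conditional distributions together, so the posterior for an outcome node reflects the combined evidence entered at every connected pretransplant node.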
Utilizing patient information from the United States Renal Data System (USRDS) [2,10], we used a machine-learning tool to generate a minimized Bayesian network that accurately predicts graft failure 1 and 3 years after transplantation based solely on preoperative variables. In this model, donor age, recipient BMI and gender were amongst the pretransplant variables most strongly associated with outcome, variables that are not currently incorporated in allocation schemes.
Materials and Methods
Data were obtained from the USRDS database (2004) after approval from the institutional review board of the National Naval Medical Center (protocol NNMC.2010.0014; Office of Naval Research work unit number 604771N.0933.001.A0604).
A total of 1,266,494 cases were screened for data analysis (SPSS 16 and 19, SPSS Inc., Chicago, Ill., USA). Data were curated for accuracy and completeness. Inclusion criteria were first-time graft, 18 years of age or older, and deceased-donor kidney-only recipient. The cohort was further narrowed by selecting transplants performed in 2000 and 2001. All cases in which we were unable to determine the outcome were removed. Thus, 7,348 patients remained, of whom 5,144 were randomly selected for model construction and 2,204 for validation (fig. 1). A total of 793 pre- and posttransplant variables were extracted from the database and ultimately narrowed to 52 variables based on clinical expertise, global modeling, and exclusion of variables collected during follow-up appointments that did not directly describe outcome. This process minimizes the complexity of the model and elucidates the value of the variables that power outcome prediction. However, it may also eliminate variables previously considered individually associated with graft survival; this is a limitation of our study.
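For readers who wish to reproduce a comparable cohort selection, the sketch below shows how the inclusion criteria and the random training/validation split could be applied programmatically. The file name and column names are placeholders, not actual USRDS field names.

```python
import pandas as pd

# Hypothetical sketch of the cohort selection step; column names are
# placeholders for the corresponding USRDS fields.
df = pd.read_csv("usrds_transplants.csv")

cohort = df[
    (df["age"] >= 18)
    & (df["donor_type"] == "deceased")
    & (df["organ"] == "kidney_only")
    & (df["graft_number"] == 1)           # first-time recipients
    & (df["tx_year"].isin([2000, 2001]))
    & (df["outcome_known"])               # drop cases with indeterminate outcome
]

# Random split: 5,144 records for model construction, the remainder withheld
# for external validation.
train = cohort.sample(n=5144, random_state=42)
validate = cohort.drop(train.index)
```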
Fig. 1. Cohort selection scheme for model construction and validation. The number of records (n) remaining after each step of the selection process based on our inclusion criteria is shown.
All data processing was done with either SPSS (v. 16-19) or Excel (2007, Microsoft Corp., Redmond, Wash., USA). Outcomes of interest were 1, 3, and >3 years’ survival. Graft survival was calculated as the time from transplant date to graft failure date. In the absence of a graft failure date, surrogates for failure were used (a return to maintenance dialysis, a second transplant, or recipient date and cause of death). Grafts that did not fail within the confines of our dataset were assigned a survival length using their latest follow-up date as a report of minimum survival. If a patient did not have a follow-up date reported or was lost to follow-up without a report of graft failure, that patient was removed from analysis as stated above.
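The logic for deriving survival time with surrogate failure dates can be sketched as below. Column names are placeholders (assumed to be parsed as datetimes), not actual USRDS field names.

```python
import pandas as pd

# Hypothetical sketch of the graft survival calculation with surrogate
# failure dates; all column names are illustrative placeholders.
def graft_survival_days(row):
    # Prefer the reported graft failure date; otherwise fall back to
    # surrogates (return to maintenance dialysis, retransplant, or death).
    for col in ["graft_failure_date", "dialysis_return_date",
                "retransplant_date", "death_date"]:
        if pd.notna(row[col]):
            return (row[col] - row["transplant_date"]).days, True   # failed
    # No failure recorded: latest follow-up gives a minimum survival length.
    if pd.notna(row["latest_followup_date"]):
        return (row["latest_followup_date"] - row["transplant_date"]).days, False
    # No follow-up at all: excluded from analysis.
    return None, None
```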
FasterAnalytics (DecisionQ Corp., Washington, D.C., USA) was used to apply a set of heuristics to generate hypothetical models with different conditional independence assumptions on 5,144 of the 7,348 included patients [11]. The resulting Bayesian Belief Networks (BBNs) encode the joint probability distribution of all variables in the clinical dataset as a directed network in which parent-child relationships represent conditional independence assumptions.
The current network model was constructed using a minimum description length (MDL) gain (a weighting of the MDL, or Bayesian information criterion, that trades off goodness of fit against model complexity) of 0.5. An MDL gain of 1.0 leads to a relatively equal weighting of representation of the known data and complexity to yield a robust model. Continuous variables were divided into two bins based on equal areas under the distribution curves; the use of three bins was also investigated and did not result in added benefit. Binning continuous variables has the benefit of reducing ‘noise’ in the data, but also loses information; this is a limitation of our study. An additional bin was included for missing data where appropriate.
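Splitting a continuous variable into two bins of equal area under its empirical distribution is equivalent to equal-frequency (median) binning. A minimal sketch of this step, including a separate bin for missing values, is shown below; the sample values are illustrative only.

```python
import pandas as pd

# Hypothetical sketch of two-bin discretization by equal areas under the
# empirical distribution (equal-frequency binning at the median), with a
# separate bin for missing values.
bmi = pd.Series([22.0, 29.5, None, 31.2, 24.8, 27.1, None, 35.0])

binned = pd.qcut(bmi, q=2, labels=["low", "high"])            # two equal-area bins
binned = binned.cat.add_categories("missing").fillna("missing")  # missing-data bin
print(binned.value_counts())
```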
The network was queried to provide estimates for posterior probabilities given a priori knowledge, and model accuracy was validated using data from 2,204 patients withheld from the initial training dataset. The probability of graft survival was calculated using only variables whose values could be known prior to transplantation, ignoring all posttransplant variables. Model performance was evaluated using receiver operating characteristic (ROC) curves. The AUC was calculated as a measure of classification accuracy, with 0.5 representing random chance (i.e. the model is right just as often as it is wrong) and 1.0 (or 0.0) indicating perfect classification of both the positive and negative outcomes.
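The mechanics of this evaluation step can be sketched as follows; here the labels and predicted probabilities are random placeholders rather than the actual withheld validation cohort.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical sketch of the validation metric: y_true marks observed graft
# failure in the withheld cohort and y_prob is the network's predicted
# failure probability from pretransplant evidence only. Values are random
# placeholders for illustration.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
y_prob = np.clip(0.1 * y_true + rng.normal(0.1, 0.05, size=200), 0, 1)

auc = roc_auc_score(y_true, y_prob)
fpr, tpr, thresholds = roc_curve(y_true, y_prob)
print(f"AUC = {auc:.2f}")   # 0.5 = chance, 1.0 = perfect discrimination
```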
A 10-fold internal cross-validation was used to assess robustness. The model was then externally validated with the additional 2,204 patients from the same time period to evaluate functionality. Finally, an additional cohort of patients randomly selected from the years 1997, 2002, and 2003, and meeting the same selection criteria as the model training cohort, was used as a test set to evaluate model robustness with respect to transplant year.
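The structure of a 10-fold cross-validation is sketched below. A naive Bayes classifier stands in for the machine-learned BBN (which was built with FasterAnalytics), and the data are random placeholders rather than USRDS records; only the fold-wise training and scoring pattern is intended to be illustrative.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.naive_bayes import BernoulliNB   # stand-in for the BBN
from sklearn.metrics import roc_auc_score

# Hypothetical sketch of 10-fold internal cross-validation.
rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(1000, 10))   # binned pretransplant variables (placeholder)
y = rng.integers(0, 2, size=1000)         # 1 = graft failure (placeholder)

aucs = []
for train_idx, test_idx in StratifiedKFold(
        n_splits=10, shuffle=True, random_state=1).split(X, y):
    model = BernoulliNB().fit(X[train_idx], y[train_idx])
    p_fail = model.predict_proba(X[test_idx])[:, 1]
    aucs.append(roc_auc_score(y[test_idx], p_fail))

print(f"Mean AUC over 10 folds: {np.mean(aucs):.2f}")
```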
As a demonstration of utility, the model was used to estimate the number of grafts that could have been reallocated for improved predicted survival. Two Organ Procurement and Transplantation Network (OPTN) regions were evaluated: OPTN 2 (Washington, D.C., New Jersey, Maryland, West Virginia, and Pennsylvania), with a long wait time, and OPTN 6 (Hawaii, Washington, Oregon, Alaska, Montana, and Idaho), with a short wait time. For reference, based on OPTN data as of April 29, 2011, for kidney registrations listed 1999–2004, OPTN 2 had a median waiting time of 1,357 days (Caucasian recipients; 2003–2004) and OPTN 6 had a median waiting time of 831 days (accessed via optn.transplant.hrsa.gov on May 5, 2011).
Finally, the donor information from a graft that failed within the first year was applied to other recipients in this same 2-year cohort to demonstrate how the model may be used as an allocation tool.
Results
Demographics
The donor and recipient characteristics for key variables from the 5,144-patient model training dataset and the 2,204-patient external validation dataset were compared. Both populations were well matched for donor and recipient age, gender, race, and BMI (p > 0.05). The recipients were well matched for time on dialysis and graft survival (p > 0.05; table 1).
Machine Learning Yields a Robust and Predictive Model
The internal 10-fold cross-validation confirmed model robustness, as measured by AUC, for both 1-year and 3-year graft failure (0.59 and 0.60, respectively). This exercise yielded a sensitivity and specificity for graft failure of 24.3 and 83.4% at 1 year and 30.6 and 80.2% at 3 years after transplantation, using a threshold probability for a positive test of 8.35% (1-year failure) or 14.3% (3-year failure).
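Converting the network's predicted failure probability into a positive or negative test at such a threshold, and computing sensitivity and specificity from the result, can be sketched as below; the inputs are placeholders for the predicted probabilities and observed outcomes.

```python
import numpy as np

# Hypothetical sketch of how a probability threshold turns predicted failure
# probabilities into a binary test, from which sensitivity and specificity
# follow.
def sensitivity_specificity(y_true, y_prob, threshold):
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)   # 1 = predicted failure
    y_true = np.asarray(y_true)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return tp / (tp + fn), tn / (tn + fp)

# e.g. using the 1-year threshold reported above:
# sens, spec = sensitivity_specificity(y_true, y_prob, threshold=0.0835)
```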
The external validation also demonstrated predictive accuracy for identifying 1-year and 3-year graft failure (AUC 0.63 for both). The model successfully identified 55 of the 138 grafts that failed within 1 year under current allocation practices. ROC curves are shown superimposed in figure 2. Using the thresholds mentioned above for a positive test for graft failure, our model had a sensitivity of 39.9% and a specificity of 79.9%. This was maintained for 3-year graft failure, with a sensitivity of 39.8% and a specificity of 80.2%. Positive and negative predictive values differed greatly between a balanced test cohort, which has an equal number of cases in each graft failure class, and one that is representative of the incidence rate but overrepresents graft survival (data not shown). The latter was pursued in order to present the most conservative model: one that considers the largest cohort available and demonstrates the promise of a machine-learning approach to donor-recipient matching.
Fig. 2. ROC curves for external and internal model validations. The internal cross-validations (dashed lines) are shown overlaid with external validations (solid lines) for predictions of graft failure within 1 year (black) or 3 years (gray) to illustrate the similarity between the robustness of the model and its accuracy.
BBN Demonstrated that Recipient BMI, Gender, Race, and Donor Age Drive Outcome
The BBN consisted of 48 nodes, with 37 nodes representing pretransplant variables, 12 representing postoperative variables, and 3 representing the outcomes of 1, 3, and >3 years’ survival (table 2; fig. 3). The model showed recipient BMI, gender, race, and donor age to be the pretransplant variables most strongly associated with survival, as illustrated by their position as primary or secondary nodes of the graft survival nodes in the model (fig. 3). Of those nodes, one (donor age) was classified as a parent of an outcome node, while the remaining nodes either shared a child with, or were grandparents or children of, the outcome nodes. Although time on dialysis (a surrogate for time on list) and human leukocyte antigen (HLA) matching are both associated with graft survival, as can be seen in the network, they are not as closely related to outcome as the aforementioned variables (fig. 3).
Fig. 3. Bayesian network structure. Each box is a node and contains the probability distribution of a particular variable (see table 2). Lines indicate the connectivity of the conditional dependences of these distributions. A few variables of interest are shown as the expanded bar graph of the underlying probability distribution. Posttransplant variables were ignored when calculating the predictions described in the text.
Model Performance Is Affected by Sampling Time
The effect of sampling time frame on model performance was tested using data from 4,422 patients from 1997, 3,615 from 2002, and 423 from 2003, which were the total numbers of records meeting the same selection criteria as the 2000–2001 cohort. The performance for 1-year failure, as measured by AUC, was 0.59 for 1997, 0.597 for 2002, and 0.50 for 2003. The predictive performance for 3-year failure, as measured by AUC, was 0.59 for 1997 and 0.60 for 2002. Three-year survival was not evaluated for 2003 because the USRDS data obtained for our investigation ended in January 2005.
Model-Based Allocation Avoids Graft Failure per OPTN
The model was applied to two OPTN regions to demonstrate the potential for region-specific graft recovery. For the years 2000 and 2001 in OPTN 2 (New Jersey, Pennsylvania, Maryland, District of Columbia, Delaware, and West Virginia), 890 transplants meeting the selection criteria were performed. Of those, 77 failed within the first year; using the same probability thresholds as the validation exercises, the model predicted 37 of those as failures, with a sensitivity of 48.1% and a specificity of 70.2%. This equates to a potential 4% graft reallocation. One hundred and twenty-four grafts failed in the first 3 years. The model predicted 60 of those failures, with a sensitivity of 48.4% and a specificity of 70.5%. When applied to OPTN 6 (Washington, Oregon, Idaho, Montana, and Hawaii), with a shorter wait time, the results were similar. Of the 279 grafts in this cohort transplanted in 2000 or 2001, 19 failed in the first year. The model predicted 4 of those failures, suggesting that an additional 1.43% of available organs (or 21.1% of those failures) should be reallocated to another candidate recipient.
To demonstrate this, a patient whose graft failed in the first year after transplant was selected randomly from the OPTN 2 data. The donor information was then applied to all recipients in the above-described cohort (n = 890). Seventy-seven grafts in this sub-cohort failed within the first year, 30 of which the model predicted would survive had that recipient received the example organ; interestingly, this ‘reallocation’ was predicted to lead to 3-year survival for 25 of these new donor-recipient pairs, with graft failure probabilities ranging from 0.105 to 0.143. Additionally, 51% of the remaining 813 recipients, all of whose grafts survived the first year, were also predicted to have survived with the example organ with >0.916 probability.
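The structure of this reallocation exercise can be sketched as follows. The function `predict_failure_probability` is a hypothetical placeholder for a query against the trained BBN, not an actual FasterAnalytics API, and the threshold is the same positive-test cutoff described above.

```python
# Hypothetical sketch: one donor's pretransplant profile is paired with every
# recipient in the regional cohort and the model is queried for the
# probability of 1-year graft failure.

FAILURE_THRESHOLD_1YR = 0.0835   # same positive-test threshold as above

def simulate_reallocation(donor, recipients, predict_failure_probability):
    """Return recipients predicted to survive 1 year with this donor organ."""
    predicted_survivors = []
    for recipient in recipients:
        # Merge donor and recipient pretransplant evidence for the query.
        p_fail = predict_failure_probability({**donor, **recipient})
        if p_fail < FAILURE_THRESHOLD_1YR:
            predicted_survivors.append(recipient)
    return predicted_survivors
```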
With regard to 3-year graft survival, the model identified 18 recipients whose grafts survived longer than 1 year but failed within 3 years, and which might have survived beyond 3 years with reallocation of the example organ (<0.138 failure probability). The 47 failed allografts that were also classified as failures within the first year with this hypothetical donor continued to be identified as failures at 3 years. The remaining surviving grafts were predicted to have also survived with this organ.
Discussion
Currently, kidneys are allocated based on HLA matching and time on list. Nomograms, neural networks, and decision trees have become popular methods for creating more objective ways to predict transplant outcomes [3,5,12,13,14]. This is the first study to use a BBN to predict outcomes in deceased-donor kidney transplantation, and it places donor age as one of the most important pretransplant predictors of outcome [15,16,17]. Recipient BMI, gender, and race are also influential predictors of outcome in the model, while wait time and HLA matching are much less associated with outcome. This is illustrated in the model graphic: donor age is a primary variable of (i.e. shares an arc with) graft survival greater than 3 years (graft survival over3yr), while recipient gender (RSEX) and BMI (BMI) are secondary variables to graft survival of at least 1 year (graft survival 1yr). The influence of combinations of these factors on 1- and 3-year outcome can be seen explicitly in online supplementary table 1 (for all online suppl. material, see www.karger.com?doi=10.1159/000345552). The BBN weighs how the variables in the network, major or minor, influence each other to affect outcome, in contrast to other, more traditional nomograms. In essence, the BBN takes raw data in the form of individual probability distributions and refines them into a coherent network that accurately predicts renal transplant outcomes.
This model accurately predicts those donor-recipient matches that will have a poor 1-year outcome, as illustrated by the sensitivity of 40%. Furthermore, for those donor-recipient pairs that were already a good match, our model did not incorrectly over-predict failures, as seen by the high specificity of 80%. In other words, our model would be able to reclaim or reallocate two fifths of the renal allografts that may have been lost in the first year due to a less than ideal recipient selection. Another benefit of the Bayesian model is that, while individual variables such as recipient gender may not reliably predict outcome, these same variables, populating a network, can accurately predict graft outcomes.
While our model’s performance decreases slightly with time from transplant, it maintains a high survival predictive value (>87%) as well as a high specificity (>77%). As we used threshold values that would provide for at least 40% identification of failed grafts, we interpret this as correctly predicting those grafts that would fail while not incorrectly classifying good matches as failures. Although we are not able to capture all of the grafts that failed under the current allocation system, we are able to identify 40% of the poor matches for both 1-year and 3-year failures, potentially warranting reallocation to another qualified recipient. Even though the failure rate for cadaveric-donor transplants within the first few years is low, avoiding graft failure in an additional 40% of those that failed within the first year translates into ∼500 additional grafts annually and, thus, a potentially significant reduction in the number of recipients returning to the wait list.
Several efforts have been made to better predict transplant outcomes in order to better allocate organs. An early study, using data from the Collaborative Transplant Study and multivariate analysis, created a computer model to predict 1-year graft survival [18]. Another study, using an artificial neural network, resulted in a model with a positive predictive value for graft survival superior to that of a nomogram (82.1 vs. 43.5%) [3]. However, that study was limited to a single medical center. Yet another study used logistic regression and decision tree modeling to identify predictors of 3-year allograft survival using information from the UNOS database. This information was then used to create predictive models, which had a 76% positive predictive value for graft survival [4]. A similar study used tree-based modeling to predict 1, 3, 5, 7, and 10 years’ survival [5]; it showed good performance as measured by the area under the ROC curve. However, none of these approaches has yet been widely adopted for organ allocation [19].
The practical effect of using the BBN as a decision-making tool in renal allograft allocation may be multifold: less organ waste and reduced cold ischemic times. Currently, nearly 20% of all donor kidneys are discarded, the majority being of marginal quality [20]. However, with this model, the pretransplant characteristics of a particular donor and the proposed recipients can be compared for a prediction of outcome. In addition, as this model can be deployed in XML format, a center could enter the known donor characteristics into a web-based interface and compare the risk of failure based on each prospective recipient’s characteristics.
For example, for a kidney donated by a 44-year-old White male without a history of diabetes and with a BMI of 30, a 39-year-old White female recipient without diabetes and with a BMI of 22 (each of these values, or pieces of evidence, is entered directly into the model) is associated with an 83.45% probability of greater than 3 years’ graft survival. A 55-year-old Black male recipient with non-insulin-dependent diabetes and a BMI of 29 is associated with a 74.67% probability. Finally, a 39-year-old Black female recipient with non-insulin-dependent diabetes and a BMI of 32 is associated with a 72.5% probability of graft survival over 3 years.
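The worked example above could be expressed programmatically as evidence passed to such an interface, as sketched below. The function `query_graft_survival`, the variable names, and the ranking logic are hypothetical placeholders for the deployed model's query mechanism, not an actual API.

```python
# Hypothetical sketch of ranking candidate recipients for one donor organ by
# the model's predicted probability of >3-year graft survival.

donor = {"donor_age": 44, "donor_race": "White", "donor_sex": "M",
         "donor_diabetes": "no", "donor_bmi": 30}

candidates = [
    {"age": 39, "race": "White", "sex": "F", "diabetes": "no",    "bmi": 22},
    {"age": 55, "race": "Black", "sex": "M", "diabetes": "NIDDM", "bmi": 29},
    {"age": 39, "race": "Black", "sex": "F", "diabetes": "NIDDM", "bmi": 32},
]

def rank_candidates(donor, candidates, query_graft_survival):
    """Rank candidates by posterior probability of >3-year graft survival."""
    scored = [(query_graft_survival({**donor, **c}), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```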
There are clear limitations to using an administrative dataset such as the USRDS. We examined a total of 793 pre- and posttransplant variables, which were ultimately narrowed to 52 variables based on clinical expertise, global modeling, and exclusion of variables collected during follow-up appointments that did not directly describe outcome. During this complexity minimization process, we may have eliminated variables typically thought to have a bearing on outcome, such as the length of pretransplant dialysis and serum albumin.
Our model, with a sensitivity of approximately 40% and a specificity of 80% in predicting transplant outcome over the first 3 years, may not seem impressive. However, we have shown that the BBN may be superior to existing methods because it can handle incomplete variables and considers the importance of the combination of variables rather than the contribution of any single variable. Our modeling is one of many possible means of improving donor-recipient pairing. Future studies will have to compare various neural networks to predict transplant outcomes.
We propose using our model to augment the current allocation system. The current UNOS system would continue to be used to generate the ‘short list’ of candidate recipients matching a particular donor organ. Our model would then be applied as a ‘mathematical equation’ that uses the donor’s and recipients’ information to determine which match would result in the best long-term outcome. Our hypothesis is that the proposed method of matching may have the potential to save approximately 40% of the grafts that fail within their first year. However, the next step towards implementation is to test the BBN prospectively within a single organ procurement organization before moving to a nationwide evaluation.
Acknowledgements
The views expressed in this work are those of the authors and do not reflect the official policy of the Department of the Army, Department of the Navy, the Department of Defense or the US Government.
Additionally, this work was supported in part by Health Resources and Services Administration contract 234-2005-37011C. The content is the responsibility of the authors alone and does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the US Government.
We are military service members (or employees of the US Government). This work was prepared as part of our official duties. Title 17 USC 105 provides that ‘Copyright protection under this title is not available for any work of the United States Government’. Title 17 USC 101 defines a US Government work as a work prepared by a military service member or employee of the US Government as part of that person’s official duties.
This study was approved by the National Naval Medical Center Institutional Review Board in compliance with all Federal regulations governing the protection of human subjects. The NNMC IRB approved protocol number is NNMC.2010.0014, and the protocol title is ‘Bayesian Modeling of the United States Renal Data System Pre-Transplant Variables Accurately Predicts Graft Survival’.
The data reported here have been supplied by the United States Renal Data System (USRDS). The interpretation and reporting of these data are the responsibility of the author(s) and in no way should be seen as an official policy or interpretation of the US government.