Abstract
Background: Blood transfusion is a life-saving intervention for millions of recipients worldwide. Over the last 15 years, the advent of high-throughput, affordable omics technologies – including genomics, proteomics, lipidomics, and metabolomics – has allowed transfusion medicine to revisit the biology of blood donors, stored blood products, and transfusion recipients. Summary: Omics approaches have shed light on the genetic and non-genetic factors (environmental or other exposures) impacting the quality of stored blood products and the efficacy of transfusion events, as gauged by the current Food and Drug Administration guidelines (e.g., hemolysis and post-transfusion recovery for stored red blood cells). As a treasure trove of data accumulates, the implementation of machine learning approaches promises to revolutionize the field of transfusion medicine, and not only by advancing basic science. Indeed, computational strategies have already been used to perform high-content screenings of red blood cell morphology in microfluidic devices, generate in silico models of the erythrocyte membrane to predict deformability and bending rigidity, or design systems biology maps of the red blood cell metabolome to drive the development of novel storage additives. Key Message: In the near future, high-throughput testing of donor genomes via precision transfusion medicine arrays and metabolomics of all donated products will be able to inform the development and implementation of machine learning strategies that match, from vein to vein, donors, optimal processing strategies (additives, shelf life), and recipients, realizing the promise of personalized transfusion medicine.
Overview
In this review, we will summarize the most recent advances in the field of omics technologies and machine learning in transfusion. Owing to the author’s own limited expertise, this short review will not focus on the implementation of artificial intelligence in blood inventory management [1], recipient profiling [2], or the prediction of recipient comorbidities and outcomes in the emergency and acute care setting [3, 4]. Rather, we will focus on molecular aspects of blood storage, with an emphasis on packed red blood cell (RBC) products and the crossroads of omics characterization and deep learning strategies for the identification of markers of storage quality and transfusion outcomes [5]. The focus on RBCs is justified both by the disproportionately higher number of omics papers in the field of RBC storage biology compared to other products and, foremost, by the disproportionately higher number of packed RBC units transfused daily in the USA (∼29,000 units, as opposed to nearly 5,000 units of platelets and 6,500 units of plasma). While recent comprehensive reviews on omics and RBC storage are available in the literature [6, 7], the rapid evolution of the field has generated a novel, solid body of knowledge that complements and expands on the literature reviewed in those recent reports. As such, the main goal of this short review is to bridge that gap.
RBCs: More Than a Circulating Bag of Hemoglobin
RBCs are by far the most abundant cell in the human body: 25 × 10¹² out of 30 × 10¹² human cells are RBCs [8, 9]. The mature erythrocyte contains ∼250–270 million molecules of hemoglobin (Hb)/cell, which facilitate the transport of up to 1 billion molecules of oxygen/cell [10]. RBC function, specifically Hb’s capacity to bind and off-load oxygen, is tightly regulated by metabolic mechanisms. These mechanisms involve direct allosteric modulators such as 2,3-diphosphoglycerate (DPG) and adenosine triphosphate (ATP) [11, 12]. Using a combination of omics technologies and single cell measurements of oxygen kinetics via microfluidic devices and biophysics approaches, we recently identified a whole new series of correlates to the Hb capacity to bind and off-load oxygen, including the indirect modulators adenosine and sphingosine-1-phosphate (S1P), recently validated in mechanistic studies on high-altitude hypoxia [13‒18] and RBC storage [19].
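As a quick consistency check on the figures above, the following sketch multiplies the Hb copy number per cell by the number of oxygen-binding sites per Hb molecule. The value of four sites per Hb tetramer (one per heme) is standard biochemistry rather than a number stated in this review, and all variable names are illustrative.

```python
# Back-of-the-envelope check of the RBC figures quoted in the text.
HB_PER_RBC_RANGE = (250e6, 270e6)  # Hb molecules per cell, from the text
O2_SITES_PER_HB = 4                # assumption: one O2 per heme, four hemes per Hb tetramer
RBC_COUNT = 25e12                  # RBCs in the human body, from the text
TOTAL_CELLS = 30e12                # total human cells, from the text


def max_o2_per_cell(hb_per_cell: float, sites: int = O2_SITES_PER_HB) -> float:
    """Upper bound on O2 molecules carried by one fully saturated RBC."""
    return hb_per_cell * sites


low, high = (max_o2_per_cell(hb) for hb in HB_PER_RBC_RANGE)
rbc_fraction = RBC_COUNT / TOTAL_CELLS

print(f"O2 per cell at full saturation: {low:.2e} to {high:.2e}")
print(f"RBC share of all human cells: {rbc_fraction:.0%}")
```

Under these assumptions, the upper bound lands at roughly 1.0 to 1.1 billion oxygen molecules per cell, in line with the "up to 1 billion" figure, and RBCs account for about 83% of all human cells by number.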
While Hb facilitates oxygen delivery, Hb autoxidation promotes oxidant stress in the RBC, with Hb iron (66% of bodily iron is in RBCs) fueling radical-generating Fenton and Haber-Weiss reactions. Specifically, RBC Hb is an oxygen-dependent buffer for glutathione – the main small molecule antioxidant in the cell [20] – through beta chain cysteine 93, which regulates recycling of key antioxidant enzymes like peroxiredoxin 2 [21]. Since no new proteins can be synthesized by mature RBCs, owing to the lack of nuclei and organelles, during their ∼120 day lifespan, RBCs have to cope with oxidant stress through mechanisms independent of gene regulation or protein expression [22]. Therefore, RBCs represent an excellent cell model to investigate metabolic responses to oxidant damage [10], which is relevant not just for storage biology but also for related iatrogenic interventions (e.g., RBC responses to hypoxia and oxidant stress upon transfusion in unhealthy, heterologous recipients).
Despite the lack of nuclei and organelles and the overwhelming abundance of a single class of proteins (Hb represents 98% of the cytosolic proteome), RBCs are not as simple as previously thought. Recent studies [23‒25], including some from our group [26], have elucidated an unanticipated complexity of the RBC protein machinery. Despite Hb making up ∼90% of the total RBC proteome, more than 3,000 proteins are identified in RBCs [23, 26], including 77 transporters for 267 small molecules [27]. By combining tracing experiments with ¹³C- and ²H-labeled substrates and functional proteomics assays, we – among other groups – are showing one by one (e.g., carboxylic acid metabolism [28‒31], fatty acid desaturases [32], nitric oxide synthase [33]) that these enzymes are not inert remnants of maturation processes but are active in mature RBCs and catalyze reactions that were once thought to be exclusive to specific organs. Thus, we provocatively introduced the concept of RBCs as an organ with functions beyond O2 transport [10]. All the considerations above are relevant examples of how omics approaches can be used for basic science that unravels biology relevant to transfusion medicine, in that all the pathways mentioned above are critical regulators of stored RBC metabolism, as we will summarize below.
RBC Omics in Health and Disease Paves the Way for Mechanistic Understanding of the Storage Lesion
Altered RBC antioxidant capacity triggers intra- [34] or extra-vascular hemolysis [35, 36], resulting in excess circulating heme and iron [37], a phenomenon that underlies the etiology of many diseases in which oxidant stress plays a central role [37] and co-morbidities in transfusion recipients, such as acute lung injury, kidney dysfunction [38‒40], microbiome dysbiosis [41], or septic complications in patients infected by siderophilic bacteria [42]. As such, despite intrinsic limitations of animal models [43], comparative investigations of RBC metabolism in humans and animals – where alternative strategies to cope with such stress may have evolved to regulate hemolytic propensity – can further our understanding of the role of (transfused) RBCs in systems physiology and its pathological alterations, in addition to having critical direct implications for veterinary transfusion medicine [44‒47].
The identification of strategies to mitigate oxidant stress on RBCs holds immediate and critical biomedical implications. Indeed, RBC storage in the blood bank is a logistic necessity to make ∼110 million units/year available for blood transfusion, the most common in-hospital procedure worldwide after vaccination. However, as storage progresses, RBCs undergo a series of biochemical and morphological modifications that are mostly triggered by oxidant stress [6, 7]. An overview of how omics technologies helped elucidate the targets of storage-induced oxidant damage is provided below.
Omics in Transfusion Medicine
In the last 6 years, we and others have developed high-throughput proteomics and metabolomics methods – as rapid as 1 min per sample – to ensure feasibility of clinical omics studies [48‒51], which allowed rapid responses to emerging threats, such as the COVID-19 pandemic and the impact of SARS-CoV-2 infection on RBC biology [52‒59]. In parallel, a burgeoning literature has emerged focusing on omics applications to transfusion medicine [6, 7]. Early omics studies in transfusion medicine have investigated the impact of RBC storage duration in all currently licensed storage additive solutions (AS), including SAGM, AS-1, AS-3, AS-5, and PAGGSM, and in novel alkaline additives, such as AS-7, ESOL5, and PAGGGSM [29, 60‒65]. These studies revealed a significant impact of storage additives on the kinetics of the storage lesion, with alkaline additives better preserving energy metabolism by enhancing the activity of pH-sensitive rate-limiting enzymes of glycolysis (phosphofructokinase) and the Rapoport-Luebering shunt (bisphosphoglycerate mutase) [66]. Based on these studies, we and others identified metabolic markers of the storage lesion [67, 68], and correlated storage-induced metabolic changes to the gold standard markers of storage quality as per the Food and Drug Administration guidelines, i.e., storage hemolysis and post-transfusion recovery (PTR) [34, 69‒74].
Through murine models of blood storage developed by Dr. Zimring [75], we identified the ferrireductase STEAP3 as a genetic and mechanistic factor contributing to the poor post-transfusion recovery of stored RBCs from mouse strains like FVB but not C57BL/6 [76]. Specifically, lipidomics studies have shown that iron metabolism in stored RBCs is tied to an elevation in oxidant stress, ultimately triggering lipid peroxidation and initiating a process that closely resembles that of iron-induced non-apoptotic cell death, or ferroptosis [77]. Of note, STEAP3, also known as tumor suppressor p53-activated pathway-6 (TSAP6), is a target of p53 transcriptional activity [78]. Both p53 and STEAP3 regulate erythropoiesis and limit maturing erythroid cell iron content [79‒81], thus constraining substrate availability for one key reactant in Fenton chemistry. Even more interestingly, the rate of detoxification of oxidized lipids (and ferroptosis) is regulated by glutathione peroxidase 4 (GPX4) – a glutathione-dependent enzyme that is expressed in mature RBCs and correlates with hemolytic propensity [82, 83].
One of the main targets of oxidant stress in stored RBCs is the most abundant membrane protein – anion exchanger 1 (AE1) – with 1 million copies/cell [24]. AE1, also known as band 3, regulates the chloride/bicarbonate exchange (chloride shift), critical to CO2 and pH homeostasis [84] and thus pH-dependent activity of metabolic enzymes. AE1 has pleiotropic functions mediated by structural interactions [85‒87]. The N-terminal AE1 residues 1–23 serve as a docking site for deoxygenated Hb under hypoxia and for glycolytic enzymes at high oxygen saturation [85‒87]. Under pro-oxidant (high SO2) conditions, AE1 binding inhibits glycolytic enzymes and promotes fluxes via the pentose phosphate pathway (PPP), which generates NADPH, a reducing cofactor necessary for recycling of oxidized glutathione and NADPH-dependent antioxidant enzymes [22]. Under hypoxia, deoxyHb binding displaces glycolytic enzymes from AE1, favoring glycolysis and the generation of allosteric modulators ATP and DPG, and thus oxygen off-loading [19, 88]. This O2-dependent metabolic “switch” (hereafter, “AE1-Hb switch”) favors antioxidant metabolism when oxidant stress is high (e.g., during storage in the blood bank), and energy metabolism and O2 release when oxidant stress is low (e.g., in peripheral tissues or in response to high altitude). Through omics technologies, it has been shown that oxidant stress [60, 89] and protease activity [90] promote the fragmentation of the AE1 N-terminus, impairing the capacity to cope with storage-induced oxidant stress – to the extent that RBCs from mice lacking the AE1 N-terminus store poorly, while treatment with a membrane-permeable AE1(1–57) peptide partially rescues the phenotype [74].
Indeed, we showed that oxidation and fragmentation of the cytosolic N-terminus of AE1 impair the capacity to bind glyceraldehyde 3-phosphate dehydrogenase and inhibit glycolysis [74], thereby limiting the capacity to activate the PPP [74, 91], a critical pathway for the generation of the cofactor NADPH, which is essential in many reducing reactions (for example, recycling of oxidized glutathione) [22]. This process is in part counteracted by the activation of protein L-isoaspartyl O-methyltransferase [92, 93] and by the compensatory activity of the rate-limiting enzyme of the PPP, glucose 6-phosphate dehydrogenase (G6PD). We then showed that G6PD polymorphisms are common in the donor population – as they are across all humans (∼6% of mankind, 500 million people around the world [94]) – and impact storage quality and transfusion efficacy [73, 95‒97].
Blood Donors as a Window on Population Health: Toward Personalized (Transfusion) Medicine
While most of the studies described above were relatively small-scale investigations in humans and animal models, the advent of high-throughput technologies affords much broader investigations to characterize the blood donor population with sample sets in the tens to hundreds of thousands [98]. For example, the US National Heart, Lung, and Blood Institute-sponsored Recipient Epidemiology and Donor Evaluation Study (REDS) has recently leveraged genomics approaches to characterize 879,000 polymorphisms from >13,000 donor volunteers [99]. Of note, through genomics studies, almost all of the enzymes mentioned in the previous paragraph were found to be polymorphic in the REDS RBC Omics blood donor population, and functional single nucleotide polymorphisms were associated with the hemolytic propensity of stored blood [82] and Hb increments in transfusion recipients [100].
A subset of the donors from the screening phase of the REDS study (n = 643) were identified as extreme hemolyzers (either spontaneous hemolysis, or upon oxidative and osmotic insults) and asked to donate a second unit of blood for multi-omics testing at storage days 10, 23, and 42 – a dataset that was in part disseminated through pilot metabolomics studies on a subset of these samples (∼600 out of a total of ∼2,000, with the whole cohort of index and recalled donors only recently tested by our group). As proof of principle for cross-omics analyses from large transfusion medicine cohorts, we performed a pilot metabolite quantitative trait loci (mQTL) analysis on just 250 of the ∼13,000 donors. We identified 2,831 high-confidence SNP-metabolite linkages (p < 5.0 × 10⁻⁸) [101]. We thus generated novel murine models for some of the most significant polymorphisms – the G6PD African (V68M/N126D) and Mediterranean (S188F) variants – for future functional studies relevant to transfusion medicine and hematology [101].
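To make the mQTL idea concrete, the sketch below tests a single SNP-metabolite pair by regressing metabolite level on allele dosage (0, 1, or 2 copies) and comparing the p-value with the 5.0 × 10⁻⁸ threshold quoted above. This is a minimal illustration, not the pipeline used in [101]: the additive model, the normal approximation to the test statistic, the function name `mqtl_test`, and the synthetic cohort are all assumptions made for the example.

```python
import math
import random

GENOME_WIDE_P = 5.0e-8  # significance threshold quoted in the text


def mqtl_test(dosages, levels):
    """Regress metabolite level on allele dosage (0/1/2); return (slope, p).

    Assumption: the two-sided p-value uses a normal approximation to the
    t statistic, which is only reasonable for large sample sizes.
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(levels) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, levels))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(dosages, levels))
    se = math.sqrt(rss / (n - 2) / sxx)          # standard error of the slope
    z = slope / se
    p = math.erfc(abs(z) / math.sqrt(2))         # two-sided normal tail
    return slope, p


# Synthetic cohort (hypothetical numbers, for illustration only).
rng = random.Random(0)
n_donors = 250
dosage = [rng.choice([0, 1, 2]) for _ in range(n_donors)]
linked = [1.0 * g + rng.gauss(0, 1) for g in dosage]  # metabolite driven by the SNP
unlinked = [rng.gauss(0, 1) for _ in dosage]          # metabolite independent of the SNP

for name, levels in [("linked", linked), ("unlinked", unlinked)]:
    slope, p = mqtl_test(dosage, levels)
    verdict = "hit" if p < GENOME_WIDE_P else "no hit"
    print(name, f"slope={slope:.2f}", f"p={p:.1e}", verdict)
```

A real analysis would repeat this test across every SNP-metabolite pair and adjust for covariates such as sex, age, and ancestry, which is why such a stringent threshold is applied.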
Through genomics and metabolomics data in the REDS RBC Omics cohort, it has been noted that biological factors such as donor sex, sex hormones, age, and body mass index contribute to storage quality [32, 73, 95‒97, 102], as gleaned by hemolytic propensity [103], omics phenotypes [34], and Hb increments in recipients of units donated by the same donor volunteers [100, 104]. We then showed that female donors have younger circulating RBCs at the time of donation, which are more resistant to oxidant challenge in storage and have more resilient energy and antioxidant metabolism during storage [105]. Later, we introduced the concept of the blood donor exposome [51], showing that dietary (e.g., alcohol, caffeine consumption [106, 107]) or other exposures (e.g., smoking [108], exercise [109, 110], diet [111], prior infection by flaviviruses [112] or coronaviruses [54, 113], drugs that are not grounds for donor deferral [51]) all contribute to modulating stored RBC energy and redox metabolism, ultimately impacting Hb increments upon transfusion [100, 104]. We concluded that the chronological age of blood – in terms of days elapsed since the time of donation – is qualitatively distinct from the metabolic age of the unit – the “molecular storage lesion” [114]. We then tested alternative storage strategies (cryopreservation [115], rejuvenation [116], hypoxic storage [16, 30, 117, 118]), and leveraged a novel combination of high-throughput omics and 96-well plate-based scaled storage systems to develop novel additives [19, 119].
While transfusion of RBCs currently relies on the altruistic gift of an estimated 6.8 million volunteers in the USA alone every year, omics studies have been preliminarily used to inform or validate the development of novel bio-inspired synthetic blood products [120] or the ex vivo expansion and differentiation of hematopoietic stem cells [23, 121], as well as the testing of RBCs after hypotonic dialysis-based encapsulation of enzymes for therapeutic intervention in the oncological setting [122].
From Blood Donors to Recipients
On top of the focus on the blood donors and products, omics technologies in transfusion medicine have been used to investigate transfusion recipients, either healthy autologous volunteers [123] or non-healthy heterologous recipients – with a special focus on trauma patients (massively transfused [124]) and sickle cell patients, either receiving standard red cell exchanges [125] or rejuvenated red cell exchanges [126]. With a combination of biotinylation studies and single cell oxygenation strategies, we determined that the storage lesion impacts RBC oxygen kinetics (transport and off-loading [116, 127]), which is only partially reversible upon transfusion of RBCs in healthy autologous recipients [128]. Indeed, net of the survival bias impacting the readout, omics data generated on stored biotinylated RBCs – in time course studies where the transfused RBCs were flow-sorted out of the bloodstream of the recipient – showed that some of the metabolic storage lesion (e.g., low ATP and DPG levels) can be partially restored over time in circulation (at kinetics that may be too slow, for example, in the hypoxic critically ill patient), while other pathways – such as the PPP – are not restored at all [128].
Through (in vivo) metabolic tracing [129, 130] in tractable animal models (e.g., rodents, including mice and rats; pigs; or non-human primates) [130‒142], we are now using omics technologies to investigate the role of genetic background across multiple species as a driver of the storage lesion (the Zoomics project) [44‒47], as well as the impact of transfusion in pre-clinical models [134, 138, 140, 142, 143]. In parallel, we are exploring recipient outcomes in massively (e.g., trauma patients) [131, 140, 144] or chronically transfused patients (e.g., sickle cell patients) [17, 125, 145‒150].
We are also exploring the role of genetic abnormalities that can result in the need for transfusion, such as those linked to hemoglobinopathies (e.g., sickle cell trait, sickle cell disease; beta-thalassemia) [147, 149, 151‒153] or inborn errors of metabolism beyond G6PD deficiency (e.g., pyruvate kinase deficiency, propionic acidemia) [154, 155], especially in populations under-represented in science (e.g., the Amish Mennonites and other populations in rural America [154, 155]; individuals with Down syndrome [156, 157]; transgender individuals during gender reassignment therapy with sex hormones [158]; or hypogonadic individuals who develop erythrocytosis and donate blood as a strategy to counteract such effects of hormonal therapies [159]).
Machine Learning: From Biomarkers to Systems Biology to In Silico Models of RBC Membranes
We mentioned above the use of machine learning approaches (including random forest and lasso regression) to identify markers of RBC storage quality and transfusion performance in healthy autologous volunteers and trauma patients [67, 68, 117, 124]. More recently, machine learning approaches have been coupled to high-content screening that leverages microfluidic devices and microscopy in the presence or absence of flow to determine the impact of a given treatment (storage or even SARS-CoV-2 infection) on RBC morphology in unperturbed or perturbed systems [5, 113]. Indeed, RBC morphology is altered as a function of damage to energy and redox metabolism, which ultimately results in vesiculation of oxidized components [64], loss of the discocytic phenotype and acquisition of progressively irreversibly altered morphologies (spheroechinocyte, spherocyte [160]), decreased surface-to-volume ratios, increased mechanical fragility, and ultimately, sequestration in the spleen that triggers extravascular hemolysis [36, 161] and erythrophagocytosis [162]. Of note, RBC membrane protein and lipid composition as a function of storage can be fed into novel in silico models of the erythrocyte membrane, which can be used to predict membrane rigidity and deformability in fresh and stored RBCs, and to predict direct biophysical measurements of RBC mechanics [163, 164]. Feeding of quantitative and functional metabolomics (e.g., tracing experiments) and proteomics data to computational models of RBC metabolic pathways has fueled recent progress in the field of systems biology of the RBC [27, 28, 165, 166], which can be used to drive the development of novel storage solutions [167] or simply investigate RBC aging in the bloodstream in vivo [168].
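To illustrate how lasso regression can single out a few markers from many candidate features, the following is a minimal coordinate-descent sketch on synthetic data. It is a didactic stand-in, not the pipeline of the cited studies: the feature matrix, the outcome, the penalty value, and the function names are hypothetical, and the model has no intercept term (data should be centered beforehand if one is needed).

```python
import random


def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; this is the source of lasso sparsity."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_fit(rows, target, penalty, n_iter=200):
    """Coordinate-descent lasso without intercept.

    Minimizes (1/2n) * sum((y - X b)^2) + penalty * sum(|b|).
    """
    n, p = len(rows), len(rows[0])
    coef = [0.0] * p
    residual = list(target)  # equals y - X b while all coefficients are zero
    col_norm = [sum(r[j] ** 2 for r in rows) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of feature j with the residual, adding back its own contribution.
            rho = sum(r[j] * (res + r[j] * coef[j]) for r, res in zip(rows, residual)) / n
            new = soft_threshold(rho, penalty) / col_norm[j]
            delta = new - coef[j]
            if delta:
                residual = [res - r[j] * delta for r, res in zip(rows, residual)]
                coef[j] = new
    return coef


# Synthetic example: 200 hypothetical units, 6 candidate features,
# of which only features 0 and 3 actually drive the outcome.
rng = random.Random(42)
n_units, n_features = 200, 6
rows = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_units)]
target = [2.0 * r[0] - 1.5 * r[3] + rng.gauss(0, 0.5) for r in rows]

coef = lasso_fit(rows, target, penalty=0.2)
selected = [j for j, b in enumerate(coef) if b != 0.0]
print("selected feature indices:", selected)
print("coefficients:", [round(b, 2) for b in coef])
```

The L1 penalty drives the coefficients of uninformative features exactly to zero, which is what makes the method attractive for shortlisting candidate markers; the price is that the retained coefficients are shrunk toward zero relative to the true effects.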
Application of machine learning and artificial intelligence approaches in healthcare will face key challenges, as reviewed extensively elsewhere [169]. One limitation of most existing studies in this space to date has been that benchmarking of machine learning algorithms has mostly been performed on retrospective data from large, already available databases. Performance of these models is likely to suffer when tested against real-world data from prospective studies as opposed to historically labeled data used for algorithm training. Another challenge that this field will encounter is based on the appreciation that machine learning algorithms will use input signals to achieve the best possible (prediction) performance in the dataset used; algorithms may thus end up exploiting signals from unknown confounders that may not be reliable, impairing the algorithm’s ability to generalize to new datasets [169]. Depending on training sets, algorithms may develop implicit bias, which would limit their fitness for generalization and the accuracy of clinical predictions. The development of robust regulation and a rigorous quality control strategy will be a challenge facing this new, yet promising field.
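The confounder problem described above can be illustrated with a toy simulation. In the sketch below, a trivially simple "model" picks whichever single feature best predicts the outcome in a retrospective cohort; one feature is a genuine but noisy marker, while the other is a confounder that tracks the outcome only in that cohort. All names, distributions, and cohort sizes are hypothetical and chosen purely for illustration.

```python
import random


def make_cohort(rng, n, confounder_tracks_label):
    """Synthetic cohort: one weak true marker plus one site-like confounder."""
    rows = []
    for _ in range(n):
        label = rng.random() < 0.5
        marker = (1.0 if label else 0.0) + rng.gauss(0, 1.0)  # genuine but noisy signal
        if confounder_tracks_label:
            confounder = (1.0 if label else 0.0) + rng.gauss(0, 0.2)  # spurious in this cohort
        else:
            confounder = rng.gauss(0.5, 0.5)  # unrelated to the label
        rows.append(((marker, confounder), label))
    return rows


def accuracy(rows, feature, threshold=0.5):
    """Accuracy of the rule 'predict positive when feature > threshold'."""
    hits = sum((x[feature] > threshold) == y for x, y in rows)
    return hits / len(rows)


def pick_feature(rows):
    """'Train' by keeping whichever single feature scores best in-sample."""
    return max(range(2), key=lambda f: accuracy(rows, f))


rng = random.Random(1)
retrospective = make_cohort(rng, 2000, confounder_tracks_label=True)
prospective = make_cohort(rng, 2000, confounder_tracks_label=False)

chosen = pick_feature(retrospective)
print("feature chosen (0 = marker, 1 = confounder):", chosen)
print("retrospective accuracy:", round(accuracy(retrospective, chosen), 2))
print("prospective accuracy:", round(accuracy(prospective, chosen), 2))
print("prospective accuracy of true marker:", round(accuracy(prospective, 0), 2))
```

Because the confounder separates the classes almost perfectly in the retrospective cohort, it is selected over the true marker, and performance collapses toward chance once the spurious association disappears. Prospective or external validation is what exposes this failure.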
Despite early success in the above-mentioned applications, the machine learning area that holds the strongest promise is that of personalized transfusion medicine approaches. The opportunity is there to match donors and recipients based on a treasure trove of data that can now be generated, in a cost-effective fashion, with ultra-high-throughput omics approaches. It is possible to imagine a not-so-distant future when artificial intelligence-based approaches could quickly identify optimal storage additives for blood based on donor biology and omics signatures at donation, shorten or extend the shelf-life of the product accordingly, or make indications for a specific recipient (matching the right product to an acutely bleeding, massively transfused trauma patient vs. a chronically transfused patient with hemoglobinopathies). These features could be made possible by the implementation in central lab testing practices of blood donor genomics characterization with precision transfusion medicine arrays [99] (including blood group antigens and rare polymorphisms associated with stored blood quality), along with metabolic characterization of non-genetic factors, e.g., metabolites of dietary or other exposures, from habits like smoking all the way to prescription drugs that are not grounds for blood donor deferral – in combination with donor and recipient demographics and clinical records.
Acknowledgments
The title is loosely inspired by Ray Kurzweil’s “The Singularity Is Near,” a 2005 non-fiction book (ISBN: 978-0-670-03384-3) about artificial intelligence and the future of humanity.
Conflict of Interest Statement
A.D. is a founder of Omix Technologies Inc. and Altis Biosciences LLC, and a consultant for Hemanext Inc., Macopharma Inc., and Forma Therapeutics.
Funding Sources
A.D. was supported by funds from the National Institute of General Medical Sciences (RM1GM131968) and the National Heart, Lung, and Blood Institute (R01HL146442, R01HL149714, R01HL148151, R01HL161004, R21HL150032). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Author Contributions
A.D. wrote the manuscript.