Abstract
In fetal cardiology, imaging (especially echocardiography) has been shown to help in the diagnosis and monitoring of fetuses with a compromised cardiovascular system, which can be associated with several fetal conditions. Different ultrasound approaches are currently used to evaluate fetal cardiac structure and function, including conventional 2-D imaging, M-mode, and tissue Doppler imaging, among others. However, assessment of the fetal heart is still challenging, mainly because of involuntary fetal movements, the small size of the heart, and some sonographers' lack of expertise in fetal echocardiography. Therefore, new technologies that improve the primary acquired images, help extract measurements, or aid in the diagnosis of cardiac abnormalities are of great importance for optimal assessment of the fetal heart. Machine learning (ML) is a computer science discipline focused on teaching a computer to perform tasks with specific goals without explicitly programming the rules on how to perform these tasks. In this review, we provide a brief overview of the potential of ML techniques to improve the evaluation of fetal cardiac function by optimizing image acquisition and quantification/segmentation, as well as to aid in improving the prenatal diagnosis of fetal cardiac remodeling and abnormalities.
Introduction
The use of fetal echocardiography to assess fetal cardiac function was introduced only 15 years ago (the first study was performed in 2004). It has evolved from the description of anatomical cardiac abnormalities toward the quantitative assessment of cardiac dimensions, shape, and function, and it has proven useful in the diagnosis and monitoring of fetuses with a compromised cardiovascular system related to several fetal conditions, such as intrauterine growth restriction (IUGR), twin-to-twin transfusion syndrome, and congenital heart disease [1-3]. Moreover, some cardiac parameters have already been shown to help predict perinatal problems and long-term cardiovascular outcomes [4].
Different ultrasound (US) approaches are currently used to evaluate fetal cardiac function, including conventional 2-D imaging, M-mode, blood-pool and tissue Doppler imaging, 2-D speckle tracking, and 4-D spatiotemporal image correlation [4, 5]. For any evaluation, an optimal image of the fetal heart is crucial to adequately assess cardiac structure and function. However, assessing fetal cardiac function is still challenging because of involuntary fetal movements, the small size of the heart, the high heart rate, the limited access to the fetus, and some sonographers' lack of expertise in fetal echocardiography. Once an optimal image has been obtained, measurements must be performed to extract the relevant cardiac features that relate to remodeling and functional status. Currently, these measurements are mainly carried out manually by the sonographer, either during the examination or offline on a dedicated workstation. Therefore, new technologies that improve the primary acquired images or help extract and standardize measurements are of great importance for optimal assessment of the fetal heart.
Machine learning (ML) is a computer science discipline focused on teaching a computer to perform tasks, with a specific goal in mind, without explicitly programming the rules on how to perform these tasks. Mathematically speaking, learning occurs when a computer iteratively improves its performance on the given task (e.g., classification of a disease or estimation of clinical measurements) with experience or, in other words, as it is exposed to data [6]. ML algorithms are usually classified into 2 approaches: supervised and unsupervised learning (Fig. 1). Deep learning (DL), a popular family of algorithms (and often what is meant when the term machine learning is used), is a subset of ML that applies a layered structure of computations, known as artificial neural networks (ANN), to unstructured data such as images. Figure 2 illustrates the typical pipeline for both supervised and unsupervised learning algorithms. Supervised learning requires explicit ground truth targets (diagnostic labels, outcomes, reference image measurements, etc.) against which the algorithm can optimize its performance during training. Supervised learning algorithms can be further divided into classification and regression (Fig. 1). Classification techniques evaluate the given input and assign it to a category, such as “red” or “blue,” or “disease” or “nondisease,” while regression techniques produce a continuous output: the value of the predicted quantity (such as the probability of a diagnosis). Besides DL, the most common classification algorithms include decision trees and support vector machines (SVM), while linear and logistic regression are typical regression algorithms (Fig. 1). Unsupervised learning algorithms, on the other hand, receive unlabeled examples and aim to discover the main patterns or similarities in the data, which could correspond to different disease manifestations, different phenotypes within a given disease, or different temporal evolutions.
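The two supervised settings can be illustrated with a minimal plain-Python sketch. It uses a nearest-centroid classifier (a deliberately simple classification method chosen for brevity, not one of the algorithms named in Fig. 1) and an ordinary least-squares line fit (linear regression). All feature values and labels are made up.

```python
from statistics import mean


def fit_nearest_centroid(features, labels):
    """Classification: learn one centroid (mean feature vector) per ground-truth label."""
    centroids = {}
    for label in set(labels):
        rows = [x for x, y in zip(features, labels) if y == label]
        centroids[label] = [mean(column) for column in zip(*rows)]
    return centroids


def predict_class(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])),
    )


def fit_linear_regression(xs, ys):
    """Regression: ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


# Toy labelled data (made-up values): two features per case plus an expert label.
X = [[1.0, 1.2], [0.9, 1.0], [3.0, 3.1], [3.2, 2.9]]
y = ["nondisease", "nondisease", "disease", "disease"]
centroids = fit_nearest_centroid(X, y)
print(predict_class(centroids, [2.8, 3.0]))  # → disease

# Toy continuous target: the model outputs a value rather than a category.
slope, intercept = fit_linear_regression([1, 2, 3, 4], [2.0, 4.1, 5.9, 8.0])
print(slope, intercept)
```

In both cases, the ground truth (labels or target values) drives the fitting, which is what makes the learning supervised.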
Consequently, supervised learning is commonly used when the final goal is well known at the time of learning, whereas unsupervised learning is used as an exploratory tool, and the final goal usually follows from the analysis of the obtained results. Unsupervised learning algorithms can be further divided into clustering and dimensionality reduction, as illustrated in Figure 1. Typical clustering algorithms include K-means and Gaussian mixture models, while principal component analysis and linear discriminant analysis are classical dimensionality reduction techniques.
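As an illustration of clustering, here is a minimal sketch of K-means (Lloyd's algorithm) on made-up, unlabeled 2-D points. No ground truth is supplied; the algorithm alternates between assigning points to the nearest center and moving each center to the mean of its points. The deterministic farthest-first initialization is an assumption chosen here for reproducibility; library implementations typically use randomized schemes.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means: group unlabeled points into k clusters without any ground truth."""
    # Farthest-first initialisation, chosen here so the example is deterministic.
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(squared_distance(p, c) for c in centers))
        centers.append(list(farthest))
    assignment = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        assignment = [
            min(range(k), key=lambda j: squared_distance(p, centers[j])) for p in points
        ]
        # Update step: move each center to the mean of the points assigned to it.
        new_centers = []
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # leave an empty cluster's center in place
        if new_centers == centers:  # converged: the centers no longer move
            break
        centers = new_centers
    return centers, assignment


# Unlabeled toy data (made-up values) with two visually obvious groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centers, assignment = kmeans(points, k=2)
print(assignment)  # → [0, 0, 0, 1, 1, 1]
```

Interpreting what each discovered group means (e.g., a phenotype) is left to the analyst. This mirrors the exploratory use of unsupervised learning described above.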
Once ML models are trained, their performance on unseen data (referred to as the test set) reflects the model’s generalizability (Fig. 2). Models that perform considerably better on the training set than on the test set are overfitted: they adhere too strongly to the training cases and do not correctly handle new patients. Finding a good balance between training and testing performance is thus crucial for the application of ML models in clinical settings. A related and highly relevant risk when using ML for clinical decision making is how to deal with, and not miss, rare occurrences in the (testing) data that were underrepresented in the training dataset. To mitigate this risk, ML approaches (especially supervised ones) need to be trained with a dataset that sufficiently captures the phenomenon under study. For clinical decision making, an unsupervised approach that highlights these rare instances might therefore be preferable to a supervised one that forces decisions towards the categories it was trained on. To learn more about ML concepts, we refer the reader to the review paper by Deo [7].
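Overfitting can be made concrete with a toy example. A 1-nearest-neighbour classifier memorizes its training cases, so on noisy synthetic data it scores perfectly on the training set and worse on held-out cases. The data-generating rule and the 20% label-noise level are assumptions made purely for illustration.

```python
import random


def one_nn_predict(train, x):
    """1-nearest-neighbour: copy the label of the closest training case."""
    return min(train, key=lambda case: abs(case[0] - x))[1]


def accuracy(train, cases):
    return sum(one_nn_predict(train, x) == y for x, y in cases) / len(cases)


rng = random.Random(42)
# Synthetic 1-D data: the underlying rule is "label 1 if x > 0.5",
# but 20% of the labels are flipped to mimic annotation noise.
data = []
for _ in range(200):
    x = rng.random()
    y = int(x > 0.5)
    if rng.random() < 0.2:
        y = 1 - y
    data.append((x, y))

train, test = data[:150], data[150:]  # the last 50 cases are held out as unseen data
train_acc = accuracy(train, train)
test_acc = accuracy(train, test)
print(f"training accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

The gap between the two accuracies is the symptom described above. A smoother model, such as a majority vote over more neighbours, would typically give up some training accuracy in exchange for better test accuracy.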
ML techniques can help optimize image acquisition protocols, thus reducing acquisition time and ensuring optimal quality, and they can help extract comprehensive and standardized information for a better evaluation of cardiac function. In this review, we provide a brief overview of ML/DL applications in obstetrics, with a particular focus on the evaluation of fetal cardiac function through the optimization of image acquisition and quantification/segmentation, and on how ML can aid in improving the prenatal diagnosis of fetal cardiac remodeling and abnormalities.
ML for Data Acquisition
Image acquisition is the first step towards building a system to optimize the characterization of fetal cardiac function. This step is of paramount importance, as the extracted information will be greatly conditioned by the intrinsic quality and amount of input data. Acquiring the best standard fetal views is labor intensive and relies on the sonographer’s experience. The resulting interoperator variability in image acquisition hampers individual temporal follow-up and the combination of different data sources for research purposes. ML-powered acquisition methods that speed up acquisition, shorten the learning curve, and standardize the resulting images therefore seem highly desirable, as they promise to boost data quality and standardization with minimal human intervention.
ML-based improvement of image acquisition relies on evaluating the current (2-D/3-D) image on the screen and scoring how closely it resembles the intended view. The appearance of each view is learned during a training phase: the image appearance or content is not explicitly defined but is learned by the algorithm from examples. Many ML approaches can be used, but DL, using ANN, seems the most promising.
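One simple way to turn a trained classifier's output into a view score is sketched below. The raw network outputs (logits) for the current frame are converted into probabilities with a softmax, and the probability of the intended view is used as the score. The view list, the logits, and the 0.9 acceptance threshold are all hypothetical; in practice, the logits would come from a trained network such as a CNN.

```python
import math

# Hypothetical label set; a real system would use whatever views it was trained on.
VIEWS = ["four-chamber", "left ventricular outflow tract", "three-vessel", "other"]


def softmax(logits):
    """Convert raw network outputs (logits) into probabilities that sum to 1."""
    largest = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def view_score(logits, intended_view):
    """Score how closely the current frame resembles the intended view."""
    return softmax(logits)[VIEWS.index(intended_view)]


def accept_frame(logits, intended_view, threshold=0.9):
    """Hypothetical rule: keep the frame only if the intended view is confidently recognised."""
    return view_score(logits, intended_view) >= threshold


# Made-up logits standing in for the output of a trained network on one frame.
frame_logits = [4.0, 0.5, 0.2, 0.1]
print(round(view_score(frame_logits, "four-chamber"), 3))
print(accept_frame(frame_logits, "four-chamber"))  # → True
```

Displaying such a score in real time is one way a system could guide the sonographer towards the intended plane.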
The acquisition of the fetal facial standard plane is a prerequisite for extracting biometric measurements and making a diagnosis during a US examination. Lei et al. [8] automated this task with an SVM classifier. More recently, Yu et al. [9] leveraged deep convolutional neural networks (CNN) to automatically recognize the fetal facial standard plane during routine US examination. Another standard plane is that of the fetal abdominal region, which allows measurement of the abdominal circumference (AC) and estimation of fetal weight as a proxy for fetal health. CNN have already been trained to automatically find the abdominal region in a US image and then determine image quality by assessing how well key structures, such as the stomach bubble and the umbilical vein, are depicted [10]. In a similar fashion, Rahmatullah et al. [11, 12] trained an adaptive boosting (AdaBoost) model to detect these 2 structures in 2-D US images for the purpose of scoring image quality. Other ensemble approaches have been proposed to categorize unlabeled fetal 2-D US images. In particular, Yaqub et al. [13] used a random forest (RF) classifier to detect meaningful structures from different regions inside the images. A more ambitious project using CNN targeted the classification of a broader collection of fetal image planes through the automatic recognition of 14 different fetal structures in 2-D US images [14]. Concerning 3-D fetal US, Raynaud et al. [15] proposed combining DL for feature extraction with RF for organ classification, with the aim of automatically encoding anatomical variability while discarding fetal pose.
Detection of the standard scan plane in fetal brain US is an essential step in the assessment of fetal development. This task was achieved by Li et al. [16] using a CNN approach in 3-D fetal US. Concerning quality control, Yaqub et al. [17] proposed a DL solution that automatically assessed whether transventricular 2-D US images of the fetal brain met clinical standards. Namely, they first localized the fetal brain, then detected the regions of interest, and finally learned the US patterns that enable plane verification. ML techniques have also been used to automatically identify the transthalamic plane in 3-D US to then assess brain biometrics such as the fetal biparietal diameter and head circumference (HC) [18].
Specific studies involving ML techniques for imaging the fetal heart are still scarce. Among the few examples, Bridge et al. [19] implemented a framework that uses regression forests to track the key variables appearing in freehand 2-D US scanning videos of the healthy fetal heart. Concerning the electrical activity of the fetal heart, Yu et al. [20] used independent component analysis and Muduli et al. [21] used DL to reconstruct the fetal electrocardiogram (ECG) from abdominal ECG recordings. A further step towards automating fetal US scanning consists of coupling image plane/volume recognition with a robotic arm that performs the scanning, an approach pioneered by Wang et al. [22].
ML approaches for improved fetal data acquisition are already a reality in research settings and are expected to become clinically available in the short term (within 5 years). In the midterm, ML techniques may be combined with robotics to automatically obtain standardized fetal imaging views.
ML for Image Quantification and Feature Extraction
Fetal biometric parameters such as HC, biparietal diameter, AC, femur length, and nuchal translucency thickness are commonly used to estimate fetal weight and gestational age (GA) and to detect fetal abnormalities during prenatal US examinations. An accurate estimation of fetal weight and GA is essential to detect any abnormal fetal growth pattern, such as small or large for GA, IUGR, or cardiac abnormalities. Kim et al. [23] recently published a DL model that automatically calculates the HC together with the biparietal diameter from 2-D US images. A different approach was used by Li et al. [24], who first used RF to localize the fetal head and then ellipse fitting to estimate the HC from 2-D US images. van den Heuvel et al. [25] went a step further and implemented a DL model that calculated HC from obstetric sweep protocol data. Because such data are unlikely to contain standard planes, their method has great potential for application in resource-constrained countries, where skilled obstetricians are lacking. Lorenz et al. [26] recently published a pipeline combining RF, shape models, and CNN to automatically perform view recognition and anatomical landmark localization, with the objective of measuring the AC from 3-D US recordings. Similarly, Kim et al. [27] used a CNN to estimate AC from 2-D US data. For further information on biometric measurements, we refer the reader to a recent review of automated techniques for the interpretation of fetal abnormalities [28].
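Once an ellipse has been fitted to the skull contour, the HC follows from the ellipse's semi-axes. A common choice, and an assumption here since [24] may use a different formula, is Ramanujan's approximation of the ellipse perimeter. The sketch also assumes square pixels with a known spacing; the numbers are made up.

```python
import math


def ellipse_circumference(a, b):
    """Ramanujan's approximation of the perimeter of an ellipse with semi-axes a and b."""
    return math.pi * (3 * (a + b) - math.sqrt((3 * a + b) * (a + 3 * b)))


def head_circumference_mm(semi_major_px, semi_minor_px, pixel_spacing_mm):
    """HC from a fitted ellipse, assuming square (isotropic) pixels of the given size."""
    return ellipse_circumference(semi_major_px, semi_minor_px) * pixel_spacing_mm


# Made-up fitted semi-axes (in pixels) and pixel spacing, for illustration only.
hc = head_circumference_mm(semi_major_px=250, semi_minor_px=200, pixel_spacing_mm=0.2)
print(f"HC: {hc:.1f} mm")
```

For a circle (a = b = r), the formula reduces exactly to 2πr, which provides a quick sanity check.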
ML methods have been proposed over the last decades to improve the estimation of GA in women with uncertain or unknown menstrual dates [29] and to improve the estimation of fetal weight during gestation. For example, Ashley et al. [30] explored whether data available at birth can be used to accurately predict the estimated fetal weight over the course of gestation, using different ML methods such as RF or regression trees in a database of more than 10,000 normal and high-risk pregnancies. The authors found that the ML algorithms estimated fetal weight better than other commonly used methods. Chuang et al. [31] developed an ANN to estimate fetal weight using morphometric data from 991 fetuses, reporting a mean absolute error of 6.15%.
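An error expressed as a percentage, such as the 6.15% above, is relative to the true weight. Assuming it is the mean of the per-fetus absolute percentage errors (our reading; the exact definition used in [31] is not restated here), it could be computed as in the sketch below. The weights are made up.

```python
def mean_absolute_percentage_error(true_weights, predicted_weights):
    """Mean of |predicted - true| / true over all fetuses, expressed in percent."""
    if not true_weights or len(true_weights) != len(predicted_weights):
        raise ValueError("need two non-empty sequences of equal length")
    errors = [abs(p - t) / t for t, p in zip(true_weights, predicted_weights)]
    return 100.0 * sum(errors) / len(errors)


# Made-up birth weights (g) and model estimates: errors of 10% and 5%.
print(mean_absolute_percentage_error([1000, 2000], [1100, 1900]))
```

A relative error is convenient here because the same absolute error in grams matters more for a small fetus than for a large one.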
Apart from measuring fetal biometrics and estimating fetal weight, recent ML approaches have been geared towards segmenting fetal structures and organs, so that fetal abnormalities can be found in time for the necessary action to be taken. Namburete et al. [32] used an RF classifier to segment cranial pixels in 2-D US images. More recently, Li et al. [33] used a DL approach to automatically segment the fetal body and amniotic fluid from 2-D US data. Other examples of DL for segmentation have targeted the fetal brain and lungs [34, 35], as well as these 2 organs plus the placenta and the maternal kidneys in magnetic resonance imaging [36]. Lastly, an ensemble of decision trees has been used to automatically segment fetal brain structures in 3-D US images [37].
Concerning the fetal heart, the bulk of research focuses on automatically detecting the heartbeat. Examples include the detection of cardiac activity from a predefined freehand US sweep of the maternal abdomen using a classification model [38], the extraction of the fetal heart rate from cardiotocograms (CTG) using dimensionality reduction [39], and the measurement of fetal QRS complexes from maternal ECG recordings using ANN [40]. More recently, Sulas et al. [41] used ANN to detect heartbeats from pulsed-wave Doppler envelope signals extracted from B-mode videos. For more on measuring cardiac activity from fetal US using ML techniques, the reader is referred to the review paper by Alnuaimi et al. [42].
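Whatever method detects the individual beats, the heart rate follows from the inter-beat intervals. The minimal sketch below assumes the detector returns beat times in seconds; the detections are made up.

```python
def heart_rate_bpm(beat_times_s):
    """Mean heart rate (beats/min) from detected beat times in seconds."""
    if len(beat_times_s) < 2:
        raise ValueError("at least two detected beats are needed")
    intervals = [later - earlier for earlier, later in zip(beat_times_s, beat_times_s[1:])]
    return 60.0 / (sum(intervals) / len(intervals))


def instantaneous_rates_bpm(beat_times_s):
    """Beat-to-beat heart rate, one value per inter-beat interval."""
    return [60.0 / (later - earlier) for earlier, later in zip(beat_times_s, beat_times_s[1:])]


beats = [0.0, 0.4, 0.8, 1.2, 1.6]  # made-up detections, one beat every 0.4 s
print(round(heart_rate_bpm(beats)))  # → 150
```

The beat-to-beat values are what a variability analysis would start from, whereas the mean rate summarizes the whole recording.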
ML algorithms for extracting features from fetal echocardiographic data are already used in some high-end scanners, in particular to calculate pulsatility indices from peripheral blood flow recordings. This is expected to be extended to cardiac flows soon. In the midterm, these scanners will also estimate GA and assess fetal growth based on the automatic extraction of the different biometric measurements discussed above.
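The pulsatility index is a simple formula once the velocity envelope of one cardiac cycle has been traced. What ML automates is obtaining that envelope and the cycle boundaries. The sketch below uses Gosling's definition (peak-to-peak velocity divided by the time-averaged mean velocity) and assumes a uniformly sampled envelope covering exactly one cycle; some implementations use the end-diastolic velocity instead of the minimum. The sample values are made up.

```python
def pulsatility_index(velocities):
    """Gosling pulsatility index over one cycle: (max - min) / time-averaged mean velocity."""
    v_mean = sum(velocities) / len(velocities)  # valid for uniformly sampled data
    if v_mean <= 0:
        raise ValueError("time-averaged velocity must be positive")
    return (max(velocities) - min(velocities)) / v_mean


# Made-up envelope samples (cm/s) spanning one cardiac cycle.
envelope = [10, 60, 40, 20, 10, 10]
print(pulsatility_index(envelope))
```
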
ML for Fetal Diagnosis
Prenatal diagnosis of fetal abnormalities has greatly benefited from advances in US technology and, in recent years, also from advances in ML. ML algorithms have been used in different applications within fetal US medicine, for example, to predict preterm birth [43, 44], euploidy, trisomy 21, and other chromosomal aneuploidies [45], or perinatal outcomes in asymptomatic women with a short cervical length [46]. Regarding fetal cardiology, one of the subfields in which ML has been extensively applied over the last decades is the diagnosis of fetal hypoxia or acidemia based on CTG analysis. CTG is routinely used to record and monitor the fetal heart rate and uterine contractions during the antepartum and intrapartum periods in order to detect signs of fetal distress as early as possible. In clinical practice, CTG traces are visually examined by clinicians, and their interpretation largely depends on the clinician’s expertise, leading to high inter- and intraobserver variability. Therefore, despite the existence of standardized guidelines, the accuracy and robustness of CTG and its ability to improve prenatal outcomes remain controversial. The use of ML to improve the predictive capacity of CTG recordings was first presented by Bassil et al. [47] in the late 1980s. Since then, several attempts have been made to improve the automatic evaluation of CTG traces using different ML and DL methods, including ANN, SVM, and RF, among others. Most publications have used 2 open-access CTG databases to evaluate their proposed ML algorithms: one from the University Hospital in Brno (Czech Republic), including 552 CTG recordings [48], and another from the University of Porto (Portugal), which includes 2,126 CTG recordings [49]. We have summarized the publications of the last 10 years on the use of ML in CTG analysis in online supplementary Table S1 (for all online suppl. material, see www.karger.com/doi/10.1159/000505021). For a review of older publications, we refer the reader to the review by Graham et al. [50]. The best results were obtained by Iraji et al. [51] on the Portuguese database, with an accuracy of 99.5%. There have also been attempts to translate this work into clinical practice through software such as Infant, PeriCALM [52, 53], and Foetos [54], or through mobile/web applications [55, 56], which provide additional support in interpreting CTG signals and thereby aim to improve the assessment of fetal status. However, there is no evidence that these systems really improve the prediction of fetal distress or acidemia compared with visual CTG interpretation alone, and we found no reports about their clinical performance. A recent systematic review [57] assessed the degree of interobserver reliability between human and ML interpretations of CTG signals and concluded that the use of ML to interpret CTG during labor does not improve neonatal outcomes and has yet to prove its reliability relative to expert observers. The root of the problem may be that any supervised ML-based system needs to be trained with human annotations; given that the benefit of CTG itself for labor monitoring has not been clearly demonstrated, it is not surprising that adding an automatic system that evaluates CTG signals with similar information does not reduce adverse perinatal outcomes.
IUGR, which affects about 10% of pregnancies, has been associated with cardiac remodeling in utero that can persist postnatally [58-60]. Early detection of IUGR can improve the perinatal outcomes of these fetuses and reduce their risk of cardiovascular mortality in adulthood. The first study proposing the use of ML to detect IUGR from biometric data was presented by Gurgen et al. [61] in 1997, in which an ANN was implemented to approximate fetal growth curves, achieving an accuracy of 95% in the detection of IUGR. Later, Magenes et al. [62] proposed an SVM to detect IUGR using CTG data and showed good classification results in a cohort of 70 fetuses. In 2014, Gadagkar et al. [63] developed an ANN system for the diagnosis of IUGR using only 2-D US morphometric measurements from almost 300 fetuses, obtaining results similar to those obtained clinically in the same study population. Similarly, Rawat et al. [64] implemented an ANN model, again using 2-D US morphometric measurements, in a total of 120 fetuses. Recently, Kuhle et al. [65] compared different ML methods to predict fetal growth abnormalities in a cohort of more than 30,000 patients; however, they reported that these ML methods offered no advantage over logistic regression. The main limitation of all of these studies is that IUGR was detected using only morphometric data, which provide information only about fetal weight, without considering other data such as blood flow velocities measured by Doppler US or cardiac deformation measured by B-mode US. It is known that IUGR fetuses show abnormal blood flow patterns in the fetal circulation detected by Doppler US [66, 67] and also signs of longitudinal systolic dysfunction [58].
It was recently demonstrated that unsupervised ML algorithms using both echocardiographic (including myocardial strain traces) and clinical data can find groups of similar patients within a heart failure cohort and identify individuals who respond favorably to cardiac resynchronization therapy [68]. A similar approach integrating clinical and heterogeneous echocardiographic data could be implemented to improve the detection of IUGR fetuses, identify those at high risk of adverse perinatal outcomes, and help clinicians find optimal treatment strategies. However, ML methods require a large number of patients during training to capture the range of possible abnormalities, which is a limitation in fetal medicine, where patient numbers are small. One possibility to overcome this limitation is to combine ML with “data augmentation” through physiological computational modelling, as proposed by Hoodbhoy et al. [69]. Lumped models of the fetal circulation have been shown to realistically simulate fetal hemodynamics under many different conditions [66, 67, 70], thus providing virtual, but physiologically plausible, Doppler traces. Using these models, virtual patient populations can be created in which the ratio of abnormal to normal cases is increased, so that the learning of the ML algorithms is less dependent on the available clinical data.
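The idea of a virtual population with a controlled class ratio can be sketched with a deliberately minimal lumped model. The sketch uses a 2-element Windkessel (compliance C in parallel with resistance R) driven by an assumed half-sine inflow. It is a toy stand-in, not the fetal circulation models of [66, 67, 70], and its outputs are not Doppler traces. The parameter ranges and the rule that "abnormal" virtual patients have a higher resistance are hypothetical.

```python
import math
import random


def windkessel_outflow(resistance, compliance, hr_bpm=140.0, dt=1e-3, n_beats=15):
    """Toy 2-element Windkessel, C dP/dt = Q_in(t) - P/R, integrated with explicit Euler.

    Q_in is assumed to be a half-sine ejection over the first 40% of each cycle.
    Returns the outflow P/R over the last beat, after start-up transients have decayed.
    """
    period = 60.0 / hr_bpm
    steps_per_beat = int(round(period / dt))
    ejection_time = 0.4 * period
    pressure = 0.0
    last_beat = []
    for beat in range(n_beats):
        for k in range(steps_per_beat):
            t = k * dt
            q_in = math.sin(math.pi * t / ejection_time) if t < ejection_time else 0.0
            pressure += dt * (q_in - pressure / resistance) / compliance
            if beat == n_beats - 1:
                last_beat.append(pressure / resistance)
    return last_beat


def pulsatility(trace):
    """Peak-to-peak amplitude divided by the mean of a uniformly sampled trace."""
    return (max(trace) - min(trace)) / (sum(trace) / len(trace))


def virtual_population(n_normal, n_abnormal, seed=0):
    """Build a synthetic cohort with a chosen number of normal and abnormal cases.

    Hypothetical labelling rule: 'abnormal' virtual patients get a higher
    peripheral resistance than 'normal' ones; the ranges are illustrative only.
    """
    rng = random.Random(seed)
    cases = []
    for label, count, (r_low, r_high) in ((0, n_normal, (0.5, 1.0)), (1, n_abnormal, (1.5, 2.0))):
        for _ in range(count):
            resistance = rng.uniform(r_low, r_high)
            compliance = rng.uniform(0.2, 0.4)
            trace = windkessel_outflow(resistance, compliance)
            features = [sum(trace) / len(trace), pulsatility(trace)]
            cases.append((features, label))
    rng.shuffle(cases)
    return cases


# Unlike a clinical cohort, the abnormal class can be made as large as needed.
cohort = virtual_population(n_normal=20, n_abnormal=20)
print(sum(label for _, label in cohort), "abnormal of", len(cohort))
```

The features of each virtual case could then feed any of the supervised methods discussed earlier, possibly mixed with the smaller set of real patients.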
Finally, ML has recently been applied to improve the prenatal diagnosis of congenital heart disease. Yeo et al. [71] presented an intelligent navigation method called FINE to automatically obtain different echocardiographic anatomical views of the fetal heart and identify abnormalities in the cardiac anatomy; the tool demonstrated evidence of abnormal fetal cardiac anatomy in 4 abnormal cases [71]. More recently, Arnaout et al. [72] proposed a supervised, fully convolutional DL method to (1) identify the 5 most important views of the fetal heart, (2) segment and measure the cardiac structures, and (3) distinguish normal hearts from those with tetralogy of Fallot or hypoplastic left heart syndrome, using 685 echocardiograms from fetuses at 18 to 24 weeks of GA [72]. The best results were obtained for distinguishing hypoplastic left heart syndrome from normal hearts, with a sensitivity and specificity of 100 and 90%, respectively. Although these results look promising, a main limitation of this study is that only 2 congenital heart diseases were evaluated and the DL system was trained with images from only 1 US machine, without considering the variability across echocardiography systems. Therefore, further studies with bigger datasets from different US machines are needed.
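Sensitivity and specificity are computed from the counts of correct and incorrect predictions in each true class. The sketch below uses made-up predictions, not the data of [72], chosen so that they happen to reproduce figures of 100 and 90%. The abbreviation HLHS (hypoplastic left heart syndrome) is introduced here only for brevity.

```python
def sensitivity_specificity(y_true, y_pred, positive):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up predictions: every HLHS case detected, 1 of 10 normal hearts wrongly flagged.
truth = ["HLHS"] * 4 + ["normal"] * 10
predicted = ["HLHS"] * 4 + ["normal"] * 9 + ["HLHS"]
sens, spec = sensitivity_specificity(truth, predicted, positive="HLHS")
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")  # → sensitivity = 100%, specificity = 90%
```

With so few positive cases, a single additional missed case would change the sensitivity substantially. This is one reason why larger, multi-machine datasets matter.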
Conclusions
Given that ML approaches have become ubiquitous in our daily lives, they will be increasingly integrated into clinical practice and into the assessment of the fetal heart. It is important to distinguish the different tasks involved in clinical decision making to understand how, and which type of, ML can be optimally employed. To obtain the best image quality in the shortest possible time and with the shortest learning curve, as well as for the standardized extraction of specific measurements from the images, ML approaches based on DL have shown great promise and are currently being implemented in high-end clinical scanners. However, when the diagnostic interpretation is performed, and especially when a treatment decision needs to be made, the “black-box” approach inherent to, for example, DL becomes problematic, given its dependence on a large and very inclusive dataset with correct clinical labels and the inherent difficulty of providing an intuitive clinical explanation for the proposed decision. Here, other ML approaches, for example those based on identifying individuals with similar (complex and multimodal) clinical data and imaging features, seem more promising and are being explored in different centers.
Therefore, when carefully used and validated, and taking into account all privacy, security, and auditing measures relevant to the use of clinical data, ML can play an important role in standardizing fetal cardiac data and can support clinical interpretation and the selection of the best preventive and interventional approach to optimize perinatal as well as long-term cardiovascular health (Fig. 3).
Disclosure Statement
The authors have no conflicts of interest to declare.
Funding Sources
This project was partially funded by the “la Caixa” Foundation under grant agreement LCF/PR/GN14/10270005; the Instituto de Salud Carlos III (PI17/00675), within the Plan Nacional de I+D+I (Spanish National R&D&I Plan) and cofinanced by the ISCIII-Subdirección General de Evaluación and the Fondo Europeo de Desarrollo Regional (FEDER; European Regional Development Fund) “Una manera de hacer Europa” (“A way of making Europe”); the Centro de Investigación Biomédica en Red de Enfermedades Raras (ERPR04G719/2016); the Cerebra Foundation for the Brain-Injured Child (Carmarthen, Wales, UK); and AGAUR 2017 SGR grant No. 1531.
Author Contributions
Patricia Garcia-Canadilla, Sergio Sanchez-Martinez, and Bart Bijnens participated in the conception and design of this review, drafted this work, approved the final version, and agreed on the accuracy and integrity of this work.
Fatima Crispi participated in the design of this review, revised it critically for important intellectual content, approved the final version, and agreed on the accuracy and integrity of this work.