Introduction: Ultrasonography in the first trimester of pregnancy offers an early screening tool to identify high-risk pregnancies. Artificial intelligence (AI) algorithms have the potential to improve the accuracy of diagnosis and assist the clinician in early risk stratification. Objective: The objective of the study was to conduct a systematic review of the use of AI in imaging in the first trimester of pregnancy. Methods: We conducted a systematic literature review by searching the computerized databases PubMed, Embase, and Google Scholar from inception to January 2024. Full-text peer-reviewed journal publications written in English on the evaluation of AI in first-trimester pregnancy imaging were included. Review papers, conference abstracts, posters, animal studies, non-English, and non-peer-reviewed articles were excluded. Risk of bias was assessed using PROBAST. Results: Of the 1,595 non-duplicated records screened, 27 studies were included. Twelve studies focused on segmentation, eight on detection, six on image classification, and one on both segmentation and classification. Five studies included fetuses with a gestational age of less than 10 weeks. The size of the datasets was relatively small, as 16 studies included fewer than 1,000 cases. The models were evaluated with different metrics. The duration to run the algorithm was reported in 12 publications and ranged from less than 1 s to 14 min. Only one study was externally validated. Conclusion: Even though the included algorithms reported good performance on testing datasets in a research setting, further research and collaboration between AI experts and clinicians are needed before implementation in clinical practice.

Ultrasonography in the first trimester of pregnancy aims to confirm viability, establish gestational age, determine the location and implantation of the gestational sac, assess fetal anatomy, screen for aneuploidy, and establish chorionicity and amnionicity in case of multiple pregnancy [1]. It offers a unique opportunity for early risk stratification and flagging potential high-risk pregnancies. However, ultrasonography remains highly operator dependent.

Artificial intelligence (AI) technology has made exciting advances in recent years. Medical imaging in particular is an interesting field of research for AI, as imaging data can be collected during routine practice and large datasets can be compiled. AI algorithms can be employed for various tasks in fetal imaging, specifically segmentation, detection, and classification. Segmentation is the task of extracting a region of interest in an ultrasound (US) image by detecting the pixels that belong to that region, which could be an anatomical structure or an organ. The automated detection of a standardized plane or anatomical region could support the clinician and expedite the diagnostic process. Classification algorithms analyze a given input and assign it to an appropriate category [2]. The aim of this systematic review was to provide an overview of the current status of AI in first-trimester pregnancy imaging.

This review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [3]. A literature search in computerized databases (PubMed, Embase, and Google Scholar) was performed from inception to January 7, 2024. Search terms and numbers of hits are listed in Table 1. The search syntax on PubMed and Embase was built using a combination of free text and controlled vocabulary (MeSH or Emtree terms) to broaden the search results. We chose to use controlled vocabulary only on Google Scholar, as a free-text search resulted in over 17,000 hits. Given that the gestational age was not always reported in weeks, results on imaging in the first trimester, defined as ≤14 weeks of gestational age, were included. The search was further augmented by snowballing the references of papers that fit the inclusion criteria.

Table 1.

Search syntax used in different search engines and number of hits

PubMed
artificial intelligence AND obstetrics AND imaging: 566
artificial intelligence AND pregnancy AND imaging: 472
neural network AND early pregnancy: 206
deep learning AND early pregnancy: 80
(“Artificial Intelligence”[Mesh]) AND “Pregnancy Trimester, First”[Mesh]: 34
(“Machine Learning”[Mesh]) AND “Pregnancy Trimester, First”[Mesh]: 19
Total hits after removal of duplicates: 1,088
Embase
“machine learning”/exp AND “first trimester pregnancy”/exp: 290
imaging AND “artificial neural network” AND pregnancy: 35
“first trimester pregnancy”/exp AND “artificial intelligence”/exp: 53
“deep learning” AND “first trimester pregnancy”: 29
Total hits after removal of duplicates: 339
Google Scholar
“artificial intelligence” and “first trimester pregnancy” and “ultrasound”: 133
“deep learning” and “early pregnancy” and “sonography”: 104
Total hits after removal of duplicates: 200

Full-text peer-reviewed journal papers in English, which evaluated AI or machine learning models in imaging in the first trimester of pregnancy, either US or MRI, were included. Review papers, conference abstracts, posters, and animal studies were excluded.

Data retrieved from the included articles were author, year of publication, gestational age at US, input data, AI model, size of the dataset, prediction/output, and performance. The performance of the AI models was recorded as reported in the study. One reviewer assessed the risk of bias using the Prediction model Risk Of Bias Assessment Tool (PROBAST) [4]. We assessed only the risk of bias and not applicability. Due to a lack of methodological uniformity, a formal meta-analysis was not possible and the results are presented narratively.

We included 27 papers, of which twelve focused on segmentation, eight on detection, six on image classification, and one on both segmentation and classification (shown in Fig. 1). Table 2 shows the summarized risk of bias for each article. Twenty-three of the 27 included studies had a high risk of bias. In particular, the absence of clear inclusion and exclusion criteria for participants proved to be a risk of bias in several studies. Furthermore, the number of participants and a lack of accounting for overfitting, underfitting, and model optimism were common sources of possible bias. The models were evaluated with different metrics, as shown in Table 3. The duration to run the algorithm was reported in twelve publications and ranged from less than 1 s to 14 min (shown in Table 3). We discuss the study design and methodology of the included studies in more detail below; their results are summarized in Table 3.

Fig. 1.

PRISMA flowchart of the study selection process.

Table 2.

PROBAST risk of bias assessment

Authors  Participants  Predictors  Outcome  Analysis  Overall
Arora et al. [26] (2023) − − 
Carneiro et al. [11] (2008) − − 
Deng et al. [23] (2012) − − − 
Gofer et al. [12] (2021) − − 
Gupta et al. [27] (2021) − − − 
Ji et al. [13] (2023) − − − 
Lin et al. [18] (2022) + 
Liu et al. [14] (2023) − − − 
Looney et al. [5] (2018) − − 
Looney et al. [15] (2021) − − 
Nie et al. [19] (2017) − − − 
Pei et al. [8] (2023) − − 
Ryou et al. [16] (2019) − − 
Schwartz et al. [6] (2021) + 
Sciortino et al. [20] (2016) − − − 
Sciortino et al. [24] (2017) − − − 
Sonia et al. [28] (2015) − − − − 
Stevenson et al. [7] (2015) − − − 
Tsai et al. [21] (2021) − − 
Walker et al. [29] (2022) − − 
Wang et al. [9] (2022) + 
Wee et al. [25] (2010) − − − 
Yang et al. [17] (2019) − − 
Yasrab et al. [30] (2023) − − − 
Zhang et al. [10] (2012) − − 
Zhang et al. [31] (2022) + 
Zhen et al. [22] (2023) − − 

+ indicates low risk of bias; − indicates high risk of bias.

Table 3.

Overview of included articles

Author  US  AI model  Task  Dataset  Gest. age  Outcome  Time, s
Segmentation  
Stevenson et al. [7] (2015) 3D RW Placenta 10 First trim Dice score 0.86±0.06 NR 
Looney et al. [5] (2018) 3D fCN Placenta 2,393 11–13+6 w Dice score 0.84 11 
Schwartz et al. [6] (2021) 3D CNN Placenta 124 11–14 w Dice score 0.88 60 
Carneiro et al. [11] (2008) 2D CPBT CRL 430 First trim Hausdorff distance 2.86 mm (σ 3.13) <1 
Zhang et al. [10] (2012) 2D AdaBoost Gestational sac 82 4–12 w Accuracy 90%±4.0* 2.5 
Wang et al. [9] (2022) 2D CNN Gestational sac 2,942 6–8 w Dice score 90.69% NR 
Pei et al. [8] (2023) 3D fCN Gestational sac 194 4+5–9+6 w Accuracy 0.983 (95% CI: 0.972–0.993) 10.2 
Gofer et al. [12] (2021) 2D TWS Cerebral cortex 80 12–14 w Mean absolute percentage error training 1.76%, test 1.71% NR 
Yang et al. [17] (2019) 3D fCN + RNN i. Fetus 104 10–14 w i. Accuracy 0.88 300 
ii. Gestational sac ii. Accuracy 0.89 
iii. Placenta iii. Accuracy 0.64 
Ryou et al. [16] (2019) 2D i–iii. fCN i. Fetus 65 11–14 w i. Accuracy fetus 89.4%±11.4 NR 
iv. U-Net ii. Head ii. Accuracy head 95.4% 
iii. Abdomen iii. Accuracy abdomen 96.6% 
iv. Limb iv. Accuracy arm 85.0%, leg 88.3% 
Looney et al. [15] (2021) 3D fCN i. Placenta i. 2,048 11–13+6 w i. Dice score 0.85 8.46 
ii. Placenta ii. 300 ii. Dice score placenta 0.82 
+ Fetus Dice score fetus 0.88 
+ Amniotic fluid Dice score amniotic fluid 0.93 
Liu et al. [14] (2023) 2D fCN i. Gestational sac 1,185 6–10 w i. Dice score gestational sac 96.71%±7.83 NR 
ii. Yolk sac ii. Dice score yolk sac 93.62%±3.31 
iii. Embryo iii. Dice score embryo 96.11%±7.72 
Ji et al. [13] (2023) 2D CNN 6 anatomical landmarks 2,372 11–13+6 w Mean absolute error for all landmarks <0.2 0.76 
Detection 
Sciortino et al. [20] (2016) 2D Wavelet analysis + FNN Midsagittal plane 3,000 11–13 w True positive 87.26%, FN 12.74% NR 
True negative 94.98%, FP 5.02% 
Nie et al. [19] (2017) 3D DBN Midsagittal plane 346 11–13+6 w Accuracy 91.62% NR 
Tsai et al. [21] (2021) 3D CNN + GAN Midsagittal plane 218 11–13 w Average Euclidean distance of 0.0094 2.4 
Lin et al. [18] (2022) 2D CNN + logistic regression classifier Midsagittal plane 1,528 10–14 w Internal sensitivity 90.5%, specificity 100% <0.1 
External sensitivity 98.9%, specificity 79.7% 
Zhen et al. [22] (2023) 3D CNN 9 standard planes 19,025 11–13+6 w Mean average precision from 0.604 to 0.919 (average 0.774) 840 
Wee et al. [25] (2010) 2D FNN NT region 210 First trim Accuracy 93.33% NR 
Deng et al. [23] (2012) 2D SVM NT region 690 First trim Accuracy 55.9% NR 
Sciortino et al. [24] (2017) 2D MRA NT 382 11–13 w True positive 99.95% 0.2 
Classification  
Sonia et al. [28] (2015) 2D SVM NT normal/abnormal Unclear 11–13+6 w Sensitivity 93.8% NR 
Specificity 74.4% 
Zhang et al. [31] (2022) 2D CNN Normal/trisomy 21 3,303 11–13+6 w Accuracy 0.88 (95% CI: 0.84–0.92) NR 
Walker et al. [29] (2022) 2D CNN Normal/cystic hygroma 289 11–13+6 w Accuracy 93% (95% CI: 88–98%) NR 
Wang et al. [9] (2022) 2D CNN Normal/miscarriage 2,942 6–8 w Sensitivity 80.73%, specificity 80.91% NR 
Gupta et al. [27] (2021) 2D CNN Placenta 429 11–14 w Sensitivity 70.6%, specificity 76.6% NR 
Normal/abnormal 
Arora et al. [26] (2023) 2D CNN Placenta 1,008 11–14 w i. Trim 1–2, sensitivity 78.67%, specificity 91.11% NR 
i. Trimester Trim 1–3, sensitivity 86.36%, specificity 88.98% 
ii. Normal/adverse outcome ii. Normal/adverse sensitivity 77.42%, specificity 80.21% 
Yasrab et al. [30] (2023) 3D CNN 7 anatomical categories 832 11–14 w Accuracy 93.10% NR 

AI, artificial intelligence; CNN, convolutional neural network; CPBT, constrained probabilistic boosting tree; CRL, crown rump length; DBN, deep belief network; fCN, fully convolutional network; FN, false negative; FNN, feedforward neural network; FP, false positive; Gest. Age, gestational age; GAN, generative adversarial network; Gest, gestational; MRA, multiresolution analysis; NR, not reported; RNN, recurrent neural network; RW, random walker; SVM, support vector machine; trim, trimester; TWS, Trainable Weka segmentation; US, ultrasound; w, weeks.

*On 30 correctly segmented images.

Segmentation

Thirteen studies on segmentation were identified (shown in Table 3): three focused on placental segmentation [5‒7], three on gestational sac segmentation [8‒10], one on crown-rump length [11], and one on the fetal cerebral cortex [12]. Five studies segmented more than one anatomical structure [13‒17]. The article by Wang et al. [9] is discussed later in the paper, as it combined a segmentation and a classification task.

Placenta

Stevenson et al. [7] developed a semi-automated random walker (RW) algorithm to segment the placenta. The algorithm requires seeding by an operator: the operator provides information on the placental location by roughly labeling the placenta and background pixels in the US images to initialize the algorithm. Their model was validated on 10 subjects with an anterior placenta and showed a Dice coefficient of 0.86 ± 0.06. Looney et al. [5] trained a fully convolutional network (fCN). They used an RW algorithm seeded by three expert operators to generate a ground-truth dataset. A 2-fold cross-validation was performed, providing a training, validation, and test partition of 1,097, 100, and 1,196 cases, respectively. The median Dice coefficient was 0.84. Schwartz et al. [6] constructed a convolutional neural network (CNN) pipeline combining 2D and 3D models with a downsampling-upsampling architecture similar to the U-Net architecture. A dataset of 99 images was manually segmented to provide a ground truth. Four-fold cross-validation and data augmentation were implemented to avoid overfitting. The model achieved a mean Dice coefficient of 0.88 on an independent test set of 25 images.
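
Several of the studies above and below evaluate segmentation quality with the Dice coefficient, the overlap between a predicted and a ground-truth mask. A minimal sketch, assuming NumPy and toy binary masks rather than real ultrasound data, might look like this:

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice overlap between two binary segmentation masks."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    denom = pred.sum() + truth.sum()
    return 2.0 * intersection / denom if denom > 0 else 1.0

# toy 2D masks standing in for a predicted and a manual placental segmentation
pred = np.zeros((100, 100), dtype=bool)
truth = np.zeros((100, 100), dtype=bool)
pred[20:70, 20:70] = True
truth[25:75, 25:75] = True
print(f"Dice: {dice_coefficient(pred, truth):.3f}")
```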

Fetus and Gestational Sac

Carneiro et al. [11] used a constrained probabilistic boosting tree to segment different anatomical structures, including the crown-rump length, on first-trimester scans. The model relies on significant user interaction: the user selects the US image, indicates which structure is displayed, and corrects the automatic detection where needed. The model was trained on 325 expert-annotated images and tested on 105 independent images. Quantitative assessment of the algorithm was performed by calculating the Hausdorff distance between the curves generated by the algorithm and the manual annotation. The model achieved a Hausdorff distance of 2.86 mm (σ 3.13). Zhang et al. [10] developed an intelligent scanning method to segment the gestational sac. They used 61 manually segmented videos as a ground truth. First, the gestational sac was located in each US frame by extracting Haar-like features to train two-cascade AdaBoost classifiers. Using surrounding anatomical landmarks, false-positive frames were eliminated. Finally, a multiscale normalized cuts algorithm generated the contour of the gestational sac, which was segmented by a modified snake model. The model was tested on an independent dataset of 31 videos. Before calculating the model’s segmentation accuracy, they excluded one video in which the model had falsely detected a cystic region as the gestational sac. On the remaining 30 videos, the model achieved an average accuracy of 90 ± 4.0%. Recently, Pei et al. [8] automated the segmentation of the gestational sac. The input for their model was a transvaginal ultrasonography video. A pre-trained image classifier identified the image frame containing a clear depiction of the gestational sac. Subsequently, an fCN, specifically Attention U-Net, performed the segmentation. Videos from three different centers served as the training set (n = 111), validation set (n = 33), and test set (n = 50), respectively. The ground truth was provided by two experts. The automated pipeline achieved an accuracy of 0.983 (95% CI: 0.972–0.993).
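
Carneiro et al. quantified agreement with the Hausdorff distance between curves. A minimal sketch of a symmetric Hausdorff distance on made-up contour points, using SciPy's directed_hausdorff, could be:

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

# hypothetical (x, y) points along a predicted and a manually annotated curve
predicted = np.array([[x, 0.1 * x] for x in range(50)], dtype=float)
manual = np.array([[x, 0.1 * x + 2.0] for x in range(50)], dtype=float)

# symmetric Hausdorff distance = max of the two directed distances
d_pm = directed_hausdorff(predicted, manual)[0]
d_mp = directed_hausdorff(manual, predicted)[0]
print(f"Hausdorff distance: {max(d_pm, d_mp):.2f} (same units as the input points)")
```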

A study by Gofer et al. [12] compared statistical region merging, a classical image processing algorithm, to Trainable Weka segmentation for segmenting and measuring the fetal cerebral cortex. The dataset included 80 images obtained with a transvaginal probe. Fivefold cross-validation was implemented: 80% of the data were used to train and 20% to test the model. They used the mean absolute percentage error to evaluate the difference between the predicted fetal cortex width and the ground-truth value. Statistical region merging showed a mean absolute percentage error of 1.71% ± 1.62 standard deviation (SD). Trainable Weka segmentation had a mean absolute percentage error of 1.76 ± 0.23 SD on the training set and 1.71 ± 0.559 SD on the test set.
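
The mean absolute percentage error used by Gofer et al. compares predicted and ground-truth cortex widths. A minimal sketch with invented width values:

```python
import numpy as np

def mape(predicted: np.ndarray, truth: np.ndarray) -> float:
    """Mean absolute percentage error between predicted and ground-truth measurements."""
    return float(np.mean(np.abs((predicted - truth) / truth)) * 100)

# hypothetical cortex-width measurements in millimetres
truth = np.array([1.20, 1.35, 1.50, 1.42])
predicted = np.array([1.18, 1.38, 1.47, 1.45])
print(f"MAPE: {mape(predicted, truth):.2f}%")
```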

Multiple Segmentation

Yang et al. [17] simultaneously segmented the fetus, gestational sac, and placenta. They used a 3D fCN and a multi-directional recurrent neural network with a hierarchical deep supervision mechanism. Ten sonographers manually segmented all 3D volumes to provide a segmentation ground truth. The dataset was split into 50 volumes for training, 10 for validation, and 44 for testing. The training dataset was augmented to 150 volumes via horizontal and vertical flipping. Dice coefficients of 0.88, 0.89, and 0.64 were obtained for the fetus, gestational sac, and placenta, respectively. A multiple biometry segmentation was also performed by Ryou et al. [16]. A 2D fCN-based architecture was applied to slices of a 3D volume to segment the whole fetus, head, abdomen, and limbs. Ground-truth segmentation was provided by two experts. Data augmentation was used to increase the initial training data size of 44. The test set consisted of 21 independent US volumes. A sagittal view of the fetus was extracted from the 3D US, and semantic segmentation of the whole fetus was performed by a multi-task network, which achieved a mean pixel accuracy of 89.4% ± 11.4. The head and abdomen were segmented by an fCN with a pixel accuracy of 95.4% and 96.6%, respectively. They reported a mean pixel accuracy for arm segmentation of 85.0% ± 12.7 and leg segmentation of 88.3% ± 8.0 with U-Net.

Looney et al. [15] built a multiclass CNN to segment the placenta, amniotic fluid, and fetus. The dataset consisted of 2,093 labeled placental volumes and 300 volumes with placenta, amniotic fluid, and fetus annotated for multi-class segmentation. Ground truth was obtained by segmentation with an RW algorithm initialized by three experts. For their placenta segmentation model, they built an fCN. For multiclass segmentation, a two-pathway hybrid model using transfer learning, a modified loss function, and exponential average weighting was developed. The model achieved a mean Dice coefficient of 0.82 (±0.08) for placental, 0.93 (±0.04) for amniotic fluid, and 0.88 (±0.04) for fetal segmentation. To segment the gestational sac, yolk sac, and embryo, Liu et al. [14] designed AFG-net: an advanced fCN based on U-Net with added attention fusing and guided filtering techniques. The dataset consisted of 914 training images and 271 test images. The researchers labeled the images with the help of an expert sonographer. The algorithm achieved a Dice score of 96.71% ± 7.83 for gestational sac, 93.62% ± 3.31 for yolk sac, and 96.11% ± 7.72 for embryo segmentation. Ji et al. [13] developed an algorithm aimed at segmenting four anatomical structures and predicting the location of six anatomical landmarks. Their dataset included 2,372 images. Data augmentation was applied. Ground-truth annotations were obtained by an expert sonographer and scrutinized by a second expert. The region of interest was initially determined using a faster region-based CNN. Subsequent segmentation was accomplished using CNNs. The average Euclidean distance between predicted and actual landmarks was measured, and the collective mean absolute error over all landmarks was less than 0.2 mm. Following segmentation, the researchers calculated four facial markers and assessed their potential for predicting chromosomal abnormalities in fetuses.
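
Ji et al. evaluated landmark prediction by the Euclidean distance between predicted and annotated landmark positions. A minimal sketch of that style of evaluation, with hypothetical coordinates, is shown below:

```python
import numpy as np

# hypothetical (x, y) coordinates in mm for six facial landmarks
predicted = np.array([[10.1, 5.2], [12.0, 7.9], [15.3, 9.1],
                      [18.2, 10.0], [20.1, 12.4], [22.0, 14.1]])
annotated = np.array([[10.0, 5.0], [12.2, 8.0], [15.1, 9.0],
                      [18.0, 10.2], [20.0, 12.5], [22.1, 14.0]])

# Euclidean distance per landmark, then the mean error over all landmarks
distances = np.linalg.norm(predicted - annotated, axis=1)
print("per-landmark error (mm):", np.round(distances, 3))
print(f"mean error over all landmarks: {distances.mean():.3f} mm")
```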

Detection

We identified eight studies (shown in Table 3), of which five focused on detecting anatomical planes [18‒22] and three on detecting the nuchal translucency in a pre-defined midsagittal plane [23‒25].

Anatomical Plane Detection

Sciortino et al. [20] developed a multiple-step methodology based on wavelet analysis and two feedforward neural network classifiers to detect the jawbone, and on radial symmetry analysis to detect the choroid plexus. They obtained 3,000 frames from ten videos of ten different patients, labeled by an expert. The dataset was randomly divided into 1,000 frames for training and 2,000 for testing, over ten permutations. Their method showed a true-positive rate of 87.26%, which indicates that only 12.74% of correct midsagittal frames were rejected. A true-negative rate of 94.98% was obtained, which implies that 5.02% of the frames were wrongly accepted. Nie et al. [19] approached midsagittal plane detection in a 3D sweep as a symmetry plane and axis searching problem. They used a deep belief network and a modified circle detection method to identify the position and size of the fetal head. The dataset consisted of 346 3D sweeps. Two-fold cross-validation was applied to evaluate the method, dividing the dataset into 173 training and 173 test cases. Data were manually annotated to provide a ground truth. The middle slice of the sweep was divided into patches. From the training data, 1,384 patches containing the fetal head and 57,616 negative patches were used to train the algorithm. The input of the deep belief network was an image patch; the output was the probability that it contained the fetal head. The patch with the highest probability was chosen. The detection accuracy was 91.6%. Overall, 88.6% of the resulting planes had a distance error of less than 4 mm, and 71.0% had an angle error of less than 20°. A two-stage deep learning method was proposed by Tsai et al. [21]. In the first stage, a seed point of the fetal head was found: two segmentation networks were utilized for the sagittal and axial views, and two additional networks were used for object detection and to obtain the seed point in the axial and coronal views. In the second stage, a deep learning method involving a generative adversarial network, which contains a generator and a discriminator, was used for automatic fetal midsagittal plane detection in 3D US images. They applied 5-fold cross-validation on 218 cases, selecting 174 cases (80%) for training and the remaining 28 cases (20%) for testing. Images in which the fetal head was on the right side were horizontally flipped. A cube containing the fetal head was cropped out of the US volume. The system produced an average included angle of 0.5344° and an average Euclidean distance of 0.0094. Lin et al. [18] combined detection and classification to detect the midsagittal plane. They utilized nine expert-labeled anatomical structures to train multiple CNNs in a divide-and-conquer framework for hierarchical object detection. The model was trained on 1,372 images. The probability of each structure obtained from the trained object detectors was then used to train a logistic regression classifier, generating a standard or non-standard plane classification. A ground truth was provided by two experts. The test dataset comprised 156 internal images and 156 images obtained in a different center to evaluate external validity. For the internal set, the sensitivity and specificity were 90.5% and 100%, respectively; for the external dataset, they were 98.9% and 79.7%. Utilizing a weighted scoring system for each identified anatomical structure, Zhen et al. [22] developed a model designed to identify nine standard planes in videos. The CNN-based object detection algorithm (YOLOv3) was trained on 19,025 images, distributed into training, validation, and testing sets in a ratio of 7:1:2. They did not disclose how many experts were involved in the manual annotation. The model’s performance in detecting the anatomical structures was calculated for each standard plane and showed a mean average precision between 0.604 and 0.919 (average 0.774). The quality of the detected standard planes was quantitatively rated by a blinded expert according to a scoring protocol and achieved a score similar to that of an expert sonographer.

Nuchal Translucency Detection

Wee et al. [25] applied a feedforward neural network to locate the nuchal translucency region in a midsagittal US image. The network was trained on a dataset of 150 images. They did not specify how many images did or did not contain the nuchal fold. To test the performance of the network, an independent dataset consisting of images with (n = 30) and without (n = 30) a visible nuchal translucency was used. The model achieved an accuracy of 93.33%. Deng et al. [23] constructed a hierarchical model. The training data were pre-processed by calculating the histogram of oriented gradients and generating a Gaussian pyramid. Three support vector machine classifiers were trained to identify the bounding boxes containing the nuchal translucency, head, and body. A spatial model was used to define the spatial constraints between them. Finally, they applied dynamic programming and a generalized distance transform to detect the nuchal translucency. The dataset consisted of 690 midsagittal images. It was not specified how many experts were involved in labeling the images. Half of the images were randomly selected as training images. The model achieved an accuracy of 55.9%, where a positive result meant that the predicted box overlapped the ground-truth box by at least 50%.
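
Deng et al. counted a prediction as positive when the predicted box overlapped the ground-truth box by at least 50%. A common way to express such box overlap is the intersection over union; a minimal sketch with made-up box coordinates:

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

predicted = (30, 40, 80, 90)      # hypothetical nuchal translucency bounding box
ground_truth = (35, 45, 85, 95)
print(f"IoU: {iou(predicted, ground_truth):.2f}  (counted positive if >= 0.5)")
```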

Sciortino et al. [24] extracted 382 midsagittal sections from US videos of 12 patients. Wavelet analysis was applied to pre-process the images. The head of the fetus was located by fast radial symmetry. Based on the position of the fetal head and jawbone, a rough bounding box was drawn, which included the nuchal translucency. Information obtained by the wavelet transform was used to detect the nuchal translucency, which was then measured. The reported correct identification rate was 99.95%, obtained by counting the number of nuchal regions correctly detected by the system with respect to the ground truth provided by one sonographer.

Classification

We identified seven articles (shown in Table 3) on classification [9, 26‒31]. In 2015, Sonia et al. [28] used a support vector machine to classify fetal images into normal or abnormal nuchal translucency. They did not explicitly mention the total number of images in the dataset. One-third of the available images were used as training data. They achieved an accuracy of 84%. Zhang et al. [31] proposed a shallow CNN with eleven layers to identify fetuses with trisomy 21. The input is a 2D image of the fetal head region in the midsagittal plane; the output is a risk score between 0 and 1. The training set consisted of 2,140 images; after data augmentation, 3-fold cross-validation was applied. The algorithm was validated on 1,163 images. They generated a class activation map to show which parts of the image contributed most to the CNN’s prediction. On the validation set, the model achieved an accuracy of 0.88 (95% CI: 0.84–0.92). To predict the risk of miscarriage in a fetus of 6–8 weeks with a detectable heartbeat at the first US, Wang et al. [9] trained a CNN on 2,942 images (10-fold cross-validation was applied). Preprocessing of the images and data augmentation techniques were discussed. Ground-truth segmentation of the gestational sac was provided by two experts. First, the gestational sac was segmented, which yielded a Dice score of 90.69%. Second, the images were classified, which achieved a sensitivity of 80.73% and a specificity of 80.91%. They tested the robustness of the model on a prospective cohort of 101 patients: the sensitivity and specificity of the model were 80.93% and 94.52%, respectively. A CNN (DenseNet) was trained to identify cases of cystic hygroma by Walker et al. [29]. The input was a midsagittal view of the fetus. The dataset consisted of 289 images, with 75% utilized as the training dataset, to which 4-fold cross-validation was applied; the remaining 25% served as the validation dataset. The authors discussed methods for image preprocessing and data augmentation. Gradient-weighted class activation mapping was implemented to improve the interpretability of the model. They achieved an accuracy of 93% (95% CI: 88–98%). Yasrab et al. [30] proposed a two-stream CNN architecture for fetal anatomy annotation in first-trimester scan videos. The model classified video segments into one of seven anatomical categories. They used transfer learning from a model pre-trained on second-trimester scans. Data augmentation and preprocessing techniques were discussed. A ground truth was provided by two experts. The dataset consisted of 641 images for training, 145 for validation, and 46 for testing. The test set results showed an accuracy of 96.10%.
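
Most classification studies above report sensitivity and specificity. A minimal sketch of how these follow from a confusion matrix, using invented labels rather than data from any of the reviewed studies:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# hypothetical labels: 1 = abnormal (e.g., trisomy 21), 0 = normal
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 0, 0, 1, 0])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)   # proportion of abnormal cases correctly flagged
specificity = tn / (tn + fp)   # proportion of normal cases correctly passed
print(f"sensitivity {sensitivity:.1%}, specificity {specificity:.1%}")
```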

In 2021, Gupta et al. [27] compared five pre-trained CNNs to predict hypertension in pregnancy based on placental image texture. They recruited 451 cases, of which 429 were followed until delivery. They omitted the missing data and analyzed the remaining cases, without reporting how this possible source of bias was addressed. For all cases, a placental image was obtained in all three trimesters of pregnancy. A total of 58 patients had hypertensive disorders of pregnancy, 116 had other adverse outcomes, and 255 had normal outcomes. They performed 5-fold cross-validation and applied image data augmentation to increase the size of the training dataset 13-fold. They stated an overall accuracy score of 0.7098. In the first trimester, the sensitivity and specificity of abnormal placental image texture for predicting hypertensive disorders of pregnancy were reported as 70.6% and 76.6%, respectively. Arora et al. [26] conducted a prospective observational study aimed at classifying placental images by trimester and predicting adverse or normal outcomes. The study included data from 1,008 cases, with 600 cases resulting in normal outcomes and 408 associated with either a maternal or a fetal adverse outcome. Employing a CNN-based model, the study reported a sensitivity of 78.67% and a specificity of 91.11% for classifying images as first or second trimester. For first versus third trimester, the sensitivity and specificity were reported as 86.36% and 88.89%, respectively. Furthermore, when classifying images of the first trimester as normal or adverse outcome, the model achieved a sensitivity of 77.42% and a specificity of 80.21%.

This review provides an overview of the publications so far on AI in first-trimester pregnancy imaging and identifies several challenges for future research. All included studies were based on supervised learning, which implies that the model is trained on a dataset that has been assigned ground-truth labels. These ground-truth labels were either assigned [6‒8, 10‒14, 16‒25, 28, 30] or seeded [5, 15] by an operator, or derived from an observed clinical outcome [9, 26, 27, 29, 31]. Manual labeling or seeding is time-consuming and might introduce bias into the model [2]. Transparency about labeling, such that others can critically evaluate the training process, is paramount to assess possible sources of bias and ensure the accuracy of the algorithm [32]. However, nine articles did not disclose how many operators were involved in the labeling process [6, 12, 14, 19, 22, 23, 25, 28, 30].

Even though most included algorithms reported a good performance on testing datasets, caution should be exercised when generalizing their results to the clinical workspace. First, a high accuracy does not always correspond to a good clinical performance: imagine an image with 96 red pixels and 4 blue pixels. If the segmentation model labels all pixels as red, it will have an accuracy of 96% even though it does not identify a single blue pixel. This is known as the class imbalance problem [33]. Second, there might be a mismatch between the data used to train the model and the data it encounters during deployment in a clinical setting. This mismatch can be due to different factors, including bias in the training set, different equipment (for instance, US machines), changes in clinical practice over time, or differences in the patient population. This phenomenon is called distributional shift and can lead to a decrease in the performance of the algorithm [34, 35]. Another blind spot of AI models is rare diseases, as these will be underrepresented or missing in training data [36]. To safely implement AI algorithms on a larger scale in the future, a continued data supply will be needed for ongoing monitoring, updating, and improvement of the algorithm [37, 38]. This could be achieved by an online learning approach, where the AI model is updated on one or a small number of data instances at a time. Data will need to be shared across multiple institutions and potentially across nations [32, 39]. To assess the external validity of an AI model, transparency on the population from which training data were obtained is crucial. However, of the 27 included articles, only fifteen clearly stated in which center their data were obtained [5, 8‒10, 13‒15, 18, 19, 21, 22, 25, 29‒31]. We noticed a trend that more recently published articles are more transparent in disclosing the training process, image preprocessing techniques, data selection, and data augmentation. None of the included articles recruited training data from more than one center, which will limit the external validity of the models.
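
The red/blue pixel example above can be reproduced in a few lines, showing how pixel accuracy stays high while the Dice score for the minority class collapses to zero (a toy illustration, not any of the reviewed models):

```python
import numpy as np

# ground truth: 96 "red" background pixels (0) and 4 "blue" foreground pixels (1)
truth = np.zeros(100, dtype=int)
truth[:4] = 1

# a degenerate model that labels every pixel as background
pred = np.zeros(100, dtype=int)

accuracy = (pred == truth).mean()
intersection = np.logical_and(pred == 1, truth == 1).sum()
dice = 2 * intersection / ((pred == 1).sum() + (truth == 1).sum())  # 0 / 4 -> 0.0
print(f"accuracy {accuracy:.0%}, Dice for the foreground class {dice:.2f}")
```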

AI scientists currently direct their focus toward building a model and its performance rather than its clinical applicability. For instance, models focusing on segmentation of the nuchal fold in an input image of the midsagittal plane have limited clinical value. An important parameter in assessing AI performance in the clinical workflow is the time taken for task completion by both humans and AI [40]. Only twelve articles reported the time needed for the AI model to process a new image [5, 6, 8, 10, 11, 13, 15, 17, 18, 21, 22, 24]. All methods presented in this study were semi- or fully automatic. No methods proposed an interactive framework. Therefore, the clinician would have to either accept the output or reject it and resort to manual analysis. This may reduce the trust clinicians have in the AI system and even increase clinical time [41]. Achieving a balance between model complexity and interpretability is a common challenge. Nonetheless, two recent articles addressed this issue by including class-discriminative visualization [29, 31]. This technique highlights the regions in an image that influenced the algorithm’s decisions. It enables the user to better understand the model’s predictions and where the model might fail. Interdisciplinary collaborations will be crucial to reconcile the goals of computational scientists with those of clinicians providing patient care and to increase the scientific value of future research [32, 42].

The strength of this review is that it gives an overview of the state of AI research in first-trimester imaging. We conducted a broad search in three databases following PRISMA guidelines and assessed the risk of bias using PROBAST. Our study had multiple limitations. First, the included studies have different goals, AI model structures, databases, and evaluation methods. This heterogeneity makes it difficult to compare individual studies and draw general conclusions. Second, even though we conducted a broad search in three large databases, relevant publications might have been missed. AI-related articles are published in a broad range of journals and databases with a different focus, which makes it difficult to identify all relevant publications. Third, publication bias might be present, as it is likely that only the most accurate models get published. Last, we acknowledge that the guidelines used to assess the risk of bias were designed for conventional prediction modeling studies, and the adherence we found should be interpreted in this context [4, 40].

We propose multiple challenges for future research on AI in first-trimester imaging. First, until recently there was a lack of consensus on how to report AI trials. Lately, the CONSORT (Consolidated Standards of Reporting Trials) and SPIRIT (Standard Protocol Items: Recommendations for Interventional Trials) groups have published the first international consensus-based AI reporting guidelines [43, 44]. Currently, extensions to the TRIPOD statement (Transparent Reporting of a multivariable prediction model of Individual Prognosis Or Diagnosis) and PROBAST (Prediction model Risk Of Bias ASsessment Tool) for prediction model studies applying machine learning techniques are being developed [45]. Future studies on AI algorithms should implement these guidelines to ensure transparent and systematic data management, algorithm development, and evaluation. Second, the introduction of AI in early pregnancy imaging might help make basic US available to non-expert sonographers. An important issue, however, is the liability in case of misdiagnosis [46, 47]. Last, only five studies included fetuses with a gestational age of less than 10 weeks [8‒11, 14]. US findings in the early and late first trimester are very different, as fetal development changes rapidly over the first 14 weeks of pregnancy. Part of the AI technology designed for the late first trimester of pregnancy might not be applicable to early first-trimester images and vice versa. There remains a gap in research on AI models for imaging in early first-trimester pregnancies, despite the clinical benefit such a technology could have.

Despite the exponential increase in AI-related research, AI is still in the developmental stage in first-trimester pregnancy imaging and has yet to transition from research to clinical implementation.

Artificial Intelligence

The term artificial intelligence is often used to describe the science of developing software that can perform complex tasks, such as interpreting spoken language, recognizing images, or solving problems [48].

Machine Learning

Machine learning is a branch of AI that allows algorithms to learn from data and experience, improving over time. Machine learning systems are given a task and fed a large amount of data to use as examples of how this task can be achieved [49]. The type of machine learning method should be appropriate for the amount and type of input data. Generally, more recent methods (e.g., convolutional neural networks, recurrent neural networks) are better suited than traditional methods (e.g., support vector machines) for assessing complex data such as medical images. However, more recent methods generally require significantly more data, which can become a bottleneck [50].

Deep Learning

Deep learning is a subtype of machine learning in which the input and output are connected by multiple layers of hidden connections [51]. Deep learning algorithms generally refer to artificial neural networks with three or more layers.

Artificial Neural Network

An artificial neural network is a machine learning method modeled after the connections in the human brain. These systems may consist of many layers of neurons; a deep neural network (DNN) consists of multiple such layers.

Convolutional Neural Network

A convolutional neural network is a type of deep neural network designed to process multi-dimensional data such as images, modeled after the biological organization of the visual cortex. It contains a convolutional layer, in which a filter slides over the input data and produces a feature map. This allows the computation of hierarchical features of complex images. CNNs are a popular network choice within medical imaging analysis [41, 51].
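
A minimal PyTorch sketch of the idea described above, a convolutional filter sliding over an image to produce feature maps followed by downsampling; the layer sizes are chosen arbitrarily for illustration:

```python
import torch
import torch.nn as nn

# one grayscale ultrasound-like image: batch of 1, 1 channel, 128 x 128 pixels
image = torch.randn(1, 1, 128, 128)

# a convolutional layer with 8 filters of size 3 x 3 slides over the image ...
conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1)
features = torch.relu(conv(image))               # ... and produces 8 feature maps
pooled = nn.MaxPool2d(kernel_size=2)(features)   # downsampling halves the resolution

print(features.shape)  # torch.Size([1, 8, 128, 128])
print(pooled.shape)    # torch.Size([1, 8, 64, 64])
```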

Fully Convolutional Network

A fully convolutional network is a version of a CNN in which all the fully connected layers are replaced by convolutional layers. Using an fCN, an image can be analyzed globally instead of in patches [52, 53].

Support Vector Machine

A support vector machine (SVM) is a supervised learning algorithm that can be used for regression, novelty detection, and classification tasks. Image segmentation can be treated as a classification problem to which an SVM can be applied [54, 55].
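
A minimal scikit-learn sketch of treating segmentation as pixel-wise classification with an SVM; the single intensity feature and all values are invented:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# stand-in training data: one intensity feature per pixel, with binary labels
# (1 = structure of interest, 0 = background)
X_train = np.concatenate([rng.normal(0.2, 0.1, 500),
                          rng.normal(0.7, 0.1, 500)]).reshape(-1, 1)
y_train = np.concatenate([np.zeros(500), np.ones(500)])

clf = SVC(kernel="rbf")
clf.fit(X_train, y_train)

# classify the pixels of a new "image" into background / structure
new_pixels = rng.uniform(0, 1, size=(10, 1))
print(clf.predict(new_pixels))
```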

Feedforward Neural Network

A feedforward neural network is a simple type of ANN in which the information moves only in a forward direction. Each layer consists of nodes, and the connections between nodes can have associated weights [56].

Generative Adversarial Network

A generative adversarial network is a deep learning method that contains two sub-models: the generator creates new examples, and the discriminator classifies whether a given example is “real” training data or “fake” data produced by the generator [21, 57].

Random Walker

The random walker algorithm calculates, for each unlabeled pixel, the probability that a random walker starting at that pixel will first reach a given pre-labeled pixel. By assigning each pixel to the label with the greatest probability, an image segmentation is obtained [58].
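
scikit-image ships a random-walker implementation; a minimal sketch of operator-style seeding on a synthetic image (not the pipeline of any reviewed study):

```python
import numpy as np
from skimage.segmentation import random_walker

# synthetic "ultrasound" image: a bright disc (placenta stand-in) on a dark background
yy, xx = np.mgrid[0:100, 0:100]
image = np.where((yy - 50) ** 2 + (xx - 50) ** 2 < 20 ** 2, 0.8, 0.2)
image += np.random.default_rng(0).normal(0, 0.05, image.shape)

# operator-style seeds: 1 = object, 2 = background, 0 = unlabeled
labels = np.zeros_like(image, dtype=int)
labels[50, 50] = 1   # a seed inside the disc
labels[5, 5] = 2     # a seed in the background

segmentation = random_walker(image, labels, beta=130)
print("object pixels:", int((segmentation == 1).sum()))
```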

Deep Belief Neural Network

A deep belief network is a deep neural network composed of several middle layers of restricted Boltzmann machines (RBMs), each consisting of a visible layer and a single hidden layer. There are connections between the layers but not between the units within each layer [59].

Recurrent Neural Network

A recurrent neural network is a neural network designed to handle sequential data, such as in speech processing, where the previous words contribute to the meaning of a sentence. Recurrent neural networks achieve this “memory” through recurrent connections that create a loop [51].

AdaBoost

AdaBoost is an adaptive boosting algorithm. Boosting is a learning method that produces an accurate prediction rule by combining rough and moderately inaccurate rules of thumb [60].
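
A minimal scikit-learn sketch of boosting, combining weak decision stumps into a stronger classifier on synthetic data (not the Haar-feature setup of Zhang et al.):

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))                    # synthetic feature vectors
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)    # synthetic binary labels

# the default weak learner is a depth-1 decision tree ("stump");
# adaptive boosting reweights the training samples after each round
model = AdaBoostClassifier(n_estimators=50)
model.fit(X, y)
print(f"training accuracy: {model.score(X, y):.2f}")
```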

Constrained Probabilistic Boosting Tree

A constrained probabilistic boosting tree is a boosting classifier in which the strong classifiers are represented by the nodes of a binary tree. As the input training dataset is divided into new sets by the classifier, each set is used to train the sub-trees recursively [61].

Ethical approval and patient consent is not applicable as this study is based exclusively on published literature.

The authors have no conflict of interest to declare.

The authors declare they have no financial interests and no funding was received.

The search syntax was built by Emma Umans, Kobe Dewilde, and Thierry Van den Bosch. Two researchers (Emma Umans and Kobe Dewilde) independently screened titles, abstracts, and full text. Discrepancies in article selection were resolved by the senior researcher (Thierry Van den Bosch). Emma Umans and Kobe Dewilde drafted the manuscript. Thierry Van den Bosch, Helena Williams, and Jan Deprest reviewed the manuscript critically and gave final approval of the version to be published.

Additional Information

Emma Umans and Kobe Dewilde should be considered as joint first authors, and both authors contributed equally to the publication.

All data generated or analyzed during this study are included in this article. Further inquiries can be directed to the corresponding author.

1. Salomon LJ, Alfirevic Z, Bilardo CM, Chalouhi GE, Ghi T, Kagan KO, et al. ISUOG practice guidelines: performance of first-trimester fetal ultrasound scan. Ultrasound Obstet Gynecol. 2013;41(1):102–13.
2. Drukker L, Noble JA, Papageorghiou AT. Introduction to artificial intelligence in ultrasound imaging in obstetrics and gynecology. Ultrasound Obstet Gynecol. 2020;56(4):498–505.
3. Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;74:790–9.
4. Wolff RF, Moons KGM, Riley RD, Whiting PF, Westwood M, Collins GS, et al. PROBAST: a tool to assess the risk of bias and applicability of prediction model studies. Ann Intern Med. 2019;170(1):51–8.
5. Looney P, Stevenson GN, Nicolaides KH, Plasencia W, Molloholli M, Natsis S, et al. Fully automated, real-time 3D ultrasound segmentation to estimate first trimester placental volume using deep learning. JCI Insight. 2018;3(11):e120178.
6. Schwartz N, Oguz I, Wang J, Pouch A, Yushkevich N, Parameshwaran S, et al. Fully automated placental volume quantification from 3D ultrasound for prediction of small-for-gestational-age infants. J Ultrasound Med. 2022;41(6):1509–24.
7. Stevenson GN, Collins SL, Ding J, Impey L, Noble JA. 3-D ultrasound segmentation of the placenta using the random walker algorithm: reliability and agreement. Ultrasound Med Biol. 2015;41(12):3182–93.
8. Pei Y, E L, Dai C, Han J, Wang H, Liang H. Combining deep learning and intelligent biometry to extract ultrasound standard planes and assess early gestational weeks. Eur Radiol. 2023;33(12):9390–400.
9. Wang Y, Zhang Q, Yin C, Chen L, Yang Z, Jia S, et al. Automated prediction of early spontaneous miscarriage based on the analyzing ultrasonographic gestational sac imaging by the convolutional neural network: a case-control and cohort study. BMC Pregnancy Childbirth. 2022;22(1):621.
10. Zhang L, Chen S, Chin CT, Wang T, Li S. Intelligent scanning: automated standard plane selection and biometric measurement of early gestational sac in routine ultrasound examination. Med Phys. 2012;39(8):5015–27.
11. Carneiro G, Georgescu B, Good S, Comaniciu D. Detection and measurement of fetal anatomies from ultrasound images using a constrained probabilistic boosting tree. IEEE Trans Med Imaging. 2008;27(9):1342–55.
12. Gofer S, Haik O, Bardin R, Gilboa Y, Perlman S. Machine learning algorithms for classification of first-trimester fetal brain ultrasound images. J Ultrasound Med. 2021;41:1773–9.
13. Ji C, Liu K, Yang X, Cao Y, Cao X, Pan Q, et al. A novel artificial intelligence model for fetal facial profile marker measurement during the first trimester. BMC Pregnancy Childbirth. 2023;23(1):718.
14. Liu L, Tang D, Li X, Ouyang Y. Automatic fetal ultrasound image segmentation of first trimester for measuring biometric parameters based on deep learning. Multimed Tools Appl. 2023;83(9):27283–304.
15. Looney P, Yin Y, Collins SL, Nicolaides KH, Plasencia W, Molloholli M, et al. Fully automated 3-D ultrasound segmentation of the placenta, amniotic fluid, and fetus for early pregnancy assessment. IEEE Trans Ultrason Ferroelectr Freq Control. 2021;68(6):2038–47.
16. Ryou H, Yaqub M, Cavallaro A, Papageorghiou AT, Noble JA. Automated 3D ultrasound image analysis for first trimester assessment of fetal health. Phys Med Biol. 2019;64(18):185010.
17. Yang X, Yu L, Li S, Wen H, Luo D, Bian C, et al. Towards automated semantic segmentation in prenatal volumetric ultrasound. IEEE Trans Med Imaging. 2019;38(1):180–93.
18. Lin Q, Zhou Y, Shi S, Zhang Y, Yin S, Liu X, et al. How much can AI see in early pregnancy: a multi-center study of fetus head characterization in week 10–14 in ultrasound using deep learning. Comput Methods Programs Biomed. 2022;226.
19. Nie S, Yu J, Chen P, Wang Y, Zhang JQ. Automatic detection of standard sagittal plane in the first trimester of pregnancy using 3-D ultrasound data. Ultrasound Med Biol. 2017;43(1):286–300.
20. Sciortino G, Orlandi E, Valenti C, Tegolo D. Wavelet analysis and neural network classifiers to detect mid-sagittal sections for nuchal translucency measurement. Image Anal Stereol. 2016;35(2):105–15. Available from: https://www.ias-iss.org/ojs/IAS/article/view/1352.
21. Tsai PYY, Hung CHH, Chen CYY, Sun YNN. Automatic fetal middle sagittal plane detection in ultrasound using generative adversarial network. Diagnostics. 2020;11(1):21.
22. Zhen C, Wang H, Cheng J, Yang X, Chen C, Hu X, et al. Locating multiple standard planes in first-trimester ultrasound videos via the detection and scoring of key anatomical structures. Ultrasound Med Biol. 2023;49(9):2006–16.
23. Deng Y, Wang Y, Chen P, Yu J. A hierarchical model for automatic nuchal translucency detection from ultrasound images. Comput Biol Med. 2012;42(6):706–13.
24. Sciortino G, Tegolo D, Valenti C. Automatic detection and measurement of nuchal translucency. Comput Biol Med. 2017;82:12–20.
25. Wee LK, Min TY, Arooj A, Supriyanto E. Nuchal translucency marker detection based on artificial neural network and measurement via bidirectional iteration forward propagation. WSEAS Trans Inf Sci Appl. 2010;7(8):1025–36.
26. Arora U, Sengupta D, Kumar M, Tirupathi K, Sai MK, Hareesh A, et al. Perceiving placental ultrasound image texture evolution during pregnancy with normal and adverse outcome through machine learning prism. Placenta. 2023;140:109–16.
27. Gupta K, Balyan K, Lamba B, Puri M, Sengupta D, Kumar M. Ultrasound placental image texture analysis using artificial intelligence to predict hypertension in pregnancy. J Matern Fetal Neonatal Med. 2021;35(25):5587–94.
28. Sonia R, Shanthi V. Image classification for ultrasound fetal images with increased nuchal translucency during first trimester using SVM classifier. Res J Appl Sci Eng Technol. 2015;9(2):113–21.
29. Walker MC, Willner I, Miguel OX, Murphy MSQ, El-Chaâr D, Moretti F, et al. Using deep-learning in fetal ultrasound analysis for diagnosis of cystic hygroma in the first trimester. PLoS One. 2022;17(6):e0269323.
30. Yasrab R, Fu Z, Zhao H, Lee LH, Sharma H, Drukker L, et al. A machine learning method for automated description and workflow analysis of first trimester ultrasound scans. IEEE Trans Med Imaging. 2023;42(5):1301–13.
31. Zhang L, Dong D, Sun Y, Hu C, Sun C, Wu Q, et al. Development and validation of a deep learning model to screen for trisomy 21 during the first trimester from nuchal ultrasonographic images. JAMA Netw Open. 2022;5(6):e2217854.
32. He J, Baxter SL, Xu J, Xu J, Zhou X, Zhang K. The practical implementation of artificial intelligence technologies in medicine. Nat Med. 2019;25(1):30–6.
33. Guo X, Yin Y, Dong C, Yang G, Zhou G. On the class imbalance problem. Proc 4th Int Conf Nat Comput (ICNC). 2008;4:192–201.
34. Subbaswamy A, Saria S. From development to deployment: dataset shift, causality, and shift-stable models in health AI. Biostatistics. 2020;21(2):345–52.
35. Challen R, Denny J, Pitt M, Gompels L, Edwards T, Tsaneva-Atanasova K. Artificial intelligence, bias and clinical safety. BMJ Qual Saf. 2019;28(3):231–7.
36. Hosny A, Parmar C, Quackenbush J, Schwartz LH, Aerts HJWL. Artificial intelligence in radiology. Nat Rev Cancer. 2018;18(8):500–10.
37. Wang F, Kaushal R, Khullar D. Should health care demand interpretable artificial intelligence or accept “black box” medicine? Ann Intern Med. 2020;172(1):59.
38. Davis SE, Greevy RA, Lasko TA, Walsh CG, Matheny ME. Comparison of prediction model performance updating protocols: using a data-driven testing procedure to guide updating. AMIA Annu Symp Proc. 2019;2019:1002–10.
39. Beam AL, Manrai AK, Ghassemi M. Challenges to the reproducibility of machine learning models in health care. JAMA. 2020;323:305–6.
40. Nagendran M, Chen Y, Lovejoy CA, Gordon AC, Komorowski M, Harvey H, et al. Artificial intelligence versus clinicians: systematic review of design, reporting standards, and claims of deep learning studies in medical imaging. BMJ. 2020;368:1–12.
41. Williams H, Pedrosa J, Cattani L, Housmans S, Vercauteren T, Deprest J, et al. Interactive segmentation via deep learning and B-spline explicit active surfaces. In: Lecture Notes in Computer Science. Springer Science and Business Media Deutschland GmbH; 2021. p. 315–25. Available from: https://link.springer.com/chapter/10.1007/978-3-030-87193-2_30.
42. Littmann M, Selig K, Cohen-Lavi L, Frank Y, Hönigschmid P, Kataka E, et al. Validity of machine learning in biology and medicine increased through collaborations across fields of expertise. Nat Mach Intell. 2020;2(1):18–24.
43. Liu X, Cruz Rivera S, Moher D, Calvert MJ, Denniston AK; SPIRIT-AI and CONSORT-AI Working Group, et al. Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension. Nat Med. 2020;26(9):1364–74.
44. Cruz Rivera S, Liu X, Chan AW, Denniston AK, Calvert MJ; SPIRIT-AI and CONSORT-AI Working Group, et al. Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI extension. Nat Med. 2020;26(9):1351–63.
45. Collins GS, Dhiman P, Navarro CLA, Ma J, Hooft L, Reitsma JB, et al. Protocol for development of a reporting guideline (TRIPOD-AI) and risk of bias tool (PROBAST-AI) for diagnostic and prognostic prediction model studies based on artificial intelligence. BMJ Open. 2021;11(e048008):1–7.
46. Habli I, Lawton T, Porter Z. Artificial intelligence in health care: accountability and safety. Bull World Health Organ. 2020;98(4):251–6.
47. Price WN, Gerke S, Cohen IG. Potential liability for physicians using artificial intelligence. JAMA. 2019;322:1765–6.
48. Institute TAT. Frequently asked questions | The Alan Turing Institute [Internet]. 2020. Available from: https://www.turing.ac.uk/about-us/frequently-asked-questions.
50. Liu Y, Chen PHC, Krause J, Peng L. How to read articles that use machine learning: users’ guides to the medical literature. JAMA. 2019;322:1806–16.
51. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436–44.
52. Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition; 2015. p. 431–40.
53. Ben-Cohen A, Greenspan H. Handbook of medical image computing and computer assisted intervention. Academic Press; 2020. p. 65–90.
54. Wang XY, Wang T, Bu J. Color image segmentation using pixel wise support vector machine classification. Pattern Recognit. 2011;44(4):777–87.
55. Noble WS. What is a support vector machine? Nat Biotechnol. 2006;24:1565–7.
56. Razavi S, Tolson BA. A new formulation for feedforward neural networks. IEEE Trans Neural Netw. 2011;22(10):1588–98.
57. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, et al. Generative adversarial networks. Commun ACM. 2020;63(11):139–44.
58. Grady L. Random walks for image segmentation. IEEE Trans Pattern Anal Mach Intell. 2006;28(11):1768–83.
59. Hua Y, Guo J, Zhao H. Deep belief networks and deep learning. In: Proceedings of 2015 International Conference on Intelligent Computing and Internet of Things (ICIT); 2015. p. 1–4.
60. Freund Y, Schapire RE. Experiments with a new boosting algorithm. In: ICML’96: Proceedings of the Thirteenth International Conference on Machine Learning; 1996. p. 148–56. [cited 2023 Dec 29]. Available from: http://www.research.att.com/orgs/ssr/people/fyoav,schapireg/.
61. Tu Z. Probabilistic boosting-tree: learning discriminative models for classification, recognition, and clustering. In: Tenth IEEE International Conference on Computer Vision (ICCV’05); 2005. Vol. 1.