Abstract
Background: Artificial intelligence (AI) using deep learning systems has recently been utilized in various medical fields. In the field of gastroenterology, AI is primarily implemented in image recognition and utilized in the realm of gastrointestinal (GI) endoscopy. In GI endoscopy, computer-aided detection/diagnosis (CAD) systems assist endoscopists in GI neoplasm detection or differentiation of cancerous or noncancerous lesions. Several AI systems for colorectal polyps have already been applied in colonoscopy clinical practices. In esophagogastroduodenoscopy, a few CAD systems for upper GI neoplasms have been launched in Asian countries. The usefulness of these CAD systems in GI endoscopy has been gradually elucidated.
Summary: In this review, we outline recent articles on several studies of endoscopic AI systems for GI neoplasms, focusing on esophageal squamous cell carcinoma (ESCC), esophageal adenocarcinoma (EAC), gastric cancer (GC), and colorectal polyps. In ESCC and EAC, computer-aided detection (CADe) systems were mainly developed, and a recent meta-analysis study showed sensitivities of 91.2% and 93.1% and specificities of 80% and 86.9%, respectively. In GC, a recent meta-analysis study on CADe systems demonstrated that their sensitivity and specificity were as high as 90%. A randomized controlled trial (RCT) also showed that the use of the CADe system reduced the miss rate. Regarding computer-aided diagnosis (CADx) systems for GC, although RCTs have not yet been conducted, most studies have demonstrated expert-level performance. In colorectal polyps, multiple RCTs have shown the usefulness of the CADe system for improving the polyp detection rate, and several CADx systems have been shown to have high accuracy in colorectal polyp differentiation.
Key Messages: Most analyses of endoscopic AI systems suggested that their performance was better than that of nonexpert endoscopists and equivalent to that of expert endoscopists. Thus, endoscopic AI systems may be useful for reducing the risk of overlooking lesions and improving the diagnostic ability of endoscopists.
Introduction
Recently, artificial intelligence (AI) has made remarkable progress in image recognition using deep learning systems, particularly convolutional neural networks, and has been applied to various medical fields [1]. Similarly, the implementation of AI has spread to the field of gastroenterology, including gastrointestinal (GI) endoscopy. In GI endoscopy, two main roles, computer-aided detection (CADe) and diagnosis (CADx), are essential. CADe helps endoscopists identify suspected lesions, while CADx aids in characterizing these lesions, particularly in distinguishing between neoplastic and nonneoplastic types or assessing the invasion depth of GI cancers.
In colonoscopy, multiple AI systems for colorectal polyps have been implemented clinically, with several randomized controlled and prospective trials [2, 3]. Their real-world performance will soon be evident. However, in esophagogastroduodenoscopy (EGD), although CADe systems for esophageal or gastric cancer (GC) are used in clinical practice, there are fewer clinical trials than for colonoscopy. Thus, AI in EGD is still in the development stage [3].
To clarify the usefulness of AI in GI neoplasms, we searched PubMed for original articles on AI and GI neoplasms using the following terms: “artificial intelligence, endoscopy, esophagogastroduodenoscopy, colonoscopy, deep learning, esophageal cancer, GC, colorectal polyps.” Further articles were identified through manual searches and by checking the references of included articles. Potentially eligible articles were screened by title and abstract, and the full texts of those retained were then retrieved. For colorectal polyps, we focused on randomized controlled trials (RCTs) of CADe systems and prospective studies of CADx systems because numerous retrospective reports already exist. Reviews, commentaries, non-English publications, articles without detailed results, and articles on AI applications other than CADe and CADx systems for GI neoplasms were excluded. The search strategy is shown in Figure 1. In this literature review, we outline the latest research focusing on CADe and CADx systems for GI neoplasms, including esophageal squamous cell carcinoma (ESCC), esophageal adenocarcinoma (EAC), GC, and colorectal polyps, and discuss the prospects of AI in gastroenterology.
Esophageal Squamous Cell Carcinoma
Esophageal cancer is the sixth most common cause of cancer-related mortality and the eighth most common cancer worldwide [4]. The two main types of esophageal cancer are ESCC, accounting for 85% of all esophageal cancers, and EAC [5]. ESCC is the predominant type of esophageal cancer in Asia, Africa, and South America [6]. Previous reports showed that the 5-year survival rate for advanced ESCC ranges from 10 to 30%. Thus, the prognosis of advanced ESCC is poor [7]. In contrast, early-stage detection and endoscopic curative resection can improve ESCC patients’ prognosis and quality of life [8‒10]. Detecting early-stage ESCC is challenging because the superficial irregularities are minor, especially under white light imaging (WLI). Image-enhanced endoscopy techniques such as narrow band imaging (NBI) and blue laser imaging (BLI) aid in detecting superficial ESCC [11]. However, less experienced endoscopists may still find it more challenging to detect superficial ESCC than experienced endoscopists [12]. AI systems could assist in detecting and diagnosing superficial ESCC, increasing the chances for endoscopic treatment for ESCC patients.
CADe System for ESCC
The effectiveness of CADe systems in ESCC is well documented across various studies (Table 1) [13‒24], encompassing 10 retrospective studies, one prospective study, and one RCT. The retrospective studies showed that CADe systems for early ESCC had sensitivities ranging from 55 to 100% and specificities from 30 to 99.4% [14‒23]. Additionally, a recent systematic review and meta-analysis found that the sensitivity and specificity of CADe systems for ESCC were 91.2% (range, 84.3–95.2%) and 80% (range, 64.3–89.9%), respectively [25].
Study design | Name, year | Target lesion | Imaging modality | Accuracy (%) | Sensitivity (%) | Specificity (%) |
---|---|---|---|---|---|---|
CADe system | | | | | | |
Prospective | Yuan et al. [13] (2023) | Superficial ESCC | NBI | 92.4 (internal test); 89.9 (external test); 91.4 (video test) | 96.5 (internal test); 93.6 (external test); 98.6 (video test) | 86.5 (internal test); 84.5 (external test); 84.3 (video test) |
Retrospective | Cai et al. [14] (2019) | Early ESCC | WLI | 91.4 | 97.8 | 85.4 |
Retrospective | Ohmori et al. [15] (2020) | Superficial ESCC | WLI, NBI | 81 (WLI); 77 (NBI/BLI) | 90 (WLI); 100 (NBI/BLI) | 70 (WLI); 63 (NBI/BLI) |
Retrospective | Fukuda et al. [16] (2020) | Superficial ESCC | NBI, BLI | 63 | 91 | 51 |
Retrospective | Wang et al. [17] (2021) | Superficial ESCC | WLI, NBI | 90.9 | 96.2 | 70.4 |
Retrospective | Waki et al. [18] (2021) | Superficial ESCC | NBI, BLI | N/a | 85.7 | 40 |
Retrospective | Shiroma et al. [19] (2021) | Superficial ESCC | WLI, NBI | N/a | 75 (WLI); 55 (NBI) | 30 (WLI); 80 (NBI) |
Retrospective | Yang et al. [20] (2021) | Early ESCC | WLI | 99.2 (non-ME image); 88.1 (ME image) | 97.4 (non-ME image); 90.9 (ME image) | 99.4 (non-ME image); 85.0 (ME image) |
Retrospective | Tang et al. [21] (2021) | Early ESCC | WLI | N/a | 97.9 | 88.6 |
Retrospective | Tajiri et al. [22] (2022) | Superficial ESCC | NBI | 80.9 | 85.5 | 75 |
Retrospective | Liu et al. [23] (2022) | Early ESCC | WLI | 85.7 (internal test); 84.5 (external test) | 92.6 (internal test); 89.5 (external test) | 80.0 (internal test); 79.0 (external test) |
RCT | Yuan et al. [24] (2024) | Superficial ESCC | WLI, NBI | Findings: per-lesion miss rates were 1.7% in the AI-first group versus 6.7% in the routine-first group (p = 0.079) | | |
CADx system | | | | | | |
Prospective | Tani et al. [26] (2023) | ESCC | M-NBI | 80.6 | 68.2 | 83.4 |
Retrospective | Everson et al. [27] (2019) | IPCLs | M-NBI | 93.7 | 89.3 | 98 |
Retrospective | Fukuda et al. [16] (2020) | Superficial ESCC | NBI, BLI | 88 | 86 | 89 |
CADe, computer-assisted detection; CADx, computer-assisted diagnosis; ESCC, esophageal squamous cell carcinoma; NBI, narrow band imaging; WLI, white light imaging; BLI, blue laser imaging; ME, magnified endoscopy; RCT, randomized controlled trial; M-NBI, magnified narrow band imaging; IPCL, intraepithelial papillary capillary loop.
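The accuracy, sensitivity, and specificity values summarized in Table 1 all derive from a 2 × 2 comparison of the AI output against the reference standard. As a minimal sketch of how the three metrics relate, the following Python snippet computes them from confusion-matrix counts. The function name `diagnostic_metrics` and all counts are hypothetical and are not taken from any cited study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnosticMetrics:
    sensitivity: float
    specificity: float
    accuracy: float


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> DiagnosticMetrics:
    """Compute per-image (or per-lesion) metrics from confusion-matrix counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)                 # cancerous cases flagged by the AI
    specificity = tn / (tn + fp)                 # noncancerous cases correctly left unflagged
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # all correct calls over all cases
    return DiagnosticMetrics(sensitivity, specificity, accuracy)


# Hypothetical test set: 100 cancerous and 200 noncancerous images.
m = diagnostic_metrics(tp=90, fp=60, tn=140, fn=10)
print(f"sensitivity={m.sensitivity:.1%} "
      f"specificity={m.specificity:.1%} "
      f"accuracy={m.accuracy:.1%}")
```

Because accuracy is a prevalence-weighted average of sensitivity and specificity, a test set dominated by noncancerous images can yield modest accuracy despite high sensitivity, which is one reason the three columns should be read together.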
Few prospective studies and RCTs have assessed CADe systems for ESCC. Yuan et al. [13] conducted a prospective evaluation in 2023, using retrospectively collected training and test images from patients who underwent endoscopic resection for superficial ESCC and precancerous lesions. Experienced endoscopists then used the CADe system in a clinical setting to examine the esophagus of patients with or without these lesions. They reported sensitivity, specificity, and accuracy rates of 98.6%, 84.3%, and 91.4%, respectively. Moreover, in 2024, Yuan et al. [24] reported the first multicenter, tandem, double-blind RCT on the CADe system for ESCC. Patients who underwent EGD for screening were randomly assigned to either the AI-first or routine-first group, and the same endoscopist performed both examinations of the tandem EGD on the same day. The eligible patients (who were sedated before the tandem EGD), pathologists, and statistical analysts were blinded to the grouping, whereas the endoscopists knew the grouping before the tandem EGD. The per-lesion miss rates were 1.7% in the AI-first group versus 6.7% in the routine-first group (risk ratio, 0.25; 95% CI: 0.06–1.08; p = 0.079), while the per-patient miss rates were 1.9% in the AI-first group versus 5.1% in the routine-first group (risk ratio, 0.37; 95% CI: 0.08–1.71; p = 0.40). In contrast, the detection rate in the first examination was 1.8% in the AI-first group versus 1.3% in the routine-first group (risk ratio, 1.38; 95% CI: 1.03–1.86; p = 0.03). Although there were no significant differences in the per-lesion and per-patient miss rates for superficial ESCC and precancerous lesions, the CADe system significantly increased the detection rate [24].
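To illustrate how a risk ratio and its confidence interval of the kind quoted above can be obtained, the following Python sketch uses the common large-sample approximation on the log scale. This is an assumption for illustration only: the review does not state which statistical method the trial used, the function name `risk_ratio_ci` is hypothetical, and the counts are invented, so the output is not expected to reproduce the published interval.

```python
import math


def risk_ratio_ci(events_a: int, total_a: int,
                  events_b: int, total_b: int,
                  z: float = 1.96) -> tuple[float, float, float]:
    """Risk ratio of group A versus group B with a log-scale Wald interval."""
    if min(events_a, events_b) <= 0:
        raise ValueError("both groups need at least one event for the log method")
    rr = (events_a / total_a) / (events_b / total_b)
    # Standard error of ln(RR) for two independent binomial proportions.
    se_log_rr = math.sqrt(1 / events_a - 1 / total_a
                          + 1 / events_b - 1 / total_b)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts: 2 of 100 lesions missed in an AI-first arm,
# 8 of 100 lesions missed in a routine-first arm.
rr, lower, upper = risk_ratio_ci(2, 100, 8, 100)
print(f"RR={rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

With so few events the interval is wide and crosses 1, which mirrors why a fourfold lower miss rate can still fail to reach statistical significance.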
CADx System for ESCC
Because most reports on CADe systems for ESCC were aimed at both detecting superficial ESCC and differentiating neoplastic from nonneoplastic lesions, we categorized these reports as CADe systems for ESCC. Consequently, only three studies evaluated the performance of the CADx system for ESCC, including two retrospective studies and one prospective study (Table 1) [16, 26, 27].
Fukuda et al. [16] retrospectively evaluated AI systems for ESCC and divided them into CADx and CADe systems. They collected 23,746 images from 1,544 pathologically proven superficial ESCC and 4,587 images from 458 noncancerous and normal tissues using nonmagnified NBI and BLI. Sensitivity, specificity, and accuracy were 86%, 89%, and 88% for the CADx system and 74%, 76%, and 75% for the experts, respectively [16]. In 2023, Tani et al. [26] conducted the first prospective study to assess a real-time CADx system for ESCC in a clinical setting. Endoscopists evaluated patients at high risk for ESCC, classifying lesions as cancerous or noncancerous under magnified NBI before the real-time CADx system made its diagnosis. The CADx system’s accuracy, sensitivity, and specificity were 80.6%, 68.2%, and 83.4%, respectively. The AI’s diagnostic accuracy was 5.1% lower than that of the endoscopists, so the study failed to demonstrate the noninferiority of CADx for ESCC diagnosis.
Esophageal Adenocarcinoma
EAC constitutes 14% of all esophageal cancers [5] and is the most common type of esophageal cancer in Western countries, unlike ESCC [6]. Its incidence is rising in these regions, with most EAC cases diagnosed at an advanced stage [28]. Advanced EAC patients face a grim prognosis, with a 5-year survival rate under 20%, necessitating invasive treatments like extended surgical resection [29]. Conversely, superficial EAC cases treated with endoscopic resection show improved prognoses [30].
Barrett’s esophagus (BE), defined as the replacement of squamous epithelium with metaplastic epithelium due to gastroesophageal reflux disease, is a potential precancerous lesion for EAC [31]. In general, the prevalence of BE is low, and the risk of progression from nonneoplastic BE to EAC remains unclear [32]. These BE characteristics make the detection and differentiation between nonneoplastic and neoplastic BE challenging, even for experienced endoscopists. Additionally, random biopsies are recommended for patients with long-segment BE because of the high risk of EAC [32, 33]. If we can optically detect and diagnose early EAC or neoplastic BE using AI systems, we can increase the biopsy yield, and patients with EAC may have a better prognosis.
CADe System for EAC and BE
Many studies on CADe systems for detecting early EAC and neoplastic BE have been reported, mainly in Western countries (Table 2) [34‒46]. In a previous systematic review and meta-analysis of CADe systems for EAC, the pooled sensitivity and specificity were 93.1% (range, 86.8–96.4%) and 86.9% (range, 81.7–90.7%), respectively [25].
Study design | Name, year | Target lesion | Imaging modality | Accuracy (%) | Sensitivity (%) | Specificity (%) |
---|---|---|---|---|---|---|
CADe system | | | | | | |
Prospective | De Groof et al. [34] (2019) | Neoplastic BE | WLI | 92 | 95 | 85 |
Prospective | De Groof et al. [35] (2020) | Early neoplasia in BE | WLI | 89 | 90 | 88 |
Prospective | De Groof et al. [36] (2020) | Barrett’s neoplasia | WLI | 90 | 91 | 89 |
Prospective | Hussein et al. [37] (2022) | Dysplastic BE | M-WLI, M-CE | 86 | 91 | 79 |
Prospective | Fockens et al. [38] (2023) | Early neoplasia in BE | WLI | N/a | 90 (image test); 91 (video test) | 80 (image test); 82 (video test) |
Prospective | Fockens et al. [39] (2023) | Early neoplasia in BE | WLI | N/a | 88 | 64 |
Prospective | Abedelrahim et al. [40] (2023) | Barrett’s neoplasia | WLI | 92 | 93.8 | 90.7 |
Retrospective | Van der Sommen et al. [41] (2016) | Early neoplastic lesions in BE | WLI | N/a | 83 | 83 |
Retrospective | Ghatwary et al. [42] (2019) | EAC | WLI | N/a | 93 | 93 |
Retrospective | Horie et al. [43] (2019) | EAC | WLI, NBI | 90 | 100 | N/a |
Retrospective | Hashimoto et al. [44] (2020) | Early neoplasia in BE | WLI, NBI | 95.4 | 96.4 | 94.2 |
Retrospective | Iwagami et al. [45] (2021) | EAC | WLI, NBI, BLI | 66 | 94 | 42 |
Retrospective | Tsai et al. [46] (2023) | BE | NBI | N/a | 94.2 | 94.4 |
CADx system | | | | | | |
Prospective | Hussein et al. [47] (2023) | Dysplastic BE | M-WLI, M-CE | 91 (test 1); 90 (test 2); 89 (test 3) | 94 (test 1); 92 (test 2); 92 (test 3) | 86 (test 1); 84 (test 2); 82 (test 3) |
Retrospective | Ebigbo et al. [48] (2019) | Neoplastic BE | WLI, NBI | N/a | 97 (WLI); 94 (NBI) | 88 (WLI); 80 (NBI) |
Retrospective | Ebigbo et al. [49] (2020) | Neoplastic BE | WLI, NBI | 89.9 | 83.7 | 100 |
Retrospective | Struyvenberg et al. [50] (2021) | Neoplastic BE | NBI | 84 (image test); 83 (video test) | 88 (image test); 85 (video test) | 78 (image test); 83 (video test) |
CADe, computer-assisted detection; CADx, computer-assisted diagnosis; EAC, esophageal adenocarcinoma; BE, Barrett’s esophagus; WLI, white light imaging; M-WLI, magnified white light imaging; M-CE, magnified chromoendoscopy; NBI, narrow band imaging; BLI, blue laser imaging.
De Groof et al. [34] first reported a prospective CADe study for EAC in 2019. In 2020, they developed a CADe system for neoplastic BE detection that achieved a sensitivity, specificity, and accuracy of 90%, 88%, and 89%, respectively, surpassing those of general endoscopists (72%, 74%, and 73%, respectively) [35]. The system also detected neoplastic BE with 90% accuracy in real-time endoscopic procedures in a subsequent pilot study [36]. In 2023, Abedelrahim et al. [40] conducted a multicenter prospective study with 32 neoplastic and 43 nonneoplastic BE videos, showing the CADe system’s superior sensitivity over nonexperts (93.8 vs. 63.5%; p < 0.001).
Recently, the incidence of EAC in Asian countries has increased, especially in Japan [51, 52]. Iwagami et al. [45] developed a CADe system based on Japanese EAC cases, which, in a retrospective study, demonstrated a sensitivity, specificity, and accuracy of 94%, 42%, and 66%, respectively. Further studies are needed not only in Western countries but also in other countries with lower incidences of EAC.
CADx System for EAC and BE
Similar to CADx systems for ESCC, few studies have reported on CADx systems for EAC [47‒50]. Ebigbo et al. [49] (2020) developed an AI system using WLI and NBI images from a real-time camera to differentiate nonneoplastic BE from early EAC, achieving 83.7% sensitivity, 100.0% specificity, and 89.9% accuracy. Hussein et al. [47] (2023) created a CADx system for diagnosing BE dysplasia using magnifying WLI and chromoendoscopy, collecting prospective videos from patients with dysplastic and nondysplastic BE across institutions. They conducted validation tests on high-quality still images (test 1), all video frames (test 2), and selected video sequences (test 3), attaining accuracies of 89–91% and sensitivities above 90% in all tests. Test 3 showed a diagnostic speed of 0.0135 s per frame, suggesting the system’s potential for real-time diagnosis [47].
Gastric Cancer
GC is the fourth most common cause of cancer-related deaths and the fifth most common cancer worldwide [4]. Most GC cases occur in East Asian countries, including Japan, and are associated with Helicobacter pylori infection [53]. The 5-year overall survival rate for stage I GC exceeds 95%, while stage IV GC shows a survival rate below 20% [54], highlighting the importance of early detection and treatment for better outcomes.
EGD plays a crucial role in the diagnosis and treatment of GC [55]; however, early detection and diagnosis remain challenging. GC often has a background of atrophic gastritis caused by H. pylori infection, making the detection and diagnosis of early GC difficult because only subtle morphological changes occur in early GC lesions [56]. Miss rates for GC in screening EGD are reported at 5.7%, with less experienced endoscopists showing false-negative rates up to 11.5% [57, 58]. To address this, computer-aided detection/diagnosis (CAD) systems for GC have been actively developed.
CADe System for GC
The first CADe system for GC was developed by Hirasawa et al. in 2018 [59]. The CADe system was trained on 13,584 endoscopic still images of GC and validated on 2,296 independent still images. They showed that the CADe system had a sensitivity of 92.2% for detecting GC in image analysis. Since then, many studies on CADe systems have been conducted (Table 3) [59‒76]. Retrospective reports indicate that CADe systems for GC achieved sensitivities of 58.4–100% and specificities of 79.1–97.6% [59, 64‒75]. A recent meta-analysis showed that the sensitivity and specificity of CADe systems for early GC were both 90% (95% CI: 86–93% and 86–92%, respectively) [77].
Study design | Name, year | Target lesion | Imaging modality | Accuracy (%) | Sensitivity (%) | Specificity (%) |
---|---|---|---|---|---|---|
CADe system | | | | | | |
Prospective | Luo et al. [60] (2019) | Upper GI cancers | WLI | 92.7 | 94.6 | 92.6 |
Prospective | Wu et al. [61] (2022) | Gastric neoplasms | WLI | 91 | 87.8 | 93.2 |
Prospective | Wu et al. [62] (2022) | Gastric lesions | WLI | N/a | 98.2 (internal test 1); 96.9 (internal test 2); 95.6 (external test) | 98.4 (internal test 1); 90.6 (internal test 2); 90.8 (external test) |
Prospective | Gong et al. [63] (2023) | AGC, EGC, and dysplasia | WLI | Findings: the lesion detection rate was 95.6% (internal test) | | |
Retrospective | Hirasawa et al. [59] (2018) | GC | WLI, CE, NBI | N/a | 92.2 | N/a |
Retrospective | Sakai et al. [64] (2018) | EGC | WLI | 87.6 | 80 | 94.8 |
Retrospective | Yoon et al. [65] (2019) | EGC | WLI | N/a | 91 | 97.6 |
Retrospective | Ishioka et al. [66] (2019) | EGC | WLI, CE, NBI | N/a | 94.1 | N/a |
Retrospective | Wu et al. [67] (2019) | EGC | WLI, NBI, BLI | 92.5 | 94 | 91 |
Retrospective | Tang et al. [68] (2020) | EGC | WLI | 87.8 | 95.5 | 81.7 |
Retrospective | Ikenoyama et al. [69] (2021) | GC | WLI, CE, NBI | N/a | 58.4 | 87.3 |
Retrospective | Nam et al. [70] (2022) | Pathologic mucosal lesions | WLI | N/a | N/a | N/a |
Retrospective | Niikura et al. [71] (2022) | GC | WLI | N/a | 100 | N/a |
Retrospective | Jin et al. [72] (2022) | EGC | WLI, NBI | 85.2 | 93 | 82.6 |
Retrospective | Zhou et al. [73] (2023) | EGC | WLI | 88.3 | 84.5 | 90.5 |
Retrospective | Su et al. [74] (2023) | EGC | WLI | 90.8 | 81.1 | 93.5 |
Retrospective | Quek et al. [75] (2023) | Neoplastic lesion | WLI | 77.7 | 59.1 | 79.1 |
RCT | Wu et al. [76] (2021) | Gastric neoplasms | WLI | Findings: the gastric neoplasm miss rate was significantly lower in the AI-first group than in the routine-first group (6.1 vs. 27.3%; p = 0.015) | | |
CADx system | | | | | | |
Prospective | He et al. [78] (2021) | EGC | M-NBI | 83.6 | 92.5 | 82.5 |
Prospective | Wu et al. [61] (2022) | EGC | M-NBI | 89 | 100 | 82.5 |
Prospective | Wu et al. [62] (2022) | Gastric neoplasms | WLI | 88.8 (internal test); 88.6 (external test); 72.0 (videos) | 92.9 (internal test); 91.7 (external test); 100 (videos) | 88 (internal test); 88.8 (external test); 53.2 (videos) |
Prospective | Gong et al. [63] (2023) | AGC, EGC, and dysplasia | WLI | 81.5 | 82.6 | 84.8 |
Retrospective | Horiuchi et al. [79] (2020) | EGC | M-NBI | 85.1 | 87.4 | 82.8 |
Retrospective | Horiuchi et al. [80] (2020) | EGC | M-NBI | 85.3 | 95.4 | 71 |
Retrospective | Namikawa et al. [81] (2020) | GC | WLI, NBI | 99 | 99 | 93.3 |
Retrospective | Li et al. [82] (2020) | EGC | M-NBI | 90.9 | 91.1 | 90.64 |
Retrospective | Ueyama et al. [83] (2021) | EGC | M-NBI | 98.7 | 98 | 100 |
Retrospective | Hu et al. [84] (2021) | EGC | M-NBI | 77 | 79.2 | 74.5 |
Retrospective | Zhang et al. [85] (2021) | EGC and HGIN | WLI | 78.7 | 36.8 | 91.2 |
Retrospective | Li et al. [86] (2022) | EGC | M-NBI | 88.7 | 86.3 | 91.6 |
Retrospective | Yao et al. [87] (2022) | EGC | WLI | 85.1 (test 1); 86.0 (test 2) | 85.3 (test 1); 83.0 (test 2) | 84.4 (test 1); 92.2 (test 2) |
Retrospective | Tang et al. [88] (2022) | EGC | NBI | 93.2 | 99 | 87.3 |
Retrospective | Jin et al. [89] (2022) | EGC | WLI | Range 73.5–93.6 | Range 16.1–86.4 | Range 78.0–99.1 |
Retrospective | Noda et al. [90] (2022) | EGC | Endocytoscopy | 83.2 | 76.4 | 92.3 |
Retrospective | Gong et al. [91] (2022) | EGC | M-NBI | 79.5 | 72.2 | 85.5 |
Retrospective | Nam et al. [70] (2020) | BGU, EGC, and AGC | WLI | BGU 95, EGC 89, AGC 93 (internal test); BGU 86, EGC 79, AGC 79 (external test) | BGU 63, EGC 94, AGC 90 (internal test); BGU 68, EGC 77, AGC 56 (external test) | BGU 98, EGC 82, AGC 94 (internal test); BGU 50, EGC 89, AGC 47 (external test) |
Retrospective | Yuan et al. [92] (2022) | EGC, AGC, SMT, polyp, ulcer, erosion, and lesion-free gastric mucosa | WLI | 85.7 | N/a | N/a |
Retrospective | Ishioka et al. [93] (2023) | EGC and adenomas | WLI | 70.8 | 84.7 | 58.2 |
CADe, computer-assisted detection; CADx, computer-assisted diagnosis; GC, gastric cancer; GI, gastrointestinal; AGC, advanced gastric cancer; EGC, early gastric cancer; WLI, white light imaging; CE, chromoendoscopy; NBI, narrow band imaging; BLI, blue laser imaging; RCT, randomized controlled trial; M-NBI, magnified narrow band imaging; HGIN, high-grade intraepithelial neoplasia; BGU, benign gastric ulcer; SMT, submucosal tumor.
A few prospective studies have compared the performance of CADe systems and endoscopists in detecting early GC. In 2019, Luo et al. [60] first reported a prospective multicenter study to evaluate the CADe system for detecting GC and ESCC, and they demonstrated a high accuracy of 92.7% for cancer detection. Wu et al. reported several studies using a CAD system for gastric neoplasms named ENDOANGEL. ENDOANGEL has various functions, including detection of early GC (CADe system), differentiation of neoplastic and nonneoplastic lesions (CADx system), and prediction of tumor invasion depth [61, 62]. In a tandem RCT, they compared the miss rate of gastric neoplasms between routine EGD (routine-first group) and EGD with the CADe system (AI-first group) and showed that the miss rate of neoplasms in the AI-first group was significantly lower than that in the routine-first group (6.1 vs. 27.3%; p = 0.015). They suggested that overlooked gastric neoplastic lesions could be reduced using the CADe system [76]. In addition, a new prospective multicenter RCT using ENDOANGEL-GC, which simultaneously combines blind spot monitoring and detection systems in real time, is in progress; the detection rate of gastric neoplasms might be higher than that in previous studies [94].
CADx System for GC
Several CADx systems for GC have been developed (Table 3) [61‒63, 70, 78‒93], achieving sensitivities of 36.8–99% and specificities of 58.2–100% [70, 79‒93]. Wu et al. [61] evaluated ENDOANGEL’s diagnostic performance using magnified NBI images prospectively, achieving 100% sensitivity and 82.5% specificity, comparable to expert endoscopists. They also reported a CADx system under WLI with sensitivities of 92.9% and 91.7% in internal and external tests, respectively, demonstrating high diagnostic performance even under normal WLI [62]. In Korea, Gong et al. developed an AI system for GC classification into advanced GC, early GC, dysplasia, and nonneoplastic lesions under nonmagnified WLI. Testing on 3,976 prospectively collected images from five institutions showed that the CADx system achieved 81.5% accuracy [63].
In contrast, Yuan et al. [92] first reported that the CADx system with normal WLI can classify lesions into not only neoplastic/nonneoplastic but also multiple classes. The CADx system was trained to classify lesions as early GC, advanced GC, submucosal tumors, polyps, ulcers, erosions, or normal mucosa. As a result, the overall accuracy was 85.7%, which was equivalent to that of expert endoscopists (85.1%) and higher than that of nonexpert endoscopists (78.8%) [92]. Recently, Ishioka et al. [93] developed a CADx system for early GC in Japan, which can diagnose GC under normal WLI in still images. The performance of the CADx was evaluated using still images of 150 neoplastic and 165 nonneoplastic lesions and demonstrated a superior sensitivity to that of specialists (84.7 vs. 65.8%) [93]. The performance of this endoscopic AI system was also externally validated in Singapore, demonstrating higher sensitivity in detecting high-grade dysplasia compared to endoscopists (80 vs. 29.1%; p = 0.0011) [75]. An RCT of CADx for GC has not yet been conducted, and future studies are necessary to evaluate the performance of CADx systems for GC in clinical settings.
Colorectal Polyps
Colorectal cancer is a major global cause of cancer-related deaths, and the complete removal of tumor lesions during colonoscopy is crucial for its prevention [95, 96]. However, the detection rate of lesions during colonoscopy is highly dependent on the performance of endoscopists, particularly less experienced ones, posing a challenge to achieving a high adenoma detection rate (ADR) [97]. A decrease in ADR is associated with an increase in colorectal cancer incidence [98]. Therefore, standardizing the quality of examinations and reducing oversight of tumor lesions is essential, irrespective of the endoscopist’s experience.
Furthermore, qualitative endoscopic diagnosis of discovered lesions relies on endoscopists. While opportunities for qualitative diagnosis are increasing with the growing number of colorectal polyp cases, performing highly accurate endoscopic diagnoses, especially by nonspecialists, is considered challenging [99]. Computer assistance using AI has been developed to mitigate human-dependent errors and ensure the quality of examinations. The following section discusses the current state of research on CAD systems for colorectal polyps, with a focus on prospective studies.
CADe System for Colorectal Polyp
Research on CADe for colorectal lesions began in the early 2000s [100, 101] but was hampered by small sample sizes and high false-positive rates. The adoption of deep learning in the late 2010s significantly improved speed and accuracy, renewing interest in CADe. In 2018, Misawa et al. [102] reported a CADe system with 90% sensitivity. Later studies showed that the sensitivity and specificity of CADe exceeded 90% in real-time analysis, underscoring its utility [103, 104].
Against the backdrop of favorable outcomes in retrospective studies, several prospective RCTs have been conducted using CADe systems for colorectal polyps (Table 4) [105‒127]. Wang et al. [105] reported that the use of the CADe system significantly increased ADR in a single-center RCT involving a total of 1,058 participants (29.1 vs. 20.3%; p < 0.001). Similarly, multiple reports indicated a significant increase in ADR or polyp detection rate (PDR) with CADe systems based on different algorithms [106‒109, 111]. Shaukat et al. [115] reported that the number of adenomas per colonoscopy significantly increased with the use of CADe (1.05 vs. 0.83; p = 0.002). Additionally, several studies reported a reduction in the adenoma miss rate with the use of CADe [110, 112, 116, 117]. Furthermore, Wang et al. [111] reported a blinded RCT using a sham AI with intentionally false outputs, demonstrating that the CADe system contributed to an increase in ADR (34 vs. 28%; p = 0.030), whereas Xu et al. [113] reported no significant increase in PDR with CADe (38.8 vs. 36.2%; p = 0.183).
Name, year | CADe system | Validation dataset | Findings |
---|---|---|---|
Wang et al. [105] (2019) | EndoScreener | 1,058 patients | AI system significantly increased ADR (29.1 vs. 20.3%; p <0.001) |
Su et al. [106] (2019) | AQCS | 623 patients | AI significantly increased ADR (28.9 vs. 16.5%; p <0.001) |
Liu et al. [107] (2020) | HenanTongyu | 1,026 patients | AI significantly increased ADR (39 vs. 24%; p <0.001) |
Gong et al. [108] (2020) | ENDOANGEL | 704 patients | AI significantly increased ADR (16 vs. 8%; p = 0.001) |
Repici et al. [109] (2020) | GI Genius | 685 patients | AI significantly increased ADR (54.8 vs. 40.4%) |
Wang et al. [110] (2020) | EndoScreener | 369 patients | AMR was significantly lower with AI (13.9 vs. 40.0%; p <0.001) |
Wang et al. [111] (2020) | EndoScreener | 962 patients | ADR was significantly higher with the AI system than with the sham system (34 vs. 28%; p = 0.030) |
Kamba et al. [112] (2021) | - | 358 patients | AMR was significantly lower with AI (13.8 vs. 40.6%; p <0.001) |
Xu et al. [113] (2021) | - | 2,325 patients | AI system did not significantly increase PDR (38.8 vs. 36.2%; p = 0.183) |
Luo et al. [114] (2021) | - | 150 patients | AI system significantly increased PDR (38.7 vs. 34.0%; p <0.001) |
Shaukat et al. [115] (2022) | - | 1,359 patients | AI system significantly increased adenomas per colonoscopy (1.05 vs. 0.83; p = 0.002) |
Wallace et al. [116] (2022) | GI Genius | 230 patients | AMR was significantly lower with AI (15.5 vs. 32.4%; p <0.001) |
Glissen Brown et al. [117] (2022) | EndoScreener | 223 patients | AMR was significantly lower with AI than with high-definition white light colonoscopy (20.12 vs. 31.25%; p = 0.025) |
Lui et al. [118] (2023) | - | 216 patients | AI significantly increased ADR in proximal colon (44.7 vs. 34.6%) |
Ahmad et al. [119] (2023) | GI Genius | 614 patients | AI system did not significantly increase ADR (71.4 vs. 65.0%; p = 0.09) |
Mangas-Sanjuan et al. [120] (2023) | GI Genius | 3,213 patients | AI system did not significantly increase advanced colorectal neoplasia detection rate (34.8 vs. 34.6%; p = 0.91) |
Karsenti et al. [121] (2023) | GI Genius | 2,015 patients | AI system slightly increased ADR (37.5 vs. 33.7%; p = 0.051) |
Wei et al. [122] (2023) | EndoVigilant | 769 patients | AI system did not significantly increase ADR (35.9 vs. 37.2%; p = 0.774) |
Nakashima et al. [123] (2023) | CAD EYE | 415 patients | AI system significantly increased ADR (59.4 vs. 47.6%; p = 0.018) |
Gimeno-García et al. [124] (2024) | ENDO-AID | 370 patients | AI system significantly increased ADR (55.1 vs. 43.8%; p = 0.029) |
Yao et al. [125] (2024) | ENDOANGEL | 685 patients | AMR was significantly lower with AI (18.82 vs. 43.69%; p <0.001) |
Schöler et al. [126] (2024) | CAD EYE | 286 patients | AI system did not significantly increase ADR (43 vs. 41%; p = 0.696) |
Yamaguchi et al. [127] (2024) | CAD EYE | 231 patients | AMR was significantly lower with AI (25.6 vs. 38.6%; p = 0.033) |
ADR, adenoma detection rate; PDR, polyp detection rate; AMR, adenoma miss rate.
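The outcome measures abbreviated in Table 4 follow standard definitions, and a short sketch may make them concrete. The Python below is illustrative only: the `Exam` record, the function names, and the toy counts are hypothetical and are not taken from any cited trial. ADR and PDR are per-patient proportions, adenomas per colonoscopy is a mean count, and AMR in a tandem design is assumed here to be the share of all detected adenomas that were found only on the second pass.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Exam:
    """One colonoscopy (hypothetical record, not a trial data format)."""
    polyps: int    # all polyps detected in this exam
    adenomas: int  # histologically confirmed adenomas in this exam


def pdr(exams: List[Exam]) -> float:
    """Polyp detection rate: share of exams with at least one polyp."""
    return sum(e.polyps > 0 for e in exams) / len(exams)


def adr(exams: List[Exam]) -> float:
    """Adenoma detection rate: share of exams with at least one adenoma."""
    return sum(e.adenomas > 0 for e in exams) / len(exams)


def apc(exams: List[Exam]) -> float:
    """Adenomas per colonoscopy: mean adenoma count per exam."""
    return sum(e.adenomas for e in exams) / len(exams)


def amr(first_pass: int, second_pass_only: int) -> float:
    """Adenoma miss rate in a tandem study: adenomas found only on the
    second pass divided by all adenomas found across both passes."""
    total = first_pass + second_pass_only
    return second_pass_only / total if total else 0.0


# Toy data, invented for illustration only.
toy = [Exam(2, 1), Exam(0, 0), Exam(1, 0), Exam(3, 2)]
print(pdr(toy), adr(toy), apc(toy), amr(first_pass=8, second_pass_only=2))
```

Because ADR counts patients and adenomas per colonoscopy counts lesions, a system that finds additional adenomas in patients who already had one detected raises the latter without changing the former.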
A common feature observed in these studies is that CADe increases the detection rate of relatively low-grade lesions with a diameter of less than 10 mm or sessile serrated lesions [105‒115, 128]. However, the detection rate of high-grade lesions with a diameter of 10 mm or more did not change significantly using CADe [105‒115]. The CADe system by Gong et al. [108] enhances the detection of larger lesions and also tracks insertion, observation times, and colon preparation, indicating that lesion detection is influenced by both endoscopist performance and colonoscopy quality. Ahmad et al. [119] found that expert endoscopists using CADe increased PDR (85.7 vs. 79.7%; p = 0.05) but not ADR (71.4 vs. 65.0%; p = 0.09), suggesting CADe’s effectiveness varies with endoscopist skill. An RCT by Yamaguchi et al. [127] showed CADe aids trainees’ colonoscopy proficiency, potentially reducing missed adenomas, highlighting its benefit for novices.
Multiple CADe systems have been approved for use in various countries. Although RCTs using these commercial CADe systems have been reported, the results remain controversial [119‒127]. Therefore, further large-scale studies using these systems are required. These studies are expected to clarify the specific targets and conditions under which CADe is useful.
CADx System for Colorectal Polyp
Most polyps detected during colonoscopy are small and have a low probability of developing into advanced neoplasia or invasive cancer. Therefore, the implementation of CADx to accurately diagnose lesions could enable strategies such as “leave-in-situ” (opting not to resect diminutive polyps) or “resect-and-discard” (skipping pathological diagnosis for small lesions) approaches, which have the potential to significantly reduce costs and enhance efficiency [129].
Table 5 summarizes prospective studies conducted on CADx for colorectal polyps [130‒139]. Currently, NBI and BLI are the imaging modalities most widely used with CADx for colorectal lesions. These image-enhanced observation methods allow detailed evaluation of surface vessels, making them more suitable than WLI for the qualitative diagnosis of lesions. In 2011, Gross et al. [130] used a support vector machine algorithm to predict pathological diagnoses from magnified NBI images. In a prospective validation, they reported favorable results, with a sensitivity of 95% and an accuracy of 93% for differentiating tumor lesions. Additionally, Kominami et al. [131] reported real-time examinations using NBI with an accuracy of 97.5% for differentiating adenomas.
Name, year | CADx system | Imaging modality | Validation dataset | Accuracy (%) | Sensitivity (%) | Specificity (%) |
---|---|---|---|---|---|---|
Gross et al. [130] (2011) | - | NBI | 214 patients, 434 polyps | 93.1 | 95.0 | 90.3 |
Kominami et al. [131] (2016) | - | NBI | 1,026 patients | 97.5 | 93.0 | 93.3 |
van der Zander et al. [132] (2021) | - | HDWL/BLI | 54 patients, 60 polyps | 95.0 | 95.6 | 93.3 |
Houwen et al. [133] (2023) | POLAR | NBI | 194 patients, 423 polyps | 79 | 89 | 38 |
Dos Santos et al. [134] (2023) | CAD EYE | BLI | 110 polyps | 81.8 | 76.3 | 96.7 |
Li et al. [135] (2023) | CAD EYE | NBI | 320 patients, 661 polyps | 71.6 | 61.8 | 87.4 |
Djinbachian et al. [136] (2024) | CAD EYE | BLI | 467 patients, 337 polyps | 77.2 | 84.8 | 64.4 |
Min et al. [137] (2019) | - | Linked color imaging | 91 patients, 181 polyps | 87.0 | 87.1 | 87.0 |
Baumer et al. [138] (2023) | GI Genius | WLI | 156 patients, 262 polyps | 84.4 | 89.7 | 75.3 |
Mori et al. [139] (2018) | - | Endocytoscopy/NBI | 791 patients, 466 polyps | - | 93.8 | 91.0 |
NBI, narrow band imaging; HDWL, high-definition white light; BLI, blue laser imaging; WLI, white light imaging.
Regarding prospective studies on convolutional neural network-based CADx using image-enhanced endoscopy, van der Zander et al. [132] reported an accuracy of 95.0% in distinguishing tumor lesions under HDWL and BLI. Furthermore, an algorithm developed by Houwen et al. [133] using NBI demonstrated a sensitivity of 89%, a specificity of 38%, and an overall accuracy of 79% in diagnosing neoplastic lesions, showing diagnostic capabilities comparable to those of endoscopists (p = 0.10). In prospective studies of the CAD EYE system, Dos Santos et al. and Li et al. reported accuracies of 81.8% and 71.6%, respectively, for differentiating neoplastic from nonneoplastic lesions [134, 135]; both were lower than the accuracies of endoscopists, suggesting that the algorithm needs improvement. These studies compared CADx systems with endoscopists, but combining CADx with endoscopist judgment might enhance diagnostic outcomes [136]. Additionally, CADx with NBI effectively distinguishes polyps, including sessile serrated lesions [140]. Min and colleagues proposed using linked color imaging with CADx referencing NBI, achieving 87.0% accuracy [137]. The GI Genius system, employing WLI, achieved an accuracy of 84.4% [138]. These findings underline the necessity for further large-scale studies.
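Because accuracy is a prevalence-weighted average of sensitivity and specificity, a high accuracy can coexist with a low specificity when most lesions in the test set are neoplastic. The sketch below, with hypothetical function names and toy counts, computes the three measures from a 2 × 2 table and back-calculates the prevalence implied by a given sensitivity, specificity, and accuracy. The implied prevalence is an arithmetic illustration only, not a figure reported by the cited studies.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int):
    """Return (sensitivity, specificity, accuracy) from a 2 x 2 table,
    treating 'neoplastic' as the positive class."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return sensitivity, specificity, accuracy


def implied_prevalence(sensitivity: float, specificity: float,
                       accuracy: float) -> float:
    """Solve accuracy = sensitivity * p + specificity * (1 - p) for p,
    the fraction of positive (neoplastic) lesions in the test set."""
    if sensitivity == specificity:
        raise ValueError("prevalence is not identifiable when "
                         "sensitivity equals specificity")
    return (accuracy - specificity) / (sensitivity - specificity)


# Toy 2 x 2 table (invented): 100 neoplastic and 50 nonneoplastic lesions.
sens, spec, acc = diagnostic_metrics(tp=90, fp=20, fn=10, tn=30)
print(sens, spec, acc)

# Rounded values of 89% sensitivity, 38% specificity, and 79% accuracy
# imply that roughly four in five test lesions were neoplastic.
p = implied_prevalence(0.89, 0.38, 0.79)
print(round(p, 2))
```

This is one reason accuracy alone is a weak basis for comparing CADx systems that were validated on polyp sets with different case mixes.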
Magnifying endoscopy has also been suggested as a crucial modality for the discrimination of neoplasms; previous studies demonstrated that NBI magnifying colonoscopy differentiates the pathology of polyps with an accuracy of 86.6–97.5% [130, 131, 141]. Moreover, endocytoscopy, an ultramagnifying endoscopy technique, is noteworthy. Mori et al. [142] used features of the nuclear areas obtained from magnified images and reported an accuracy of 89.2% in distinguishing tumor lesions. After successive improvements to the algorithm, Mori and colleagues conducted a prospective trial using endocytoscopy, reporting a sensitivity of 93.8% and a specificity of 91.0% when combined with NBI to discriminate lesions 5 mm or less in diameter [139]. They also suggested the utility of CADx for diagnosing lesion depth [143], which needs further validation through additional prospective trials.
Conclusion
This review summarized recent research on AI systems for GI endoscopy. Studies typically indicate that these systems outperform nonexpert endoscopists and match or surpass expert endoscopists in effectiveness. However, the predominance of retrospective studies suggests a need for more prospective research to fully evaluate the clinical applications of endoscopic AI systems. The American Society for Gastrointestinal Endoscopy has outlined a framework for AI development, evaluation, and real-time performance metrics [144].
However, several problems remain to be solved. First, the cost-effectiveness of endoscopic AI systems is uncertain. Reports indicate that AI for colorectal polyps may be cost-effective by reducing colorectal cancer incidence and unnecessary polypectomies [145, 146], but few studies exist for AI in EGD [147], and further research is needed. Second, the use of endoscopic AI raises ethical considerations. In Japan, guidelines by the Japan Gastroenterological Endoscopy Society recommend that endoscopists who use AI have ample diagnostic experience. Similarly, the American Medical Association House of Delegates advises that AI should augment rather than replace human intelligence.
Endoscopists will be required to use endoscopic AI systems for detection and diagnosis, as these can reduce the risk of overlooking lesions and equalize the diagnostic ability of endoscopists. In the future, the use of AI systems in gastroenterology may enable earlier detection and diagnosis of GI neoplasms and thereby improve prognosis.
Conflict of Interest Statement
T.T. is the CEO of AI Medical Service Inc. T.O. and J.S. are advisory members of AI Medical Service Inc.
Funding Sources
This research received no external funding.
Author Contributions
R.K., K.O., and T.T. conceived and designed the study. R.K., K.O., and T.O. drafted the manuscript. J.S., S.I., and T.T. contributed to the revision and editing. All the authors have approved the final manuscript for publication.