Introduction: Artificial intelligence (AI) is increasingly being researched and developed in the medical field and holds the potential to transform healthcare after successful implementation. For patients with colorectal cancer liver metastases (CRLM), many AI models have been developed, but knowledge about the translation of these models into the clinical workflow is lacking. Therefore, this systematic review aimed to provide a contemporary overview of the current maturity status of AI models for patients with CRLM. Methods: A systematic search of the literature until November 2, 2023, was conducted in PubMed, Embase.com, and Clarivate Analytics/Web of Science Core Collection to identify eligible studies. Studies using AI and/or radiomics for patients with CRLM were considered eligible. Data on the study aim, study design, size of dataset, country, type of AI application, level of validation, and clinical implementation status (NASA technology readiness levels) were collected. Risk of bias and applicability of the individual studies were evaluated using the Prediction Model Risk of Bias Assessment Tool (PROBAST). Results: A total of 117 studies were included. Ninety-seven studies (83%) were published in the last 5 years. The most common study design was retrospective (96%). Thirty-five studies (30%) utilized a dataset of fewer than 50 patients with CRLM. Internal validation was performed in 63% of the studies and external validation in 17%; the remaining studies did not report validation. Half of the studies were classified as high risk of bias. None of the included studies performed real-time testing, workflow integration, clinical testing, or clinical integration. Conclusion: Although a rapid increase in research describing the development of AI models for patients with CRLM has been observed in recent years, not a single AI model has been translated into clinical practice.

Artificial intelligence (AI) is increasingly being used in the medical field to aid clinical decision-making. For patients with colorectal cancer liver metastases (CRLM), treatment decision-making is challenging due to the wide variety of factors that influence treatment strategy. In this systematic review, we show that multiple studies have attempted to overcome this challenge by developing AI models to improve treatment decision-making for patients with CRLM. However, despite this growing body of work in recent years, not a single AI model has been translated into clinical practice. In other words, no clinical added value for the individual patient with CRLM has been demonstrated thus far using AI, despite the considerable effort and cost invested. Future research should therefore prioritize integrating AI models into clinical practice to realize their potential to improve patient care.

Colorectal cancer (CRC) is the third most prevalent cancer globally and the second leading cause of cancer-related mortality [1]. Over 50% of patients with CRC will develop liver metastases during the course of the disease [2]. For patients with colorectal cancer liver metastases (CRLM), treatment decision-making is challenging due to a wide variety of factors that influence treatment strategy (e.g., initial resectability, synchronicity of CRLM, timing of surgery, chances of recurrence and type of chemotherapy regimen), resulting in a variety of treatments and no definitive consensus [3]. Over the past decades, multiple risk scores were proposed to determine risk profiles; however, these risk scores were shown to have suboptimal predictive performance for adequate guidance in clinical decision-making [4, 5]. Unsolved challenges in patient stratification for the optimal treatment strategy in patients with CRLM demonstrate the need for reliable biomarkers to support personalized treatment decision-making.

In recent years, artificial intelligence (AI) has emerged in the medical field and is transforming the approach to disease management. AI models perform tasks that would typically require human intelligence [6]. AI has the potential to examine large amounts of data faster and more accurately than conventional practices, which are labor- and resource-intensive [7]. AI models can classify, catalog and correlate large amounts of data in order to generate patient-specific predictions. In addition, AI can accurately analyze medical imaging, pathology slides and genetic data [8]. For cancer care specifically, AI can aid in diagnosis, treatment planning, early detection of disease, drug development and clinical decision-making [9, 10]. These evolving practices hold the potential to transform healthcare, promising improvements in precision medicine, personalized treatment regimens, and patient outcomes [11].

The positive attitude towards AI has resulted in multiple research advancements. Each year, numerous AI models are developed to enhance clinical decision-making. However, a notable discrepancy exists between the development of AI models and their implementation in clinical practice [12]. Most published studies using AI in the medical field have a retrospective design and are at high risk of bias. Limited datasets and a lack of transparency in validation processes lead to poor generalizability of the proposed AI models [11, 13]. A lack of proven clinical utility, feasibility, and effect on patient outcomes has been noted in various studies [12]. Adoption of the proposed models in clinical practice therefore remains limited. Several factors and challenges have been described, including complex or inadequate infrastructure, a lack of automation, and, even more importantly, limited knowledge on the deployment of AI models into clinical practice [14]. While developed countries advance in harnessing the potential of AI, the adoption and implementation of AI in developing countries present a distinctive set of challenges. Disparities in infrastructure, education and economics hinder the fair distribution of AI benefits [15].

For patients with CRLM specifically, several AI models have been developed and described within the literature, including models for diagnostics, prediction of disease development, prediction of response to chemotherapy, prediction of recurrence and prediction of overall survival [16‒24]. However, knowledge about the validation, evaluation and clinical implementation of these models is lacking. Therefore, this systematic review aimed to assess the current maturity status of AI models for patients with CRLM and the methodological quality of these models.

A systematic review of the literature was conducted to assess the current maturity status of AI models for patients with CRLM and to determine the methodological quality and risk of bias of these models. The conduct and reporting of this review adhere to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [25]. The PRISMA 2020 Checklist can be found in online supplementary 1 (for all online suppl. material, see https://doi.org/10.1159/000546572). The review was registered in an international prospective register of systematic reviews, PROSPERO (Record ID CRD42024508592), before the initiation of the literature search.

Search Strategy

PubMed, Embase.com, and Clarivate Analytics/Web of Science Core Collection were searched for relevant literature from inception to November 2, 2023. Searches were devised in collaboration with a medical information specialist (K.A.Z.). Search terms, including synonyms, closely related words and keywords, were used as index terms or free-text words: “colorectal cancer” and “metastasis” and “artificial intelligence.” The search contained no restrictions on date, study design, or language. Duplicate articles were excluded using the R package “ASySD,” an automated deduplication tool [26], followed by manual deduplication in Endnote (X20.0.3) by a medical information specialist (K.A.Z.). The full search strategies are detailed in online supplementary 2.
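
To illustrate the automated deduplication step, a minimal sketch in R is shown below. The merged export file name is hypothetical, and the function names (load_search, dedup_citations) follow the ASySD package documentation; treat the snippet as a sketch of the workflow rather than the authors' exact script.

```r
# Minimal sketch of automated deduplication with ASySD; the input file is
# hypothetical, and the manual deduplication step in Endnote is not shown.
# install.packages("ASySD")  # or: remotes::install_github("camaradesuk/ASySD")
library(ASySD)

# Citations exported from PubMed, Embase and Web of Science, merged into one CSV
citations <- load_search("combined_search_results.csv", method = "csv")

dedup_result <- dedup_citations(citations)  # automated matching of duplicates
unique_citations <- dedup_result$unique     # citations retained for screening

nrow(unique_citations)  # number of references entering title/abstract screening
```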

Study Selection Process and Eligibility

Two reviewers (M.Z. and R.K.) independently screened all potentially relevant titles and abstracts for eligibility using pre-set inclusion and exclusion criteria with the help of Rayyan, a web application for systematic reviews. Studies were included if they met the following criteria: (i) patients aged ≥18 years; (ii) patients with CRLM; and (iii) use or development of AI models, including radiomics research. Studies were excluded when they met any of the following criteria: (i) primary liver tumors; (ii) only non-colorectal liver metastases; (iii) recurrence of colorectal cancer without specifically addressing liver metastases; (iv) animal studies; (v) in vitro or ex vivo studies; (vi) reviews and meta-analyses, case reports, studies with a CRLM population of <15, and conference abstracts; and (vii) studies in any language other than English.

Articles were rejected during the initial review when they did not meet the inclusion criteria. Differences in judgment were discussed and resolved through a consensus procedure. All remaining articles were independently assessed in full text by the same two reviewers. As in the previous step, Rayyan was used for the full-text assessment; articles that did not meet the inclusion criteria were excluded and labeled with pre-set exclusion labels. Conflicts were resolved through a consensus procedure. The reference lists of all included studies were checked for additional relevant citations.

Data Extraction

The following study data were extracted using a predefined data extraction form: (i) general aim of the study (categorized as prediction, identification, detection, improvement or correlation), followed by specific study aims; (ii) study design (categorized as retrospective, prospective or clinical, i.e., randomized controlled trials [RCT]); (iii) size of the dataset, including (1) the number of all patients used for analysis and (2) the number of patients with CRLM specifically used for data analysis when the study also addressed other metastatic sites; (iv) country where the study was conducted (including information about single- or multicenter design); (v) type of AI/radiomics used or developed (categorized as AI [algorithm], AI [machine learning], AI [deep learning], radiomics, radiomics and AI, or not reported); (vi) level of validation (categorized as internal validation, external validation, prospective validation, clinical validation or no [reported] validation); (vii) clinical implementation status, which was assessed by applying the general concept of technology readiness levels introduced by the National Aeronautics and Space Administration (NASA): problem identification (level 1), proposal of solution (level 2), model prototyping and development (levels 3 and 4), model validation (level 5), real-time testing (level 6), workflow integration (level 7), clinical testing (level 8), and integration in clinical practice (level 9) [6, 27]; and (viii) availability and accessibility of a public dataset, including information on access to the dataset.
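
To illustrate how the clinical implementation status can be operationalized during data extraction, the sketch below (in R) encodes the NASA-derived technology readiness levels as an ordered factor; the example studies are hypothetical and not taken from this review.

```r
# Minimal sketch: clinical implementation status as an ordered factor.
trl_levels <- c("1", "2", "3-4", "5", "6", "7", "8", "9")
trl_labels <- c("problem identification", "proposal of solution",
                "model prototyping and development", "model validation",
                "real-time testing", "workflow integration",
                "clinical testing", "integration in clinical practice")

# Hypothetical extraction results for three studies
studies <- data.frame(id = c("study_A", "study_B", "study_C"))
studies$trl <- factor(c("3-4", "5", "3-4"), levels = trl_levels,
                      labels = trl_labels, ordered = TRUE)

table(studies$trl)  # distribution of studies over implementation levels
```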

Risk of Bias Assessment

The Prediction Model Risk of Bias Assessment Tool (PROBAST) was applied to assess the risk of bias (ROB) and the applicability of diagnostic and prognostic prediction model studies [28]. ROB and applicability were categorized as high, low, or unclear for each domain, and all signaling questions were closely followed using the guidance notes by Moons et al. [29].

Statistical Analysis

Study characteristics were reported as categorical and continuous variables. Categorical variables were reported as numbers and percentages. Continuous variables were reported as mean with standard deviation (SD) if normally distributed and as median with interquartile range (IQR) if not normally distributed. All analyses were performed using R, version 4.3.2 (R Foundation for Statistical Computing).
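
A minimal sketch of this reporting rule in base R, using a hypothetical vector of values, might look as follows (the Shapiro-Wilk test with a p > 0.05 threshold is one common way to check normality; the authors do not specify which test they used):

```r
# Hypothetical continuous variable (e.g., per-study sample sizes)
x <- c(34, 112, 88, 640, 57, 203, 91)

if (shapiro.test(x)$p.value > 0.05) {
  # approximately normally distributed: report mean with standard deviation
  sprintf("%.0f (SD %.0f)", mean(x), sd(x))
} else {
  # not normally distributed: report median with interquartile range
  q <- quantile(x, c(0.25, 0.75))
  sprintf("%.0f (IQR: %.0f-%.0f)", median(x), q[1], q[2])
}
```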

Literature Search

We identified 4,929 records from the three databases. After removing duplicates, 2,446 references remained for title and abstract screening. Of these, 2,179 articles were deemed irrelevant to the topic and were excluded during title and abstract screening. A total of 270 full-text articles were reviewed for eligibility. A flow chart of the data search and study selection is shown in online supplementary Figure S1. Additional searching of the references of the included articles did not yield any new relevant references. During full-text screening, 24 articles could not be retrieved. Of the remaining 223 articles assessed for eligibility, 117 studies were included in this systematic review. The primary reasons for excluding the 106 full-text articles were that studies did not specifically mention (the number of) patients with CRLM or did not develop or validate AI or radiomics models. All included studies are listed in the reference list of the manuscript.

Study Characteristics and Study Designs

All studies were published after 2005. Of the included studies, 113 (97%) were published in the last 10 years and 97 (83%) in the last 5 years. A schematic representation of the increase in published articles on this topic is shown in Figure 1. Most studies on this topic were published in 2022 and 2023, with 26 studies published in each year (Fig. 1a). The most common general study aims were prediction (69 studies [59%]) and identification (37 studies [32%]). Online supplementary Table S1 shows an overview of each general study aim subdivided into specific study aims. Four studies had two specific study aims within the prediction group. Predicting the development of CRLM, predicting response to treatment, predicting survival outcomes and identifying disease subtypes were analyzed in 19, 17, 16, and 12 studies, respectively. Most studies used AI (when merging the subcategories of AI, i.e., algorithm, machine learning and deep learning) for analysis (41 studies [35%]). None of the studies reported using solely an algorithm. Radiomics (36 studies [31%]) and radiomics in combination with AI (36 studies [31%]) shared second place in usage. Machine learning models were used more frequently than deep learning models: 27 studies (24%) and 14 studies (12%), respectively. Four studies (3%) did not report the specifics of the AI models used for their analyses.

Fig. 1.

Number of articles published on AI/radiomics models for patients with CRLM. a Number of articles published each year. b Cumulative number of articles published each year.


An overview of all countries in which the studies were conducted, categorized as <5 studies per country, 5–10 studies per country, 10–20 studies per country and >20 studies per country, is depicted in online supplementary Figure S2. Studies were conducted in 18 different countries, most often in China (41 studies [35%]) and the USA (22 studies [19%]). The majority of studies were conducted in high-income countries (60%), 38% in upper-middle-income countries, 2% in lower-middle-income countries, and none in low-income countries. Almost half of the radiomics studies were conducted in China (17/36 studies [47%]), and no radiomics studies were published before 2018. Machine learning and deep learning AI models were most commonly used in the USA (14/41 studies [34%]). Of the 117 included studies, most had a retrospective study design (112 studies [96%]), five studies (4%) had a prospective design, and none of the studies used a randomized controlled trial design. A total of 82 studies (70%) had a single-center design, 30 studies (26%) had a multicenter design, and 5 studies (4%) did not report how many centers were involved.

Dataset Size and Level of Validation

The median sample size of the total dataset used for analysis and of the patients with CRLM specifically across all 117 studies was 112 (IQR: 63–307) and 92 (IQR: 40–202) patients, respectively. For studies reporting internal validation, the median sample sizes were 112 (IQR: 76–272) and 91 (IQR: 40–166) patients, respectively, and for external validation 274 (IQR: 135–693) and 214 (IQR: 114–511) patients, respectively. Thirty-five studies (30%) had a dataset of fewer than 50 patients with CRLM, and a total of 14 studies (12%) had a dataset of more than 500 patients with CRLM.

Most studies (74 studies [63%]) only performed internal validation, meaning that the assessment of model performance was based on the same population used for model development [30]. Only 20 studies (17%) validated their model with an external validation cohort, all of which were published in the last 5 years. No single study validated the results in a prospective or clinical setting. The remaining 23 articles (20%) neither reported nor performed any validation.

Online supplementary Table S2 shows the proportion of studies according to the different study designs, including the level of validation and the size of the dataset of patients with CRLM specifically. Of all studies utilizing a retrospective study design, half utilized a dataset of 50–100 patients for internal validation, and 45% utilized a dataset of 150–500 patients for external validation. None of the studies that utilized a prospective study design included more than 500 patients in their analyses.

Notably, this search included a few studies that used the same cohort for their analysis. More specifically, two studies from the USA used the NCDB database; two studies from Italy used the same database, with the subsequent study adding another database for external validation; three studies from Italy conducted by the same research group used the same cohort for their analysis; and two studies from Germany used the same cohort, with the subsequent study again adding a database for external validation.

Clinical Implementation Status

All studies surpassed levels 1 and 2 of the clinical implementation status, defined as problem identification and proposal of solution, respectively (shown in Fig. 2). Almost all studies (96 studies [82%]) were in the model prototyping and development phase (levels 3 and 4) and the remaining 21 studies (18%) in the external model validation phase (level 5), the latter solely published from 2018 onward. None of the included studies surpassed level 5 of the clinical implementation status, indicating that none performed real-time testing, workflow integration, clinical testing, or integration in clinical practice (shown in Fig. 2).

Fig. 2.

Cumulative number of studies published according to their clinical implementation status and year of publication. In recent years, many studies reporting on model prototyping and development (levels 3 and 4) have been published. The movement of the clinical implementation status is currently mainly horizontal, whereas a diagonal movement toward integration in clinical practice (level 9) is desired.


Availability and Accessibility of Public Datasets

Of the 117 included studies, six studies (5%) provided a publicly available dataset, including information on dataset accessibility. Two other studies reported partially publicly available data, and one study published a subset of its data; all three provided accessible links. A total of 31 studies (26%) indicated that data were available upon request, with 12 of these studies explaining the reasons for not making the data publicly available. One study reported data as partly available upon request, due to restrictions related to human genetic resources. Two studies stated that their data were not yet publicly available; however, neither study specified when the data would become available and accessible. Another six studies explicitly reported no availability of a public dataset, often stating that the supporting findings are available within the article. Finally, 70 of the 117 studies (60%) did not include a data statement and were therefore categorized as not reported.

Risk of Bias and Applicability Assessment

The PROBAST assessment was restricted to 101 articles, since the assessment was not suitable for 16 articles. Using the PROBAST criteria, the overall ROB was classified as high in 51 of the 101 studies (50%) (shown in Fig. 3). High risk of bias most often originated in the domains “predictors” and “analysis.” In addition, most studies (55%) showed a low concern for overall applicability. High concern for overall applicability was seen in 35 of the 101 studies (35%). High concern for applicability was mostly observed in the “predictors” domain (76 studies [76%]). An overview of all study characteristics can be found in online supplementary 7.

Fig. 3.

Risk of bias and applicability assessment of 101 studies according to the domains proposed by PROBAST, shown as percentages. The left part of the figure displays the risk of bias for each domain, categorized as low, high, and unclear. The right part (*) of the figure displays the concern for applicability for each domain, categorized as low, high, and unclear.


This review demonstrates the perceived potential of AI, as reflected by the recent increase in the research and development of AI models for patients with CRLM. None of these AI models, however, has undergone real-time testing, workflow integration, clinical testing or clinical integration, indicating a low maturity status. In addition, most models were developed using small datasets, lacked external validation, and showed a high risk of bias.

The finding that not a single study implemented the developed model in clinical practice appears contradictory to the conclusions of many of the included studies, as the majority conclude that the use of AI models holds great promise for improving care for patients with CRLM. This phenomenon is well known in studies on prediction models using machine learning techniques, in which “spin practices” – referred to as the discordance between results and conclusions – lead to unjustified optimism about the performance of these models [31]. Most of the included studies base their conclusions solely on internally validated results, and only a few studies externally validated these results. These findings are in concordance with studies from other specialties, such as the intensive care unit and the radiology department [6, 32]. Moreover, reviews on primary and secondary liver tumors and various other cancer types report a lack of external validation [33‒36]. With internal validation, a predictive model’s performance is assessed on data from the same patients used to develop the model. This leads to a significant overestimation of the model’s performance, especially when the patient cohort is small [30]. Large datasets are essential for machine learning models to be trained successfully and to improve the accuracy of the developed model [37].
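
This optimism can be illustrated with a small simulation; the sketch below (in R, on invented data, not drawn from any included study) fits a logistic regression with mostly uninformative predictors to a 50-patient cohort and compares the apparent AUC on the development data with the AUC on a large independent sample.

```r
# Simulation sketch (invented data): apparent vs. independent performance of a
# model developed on a small cohort with mostly noise predictors.
set.seed(42)
p <- 10  # candidate predictors; only the first is weakly informative

make_data <- function(n) {
  x <- matrix(rnorm(n * p), n, p)
  y <- rbinom(n, 1, plogis(0.5 * x[, 1]))  # outcome driven by predictor 1 only
  data.frame(y = y, x)
}

train <- make_data(50)     # small development cohort
test  <- make_data(10000)  # large independent ("unseen") population

fit <- glm(y ~ ., family = binomial, data = train)

auc <- function(y, pred) {  # rank-based (Mann-Whitney) AUC
  r <- rank(pred); n1 <- sum(y == 1); n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

auc(train$y, predict(fit, type = "response"))                 # apparent AUC, inflated
auc(test$y, predict(fit, newdata = test, type = "response"))  # markedly lower on unseen data
```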

One of the reasons AI models do not surpass the research and development phase (levels 3/4) is the ease of using a retrospective study design for model development: historically labeled data are readily available to train and test the algorithm. However, only prospective studies can approximate the real-world medical setting and reveal the true utility of the developed AI model [38]. For this reason, the question arises whether studies without external validation accurately represent the model’s performance on unseen data, or on real-world data in clinical practice. Reproducibility of a developed AI model is reflected not only by (external) validation but also by technical challenges, referring to the ability to achieve consistent and reliable performance across different datasets, institutions, and clinical settings [39]. Technical differences in imaging protocols and data preprocessing might further hinder reproducibility across institutions. Liver-specific technical challenges include motion artifacts due to breathing and variability in contrast enhancement [40].

This review, supported by the available literature, suggests a discrepancy between the research output on the one hand and, on the other hand, the weaknesses of the study designs and methodologies, as assessed by dataset size and risk of bias. If this discrepancy continues, it is debatable whether the conceptualization of models, with their proclaimed promise to revolutionize healthcare, will translate into clinical significance. van de Sande et al. [41] described that, because of this trend of AI studies not surpassing the development phase and having poor methodological quality, a point of disillusioned expectations, called an “AI winter,” could occur. A systematic review on the clinical implementation status of AI models in intensive care units reached similar conclusions, stating that models rarely progress beyond the development phase to the validation, evaluation, and implementation phases [6]. A review by Aristidou et al. [42] supports these findings and extrapolates them to the entirety of developed AI models in healthcare. They report that even mature, commercialized AI models are seldom implemented in clinical practice, partly due to a lack of evidence from published studies and a reliance on retrospective, in silico data that do not reflect real-world clinical practice. They advocate for transparent data documentation, open data repositories for AI application development, and explainable, understandable AI decision-making processes for clinicians [11, 42].

The existing discrepancy between the development and implementation of AI models has led to the advancement of frameworks, which are tools to facilitate the development, training, and deployment of AI models [43]. A study by Boverhof et al. [43] proposed a framework for the valuation of AI models within the radiology sector, from conception to local implementation, emphasizing the need for and importance of understanding AI technology in its local environment. Regarding the local environment, most AI models and studies were conducted in high-income countries [44]. Our review showed similar results, with 60% of the studies conducted in high-income countries and 38% in upper-middle-income countries. A review by Aderibigbe et al. [15] described the gap between global advancements in AI technology on the one hand and the potential for implementing AI models in developing countries on the other, countries that already face different and diverse challenges.

From our experience, several aspects of the research and development of AI models for clinical use could be useful for future researchers to enhance implementation. To start with, there should always be a clinical need or indication for the development of an AI tool with the intention of clinical implementation, rather than developing a model without clinical purpose. Future studies should demonstrate the added value of a developed AI model in comparison to current clinical practice or an appropriate baseline model. Such comparative assessments are crucial to justify clinical integration and to ensure that AI models lead to meaningful improvements in clinical outcomes or workflow efficiency. Second, research teams should adhere to established tools and guidelines to ensure transparent, complete and interpretable reporting of studies, such as the TRIPOD+AI statement and MI-CLAIM [45, 46]. A robust methodology should be proposed, and an external validation cohort following the same robust methodology should be acquired [41, 47]. The study population should preferably consist of multicenter data. A proper methodology assures the generalizability of the model to unseen datasets, which is essential for eventual clinical implementation. Third, by involving the end-users early in the development phase, their insights and clinical needs can be gathered before development, allowing the model to be fine-tuned to their specific demands and needs [48]. Fourth, patients should already be involved during the design and development of the AI model to genuinely address patients’ needs and concerns, thereby fostering trust and relevance [49]. Lastly, by organizing evaluation sessions with the multidisciplinary team, including patient representatives, valuable insights can be gathered for further optimization and integration of the model in the clinical setting [49].

In addition to addressing the challenges encountered during the research and development phase of AI models, we propose seven strategies to facilitate the successful translation and clinical implementation of these models. First, engage stakeholders through collaboration with a multidisciplinary team involving clinical and technical researchers, the envisioned end-users, clinicians treating patients with the disease of interest, and ICT experts with experience in the clinical implementation of (in-house developed) models [41]. Second, evaluate the institution’s infrastructure and resources to determine preparedness for AI integration [50]. Third, integrate the model into the existing workflow, minimizing disruptions and enhancing user adoption. Fourth, offer structured training to the end-users to familiarize them with the capabilities, limitations and appropriate use of the AI model [51]. Fifth, engage patients and educate them on the developed AI application to foster trust and transparency [52]. Sixth, continuously monitor and evaluate the performance of the developed model to make timely adjustments and maintain efficacy [53]. Seventh, adhere to ethical guidelines and regulatory standards, addressing patient privacy, data security, and, if obligated, CE-marking [54].

This study has several limitations. Firstly, a total of 24 full-text articles could not be retrieved. Secondly, this review specifically focused on AI models for patients with CRLM; however, it is possible that AI models developed in other specialties could be translated to this patient population. In addition, PROBAST was used in this review to assess the risk of bias of all included studies, while PROBAST was initially designed for prediction models, so the findings should be interpreted in this context. Lastly, as the number of articles published has rapidly increased over the years, relevant articles may have been published since the date the literature search was performed and thus missed.

In conclusion, a rapid and notable increase in the development of AI models for patients with CRLM has been observed in recent years within the literature. However, none of these proposed AI models has been translated into clinical practice. Methodological shortcomings might have hampered the validation, evaluation, and implementation of these AI models. Despite the rapid increase in AI research and development, no clinical added value for the individual patient with CRLM has been demonstrated thus far. Future research should prioritize validation methodologies using predefined guidelines, multicenter studies, and efforts toward integrating AI models into clinical practice to realize their potential in improving patient care.

A statement of ethics is not applicable because this study is based exclusively on published literature.

All authors declare no potential conflicts of interest.

This work has received funding by the Dutch Cancer Society (KWF Kankerbestrijding), project number 14002/2021-2, and an unrestricted grant from the Cancer Center Amsterdam Foundation.

R. Kemna: conceptualization, data curation, formal analysis, funding acquisition, investigation, methodology, project administration, visualization, and writing – original draft. J.M. Zeeuw: conceptualization, data curation, formal analysis, investigation, methodology, visualization, and writing – original draft. K.A. Ziesemer, J.I. Bereska, and H. Marquering: data curation and writing – review and editing. M. Ali: methodology, software, formal analysis, investigation, visualization, writing – original draft. I.M. Verpalen: conceptualization, methodology, resources, supervision, and writing – review and editing. J. Stoker, R.J. Swijnenburg, J. Huiskens, and G. Kazemier: conceptualization, funding acquisition, methodology, resources, supervision, and writing – review and editing.

Additional Information

Ruby Kemna and J. Michiel Zeeuw share the first authorship.

All data generated or analyzed during this study are included in this article and its supplementary material files. Further inquiries can be directed to the corresponding author.

References

1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2021;71(3):209–49.
2. Chow FC, Chok KS. Colorectal liver metastases: an update on multidisciplinary approach. World J Hepatol. 2019;11(2):150–72.
3. Adam R, Kitano Y. Multidisciplinary approach of liver metastases from colorectal cancer. Ann Gastroenterol Surg. 2019;3(1):50–6.
4. Bolhuis K, Wensink GE, Elferink MAG, Bond MJG, Dijksterhuis WPM, Fijneman RJA, et al. External validation of two established clinical risk scores predicting outcome after local treatment of colorectal liver metastases in a nationwide cohort. Cancers. 2022;14(10):2356.
5. Zakaria S, Donohue JH, Que FG, Farnell MB, Schleck CD, Ilstrup DM, et al. Hepatic resection for colorectal metastases: value for risk scoring systems. Ann Surg. 2007;246(2):183–91.
6. van de Sande D, van Genderen ME, Huiskens J, Gommers D, van Bommel J. Moving from bytes to bedside: a systematic review on the use of artificial intelligence in the intensive care unit. Intensive Care Med. 2021;47(7):750–60.
7. Davenport T, Kalakota R. The potential for artificial intelligence in healthcare. Future Healthc J. 2019;6(2):94–8.
8. Johnson AE, Ghassemi MM, Nemati S, Niehaus KE, Clifton DA, Clifford GD. Machine learning and decision support in critical care. Proc IEEE Inst Electr Electron Eng. 2016;104(2):444–66.
9. Luchini C, Pea A, Scarpa A. Artificial intelligence in oncology: current applications and future perspectives. Br J Cancer. 2022;126(1):4–9.
10. Huang S, Yang J, Fong S, Zhao Q. Artificial intelligence in cancer diagnosis and prognosis: opportunities and challenges. Cancer Lett. 2020;471:61–71.
11. van Genderen ME, van de Sande D, Hooft L, Reis AA, Cornet AD, Oosterhoff JHF, et al. Charting a new course in healthcare: early-stage AI algorithm registration to enhance trust and transparency. NPJ Digit Med. 2024;7(1):119.
12. Fasterholdt I, Naghavi-Behzad M, Rasmussen BSB, Kjølhede T, Skjøth MM, Hildebrandt MG, et al. Value assessment of artificial intelligence in medical imaging: a scoping review. BMC Med Imaging. 2022;22(1):187.
13. Kolla L, Parikh RB. Uses and limitations of artificial intelligence for oncology. Cancer. 2024;130(12):2101–7.
14. Kim B, Romeijn S, van Buchem M, Mehrizi MHR, Grootjans W. A holistic approach to implementing artificial intelligence in radiology. Insights Imaging. 2024;15(1):22.
15. Aderibigbe AO, Ohenhen PE, Nwaobia NK, Gidiagba JO, Ani EC. Artificial intelligence in developing countries: bridging the gap between potential and implementation. Comput Sci IT Res J. 2023;4(3):185–99.
16. Li M, Li X, Guo Y, Miao Z, Liu X, Guo S, et al. Development and assessment of an individualized nomogram to predict colorectal cancer liver metastases. Quant Imaging Med Surg. 2020;10(2):397–414.
17. Taghavi M, Trebeschi S, Simões R, Meek DB, Beckers RCJ, Lambregts DMJ, et al. Machine learning-based analysis of CT radiomics model for prediction of colorectal metachronous liver metastases. Abdom Radiol. 2021;46(1):249–56.
18. Lee S, Choe EK, Kim SY, Kim HS, Park KJ, Kim D. Liver imaging features by convolutional neural network to predict the metachronous liver metastasis in stage I-III colorectal cancer patients based on preoperative abdominal CT scan. BMC Bioinformatics. 2020;21(Suppl 13):382.
19. Vorontsov E, Cerny M, Régnier P, Di Jorio L, Pal CJ, Lapointe R, et al. Deep learning for automated segmentation of liver lesions at CT in patients with colorectal cancer liver metastases. Radiol Artif Intell. 2019;1(2):180014.
20. Ma J, Dercle L, Lichtenstein P, Wang D, Chen A, Zhu J, et al. Automated identification of optimal portal venous phase timing with convolutional neural networks. Acad Radiol. 2020;27(2):e10–e18.
21. Kim K, Kim S, Han K, Bae H, Shin J, Lim JS. Diagnostic performance of deep learning-based lesion detection algorithm in CT for detecting hepatic metastasis from colorectal cancer. Korean J Radiol. 2021;22(6):912–21.
22. Khalili K, Lawlor RL, Pourafkari M, Lu H, Tyrrell P, Kim TK, et al. Convolutional neural networks versus radiologists in characterization of small hypoattenuating hepatic nodules on CT: a critical diagnostic challenge in staging of colorectal carcinoma. Sci Rep. 2020;10(1):15248.
23. Spelt L, Nilsson J, Andersson R, Andersson B. Artificial neural networks--a method for prediction of survival following liver resection for colorectal cancer metastases. Eur J Surg Oncol. 2013;39(6):648–54.
24. Paredes AZ, Hyer JM, Tsilimigras DI, Moro A, Bagante F, Guglielmi A, et al. A novel machine-learning approach to predict recurrence after resection of colorectal liver metastases. Ann Surg Oncol. 2020;27(13):5139–47.
25. Moher D, Liberati A, Tetzlaff J, Altman DG; PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. BMJ. 2009;339:b2535.
26. Hair K, Bahor Z, Macleod M, Liao J, Sena ES. The Automated Systematic Search Deduplicator (ASySD): a rapid, open-source, interoperable tool to remove duplicate citations in biomedical systematic reviews. bioRxiv. 2021.
27. Mankins JC. Technology readiness levels. White Paper; 1995. Vol. 6; p. 1995.
28. Wolff RF, Moons KGM, Riley RD, Whiting PF, Westwood M, Collins GS, et al. PROBAST: a tool to assess the risk of bias and applicability of prediction model studies. Ann Intern Med. 2019;170(1):51–8.
29. Moons KGM, Wolff RF, Riley RD, Whiting PF, Westwood M, Collins GS, et al. PROBAST: a tool to assess risk of bias and applicability of prediction model studies: explanation and elaboration. Ann Intern Med. 2019;170(1):W1–33.
30. Steyerberg EW, Harrell FE Jr, Borsboom GJ, Eijkemans MJ, Vergouwe Y, Habbema JD. Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. J Clin Epidemiol. 2001;54(8):774–81.
31. Andaur Navarro CL, Damen JAA, Takada T, Nijman SWJ, Dhiman P, Ma J, et al. Systematic review finds “spin” practices and poor reporting standards in studies on machine learning-based prediction models. J Clin Epidemiol. 2023;158:99–110.
32. Kim DW, Jang HY, Kim KW, Shin Y, Park SH. Design characteristics of studies reporting the performance of artificial intelligence algorithms for diagnostic analysis of medical images: results from recently published papers. Korean J Radiol. 2019;20(3):405–10.
33. Bakrania A, Joshi N, Zhao X, Zheng G, Bhat M. Artificial intelligence in liver cancers: decoding the impact of machine learning models in clinical diagnosis of primary liver cancers and liver cancer metastases. Pharmacol Res. 2023;189:106706.
34. Jia LL, Zhao JX, Zhao LP, Tian JH, Huang G. Current status and quality of radiomic studies for predicting KRAS mutations in colorectal cancer patients: a systematic review and meta-analysis. Eur J Radiol. 2023;158:110640.
35. Nardone V, Reginelli A, Grassi R, Boldrini L, Vacca G, D'Ippolito E, et al. Delta radiomics: a systematic review. Radiol Med. 2021;126(12):1571–83.
36. Chu LC, Park S, Kawamoto S, Yuille AL, Hruban RH, Fishman EK. Current status of radiomics and deep learning in liver imaging. J Comput Assist Tomogr. 2021;45(3):343–51.
37. Rajkomar A, Dean J, Kohane I. Machine learning in medicine. N Engl J Med. 2019;380(14):1347–58.
38. Kelly CJ, Karthikesalingam A, Suleyman M, Corrado G, King D. Key challenges for delivering clinical impact with artificial intelligence. BMC Med. 2019;17:195–9.
39. Moassefi M, Rouzrokh P, Conte GM, Vahdati S, Fu T, Tahmasebi A, et al. Reproducibility of deep learning algorithms developed for medical imaging analysis: a systematic review. J Digit Imaging. 2023;36(5):2306–12.
40. Choe KA, Smith RC, Wilkens K, Constable RT. Motion artifact in T2-weighted fast spin-echo images of the liver: effect on image contrast and reduction of artifact using respiratory triggering in normal volunteers. J Magn Reson Imaging. 1997;7(2):298–302.
41. van de Sande D, Van Genderen ME, Smit JM, Huiskens J, Visser JJ, Veen RER, et al. Developing, implementing and governing artificial intelligence in medicine: a step-by-step approach to prevent an artificial intelligence winter. BMJ Health Care Inform. 2022;29(1):e100495.
42. Aristidou A, Jena R, Topol EJ. Bridging the chasm between AI and clinical implementation. Lancet. 2022;399(10325):620.
43. Boverhof BJ, Redekop WK, Bos D, Starmans MPA, Birch J, Rockall A, et al. Radiology AI Deployment and Assessment Rubric (RADAR) to bring value-based AI into radiological practice. Insights Imaging. 2024;15(1):34.
44. Sharma M, Savage C, Nair M, Larsson I, Svedberg P, Nygren JM. Artificial intelligence applications in health care practice: scoping review. J Med Internet Res. 2022;24(10):e40238.
45. Collins GS, Moons KGM, Dhiman P, Riley RD, Beam AL, Van Calster B, et al. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ. 2024;385:e078378.
46. Norgeot B, Quer G, Beaulieu-Jones BK, Torkamani A, Dias R, Gianfrancesco M, et al. Minimum information about clinical artificial intelligence modeling: the MI-CLAIM checklist. Nat Med. 2020;26(9):1320–4.
47. He J, Baxter SL, Xu J, Xu J, Zhou X, Zhang K. The practical implementation of artificial intelligence technologies in medicine. Nat Med. 2019;25(1):30–6.
48. Amann J, Blasimme A, Vayena E, Frey D, Madai VI; Precise4Q consortium. Explainability for artificial intelligence in healthcare: a multidisciplinary perspective. BMC Med Inform Decis Mak. 2020;20(1):310.
49. Banerjee S, Alsop P, Jones L, Cardinal RN. Patient and public involvement to build trust in artificial intelligence: a framework, tools, and case studies. Patterns. 2022;3(6):100506.
50. Alami H, Lehoux P, Denis JL, Motulsky A, Petitgand C, Savoldelli M, et al. Organizational readiness for artificial intelligence in health care: insights for decision-making and practice. J Health Organ Manag. 2020;35(1):106–14.
51. Misra R, Keane PA, Hogg HDJ. How should we train clinicians for artificial intelligence in healthcare. Future Healthc J. 2024;11(3):100162.
52. Jeyakumar T, Younus S, Zhang M, Clare M, Charow R, Karsan I, et al. Preparing for an artificial intelligence-enabled future: patient perspectives on engagement and health care professional training for adopting artificial intelligence technologies in health care settings. JMIR AI. 2023;2:e40973.
53. Feng J, Phillips RV, Malenica I, Bishara A, Hubbard AE, Celi LA, et al. Clinical artificial intelligence quality improvement: towards continual monitoring and updating of AI algorithms in healthcare. NPJ Digit Med. 2022;5(1):66.
54. Mennella C, Maniscalco U, De Pietro G, Esposito M. Ethical and regulatory challenges of AI technologies in healthcare: a narrative review. Heliyon. 2024;10(4):e26297.