Abstract
Background: Clinical artificial intelligence (AI) has reached a critical inflection point. Advances in algorithmic science and increased understanding of operational considerations in AI deployment are opening the door to widespread clinical pathway transformation. For surgery in particular, the application of machine learning algorithms in fields such as computer vision and operative robotics is poised to radically change how we screen, diagnose, risk-stratify, treat, and follow up patients, in both pre- and postoperative stages, and within operating theatres. Summary: In this paper, we summarise the current landscape of existing and emerging integrations within complex surgical care pathways. We investigate effective methods for practical use of AI throughout the patient pathway, from early screening and accurate diagnosis to intraoperative robotics, postoperative monitoring, and follow-up. Horizon scanning of AI technologies in surgery is used to identify novel innovations that can enhance surgical practice today, with potential for paradigm shifts across core domains of surgical practice in the future. Any AI-driven future must be built on responsible and ethical usage, reinforced by effective oversight of data governance and of risks to patient safety in deployment. Implementation is additionally bound to considerations of usability and pathway feasibility, and the need for robust healthcare technology assessment and evidence generation. While these factors are traditionally seen as barriers to translating AI into practice, we discuss how holistic implementation practices can create a solid foundation for scaling AI across pathways. Key Messages: The next decade will see rapid translation of experimental development into real-world impact. AI will require evolution of work practices, but will also enhance patient safety and surgical quality outcomes, and provide significant value for surgeons and health systems. Surgical practice has always sat on a bedrock of technological innovation. For those that follow this tradition, the future of AI in surgery starts now.
Introduction
Artificial intelligence (AI) has received unprecedented attention over the past decade, with substantial progress in developing and testing clinical use-cases. AI is an umbrella term describing different computational technologies which enable algorithms to learn from, interpret, produce predictions from, or autonomously act on clinical data. These include fields of machine learning (ML) from structured data, computer vision, augmented reality (AR), and operative robotics [1]. Technological diversity enables application across complex patient journeys and multimodal clinical data, with potential for impact at multiple points of any patient experience. The natural result has been innumerable areas of study, which have evolved through iterative advances in algorithms and clinical data, and which are presently reaching maturity in their capacity for predictive performance.
Discussion of AI-led transformation is not new, and previous dawns have proven to be false. Research into AI remains dominated by retrospective evaluations showing promising model performance, but outside of headlines in popular media, real-world impact has remained far more limited. The reasons for translational failures have been previously discussed [2], but can be summarised by (1) lack of consideration of feasibility and effectiveness in real-world pathways; (2) lack of maturity in data or digital infrastructure; and (3) poor model performance when pushed to real-world settings with diverse populations and practices unique to local contexts. As such, instead of an expected revolution in care delivery, we have experienced a more gradual evolution where real-world benefits of AI are tested and demonstrated for select use-cases, and where existing pathways are enhanced and made more precise instead of being replaced entirely.
With gradual but meaningful progress, the landscape is now poised to move into a new stage of sustainable and widespread deployment. This inflection point has emerged from a convergence of factors which we summarise here: (1) AI algorithms have reached a stage of maturity where they can be readily applied to clinical data, and are able to maximise the potential information within clinical datasets; (2) clinical data are more interoperable, with greater population coverage, and are more readily available to support AI development and prediction in clinical environments; (3) driven in part by lessons from previous deployment failures, we now have far greater scientific understanding of the pathway components that are foundational to supporting any continuous AI deployment; (4) there is established recognition that AI is merely a pathway instrument, and that model accuracy must translate into clinical efficiency or improved patient outcomes; as such, increased academic, institutional, and regulatory attention is being paid to evidence generation and impact assessment across the post-deployment (or post-market) lifecycle; (5) broad agreement on definitions of responsibility in clinical AI deployment enables robust frameworks to support any device deployment; and finally (6) year-on-year increases in provider, academic, and capital funding for AI development and testing have resulted in a critical mass of validated and regulator-approved models that are becoming sustainably embedded within active pathways. This confluence of algorithmic, data, and implementation maturity has already enabled value-driven AI utilisation in non-healthcare industries [3]. Healthcare is likely to follow in these footsteps.
Surgical pathways are well defined, evidence-driven, often technologically supported at baseline, and supply tangible and well-validated measures of performance, efficiency, and patient-centred outcomes. These factors are likely responsible for the strong maturity of AI models and the number of regulator-approved software-as-a-medical-device (SaMD) products that sit at key stages of the surgical pathway. These include radiomic devices for screening and diagnosis, models that utilise rich clinical data for operative risk prediction, AI-assisted robotic surgical devices, models that optimise operational pathway efficiency, and decision support systems in the postoperative and follow-up stages. As such, we are increasingly likely to see AI-based workflow evolution across a multitude of surgical disciplines, supported by evidence of efficiency savings and outcome improvement.
In this paper, we first provide a primer on current AI algorithmic science, before exploring specific use-cases that are reaching maturity in their application to surgical pathways and highlighting a case example where surgical AI will work in collaboration with clinicians at various points along a patient journey. We discuss how these use-cases must be implemented holistically with consideration of clinical pathway feasibility, protection of patients through data governance and responsible AI practices, and a focus on resulting impacts through health technology assessment frameworks and real-world evidence generation. Finally, we peer into a future landscape of surgery once AI technologies have advanced significantly.
An AI Primer
AI broadly refers to the development of computers that are able to carry out tasks normally associated with human intelligence [4]. More specifically, it describes machine-based systems that, for a given set of human-defined objectives, make predictions, recommendations, or decisions influencing their environment [5, 6]. Since its emergence over 60 years ago, AI has come to permeate many aspects of daily life, ranging from search engines and global financial instruments to entertainment recommender systems and home virtual assistants.
Various subtypes of AI have demonstrated the potential to be clinically beneficial. ML describes the use of algorithms to make predictions about data and automatically improve over time [4]. It is separated from other fields of AI by the process of learning behaviours that are not explicitly programmed by the developer, and can be further subdivided into supervised, unsupervised, or reinforcement learning [7]. Supervised learning trains algorithms with labelled data to predict an output. On the other hand, unsupervised learning uses unlabelled data to allow algorithms to explore unknown patterns or outcomes. Finally, reinforcement learning refers to a trial-and-error approach, akin to operant conditioning. All three approaches have been evaluated in healthcare applications [1].
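To make these distinctions concrete, the sketch below contrasts supervised and unsupervised learning on purely synthetic data; the feature names in the comments are illustrative assumptions, not any published model.

```python
# Illustrative sketch only: synthetic data, invented feature semantics.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))            # imagine: age, BMI, CRP, WCC (synthetic)
y = (X[:, 2] + X[:, 3] > 1).astype(int)  # imagine: complication yes/no (synthetic)

# Supervised learning: fit labelled examples, then predict held-out labels.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))

# Unsupervised learning: look for structure in the same data without labels.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print("cluster sizes:", np.bincount(clusters))
```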
Deep learning is a further subset within ML that uses neural networks to analyse large datasets and solve complex problems. Deep neural networks contain multiple hidden layers, giving them greater representational capacity, and take loose inspiration from the interconnected neurons of biological nervous systems. One of the main applications that has highlighted the remarkable ability of deep learning to identify complex patterns is its diagnostic performance in medical imaging [8].
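A minimal sketch of what "multiple hidden layers" means in code is shown below, using PyTorch with arbitrary layer sizes; real diagnostic networks are far deeper and are trained on large labelled datasets.

```python
# A deep neural network is a stack of hidden layers; sizes here are arbitrary.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),  # hidden layer 1
    nn.Linear(64, 64), nn.ReLU(),  # hidden layer 2
    nn.Linear(64, 64), nn.ReLU(),  # hidden layer 3
    nn.Linear(64, 2),              # output, e.g., disease present vs absent
)
x = torch.randn(8, 32)             # a batch of 8 synthetic feature vectors
print(model(x).shape)              # torch.Size([8, 2]) of class logits
```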
Further subdivisions particularly relevant to surgery are natural language processing (NLP) and computer vision [4]. NLP refers to computational methods for interpreting and generating free-text language, enabling unstructured clinical narrative to be converted into structured, machine-readable information. This has implications mainly for scrutinising clinical text and electronic health records (EHRs), but has increasing scope with the possibility of automated digital scribing. Computer vision enables algorithms to analyse images and videos to interpret and extract patterns, expanding the utility of recorded surgical images and video data. These various AI technologies will be explored in turn as we discuss their application within surgical domains.
To frame the context for safe and effective AI applications in surgery, we propose easily followed criteria based on guidance from the World Health Organization (WHO) [5] and the Coalition for Health AI [9]: testable, useable, reliable, beneficial, operable (TURBO; Fig. 1). Firstly, AI technologies must be testable, allowing algorithms and outcomes to be objectively verified and measured against pre-set standards at local, national, and international levels. Secondly, useable means that the technology is acceptable to patients and clinicians, is easy to use, and does not negatively impact clinical workflows. Thirdly, AI systems should be reliable, so that clinicians and patients can trust the system to perform consistently with reproducible results. Fourthly, they must be beneficial to patient care, insofar as the system is clinically effective and safe and does not lead to unacceptable negative effects. Finally, the AI system must be operable, with assurance of sufficient operator training so that implementation is not limited by lack of operator ability or knowledge. We recommend that key stakeholders involved in the design, development, and deployment of future AI technology in surgery follow these principles.
Proposed TURBO framework that provides easy-to-follow criteria for stakeholders to develop AI systems in surgery. TURBO, testable, useable, reliable, beneficial, operable.
Applications of AI across Surgical Domains
The strength of AI lies in the data that feed it. Although large volumes of health data exist, their translation into clinical practice is hindered partly by the high dimensionality and complexity of the data, which require sophisticated solutions to convert into real-world benefit. However, the range of data that can be collected from patients is vast, and the technology required to process it is progressing rapidly. Disparate data sources are being used to develop AI models that integrate data across all modalities, termed “multimodal AI” [10]. This is predicted to act as a gateway into an era of personalised medicine, and there is no doubt that this will spill over heavily into surgical practice. Figure 2 illustrates the predicted applications of AI in surgery. We consider the application of AI across these domains with examples from the US Food and Drug Administration (FDA) list of regulator-approved AI/ML software-as-a-medical-device products [11], as well as from significant trials and cutting-edge research. A search strategy based on our institutional expertise was used to identify relevant studies from the literature. Prior systematic reviews conducted by our group have highlighted applications of AI within various surgical fields [8, 12‒15].
Chart depicting the applications of AI in surgery and the data sources. Applications are divided into preoperative, intraoperative, and postoperative stages.
Population Screening
AI technology has tremendous potential to transform population-wide screening strategies. Various national screening programmes have demonstrated early diagnostic and mortality benefits for breast cancer [16], colorectal cancer [17], and abdominal aortic aneurysms [18] among others. However, at the same time, there have been long-standing debates regarding optimal screening strategies and the need to implement “personalised screening,” which ultimately depends on patient-specific factors as well as selection of screening tools, frequency and timing, and risk stratification [19, 20].
Breast cancer screening has so far received the most attention in the AI field, leveraging the impressive ability of AI systems to analyse medical images. Several developed nations, including the USA and UK, operate mammography-based screening programmes at regular intervals to detect breast cancer. However, the accuracy of traditional expert interpretation of imaging varies, with significant rates of false positives and false negatives [21]. The use of second independent readers increases the cancer detection rate, but at the expense of a higher recall rate and workload burden [22]. A number of lesion detection devices have been approved for screening mammography utilising various AI technologies, including ProFound AI software (iCAD, USA), INSIGHT MMG (Lunit Inc., South Korea), and Transpara (ScreenPoint Medical, The Netherlands) [11]. Of these, INSIGHT MMG and Transpara are currently in two large prospective trials and have reported positive preliminary findings suggesting non-inferiority as a second reader compared with double reading by two radiologists [23, 24]. It is therefore not difficult to see a near future where deep learning systems are utilised directly as an extra or second reader in breast cancer screening, both to improve the cancer detection rate and to alleviate the workload of radiologists. Similarly, AI has the potential to be utilised in any imaging-based screening platform, and work is underway to apply it to screening in colorectal cancer [25] and lung cancer [26].
Applications in workflow triage have also garnered attention. Triaging clinical data into a “high-risk” workstream as opposed to a low-risk one can direct higher risk cases to clinicians earlier, as well as streamline the processing of the massive amounts of data generated by screening. The INSIGHT MMG algorithm successfully triaged mammograms into either no radiology assessment or enhanced assessment by radiologists, with notable benefits in reducing workload and improving the early detection rate [27].
In recent years, molecular biology has also experienced a rapid paradigm shift similar to that of AI technology. “Multiomics” refers to the analysis of large datasets of small molecules – spanning the genome, epigenome, proteome, metabolome, and microbiome – that influence biological processes [28]. Although the application of AI to this field is still in its infancy, analysis of genomic information has been reported for the early detection of cancer. For example, an ML-driven model was tested and independently validated on circulating tumour DNA to detect early-stage hepatocellular carcinoma by assessing somatic copy number alterations [29]. AI and multiomics are primed to have a synergistic relationship, in which massive datasets of small molecules combined with clinical data and imaging are fed into sophisticated AI algorithms to identify populations at risk, and to screen, diagnose, and prognosticate disease with increasing precision.
Diagnosis of Symptomatic Patients
AI technologies evaluating imaging data similarly have transformative potential for earlier and more accurate diagnosis of symptomatic patients. For example, numerous deep learning techniques have been reported in ophthalmology for diagnosing retinal disease on optical coherence tomography or retinal fundus photographs [8]. Regulatory-approved devices that have shown high levels of accuracy in diagnosing diabetic retinopathy include IDx-DR (Digital Diagnostics, USA) and EyeArt (Eyenuk, USA) [11]. Deep learning also allows for real-time computer-aided detection of colorectal polyps, with improved adenoma detection rates demonstrated by the approved GI Genius system (Medtronic, USA) [11, 30]. Analysis of the cost-effectiveness of AI-assisted colonoscopy based on Markov model microsimulations suggests that AI integration can further increase the relative reduction in colorectal cancer mortality and lead to a yearly saving of $290 million in the USA [31].
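The sketch below illustrates the spirit of such a Markov-style microsimulation, comparing standard and AI-assisted colonoscopy; every prevalence, transition probability, and cost in it is an invented placeholder rather than an input from the cited analysis [31].

```python
# Toy microsimulation in the spirit of a Markov cohort model.
# All parameters below are invented placeholders, not published inputs.
import random

def simulate(p_detect, n=10_000, years=10, seed=0):
    rng = random.Random(seed)
    total_cost, cancers = 0.0, 0
    for _ in range(n):
        state = "adenoma" if rng.random() < 0.25 else "healthy"  # assumed prevalence
        total_cost += 1_000                                      # screening colonoscopy (assumed)
        if state == "adenoma" and rng.random() < p_detect:
            state = "healthy"                                    # polyp detected and removed
        for _ in range(years):                                   # yearly Markov cycles
            if state == "adenoma" and rng.random() < 0.005:      # assumed progression risk
                state = "cancer"
                cancers += 1
                total_cost += 60_000                             # cancer care cost (assumed)
                break
    return total_cost / n, cancers

for label, p_detect in [("standard colonoscopy", 0.74), ("AI-assisted", 0.86)]:
    cost, cancers = simulate(p_detect)
    print(f"{label}: mean cost ${cost:,.0f}; cancers per 10,000: {cancers}")
```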
Other applications of deep learning in diagnosing surgical conditions include fractures on plain radiographs [32‒34]; prostate cancer on MRIs [35, 36] and biopsies [37, 38]; intracranial haemorrhages [39, 40] and aneurysms [41]; thyroid cancer [42]; small bowel obstruction [43]; pancreatic disease [44]; and aortic dissection [45]. This is not an exhaustive list, but it illustrates the impressive capacity of AI technology to make diagnoses from imaging, leveraging the unique capability of deep learning to identify and learn complex patterns. With many studies demonstrating superior or non-inferior performance compared with human experts, accurate and timely diagnosis enables more effective and faster surgical decision-making, ultimately improving patient care. In the future, AI algorithms may aid complex surgical diagnoses by combining clinical assessment findings with investigation results. For example, a retrospective analysis of a cohort of paediatric patients presenting to hospital found that a supervised ML algorithm showed promise in diagnosing acute appendicitis and identifying complicated inflammation based on biochemical markers and ultrasound findings [46].
Automated speech recognition and NLP can automatically record, extract, and summarise information from physician-patient consultations [47]. Potential benefits include reducing the administrative burden of clinicians, which is a significant reason for burnout [48, 49], allowing clinicians to spend more time with patients, while addressing rising outpatient demands. An ML algorithm was trained to automatically chart symptoms discussed between patients and clinicians in an encounter, performing well on clearly mentioned symptoms [50]. More work is needed to improve performance on vague symptoms and to analyse data to aid diagnostic decisions. However, another envisioned benefit is to use extracted information to support surgical diagnostic workflows from clinical encounters [47].
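As an illustration of the underlying idea, the sketch below runs a generic pretrained named-entity-recognition pipeline over a clinical sentence; a purpose-built clinical language model would be needed in practice, and the default model here is an assumption for demonstration only.

```python
# Demonstration only: generic pretrained NER model, not a clinical one.
from transformers import pipeline

ner = pipeline("token-classification", aggregation_strategy="simple")
note = "Mrs Smith reports right upper quadrant pain radiating to the right shoulder."
for entity in ner(note):
    # Each entity carries a label, the matched text span, and a confidence score.
    print(entity["entity_group"], "->", entity["word"], round(float(entity["score"]), 2))
```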
Preoperative Risk Prediction
Preoperative risk tools have emerged as important adjuncts that help clinicians and patients predict the risks of surgery based on several patient-specific and operative factors. Commonly used tools include the APACHE III prognostic system for mortality of critically ill patients [51], the ACS NSQIP surgical risk calculator [52], and POSSUM for mortality and morbidity [53]. However, established calculators often use subjective inputs or assume linear relationships between variables in their mathematical models, which is not the case in clinical reality [54].
Recent advances leverage the ability of ML to analyse large, diverse datasets and model non-linear relationships, improving the predictive capacity of risk tools. The Predictive Optimal Trees in Emergency Surgery Risk (POTTER) calculator for emergency laparotomy patients utilises an ML-derived optimal classification tree method and outperformed the ACS NSQIP calculator in the accuracy of preoperative risk prediction in its patient cohort [55]. Another study developed and validated an ML preoperative risk algorithm, referred to as MySurgeryRisk, that analysed data in patients’ EHR to predict major postoperative complications and mortality, allowing for dynamic real-time interpretation of risk factors with continuous fine-tuning from clinician feedback [56]. Future prediction tools will likely use a combination of patient-, physician-, and operation-specific factors to provide an accurate, user-friendly, and personalised risk assessment score that can aid preoperative planning.
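The toy example below illustrates why non-linear ML models can outperform linear calculators: when synthetic perioperative risk depends on age in a U-shaped way, boosted trees capture the relationship while logistic regression cannot. All variables and coefficients are invented.

```python
# Synthetic demonstration: a U-shaped age effect that a linear model misses.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n = 2_000
age = rng.uniform(20, 90, n)
albumin = rng.uniform(20, 50, n)
# True (invented) risk rises at the age extremes and falls with albumin.
logit = 0.002 * (age - 55) ** 2 - 0.15 * (albumin - 30)
y = rng.random(n) < 1 / (1 + np.exp(-logit))
X = np.column_stack([age, albumin])

for name, est in [("logistic regression", LogisticRegression(max_iter=1000)),
                  ("boosted trees", GradientBoostingClassifier())]:
    auc = cross_val_score(est, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: cross-validated AUC {auc:.2f}")
```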
Intraoperative Guidance
The role of computer vision in analysing surgical videos has been widely discussed in the literature [1, 57, 58]. This has been made possible partly by the mainstream adoption of video-based minimally invasive surgical techniques, such as laparoscopic, robotic, endoscopic, and endovascular procedures. The massive volumes of surgical video recorded daily can provide a rich global dataset for AI computer vision algorithms, predominantly based on deep learning techniques, to amalgamate, read, segment, and analyse procedural videos. However, applications in this field have not yet been as successful as in imaging or pathology, because surgical videos are intrinsically more dynamic, with objects in constant motion and varied, often obscured surgical planes. Furthermore, annotating training sets is laborious for experts, and there is not always a clear, established consensus on the defined steps of an operation [57].
So far, studies have attempted to delineate anatomy in commonly performed procedures. Unsurprisingly, laparoscopic cholecystectomy, as an index general surgery procedure with a renowned “critical view of safety,” has received considerable attention given the importance of anatomical identification of key structures [59, 60]. Deep learning models using semantic segmentation have been trained to identify safe and dangerous zones during laparoscopic cholecystectomies with an impressive degree of performance [61]. Similar work has been carried out for other procedures, including laparoscopic sleeve gastrectomy [62], cataract surgery [63], and endovascular aneurysm repair [64]. Although this area is still in its infancy, real-time intraoperative guidance has the potential to provide accurate visual support for surgeons in challenging operations, established by the experience of experts worldwide [65].
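For intuition, the sketch below shows the skeleton of a semantic segmentation model that assigns each pixel of a frame to a class such as background, safe zone, or danger zone; production systems use much deeper encoder-decoder networks trained on expert-annotated video.

```python
# Minimal semantic-segmentation sketch: per-pixel classification of a frame.
# Class semantics (background / safe zone / danger zone) are assumptions.
import torch
import torch.nn as nn

class TinySegNet(nn.Module):
    def __init__(self, n_classes=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, n_classes, 1),   # 1x1 conv: per-pixel class logits
        )

    def forward(self, x):
        return self.net(x)

frame = torch.randn(1, 3, 256, 256)        # one synthetic RGB frame
logits = TinySegNet()(frame)               # shape (1, 3, 256, 256)
mask = logits.argmax(dim=1)                # per-pixel predicted class
print(mask.shape)                          # torch.Size([1, 256, 256])
```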
AI-powered three-dimensional anatomical reconstruction can aid preoperative planning and intraoperative navigation of complex operations. The FDA-approved cloud-based surgical AI tool Cydar EV Maps (Cydar Medical, UK) generates a patient-specific map of arterial anatomy using preoperative anatomical mapping and real-time intraoperative fluoroscopic imaging, and has been shown to reduce radiation exposure and improve the safety of complex aortic aneurysm endovascular repair [66]. Furthermore, increasing interest in mixed reality technology, which includes virtual reality (VR) and AR, has led to the development of head-mounted displays to provide intraoperative guidance. While VR technology immerses the user into an artificially designed digital world, AR projects holographic virtual images into the operator’s vision while in a real-life scenario [67]. Head-mounted displays, such as the commercially available HoloLens (Microsoft Corporation, Redmond, WA, USA), can be controlled with voice and gesture commands, and have shown promise in improving surgical outcomes by improving the operator’s visual-motor axis [68] and facilitating accurate dissection of tissue through AR overlays [69]. Using AI to identify anatomy and reconstruct three-dimensional surgical fields could be integrated into AR holographic technology to provide the surgeon with an immersive and dynamic intraoperative environment.
Operative Robotics
The adoption of robotics in surgery provides several intraoperative benefits, including improved ergonomics, dexterity, and a realistic magnified three-dimensional view [70]. Established perioperative outcomes include reductions in blood loss and transfusion rates, shorter inpatient stays, and reduced complication rates [14]. Subsequently, robotic surgery is increasingly being used as the minimally invasive option of choice for common operations, as opposed to traditional laparoscopic surgery [71], with more than 12 million procedures performed by 60,000 surgeons worldwide on the da Vinci system alone [72]. As a digital surgical approach, robotics is a core technology that is ripe for integration with AI to further refine, or even revolutionise, intraoperative surgical practice. Although this application can be integrated with the technology discussed in the Intraoperative Guidance section, functions best suited to robotics include motion analysis, instrument recognition, improved haptic feedback and tissue sensing, intraoperative visualisation, and task completion [4, 58, 73].
Multiple studies have utilised deep learning with convolutional neural networks to automate the objective skill evaluation of basic technical surgical tasks, such as suturing and knot-tying, using raw kinematic or video data [74‒77]. This has the potential to standardise the assessment of surgical tasks at early stages of training, reduce subjective human biases in feedback, and obviate the time-intensive nature of manual expert rating. Within motion analysis, specific applications include surgical activity and gesture recognition [78], which involves segmentation of the start and end times of each motion trajectory change [79, 80]; instrument tracking and identification [81‒83]; and temporal segmentation of surgical procedures into clinically relevant steps [84‒86]. The principles of computer vision discussed earlier have also been applied to robotic surgery, highlighting the ability to integrate video data to improve performance [87, 88]. Beyond evaluating basic surgical skills, algorithms can be trained on robot system data consisting of kinematics, intraoperative events, and video, in combination with postoperative outcomes, to develop automated performance metrics and predict intraoperative performance. This has been developed for robot-assisted radical prostatectomy [89] and will likely be created and validated for other procedures.
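A minimal sketch of skill classification from kinematic streams is shown below; the channel count, sequence length, and two-class output are assumptions for illustration.

```python
# Sketch: classifying operator skill (novice vs expert) from raw instrument
# kinematics with a 1D convolutional network. Channel semantics are assumed.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv1d(6, 32, kernel_size=9, padding=4), nn.ReLU(),  # 6 channels, e.g., position + velocity (assumed)
    nn.Conv1d(32, 32, kernel_size=9, padding=4), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),                  # pool over time
    nn.Linear(32, 2),                                       # novice vs expert logits
)
trials = torch.randn(4, 6, 500)  # 4 trials, 6 kinematic channels, 500 timesteps
print(model(trials).shape)       # torch.Size([4, 2])
```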
As clinicians, scientists, and engineers continue to collaborate in designing ever more sophisticated AI solutions, the question must be asked: can we develop and deploy truly autonomous operative robots? Currently established robotic platforms, such as the da Vinci system, are based on the master-slave paradigm between surgeon and robot, with the surgeon in full control and the robot replicating movements [12]. A hierarchy of robotic autonomy has been conceptualised, often paralleled with driving autonomy, which scales from level 0 (no automation) to level 5 (full driving automation) [90]. Applied to the medical robotics field, six levels of autonomy have been suggested, with higher levels likely to require increasing AI integration (Table 1). Currently developed systems reach level 2 (task autonomy) and level 3 (conditional autonomy); uses range from trajectory planning [91] and bone drilling [92] to autonomous soft tissue surgery, e.g., suturing [93, 94]. Fully autonomous robotic systems able to perform skin-to-skin operations are largely conceptual at present, but possible benefits include standardising operative techniques; amassing more sensory data to perform surgery with greater precision, stability, and accuracy; eliminating the risk of human fatigue, stress, and bias; and performing surgery in locations deemed unreachable or unsafe for humans, e.g., remote areas or military combat zones [12, 95, 96]. The potential impact and considerations of autonomous robots are discussed later.
Six levels of robotic medical autonomy adapted from the classification of vehicular autonomy, ranging from no autonomy to full autonomy [97]
Degree of autonomy | Description
Level 0 | No autonomy – human operator performs all functions
Level 1 | Robot assistance – human retains continuous control
Level 2 | Task autonomy – human has discrete control, with specific tasks given to robot
Level 3 | Conditional autonomy – human selects autonomous strategy performed by robot
Level 4 | High autonomy – robot makes decisions but under supervision of human
Level 5 | Full autonomy – procedure fully performed by robot with no human supervision required
Education and Training
The technologies described above will also likely have a profound impact on the future of surgical education and training. The ever-increasing number of operative videos provides students and trainee surgeons with a wealth of information to learn from, and ML applied to surgical videos offers the opportunity to clarify anatomy, detail the steps of a procedure, and provide real-time feedback and guidance. Indeed, described applications in surgical education include recognition of surgical steps, instruments, gestures, errors, and anatomy [98]. The opportunity to integrate ML-based technology into VR- and AR-based simulation has the potential to be an invaluable supplement for surgical training and to help address the significant learning curve required to master surgical disciplines.
Large language models (LLMs), particularly ChatGPT, have recently dominated the wider public sphere due to their impressive ability to process and generate narrative text [99]. LLMs utilise neural networks and NLP models to deliver human-like output on a variety of prompts, recently performing remarkable feats such as passing the US medical licensing exams [100]. They have the potential to transform the paradigm of medical education. Use-cases in automating educational tasks include grading of assessments, teaching support, prediction of student performance, real-time feedback, and content generation (e.g., questions and answers) [101]. As LLMs continue to mature and the input data are refined further, they have the potential to become an essential tool in surgical practice. Students and trainees can use LLMs to easily access educational resources and clinical information in a user-friendly format to improve their knowledge base. Generated diagnostic and management suggestions can aid surgeons in delivering better evidence-based surgical care to patients.
Postoperative Care
There is scope for AI to optimise postoperative care and predict adverse events. Postoperative complications contribute to healthcare burden by increasing mortality and morbidity, prolonging inpatient stays, and accumulating significant costs [102]. However, as causes of postoperative complications are often multifactorial, conventional statistical techniques are insufficient to predict their occurrence [1, 103]. With its capability to harness multiple data sources and powerful algorithms, AI could process pre- and intraoperative data consisting of various patient and procedure factors to predict complications. For example, an ML-based random forest model was developed to predict anastomotic leaks after anterior resections [104]. The study, which used a variety of variables including patient sex, location of tumour, surgeon volume, and surgical approach, found that the tool could effectively predict an anastomotic leak and could even advise whether a defunctioning stoma was advisable in high-risk cases. Other ML algorithms have been developed for surgical site infections [105], complications after bariatric surgery [106, 107], postoperative bleeding [108], and complications after liver and pancreatic surgery [109, 110], with varying degrees of accuracy. However, this technology is still far from implementation in clinical practice: there are limitations in the datasets used, and the inherently higher complication rates of study cohorts may lead to overestimation of the predictive capacity of the algorithms [103].
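The sketch below shows the general shape of such a random forest leak predictor on synthetic data; the variable names mirror those reported for the cited model [104], but the data, effect sizes, and outputs are invented.

```python
# Synthetic demonstration: data, effect sizes, and outputs are invented.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
n = 1_000
X = pd.DataFrame({
    "male": rng.integers(0, 2, n),
    "low_tumour": rng.integers(0, 2, n),       # low rectal tumour
    "surgeon_volume": rng.integers(5, 80, n),  # cases per year
    "open_approach": rng.integers(0, 2, n),
})
leak_risk = (0.04 + 0.05 * X.male + 0.06 * X.low_tumour
             - 0.0005 * X.surgeon_volume + 0.03 * X.open_approach)
y = rng.random(n) < leak_risk                  # observed leak (synthetic)

rf = RandomForestClassifier(n_estimators=200, random_state=1).fit(X, y)
print(dict(zip(X.columns, rf.feature_importances_.round(2))))
# A patient flagged as high risk might prompt discussion of a defunctioning stoma.
```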
Effective Discharge Planning
Another postoperative area, closely related to improving patient flow, is discharge planning. A personalised and structured discharge plan reduces inpatient length of stay, lowers readmissions to hospital, and may increase patient satisfaction [111]. This is an increasingly important topic in healthcare policy as secondary care institutions contend with growing pressures on admissions and acute bed shortages, exacerbated by growing and ageing populations and a post-pandemic status quo [112, 113]. Supervised learning models have been trained to effectively predict optimal safe discharge of patients across different surgical specialities and hospitals. The discharge after surgery using artificial intelligence (DESIRE) study developed a random forest model on 1,677 surgical patients admitted to a tertiary hospital in the Netherlands, demonstrating high performance in predicting the need for hospital intervention after the second postoperative day (area under the ROC curve 0.88) [114]. The model was subsequently trained and externally validated on 2,035 patients (split between training, validation, and test datasets) across three other hospital cohorts, highlighting the potential to tailor models to local data to retain accuracy [115]. The authors estimated that 913 bed-days could have been avoided between 2017 and 2019. There is therefore potential for specialty- or hospital-specific algorithms to reduce unnecessary inpatient stays, which can, in turn, ease bed pressures and improve postoperative recovery. However, prospective studies evaluating the clinical implementation of such models are required to ensure they are safe and effective, which the authors of the DESIRE study acknowledge as the next step.
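For illustration, the sketch below evaluates a synthetic discharge-readiness classifier with a train/validation/test split and the area under the ROC curve, the metric reported by DESIRE; the data and features are stand-ins, not the study's.

```python
# Synthetic stand-in for a discharge-readiness model; not the DESIRE data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
X = rng.normal(size=(2_000, 10))   # imagine: vitals, labs, mobility scores (synthetic)
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=2_000)) > 0.5

# 60/20/20 train/validation/test split.
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=7)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=7)

model = RandomForestClassifier(random_state=7).fit(X_train, y_train)
print("validation AUC:", round(roc_auc_score(y_val, model.predict_proba(X_val)[:, 1]), 2))
print("test AUC:", round(roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]), 2))
```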
Patient Follow-Up
Follow-up completes the surgical patient journey. Although this has yet to be explored in detail, one study developed an automated AI-assisted follow-up system for postoperative orthopaedic patients utilising ML, speech recognition, and voice simulation technology [116]. The AI system successfully followed up patients and received a higher rate of feedback than manual human-led follow-up. However, the composition of the feedback obtained from patients differed between the two groups; the AI-assisted system received more feedback relating to the inpatient experience and health education received, while human-led follow-up was more focused on postoperative symptoms. This may be because patients connect more naturally with humans and were therefore more willing to provide medical information to human operators. Future studies should focus on the effect of these systems on patient satisfaction, as well as any impact on readmission or reintervention rates. In any case, with the potential to follow up multiple patients simultaneously [116] and the advancing capability of LLMs to integrate factual knowledge with interpersonal communication skills [99], there is potential to incorporate AI-assisted systems to provide simple information to patients, answer questions, and triage concerning symptoms for clinician review. Even simple solutions could have profound benefits in easing clinician workload and addressing ever-growing outpatient waiting lists [117].
The AI Surgical Patient Journey – an Example
We predict that AI technology will likely be embedded in virtually all aspects of future surgical practice. Specific use-cases outlined above can feasibly be applied to pathology or interventions across the spectrum of surgery. To highlight the potential of AI to integrate in a scalable and safe manner along the surgical patient journey, we consider a common hypothetical near-future scenario in which a patient with benign symptomatic gallstone disease undergoes a cholecystectomy (Fig. 3).
Example of an AI surgical pathway for a hypothetical patient with gallstone disease undergoing a cholecystectomy.
Gallstones affect 10–15% of adult populations and carry a significant health burden, with increased hospital admissions, morbidity, and even mortality [118]. An estimated USD 6.5 billion is spent annually on the direct and indirect costs of gallstone disease in the USA alone, and cholecystectomy is now one of the most commonly performed elective procedures, with over 700,000 carried out annually in the USA [119]. Changing lifestyles mean that the prevalence of gallstone disease is rising in developed countries [120]. Streamlining the process with AI has the potential to make the diagnosis and management of gallstone disease more effective and safer for patients.
A patient presenting with classical right upper quadrant pain undergoes an ultrasound scan, which is interpreted by validated and accurate deep learning technology to identify gallstones. They are triaged to an outpatient consultation with a surgeon, and their encounter is recorded by ML speech recognition technology using NLP, affording the surgeon and patient more time to discuss the appropriate management options. Assuming they arrive at a shared decision to undergo a procedure, a real-time automated risk prediction tool, tailored to the patient, institution, surgeon, and procedure, calculates a personalised risk score from the information recorded above; this is discussed between patient and clinician and passed on to allied health professionals such as the anaesthetist for perioperative assessment.
The surgeon opts to carry out a robotic cholecystectomy using a robotic system equipped with advanced sensors utilising AI systems to track and stabilise movements, provide greater haptic feedback, and improve tissue sensing. AR is used to overlay the surgeon’s view with real-time annotations of key anatomical landmarks, using algorithms developed from thousands of recorded procedure videos collected worldwide. Step-by-step recommendations guide the surgeon, who retains overall control. Key anatomical areas (e.g., Calot’s triangle) and danger zones (e.g., common bile duct and duodenum) are visualised clearly in colour, giving the operating surgeon a clearer view of the optimal planes of dissection. The surgeon may also elect to train a junior colleague on a second console, who is able to confidently understand and delineate the anatomy with screen overlays. After the operation, the console gives the surgeons feedback on their performance, based on metrics captured automatically, and recommends areas for improvement, which is especially useful for surgical trainees still progressing through the learning curve. Realistic VR simulators and clinical LLMs refined on large medical databases are available for trainee surgeons to refine their practice and knowledge as part of the surgical training curriculum.
The data are fed into a postoperative tool to predict the risk of complications based on all of the preoperative factors highlighted earlier and on intraoperative events (e.g., risk of bile leak if the area of dissection was close to the common bile duct). This tool is dynamic and changes with real-time clinical data collected about the patient in the postoperative phase, allowing clinical staff to predict complications early and recommend an inpatient stay if necessary. At the same time, a discharge tool predicts the optimal timing of safe discharge to expedite patient flow. As most institutions do not follow up day-case cholecystectomies [121], an automated AI-assisted follow-up system can telephone the patient soon after surgery to ensure satisfactory postoperative recovery and answer any queries, with any concerns triaged to a health professional.
Challenges in Implementation
The promise of AI is significant and, even within the surgical domain, there are likely manifold more transformative uses in addition to those outlined so far. However, it is important to temper any expectation that AI is a panacea that can solve every problem in healthcare. Indeed, as AI technology is unlike anything previously integrated into clinical care, vigilant and deliberate planning will be required to implement it safely and ethically within recognised surgical care frameworks. Key barriers limiting implementation include ethics, governance, patient safety, data bias and drift, and cybersecurity (Fig. 4). A detailed exploration of these issues is outside the scope of this review but has been considered elsewhere [122‒124].
Challenges and barriers to implementation of AI technology in surgical practice with solutions proposed.
The traditional expectation of AI is that it will support the surgical decision-making process [123], but medicine inherently involves difficult decisions made by human clinicians, often against a heavy backdrop of ethical and moral issues in which the correct decision is not always evident. Risk prediction algorithms could provide surgeons with powerful decision support tools, but these may pose ethical dilemmas. For example, in the case of the tool that predicts anastomotic leak after anterior resection [104], if the surgeon disregards the tool’s suggestion to form a defunctioning stoma, would they be at fault? We suggest that current AI tools should be considered as one of many sources to augment clinical decision-making, but robust guidelines and frameworks must be produced to guide clinicians in their use. Furthermore, the rapid and dynamic pace of development of AI systems means that frameworks and guidance developed now may not be applicable in the near future. Nevertheless, there has been a concerted effort by international organisations, including the WHO [5] and OECD [6], to deliver comprehensive and lasting guidance on the ethics and governance of AI use in health. The TURBO (testable, useable, reliable, beneficial, operable) framework introduced in this review is based on this guidance and provides important principles for the development of health AI systems.
Practically, it has not always been easy to adhere to these principles. For example, transparency is crucial to ensure that clinically useful algorithms are developed with data representative of the real-world clinical scenarios likely to be encountered. This is particularly important in a patient-facing and intervention-heavy field such as surgery. Traditionally, medical data have been siloed to protect the privacy of patients, requiring datasets to be carefully labelled and described [125]. Without clear, rigorous standards for dataset descriptions, there is a knock-on detrimental effect on the transparency of AI algorithms and the potential for introduction of bias. For instance, a scoping review on the transparency of datasets revealed that the majority of dataset descriptions used to develop deep learning algorithms for triaging or diagnosing images of skin disease were insufficient for analysis or replication, raising concerns about the clinical translation of the algorithms [126]. AI techniques such as deep learning have a “black box” design: it facilitates the automated detection of higher-level patterns in data, but makes it difficult for either clinicians or engineers to determine the rationale behind the outputs [127]. This has important ethical and practical implications, including the moral and legal justification of clinicians utilising inscrutable algorithms on patients, as well as the accountability, safety, and reliability of the algorithms if poor training data are used [128].
It is recognised that data bias can lead to less effective algorithms that underestimate care needs in disadvantaged groups, exacerbating population-wide inequalities in gender, race, ethnicity, and socioeconomic status [129]. Algorithms trained on data from one population group may not be applicable to another, and under-served populations risk underdiagnosis through models developed with unrepresentative training datasets [130]. This could clearly have a profound effect on AI surgical care pathways operating at a population screening level, mandating strategies to mitigate the risk of bias and substandard patient care. Less apparent sources of bias have been noted in surgical AI systems assessing surgical skills. A group of surgical AI systems evaluating videos of robotic surgery demonstrated both under-skilling and over-skilling biases affecting surgeon cohorts at varying rates [131]. The authors addressed this by developing an add-on application that predicts essential frames in the video based on human annotation, significantly reducing both biases exhibited by the AI system. Another challenge is data drift, which refers to the disparity between data used for training and data encountered in real-world implementation, caused by factors such as differences in clinical practice between regions and changes in populations and disease patterns over time [132]. Various bias and drift mitigation strategies should be considered at the pre-processing, processing, and post-processing stages to balance AI systems [129, 133].
Developing effective and fair AI algorithms will likely need larger and more diverse datasets than previously used. As the digitisation of health records continues, there is increasing access to large commercial datasets containing millions of health data points, including patient demographics, clinical entries, laboratory results, imaging, and genomic data [134]. Data sources include a combination of EHR, patient registries, insurance databases, wearables and ambient sensors, and social media. These can create extremely powerful AI algorithms that could be deployed at a population level or subsequently tweaked by regional health practice to be more applicable for specific sub-populations. In any case, it is crucial for transparency at every stage of development of these algorithms to retain testability and reliability.
To this end, there has been a drive to improve the transparency and quality of reported AI datasets and algorithms through the proposal of rigorous standards for conducting research. SPIRIT-AI and CONSORT-AI are recently developed international standards for clinical trial protocols and interventions, intended to ensure transparent reporting of AI research and allow more confident identification of safe and effective AI interventions [135‒137]. Versions and protocols for other AI-specific frameworks and standards have been established, including STARD-AI [138], TRIPOD-AI [139], PROBAST-AI [139], and QUADAS-AI [140]. The idea, development, exploration, assessment, and long-term study (IDEAL) framework [141] facilitates evaluation of the stages of surgical innovation, and modifications have been proposed for application to medical devices (IDEAL-D) [142]. Following on from this, DECIDE-AI was developed as a reporting guideline for AI-based clinical decision support systems, which is particularly relevant to potential surgical applications. By establishing frameworks, reporting guidelines, and quality assessment standards for clinical trials and diagnostic, prognostic, or intervention studies, these initiatives outline comprehensive standards for research in this field to improve the generalisability and applicability of outcomes. This can standardise research methodology, currently recognised as a major limitation of evidence synthesis studies in AI, and allow key stakeholders to confidently identify safe and effective AI tools for clinical use [8].
A tenet of healthcare practice is patient safety, which must be preserved in any implementation of AI technology. As demonstrated in this paper, the opportunity for AI to improve patient outcomes is substantial across all aspects of the patient journey, but there are risks with careless implementation. Rigorous standards for conducting AI research and improved transparency of methods should aid patient safety measures. The Donabedian model has been the traditional approach to quality assurance in systems, based on structure, process, and outcome [143]. More recently, the Systems Engineering Initiative for Patient Safety model provides an updated framework for understanding how a complex socio-technical system influences processes to produce outcomes [144, 145]; in healthcare, it is commonly used by institutions to evaluate system design, risk analysis, and patient safety incidents, reflecting the complex interaction between environmental, personal, technological, and task-related factors in health practice. There are two main implications: firstly, research and deployment of AI technology in clinical practice should be integrated into patient safety models to evaluate its effect on patient safety; secondly, AI technology can aid in defining the complex interlinks between factors influencing patient safety and support policy decision-making.
Finally, widespread implementation of AI in surgery would require the acquisition, storage, and processing of large amounts of patient, clinical, and video data. The accumulation of diverse and dynamic real-time data requires robust infrastructure that maintains its integrity, security, and privacy. Cybersecurity breaches of healthcare data risk catastrophic leaks, corruption, or loss of sensitive patient data [146]. The transfer of data between public institutions such as hospitals, which collect patient data, and private companies developing cutting-edge technology increases the risk of cybersecurity breaches. Therefore, as data become increasingly interoperable, patient data must be safeguarded and stored in Secure Data Environments, as well as de-identified to maintain patient trust and adhere to governance laws [147, 148]. Data governance laws and frameworks for surgical AI systems must be developed with this in mind by surgeons, engineers, computer scientists, policymakers, bioethicists, and patients.
The Future of Surgery
In the use-cases outlined, the clinician remains in control at all times and uses AI-based technology to augment and inform surgical decision-making, while maintaining patient safety. This is realistic in the short-term setting and likely more acceptable to surgeons, patients, and policymakers. However, assuming the current trajectory of advancing AI technology, what could a future landscape look like and what are the accompanying considerations?
As AI systems progress, they are becoming less distinguishable from humans at a variety of tasks. This has become apparent in interpreting imaging [8] and answering medical exam questions [100], where AI systems are reaching or even surpassing human-level performance. Eventually, this may be the case for more complex tasks such as human interaction, clinical diagnostics, and performing surgery. The future may consist of conversational LLMs that integrate speech recognition, deep learning, and NLP technology to autonomously carry out consultations with patients, ultimately recommending a diagnosis and management plan (based on current best practice derived from collated, up-to-date guidelines and contemporary peer-reviewed research) while counselling patients and answering queries. Importantly, these tools will need to be empathy-driven to retain a human element in the consultation [149]. Separately, the evolution of operative robots has been conceptualised into the following generations: stereotaxic, endoscopic, bioinspired, and microbots [150]. Eventually, autonomous (fifth-generation) operative robots, taking the form of cyborg humanoids or swarm-like intelligence platforms, may be deployed into remote areas or conflict zones, controlled by surgeons through stereoscopic lenses and holographic technology (human-in-the-loop) [96]. In the distant future, they may even offer a superior alternative to human-led interventions and constitute gold-standard best practice, precluding the need for human supervision or control [150].
This conceptual next generation of AI is termed artificial general intelligence and refers to autonomous systems that match or exceed human intelligence, achieving consciousness, sentience, and agency [151]. The Turing test was devised to determine whether computers could act in a manner indistinguishable from humans, and has since been adapted (the modified Turing test) to provide a quantitative framework for scientists and engineers to assess the capabilities of next-generation AI-based technologies [152]. However, this raises an important ethical and legal question: if a robot that possesses agency makes a mistake and causes human harm, can it take responsibility? According to an analysis of the legal and ethical frameworks relating to this scenario, responsibility can be classified into accountability, liability, and culpability [96]. Although it may be possible for an autonomous robot to take accountability and liability, it may not be able to be culpable, i.e., punishable by a court of law, as it has no legally recognised civil liberties. Culpability would need to be assigned to the humans involved in manufacturing or operating the robot, in turn raising the question of whether humans should be removed from the loop at all. Similarly, if a surgeon operates a robot remotely and a loss of signal or malfunction leads to patient harm, there would need to be discussion of who is responsible.
As AI technology and robots become more autonomous, there are other ethical dilemmas to consider. For example, existing principles on robotics centre around human-robot interactions (specifically preventing human harm), but there will need to be frameworks to govern interaction between two or more artificial entities. The AIonAI law has been proposed to recognise a sentient robot’s universal right to dignity and fair treatment, and to prevent abuse and exploitation from other AI technologies [153]. Other questions include the following: how and to what extent should autonomous robots be trained to make ethical decisions? Would regulatory frameworks be different in other countries or hostile zones? And if autonomous robots achieve better performance than surgeons, how do we certify them and do we hold them to higher professional standards than humans? These are only some issues that arise with progression of AI [96]. Fundamentally, any technological advancement in AI technology must be accompanied by commensurate, robust, and malleable ethical, regulatory, and legal frameworks that govern its use, which must be carefully developed with the commitment of surgeons, engineers, researchers, bioethicists, legal professionals, and policymakers.
Conclusion
This review has outlined the vast potential for AI in the future of surgery, with an impact predicted to improve or transform all aspects of surgical patient care. We have also emphasised the need to collectively identify and manoeuvre through challenges in its implementation, as well as to appreciate important ethical and legal considerations as AI technology advances towards more autonomous capacities. The primary goal of health innovation is to enhance clinical practice and advance patient care. The surgical field in particular has always strived to innovate, from performing complex operations once deemed impossible to ushering in an era of minimally invasive, technologically powered devices. AI stands to be a gateway into a future era of healthcare, and the global surgical community must embrace it.
Conflict of Interest Statement
H.A. is Chief Scientific Officer of Preemptive Health and Medicine, Flagship Pioneering.
Funding Sources
This study was not supported by any sponsor or funder.
Author Contributions
H.A. constructed the themes for the manuscript. A.G., P.V., J.Z., and M.F. wrote the manuscript. All authors reviewed and critically revised and approved the manuscript.