Abstract
Introduction: Despite the multiple prognostic indicators described for oral cavity squamous cell carcinoma (OCSCC), its management remains a matter of debate. Machine learning (ML) is a subset of artificial intelligence that enables computers to learn from historical data, gather insights, and make predictions about new data using the learned model. It is therefore a potential tool in the field of head and neck cancer. Methods: We conducted a systematic review following the PRISMA statement, searching PubMed, Google Scholar, SciELO, and Scopus. Results: A total of 81 manuscripts were reviewed, and 46 studies initially met the inclusion criteria. Of these, 38 were excluded for the following reasons: use of a classical statistical method (N = 16), not specific to OCSCC (N = 15), and not related to OCSCC survival (N = 7). In total, 8 studies were included in the final analysis. Conclusions: ML has the potential to significantly advance research in the field of OCSCC. Its advantages relate to how ML models are used and trained: they can continue to be trained as more data become available. Future ML research will allow us to improve and democratize the application of algorithms for predicting cancer prognosis and guiding its management worldwide.
Introduction
The incidence of oral cavity squamous cell carcinoma (OCSCC) has increased gradually all over the world in the past few decades. OCSCC is currently the most common type of head and neck malignant tumor and the 8th leading cause of death, with 300,000 new cases and 145,000 deaths per year worldwide [1, 2]. Surgery remains the first-line therapy, and the National Comprehensive Cancer Network recommends primary surgical management in both early- and late-stage disease [3], i.e., excision of the primary tumor with or without neck dissection [4, 5].
Despite the multiple prognostic indicators described for OCSCC, lymph node metastasis (LNM) remains the most relevant factor [5-10]. Hence, neck assessment is considered mandatory for risk stratification because of the high risk of occult LNM (OLNM). Consequently, different strategies have been described to manage the risk of OLNM, such as close clinical follow-up with therapeutic neck dissection reserved only for patients who subsequently develop an LNM, sentinel lymph node biopsy, or elective neck dissection. The selection of patients who will benefit from elective neck dissection has been debated for decades: one has to balance the rate of pathologically confirmed N0 patients who undergo unnecessary surgery, and the long-term sequelae associated with neck dissection, against the benefits of surgically addressing occult nodal metastases [11].
More recently, depth of invasion (DOI) of the primary oral cancer has become the most commonly used histopathologic variable to predict the risk of occult nodal metastasis [12]. Other prognostic factors described in the literature include tumor location, histologic grade, sex, age, extracapsular spread, perineural invasion, lymphovascular invasion, recurrence, and tobacco consumption [13-18]. These factors have been combined into multivariate regression models and nomograms to predict survival [19-21]. However, these models have not been widely adopted because of their low predictive accuracy and their difficulty of use in daily practice.
Machine learning (ML) is a subset of artificial intelligence. With the increasing availability of large national databases and computing power, the amount of potential input data has grown, and it has become necessary to explore novel approaches to data analysis that achieve more accurate and precise predictions [22-29]. In this regard, ML represents an alternative way of developing cancer survival prediction models.
The main objective of using ML techniques in medicine is to produce a model that predicts a patient's medical outcome (either diagnosis or prognosis) from multivariate data. These algorithms are designed to learn and find statistical regularities or common patterns in each dataset. The aim of this manuscript was therefore to review and comprehensively summarize the evidence related to the use of ML algorithms as a noninvasive tool to evaluate OCSCC prognosis and stratify the risk of recurrence.
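To make the idea of learning a prediction model from multivariate historical data concrete, the following minimal sketch fits a logistic model by gradient descent in plain Python and then predicts the outcome for a new case. Logistic regression is used here only because it is the simplest supervised learner to write out. The two features (depth of invasion in mm and a perineural invasion flag) and the eight training rows are hypothetical, synthetic values chosen for illustration. They do not come from any of the reviewed studies.

```python
import math

def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Learn weights and a bias from labelled historical cases
    by batch gradient descent on the mean log-loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this case: predicted probability minus observed label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Predicted probability of the outcome for a new, unseen case."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical synthetic training data: [depth of invasion (mm), perineural invasion (0/1)]
X_train = [[1, 0], [2, 0], [3, 0], [4, 1], [6, 1], [8, 1], [2, 1], [7, 0]]
y_train = [0, 0, 0, 1, 1, 1, 0, 1]  # 1 = hypothetical adverse outcome (e.g., recurrence)

w, b = fit_logistic(X_train, y_train)
new_patient = [5, 1]  # hypothetical new case
print(f"Predicted risk: {predict_proba(w, b, new_patient):.2f}")
```

The learned weights summarize the "statistical regularities" in the training rows. Prediction for a new patient then reuses them without revisiting the historical data.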
Methods
The systematic approach employed for the search strategy in peer-reviewed journals on the use of ML algorithms to evaluate prognosis in OCSCC patients was based on the recommendations of the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) statement, as reflected in the accompanying PRISMA checklist [30]. Inclusion criteria were based on the population, intervention, comparison, outcome, timing, and setting (PICOTS) framework [31]. Heterogeneity among studies, mainly due to the absence of randomization, limited our ability to combine the data statistically in a formal meta-analysis.
Eligibility Criteria
All prospective, retrospective, controlled, or uncontrolled studies published in peer-reviewed English-language journals that investigated the role of ML in OCSCC patients and described an algorithm to evaluate prognosis were considered.
Participant Inclusion/Exclusion Criteria
We considered studies that reported results in patients aged >18 years and described the use of ML algorithms to evaluate prognosis in OCSCC. Studies that included other head and neck subsites, included patients aged <18 years, or did not use ML as a tool to investigate prognosis were excluded.
Intervention and Comparison
This study investigated the role of ML in determining prognosis in OCSCC by evaluating predictors such as demographic data, clinical symptoms, imaging features, pathological data, or genomic data. ML algorithms were compared with traditional statistical methods as well as other clinical prognostic instruments.
Outcomes
The primary outcome was the rate of progression or recurrence, assessed through the area under the curve (AUC) and algorithm accuracy.
Timing
In studies evaluating overall survival, a minimum median follow-up time of 3 years after treatment was used to evaluate prognosis.
Setting
Studies conducted in tertiary academic and nonacademic hospitals were included.
Search Strategy
PubMed, Google Scholar, SciELO, and Scopus were searched by 2 independent authors (C.M.C.-E. and M.M.-Y.) to identify articles published between 1958 and March 2020 that met the inclusion criteria. Studies were screened for full-text availability. The following keywords were used: ([“oral cavity cancer” OR “oral cavity squamous cell carcinoma” OR “machine learning” OR “oral cancer” AND “prognosis”]). Where applicable, the reference lists of relevant articles were manually reviewed to identify studies missed by the search strategy (Fig. 1). Finally, the selected studies were critically analyzed (Table 1). Ethics Committee approval was not required for this review.
Assessment of Evidence
Data extraction was performed in duplicate to avoid errors. The level of evidence of the included studies was appraised using the Oxford Centre for Evidence-Based Medicine (OCEBM) Levels of Evidence [32]. The risk of bias in individual cohort studies was assessed with the Risk Of Bias In Non-randomized Studies of Interventions (ROBINS-I) tool [33]. Missing data in the included studies were reviewed and summarized.
Results
A total of 81 manuscripts were reviewed, and 46 studies initially met the inclusion criteria. Of these, 38 were excluded for the following reasons: use of a classical statistical method (N = 16), not specific to OCSCC (N = 15), and not related to OCSCC survival (N = 7). In total, 8 studies were included in the final analysis; all eight were analyzed, and their information is summarized in Table 1 [34-41].
The demographic data available for the included studies are summarized in Table 1. The number of patients varied widely across studies, from 31 to 33,065 [34-41]. All studies followed a supervised ML approach, and the algorithms and techniques used were dynamic Bayesian network, artificial neural networks (ANN), support vector machine (SVM), logistic regression (LR), decision forest, random forest, gradient boosting machine learning architecture, decision jungle, permutation feature importance, boosted decision tree, naive Bayes, decision tree, and Bayesian network [34-41]. All studies except one [37] compared different ML algorithms.
The main targets of the included studies were recurrence prognosis in 3 studies [35, 37, 38]; OLNM prediction in 2 studies [39, 40]; and risk of progression [34], 3-year prognosis [36], and overall survival prediction [41] in 1 study each. The most commonly reported metric was accuracy, which ranged from 68% to 100% [34-41]. Other metrics reported were the AUC, F1 score, precision, and recall. The AUC, sensitivity (SE), specificity (SP), positive predictive value (PPV), and negative predictive value (NPV) reported by the included studies are summarized in online supplementary Table 1 (see www.karger.com/doi/10.1159/000520672 for all online suppl. material).
According to the Oxford Centre for Evidence-Based Medicine grading system, all studies were graded as level 3. The risk of bias according to ROBINS-I is shown in Table 2. Missing data are summarized in online supplementary Table 2.
Discussion
OCSCC represents a significant health problem globally [42, 43]. Despite the wide range of prognostic indicators described, the absolute risk estimated for individual patients remains understudied and is not commonly applied when counseling patients [44, 45]. Clinicians generally rely on the tumor, node, metastasis (TNM) classification to convey the severity of a cancer diagnosis and its prognosis [46]. The increasing use of computer support in hospitals around the world and the increasing availability of large national databases create an outstanding opportunity to explore possibilities of prediction techniques beyond traditional statistics [47].
Evidence on ML Algorithms as Computer-Assisted Tools to Evaluate OCSCC Prognosis
The performance of ML algorithms is reported through several metrics. The AUC measures the ability of a binary classifier to distinguish between classes and is used as a summary of the receiver operating characteristic (ROC) curve. SE is the proportion of actual positive cases that are predicted as positive (true positives). SP is the proportion of actual negative cases that are predicted as negative (true negatives). PPV is the proportion of predicted positive cases that are actually positive and reflects the probability that a predicted positive is a true positive. NPV is the proportion of predicted negative cases that are actually negative and reflects the probability that a predicted negative is a true negative. However, although these concepts are familiar to almost every physician, it is sometimes difficult to translate them into the clinical scenario and draw valid conclusions for clinical purposes.
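The AUC described above has a convenient equivalent form: it is the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, which equals the area under the empirical ROC curve. The minimal Python sketch below computes it that way, counting ties as half. The function name and the four example predictions are hypothetical and not drawn from the reviewed studies.

```python
def auc(scores, labels):
    """AUC as the probability that a random positive case (label 1) scores
    higher than a random negative case (label 0); ties count as 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC requires at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical predicted risks and observed outcomes (1 = recurrence)
print(auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two classes.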
Some papers did not report values such as PPV or NPV. This is directly related to the scientific nature of these manuscripts: SE and SP are the metrics most commonly used in the clinical scenario, although PPV and NPV, which are closely related, are more useful when explaining prognosis to our patients. It is also important to note that, in ML terminology, precision corresponds to PPV and recall corresponds to SE (online suppl. Table 1).
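The correspondence between clinical and ML terminology can be made explicit in a few lines. The sketch below assumes binary labels coded 1 = positive (e.g., recurrence) and 0 = negative. It derives SE, SP, PPV, and NPV from a confusion matrix and reports the ML names alongside them. The function names and example labels are hypothetical.

```python
import math

def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def _ratio(num, den):
    # Undefined metrics (empty denominator) are returned as NaN instead of raising.
    return num / den if den else math.nan

def clinical_metrics(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    se = _ratio(tp, tp + fn)   # sensitivity, called "recall" in ML
    sp = _ratio(tn, tn + fp)   # specificity
    ppv = _ratio(tp, tp + fp)  # positive predictive value, called "precision" in ML
    npv = _ratio(tn, tn + fn)  # negative predictive value
    return {
        "sensitivity/recall": se,
        "specificity": sp,
        "PPV/precision": ppv,
        "NPV": npv,
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "F1": _ratio(2 * ppv * se, ppv + se),  # harmonic mean of precision and recall
    }

# Hypothetical observed outcomes and model predictions for 8 patients
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
print(clinical_metrics(y_true, y_pred))
```

Unlike SE and SP, PPV and NPV depend on the prevalence of the outcome in the cohort. This is one reason they are harder to compare across studies.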
The first application of ML to OCSCC was published by Exarchos et al. [34], who compared different algorithms to evaluate the risk of progression in 86 patients with OCSCC based on baseline clinical data and disease evolution monitoring data. They reported the highest accuracy (86%) using a dynamic Bayesian network [34]. In another paper by Exarchos et al. [35] evaluating the risk of recurrence in 41 patients with OCSCC, clinical, radiological, and genomic data were used to train and validate the models (Bayesian network, ANN, SVM, decision tree, and random forest). The proposed approach achieved perfect discrimination (accuracy: 100%) between patients with and without disease recurrence [35]. A similar study by Chang et al. [36] evaluated 3-year survival prognosis. The authors developed a hybrid feature selection model and compared it with other ML methods, both based on the correlation of clinicopathologic and genomic markers. The hybrid model achieved the best accuracy (accuracy = 93.81%, AUC = 0.90), with the selected features of tissue invasion and p63 being the most relevant prognostic markers [36].
In 2019, at least 5 papers on the use of ML for OCSCC prognosis were published. Alabi et al. [37, 38] published 2 studies using data from Finland and Brazil to estimate the risk of locoregional recurrence in early-stage SCC of the tongue. Both papers used the same dataset and are in agreement with the initial studies with large sample sizes [37, 38]. In the first paper, the authors compared an ANN with LR (overall accuracy 92.7% vs. 86.5%) using a web-based application [37]. In the second paper, the authors compared several types of algorithms, among which SVM and naive Bayes achieved the highest accuracies (68% and 70%, respectively) [38]. Bur et al. [39] published a study of 1,961 patients from the National Cancer Database in the USA that used clinical and histopathological variables such as age, sex, race, ethnicity, primary tumor site, histology, grade, and DOI to develop and validate an algorithm to predict OLNM in clinically node-negative OCSCC. The decision forest was the best classifier, achieving an AUC of 0.840. The most relevant finding of this study was that, for the single-institution data, the predictive performance of ML exceeded that of the DOI model (AUC = 0.657, p = 0.007), and that, compared with the DOI model, ML reduced the number of recommended neck dissections while simultaneously improving SE and SP [39]. Mermod et al. [40] also developed, compared, and validated different ML algorithms to estimate the risk of OLNM by combining markers of lymphangiogenesis and angiogenesis with clinicopathological features. Based on the validation cohort, the authors estimated that their model would have reduced the risk of overtreating the neck from 82% to only 9%. However, the authors highlighted that OLNM would still have been overlooked in 4% of patients [40].
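The reduction in recommended neck dissections reported by Bur et al. [39] and Mermod et al. [40] comes from turning a predicted OLNM probability into a treatment decision at some risk threshold. The sketch below illustrates that trade-off on hypothetical data. The threshold values, the function name, and the rate definitions are assumptions for illustration and may not match the definitions used in those studies. Overtreatment is taken as the share of pathologically N0 patients who would receive elective neck dissection. Missed OLNM is taken as the share of all patients with an occult metastasis left unaddressed.

```python
def neck_dissection_policy(risks, has_olnm, threshold):
    """Recommend elective neck dissection (END) when predicted OLNM risk >= threshold.

    Assumed (hypothetical) definitions, which may differ from the reviewed studies:
      overtreatment_rate = share of pathologically N0 patients who would receive END
      missed_olnm_rate   = share of all patients with OLNM but no END recommended
    """
    recommended = [r >= threshold for r in risks]
    n_pn0 = sum(1 for m in has_olnm if not m)
    overtreated = sum(1 for rec, m in zip(recommended, has_olnm) if rec and not m)
    missed = sum(1 for rec, m in zip(recommended, has_olnm) if m and not rec)
    return {
        "dissections": sum(recommended),
        "overtreatment_rate": overtreated / n_pn0 if n_pn0 else float("nan"),
        "missed_olnm_rate": missed / len(risks),
    }

# Hypothetical predicted OLNM risks and pathology results for 8 cN0 patients
risks = [0.05, 0.10, 0.20, 0.35, 0.60, 0.80, 0.15, 0.70]
olnm = [False, False, False, True, True, True, False, False]

for t in (0.0, 0.3, 0.65):  # threshold 0.0 mimics elective neck dissection for everyone
    print(t, neck_dissection_policy(risks, olnm, t))
```

Raising the threshold lowers overtreatment but eventually starts missing occult metastases. The clinically acceptable operating point is therefore a judgment that a better-discriminating model can only make less costly.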
In the largest published study including 33,065 patients from the National Cancer Database, Karadaghy et al. [41] intended to develop a model to predict the 5-year overall survival in patients with OCSCC. Variables included patient characteristics such as age, sex, race/ethnicity, and comorbid disease; tumor characteristics included T, N, and M scores and staging as determined both clinically and pathologically as well as some additional tumor variables such as tumor grade, extracapsular spread, and perineural invasion. After training the algorithms, the authors compared the accuracy of the model against the TNM staging. The reported AUC for this ML model was 0.80 (95% CI: 0.79–0.81), with an accuracy of 71%, precision of 71%, and recall of 68%. In comparison, the AUC of the TNM staging system was 0.68 (95% CI: 0.67–0.70), with an accuracy of 65%, a precision of 69%, and a recall of 52% [41].
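Confidence intervals such as the 95% CI reported above for the AUC can be estimated in several ways, and the method used in that study is not described here. As one hedged illustration, the sketch below assumes a nonparametric percentile bootstrap: it resamples patients with replacement, recomputes the AUC on each resample, and reads the interval from the sorted bootstrap values. The function names, resample count, seed, and example data are hypothetical.

```python
import random

def auc(scores, labels):
    """Pairwise (Mann-Whitney) AUC; ties count as 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(scores, labels, n_boot=2000, alpha=0.05, seed=0):
    """Point AUC plus a percentile bootstrap (1 - alpha) confidence interval."""
    n = len(scores)
    if not 0 < sum(labels) < n:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients with replacement
        s = [scores[i] for i in idx]
        y = [labels[i] for i in idx]
        if 0 < sum(y) < n:  # skip resamples that contain only one class
            stats.append(auc(s, y))
    stats.sort()
    lo = stats[int((alpha / 2) * (n_boot - 1))]
    hi = stats[int((1 - alpha / 2) * (n_boot - 1))]
    return auc(scores, labels), lo, hi

# Hypothetical predicted 5-year mortality risks and observed outcomes
scores = [0.9, 0.4, 0.6, 0.2, 0.7, 0.3, 0.8, 0.1]
labels = [1, 1, 0, 0, 1, 0, 1, 0]
print(bootstrap_auc_ci(scores, labels))
```

With tens of thousands of patients, as in the study above, the bootstrap distribution is narrow. This is consistent with the tight intervals that large registries typically yield.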
Implications for Clinical Practice
Currently, the American Joint Committee on Cancer (AJCC) 8th edition staging system is the most widely used system for assessing prognosis in OCSCC patients. Nevertheless, some authors propose the use of nomograms, which make it possible to visualize the prognostic strength of various relevant factors in a single model and offer higher accuracy for survival prediction than the conventional TNM staging system or an individual molecular biomarker [48-51].
However, some authors have argued that traditional statistical methods such as Cox proportional hazards regression, LR, and Kaplan-Meier analysis may slow the progress of prediction models [41]. This view is based on the inability of these methods to handle medical data with high variability, nonlinear interactions, and heterogeneous distributions, or to learn new patterns from prospectively generated data in order to modify and improve prognostic accuracy [52, 53]. By contrast, ML may be a better instrument for handling large datasets with complex, nonlinear, and heterogeneous distributions [52-54], a situation commonly encountered when analyzing large cancer databases that include different types of variables [55]. Another common difference between studies of traditional models and ML studies is that the former assess only the sample analyzed, without incorporating new data into the model.
At this early stage of ML research, the real impact of these techniques on OCSCC prognostication is difficult to determine. Validating the role of ML algorithms in OCSCC is also difficult at this point, as several questions remain unanswered: e.g., whether ML algorithms can be trusted to predict prognosis, whether data collection was adequate, which modeling algorithm is best, and whether the use of this technology can be agreed upon in a multidisciplinary cancer committee setting. Doctors involved in the treatment of OCSCC would also need to be convinced of the benefits of this technique. All these questions are relevant to assessing the real impact of ML in a real-life clinical setting. However, apart from some web-based applications used in research environments, there is currently no widely available implementation in real clinical practice.
Nevertheless, the reported literature supports the predictive ability of ML and its capacity to improve on recently developed methods, such as DOI for predicting the risk of OLNM, and even on the prognostic performance of the current TNM system, by combining these data with demographic, clinical, histopathological, and other variables [39, 41]. ML-based computer prognosis support may translate into more precise survival estimates and reduced surgical or treatment morbidity, thereby improving patients’ quality of life and permitting more accurate counseling. In addition, developing such algorithms is affordable, especially with the support of a large open-source community that provides a host of software development solutions. ML may also make it possible to fine-tune and adapt prognostic tools to the demographic, clinical, and histopathological data of individual countries or regions, and even to take resource constraints into account.
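To illustrate what incorporating prospectively generated data into a model can look like, the sketch below updates a model one newly observed case at a time by stochastic gradient descent instead of refitting it from scratch. A logistic model is used only for brevity. The class name, learning rate, and case stream are hypothetical, and the reviewed studies did not necessarily update their models this way.

```python
import math

class OnlineLogisticModel:
    """Logistic model refined one case at a time by stochastic gradient descent,
    so it can be updated as new patients are followed up without refitting from scratch."""

    def __init__(self, n_features: int, lr: float = 0.05):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        z = sum(wj * xj for wj, xj in zip(self.w, x)) + self.b
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        """One gradient step on the log-loss of a single newly observed case."""
        err = self.predict_proba(x) - y
        self.w = [wj - self.lr * err * xj for wj, xj in zip(self.w, x)]
        self.b -= self.lr * err

# Hypothetical stream of newly followed-up cases: ([features], outcome)
stream = [([1, 0], 0), ([8, 1], 1), ([2, 0], 0), ([6, 1], 1),
          ([3, 0], 0), ([7, 0], 1), ([2, 1], 0), ([4, 1], 1)]

model = OnlineLogisticModel(n_features=2)
for _ in range(500):  # replaying the same stream only so the toy example converges
    for x, y in stream:
        model.update(x, y)
print(round(model.predict_proba([8, 1]), 2), round(model.predict_proba([1, 0]), 2))
```

In practice an updated model would still need revalidation before clinical use, since drift in the incoming data can degrade as well as improve it.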
Limitations and Future Challenges Related to the Application of ML in OCSCC
As noted above, almost all studies to date have been designed as proofs of concept of the feasibility of building ML-based cancer prognostic tools. Significant challenges remain in applying ML techniques to predict the prognosis of OCSCC.
Some of these challenges include the relatively small number of patients and amount of data available, and the retrospective nature of almost all studies, which increase the risk of suboptimal performance due to overfitting. It is necessary to apply regularization methods such as lasso (L1) and ridge (L2) penalties, among others, to combat this. Imbalanced patient cohorts can be another limitation; e.g., in high-mortality cancers such as OCSCC, it is common to find fewer survivors in the study group, and training on such imbalanced data might impair the performance of the algorithm. Missing data, which are common even in largely complete datasets, are also a challenge. To overcome this problem, data imputation methods based on the known data can be used, such as the multivariate imputation by chained equations (MICE) model described by Rendleman et al. [55], which assumes that the data are missing at random.
Another challenge is the lack of consensus on the best ML training algorithm, given the difficulty of comparing model accuracy across studies. One solution is to start building cancer patient databases for prognostic analysis [56-60]. However, specific infrastructure is needed for data storage and for building, developing, and training ML models. Such endeavors also need to consider the privacy requirements of healthcare data, and they require the buy-in of political administrations and the research community, as well as individual awareness [61]. There is also a global need for computational researchers with expertise in both biomedical research and ML to improve research in this area. Finally, the limited evidence available in the literature on ML as a computer-assisted approach is a significant limitation of our study.
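The regularization and class-imbalance remedies mentioned above can be written down directly as terms in a training objective. The minimal sketch below assumes a logistic model, whereas the reviewed studies used various algorithms. It computes a mean log-loss, optionally weighted by inverse class frequency, plus lasso (L1) and ridge (L2) penalties on the weights. The function names, the toy cohort, and the weighting scheme are hypothetical choices for illustration, and MICE imputation is not shown.

```python
import math

def class_weights(y):
    """Inverse-frequency weights so that each class contributes equally to the loss."""
    n, n_pos = len(y), sum(y)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    return {1: n / (2 * n_pos), 0: n / (2 * n_neg)}

def penalized_log_loss(w, b, X, y, l1=0.0, l2=0.0, weights=None):
    """Mean (optionally class-weighted) log-loss plus lasso (L1) and ridge (L2)
    penalties on the weights; the bias b is left unpenalized, as is conventional."""
    eps = 1e-12
    total = 0.0
    for xi, yi in zip(X, y):
        p = 1.0 / (1.0 + math.exp(-(sum(wj * xj for wj, xj in zip(w, xi)) + b)))
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        cw = weights[yi] if weights else 1.0
        total += -cw * (yi * math.log(p) + (1 - yi) * math.log(1 - p))
    data_loss = total / len(y)
    penalty = l1 * sum(abs(wj) for wj in w) + l2 * sum(wj * wj for wj in w)
    return data_loss + penalty

# Hypothetical imbalanced cohort: 1 = survivor (minority class), 0 = death
X_demo = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0], [5.0, 0.0], [6.0, 1.0]]
y_demo = [1, 0, 0, 0, 0, 0]
weights = class_weights(y_demo)
print(weights)  # {1: 3.0, 0: 0.6}
print(penalized_log_loss([0.5, -0.2], 0.0, X_demo, y_demo, l1=0.01, l2=0.1, weights=weights))
```

Minimizing this objective, rather than the plain log-loss, shrinks the weights toward zero (L1 can set some exactly to zero). It also stops the rare class from being ignored.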
Conclusion
ML has the potential to significantly advance research in the field of OCSCC. Its advantages relate to how ML models are used and trained: they can continue to be trained as more data become available. Future ML research will allow us to improve and democratize the application of algorithms for predicting cancer prognosis and guiding its management worldwide.
Statement of Ethics
Ethics Committee approval for systematic reviews was not required in any of the participant institutions.
Conflict of Interest Statement
The authors have no conflicts of interest to declare.
Funding Sources
No funding was received for this study.
Author Contributions
1. Carlos M. Chiesa-Estomba: A, B, C, D.
2. Manuel Graña: A, C.
3. Alfonso Medela: A.
4. Jon A. Sistiaga-Suarez: A.
5. Jerome R. Lechien: A.
6. Christian Calvo-Heriquez: A.
7. Miguel Mayo-Yanez: B.
8. Luigi Angelo Vaira: B.
9. Alberto Grammatica: B.
10. Giovanni Cammaroto: D.
11. Tareck Ayad: D.
12. Johannes J. Fagan: D.
A Substantial contributions to the conception or design of the work; or the acquisition, analysis, or interpretation of data for the work; AND
B Drafting the work or revising it critically for important intellectual content; AND
C Final approval of the version to be published; AND
D Agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Data Availability Statement
All data generated or analyzed during this study are included in this article and/or its online supplementary material. Further inquiries can be directed to the corresponding author.