Abstract
Introduction: Pancreatic ductal adenocarcinoma (PDAC) has the lowest survival rate among all major cancers due to a lack of symptoms in early stages, early detection tools, and optimal therapies for late-stage patients. Thus, effective and non-invasive diagnostic tests are greatly needed. Recently, circulating miRNAs have been reported to be altered in PDAC. They are promising biomarkers because of their stability in the blood, ease of non-invasive detection, and convenient screening methods. This study aimed to use blood-based miRNA biomarkers and various analysis methods in the development of a machine-learning (ML) model for PDAC. Methods: Blood-based miRNAs associated with PDAC were collected from open sources. miRNA sequences, targeted genes, and involved pathways were used to construct a set of descriptors for an ML model. Results: Bioinformatics analysis revealed that most genes in the pancreatic cancer and insulin signaling pathways were targeted by the PDAC-related miRNAs. The best-performing ML model, which used the Random Forest classifier, achieved an accuracy of 88.4%. On an independent test set of PDAC-associated miRNAs, the model achieved 100% accuracy, whereas on non-cancer miRNAs it achieved 52.4% accuracy, indicating specificity to PDAC. Conclusions: Our results suggest that an ML model developed using the target gene, pathway, and sequence features of blood-based miRNA biomarkers could potentially be applied in PDAC diagnostics.
Introduction
According to the American Cancer Society, pancreatic cancer (PC) is the deadliest among all common cancer types due to having an overall 5-year survival rate of 13% from 2013 through 2019. In comparison, breast and prostate cancers, the two most common cancers, have 5-year survival rates of 91 and 97%, respectively [1]. Only 15% of PC patients are diagnosed in the localized stages, with 28% at regionalized stages and 47% at distant stages. Additionally, the 5-year survival rate of PC is 44% if detected before spreading, 16% if spread to nearby tissue, and 3% if spread to other parts of the body [1]. The most common PC is pancreatic ductal adenocarcinoma (PDAC), accounting for more than 90% of PC diagnoses [2].
There are two main factors that contribute to the high mortality rate of PDAC: late diagnosis and lack of effective therapies after metastasis. Currently, only 20% of cases of PDAC are treatable with surgical resection at presentation [3]. Even with the most plausible chance of a cure, up to 80% of patients suffer from PDAC recurrences within 2 years after surgical tumor removal. The outcomes of surgical resection can be improved using systemic chemotherapy, radiation therapy, and combination approaches [4]. First-line therapies for metastatic PDAC can improve the prognosis, but treatment efficacy is still limited; the median overall survival was 9.27 and 6.87 months for PDAC patients treated with FOLFIRINOX and gemcitabine plus nab-paclitaxel, respectively [5]. Thus, it is critical to develop effective, non-invasive, and convenient biomarkers and detection methods for PDAC diagnosis. Carbohydrate antigen 19-9 (CA19-9) is the only FDA-approved blood-based biomarker in PDAC, but it can be challenging to use in a practical clinical setting due to low sensitivity and specificity [6].
MicroRNAs (miRNAs) are small, non-coding RNAs that are involved in post-transcriptional regulation of gene expression by binding to 3′ or 5′ untranslated regions of mRNAs. This interaction suppresses gene expression via inhibition of protein translation and/or promotion of mRNA cleavage [7]. miRNAs are found in various bodily fluids, including pancreatic fluid and blood, and are widely distributed throughout the body [8]. Circulating miRNAs act as hormone-like signals released from diseased tissue as extracellular communication between cells [6]. Deregulation of miRNAs can lead to cancer due to miRNAs’ important roles in PC initiation, progression, and metastasis [9]; abnormally expressed miRNA levels were detected in the blood of PC patients, suggesting that miRNAs could potentially be biomarkers [10]. A study by Zou et al. [11] showed that a panel of six serum miRNAs (let-7b-5p, miR-192-5p, miR-19a-3p, miR-19b-3p, miR-223-3p, and miR-25-3p) has a diagnostic sensitivity of 93.3% and a specificity of 96% for the detection of stage I-IV PC, with an AUC value of 0.978. Dittmar et al. [12] identified three upregulated plasma miRNAs (miR-34a-5p, miR-130a-3p, and miR-222-3p) as promising biomarkers for early-stage PDAC detection. Similarly, Khan et al. [13] found that five serum miRNAs, three upregulated (miR-215-5p, miR-122-5p, and miR-192-5p) and two downregulated (miR-30b-5p and miR-320b), could distinguish patients with PDAC from patients with chronic pancreatitis and healthy controls.
Circulating miRNAs are not only non-invasive but also stable and sensitive [6, 8], providing a potential solution to the diagnostic challenges of PDAC. Currently, numerous miRNA-based diagnostic tools are offered to clinicians. For example, the OsteomiR™ kit analyzes specific circulating miRNAs in human serum and plasma for identifying bone quality [7]. With an estimated 66,440 people diagnosed with PC and 51,750 dying from it in 2024 [1], miRNAs could provide important insight into effective diagnosis and reduction of the mortality rate.
Although miRNAs have many important applications for improving human health, the research is resource intensive. Thus, the use of machine learning (ML) in miRNA research has become an increasingly popular topic. With more than 186,000 articles regarding the computational discovery of miRNA-disease associations online [14, 15], many miRNA databases, such as miRTarBase, DIANA-miRPath, and miR2Disease, that aggregate information about genes, pathways, and other biological entities associated with corresponding miRNAs have been established [15]. Previous studies developed ML models that can predict diagnosis and recurrence based on miRNA signatures for cancers such as breast cancer, melanoma, and laryngeal cancer [16‒19].
There have been studies that demonstrate the association between certain circulating miRNAs and PDAC [8]. This study is the first to create an effective diagnostic model for PDAC using ML for the analysis of a large quantity of blood-based miRNAs and their features.
Methods
miRNA sequences, target genes, and related pathways were used for the construction of descriptors for ML models. This is because miRNAs associated with PDAC are theorized to share common features involved in tumorigenesis and cancer development. The following programs and databases were used for ML model development and miRNA analysis: miRTarBase v. 9.0 [20], DIANA-miRPath v. 4.0 [21], miRbase [22], and Waikato Environment for Knowledge Analysis (WEKA) [23]. A flowchart of the methods is shown in Figure 1.
Selection of miRNAs
Based on an open source search, 69 blood-based miRNAs (serum or plasma) associated with PDAC were identified [8, 10] and used for ML model development. A list of 69 random miRNAs unrelated to PDAC was generated from the miRNA database miRBase as a control [22]. For model validation, 14 additional plasma or serum miRNAs associated with PDAC were extracted from three papers [13, 24, 25]. For testing the ML model specificity, we used a miRNA set linked to a disease unrelated to PC: 21 plasma miRNA biomarkers of sarcopenic obesity [26].
Identification of Target Genes
The miRNA target genes were downloaded from the miRTarBase v. 9.0 database, an online database of experimentally validated miRNA target genes [20]. A Python script was developed to extract the target genes and then assign “yes” or “no” to each miRNA for each gene. A “yes” indicates that the miRNA targets the specific gene, while “no” indicates that the miRNA does not target that gene.
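The yes/no encoding described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' script: the function name `build_target_gene_descriptors` and the (miRNA, gene) pair format are hypothetical, and the two example pairs simply echo interactions mentioned in the Discussion.

```python
from collections import defaultdict


def build_target_gene_descriptors(interactions, mirnas):
    """Return {miRNA: {gene: "yes" | "no"}} for the miRNAs of interest.

    `interactions` is an iterable of (miRNA, gene) pairs. In the study these
    would come from miRTarBase; the input format used here is an assumption.
    """
    targets = defaultdict(set)
    for mirna, gene in interactions:
        targets[mirna].add(gene)

    # One descriptor column per gene targeted by at least one listed miRNA.
    all_genes = sorted({g for m in mirnas for g in targets.get(m, set())})

    return {
        m: {g: ("yes" if g in targets.get(m, set()) else "no") for g in all_genes}
        for m in mirnas
    }


# Illustrative input only; a real run would parse the miRTarBase download.
example_pairs = [
    ("hsa-miR-16-5p", "BCL2"),
    ("hsa-miR-34a-5p", "NOTCH1"),
]
gene_table = build_target_gene_descriptors(
    example_pairs, ["hsa-miR-16-5p", "hsa-miR-34a-5p"]
)
```

Each gene becomes one nominal column, which is why the combined descriptor table grows to thousands of attributes before filtering.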
Identification of Pathways
Another component of the miRNA descriptors was based on miRNA target pathways downloaded from the DIANA-miRPath v. 4.0 database with KEGG pathway analysis [21]. A Python script was developed to assign “yes” to pathways with a significant p value (≤0.05) and “no” to pathways with a p value >0.05.
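The thresholding step can be sketched in a few lines. This is an illustration only: the function name, the nested-dictionary input, and the p values are hypothetical, and treating a pathway that is absent for a miRNA as “no” is an assumption of this sketch rather than something the text specifies.

```python
def build_pathway_descriptors(pvalues, alpha=0.05):
    """Map {miRNA: {pathway: p}} to {miRNA: {pathway: "yes" | "no"}}.

    A pathway is marked "yes" when p <= alpha. Pathways with no reported
    p value for a miRNA are marked "no" (assumption of this sketch).
    """
    all_pathways = sorted({p for row in pvalues.values() for p in row})
    return {
        mirna: {
            pathway: ("yes" if row.get(pathway, 1.0) <= alpha else "no")
            for pathway in all_pathways
        }
        for mirna, row in pvalues.items()
    }


# Made-up p values for illustration; not taken from DIANA-miRPath output.
example_pvalues = {
    "miRNA_A": {"Pancreatic cancer": 0.003, "Insulin signaling pathway": 0.20},
    "miRNA_B": {"Insulin signaling pathway": 0.05},
}
pathway_table = build_pathway_descriptors(example_pvalues)
```

Note that a p value of exactly 0.05 counts as significant here, matching the “≤0.05” rule stated above.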
Identification of miRNA Sequences
miRNA sequences were downloaded from the miRBase database [22]. Inspired by previous research in our laboratory, a Python program was developed to generate miRNA descriptors based on the composition of sequences [19]. The miRNA descriptors included the number of bases, frequency, mean mass, and hydrogen bonds. Motif descriptors were assigned “yes” if a 2-, 3-, or 4-base motif was present within the entire miRNA sequence, the first five bases, or the last five bases, and “no” if the motif was not found.
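A sequence-descriptor generator along these lines might look like the sketch below. It is not the laboratory's program: the descriptor names are hypothetical, mean mass is omitted because the text does not give the mass values used, and the hydrogen-bond count assumes the conventional Watson-Crick values (2 for A/U, 3 for G/C). The input string is just an example 22-nucleotide RNA sequence.

```python
from itertools import product

BASES = "ACGU"
# Assumed rule: hydrogen bonds each base forms with its Watson-Crick partner.
H_BONDS = {"A": 2, "U": 2, "G": 3, "C": 3}


def sequence_descriptors(seq, motif_lengths=(2, 3, 4), window=5):
    """Composition and motif-presence descriptors for one RNA sequence."""
    seq = seq.upper().replace("T", "U")
    desc = {"length": len(seq)}
    for base in BASES:
        desc[f"count_{base}"] = seq.count(base)
        desc[f"freq_{base}"] = seq.count(base) / len(seq)
    desc["h_bonds"] = sum(H_BONDS[base] for base in seq)

    # Motifs are checked in the whole sequence and at both ends.
    regions = {
        "full": seq,
        f"first{window}": seq[:window],
        f"last{window}": seq[-window:],
    }
    for k in motif_lengths:
        for motif in map("".join, product(BASES, repeat=k)):
            for name, region in regions.items():
                desc[f"{name}_{motif}"] = "yes" if motif in region else "no"
    return desc


example = sequence_descriptors("UAGCUUAUCAGACUGAUGUUGA")
```

With 16 + 64 + 256 possible motifs checked in three regions, this sketch alone yields 1,008 yes/no columns per miRNA, plus the composition descriptors.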
Machine-Learning Analysis
The target gene, pathway, and sequence descriptors were combined to create a miRNA descriptor table with 138 miRNAs (69 PDAC + 69 random). In total, there were 12,441 miRNA descriptors. An additional column named “class” was added to label the 69 PDAC-associated miRNAs as “selected” while the 69 random control miRNAs were labeled “random.”
This descriptor table was loaded in WEKA and attribute selection was applied to the data through the InfoGainAttributeEval function and the ranker search method, ranking the attributes by the amount of information gained with respect to the class. After attribute filtering, all descriptors below a certain cutoff were removed.
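For readers unfamiliar with information-gain filtering, the idea can be sketched in plain Python. This is not WEKA's implementation: it handles only nominal attributes (WEKA's InfoGainAttributeEval also deals with numeric ones), the function names are hypothetical, and the toy table is invented. The information gain of an attribute is the class entropy minus the class entropy remaining after splitting on that attribute.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """H(class) - H(class | attribute) for one nominal attribute."""
    total = len(labels)
    conditional = 0.0
    for v in set(values):
        subset = [lab for x, lab in zip(values, labels) if x == v]
        conditional += len(subset) / total * entropy(subset)
    return entropy(labels) - conditional


def select_attributes(table, labels, threshold=0.05):
    """Rank attributes by information gain; drop those below `threshold`."""
    ranked = sorted(
        ((information_gain(col, labels), name) for name, col in table.items()),
        reverse=True,
    )
    return [(name, gain) for gain, name in ranked if gain >= threshold]


# Toy data: one attribute separates the classes perfectly, one not at all.
toy_labels = ["selected"] * 4 + ["random"] * 4
toy_table = {
    "informative": ["yes"] * 4 + ["no"] * 4,
    "uninformative": ["yes", "no"] * 4,
}
kept = select_attributes(toy_table, toy_labels)
```

In the toy example only the `informative` column survives, with a gain of 1 bit; the `uninformative` column has a gain of 0 and is removed.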
The resulting attributes were then used to analyze the performance of multiple ML classification algorithms in WEKA software [23]. Performance of the classifiers was compared by measuring the accuracy of recognition of PDAC-related miRNAs. The Random Forest algorithm with 10-fold cross-validation had the best classification performance and was therefore used to build the model with training sets and corresponding testing sets. The validity of the PDAC ML model was further evaluated with a test dataset consisting of the additional 14 blood-based miRNAs associated with PDAC and the 21 plasma miRNAs associated with sarcopenic obesity.
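The 10-fold cross-validation scheme can be illustrated with a small, dependency-free sketch of how 138 labeled miRNAs might be split into stratified folds. WEKA performs its own stratification internally, so the function `stratified_folds` and its round-robin assignment are assumptions made for illustration; the classifier itself is left as a placeholder comment.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with balanced class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class out round-robin so every fold gets a similar share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


# Same class sizes as the descriptor table: 69 "selected" and 69 "random".
cv_labels = ["selected"] * 69 + ["random"] * 69
folds = stratified_folds(cv_labels)

for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(len(cv_labels)) if i not in held_out]
    # A classifier would be fitted on train_idx and scored on test_idx here;
    # the reported accuracy is then pooled over the ten held-out folds.
```

Every miRNA is used for testing exactly once, so the cross-validated accuracy reflects predictions on all 138 entries.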
Results
Sixty-nine blood-based miRNAs associated with PDAC were selected from open sources [8, 10] and are listed in Table 1. An ML model was built based on the target genes, implicated pathways, and sequence features of the 69 PDAC miRNAs and 69 random miRNAs.
hsa-let-7b-5p | hsa-miR-181b-5p | hsa-miR-221-3p | hsa-miR-486-5p |
hsa-miR-100-5p | hsa-miR-181c-5p | hsa-miR-222-3p | hsa-miR-492 |
hsa-miR-106b-5p | hsa-miR-181d-5p | hsa-miR-223-3p | hsa-miR-5100 |
hsa-miR-107 | hsa-miR-182-5p | hsa-miR-22-3p | hsa-miR-574-3p |
hsa-miR-10b-5p | hsa-miR-18a-5p | hsa-miR-22-5p | hsa-miR-607 |
hsa-miR-1202 | hsa-miR-1915-3p | hsa-miR-24-3p | hsa-miR-628-3p |
hsa-miR-124-3p | hsa-miR-192-5p | hsa-miR-25-3p | hsa-miR-629-5p |
hsa-miR-1246 | hsa-miR-193b-3p | hsa-miR-30c-5p | hsa-miR-642b-3p |
hsa-miR-126-3p | hsa-miR-196a-5p | hsa-miR-34a-5p | hsa-miR-663a |
hsa-miR-1275 | hsa-miR-196b-5p | hsa-miR-3679-5p | hsa-miR-744-5p |
hsa-miR-1290 | hsa-miR-19a-3p | hsa-miR-373-3p | hsa-miR-7-5p |
hsa-miR-130a-3p | hsa-miR-19b-3p | hsa-miR-378a-3p | hsa-miR-885-5p |
hsa-miR-133a-3p | hsa-miR-200a-3p | hsa-miR-429 | hsa-miR-939-5p |
hsa-miR-134-5p | hsa-miR-200b-3p | hsa-miR-4466 | hsa-miR-99a-5p |
hsa-miR-146a-5p | hsa-miR-20a-5p | hsa-miR-4516 | hsa-miR-99b-5p |
hsa-miR-155-5p | hsa-miR-210-3p | hsa-miR-4687-3p | |
hsa-miR-16-5p | hsa-miR-212-3p | hsa-miR-483-3p | |
hsa-miR-181a-5p | hsa-miR-21-5p | hsa-miR-484 | |
Bioinformatics Analysis of PDAC-Associated miRNAs
KEGG database analysis illustrated the PC pathway, with 59 of 69 PDAC miRNAs targeting 67 genes out of 78 genes. Figure 2a is a KEGG fragment plot that displays most of the key genes present in the PC pathway targeted by PDAC miRNAs. The details for each miRNA that targets genes in the PC pathway are illustrated in online supplementary Table S1 (for all online suppl. material, see https://doi.org/10.1159/000540329).
The insulin signaling pathway promotes PC initiation and progression by inducing tumorigenic inflammation, regulating lipid and glucose metabolic reprogramming, overcoming apoptosis, stimulating cancer metastasis, and activating tumor microenvironment formation [27, 28]. As a result, it is a promising target for drug discovery in PC. KEGG pathway analysis revealed that PDAC miRNAs targeted many components of the insulin pathway, with 60 of 69 PDAC miRNAs targeting 98 of 153 genes. Figure 2b is a KEGG fragment plot that displays most of the key genes present in the insulin signaling pathway targeted by PDAC miRNAs. The details for each miRNA and the genes it targets in the insulin signaling pathway are given in online supplementary Table S2.
Performance Comparison of the Different Classifiers
Descriptors generated with target pathways, genes, and sequences were evaluated by employing the InfoGainAttributeEval filter and a ranker search. Using a threshold of 0.05, the number of attributes for the training dataset was reduced from 12,441 to 226 for the final model.
Using WEKA, a PDAC diagnostic model was created, and the performances of several classifiers, including Random Forest, J48, LMT (logistic model tree), SMO (sequential minimal optimization), and SGD (stochastic gradient descent), with 10-fold cross-validation, were compared. Parameters and training/testing sets were kept the same. The following statistical metrics were used for evaluation: accuracy, precision, recall, F-measure, the area under the receiver operating characteristic curve (AUC), and the area under the precision-recall curve (AUC-PR). Recall, also called the true-positive rate, is the probability of correctly classifying a member of the positive class. F-measure evaluates the performance of an ML model by combining precision and recall. The AUC compares sensitivity versus specificity across a range of values to show the performance of a classifier; AUC is widely used to measure the accuracy of diagnostic tests. An AUC of 70–80% is considered acceptable, 80–90% is considered excellent, and greater than 90% is considered outstanding. AUC-PR demonstrates precision for the corresponding sensitivity (recall) values. Table 2 displays the performance comparison of all the classifiers; Random Forest had the best accuracy (88.4%) and the best overall performance, with an average AUC of 94.2% and an AUC-PR of 93.3%.
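The threshold-based metrics defined above follow directly from the counts of a two-class confusion matrix. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not results from this study. As an aside, the identical accuracy and recall columns in Table 2 are what class-weighted averaging of per-class recall would produce, although the text does not state which averaging was used.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F-measure for the positive class."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # true-positive rate (sensitivity)
    f_measure = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=8, fp=2, fn=1, tn=9)
```

AUC and AUC-PR are not computed here because they require the classifier's ranked scores rather than a single confusion matrix.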
Classifier | Accuracy, % | Precision, % | Recall, % | F-measure, % | AUC, % | AUC-PR, % |
---|---|---|---|---|---|---|
Random Forest | 88.4 | 89.6 | 88.4 | 88.3 | 94.2 | 93.3 |
SMO | 84.8 | 86.1 | 84.8 | 84.6 | 84.8 | 79.9 |
J48 | 82.6 | 83.3 | 82.6 | 82.5 | 82.2 | 78.6 |
LMT | 79.7 | 81.4 | 79.7 | 79.4 | 90.2 | 90.5 |
SGD | 79.7 | 81.4 | 79.7 | 79.4 | 79.7 | 74.2 |
SMO, sequential minimal optimization; LMT, logistic model tree; SGD, stochastic gradient descent; AUC, area under the receiver operating characteristic curve; AUC-PR, area under the precision-recall curve.
To explore whether the number of attributes had a significant impact on the prediction performance, different threshold levels for InfoGain attribute filtering with 10-fold cross-validation were tested. The best prediction accuracy of each classifier was observed at a different number of attributes (Fig. 3a). Random Forest with a threshold of 0.05 and 226 attributes had the best accuracy compared to the other classifiers. Different numbers of cross-validation folds were also compared for the classifiers, and the results are shown in Figure 3b. Among them, the Random Forest classifier with 10-fold cross-validation had the best performance and was thus selected for the PDAC ML model validation.
Testing of Independent Datasets on Developed Model
Fourteen serum/plasma miRNAs reported as PDAC biomarkers [13, 24, 25] (listed in online suppl. Table S3) were used to test our model. Four (hsa-miR-1290, hsa-miR-155-5p, hsa-miR-192-5p, and hsa-miR-21-5p) overlapped with the training dataset. For these 14 miRNAs, we generated a dataset with the same descriptors (226 attributes) as the training set. When these 14 miRNAs were input into the established ML model, all 14 were classified as associated with PDAC, giving 100% prediction accuracy.
A separate miRNA dataset of an unrelated disease was also tested with the model. From an open source search, 21 plasma miRNAs associated with sarcopenic obesity were obtained (listed in online suppl. Table S4) [26]. For these 21 miRNAs, we generated the dataset with the same descriptors (226 attributes) as the training set and input it into our model. The classification accuracy on this dataset was 52.4%; this low accuracy on the sarcopenic obesity data suggests that the generated PDAC model can differentiate between PDAC and other diseases.
Discussion
PDAC is an extremely aggressive malignancy with the highest mortality rate of all major cancers. Because of the lack of early symptoms and rapid progression, most PDAC patients are diagnosed at advanced stages. Therefore, accurate diagnosis is vital for improving the survival of PDAC patients. For better diagnostic methods, miRNAs and ML are increasingly being used to identify biomarkers. In recent years, circulating miRNAs have become promising biomarkers due to their stability in the blood, ease of non-invasive detection, and convenient screening methods [29]. However, different studies have reported varying circulating miRNA biomarkers for PDAC. Thus, it is critical to use a significant number of circulating miRNAs associated with PDAC to build a diagnostic model. ML, a useful tool for analysis of large and complex data sets, has enabled us to evaluate powerful diagnostic biomarkers and models for PDAC. In this study, we collected blood-based miRNA biomarkers of PDAC reported in the past two decades, used comprehensive data mining according to miRNA profiling, and developed an ML system for PDAC diagnosis. The best classifier was Random Forest, with a 10-fold cross-validation accuracy of 88.4%. Model performance evaluated using an independent PC dataset and a sarcopenic obesity dataset had accuracies of 100% and 52.4%, respectively, demonstrating that the model was trained to specifically recognize PDAC-related miRNAs.
The 69 PDAC-related miRNAs collected in this study targeted many genes implicated in several important PC and insulin signaling pathways such as PI3K-Akt, MAPK, Jak-STAT, p53, and TGFβ (Fig. 2a, b). In the KEGG PC pathway, the top miRNAs were hsa-miR-34a-5p and hsa-miR-16-5p that targeted 21 and 17 genes, respectively; hsa-miR-155-5p and hsa-miR-21-5p targeted 16 genes (online suppl. Table S1). Tang et al. [30] demonstrated that hsa-miR-34a inhibited PC growth by downregulating Snail1 and Notch1. Furthermore, Hidalgo-Sastre et al. [31] discovered that hsa-miR-34a suppressed PDAC development by modulating the immune microenvironment. Basu et al. [32] found that hsa-miR-16 acts as a tumor suppressor by attenuating PDAC cell growth through down-regulation of the anti-apoptotic gene BCL2. hsa-miR-21 was discovered to be overexpressed in PDAC patients and was further associated with poor prognosis; it regulated the stemness of PDAC cells and thus played roles in mesenchymal transition [33]. Meanwhile, a study by Ma et al. [34] demonstrated that hsa-miR-21 promotes PDAC cell migration and mesenchymal transition via activating Ras/Erk pathway.
hsa-miR-16-5p is listed as the most important miRNA in the insulin signaling pathway, targeting 26 genes. hsa-let-7b-5p and hsa-miR-155-5p were the second most involved, each targeting 18 genes (online suppl. Table S2). The hsa-let-7 family was dysregulated in PDAC patients, which may promote the insulin receptor/insulin-like growth factor (IGF) signaling pathways that are key for PDAC development and progression [35]. A study by Wang and Gao [36] revealed that hsa-miR-155-5p activates the Akt/NF-κB pathway, which promotes immune evasion and PC cell proliferation. hsa-miR-155 is associated with the pathogenesis of both type 2 diabetes and PC, with diabetes being a known risk factor for PC [37]. Altogether, these top-ranked miRNAs play an important role in PC development and could be potential targets for PC treatment.
In this study, we developed an ML diagnostics model of PDAC using blood-based miRNA biomarkers and their features including the sequence, targeted genes, and pathways. Our model has the potential to classify whether a given miRNA is associated with PDAC based on its associated features. Future research should include the integration of more advanced machine learning algorithms and the inclusion of additional types of biomarkers, such as CA19-9 and CEA (carcinoembryonic antigen) [8], to enhance diagnostic power.
Despite the promising results, our study has several limitations. First, most miRNA biomarkers used in our model were identified among the patients with mixed stages of PDAC. Although our model can potentially determine whether a patient’s miRNA panel is associated with PDAC, it cannot determine whether the panel is associated with early or advanced stage of PDAC. Second, the sample size of miRNA datasets, though comprehensive, may still not capture the full spectrum of miRNA variability in PDAC before clinical use. Some newly discovered circulating miRNAs associated with PDAC were not included in the miRTarBase or DIANA-miRPath databases and were thus not included in our study. Third, although the miRNA biomarkers used for our model were derived from published studies and experimentally evaluated, the specificity of these miRNA biomarkers also needs further validation in larger, independent cohorts with standardized methodology. Additionally, the potential for false positives and false negatives in the current AI analyses must be acknowledged.
Future work will focus on evaluating the model’s performance, specifically in high-risk populations and in early-stage detection. High-risk factors for PDAC include age, family history, diabetes, obesity, chronic pancreatitis, and gene mutations of BRCA1, BRCA2, or ATM [38, 39]. For example, age is an important risk factor for the development of PC: people aged 65–74 are most frequently diagnosed, and 68% of patients were older than 65 [39]. In future studies, adding these high-risk factors to the ML model may help improve diagnostic accuracy. Blackford and colleagues reported that the 5-year overall survival in 2012 was 83.7% for stage IA PDAC and 74.3% for stage IB, decreasing to 13.3% for stage IIA, 15.5% for stage IIB, 3.2% for stage III, and 2.8% for stage IV [40]. Hence, it is critical to have a sensitive and non-invasive method to diagnose early-stage PDAC. Our next two goals are to build ML diagnostic models using miRNA profiles and other biomarkers associated with different PDAC stages, especially stages I and II, and to apply these models to individuals with one or more high-risk factors as a possible strategy for PDAC screening.
Additionally, we would like to use AI to address the role of miRNAs in different stages of PDAC, chemoresistance, and therapy response in order to identify prognostic biomarkers for predicting survival, monitoring surgical outcomes, and selecting effective drugs. To our knowledge, this is the first AI system created for potential diagnosis of PDAC using blood-based miRNAs as biomarkers. Future work will enhance its performance in PDAC diagnosis, especially for early-stage detection and for screening in high-risk populations. However, before the diagnostic system can be used in a clinical setting, more comprehensive validation in larger cohort studies is needed.
Statement of Ethics
An ethics statement was not required for this study type because no human or animal subjects or materials were used.
Conflict of Interest Statement
The authors declare that they have no competing interests.
Funding Sources
The authors have not received any financial support.
Author Contributions
J.Y.T. proposed the project, designed the study, collected the data, developed a machine-learning model, and wrote the manuscript; I.F.T., S.K., and V.L.K. proposed the study concept and methodology, supervised the project, and edited the manuscript. All authors approved the final manuscript.
Data Availability Statement
The Python code used for this study is available upon request. Further inquiries can be directed to the corresponding author.