Introduction: Pancreatic ductal adenocarcinoma (PDAC) has the lowest survival rate among all major cancers due to a lack of symptoms in early stages, early detection tools, and optimal therapies for late-stage patients. Thus, effective and non-invasive diagnostic tests are greatly needed. Recently, circulating miRNAs have been reported to be altered in PDAC. They are promising biomarkers because of their stability in the blood, ease of non-invasive detection, and convenient screening methods. This study aimed to use blood-based miRNA biomarkers and various analysis methods to develop a machine-learning (ML) model for PDAC. Methods: Blood-based miRNAs associated with PDAC were collected from open sources. miRNA sequences, targeted genes, and involved pathways were used to construct a set of descriptors for an ML model. Results: Bioinformatics analysis revealed that most genes in the pancreatic cancer and insulin signaling pathways were targeted by the PDAC-related miRNAs. The best-performing ML model, built with the Random Forest classifier, achieved an accuracy of 88.4%. When evaluated on an independent test set of PDAC-associated miRNAs, the model reached 100% accuracy, whereas on non-cancer miRNAs it reached 52.4% accuracy, indicating specificity to PDAC. Conclusions: Our results suggest that an ML model developed using the target gene, pathway, and sequence features of blood-based miRNA biomarkers could potentially be applied in PDAC diagnostics.

According to the American Cancer Society, pancreatic cancer (PC) is the deadliest of the common cancer types, with an overall 5-year survival rate of 13% from 2013 through 2019. In comparison, breast and prostate cancers, the two most common cancers, have 5-year survival rates of 91% and 97%, respectively [1]. Only 15% of PC patients are diagnosed at the localized stage, with 28% diagnosed at the regional stage and 47% at the distant stage. Additionally, the 5-year survival rate of PC is 44% if detected before spreading, 16% if spread to nearby tissue, and 3% if spread to other parts of the body [1]. The most common PC is pancreatic ductal adenocarcinoma (PDAC), accounting for more than 90% of PC diagnoses [2].

Two main factors contribute to the high mortality rate of PDAC: late diagnosis and a lack of effective therapies after metastasis. Currently, only 20% of PDAC cases are treatable with surgical resection at presentation [3]. Although resection offers the best chance of a cure, up to 80% of patients experience PDAC recurrence within 2 years of surgical tumor removal. The outcomes of surgical resection can be improved using systemic chemotherapy, radiation therapy, and combination approaches [4]. First-line therapies for metastatic PDAC can improve the prognosis, but treatment efficacy is still limited; the median overall survival was 9.27 and 6.87 months for PDAC patients treated with FOLFIRINOX and gemcitabine plus nab-paclitaxel, respectively [5]. Thus, it is critical to develop effective, non-invasive, and convenient biomarkers and detection methods for PDAC diagnosis. Carbohydrate antigen 19-9 (CA19-9) is the only FDA-approved blood-based biomarker in PDAC, but it can be challenging to use in a practical clinical setting due to low sensitivity and specificity [6].

MicroRNAs (miRNAs) are small, non-coding RNAs that are involved in post-transcriptional regulation of gene expression by binding to the 3′ or 5′ untranslated regions of mRNAs. This interaction suppresses gene expression via inhibition of protein translation and/or promotion of mRNA cleavage [7]. miRNAs are found in various bodily fluids, including pancreatic fluid and blood, and are present throughout the body [8]. Circulating miRNAs act as hormone-like signals released from diseased tissue for extracellular communication between cells [6]. Deregulation of miRNAs can lead to cancer due to miRNAs’ important roles in PC initiation, progression, and metastasis [9]; abnormally expressed miRNA levels were detected in the blood of PC patients, suggesting that miRNAs could potentially be biomarkers [10]. A study by Zou et al. [11] showed that a panel of six serum miRNAs (let-7b-5p, miR-192-5p, miR-19a-3p, miR-19b-3p, miR-223-3p, and miR-25-3p) had a diagnostic sensitivity of 93.3% and a specificity of 96% for the detection of stage I-IV PC, with an AUC value of 0.978. Dittmar et al. [12] identified three upregulated plasma miRNAs (miR-34a-5p, miR-130a-3p, and miR-222-3p) as promising biomarkers for early-stage PDAC detection. Similarly, Khan et al. [13] found that five serum miRNAs, three upregulated (miR-215-5p, miR-122-5p, and miR-192-5p) and two downregulated (miR-30b-5p and miR-320b), could distinguish patients with PDAC from those with chronic pancreatitis and from healthy controls.

Circulating miRNAs are not only non-invasive but also stable and sensitive [6, 8], providing a potential solution to the diagnostic problems faced in PDAC. Currently, numerous miRNA-based diagnostic tools are offered to clinicians. For example, the OsteomiR kit analyzes specific circulating miRNAs in human serum and plasma to assess bone quality [7]. With an estimated 66,440 people diagnosed with PC and 51,750 dying from it in 2024 [1], miRNAs could provide important insight into effective diagnosis and reduction of the mortality rate.

Although miRNAs have many important applications in improving human health, miRNA research is resource intensive. Thus, the use of machine learning (ML) in miRNA research has become an increasingly popular topic. With more than 186,000 articles regarding the computational discovery of miRNA-disease association online [14, 15], many miRNA databases, such as miRTarBase, DIANA-miRPath, and miR2Disease, have been established to aggregate information about genes, pathways, and other biological entities associated with corresponding miRNAs [15]. Previous studies developed ML models that can predict diagnosis and recurrence based on miRNA signatures for cancers such as breast cancer, melanoma, and laryngeal cancer [16‒19].

There have been studies that demonstrate the association between certain circulating miRNAs and PDAC [8]. This study is the first to create an effective diagnostic model for PDAC using ML for the analysis of a large quantity of blood-based miRNAs and their features.

miRNA sequences, target genes, and related pathways were used for the construction of descriptors for ML models. This is because miRNAs associated with PDAC are theorized to share common features involved in tumorigenesis and cancer development. The following programs and databases were used for ML model development and miRNA analysis: miRTarBase v. 9.0 [20], DIANA-miRPath v. 4.0 [21], miRBase [22], and Waikato Environment for Knowledge Analysis (WEKA) [23]. A flowchart of the methods is shown in Figure 1.

Fig. 1.

Flowchart of the methods. PDAC-associated and random miRNAs were extracted from public sources and paired with their corresponding attributes based on sequences, targeted genes, and involved pathways. These were then loaded in WEKA for analysis using attribute filtering and various ML algorithms. Finally, the model was evaluated using the independent PDAC and other disease datasets.


Selection of miRNAs

Based on an open source search, 69 blood-based miRNAs (serum or plasma) associated with PDAC were identified [8, 10] and used for ML model development. A list of 69 random miRNAs unrelated to PDAC was generated from the miRNA database miRBase as a control [22]. For model validation, 14 additional plasma or serum miRNAs associated with PDAC were extracted from three papers [13, 24, 25]. For testing the ML model specificity, we used a set of miRNAs linked to a disease unrelated to PC: 21 plasma miRNA biomarkers of sarcopenic obesity [26].

Identification of Target Genes

The miRNA target genes were downloaded from miRTarBase v. 9.0 database, an online database of experimentally proven miRNA target genes [20]. A Python script was developed to extract the target genes then assign “yes” or “no” to each miRNA. A “yes” indicates that the miRNA targets the specific gene, while “no” indicates that the miRNA does not target that gene.
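The yes/no encoding described above can be sketched as follows. This is a minimal illustration, not the authors' script: the function name `build_gene_descriptors`, the input format (a list of miRNA and gene pairs), and the placeholder gene names are all assumptions made for the example.

```python
from collections import defaultdict


def build_gene_descriptors(interactions, mirnas):
    """Turn (miRNA, gene) pairs into a yes/no table: {miRNA: {gene: "yes" | "no"}}."""
    targets = defaultdict(set)
    for mirna, gene in interactions:
        targets[mirna].add(gene)
    # Every gene targeted by at least one listed miRNA becomes a descriptor column.
    genes = sorted({gene for m in mirnas for gene in targets.get(m, ())})
    return {
        m: {g: ("yes" if g in targets.get(m, ()) else "no") for g in genes}
        for m in mirnas
    }


# Hypothetical pairs with placeholder gene names; real input would be parsed
# from a miRTarBase download.
toy_pairs = [
    ("hsa-miR-21-5p", "GENE_A"),
    ("hsa-miR-21-5p", "GENE_B"),
    ("hsa-miR-16-5p", "GENE_B"),
]
gene_table = build_gene_descriptors(toy_pairs, ["hsa-miR-21-5p", "hsa-miR-16-5p"])
print(gene_table["hsa-miR-16-5p"])  # {'GENE_A': 'no', 'GENE_B': 'yes'}
```

Each gene becomes one binary column, so the width of the table grows with the number of distinct target genes across all miRNAs.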

Identification of Pathways

Another component of the miRNA descriptors was based on miRNA target pathways downloaded from DIANA-miRPath v. 4.0 database with KEGG pathway analysis [21]. A Python script was developed to assign “yes” to the pathways with significant p value (≤0.05) and “no” if the p value >0.05.
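A minimal sketch of this thresholding step is shown below. The input layout (a nested dictionary of p values per miRNA and pathway), the function name, and the choice to treat a missing p value as not significant are assumptions for illustration; the actual DIANA-miRPath export format is not described in the text.

```python
def build_pathway_descriptors(pvalues, alpha=0.05):
    """Map {miRNA: {pathway: p}} to {miRNA: {pathway: "yes" | "no"}}."""
    pathways = sorted({p for per_mirna in pvalues.values() for p in per_mirna})
    table = {}
    for mirna, per_mirna in pvalues.items():
        table[mirna] = {
            # Assumption: a pathway with no reported p value counts as "no".
            p: ("yes" if per_mirna.get(p, 1.0) <= alpha else "no")
            for p in pathways
        }
    return table


# Hypothetical p values for two placeholder miRNAs.
toy_pvalues = {
    "miR-A": {"Pancreatic cancer": 0.003, "Insulin signaling pathway": 0.05},
    "miR-B": {"Pancreatic cancer": 0.20},
}
pathway_table = build_pathway_descriptors(toy_pvalues)
print(pathway_table["miR-A"])
print(pathway_table["miR-B"])
```

Note that a p value of exactly 0.05 is labeled "yes", matching the "≤0.05" rule stated above.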

Identification of miRNA Sequences

miRNA sequences were downloaded from the miRBase database [22], and, inspired by previous research in our laboratory, a Python program was developed to generate miRNA descriptors based on the composition of sequences [19]. The miRNA descriptors included the number of bases, frequency, mean mass, and hydrogen bonds. If the miRNA has 2-, 3-, and/or 4-base-pair motifs within the entire miRNA sequence, the first five base pairs, and the last five base pairs, a “yes” was assigned. Meanwhile, “no” was assigned if the motifs were not found in the miRNA sequence.
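The composition and motif descriptors can be sketched as below. This is an illustration under stated assumptions: the region names (`whole`, `first5`, `last5`), the descriptor naming scheme, and the enumeration of every possible motif over A, C, G, and U are choices made for the example. The mean mass and hydrogen bond descriptors are left out because their exact definitions are not given in the text.

```python
from itertools import product

BASES = "ACGU"


def sequence_descriptors(seq, motif_lengths=(2, 3, 4), end_width=5):
    """Length, base frequencies, and yes/no motif presence in three regions."""
    seq = seq.upper().replace("T", "U")
    desc = {"length": len(seq)}
    for base in BASES:
        desc[f"freq_{base}"] = seq.count(base) / len(seq)
    regions = {
        "whole": seq,
        "first5": seq[:end_width],
        "last5": seq[-end_width:],
    }
    for k in motif_lengths:
        for motif in map("".join, product(BASES, repeat=k)):
            for name, region in regions.items():
                desc[f"{motif}_{name}"] = "yes" if motif in region else "no"
    return desc


# An example 22-nucleotide sequence used only to exercise the function.
example = sequence_descriptors("UAGCUUAUCAGACUGAUGUUGA")
print(example["length"], example["UAGC_first5"], example["GG_whole"])  # 22 yes no
```

With motif lengths 2, 3, and 4 over four bases and three regions, this sketch alone yields 1,008 binary motif descriptors per miRNA, which gives a sense of how the full descriptor table can reach thousands of columns.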

Machine-Learning Analysis

The target gene, pathway, and sequence descriptors were combined to create a miRNA descriptor table with 138 miRNAs (69 PDAC + 69 random). In total, there were 12,441 miRNA descriptors. An additional column named “class” was added to label the 69 PDAC-associated miRNAs as “selected” while the 69 random control miRNAs were labeled “random.”
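One way to hand such a table to WEKA is to write it in ARFF, WEKA's native text format. The sketch below is an assumption about how the table could be serialized, not a description of the authors' pipeline; it covers only yes/no descriptors (numeric descriptors would need `numeric` attribute declarations), and the relation name `pdac_mirna` is a placeholder.

```python
def to_arff(rows, relation="pdac_mirna"):
    """Serialize [(descriptor_dict, class_label), ...] as ARFF text.

    All descriptor dicts must share the same keys and hold "yes"/"no" values.
    """
    names = sorted(rows[0][0])
    lines = [f"@relation {relation}", ""]
    for name in names:
        lines.append(f"@attribute '{name}' {{yes,no}}")
    # Class labels follow the text: "selected" (PDAC) and "random" (control).
    lines.append("@attribute class {selected,random}")
    lines += ["", "@data"]
    for desc, label in rows:
        lines.append(",".join([desc[n] for n in names] + [label]))
    return "\n".join(lines)


# Two placeholder rows with hypothetical descriptor names.
toy_rows = [
    ({"GENE_A": "yes", "GENE_B": "no"}, "selected"),
    ({"GENE_A": "no", "GENE_B": "yes"}, "random"),
]
arff_text = to_arff(toy_rows)
print(arff_text)
```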

This descriptor table was loaded in WEKA and attribute selection was applied to the data through the InfoGainAttributeEval function and the ranker search method, ranking the attributes by the amount of information gained with respect to the class. After attribute filtering, all descriptors below a certain cutoff were removed.
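Information gain measures how much knowing an attribute's value reduces the entropy of the class label. The sketch below re-implements that idea in plain Python for nominal attributes; it is not WEKA's code, and the function names and the "keep if gain is at least the threshold" rule are assumptions for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def info_gain(values, labels):
    """Entropy of the class minus the entropy remaining after splitting on values."""
    total = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [lab for x, lab in zip(values, labels) if x == v]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def rank_and_filter(table, labels, threshold=0.05):
    """Rank attribute columns by information gain and drop those below threshold."""
    scored = {name: info_gain(col, labels) for name, col in table.items()}
    ranked = sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, gain) for name, gain in ranked if gain >= threshold]


# Toy columns: one perfectly informative attribute and one uninformative one.
toy_labels = ["selected", "selected", "random", "random"]
toy_table = {
    "perfect": ["yes", "yes", "no", "no"],
    "useless": ["yes", "no", "yes", "no"],
}
kept = rank_and_filter(toy_table, toy_labels)
print(kept)
```

In the toy data, the attribute that splits the classes cleanly has a gain of 1 bit and is kept, while the attribute that is independent of the class has a gain of 0 and is removed.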

The resulting attributes were then used to analyze the performance of multiple ML classification algorithms in WEKA software [23]. Performance of the classifiers was compared by measuring the accuracy of recognition of PDAC-related miRNAs. The Random Forest algorithm with 10-fold cross-validation had the best classification performance and was therefore used to build the model with training sets and corresponding testing sets. The validity of the PDAC ML model was further evaluated with a test dataset comprising the additional 14 blood-based miRNAs associated with PDAC and the 21 plasma miRNAs associated with sarcopenic obesity.
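The mechanics of k-fold cross-validation can be sketched in plain Python as follows. This is a generic illustration, assuming stratified folds and a caller-supplied classifier; `train_fn` and `predict_fn` are hypothetical hooks, and the majority-class baseline stands in for a real classifier such as Random Forest.

```python
import random
from collections import Counter


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def cross_validate(features, labels, train_fn, predict_fn, k=10, seed=0):
    """Return overall accuracy: each sample is predicted once, by a model
    trained on the other k - 1 folds."""
    correct = 0
    for test_idx in stratified_folds(labels, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        model = train_fn([features[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct += sum(predict_fn(model, features[i]) == labels[i] for i in test_idx)
    return correct / len(labels)


# 69 "selected" and 69 "random" samples, as in the training table.
labels = ["selected"] * 69 + ["random"] * 69
folds = stratified_folds(labels)

# Majority-class baseline: ignores the features entirely.
baseline = cross_validate(
    [None] * len(labels),
    labels,
    train_fn=lambda X, y: Counter(y).most_common(1)[0][0],
    predict_fn=lambda model, x: model,
)
print([len(f) for f in folds], baseline)
```

On a balanced two-class set, a classifier that ignores the features scores 50%, which is the reference point against which the reported accuracies should be read.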

Sixty-nine blood-based miRNAs associated with PDAC were selected from open sources [8, 10] and are listed in Table 1. An ML model was built based on the target genes, implicated pathways, and sequence features of the 69 PDAC miRNAs and 69 random miRNAs.

Table 1.

PDAC-related serum or plasma miRNAs that were derived from the review papers [8, 10]

hsa-let-7b-5p hsa-miR-181b-5p hsa-miR-221-3p hsa-miR-486-5p 
hsa-miR-100-5p hsa-miR-181c-5p hsa-miR-222-3p hsa-miR-492 
hsa-miR-106b-5p hsa-miR-181d-5p hsa-miR-223-3p hsa-miR-5100 
hsa-miR-107 hsa-miR-182-5p hsa-miR-22-3p hsa-miR-574-3p 
hsa-miR-10b-5p hsa-miR-18a-5p hsa-miR-22-5p hsa-miR-607 
hsa-miR-1202 hsa-miR-1915-3p hsa-miR-24-3p hsa-miR-628-3p 
hsa-miR-124-3p hsa-miR-192-5p hsa-miR-25-3p hsa-miR-629-5p 
hsa-miR-1246 hsa-miR-193b-3p hsa-miR-30c-5p hsa-miR-642b-3p 
hsa-miR-126-3p hsa-miR-196a-5p hsa-miR-34a-5p hsa-miR-663a 
hsa-miR-1275 hsa-miR-196b-5p hsa-miR-3679-5p hsa-miR-744-5p 
hsa-miR-1290 hsa-miR-19a-3p hsa-miR-373-3p hsa-miR-7-5p 
hsa-miR-130a-3p hsa-miR-19b-3p hsa-miR-378a-3p hsa-miR-885-5p 
hsa-miR-133a-3p hsa-miR-200a-3p hsa-miR-429 hsa-miR-939-5p 
hsa-miR-134-5p hsa-miR-200b-3p hsa-miR-4466 hsa-miR-99a-5p 
hsa-miR-146a-5p hsa-miR-20a-5p hsa-miR-4516 hsa-miR-99b-5p 
hsa-miR-155-5p hsa-miR-210-3p hsa-miR-4687-3p  
hsa-miR-16-5p hsa-miR-212-3p hsa-miR-483-3p  
hsa-miR-181a-5p hsa-miR-21-5p hsa-miR-484  

Bioinformatics Analysis of PDAC-Associated miRNAs

KEGG pathway analysis showed that 59 of the 69 PDAC miRNAs targeted 67 of the 78 genes in the PC pathway. Figure 2a is a KEGG fragment plot that displays most of the key genes present in the PC pathway targeted by PDAC miRNAs. The details for each miRNA that targets genes in the PC pathway are provided in online supplementary Table S1 (for all online suppl. material, see https://doi.org/10.1159/000540329).

Fig. 2.

Signaling pathways in KEGG analysis with the 69 PDAC-related miRNAs: PC pathway (a) and insulin signaling pathway (b). The target genes are coded by the color asterisks according to the number of targeting miRNAs (purple: 1, blue: 2, and red: 3 or greater).


The insulin signaling pathway promotes PC initiation and progression by inducing tumorigenic inflammation, regulating lipid and glucose metabolic reprogramming, overcoming apoptosis, stimulating cancer metastasis, and activating tumor microenvironment formation [27, 28]. As a result, it is a promising target for drug discovery in PC. KEGG pathway analysis revealed that PDAC miRNAs targeted many components of the insulin pathway, with 60 of the 69 PDAC miRNAs targeting 98 of 153 genes. Figure 2b is a KEGG fragment plot that displays most of the key genes present in the insulin signaling pathway targeted by PDAC miRNAs. The details for each miRNA and the genes it targets in the insulin signaling pathway are provided in online supplementary Table S2.

Performance Comparison of the Different Classifiers

Descriptors generated with target pathways, genes, and sequences were evaluated by employing the InfoGainAttributeEval filter and a ranker search. Using a threshold of 0.05, the number of attributes for the training dataset was reduced from 12,441 to 226 for the final model.

Using WEKA, a PDAC diagnostic model was created, and the performances of several classifiers, including Random Forest, J48, LMT (logistic model tree), SMO (sequential minimal optimization), and SGD (stochastic gradient descent), with 10-fold cross-validation, were compared. Parameters and training/testing sets were kept the same. The following statistical metrics were used for evaluation: accuracy, precision, recall, F-measure, the area under the receiver operating characteristic curve (AUC), and the area under the precision-recall curve (AUC-PR). Recall, also called the true-positive rate, is the probability of correctly classifying a positive instance. F-measure evaluates the performance of an ML model by combining precision and recall. The AUC compares sensitivity versus specificity across a range of values to show the performance of a classifier; AUC is widely used to measure the accuracy of diagnostic tests. An AUC of 70–80% is considered acceptable, 80–90% is considered excellent, and greater than 90% is considered outstanding. AUC-PR demonstrates precision for the corresponding sensitivity (recall) values. Table 2 displays the performance comparison of all the classifiers; Random Forest had the best accuracy (88.4%) as well as the best average AUC (94.2%) and AUC-PR (93.3%).
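For reference, the metrics named above can be computed from a confusion matrix and a list of classifier scores as in the following sketch. The counts and scores are made-up toy values, the function names are illustrative, and the AUC is computed with the rank-based (Mann-Whitney) formulation rather than by any WEKA routine.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F-measure for the positive class."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called the true-positive rate
    f_measure = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


def roc_auc(scores, labels, positive="selected"):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for s, lab in zip(scores, labels) if lab == positive]
    neg = [s for s, lab in zip(scores, labels) if lab != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy confusion matrix and toy scores, not taken from the study.
toy_metrics = binary_metrics(tp=8, fp=2, fn=1, tn=9)
toy_auc = roc_auc([0.9, 0.8, 0.4, 0.3], ["selected", "random", "selected", "random"])
print(toy_metrics, toy_auc)
```

When recall is averaged over both classes with weights proportional to class size, it equals accuracy; this would explain why the recall and accuracy columns coincide in a table of class-weighted averages.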

Table 2.

Performance comparison of the different classifiers with 10-fold cross validation

Classifier       Accuracy, %   Precision, %   Recall, %   F-measure, %   AUC, %   AUC-PR, %
Random Forest    88.4          89.6           88.4        88.3           94.2     93.3
SMO              84.8          86.1           84.8        84.6           84.8     79.9
J48              82.6          83.3           82.6        82.5           82.2     78.6
LMT              79.7          81.4           79.7        79.4           90.2     90.5
SGD              79.7          81.4           79.7        79.4           79.7     74.2

SMO, sequential minimal optimization; LMT, logistic model tree; SGD, stochastic gradient descent; AUC, area under the receiver operating characteristic curve; AUC-PR, area under the precision-recall curve.

To explore whether the number of attributes had a significant impact on the prediction performance, different threshold levels for InfoGain attribute filtering with 10-fold cross-validation were tested. The best prediction accuracy of each classifier was observed at a different number of attributes (Fig. 3a). Random Forest with a threshold of 0.05 and 226 attributes had the best accuracy compared to the other classifiers. Various folds of cross-validation for different classifiers were also compared and shown in Figure 3b. Among them, Random Forest classifier with 10-fold cross-validation had the best performance and was thus selected for the PDAC ML model validation.

Fig. 3.

Performance of the models generated by selected classifiers: with various thresholds and attributes using 10-fold cross-validation (a) and with various folds of cross-validation using 0.05 threshold (b).


Testing of Independent Datasets on Developed Model

Fourteen serum/plasma miRNAs reported as PDAC biomarkers [13, 24, 25] (listed in online suppl. Table S3) were used to test our model. Four (hsa-miR-1290, hsa-miR-155-5p, hsa-miR-192-5p, and hsa-miR-21-5p) overlapped with the training dataset. For these 14 miRNAs, we generated a dataset with the same descriptors (226 attributes) as the training set. When these 14 miRNAs were input into the established ML model, all 14 were classified as associated with PDAC, giving 100% prediction accuracy.

A separate miRNA dataset of an unrelated disease was also tested with the model. From an open source search, 21 plasma miRNAs associated with sarcopenic obesity were obtained (listed in online suppl. Table S4) [26]. For these 21 miRNAs, we generated a dataset with the same descriptors (226 attributes) as the training set and input it into our model. The classification accuracy for the sarcopenic obesity dataset was only 52.4%, which supports that the generated PDAC model can differentiate between PDAC and other diseases.
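The reported percentages can be translated back into miRNA counts with a short back-of-the-envelope check. The counts printed below are inferred from the percentages and dataset sizes given in the text; they are not stated by the authors, and the helper `counts_matching` is a hypothetical name.

```python
def counts_matching(percent, total):
    """Whole-number counts out of `total` that round to `percent` (one decimal)."""
    return [c for c in range(total + 1) if round(100 * c / total, 1) == percent]


print(counts_matching(88.4, 138))   # cross-validation on 138 training miRNAs
print(counts_matching(100.0, 14))   # independent PDAC test set
print(counts_matching(52.4, 21))    # sarcopenic obesity test set
```

Under this reading, 88.4% corresponds to 122 of 138 training miRNAs, 100% to 14 of 14, and 52.4% to 11 of 21, so the last figure sits close to the 50% expected from a coin flip on a two-class problem.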

PDAC is an extremely aggressive malignancy with the highest mortality rate of all major cancers. Because of the lack of early symptoms and rapid progression, most PDAC patients are diagnosed at advanced stages. Therefore, accurate diagnosis is vital for improving the survival of PDAC patients. For better diagnostic methods, miRNAs and ML are increasingly being used to identify biomarkers. In recent years, circulating miRNAs have become promising biomarkers due to their stability in the blood, ease of non-invasive detection, and convenient screening methods [29]. However, the circulating miRNA biomarkers of PDAC reported by different studies vary. Thus, it is critical to use a significant number of circulating miRNAs associated with PDAC to build a diagnostic model. ML, a useful tool for analysis of large and complex data sets, has enabled us to evaluate powerful diagnostic biomarkers and models for PDAC. In this study, we collected blood-based miRNA biomarkers of PDAC reported over the past two decades, used comprehensive data mining according to miRNA profiling, and developed an ML system for PDAC diagnosis. The best classifier was Random Forest with a 10-fold cross-validation accuracy of 88.4%. Model performance evaluated using an independent PC dataset and a sarcopenic obesity dataset yielded accuracies of 100% and 52.4%, respectively, demonstrating that this model was trained to specifically recognize PDAC-related miRNAs.

The 69 PDAC-related miRNAs collected in this study targeted many genes implicated in several important PC and insulin signaling pathways such as PI3K-Akt, MAPK, Jak-STAT, p53, and TGFβ (Fig. 2a, b). In the KEGG PC pathway, the top miRNAs were hsa-miR-34a-5p and hsa-miR-16-5p, which targeted 21 and 17 genes, respectively; hsa-miR-155-5p and hsa-miR-21-5p each targeted 16 genes (online suppl. Table S1). Tang et al. [30] demonstrated that hsa-miR-34a inhibited PC growth by downregulating Snail1 and Notch1. Furthermore, Hidalgo-Sastre et al. [31] discovered that hsa-miR-34a suppressed PDAC development by modulating the immune microenvironment. Basu et al. [32] found that hsa-miR-16 acts as a tumor suppressor by attenuating PDAC cell growth through downregulation of the anti-apoptotic gene BCL2. hsa-miR-21 was discovered to be overexpressed in PDAC patients and was further associated with poor prognosis; it regulated the stemness of PDAC cells and thus played roles in mesenchymal transition [33]. Meanwhile, a study by Ma et al. [34] demonstrated that hsa-miR-21 promotes PDAC cell migration and mesenchymal transition by activating the Ras/ERK pathway.

hsa-miR-16-5p ranked as the most important miRNA in the insulin signaling pathway, targeting 26 genes. hsa-let-7b-5p and hsa-miR-155-5p were the second most involved, each targeting 18 genes (online suppl. Table S2). The hsa-let-7 family was dysregulated in PDAC patients, which may promote insulin receptor/insulin-like growth factor (IGF) signaling pathways that are key for PDAC development and progression [35]. A study by Wang and Gao [36] revealed that hsa-miR-155-5p activates the Akt/NF-κB pathway, which promotes immune evasion and PC cell proliferation. hsa-miR-155 is associated with the pathogenesis of both type 2 diabetes and PC, with diabetes being a known risk factor for PC [37]. Altogether, these top-ranked miRNAs play an important role in PC development and could be potential targets for PC treatment.

In this study, we developed an ML diagnostics model of PDAC using blood-based miRNA biomarkers and their features including the sequence, targeted genes, and pathways. Our model has the potential to classify whether a given miRNA is associated with PDAC based on its associated features. Future research should include the integration of more advanced machine learning algorithms and the inclusion of additional types of biomarkers, such as CA19-9 and CEA (carcinoembryonic antigen) [8], to enhance diagnostic power.

Despite the promising results, our study has several limitations. First, most miRNA biomarkers used in our model were identified among the patients with mixed stages of PDAC. Although our model can potentially determine whether a patient’s miRNA panel is associated with PDAC, it cannot determine whether the panel is associated with early or advanced stage of PDAC. Second, the sample size of miRNA datasets, though comprehensive, may still not capture the full spectrum of miRNA variability in PDAC before clinical use. Some newly discovered circulating miRNAs associated with PDAC were not included in the miRTarBase or DIANA-miRPath databases and were thus not included in our study. Third, although the miRNA biomarkers used for our model were derived from published studies and experimentally evaluated, the specificity of these miRNA biomarkers also needs further validation in larger, independent cohorts with standardized methodology. Additionally, the potential for false positives and false negatives in the current AI analyses must be acknowledged.

Future work will focus on evaluating the model’s performance, specifically in high-risk populations and in early-stage detection. High-risk factors for PDAC include age, family history, diabetes, obesity, chronic pancreatitis, and gene mutations of BRCA1, BRCA2, or ATM [38, 39]. For example, age is an important risk factor for the development of PC: people aged 65–74 are most frequently diagnosed, and 68% of patients were older than 65 [39]. In future studies, adding these high-risk factors to the ML model may help improve diagnostic accuracy. Blackford and colleagues reported that the 5-year overall survival in 2012 was 83.7% for stage 1A PDAC and 74.3% for stage 1B, decreasing to 13.3% for stage IIA, 15.5% for stage IIB, 3.2% for stage III, and 2.8% for stage IV [40]. Hence, it is critical to have a sensitive and non-invasive method to diagnose early-stage PDAC. Our next two goals are to build ML diagnostic models using miRNA profiles and other biomarkers associated with different PDAC stages, especially stages I and II, and to apply these models to individuals with one or more high-risk factors as a possible strategy for PDAC screening.

Additionally, we would like to use AI to address the role of miRNAs in different stages of PDAC, chemoresistance, and therapy response in order to identify prognostic biomarkers for predicting survival, monitoring surgical outcomes, and selecting effective drugs. To our knowledge, this is the first AI system created for potential diagnosis of PDAC using blood-based miRNAs as biomarkers. Future work will enhance its performance in PDAC diagnosis, especially for early-stage detection and for screening in high-risk populations. However, before the diagnostic system can be used in a clinical setting, a more comprehensive validation in larger cohort studies is needed.

An ethics statement was not required for this study type because no human or animal subjects or materials were used.

The authors declare that they have no competing interests.

The authors have not received any financial support.

J.Y.T. proposed the project, designed the study, collected the data, developed a machine-learning model, and wrote the manuscript; I.F.T., S.K., and V.L.K. proposed the study concept, methodology, supervised the project, and edited the manuscript. All authors approved the final manuscript.

The Python code used for this study is available upon request. Further inquiries can be directed to the corresponding author.

1. Siegel RL, Giaquinto AN, Jemal A. Cancer statistics, 2024. CA Cancer J Clin. 2024;74(1):12–49.
2. Sarantis P, Koustas E, Papadimitropoulou A, Papavassiliou AG, Karamouzis MV. Pancreatic ductal adenocarcinoma: treatment hurdles, tumor microenvironment and immunotherapy. World J Gastrointest Oncol. 2020;12(2):173–81.
3. Gao Z, Jiang W, Zhang S, Li P. The state of the art on blood MicroRNAs in pancreatic ductal adenocarcinoma. Anal Cell Pathol. 2019;2019:9419072.
4. Nienhüser H, Büchler MW, Schneider M. Resection of recurrent pancreatic cancer: who can benefit. Visc Med. 2022;38(1):42–8.
5. Klein-Brill A, Amar-Farkash S, Lawrence G, Collisson EA, Aran D. Comparison of FOLFIRINOX vs gemcitabine plus nab-paclitaxel as first-line chemotherapy for metastatic pancreatic ductal adenocarcinoma. JAMA Netw Open. 2022;5(6):e2216199.
6. Wei L, Yao K, Gan S, Suo Z. Clinical utilization of serum- or plasma-based miRNAs as early detection biomarkers for pancreatic cancer: a meta-analysis up to now. Medicine. 2018;97(35):e12132.
7. Ho PTB, Clark IM, Le LTT. MicroRNA-based diagnosis and therapy. Int J Mol Sci. 2022;23(13):7167.
8. Wnuk J, Strzelczyk JK, Gisterek I. Clinical value of circulating miRNA in diagnosis, prognosis, screening and monitoring therapy of pancreatic ductal adenocarcinoma: a review of the literature. Int J Mol Sci. 2023;24(6):5113.
9. Eid M, Karousi P, Kunovský L, Tuček Š, Brančíková D, Kala Z, et al. The role of circulating microRNAs in patients with early-stage pancreatic adenocarcinoma. Biomedicines. 2021;9(10):1468.
10. Xue J, Jia E, Ren N, Lindsay A, Yu H. Circulating microRNAs as promising diagnostic biomarkers for pancreatic cancer: a systematic review. Onco Targets Ther. 2019;12:6665–84.
11. Zou X, Wei J, Huang Z, Zhou X, Lu Z, Zhu W, et al. Identification of a six-miRNA panel in serum benefiting pancreatic cancer diagnosis. Cancer Med. 2019;8(6):2810–22.
12. Dittmar RL, Liu S, Tai MC, Rajapakshe K, Huang Y, Longton G, et al. Plasma miRNA biomarkers in limited volume samples for detection of early-stage pancreatic cancer. Cancer Prev Res. 2021;14(7):729–40.
13. Khan IA, Rashid S, Singh N, Rashid S, Singh V, Gunjan D, et al. Panel of serum miRNAs as potential non-invasive biomarkers for pancreatic ductal adenocarcinoma. Sci Rep. 2021;11(1):2824.
14. Huang L, Zhang L, Chen X. Updated review of advances in microRNAs and complex diseases: experimental results, databases, webservers and data fusion. Brief Bioinform. 2022;23(6):bbac397.
15. Luo Y, Peng L, Shan W, Sun M, Luo L, Liang W. Machine learning in the development of targeting microRNAs in human disease. Front Genet. 2022;13:1088189.
16. Ling L, Aldoghachi AF, Chong ZX, Ho WY, Yeap SK, Chin RJ, et al. Addressing the clinical feasibility of adopting circulating miRNA for breast cancer detection, monitoring and management with artificial intelligence and machine learning platforms. Int J Mol Sci. 2022;23(23):15382.
17. Korfiati A, Grafanaki K, Kyriakopoulos GC, Skeparnias I, Georgiou S, Sakellaropoulos G, et al. Revisiting miRNA association with melanoma recurrence and metastasis from a machine learning point of view. Int J Mol Sci. 2022;23(3):1299.
18. Arora A, Tsigelny IF, Kouznetsova VL. Laryngeal cancer diagnosis via miRNA-based decision tree model. Eur Arch Otorhinolaryngol. 2024;281(3):1391–9.
19. Kang W, Kouznetsova VL, Tsigelny IF. miRNA in machine-learning-based diagnostics of cancers. Cancer Screen Prev. 2022;1(1):32–8.
20. Huang HY, Lin YC, Cui S, Huang Y, Tang Y, Xu J, et al. miRTarBase update 2022: an informative resource for experimentally validated miRNA–target interactions. Nucleic Acids Res. 2022;50(D1):D222–30.
21. Tastsoglou S, Skoufos G, Miliotis M, Karagkouni D, Koutsoukos I, Karavangeli A, et al. DIANA-miRPath v4.0: expanding target-based miRNA functional analysis in cell-type and tissue contexts. Nucleic Acids Res. 2023;51(W1):W154–9.
22. Kozomara A, Birgaoanu M, Griffiths-Jones S. miRBase: from microRNA sequences to function. Nucleic Acids Res. 2019;47(D1):D155–62.
23. Frank E, Hall MA, Witten IH. Appendix B: the WEKA workbench. In: Witten IH, Frank E, Hall MA, Pal CJ, editors. Data mining: practical machine learning tools and techniques. 4th ed. Cambridge, Mass, USA: Morgan Kaufmann; 2016. p. 553–72.
24. Karasek P, Gablo N, Hlavsa J, Kiss I, Vychytilova-Faltejskova P, Hermanova M, et al. Pre-operative plasma miR-21-5p is a sensitive biomarker and independent prognostic factor in patients with pancreatic ductal adenocarcinoma undergoing surgical resection. Cancer Genom Proteom. 2018;15(4):321–7.
25. Vila-Navarro E, Duran-Sanchon S, Vila-Casadesús M, Moreira L, Ginès À, Cuatrecasas M, et al. Novel circulating miRNA signatures for early detection of pancreatic neoplasia. Clin Transl Gastroenterol. 2019;10(4):e00029.
26. Dowling L, Duseja A, Vilaca T, Walsh JS, Goljanek-Whysall K. MicroRNAs in obesity, sarcopenia, and commonalities for sarcopenic obesity: a systematic review. J Cachexia Sarcopenia Muscle. 2022;13(1):68–85.
27. Mutgan AC, Besikcioglu HE, Wang S, Friess H, Ceyhan GO, Demir IE. Insulin/IGF-driven cancer cell-stroma crosstalk as a novel therapeutic target in pancreatic cancer. Mol Cancer. 2018;17(1):66.
28. Deng J, Guo Y, Du J, Gu J, Kong L, Tao B, et al. The intricate crosstalk between insulin and pancreatic ductal adenocarcinoma: a review from clinical to molecular. Front Cell Dev Biol. 2022;10:844028.
29. Khan IA, Saraya A. Circulating MicroRNAs as noninvasive diagnostic and prognostic biomarkers in pancreatic cancer: a review. J Gastrointest Cancer. 2023;54(3):720–30.
30. Tang Y, Tang Y, Cheng YS. miR-34a inhibits pancreatic cancer progression through Snail1-mediated epithelial-mesenchymal transition and the Notch signaling pathway. Sci Rep. 2017;7:38232.
31. Hidalgo-Sastre A, Lubeseder-Martellato C, Engleitner T, Steiger K, Zhong S, Desztics J, et al. mir34a constrains pancreatic carcinogenesis. Sci Rep. 2020;10(1):9654.
32. Basu A, Jiang X, Negrini M, Haldar S. MicroRNA-mediated regulation of pancreatic cancer cell proliferation. Oncol Lett. 2010;1(3):565–8.
33. Mortoglou M, Miralles F, Arisan ED, Dart A, Jurcevic S, Lange S, et al. microRNA-21 regulates stemness in pancreatic ductal adenocarcinoma cells. Int J Mol Sci. 2022;23(3):1275.
34. Ma Q, Wu H, Xiao Y, Liang Z, Liu T. Upregulation of exosomal microRNA-21 in pancreatic stellate cells promotes pancreatic cancer cell migration and enhances Ras/ERK pathway activity. Int J Oncol. 2020;56(4):1025–33.
35. Nweke EE, Brand M. Downregulation of the let-7 family of microRNAs may promote insulin receptor/insulin-like growth factor signalling pathways in pancreatic ductal adenocarcinoma. Oncol Lett. 2020;20(3):2613–20.
36. Wang S, Gao Y. Pancreatic cancer cell-derived microRNA-155-5p-containing extracellular vesicles promote immune evasion by triggering EHF-dependent activation of Akt/NF-κB signaling pathway. Int Immunopharmacol. 2021;100:107990.
37. Kozłowska M, Śliwińska A. The link between diabetes, pancreatic tumors, and miRNAs: new players for diagnosis and therapy. Int J Mol Sci. 2023;24(12):10252.
38. Park W, Chawla A, O’Reilly EM. Pancreatic cancer: a review. JAMA. 2021;326(9):851–62.
39. Wang C, Cai H, Cai Q, Wu J, Stolzenberg-Solomon R, Guo X, et al. Circulating microRNAs in association with pancreatic cancer risk within 5 years. Int J Cancer. 2024;155(3):519–31.
40. Blackford AL, Canto MI, Klein AP, Hruban RH, Goggins M. Recent trends in the incidence and survival of stage 1A pancreatic cancer: a surveillance, epidemiology, and end results analysis. J Natl Cancer Inst. 2020;112(11):1162–9.