Introduction: Lymph node metastasis is one of the most common routes of tumour spread. The presence or absence of lymph node involvement influences the cancer's stage, therapy, and prognosis. Integrating artificial intelligence systems into the postoperative histopathological diagnosis of lymph nodes is therefore an urgent need. Methods: Here, we propose a pan-origin lymph node cancer metastasis detection system. The system was trained on over 700 whole-slide images (WSIs) and is composed of two deep learning models that locate the lymph nodes and detect cancers. Results: It achieved an area under the receiver operating characteristic curve (AUC) of 0.958, with 95.2% sensitivity and 72.2% specificity, on 1,402 WSIs from 49 organs at the National Cancer Center, China. Moreover, we demonstrated that the system performed robustly on 1,051 WSIs from 52 organs from another medical centre, with an AUC of 0.925. Conclusion: Our research represents a step towards a pan-origin lymph node metastasis detection system that provides accurate pathological guidance by reducing the probability of missed diagnosis in routine clinical practice.

Lymph nodes (LNs), essential components of the immune system, are secondary lymphoid organs distributed throughout the human body [1]. Cancer in LNs either originates in the LNs themselves or, more often, metastasizes from primary organs [2]. LNs are generally the first site of metastasis for a variety of cancers and are critical for tumour staging and prognosis [3]. The degree of LN metastasis is considered a strong predictor of recurrence and survival [4‒7].

The histological diagnosis of LNs involves the accurate detection of malignant lesions metastasized from various organs, imposing considerable diagnostic pressure on pathologists in routine clinical practice [8]. Because metastatic cells can closely resemble normal lymphocytes, locating tumour cells in LNs is laborious, especially for small tumour foci. Moreover, the tumour, node, and metastasis staging system [9] requires pathologists to report the number of cancerous LNs at each location, which currently depends on manual effort. An artificial intelligence (AI) system to assist the LN diagnosis process is therefore urgently needed.

Recently, the rapid growth of deep learning [10] techniques in digital pathology has motivated the development of various systems for the histopathological diagnosis of primary organs, including the lung [11], stomach [12, 13], colorectum [14], and prostate [15‒20], among others [21]. Meanwhile, for LN metastasis, cancer detection models have been built for the breast [8, 22, 23], lung [24], stomach [25], colorectum [26, 27], and oesophagus [28]. Although these studies have proven the excellent performance and potential benefits of deep learning, models designed for single organs are not generally applicable in practical settings, since cancerous cells can originate from any organ. To address clinical needs, one could either build and combine models for every human organ or create a single pan-origin model.

In this research, we demonstrated the feasibility of building a deep learning model for pan-origin LN metastatic cancer detection and established an intelligent system for the histological diagnosis of LNs, as illustrated in Figure 1a. The deep learning model was trained on 745 (645 malignant) pixel-level annotated haematoxylin and eosin (H&E)-stained whole-slide images (WSIs) from the Cancer Hospital, Chinese Academy of Medical Sciences (CHCAMS), and CAMELYON16 [22]. The training dataset comprised LN WSIs from seven organs (see Fig. 1b). After the precise malignant areas and LN regions were circled using a self-developed iPad-based annotation system, deep learning models for cancer and LN detection were built. The cancer detection model achieved an area under the receiver operating characteristic (ROC) curve (AUC) of 0.958 on 1,402 WSIs from 49 organs. The robustness of the system was further confirmed by a multicentre test on 1,051 WSIs from 52 organs collected at an additional medical centre.

Fig. 1.

Framework of this research. a An illustration of pan-origin LN cancer metastasis detection pipeline. b Data and workflow for analysis of LN cancer metastasis with deep learning.


Ethical Approval

The study was approved by the National Cancer Center/CHCAMS and PLA General Hospital (PLAGH) Ethics Committees. Since the reports were anonymized, the Institutional Review Boards waived the requirement for informed consent. The data used in this research are part of standard-of-care hospital routine.

Datasets

The CHCAMS dataset was composed of three components: (1) the training set comprised 605 WSIs (520 malignant) collected from January 2017 to January 2019; (2) the validation set contained 610 WSIs (341 malignant) collected from April 2017 to April 2020 for model hyperparameter tuning; and (3) the test set contained 1,402 WSIs (768 malignant) collected from April 2017 to April 2020 for model performance evaluation. In all three sets, each formalin-fixed paraffin-embedded block corresponded to one slide, and all slides contained only LNs. The sets were meticulously selected to avoid any patient overlap, ensuring that the model's performance is not biased by repeated patient data. The detailed distribution is given in online supplementary Tables S1–S5 and Figures S1–S3 (for all online suppl. material, see https://doi.org/10.1159/000539010). In addition, the training dataset includes 140 WSIs from CAMELYON16 (Nos. 1 to 34, 36 to 58, and 61 to 110 in the training repository with the “Tumor” prefix, and Nos. 1, 9, 12 to 14, 18, 22, 27 to 32, 40, 41, 43, 44, 47 to 49, 52, 54, 57, 58, 68, 76, 79, 94, 97, 113, 115, 116, and 124 in the test repository; 125 malignant in total). The final training dataset therefore contains 745 WSIs (645 of them malignant).

The multicentre dataset comprises 1,051 WSIs (690 malignant) collected from January 2017 to March 2019 at PLAGH. Refer to online supplementary Tables S1 and S2 and Figure S4 for a comprehensive description and distribution of the data.

All the WSIs were obtained using Ventana DP200, Motic EasyScan 102, KF-Bio KF-PRO-005, UNIC PRECICE 610, and Leica Aperio AT2 scanners. The detailed distribution is given in online supplementary Figure S5.

Data Labelling

Using an iPad-based annotation system, pathologists from CHCAMS annotated 520 training and 341 validation malignant WSIs at the pixel level; i.e., all the malignant regions were labelled. The labels included “malignant” and “benign.” We also introduced “ignore” and “low-quality” labels to improve annotation flexibility; pixels with either of these two labels were excluded during training. The malignant WSIs in CAMELYON16 were also reviewed and annotated or revised. Meanwhile, for all the datasets, the LN regions of the WSIs were annotated. We accessed the WSIs using ThoSlide 2.4.0, a proprietary library.

During the annotating phase, a pathologist was first allocated a slide at random. The completed slide and notes were then forwarded to a second pathologist for evaluation. In the third and final phase, a senior pathologist would examine all the slides that had cleared the previous two rounds.

Training Data

The preprocessing was identical to our previous work [12]. The annotations were curves with no precise stroke order. During the data preparation phase, we picked closed curves and filled their enclosed regions to create pixel-level labels; in the case of stacked curves, the outermost curves were filled first. The Otsu technique was applied to each WSI's thumbnail to retrieve the tissue coordinates in the foreground.

The coordinates were then rescaled to the original zoom level to generate WSI-level coordinates, and training tiles were retrieved exclusively from tissue-covering coordinates. During training, the WSIs were divided into 320 × 320 pixel tiles at 20× magnification for cancer detection and 5× for LN region recognition. With pixel-level training tiles, we collected 2,967,470 (malignant: 544,606; benign: 2,422,864) and 215,774 (LN: 35,709; background: 180,065) training samples for cancer and LN detection, respectively. For LN region recognition, we resized all the WSI thumbnails to 640 × 640 pixels; thus, the training set size was equal to the number of slides, i.e., 745 in total.
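The foreground-extraction step above can be sketched in a few lines of NumPy; this is a minimal illustration of Otsu thresholding on a thumbnail and mapping foreground pixels back to tile origins, not the released pipeline code (the function names and the `scale`/`tile` parameters are our own):

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu threshold of an 8-bit greyscale image (maximizes between-class variance)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    total = hist.sum()
    w0 = np.cumsum(hist)                   # pixel count at or below each level
    w1 = total - w0                        # pixel count above each level
    mu = np.cumsum(hist * np.arange(256))  # cumulative intensity sum
    with np.errstate(divide="ignore", invalid="ignore"):
        var_between = (mu[-1] * w0 - mu * total) ** 2 / (w0 * w1)
    var_between[~np.isfinite(var_between)] = 0
    return int(np.argmax(var_between))

def foreground_tile_coords(thumb, scale, tile=320):
    """Map dark (tissue) thumbnail pixels to origins of tissue-covering tiles.

    `scale` is the zoom factor between the thumbnail and the target level.
    """
    fg = thumb <= otsu_threshold(thumb)    # H&E tissue is darker than glass
    ys, xs = np.nonzero(fg)
    coords = {(int(x) * scale // tile * tile, int(y) * scale // tile * tile)
              for x, y in zip(xs, ys)}
    return sorted(coords)
```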

Model Development

We constructed our deep learning models for cancer detection (including the individual-organ models) and LN region recognition using DeepLab v3 with the ResNet-50 architecture as its backbone (see online suppl. Fig. S6). As illustrated in online supplementary Figure S7, we optimized the atrous spatial pyramid pooling module to suit pathological diagnosis by changing the atrous rates of the 3 × 3 convolution layers from 6, 12, 18 to 2, 4, 6 and removing the image pooling information. These two deep learning models performed the segmentation task by classifying at the pixel level and therefore output pixel-level cancer and LN regions. For LN location, we leveraged YOLO v5 with default configurations; this object detection task outputs bounding boxes of the LN regions.
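To see the effect of the smaller atrous rates, note that a dilated k × k convolution spans k + (k − 1)(rate − 1) pixels. A quick sketch (the helper name is ours) shows how the reduced rates shrink the pooled context:

```python
def atrous_span(kernel, rate):
    """Effective receptive-field span (in pixels) of one dilated convolution."""
    return kernel + (kernel - 1) * (rate - 1)

# DeepLab v3's default ASPP rates vs. the reduced rates used here.
default_spans = [atrous_span(3, r) for r in (6, 12, 18)]  # spans 13, 25, 37
reduced_spans = [atrous_span(3, r) for r in (2, 4, 6)]    # spans 5, 9, 13
```

One plausible reading is that on 320 × 320 high-magnification tiles, the default spans pool context well beyond cell-scale structures, so smaller rates keep the multi-scale branches local.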

All models were developed in TensorFlow using the Momentum optimizer and trained on an Ubuntu server with four Nvidia GTX 1080Ti GPUs using data parallelism. For the semantic segmentation models, the batch size was 128, i.e., 32 samples per GPU per iteration. The initial learning rate was set to 1 × 10−3 and decayed by 0.5 every 20,000 (2,000 for LN region recognition) iterations. The models for cancer and LN detection were trained for 110,000 and 9,500 iterations, respectively. The LN region recognition model was trained on one GPU with a batch size of 10 for 8,000 iterations; its initial learning rate was set to 0.01 and decayed by 0.5 every 2,000 iterations.
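The stepwise decay schedule described above can be written as a one-line function (a sketch; `lr_at` is our name, not from the training code):

```python
def lr_at(step, base=1e-3, gamma=0.5, period=20000):
    """Learning rate after `step` iterations under stepwise decay:
    the base rate is halved once per `period` iterations."""
    return base * gamma ** (step // period)
```

For the cancer detection model this gives 1e-3 for the first 20,000 iterations, 5e-4 for the next 20,000, and so on; the LN region recognition model uses base=0.01, period=2000.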

Meanwhile, we built six cancer detection models trained with LN metastases from single organs (lung, thyroid, stomach, intestine, breast, and oesophagus), with identical training setups, to compare their performance against the pan-origin model. While deep learning models inherently possess the capacity to learn rather than simply memorize extensive training data, the possibility of overfitting during training was mitigated through data augmentation. Our approach applied random perturbations to patch brightness, contrast, hue (average colour), and saturation, with a maximum delta of 0.08. This data augmentation strategy significantly bolstered the model's generalizability across WSIs sourced from diverse medical centres.

The slide-level predictions for all models were obtained by averaging the top 1,000 pixel-level probabilities. One advantage of using a fully convolutional neural network architecture is that identical tile sizes are not needed for training and inference. To preserve more contextual information during the inference phase, we employed larger tiles of 2,000 × 2,000 pixels with a 10% overlap ratio, feeding 2,200 × 2,200 pixel tiles into the network while utilizing only the 2,000 × 2,000 pixel core region for the final prediction. The LN-level predictions were derived by masking the annotated LN area onto the predicted heatmap, with the average of the top 1,000 pixel-level probabilities as the LN-level predicted probability.
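The top-1,000 aggregation and the overlapped-tile inference can be sketched as follows (function names are ours; the 100-pixel margin per side corresponds to feeding 2,200 px tiles for a 2,000 px core):

```python
import numpy as np

def slide_probability(heatmap, k=1000):
    """Slide-level score: mean of the k highest pixel-level probabilities."""
    flat = np.sort(heatmap, axis=None)
    return float(flat[-k:].mean())

def core_crop(tile, margin=100):
    """Drop the overlapping border of an inference tile, keeping the core."""
    return tile[margin:-margin, margin:-margin]
```

An LN-level score is obtained the same way after zeroing the heatmap outside the annotated LN mask.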

The LN count was derived by post-processing. For the LN region recognition predictions, we first applied dilation to the predicted heatmap to close the small gaps between predicted tissue regions, and then grouped regions of connected pixels into predicted LNs. By masking the LN region predictions, the LN count was obtained. Online supplementary Tables S6 and S7 give the detailed WSI processing times for LN analysis on high- and low-configuration hardware.
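This post-processing can be sketched in pure NumPy/Python; `dilate` closes small gaps and `count_components` groups connected pixels into LNs (a simplified 4-connected flood fill with our own function names, standing in for whatever morphology library the released code uses):

```python
import numpy as np

def dilate(mask, iterations=1):
    """Binary dilation with a 3x3 structuring element."""
    m = mask.astype(bool)
    for _ in range(iterations):
        p = np.pad(m, 1)
        m = (p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1]
             | p[1:-1, :-2] | p[1:-1, 2:]
             | p[:-2, :-2] | p[:-2, 2:] | p[2:, :-2] | p[2:, 2:])
    return m

def count_components(mask):
    """Number of 4-connected components, via iterative flood fill."""
    m = mask.astype(bool).copy()
    h, w = m.shape
    n = 0
    for i in range(h):
        for j in range(w):
            if m[i, j]:
                n += 1
                stack = [(i, j)]
                while stack:
                    y, x = stack.pop()
                    if 0 <= y < h and 0 <= x < w and m[y, x]:
                        m[y, x] = False  # mark visited
                        stack += [(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)]
    return n
```

Two predicted fragments separated by a one-pixel gap count as two LNs before dilation and merge into one afterwards, which is exactly the effect the dilation step is meant to achieve.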

Evaluation Metrics

We used slide-level AUC, accuracy, sensitivity, and specificity to measure model performance, and accuracy, sensitivity, and specificity when comparing with human pathologists. These metrics were defined as follows:

Accuracy = (NTP + NTN)/(NTP + NTN + NFP + NFN), Sensitivity = NTP/(NTP + NFN), Specificity = NTN/(NTN + NFP),

where NTP, NTN, NFP, and NFN represent the numbers of true-positive, true-negative, false-positive, and false-negative slides, respectively. In cases where a WSI contains multiple LNs, our model processes each node separately, and the LN-level AUC reported in this study is the average of the AUCs across all the annotated LNs in the test set.
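For concreteness, the three count-based metrics can be computed as below (a sketch; the counts in the usage note are illustrative, not the study's actual tallies):

```python
def slide_metrics(n_tp, n_tn, n_fp, n_fn):
    """Accuracy, sensitivity, and specificity from slide-level counts."""
    return {
        "accuracy": (n_tp + n_tn) / (n_tp + n_tn + n_fp + n_fn),
        "sensitivity": n_tp / (n_tp + n_fn),      # true-positive rate
        "specificity": n_tn / (n_tn + n_fp),      # true-negative rate
    }
```

For example, 9 true positives, 8 true negatives, 2 false positives, and 1 false negative give an accuracy of 0.85, a sensitivity of 0.9, and a specificity of 0.8.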

Plots and Charts

All the statistics in our research were obtained with in-house Python scripts. Model performance was presented with ROC curves (1-specificity on the x-axis and sensitivity on the y-axis), pie charts, and bar charts drawn using the matplotlib package in Python. The sunburst charts for the data distribution were produced with Apache ECharts. The generalized confusion matrix was drawn manually.
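The ROC curves are built by sweeping a decision threshold over the slide-level probabilities; a minimal NumPy sketch of the point computation (the function name is ours, not from the in-house scripts):

```python
import numpy as np

def roc_points(labels, scores):
    """(1 - specificity, sensitivity) pairs for every decision threshold."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = (labels == 1).sum()
    neg = (labels == 0).sum()
    pts = [(0.0, 0.0)]
    for t in np.sort(np.unique(scores))[::-1]:
        pred = scores >= t                       # positive call at threshold t
        fpr = float((pred & (labels == 0)).sum() / neg)
        tpr = float((pred & (labels == 1)).sum() / pos)
        pts.append((fpr, tpr))
    return pts
```

The resulting points can be passed straight to matplotlib, and the area under the piecewise-linear curve is the AUC.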

Cancer Detection

On 1,402 test slides from CHCAMS, the deep learning model achieved slide/LN-level AUCs of 0.958/0.957 (sensitivity: 0.952/0.909; specificity: 0.722/0.854; accuracy: 0.848/0.869), as shown in Figure 2a and b. The model also showed consistent performance across different body parts (Fig. 2c, d).

Fig. 2.

Model performance. a ROC curve on the slide level. b ROC curve on the LN level. c The slide-level AUC, sensitivity, specificity, and accuracy on different body parts. d The LN-level AUC, sensitivity, specificity, and accuracy on different body parts.


The test set contained both carcinomas and other malignant tumours. The slide/LN-level AUCs of the model in identifying various tumours were as follows: squamous cell carcinoma (0.981/0.976), adenocarcinoma (0.953/0.956), neuroendocrine neoplasm (0.806/0.817), sarcomatoid carcinoma (0.828/0.959), urothelial carcinoma (0.864/0.999), sarcoma (0.967/0.963), and melanoma (0.767/0.906). As illustrated by the heatmaps in Figure 3a and b, the model performed well in identifying squamous cell carcinoma and adenocarcinoma. Figure 3a shows examples of metastatic classical squamous cell carcinomas of the skin and tongue with tumour keratinization. The model diagnosed the WSIs as positive based on viable tumour cells rather than keratinocytes. Examples of gastric adenocarcinoma with moderate differentiation and high-grade serous ovarian carcinoma are shown in Figure 3b(i) and (ii). The model focused on both tumour cells and structures. Moreover, the model also detected other malignant tumours. Figure 3c(i), (ii), and (iii) show three correctly classified WSIs containing large cell neuroendocrine carcinoma of the oesophagogastric junction, epithelioid sarcoma of the hip joint, and melanoma of the skin, respectively. More predictions are given in online supplementary Figures S8–S10.

Fig. 3.

Model predictions for different malignant tumour subtypes. a Squamous cell carcinoma: (i) skin and (ii) tongue. b Adenocarcinoma: (i) moderately differentiated gastric adenocarcinoma; (ii) high-grade serous carcinoma of the ovary. c Other malignant tumour: (i) large cell neuroendocrine carcinoma of the oesophagogastric junction; (ii) epithelioid sarcoma of the hip joint; (iii) melanoma of the skin.


We were also interested in the model's performance in detecting small metastases (24 WSIs/63 LNs in the test set), including micrometastases (0.2–2 mm) and isolated tumour cells (less than 0.2 mm). The model showed good performance in identifying small metastases, with an LN-level AUC of 0.995 (sensitivity: 0.952; specificity: 0.977; accuracy: 0.969). Examples are shown in Figure 4 (more examples are given in online suppl. Fig. S11).

Fig. 4.

Small metastasis detection. a High-grade serous carcinoma of the ovary. b Invasive micropapillary carcinoma of the breast. c Rectal adenocarcinoma.


Neoadjuvant Therapy

A more difficult situation for the deep learning model involves patients treated with preoperative chemotherapy or concurrent chemoradiotherapy (153 WSIs in the test set). The model achieved slide/LN-level AUCs of 0.922/0.941 (sensitivity: 0.905/0.886; specificity: 0.565/0.791; accuracy: 0.752/0.816) in this setting. Online supplementary Figure S12 shows an example of the successful identification of tumour cells within the background of neoadjuvant therapy treatment effects.

Pan-Origin versus Single-Organ Models

Comparing the pan-origin model with six single-organ models (the larynx model was excluded due to the limited number of training samples), we observed a significant performance improvement, with slide-level AUCs of 0.958 versus 0.990 for the lung, 0.990 for the thyroid, 0.990 for the stomach, 0.990 for the intestine, 0.866 for the breast, and 0.990 for the oesophagus. We further broke down model performance by body part, as shown in Figure 5a.

Fig. 5.

Generalizability of the deep learning model. a Pan-origin versus single-organ model performance on different body parts. b Slide-level ROC curve for test set from PLAGH. c LN-level ROC curve for test set from PLAGH.


External Test

To evaluate the reliability of the deep learning model at other hospitals and determine whether it generalizes across the variations introduced by different laboratories, we obtained a multicentre dataset of 1,051 WSIs from the Chinese PLAGH. As shown in Figure 5b and c, the slide- and LN-level AUCs for the PLAGH data were 0.925 and 0.913, respectively, confirming consistent performance.

Failure Analysis

It is instructive to analyse the failure cases, as given in Figure 6. In general, the false-positive cases (Fig. 6a) fall into two main groups, germinal centres and squeezed lymphocytes, while the false negatives (Fig. 6b) were mainly due to complex tumour circumstances.

Fig. 6.

Examples of failure cases. a False-positive predictions: (i) lymphoblast cells appearing in the germinal centre of LNs, (ii) squeezed lymphocytes, (iii) carbon deposition in lung LNs, (iv) tissue calcification, (v) blood lakes of LNs. b False-negative examples: (i) follicular component of papillary thyroid carcinoma, (ii) malignant melanoma rich in melanin, (iii) poorly differentiated carcinoma, (iv) poor scanning quality.


Diagnostic Result

To evaluate the ability of the model to detect LNs and provide accurate diagnostic results, we used a generalized confusion matrix, as illustrated in Figure 7. The confusion matrix was divided into outer and inner matrices. The outer matrix assessed the performance of the LN region recognition model and categorized the LN count results into the categories 0 to 5 and greater than 5. Due to the inherent subjectivity involved, the accuracy of LN region recognition was 0.721. When the LN region recognition was correct, the corresponding inner confusion matrices evaluated the performance of the cancer detection model. Diagnostic accuracies were calculated for WSIs containing one to five LNs, yielding 0.883, 0.668, 0.583, 0.664, and 0.534, respectively.
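The outer-matrix tally can be sketched as follows (a sketch with our own names; the bucket value 6 stands for the "greater than 5" category):

```python
from collections import Counter

def bucket(n):
    """Collapse an LN count into the outer-matrix categories 0-5 and >5."""
    return n if n <= 5 else 6  # 6 represents "greater than 5"

def outer_matrix(pairs):
    """Tally (ground-truth, predicted) LN counts into bucket pairs."""
    return Counter((bucket(t), bucket(p)) for t, p in pairs)
```

For example, a slide with 7 annotated and 9 predicted LNs lands in the (>5, >5) cell, counting as a correct outer-matrix prediction.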

Fig. 7.

Generalized confusion matrix for diagnostic performance. The outer confusion matrix is the one for LN detection, while the inner ones are for cancer detection.


Accurately diagnosing LN metastases requires years of training, and the shortage of pathologists leads to suboptimal patient care, including delayed cancer diagnoses and diagnostic errors. The dangers of overwork, diminishing quality, and diagnostic error have all been reported in pathology.

The application of deep learning in histopathology to automatically detect LN metastasis is relatively novel. Most of the literature has focused on clinical or radiomic features [29]. However, such clinical or radiomic features fail to reflect direct morphological changes due to metastasis, which can be revealed by WSIs. Deep learning models developed for specific organs, such as the breast [8, 22, 23] and oesophagus [28], are applicable under specific circumstances. However, LN metastasis from unknown primary tumours or rare histological subtypes is common in routine practice. To address clinical needs, we developed a deep learning-based system to localize and count LN metastases of different origins and histologic subtypes.

The model achieved high accuracy on LN cancer metastases originating from over 50 organs and revealed its power in detecting small metastases. In clinical practice, the detection of isolated tumour cells in H&E-stained slides is extremely challenging for pathologists. In the test set from CHCAMS, the model successfully detected isolated tumour cells in two cases missed by both junior and senior pathologists in the original diagnostic report (see Fig. 8). Since these cells can be identified without immunohistochemistry (IHC), the mistakes were likely due to the high workload of daily practice (approximately 200–300 slides per day). Implementing the system in the pathologists' diagnostic process would reduce the probability of missed diagnoses.

Fig. 8.

Missed cases by pathologists. a Ductal adenocarcinoma of the pancreas. b Squamous cell carcinoma of the cervix.


It is important to conduct a detailed analysis of the failure cases. As shown in Figure 6a(i), lymphoblast cells appearing in the germinal centre are much larger than normal lymphocytes and morphologically similar to metastases. Pathologists can distinguish germinal centres from metastases at the macro level, while the deep learning model only obtains information about microregions due to the small tile size during the training phase. Another group contains squeezed lymphocytes (Fig. 6a(ii)), which are highly curved and irregular, quite similar to metastatic regions. Moreover, carbon deposition in lung LNs (Fig. 6a(iii)), tissue calcification (Fig. 6a(iv)), and blood lakes (Fig. 6a(v)) were occasionally identified as positive signals.

Compared with false positives, it is more important to analyse false-negative cases and propose possible solutions for future research. As shown in Figure 6b(i), the model missed papillary thyroid carcinoma with a follicular component. Similarly, malignant melanoma with metastatic cell nests rich in melanin was missed (Fig. 6b(ii)); interestingly, tumour cells against a background with less melanin could be accurately detected (see Fig. 3c(iii)). Another case involved poorly differentiated carcinoma scattered among the lymphocytes (Fig. 6b(iii)), making it difficult to detect. In addition, WSI quality has a non-negligible influence on model performance; as shown in Figure 6b(iv), one WSI was blurred due to out-of-focus digitalization. These failure cases could be mitigated by enlarging the training set and imposing proper quality control.

We also investigated the single-organ models' performance on different body parts. Generally speaking, apart from the pan-origin model, which performed best overall, the model trained on breast LN data performed best, while the one trained on thyroid LN WSIs performed worst. From a morphological perspective, the tumour glands in breast cancer exhibit both glandular and solid structures, which resemble adenocarcinoma and squamous cell carcinoma, respectively; this gave the breast model an advantage in detecting malignant tumours from different body parts. On the other hand, the main histopathological subtype of thyroid cancer is papillary carcinoma, which is morphologically similar to adenocarcinoma, leading to a lower AUC on organs from the thorax than on other body parts. Similar situations occurred for the models trained on gastric and intestinal LN samples.

In addition to medical insights, there is a long-standing debate over whether a deep learning model for pathological diagnosis works by learning malignant or benign features; in other words, whether the model detects malignant tumours by learning cancer characteristics or by flagging them as abnormal. We term this ability generalizability and verified it by testing the models on subtypes outside the training set (online suppl. Fig. S13). In this research, the subtypes in the test set extended far beyond those in the training set, leading us to believe that the model learns more about what is benign than about what is malignant, and classifies abnormal cases as malignancies.

The next step is to put the pan-origin LN diagnostic model into clinical practice. Past work mainly focused on the screening and preliminary subtyping of primary organ H&E-stained slides. In the future, the next-generation pathological AI platform will provide diagnostic guidance for both primary organ and LN slides, combined with IHC quantification, making it possible to offer complete assistance for surgical specimens. As illustrated in online supplementary Figure S14, one could upload all the H&E- and IHC-stained slides in a file package. After communicating with the pathology information system, the platform automatically identifies primary organ, LN, and IHC slides by slide number. The AI-assisted results could then be viewed on a single interface, providing richer diagnostic aids and clinical insights.

This study protocol was reviewed and approved by the Ethics Committees of National Cancer Center, Cancer Hospital, Chinese Academy of Medical Sciences, and Peking Union Medical College, approval number (20/172-2368). Since the reports were anonymized, the need for informed consent was waived by the Ethics Committees of National Cancer Center, Cancer Hospital, Chinese Academy of Medical Sciences, and Peking Union Medical College. The data used in this research are part of standard-of-care hospital routine.

Shuhao Wang is the co-founder and chief technology officer (CTO) of Thorough Future. Lang Wang is the algorithm researcher at Thorough Future. All remaining authors have declared no conflicts of interest.

This work is supported by CAMS Innovation Fund for Medical Sciences (No. 2021-I2M-C&T-A-017 to S.Z.); Beijing Hope Run Special Fund of Cancer Foundation of China (No. LC2017A07 to S.Z.); Special Research Fund for Central Universities, Peking Union Medical College, China (No. 2022-I2M-C&T-B-073 to S.Z.); National Natural Science Foundation of China (No. 81903019 to Y.P.); Capital’s Funds for Health Improvement and Research (No. 2020-2Z-4028 to L.L.); 2023 Science and Technology Projects of Qinghai Province, China (Basic Research Program) (No. 2023-ZJ-732 to S.W.); and Medical Big Data and Artificial Intelligence Project of the Chinese PLA General Hospital (to Z.S.).

S.Z., S.W., Z.S., and W.Z. proposed the research. Y.P., H.D., W.W., J.L., D.Q., Z.Y., J.J., Y.W., Q.F., L.L., and S.Z. performed the WSI annotation. Z.S. led the multicentre study. S.Z. and S.W. conducted the experiment. L.W. and S.W. wrote the deep learning code and performed the experiment. Q.L. conducted the statistical analysis. Y.P., H.D., and S.W. wrote the manuscript. S.Z., Z.S., and W.Z. reviewed the manuscript.

Additional Information

Yi Pan, Hongtian Dai, and Shuhao Wang contributed equally to this work.

The data that support the findings of this study are included in this article and its supplementary material files. The training code base for the deep learning framework is available at https://github.com/ThoroughFuture/PathFrame. The LN region recognition model and post-processing codes are available at https://github.com/ThoroughFuture/LNDetect. Further inquiries can be directed to the corresponding author (S.Z.).

1. Onder L, Ludewig B. A fresh view on lymph node organogenesis. Trends Immunol. 2018;39(10):775–87.
2. Fidler IJ. The pathogenesis of cancer metastasis: the “seed and soil” hypothesis revisited. Nat Rev Cancer. 2003;3(6):453–8.
3. Hur K, Han TS, Jung EJ, Yu J, Lee HJ, Kim WH, et al. Up-regulated expression of sulfatases (SULF1 and SULF2) as prognostic and metastasis predictive markers in human gastric cancer. J Pathol. 2012;228(1):88–98.
4. Bhattacharya P, Mukherjee R. Lymph node extracapsular extension as a marker of aggressive phenotype: classification, prognosis and associated molecular biomarkers. Eur J Surg Oncol. 2021;47(4):721–31.
5. Griebling TL, Ozkutlu D, See WA, Cohen MB. Prognostic implications of extracapsular extension of lymph node metastases in prostate cancer. Mod Pathol. 1997;10(8):804–9.
6. Veronese N, Nottegar A, Pea A, Solmi M, Stubbs B, Capelli P, et al. Prognostic impact and implications of extracapsular lymph node involvement in colorectal cancer: a systematic review with meta-analysis. Ann Oncol. 2016;27(1):42–8.
7. Beroukhim R, Mermel CH, Porter D, Wei G, Raychaudhuri S, Donovan J, et al. The landscape of somatic copy-number alteration across human cancers. Nature. 2010;463(7283):899–905.
8. Ehteshami Bejnordi B, Veta M, Johannes van Diest P, Van Ginneken B, Karssemeijer N, Litjens G, et al. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA. 2017;318(22):2199–210.
9. Amin MB, Edge SB, Greene FL. AJCC cancer staging manual. 8th ed. Springer Science & Business Media; 2017.
10. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436–44.
11. Coudray N, Ocampo PS, Sakellaropoulos T, Narula N, Snuderl M, Fenyö D, et al. Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning. Nat Med. 2018;24(10):1559–67.
12. Song Z, Zou S, Zhou W, Huang Y, Shao L, Yuan J, et al. Clinically applicable histopathological diagnosis system for gastric cancer detection using deep learning. Nat Commun. 2020;11(1):4294.
13. Zheng X, Wang R, Zhang X, Sun Y, Zhang H, Zhao Z, et al. A deep learning model and human-machine fusion for prediction of EBV-associated gastric cancer from histopathology. Nat Commun. 2022;13(1):2790.
14. Yu G, Sun K, Xu C, Shi XH, Wu C, Xie T, et al. Accurate recognition of colorectal cancer with semi-supervised deep learning on pathological images. Nat Commun. 2021;12(1):6311.
15. Ström P, Kartasalo K, Olsson H, Solorzano L, Delahunt B, Berney DM, et al. Artificial intelligence for diagnosis and grading of prostate cancer in biopsies: a population-based, diagnostic study. Lancet Oncol. 2020;21(2):222–32.
16. Bulten W, Pinckaers H, van Boven H, Vink R, de Bel T, van Ginneken B, et al. Automated deep-learning system for Gleason grading of prostate cancer using biopsies: a diagnostic study. Lancet Oncol. 2020;21(2):233–41.
17. Tolkach Y, Dohmgörgen T, Toma M, Kristiansen G. High-accuracy prostate cancer pathology using deep learning. Nat Mach Intell. 2020;2(7):411–8.
18. Nagpal K, Foote D, Tan F, Liu Y, Chen PHC, Steiner DF, et al. Development and validation of a deep learning algorithm for Gleason grading of prostate cancer from biopsy specimens. JAMA Oncol. 2020;6(9):1372–80.
19. Bulten W, Balkenhol M, Belinga JJA, Brilhante A, Çakır A, Egevad L, et al. Artificial intelligence assistance significantly improves Gleason grading of prostate biopsies by pathologists. Mod Pathol. 2021;34(3):660–71.
20. Nagpal K, Foote D, Liu Y, Chen PHC, Wulczyn E, Tan F, et al. Development and validation of a deep learning algorithm for improving Gleason scoring of prostate cancer. NPJ Digit Med. 2019;2(1):48.
21. Van der Laak J, Litjens G, Ciompi F. Deep learning in histopathology: the path to the clinic. Nat Med. 2021;27(5):775–84.
22. Litjens G, Bandi P, Ehteshami Bejnordi B, Geessink O, Balkenhol M, Bult P, et al. 1399 H&E-stained sentinel lymph node sections of breast cancer patients: the CAMELYON dataset. GigaScience. 2018;7(6):giy065.
23. Steiner DF, MacDonald R, Liu Y, Truszkowski P, Hipp JD, Gammage C, et al. Impact of deep learning assistance on the histopathologic review of lymph nodes for metastatic breast cancer. Am J Surg Pathol. 2018;42(12):1636–46.
24. Pham HHN, Futakuchi M, Bychkov A, Furukawa T, Kuroda K, Fukuoka J. Detection of lung cancer lymph node metastases from whole-slide histopathologic images using a two-step deep learning approach. Am J Pathol. 2019;189(12):2428–39.
25. Matsushima J, Sato T, Ohnishi T, Yoshimura Y, Mizutani H, Koto S, et al. The use of deep learning-based computer diagnostic algorithm for detection of lymph node metastases of gastric adenocarcinoma. Int J Surg Pathol. 2023;31(6):975–81.
26. Brockmoeller S, Echle A, Ghaffari Laleh N, Eiholm S, Malmstrøm ML, Plato Kuhlmann T, et al. Deep learning identifies inflamed fat as a risk factor for lymph node metastasis in early colorectal cancer. J Pathol. 2022;256(3):269–81.
27. Song JH, Hong Y, Kim ER, Kim SH, Sohn I. Utility of artificial intelligence with deep learning of hematoxylin and eosin-stained whole slide images to predict lymph node metastasis in T1 colorectal cancer using endoscopically resected specimens. J Gastroenterol. 2022;57(9):654–6.
28. Pan Y, Sun Z, Wang W, Yang Z, Jia J, Feng X, et al. Automatic detection of squamous cell carcinoma metastasis in esophageal lymph nodes using semantic segmentation. Clin Transl Med. 2020;10(3):e129.
29. Shaish H, Mutasa S, Makkar J, Chang P, Schwartz L, Ahmed F. Prediction of lymph node maximum standardized uptake value in patients with cancer using a 3D convolutional neural network: a proof-of-concept study. AJR Am J Roentgenol. 2019;212(2):238–44.