Introduction: Because no universal cytological classification system for lung cancer had been established, the Japanese Lung Cancer Society and the Japanese Society of Clinical Cytology (JSCC) jointly established and reported four cytological categories: negative for malignancy, atypical cells, suspicious for malignancy, and malignancy. In 2022, the WHO Reporting System for Lung Cytopathology was published. This system presented five cytological categories: the four categories above plus insufficient/inadequate/nondiagnostic. However, creating a classification alone does not make it practical in actual clinical practice. We therefore evaluated the reproducibility of the classification through a tutorial and identified the issues involved in its wide dissemination. Methods: Forty-two cases were selected from those used in previously published articles, and diagnosis and tutorial systems were created. The first diagnostic round, the tutorial, and the second diagnostic round were conducted online. Participants were recruited via the JSCC website and by email. Using images (×100 and ×400) of the lesions to be diagnosed, participants categorized each case by 4 cytological categories (benign, atypical, suspicious for malignancy, malignant), 7 suggestive pathological diagnoses, and the basis for diagnosis (4 cytological features). The mean correct and incorrect answer rates for the 42 cases and the mean correct response rates for the 105 participants were compared between the first and second rounds using McNemar's test and t tests to identify cases with diagnostic difficulties and cases with high tutorial effects. Results: In the per-case comparison of correct responses for the cytological categories, 17 of 42 cases improved significantly. The mean number of correct answers for the four cytological categories increased significantly from 16.0 (38.1%) in the first round to 20.3 (48.3%) in the second round (p < 0.001). 
For the seven suggestive pathological diagnoses, the mean number of correct answers increased significantly from 20.3 (48.3%) in the first round to 25.1 (59.8%) in the second round (p < 0.001). The mean number of correct responses per case increased significantly from 40.2 (38%) in the first round to 51.5 (49%) in the second round (p = 0.0147). Four cases showed low concordance even after the tutorial, and three cases were highly affected by the tutorial. Nuclear findings were the most important basis for diagnosis in both the first and second rounds. Conclusion: Comprehensive tutorials on diagnostic criteria are needed to implement this system effectively worldwide. In particular, devising ways to appropriately diagnose cancers with mild atypia or without characteristic morphology is important.

Histological and cytological classifications are essential for diagnosing and managing lung cancer. Cytology specimens of lung cancer are useful for morphological and genetic diagnosis even when tissue samples are unavailable. However, unlike histological classification, no universal category classification for lung cytopathology was available. Instead, the Papanicolaou Society of Cytopathology in the USA published a category classification [1‒3], and Japan used a separate cytological classification system [4, 5]; categorization therefore differed between countries. A common cytological classification system that could be used in articles and in presentations at academic conferences was needed. Therefore, the Japanese Lung Cancer Society and the Japanese Society of Clinical Cytology (JSCC) jointly developed a lung cytological classification system comprising the categories negative for malignancy, atypical cells, suspicious for malignancy, and malignancy [6]. Yoshizawa et al. performed a follow-up study using these four categories plus benign tumors and confirmed the usefulness of this system [7]. In 2022, the WHO Reporting System for Lung Cytopathology was published. This system was based on the Japanese system and included the following cytology categories: insufficient/inadequate/nondiagnostic, benign, atypical, suspicious for malignancy, and malignant [8‒10]. The diagnostic categories in a cytological reporting system must be linked to diagnostic management recommendations to improve communication with clinicians and support patient care. The WHO reporting system must also be truly international to serve the needs of patients worldwide in settings with widely differing medical resources.

However, creating a classification system does not by itself make it practical in actual clinical practice. We therefore evaluated the reproducibility of the classification system through a tutorial and investigated the issues associated with its wide dissemination. This study is the first reproducibility test of the WHO Reporting System for Lung Cytopathology and is the first step toward international standardization.

Study Cohort

Forty-two cases from our previous analyses were selected for this study [6, 7]. All cases were histologically confirmed by biopsy or surgical specimens. The cases were selected through discussions among four cytopathologists and six cytotechnologists and included both easy-to-diagnose and difficult-to-diagnose cases. The selected cases included benign, inflammatory, and malignant lesions and, by cytological category, comprised 10 benign, 12 atypical, 12 suspicious for malignancy, and 8 malignant cases. The specimens consisted of 26 brushing samples, 6 touch preparations of tumors, 4 lavage specimens, 3 sputum samples, 2 transbronchial aspiration samples, and 1 washing forceps specimen. All specimens were fixed in 95% ethanol and Papanicolaou-stained.

No insufficient/inadequate/nondiagnostic case was included in this study. The correct answers for these cases were established in the previous studies [6, 7]. The cases are listed in Table 1.

Table 1.

Case summary

Case No. | Cytological category | 1st round* | 2nd round** | p value | Pathological diagnosis | 1st round* | 2nd round** | p value
1 | Suspicious for malignancy | 15 (14.3%) | 29 (27.6%) | 0.022 | SQCC | 32 (30.5%) | 49 (46.7%) | 0.017
2 | Benign | 75 (71.4%) | 83 (79%) | 0.170 | Normal cell | – | – | –
3 | Atypical | 11 (10.5%) | 46 (43.8%) | <0.001 | Reactive | – | – | –
4 | Malignant | 44 (41.9%) | 50 (47.6%) | 0.471 | Mucoepidermoid carcinoma | 22 (21.0%) | 40 (38.1%) | 0.007
5 | Atypical | 38 (36.2%) | 51 (48.6%) | 0.061 | Granuloma | – | – | –
6 | Suspicious for malignancy | 15 (14.3%) | 36 (34.3%) | 0.001 | ADC | 61 (58.1%) | 72 (68.6%) | 0.136
7 | Benign | 87 (82.9%) | 93 (88.6%) | 0.264 | Interstitial pneumonia | – | – | –
8 | Suspicious for malignancy | 26 (24.8%) | 38 (36.2%) | 0.067 | ADC | 79 (75.2%) | 82 (78.1%) | 0.710
9 | Atypical | 28 (26.7%) | 28 (26.7%) | 1.000 | Interstitial pneumonia | – | – | –
10 | Suspicious for malignancy | 19 (18.1%) | 34 (32.4%) | 0.021 | ADC | 41 (39.0%) | 47 (44.9%) | 0.361
11 | Malignant | 60 (57.1%) | 72 (68.6%) | 0.074 | Adenoid cystic carcinoma | 48 (45.7%) | 75 (71.4%) | <0.001
12 | Atypical | 35 (33.3%) | 53 (50.5%) | 0.010 | Interstitial pneumonia | – | – | –
13 | Suspicious for malignancy | 28 (26.7%) | 39 (37.1%) | 0.127 | ADC | 70 (66.7%) | 75 (71.4%) | 0.499
14 | Malignant | 69 (65.7%) | 74 (70.5%) | 0.472 | ADC | 87 (82.9%) | 87 (82.9%) | 1.000
15 | Benign | 31 (29.5%) | 59 (56.2%) | <0.001 | Sclerosing pneumocytoma | – | – | –
16 | Atypical | 31 (29.5%) | 39 (37.1%) | 0.302 | Radiation pneumonia | – | – | –
17 | Suspicious for malignancy | 33 (31.4%) | 43 (41%) | 0.155 | ADC | 39 (37.1%) | 45 (42.9%) | 0.391
18 | Benign | 91 (86.7%) | 96 (91.4%) | 0.302 | Sarcoidosis | – | – | –
19 | Atypical | 30 (28.6%) | 48 (45.7%) | 0.007 | Organizing pneumonia | – | – | –
20 | Suspicious for malignancy | 35 (33.3%) | 42 (40%) | 0.324 | ADC | 70 (66.7%) | 80 (76.2%) | 0.078
21 | Malignant | 94 (89.5%) | 93 (88.6%) | 1.000 | ADC | 69 (57.1%) | 65 (61.9%) | 0.499
22 | Benign | 98 (93.3%) | 96 (91.4%) | 0.752 | Hamartoma | – | – | –
23 | Benign | 47 (44.8%) | 58 (55.2%) | 0.100 | Granuloma | – | – | –
24 | Suspicious for malignancy | 18 (17.1%) | 34 (32.4%) | 0.008 | SQCC | 25 (23.8%) | 38 (36.2%) | 0.031
25 | Malignant | 94 (89.5%) | 89 (84.8%) | 0.267 | SCLC | 100 (95.2%) | 90 (85.7%) | 0.024
26 | Suspicious for malignancy | 26 (24.8%) | 40 (38.1%) | 0.040 | ADC | 58 (55.2%) | 77 (73.3%) | 0.001
27 | Atypical | 31 (29.5%) | 47 (44.8%) | 0.021 | Hypersensitivity pneumonitis | – | – | –
28 | Benign | 33 (31.4%) | 51 (48.6%) | 0.004 | Meningioma | – | – | –
29 | Malignant | 63 (60%) | 67 (63.8%) | 0.571 | Carcinoid | 66 (62.9%) | 80 (76.2%) | 0.014
30 | Atypical | 11 (10.5%) | 30 (28.6%) | 0.002 | Eosinophilic pneumonia | – | – | –
31 | Suspicious for malignancy | 20 (19%) | 32 (30.5%) | 0.067 | ADC | 88 (83.8%) | 91 (86.7%) | 0.579
32 | Atypical | 34 (32.4%) | 44 (41.9%) | 0.155 | Acute pneumonia | – | – | –
33 | Benign | 29 (27.6%) | 35 (33.3%) | 0.307 | Papilloma | – | – | –
34 | Suspicious for malignancy | 36 (34.3%) | 38 (36.2%) | 0.871 | ADC | 88 (83.8%) | 90 (85.7%) | 0.480
35 | Atypical | 25 (23.8%) | 39 (37.1%) | 0.035 | Acute interstitial pneumonia | – | – | –
36 | Malignant | 68 (64.8%) | 79 (75.2%) | 0.063 | SQCC | 99 (94.3%) | 104 (99.0%) | 0.074
37 | Atypical | 26 (24.8%) | 47 (44.8%) | 0.002 | Interstitial pneumonia | – | – | –
38 | Benign | 7 (6.7%) | 15 (14.3%) | 0.061 | Granuloma | – | – | –
39 | Malignant | 60 (57.1%) | 54 (51.4%) | 0.391 | ADC | 63 (60.0%) | 76 (72.4%) | 0.012
40 | Benign | 29 (27.6%) | 43 (41%) | 0.040 | Solitary fibrous tumor | – | – | –
41 | Atypical | 5 (4.8%) | 26 (24.8%) | <0.001 | Inflammation | – | – | –
42 | Suspicious for malignancy | 35 (33.3%) | 54 (51.4%) | 0.007 | ADC | 68 (64.8%) | 69 (65.7%) | 1.000

McNemar's test was used to analyze the difference in the number of correct responses for the cytological category. Correct responses for the 5 suggestive pathological diagnoses were analyzed in “suspicious for malignancy” and “malignant” cases only.

Values are the number (percentage) of correct responses among the 105 participants.

McNemar’s test.

*1st round: cytological diagnosis before tutorial.

**2nd round: cytological diagnosis after tutorial.

Research Participants

Participants were members of JSCC and qualified as cytotechnologists or cytopathologists. Participants were recruited through advertisements on the JSCC website and through the mailing list of 10,921 JSCC members. Written informed consent for the collection of participant data was obtained when a request for participation was received. The following data about the participants were collected: participant name, facility name, age, facility type, years of cytology experience, job title, and specialty. Participant names and facility names were collected only to assign personal numbers for participation in the study. Personal data were anonymized by assigning a personal number.

Study Design

A web-based response system using WEBCAS (WOW WORLD Inc., Tokyo, Japan) was developed. For each of the 42 cases, one ×100 image and one ×400 image centered on the lesion to be diagnosed were included. Participants classified each case according to the following: 4 cytological categories (correct answers: 10 benign, 12 atypical, 12 suspicious for malignancy, and 8 malignant); 7 suggestive pathological diagnoses, meaning the histological type inferred from the morphology of the cytology specimen (correct answers: 5 benign tumors, 17 inflammatory changes, 13 adenocarcinomas [ADC], 3 squamous cell carcinomas [SQCC], 1 small-cell carcinoma [SCLC], 0 large-cell neuroendocrine carcinomas, and 3 other malignant tumors); and 4 choices for the first and second bases for the diagnosis (cell cluster, cytoplasmic, nuclear, or other findings). All answers were collected online; the response system for the first round was open for 4 weeks. One week after the end of the first round, the correct answers and the tutorial were presented for 2 weeks. The tutorial was a PDF file, available online for 2 weeks, that detailed the cytological findings and key observation points using case images from the study, allowing participants to study the material at their convenience. The second round started 4 weeks after the end of the presentation of correct answers and tutorial. The cases and questions in the second round were the same as those in the first round (Table 1; Fig. 1). This schedule was designed based on the Ebbinghaus forgetting curve of human memory [11].

Fig. 1.

Test schedule and participants. The outline of the study was announced by email and on the website of the Japanese Society of Clinical Cytology. Two weeks were set aside for accepting participants, and 180 applications were received. The response system was then open for 4 weeks, and the first round was conducted with 140 participants (77.8% of the applicants). One week after the end of the first round, the correct answers were presented for 2 weeks as an educational period. The second round was conducted 4 weeks after the end of this presentation and included 105 participants (58.3% of the applicants and 75% of the first-round participants).


Calculation Method

For each participant, the correct (or incorrect) answer rate was defined as the number of correct (or incorrect) answers divided by the total number of cases. For each case, the correct response rate was defined as the number of participants who answered correctly divided by the total number of participants.
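The two denominators (cases per participant versus participants per case) can be illustrated with a minimal Python sketch. It assumes, hypothetically, that the answers are stored as a participant-by-case matrix of booleans, where True means the participant's answer matched the reference answer; the function names and the toy data are illustrative only and are not the study's data or software.

```python
from typing import List, Sequence


def correct_answer_rates(responses: Sequence[Sequence[bool]]) -> List[float]:
    """Per participant: number of correct answers / total number of cases."""
    return [sum(row) / len(row) for row in responses]


def correct_response_rates(responses: Sequence[Sequence[bool]]) -> List[float]:
    """Per case: number of correct respondents / total number of participants."""
    n_participants = len(responses)
    n_cases = len(responses[0])
    return [
        sum(responses[p][c] for p in range(n_participants)) / n_participants
        for c in range(n_cases)
    ]


# Hypothetical toy matrix: rows = 3 participants, columns = 4 cases
# (the study itself used 105 participants x 42 cases).
toy = [
    [True, False, True, True],
    [True, True, False, False],
    [False, False, True, True],
]
print(correct_answer_rates(toy))    # one rate per participant
print(correct_response_rates(toy))  # one rate per case
```

When every participant answers every case, the mean of the per-participant rates and the mean of the per-case rates both equal the overall proportion of correct answers (7 of 12 in the toy matrix), which is consistent with the near-identical first-round figures reported in the Results (38.1% of 42 cases per participant and 38% of 105 participants per case).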

Assessment Factors

Before and after the tutorial, we compared the mean correct and incorrect answer rates for the 42 cases across participants and the mean correct response rates of the 105 participants across cases. Cases with low diagnostic concordance were extracted, and their diagnostic points were examined. To evaluate the impact of the tutorial by cytology experience, we analyzed changes in correct response rates after the tutorial according to years of experience (1–5 years, 19 participants; 6–10 years, 15; 11–20 years, 26; 21–30 years, 19; and >30 years, 26).

Statistics

McNemar’s test was used to analyze differences in the number of correct responses for the cytological category. One-sided paired Student t tests were used to compare the mean correct and incorrect answer rates and the mean correct response rates. To examine the effect of the tutorial on each category and to identify cases with low diagnostic concordance and cases with high tutorial effects, the first- and second-round correct response rates were compared for each of the 42 cases using McNemar’s test. A p value of <0.05 was considered significant.
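The per-case McNemar comparison and the paired t statistic can be sketched in Python as follows. The text does not state whether the exact (binomial) or the continuity-corrected chi-square form of McNemar's test was used, so both are shown; all function names and toy inputs are hypothetical. McNemar's test uses only the discordant pairs for a case: participants who answered correctly in round 1 but not in round 2 (b), and the reverse (c). For the one-sided paired t test, the sketch computes only the statistic and degrees of freedom, because the Python standard library has no t distribution; with SciPy installed, `scipy.stats.ttest_rel(after, before, alternative="greater")` would give the one-sided p value.

```python
import math
import statistics
from typing import Sequence, Tuple


def discordant_counts(first: Sequence[bool], second: Sequence[bool]) -> Tuple[int, int]:
    """Count discordant pairs for one case across the two rounds.

    b: correct in round 1, incorrect in round 2
    c: incorrect in round 1, correct in round 2
    """
    b = sum(1 for r1, r2 in zip(first, second) if r1 and not r2)
    c = sum(1 for r1, r2 in zip(first, second) if not r1 and r2)
    return b, c


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p value: binomial test of b vs. c with p = 0.5."""
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def mcnemar_chi2_cc(b: int, c: int) -> Tuple[float, float]:
    """Continuity-corrected McNemar chi-square statistic (1 df) and p value."""
    if b + c == 0:
        return 0.0, 1.0
    stat = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def paired_t(before: Sequence[float], after: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic for the mean of (after - before) and its df (n - 1)."""
    diffs = [y - x for x, y in zip(before, after)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # sample SD / sqrt(n)
    return statistics.mean(diffs) / se, n - 1


# Hypothetical example: 10 participants, one case; 8 switch from wrong to right.
round1 = [False] * 8 + [True] * 2
round2 = [True] * 10
b, c = discordant_counts(round1, round2)     # b = 0, c = 8
print(mcnemar_exact(b, c))                   # 2 / 2**8 = 0.0078125
print(mcnemar_chi2_cc(b, c))
print(paired_t([1, 2, 3, 4], [2, 4, 4, 6]))  # t = 3 * sqrt(3), df = 3
```

Participants who were correct (or incorrect) in both rounds do not affect either McNemar variant, which is why a case with a high but stable correct response rate can show no significant change.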

Study Participants

The first round had 180 applicants and 140 participants (77.8% of applicants). The second round had 105 participants (58.3% of applicants and 75% of first-round participants) (Fig. 1). Results from these 105 participants were analyzed.

Comparison of the Correct Answers by Experience

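When correct answers were compared by years of cytology experience, a significant difference was found only between participants with 1–5 years and those with 6–10 years of experience (p = 0.03).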

Comparison of the Correct Response for Cytological Category

As shown in Table 1, the number of correct responses for the cytological category improved significantly in 17 of 42 cases, including 8 of 12 atypical cases and 6 of 12 suspicious for malignancy cases.

Comparison of the Correct Answers per Participant

As shown in Table 2 and Fig. 2, the mean number of correct answers for the cytological categories increased from 16.0 of 42 cases (38.1%) in the first round to 20.3 of 42 cases (48.3%) in the second round (p < 0.001). For the benign, atypical, and suspicious for malignancy categories, the mean number and rate of correct answers increased significantly between rounds 1 and 2 (p < 0.001). For malignant cases, the mean number and rate of correct answers also increased between the first and second rounds, but the difference was not statistically significant (p = 0.223).

Table 2.

Mean number and percentage of correct and incorrect answers in each category among the 42 cases

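Cytological category (no. of cases) | Answer | Incorrect cytological category | 1st round* | 2nd round** | p value
All (42) | Correct | – | 16.0 (38.1%) | 20.3 (48.3%) | <0.001
Benign (10) | Correct | – | 5.0 (50%) | 6.0 (60%) | <0.001
Benign (10) | Incorrect | Atypical | 2.1 (21%) | 2.2 (22%) | 0.273
Benign (10) | Incorrect | Suspicious for malignancy | 1.1 (11%) | 1.0 (10%) | 0.157
Benign (10) | Incorrect | Malignant | 1.8 (18%) | 0.87 (8.7%) | <0.001
Atypical (12) | Correct | – | 2.9 (24.2%) | 5.0 (41.7%) | <0.001
Atypical (12) | Incorrect | Benign | 2.5 (21%) | 2.7 (22.5%) | 0.278
Atypical (12) | Incorrect | Suspicious for malignancy | 2.4 (20%) | 2.2 (18%) | 0.240
Atypical (12) | Incorrect | Malignant | 4.2 (35%) | 2.1 (17.5%) | <0.001
Suspicious for malignancy (12) | Correct | – | 2.9 (24.2%) | 4.4 (36.6%) | <0.001
Suspicious for malignancy (12) | Incorrect | Benign | 0.9 (7.5%) | 0.7 (5.8%) | 0.079
Suspicious for malignancy (12) | Incorrect | Atypical | 1.8 (15%) | 1.9 (15.8%) | 0.283
Suspicious for malignancy (12) | Incorrect | Malignant | 6.3 (52.5%) | 5.0 (41.7%) | 0.002
Malignant (8) | Correct | – | 5.3 (66.3%) | 5.5 (68.8%) | 0.223
Malignant (8) | Incorrect | Benign | 0.52 (6.5%) | 0.32 (4%) | 0.027
Malignant (8) | Incorrect | Atypical | 0.63 (7.9%) | 0.5 (6.25%) | 0.134
Malignant (8) | Incorrect | Suspicious for malignancy | 1.6 (20%) | 1.7 (21%) | 0.282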

Paired Student t test, one-sided test.

*1st round: cytological diagnosis before tutorial.

**2nd round: cytological diagnosis after tutorial.

Fig. 2.

The mean numbers of correct answers by category for the 105 participants in the first and second rounds are shown. The mean number of correct answers increased after the tutorial (significance set at p < 0.05).


Comparison of the Incorrect Answers per Participant

The incorrect answers for the 42 cases by cytological category are shown in Table 2. The mean numbers of benign (of 10) and atypical (of 12) cases miscategorized as malignant were significantly lower in the second round than in the first round (p < 0.001 for both). Of the 12 suspicious for malignancy cases, the mean number miscategorized as malignant was significantly lower in the second round than in the first (p = 0.002). Of the 8 malignant cases, the mean number miscategorized as benign was significantly lower in the second round than in the first (p = 0.027).

Comparison of the Correct Answer Rates by Suggestive Pathological Diagnosis

As shown in Table 3, for the suggestive pathological diagnoses, the mean number of correct answers for all participants increased significantly from 20.3 of 42 (48.3%) in the first round to 25.1 of 42 (59.8%) in the second round (p < 0.001). The mean numbers of correct answers for benign tumors, inflammatory changes, and cancer (the five malignant suggestive pathological diagnoses combined into one category) each increased significantly in the second round (p < 0.001).

Table 3.

Mean number and percentage of correct answers per case in the benign tumor, inflammation, and cancer groups among the 42 cases

Cytological diagnosis (no. of cases) | 1st round* | 2nd round** | p value
All (42) | 20.3 (48.3%) | 25.1 (59.8%) | <0.001
Benign tumor (5) | 2.3 (46%) | 3.0 (60%) | <0.001
Inflammation (17) | 6.0 (35.3%) | 8.1 (47.6%) | <0.001
Cancer (20) | 12.0 (60%) | 13.5 (67.5%) | <0.001

Paired Student t test, one-sided test.

*1st round: cytological diagnosis before tutorial.

**2nd round: cytological diagnosis after tutorial.

Comparison of the Correct Response Rates per Case

As shown in Table 4, the mean number of correct responses per case increased significantly from 40.2 of 105 (38%) in the first round to 51.5 of 105 (49%) in the second round (p = 0.015). By category, the mean numbers of correct responses for the 10 benign and 8 malignant cases increased after the tutorial, but the differences were not significant. In contrast, the mean number of correct responses for the 12 atypical cases increased significantly from 25.3 (24%) in the first round to 41.5 (39.5%) in the second round (p < 0.001), and that for the 12 suspicious for malignancy cases increased significantly from 25.5 (24.3%) to 38.3 (36.5%) (p < 0.001). The mean number of correct responses for the suggestive pathological diagnosis in the suspicious for malignancy and malignant cases increased after the tutorial, although the difference was not significant.

Table 4.

Mean number and percentage of correct responses per case among the 105 participants

Cytological category | 1st round* | 2nd round** | p value
All | 40.2 (38%) | 51.5 (49%) | 0.015
Benign | 52.5 (50%) | 62.8 (59.8%) | 0.228
Atypical | 25.3 (24%) | 41.5 (39.5%) | <0.001
Suspicious for malignancy | 25.5 (24.3%) | 38.3 (36.5%) | <0.001
Malignant | 69.0 (65.7%) | 72.0 (68.6%) | 0.359
Suggestive pathological diagnosis (suspicious for malignancy and malignant cases) | 63.7 (60.6%) | 71.5 (68.0%) | 0.123

Paired Student t test, one-sided test.

*1st round: cytological diagnosis before tutorial.

**2nd round: cytological diagnosis after tutorial.

Cases of Low Tutorial Effectiveness

McNemar’s test was performed for each case to analyze differences in the number of correct responses for the cytological category among the 105 participants before and after the tutorial. From each category, cases with low correct response rates and no significant difference before and after the tutorial (i.e., a poor tutorial effect) were selected (Table 1). Case 38 (granulomatous lesion, benign) had the lowest correct response rate among the benign cases (1st: 6.7% vs. 2nd: 14.3%, p = 0.061). Case 9 (interstitial pneumonia, atypical) had the same correct response rate in the first and second rounds (1st: 26.7% vs. 2nd: 26.7%, p = 1.000). The correct response rate in Case 34 (ADC, suspicious for malignancy) did not increase significantly after the tutorial (1st: 34.3% vs. 2nd: 36.2%, p = 0.871). Case 4 (mucoepidermoid carcinoma, malignant) had the highest correct response rate of these four cases, but the rate did not increase significantly after the tutorial (1st: 41.9% vs. 2nd: 47.6%, p = 0.471) (Fig. 3a–d).

Fig. 3.

Cases with low tutorial effects. a Case 38, granuloma in the benign category. The cells show no distinct variation in nuclear size and no hyperchromasia. The cytoplasm is somewhat thick and abundant, with accompanying squamous metaplasia; magnification: ×100 (ai) and ×400 (aii). b Case 9, interstitial pneumonia in the atypical category. The nuclei do not vary in size and the chromatin is uniform, but densely proliferating cells are observed. The cells are presumed to be bronchial epithelial cells, although no obvious cilia can be observed; magnification: ×100 (bi) and ×400 (bii). c Case 34, ADC in the suspicious for malignancy category. Cells with diverse chromatin patterns are irregularly stacked. ADC is presumed, but size variation and nuclear atypia are not obvious; magnification: ×100 (ci) and ×400 (cii). d Case 4, mucoepidermoid carcinoma in the malignant category. In the cell clusters, mucus-containing cells with eccentric nuclei lie adjacent to mucus-free cells, and the nuclear findings of the two cell types are similar. Mucoepidermoid carcinoma is presumed; magnification: ×100 (di) and ×400 (dii).


McNemar’s test was also performed to analyze differences in the number of correct responses for the five malignant suggestive pathological diagnoses (ADC, SQCC, SCLC, large-cell neuroendocrine carcinoma, and other malignancies) in the 20 suspicious for malignancy and malignant cases. A significant difference between rounds was detected in 8 of the 20 cases (Table 1). The correct response rate for Case 4 (mucoepidermoid carcinoma, malignant) was low but increased significantly after the tutorial (1st: 21.0% vs. 2nd: 38.1%, p = 0.007) (Fig. 3d). In Case 24 (SQCC, suspicious for malignancy), the correct response rate also increased significantly after the tutorial (1st: 23.8% vs. 2nd: 36.2%, p = 0.031) (Fig. 4a). Case 25 (SCLC, malignant) had a high correct response rate, but the rate decreased significantly after the tutorial (1st: 95.2% vs. 2nd: 85.7%, p = 0.024) (Fig. 4b).

Fig. 4.

Difficult-to-diagnose cases by histological type. a Case 24, SQCC in the suspicious for malignancy category. The background of the specimen is necrotic. In this case, irregularly stacked cell clusters and large atypical cells are seen. Estimating the histological classification is difficult; magnification: ×100 (ai) and ×400 (aii). b Case 25, SCLC in the malignant category. In this case, the chromatin is fine and granular, and the naked nucleus-like cells lacking cytoplasm appear as loosely bound aggregates. SCLC is presumed; magnification: ×100 (bi) and ×400 (bii).


Cases of High Tutorial Effectiveness

A McNemar’s test was performed for the cytological category of each case, and cases with a high tutorial effect (an increased number of correct responses) were selected (Table 1). The correct response rate for Case 15 (sclerosing pneumocytoma, benign) significantly increased (1st: 29.5% vs. 2nd: 56.2%, p < 0.001) (Fig. 5a). The correct response rate for Case 3 (rheumatoid lung, atypical) (Fig. 5b) significantly increased (1st: 10.5% vs. 2nd: 43.8%, p < 0.001). The correct response rate for Case 41 (inflammatory changes, atypical) (Fig. 5c) significantly increased (1st: 4.8% vs. 2nd: 24.8%, p < 0.001). The correct response rate for Case 6 (ADC, suspicious for malignancy) (Fig. 5d) significantly increased (1st: 14.3% vs. 2nd: 34.3%, p = 0.001).

Fig. 5.

The cases with high tutorial effects. a Case 15, sclerosing pneumocytoma in the benign category. In this specimen, vascular stroma lies at the center of the cell aggregates, and the cells show little variation in size. Cell density is increased. No obvious nuclear atypia is detected, and the cells appear as clusters of atypical cells resembling type II alveolar epithelium. Spindle cells are also seen. Sclerosing pneumocytoma, a benign tumor, is presumed; magnification: ×100 (ai) and ×400 (aii). b Case 3, rheumatoid lung in the atypical category. The background of this specimen is inflammatory. The cells are irregularly stacked, with marked variation in nuclear size and nuclear atypia. However, these cells are uniform without hyperchromasia, and normal ciliated columnar epithelial cells are present. Regenerative bronchial epithelial cells are presumed; magnification: ×100 (bi) and ×400 (bii). c Case 41, inflammation in the atypical category. Cell-dense aggregates are arranged in a fenestrated pattern with some stacking. Although the nuclei are enlarged, individual cells show no clear atypia, suggesting bronchial epithelial cells; magnification: ×100 (ci) and ×400 (cii). d Case 6, ADC in the suspicious for malignancy category. An inflammatory background is observed. The cell aggregates show irregular stacking. The cytoplasm is lacy, the nuclei vary in size, and small nucleoli are present. Nuclear atypia is also present, and the chromatin is fused. The cells are suspected to be ADC, but the case is categorized as suspicious for malignancy because of marked degeneration and vacuolation of the cytoplasm; magnification: ×100 (di) and ×400 (dii).

Most Important Basis for Diagnoses in Each Case

Nuclear findings were the most important basis for diagnoses in 31 of 42 cases (74%) in both the first and second rounds. The details of the most important basis for diagnosis are listed in Table 5.

Table 5.

Most important basis for categorization in each case

| Basis for categorization | First round: first basis | First round: second basis | Second round: first basis | Second round: second basis |
|---|---|---|---|---|
| Nuclear findings | 31 | 20* | 31 | 20* |
| Cell aggregate | 10 | 13* | 10 | 12* |
| Cytoplasmic findings | – | 10 | – | 12 |

*Includes cases in which an equal number of participants chose different bases.

The mean correct answer rates increased significantly after the tutorial. As shown in Table 2, the tutorial on the new reporting system for lung cytopathology was particularly effective for cases in the atypical and suspicious for malignancy categories. The tutorial also improved the correct answer rates for cases in the benign and malignant categories, but only to a limited extent. Part of the problem in reproducibility, especially for the intermediate categories, is that morphologic changes form a spectrum across which we draw artificial lines in an attempt to correlate with biological behavior [12‒14]. By contrast, benign and malignant lesions lie at the ends of this spectrum and are easier to distinguish morphologically.

The tutorial was not effective in improving diagnoses in some cases. For example, Case 38 was a benign granulomatous lesion. However, the participants may have categorized it as atypical or malignant on the basis of the nuclear findings, cytoplasmic thickness, and squamous metaplasia. Case 9, an interstitial pneumonitis case in the atypical category, was categorized as benign by most participants. The cells showed no variation in size and had homogeneous chromatin, but the image exhibited densely proliferating cells. The participants were probably misled by the former features into regarding the case as benign, whereas the dense cell proliferation indicates that it should be categorized as atypical. In Case 34 (suspicious for malignancy), almost one-third of the participants categorized it as malignant in both the first and second rounds, indicating that the tutorial had no effect. In this case, the irregular, densely proliferating cells showed varied chromatin patterns suggesting ADC. However, no clear variation in size or nuclear atypia was observed, so the case should have been categorized as suspicious for malignancy rather than malignant. Case 4 was a mucoepidermoid carcinoma. The nuclear findings were similar between the two cell types, and the participants may have been confused about the distinction between the malignant and suspicious for malignancy categories.

Meanwhile, the largest tutorial effect was observed in Case 15, a sclerosing pneumocytoma in the benign category. Although the cells lacked nuclear atypia, the high cell density may have discouraged participants from categorizing the case as benign in the first round, and many participants categorized it as malignant or suspicious for malignancy. The tutorial highlighted the lack of variation in cell size and the absence of nuclear atypia, and the percentage of correct responses increased accordingly in the second round.

The participants regarded nuclear findings as the most important diagnostic basis, which may have caused diagnostic confusion when nuclear atypia was mild. Cytological diagnoses should be based on differences in nuclear size and chromatin distribution. Reactive cells may also have more pleomorphic, binucleated, or enlarged nuclei than malignant cells, which may lead to overdiagnosis if the focus is solely on nuclear findings [15]. Specimens assigned to the atypical category show lesser degrees of cytomorphologic abnormality than those assigned to the suspicious for malignancy category [16]. In this study, the percentage of correct responses for the benign, atypical, and suspicious for malignancy categories increased when the tutorial highlighted the absence of increased chromatin and of nuclear atypia.

The main limitation of this study was that only two images per case were presented instead of the entire slide. In general, cytology specimens are examined for the target cells, background cells, necrosis, and mucin. However, because this study focused on the tutorial effects of the new classification system, we wanted the participants to focus only on the target cells and therefore presented only two images. In addition, because this study involved many cytology specialists, using virtual slides and a large number of cases would have lengthened the time required for case observation, made it difficult to maintain concentration, and possibly interfered with the specialists' routine work. The number of cases was therefore limited to 42, slightly fewer than in previous studies [17].

We showed that web-based tutorials are particularly effective for the benign, atypical, and suspicious for malignancy categories. For the malignant category, the correct answer and response rates were already high before the tutorial and remained high afterward; therefore, a tutorial of similar volume to ours should suffice for this category. For this reporting system to be used effectively in many countries, appropriate tutorials on the diagnostic criteria need to be provided and disseminated via the internet. In particular, detailed explanations and annotations of important points should be provided to enable the correct diagnosis of cancers with mild atypia and cancers lacking characteristic morphology.

We sincerely thank the 140 participants, especially the 105 cytotechnologists and cytopathologists who participated in both rounds of this study, and the JSCC secretariat. We also thank Dr. Satoru Shimizu of Tokyo Women's Medical University for advice on the statistical analysis of the data in this reproducibility study of the classification through tutorials.

The requirement for written informed consent was waived by the Ethics Committee of Tokyo Women’s Medical University for this retrospective study, which analyzed cytological specimens with limited clinical information (Approval No.: 4873-R; approval date: May 19, 2021).

The authors have no conflicts of interest to declare.

The Japan Lung Cancer Society and the Japanese Society of Clinical Cytology funded the development of the web-based response and tutorial system and the fee for the English editing service.

Study concept and design: Yuko Minami, Kenzo Hiroshima, Akihiko Yoshizawa, Akemi Takenaka, Reiji Haba, Kunimitsu Kawahara, Hirokuni Kakinuma, Yasuo Shibuki, Shinji Miyake, and Yukitoshi Satoh. Reviewing the cases: Yuko Minami, Akemi Takenaka, Hirokuni Kakinuma, Yasuo Shibuki, and Shinji Miyake. Management of images: Akemi Takenaka. Analysis and interpretation of data: Yuko Minami, Kenzo Hiroshima, Akihiko Yoshizawa, and Yukitoshi Satoh. Drafting of the manuscript: Yuko Minami. Critical revision of the manuscript: Yuko Minami, Kenzo Hiroshima, Akihiko Yoshizawa, Reiji Haba, Kunimitsu Kawahara, and Yukitoshi Satoh.

All data analyzed during this study are included in this article. Further inquiries can be directed to the corresponding author.

1. Suen KC, Abdul-Karim FW, Kaminsky DB, Layfield LJ, Miller TR, Spires SE, et al. Guidelines of the Papanicolaou Society of Cytopathology for the examination of cytologic specimens obtained from the respiratory tract. Diagn Cytopathol. 1999;21(1):61–9.
2. Layfield LJ, Baloch Z, Elsheikh T, Litzky L, Rekhtman N, Travis WD, et al. Standardized terminology and nomenclature for respiratory cytology: the Papanicolaou Society of Cytopathology guidelines. Diagn Cytopathol. 2016;44(5):399–409.
3. Layfield LJ, Baloch Z, editors. The Papanicolaou society of cytopathology system for reporting respiratory cytology. Definitions, criteria, explanatory notes, and recommendations for ancillary testing. USA: Springer; 2018.
4. The Japan Lung Cancer Society. General rule for clinical and pathological record of lung cancer. Tokyo: Kanehara; 1978.
5. The Japan Lung Cancer Society. General rule for clinical and pathological record of lung cancer. 8th ed, revised version. Tokyo: Kanehara; 2021.
6. Hiroshima K, Yoshizawa A, Takenaka A, Haba R, Kawahara K, Minami Y, et al. Cytology reporting system for lung cancer from the Japan Lung Cancer Society and Japanese Society of Clinical Cytology: an interobserver reproducibility study and risk of malignancy evaluation on cytology specimens. Acta Cytol. 2020;64(5):452–62.
7. Yoshizawa A, Hiroshima K, Takenaka A, Haba R, Kawahara K, Minami Y, et al. Cytology reporting system for lung cancer from the Japan Lung Cancer Society and the Japanese Society of Clinical Cytology: an extensive study containing more benign lesions. Acta Cytol. 2022;66(2):124–33.
8. International Academy of Cytology – International Agency for Research on Cancer – World Health Organization Joint Editorial Board. WHO reporting system for lung cytopathology. IAC-IARC-WHO cytopathology reporting systems series. 1st ed. Lyon (France): International Agency for Research on Cancer; 2022.
9. Schmitt FC, Bubendorf L, Canberk S, Chandra A, Cree IA, Engels M, et al. The World Health Organization reporting system for lung cytopathology. Acta Cytol. 2023;67(1):80–91.
10. Canberk S, Field A, Bubendorf L, Chandra A, Cree IA, Engels M, et al. A brief review of the WHO reporting system for lung cytopathology. J Am Soc Cytopathol. 2023;12(4):251–7.
11. Murre JMJ, Dros J. Replication and analysis of Ebbinghaus’ forgetting curve. PLoS One. 2015;10(7):e0120644.
12. Manucha V, Wang C, Huang Y. Non-small cell lung carcinoma subtyping on cytology without the use of immunohistochemistry - can we meet the challenge. Acta Cytol. 2012;56(4):413–8.
13. Jain D, Nambirajan A, Chen G, Geisinger K, Hiroshima K, Layfield LJ, et al. NSCLC subtyping in conventional cytology: results of the International Association for the Study of Lung Cancer Cytology Working Group Survey to determine specific cytomorphologic criteria for adenocarcinoma and squamous cell carcinoma. J Thorac Oncol. 2022;17(6):793–805.
14. Hoshi R, Furuta N, Horai T, Ishikawa Y, Miyata S, Satoh Y. Discriminant model for cytologic distinction of large cell neuroendocrine carcinoma from small cell carcinoma of the lung. J Thorac Oncol. 2010;5(4):472–8.
15. Huang CC, Collins B, Flint A, Michael CW. Pulmonary neuroendocrine tumors: an entity in search of cytologic criteria. Diagn Cytopathol. 2013;41(8):689–96.
16. Layfield LJ, Esebua M, Dodd L, Giorgadze T, Schmidt RL. The Papanicolaou Society of Cytopathology guidelines for respiratory cytology: reproducibility of categories among observers. CytoJournal. 2018;15:22.
17. Crescenzi A, Trimboli P, Basolo F, Frasoldati A, Orlandi F, Palombini L, et al. Exploring the inter-observer agreement among the members of the Italian consensus for the classification and reporting of thyroid cytology. Endocr Pathol. 2020;31(3):301–6.