Abstract
Introduction: This study conducts the first meta-analysis to evaluate the diagnostic accuracy and the aggregated risk of malignancy associated with each category of the Papanicolaou Society of Cytopathology (PSC) system for reporting respiratory cytology. Methods: A systematic search was conducted in PubMed, Scopus, and Web of Science using the keywords “(Lung, Respiratory specimens) AND (Papanicolaou Society of Cytopathology System).” Articles were assessed for risk of bias using the QUADAS-2 tool. After excluding inadequate samples, sensitivity and specificity were calculated for various cut-off points. Summary receiver operating characteristic curves and diagnostic odds ratios were pooled to assess diagnostic accuracy. Results: Five studies, totaling 3,489 cases, were included. Sensitivity and specificity for the “Atypical and higher risk categories” considered positive were 60% (95% CI, 51–68%) and 87% (95% CI, 81–92%), respectively. For the “Suspicious for malignancy and higher risk categories” considered positive, sensitivity and specificity were 49% (95% CI, 40–58%) and 95% (95% CI, 92–97%), respectively. Sensitivity and specificity for the “Malignant” category considered positive for malignancy were 42% (95% CI, 33–52%) and 97% (95% CI, 92–99%), respectively. The pooled area under the curve ranged from 68 to 75% for each cut-off. Conclusion: This meta-analysis underscores the PSC system’s accuracy in reporting respiratory cytology. It highlights the diagnostic importance of the “Suspicious” and “Malignant” categories in identifying malignancy, and the utility of the “Atypical” category for initial screening. These findings support the PSC system’s role in enhancing diagnostic accuracy and clinical decision-making in respiratory cytology.
Introduction
Lung cancer was the most frequently diagnosed cancer in 2022, accounting for 2.5 million new cases, or 12.4% of all cancers globally [1]. Cytological evaluation of respiratory specimens is a rapid, minimally invasive, and cost-effective technique for evaluating malignancies and guiding patient management. However, respiratory cytology presents several diagnostic challenges, including distinguishing between reactive respiratory epithelial cells and malignant cells. The Papanicolaou Society of Cytopathology (PSC) system was proposed in 2016 to establish a standardized reporting format for respiratory cytology, enhancing communication between clinicians and pathologists. Additionally, each PSC category is linked to a specific risk of malignancy (ROM) and management recommendations [2‒4].
The PSC system categories include:
Category I: nondiagnostic
Category II: negative for malignancy
Category III: atypia of undetermined significance
Category IV: neoplastic
Category V: suspicious for malignancy
Category VI: malignant (M).
Since its introduction, numerous studies have assessed the utility of the PSC system in diagnosing respiratory cytology [4‒10]. However, a precise estimate of the sensitivity (SN) and specificity, in the form of a systematic review and meta-analysis of the diagnostic accuracy of the PSC system in respiratory cytology, is lacking.
This study aimed to estimate the diagnostic accuracy of the PSC system for diagnosing malignancy in respiratory cytology. The secondary objective was to evaluate the inadequacy rate of respiratory cytology specimens.
Materials and Methods
This systematic review and meta-analysis was registered with the Open Science Framework on June 1, 2024. The review protocol is accessible at https://osf.io/c98wz. The study population included any patient presenting to the hospital with a lung/hilar lesion, including lesions detected incidentally on radiological investigations. The Papanicolaou Society of Cytopathology system for reporting respiratory cytology served as the index test, providing criteria for the cytological evaluation and classification of respiratory cytology specimens based on cellular morphology and architectural features.
Histopathological examination of lung/hilar tissue obtained via core needle biopsy or surgical excision was used as the reference standard for diagnosing these lesions. In cases where histopathological correlation was unavailable, clinicoradiological follow-up was included. Outcome measures evaluated included SN and specificity in diagnosing malignancy per lesion in adequate respiratory cytology specimens for each category. Additionally, the study evaluated the rate of inadequate cytology specimens obtained using the PSC system.
There were four possible decision cut-offs to determine if a respiratory cytology specimen reported by PSC was positive for malignancy:
1. “Malignant” was considered positive.
2. “Suspicious of Malignancy” or higher could be considered positive for malignancy.
3. “Neoplastic” or any higher risk category was considered a positive test result for malignancy.
4. “Atypical” or any higher risk category was considered a positive test result for malignancy.
The SN and specificity of each diagnostic cut-off were assessed separately. The procedure’s accuracy was also evaluated using the area under the curve (AUC).
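As an illustration of how each cut-off translates into a 2×2 table, the following Python sketch dichotomizes the ordered PSC categories at a chosen threshold and computes SN and specificity. The function name `accuracy_at_cutoff`, the category labels, and all counts are hypothetical; they are not taken from the included studies.

```python
# Ordered PSC categories after excluding nondiagnostic samples,
# from lowest to highest risk (labels are illustrative).
CATEGORIES = ["negative", "atypical", "neoplastic", "suspicious", "malignant"]


def accuracy_at_cutoff(counts, cutoff):
    """Dichotomize PSC categories at `cutoff` and return the 2x2 table.

    `counts` maps category -> (malignant on reference, benign on reference).
    Categories at or above `cutoff` are treated as a positive test result.
    """
    threshold = CATEGORIES.index(cutoff)
    tp = fp = fn = tn = 0
    for rank, category in enumerate(CATEGORIES):
        malignant, benign = counts.get(category, (0, 0))
        if rank >= threshold:
            tp += malignant
            fp += benign
        else:
            fn += malignant
            tn += benign
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical single-study counts: (malignant, benign) per category.
example_counts = {
    "negative": (30, 70),
    "atypical": (6, 4),
    "neoplastic": (1, 1),
    "suspicious": (9, 1),
    "malignant": (54, 6),
}

for cutoff in ["malignant", "suspicious", "neoplastic", "atypical"]:
    result = accuracy_at_cutoff(example_counts, cutoff)
    print(cutoff, round(result["sensitivity"], 3), round(result["specificity"], 3))
```

Lowering the threshold moves cases from the false negative cell to the true positive cell (raising SN) and from the true negative cell to the false positive cell (lowering specificity), which is the trade-off the four cut-offs are meant to expose.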
A search of PubMed, Scopus, and Web of Science was conducted for studies published up to March 20, 2024, the final search date. Authors were contacted for full texts or supplementary information if necessary. The search strategy aimed to identify articles encompassing three key concepts: lung (target organ), fine needle aspiration biopsy (index test), and diagnostic accuracy (study type). These concepts were linked using the Boolean “AND” operator. Within each main concept, related search terms were connected using the Boolean “OR” operator to broaden the search. For example, in PubMed, the search was structured as ((Lung) OR (Respiratory specimens)) AND (PSC System for Reporting Respiratory Cytology). Search results were imported into Rayyan online software and screened independently by two investigators, with discrepancies resolved by consensus. Only journal articles were included, and the full texts of potentially eligible articles were reviewed. Studies evaluating respiratory cytology specimens using the PSC system in patients presenting with any lung/hilar lesion were included. Study selection was likewise carried out independently by two authors, with conflicts resolved through discussion. Data extraction was performed by one investigator and verified by a second, capturing study characteristics, patient demographics, index test details, reference standards, and outcome measures.
Statistical analysis was conducted using Stata (version 13) and RevMan 5.4. The selected articles were assessed for risk of bias using the QUADAS-2 tool, with conflicts resolved through discussion. The direction of bias was determined, and for flow and timing, the risk of bias for overestimating accuracy was evaluated. Meta-analysis for SN and specificity for each cut-off (“Atypical and above considered positive,” “Neoplastic and above considered positive,” “Suspicious of Malignancy and above considered positive,” and “Malignant considered positive”) was performed after excluding inadequate samples in each study. A random-effects single-proportion model was applied to estimate the pooled rate of each category, and between-study heterogeneity (τ²) was estimated using the maximum likelihood method. For diagnostic accuracy assessment, summary receiver operating characteristic curves were constructed, and the diagnostic odds ratio (DOR) was calculated. The summary receiver operating characteristic curves, their summary points (false positive rate on the horizontal axis and SN on the vertical axis), and the pooled AUC value were estimated using the summary ROC model. DORs were calculated from the true positive, true negative, false positive, and false negative counts of each study and were meta-analyzed using a random-effects model with the inverse variance method and the restricted maximum likelihood estimator for τ². Heterogeneity was evaluated using the I² statistic. Funnel plots were used to assess potential publication bias, and a likelihood ratio scatter matrix was constructed to evaluate the strength and consistency of the diagnostic accuracy estimates. A two-tailed p value of <0.05 was considered statistically significant.
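To make the DOR pooling step concrete, the sketch below computes a per-study log DOR with its variance and combines the studies with an inverse-variance random-effects model. It is a simplified illustration, not the analysis performed in Stata or RevMan: it uses the closed-form DerSimonian-Laird estimator of τ² in place of the restricted maximum likelihood estimator named above, the 0.5 continuity correction for zero cells is an assumption, and the function names and counts are hypothetical.

```python
import math


def log_dor(tp, fp, fn, tn):
    """Return (log DOR, variance) for one study's 2x2 table.

    Adds 0.5 to every cell when any cell is zero (an assumed convention).
    """
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))
    estimate = math.log((tp * tn) / (fp * fn))
    variance = 1 / tp + 1 / fp + 1 / fn + 1 / tn
    return estimate, variance


def pool_log_dor(studies):
    """Pool DORs with inverse-variance weights and a DerSimonian-Laird tau^2."""
    estimates, variances = zip(*(log_dor(*s) for s in studies))
    weights = [1 / v for v in variances]
    fixed = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)

    # Cochran's Q and the heterogeneity statistics derived from it.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    df = len(studies) - 1
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    re_weights = [1 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / sum(re_weights)
    se = math.sqrt(1 / sum(re_weights))
    return {
        "dor": math.exp(pooled),
        "ci_low": math.exp(pooled - 1.96 * se),
        "ci_high": math.exp(pooled + 1.96 * se),
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical (tp, fp, fn, tn) tables for three studies.
example_studies = [(40, 5, 60, 95), (55, 8, 45, 120), (30, 2, 50, 70)]
print(pool_log_dor(example_studies))
```

Pooling is done on the log scale because the log DOR is approximately normally distributed, and the result is exponentiated only at the end, which is why the confidence interval is asymmetric around the pooled DOR.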
Results
The PRISMA diagram for the study selection is presented in Figure 1. Five studies were identified; the distribution of the PSC categories, along with their ROM and inadequacy rates, is shown in Table 1. The true positive, true negative, false positive, and false negative counts for each study under each analysis (“Atypical considered positive,” “Neoplastic considered positive,” “Suspicious considered positive,” and “Malignant considered positive”) are presented in online supplementary Table 1 (for all online suppl. material, see https://doi.org/10.1159/000541139).
Author | Nondiagnostic, % of cases | Nondiagnostic, ROM (%) | Negative for malignancy, % of cases | Negative for malignancy, ROM (%) | Atypical, % of cases | Atypical, ROM (%) | Neoplastic, % of cases | Neoplastic, ROM (%) | Suspicious, % of cases | Suspicious, ROM (%) | Malignant, % of cases | Malignant, ROM (%) | Distribution of specimens |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Canberk et al. [5] (2018) | 16 | 64.01 | 53 | 48.27 | 5.4 | 59.09 | 0.4 | 100.00 | 2.1 | 90.00 | 23.1 | 89.74 | BAL/bronchial wash: 574 (45%), FNA: 551 (43%), bronchial brush: 83 (6%), sputum: 82 (6%) |
Khari et al. [6] (2020) | 43 | 29.40 | 14 | 21.40 | 10 | 100 | - | - | 6 | 100 | 8 | 100 | BAL: 7 (7%), bronchial wash: 66 (66%), bronchial brush: 27 (27%) |
Layfield and Esebua [7] (2021) | 1.7 | 33.90 | 73.3 | 36.90 | 8.1 | 66.67 | - | - | 3.9 | 81.25 | 12.9 | 96.20 | Bronchial wash: 672 (57%), bronchial brush: 511 (43%) |
Goel et al. [8] (2022) | 14.3 | 100 | 46 | 41.40 | 7.9 | 60.00 | - | - | - | - | 31.8 | 90.00 | Not specified |
Ardor et al. [9] (2024) | 2.6 | 42.80 | 64.1 | 31.20 | 5 | 43.90 | 0.4 | 0.00 | 4 | 87.90 | 23.9 | 94.30 | BAL/bronchial wash: 340 (41%), FNA: 384 (47%), bronchial brush: 92 (11%), sputum: 4 (0.5%) |
The risk of bias assessed for each study is given in Figure 2a, and the summary of the risk of bias assessment across all included studies is given in Figure 2b. The risk of bias in patient selection and the applicability concerns for patient selection, index test, and reference test were low in all studies. None of the studies reported whether the respiratory cytological diagnosis was made before the histopathological examination, so the reference test risk of bias was rated unclear in all studies. Three of the 5 studies had a high risk of flow bias.
Table 2 shows the pooled ROM associated with each category of the PSC system. The “Non-diagnostic,” “Benign,” “Atypical,” “Suspicious,” and “Malignant” categories were associated with a pooled ROM of 46% (95% CI, 30–63%), 36% (95% CI, 30–41%), 58% (95% CI, 45–70%), 85% (95% CI, 78–93%), and 95% (95% CI, 92–97%), respectively. The pooled ROM for the neoplastic category could not be evaluated because of insufficient data: 3 of the studies (Khari et al. [6], Layfield and Esebua [7], and Goel et al. [8]) did not have any cases in this category.
Categories | Number of studies pooled | ROM, % | 95% CI, % | τ² | τ | I², % |
---|---|---|---|---|---|---|
Inadequate | 5 | 46 | 30–63 | 0.02 | 0.141 | 68.32 |
Benign | 5 | 36 | 30–41 | 0.00 | 0.00 | 52.35 |
Atypical | 5 | 58 | 45–70 | 0.01 | 0.1 | 68.67 |
Neoplastic | - | - | - | - | - | - |
Suspicious | 5 | 85 | 78–93 | 0.00 | 0.0 | 0.0 |
Malignant | 5 | 95 | 92–97 | 0.00 | 0.00 | 15.31 |
The forest plots for SN, specificity and bivariate SN-specificity (SROC) plots for the analysis considering “Malignant” as positive for malignancy are depicted in Figures 3a and 4a respectively. SN was 42% (95% CI, 33–52%), and the specificity was 97% (95% CI, 92–99%). The pooled AUC was 68%, indicating sufficient diagnostic accuracy (Fig. 5a).
The forest plots for SN, specificity and bivariate SROC plots for the analysis considering “Suspicious for malignancy and higher risk categories” as positive for malignancy are depicted in Figures 3b and 4b respectively. SN was 49% (95% CI, 40–58%), and the specificity was 95% (95% CI, 92–97%). The pooled AUC was 71%, indicating good diagnostic accuracy (Fig. 5b).
The forest plots for SN, specificity and bivariate SROC plots for the analysis considering “Neoplastic and higher risk categories” as positive for malignancy are depicted in Figures 3c and 4c respectively. SN was 49% (95% CI, 40–58%), and the specificity was 95% (95% CI, 91–98%). The pooled AUC was 71%, indicating good diagnostic accuracy (Fig. 5c).
The forest plots for SN, specificity and bivariate SROC plots for the analysis considering “Atypical and higher risk categories” as positive for malignancy are depicted in Figures 3d and 4d respectively. SN was 60% (95% CI, 51–68%), and the specificity was 87% (95% CI, 81–92%). The pooled AUC was 75%, indicating good diagnostic accuracy (Fig. 5d).
The pooled DOR for “Malignant” considered positive was 23.68 (95% CI, 15.85–35.36), indicating a high level of diagnostic accuracy (Fig. 6a). The pooled DORs for “Suspicious and higher risk categories,” “Neoplastic and higher risk categories,” and “Atypical and higher risk categories” considered positive were 20.48 (95% CI, 14.69–28.58; Fig. 6b), 19.37 (95% CI, 13.99–26.79; Fig. 6c), and 10.31 (95% CI, 8.48–12.94; Fig. 6d), respectively, likewise indicating a high level of diagnostic accuracy.
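For orientation, the DOR implied by a single SN and specificity pair is the odds of a positive test in malignant cases divided by the odds of a positive test in non-malignant cases. The short Python sketch below applies this identity to the rounded pooled summary points quoted in the text; the function name `dor_from_sn_sp` is hypothetical. Because the inputs are rounded and the DORs above were pooled separately across studies, the implied values are expected to approximate the reported DORs rather than reproduce them.

```python
def dor_from_sn_sp(sn, sp):
    """Diagnostic odds ratio implied by one sensitivity/specificity pair.

    DOR = [SN / (1 - SN)] / [(1 - SP) / SP]
    """
    return (sn / (1 - sn)) / ((1 - sp) / sp)


# Pooled summary points from the text, rounded to whole percentages.
summary_points = {
    "Malignant": (0.42, 0.97),
    "Suspicious and higher": (0.49, 0.95),
    "Atypical and higher": (0.60, 0.87),
}

for cutoff, (sn, sp) in summary_points.items():
    print(f"{cutoff}: implied DOR = {dor_from_sn_sp(sn, sp):.1f}")
```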
The likelihood ratio scatter matrix suggests that there is substantial evidence that the cut-off “Only malignant considered positive” is useful in ruling in but not ruling out malignancy, as it has a positive likelihood ratio of almost 20 (Fig. 7a). There is moderate evidence that the cut-offs “Suspicious for malignancy and higher risk category considered positive” (Fig. 7b) and “Neoplastic and higher categories considered positive” (Fig. 7c) are useful in ruling in but not ruling out malignancy. There is minimal evidence that the cut-off “Atypical and higher categories considered positive” is useful for either ruling in or ruling out malignancy, as its negative likelihood ratio was greater than 0.1 and its positive likelihood ratio was less than 10 (Fig. 7d). Lastly, funnel plots constructed for “Malignant,” “Suspicious for malignancy and higher risk category,” “Neoplastic and higher categories,” and “Atypical and higher categories” considered positive did not reveal publication bias (p = 0.16, p = 0.45, p = 0.40, and p = 0.71, respectively) (Fig. 8).
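The quadrant logic behind the likelihood ratio scatter matrix can be sketched in a few lines of Python. The example computes the positive and negative likelihood ratios from one SN and specificity pair and classifies them against the thresholds mentioned above (positive likelihood ratio above 10 to rule in, negative likelihood ratio below 0.1 to rule out). The function names are hypothetical, and likelihood ratios derived from rounded summary points need not match the model-based pooled estimates shown in Figure 7.

```python
def likelihood_ratios(sn, sp):
    """Return (LR+, LR-) for one sensitivity/specificity pair."""
    lr_pos = sn / (1 - sp)
    lr_neg = (1 - sn) / sp
    return lr_pos, lr_neg


def classify(lr_pos, lr_neg, rule_in=10.0, rule_out=0.1):
    """Assign a likelihood ratio pair to one scatter matrix quadrant."""
    confirms = lr_pos > rule_in
    excludes = lr_neg < rule_out
    if confirms and excludes:
        return "rule in and rule out"
    if confirms:
        return "rule in only"
    if excludes:
        return "rule out only"
    return "neither"


# "Atypical and higher risk categories" cut-off: SN 60%, specificity 87%.
lr_pos, lr_neg = likelihood_ratios(0.60, 0.87)
print(round(lr_pos, 2), round(lr_neg, 2), classify(lr_pos, lr_neg))
```

For this cut-off, both ratios fall short of their thresholds, which matches the statement that the “Atypical and higher categories” cut-off lands in the quadrant that neither confirms nor excludes malignancy.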
Discussion
To our knowledge, this study is the first systematic review and meta-analysis to evaluate the diagnostic accuracy of the PSC system for reporting respiratory cytology. The PSC system, introduced in 2016, provides a standardized format for reporting respiratory cytology, aiming to improve communication between clinicians and pathologists and to link each category to specific ROM and management recommendations [2‒4]. By synthesizing data from 5 studies, this study sheds light on the performance of the different categories within the PSC system for reporting respiratory cytology and their association with the ROM [5‒9]. The pooled ROM was calculated for each category of the PSC system using the data from the published studies. The “Non-diagnostic,” “Benign,” “Atypical,” “Suspicious,” and “Malignant” categories were associated with a pooled ROM of 46%, 36%, 58%, 85%, and 95%, respectively. Heterogeneity was higher in the “Non-diagnostic” and “Atypical” categories than in the other categories.
The ROM for the benign category in our analysis ranged from 21 to 48%. This variability highlights the importance of a comprehensive diagnostic approach in respiratory cytology, as a benign diagnosis does not eliminate the ROM. The elevated ROM in the benign category can be attributed to several factors.
First, the presence of atypical or inflammatory lesions that may mimic malignant conditions can contribute to a higher ROM. These lesions can sometimes present with cytological features that overlap with malignant diseases, thereby increasing the risk of a false benign diagnosis. Second, variations in the quality of histopathological evaluation and differences in criteria used for malignancy diagnosis across different studies can also affect the ROM. It is essential to consider these factors when interpreting the results and making clinical decisions based on cytological diagnoses.
Our findings demonstrate that the PSC system is effective in diagnosing malignancy in respiratory cytology specimens, with varying degrees of SN and specificity across different diagnostic cut-offs. The highest specificity was observed when “Malignant” was considered positive, achieving 97% specificity, although SN was lower at 42%. This suggests that the “Malignant” category is particularly reliable for ruling in malignancy but less effective for ruling it out, aligning with the principle of high specificity aiding in ruling in disease (SPIN) [11].
When “Suspicious for malignancy” or higher risk categories were considered positive, SN increased to 49% with a specificity of 95%. Similarly, considering “Neoplastic” and higher risk categories as positive resulted in 49% SN and 95% specificity. These findings indicate a balance between SN and specificity, suggesting these thresholds are useful in clinical settings where both diagnostic accuracy and reliability are crucial.
The “Atypical and higher risk categories” cut-off showed the highest SN at 60%, but with a reduced specificity of 87%. This cut-off is beneficial for initial screening purposes, where ruling out malignancy (high SN) is a priority. The specificity of the atypical category is a crucial aspect of diagnostic accuracy in respiratory cytology. Lesions that contribute to decreased specificity in the atypical category often include those with ambiguous cytological features that are neither clearly benign nor clearly malignant. These lesions can include atypical squamous cells of undetermined significance, atypical glandular cells, and reactive atypia due to inflammation or other benign conditions.
Reactive atypia, for example, can arise from chronic inflammation or infection, presenting with features that overlap with neoplastic changes. Similarly, atypical squamous cells of undetermined significance may result from various benign processes but raise concerns due to their ambiguous nature. This overlap complicates the interpretation and decreases the specificity of the atypical category, as distinguishing between benign and malignant lesions based solely on cytology can be challenging.
Recognizing these contributing lesions and their impact on diagnostic specificity clarifies the limitations and challenges associated with the atypical category. It also emphasizes the need for additional diagnostic modalities or follow-up procedures to enhance diagnostic accuracy and reduce the risk of misclassification.
The DOR further supports these findings, with the “Malignant” cut-off yielding the highest DOR of 23.68, indicating strong diagnostic performance. The “Suspicious” and “Neoplastic” categories also showed high DORs, reinforcing their utility in diagnostic workflows.
Our results are consistent with previous studies on other cytopathology reporting systems, such as the International System for Reporting Serous Fluid Cytopathology (ISRSFC), the Bethesda System for Reporting Thyroid Cytopathology, and the Sydney System. These studies also found that standardized reporting systems improve diagnostic accuracy and clinical decision-making. For example, the ISRSFC has demonstrated high reproducibility and acceptance among pathologists, significantly enhancing therapeutic decision processes.
The PSC guidelines define “Non-diagnostic” as a category for specimens that do not provide any useful diagnostic information regarding the pulmonary nodule, cyst, or mass lesion identified through imaging findings. In the current meta-analysis, the percentage of nondiagnostic cases varied from 1.7 to 43%. In the WHO system, the terms “Insufficient,” “Inadequate,” or “Non-diagnostic” are applied to specimens that lack sufficient material due to factors such as low cellularity, poor preparation, fixation issues, or interference from materials like blood. Additionally, “Non-diagnostic” can refer to cases where benign material is present but does not appear representative of a mass lesion or lung nodule seen on imaging. It is recommended to reserve these terms for cases of technical insufficiency and not for cases with abundant benign material where the imaging findings suggest a mass lesion. In instances where any atypical cells are present, even if the slides are otherwise inadequate, the case should be categorized as “Atypical” instead of “Non-diagnostic.”
Adequacy of cytopathological specimens is determined by specific criteria depending on the type of specimen. For fine needle aspiration biopsy of the lung, an adequate sample should contain alveolar macrophages, typically pigmented with carbon or hemosiderin, and may include fragments of collapsed alveolar septa. In endobronchial ultrasound-guided transbronchial needle aspiration targeting lymph nodes, an adequate sample should contain moderate to abundant lymphocytes or macrophages with anthracotic pigment, with a suggested criterion of more than 40 lymphocytes in the area of highest cellularity observed under high-power magnification. Sputum samples should contain alveolar macrophages and ciliated columnar cells, while bronchial brushings should demonstrate abundant bronchial epithelial cells, with macrophages potentially present. For bronchoalveolar lavage, an adequate sample should include identifiable alveolar macrophages, with a cutoff of more than 10 macrophages per high-power field. The number of bronchial epithelial cells in BAL should not exceed that of the alveolar macrophages. These specific guidelines are crucial for determining the adequacy of specimens in different types of cytopathological evaluations [12].
This systematic review and meta-analysis has several limitations. First, all included studies were retrospective, which may introduce inherent biases related to study design. Second, there was a high level of heterogeneity observed in the assessment of pooled ROM, particularly for the “Non-diagnostic” and “Atypical” categories. This heterogeneity could be due to variations in study populations, sample sizes, and diagnostic criteria across different studies.
Additionally, the lack of histopathological follow-up in some cytology specimens categorized as “Non-diagnostic” and “Atypical” may inflate the ROM. Incorporating clinical follow-up, along with meticulous correlation with radiological findings and medical history, would provide a more accurate assessment of malignancy risk.
The PSC system for respiratory cytology has proven to be a valuable tool in the diagnostic armamentarium for lung cancer. The high specificity of the “Malignant” category makes it reliable for confirming malignancy, while the higher SN of the “Atypical” and higher risk categories is useful for screening purposes. These findings highlight the importance of selecting appropriate cut-offs based on clinical context to optimize patient management.
This study underscores the diagnostic utility of the PSC system in respiratory cytology. By providing a standardized reporting framework linked to specific management recommendations, the PSC system enhances diagnostic accuracy and facilitates better clinical decision-making. This meta-analysis highlights the accuracy of the PSC in reporting respiratory cytology. Particularly, it emphasizes the diagnostic significance of the “Suspicious” and “Malignant” categories in identifying malignancy.
Statement of Ethics
An ethics statement is not applicable because this study is based exclusively on published literature.
Conflict of Interest Statement
The authors have no conflicts of interest to declare.
Funding Sources
The authors received no funding for this study.
Author Contributions
Sana Ahuja: writing the protocol, independent assessor in the screening of studies, and assessment of the risk of bias; reviewed the extracted data; and primary writing of the manuscript. Marzieh Fattahi-Darghlou: data extraction and formal data analysis. Sufian Zaheer: involved in study ideation, checked the extracted data, checked and reviewed the risk of bias assessment, and critically reviewed the manuscript. Rhea Ahuja: literature search, statistics, helped achieve consensus in case of conflict in screening, and critically reviewed the manuscript.
Data Availability Statement
All data generated in this study are included in this article and its supplementary files. Further queries can be directed to the corresponding author.