Abstract
Introduction: The aim of the study was to perform the first meta-analysis for assessment of the pooled risk of malignancy of each category of the Sydney system for reporting of lymph nodal aspirates along with the evaluation of diagnostic accuracy. Methods: PubMed/MEDLINE and Embase were searched with the following keywords: “(Lymph node) AND (fine needle aspiration biopsy) OR (International system OR Sydney system)” in the timeframe 2020 to August 4, 2023. The selected articles were assessed for the risk of bias by the QUADAS-2 tool. The meta-analysis for sensitivity (SN) and specificity for each cut-off, that is, “atypical considered positive,” “suspicious of malignancy considered positive,” and “malignant considered positive” for the lesions, was carried out after excluding the inadequate samples in each study. To assess the diagnostic accuracy, summary receiver operating characteristic curves were constructed, and the diagnostic odds ratio was pooled in both scenarios. Results: Nine studies, all of which were retrospective cross-sectional studies, were evaluated with a total of 13,205 cases. The SN and specificity for the “atypical and higher risk categories” considered positive for malignancy were 97% (95% CI, 95–99%) and 96% (95% CI, 91–98%), respectively. The SN and specificity for the “suspicious of malignancy and higher risk categories” considered positive for malignancy were 91% (95% CI, 85–95%) and 99% (95% CI, 97–100%), respectively. The SN and specificity for the “malignant” considered positive for malignancy were 75% (95% CI, 65–84%) and 100% (95% CI, 99–100%), respectively. The pooled area under the curve was 99–100% for each of the cut-offs. Conclusion: This meta-analysis highlights the accuracy of the Sydney system in reporting lymph node aspirates. It exhibits the significance of the “suspicious” and “malignant” categories in diagnosing malignancy and of the “benign” category in excluding malignancy.
Introduction
Fine needle aspiration biopsy (FNAB) is a minimally invasive, cost-effective, rapid, and safe technique to evaluate lymph nodal swellings, including both palpable and non-palpable enlarged nodes. The Sydney system for reporting lymph node cytopathology was proposed to develop a standardised format for reporting lymph node aspirates and for better communication between the clinician and the pathologist. In addition, each category of the Sydney system was linked to its risk of malignancy and management recommendations [1].
The categories of the Sydney system include I/L1: non-diagnostic; II/L2: benign; III/L3: atypical cells of undetermined significance or atypical lymphoid cells of uncertain significance; IV/L4: suspicious of malignancy; and V/L5: malignant [1]. Following its development, a large number of studies have assessed the usefulness of the Sydney system in the diagnosis of lymph node aspirates [2‒10]. However, a systematic review and meta-analysis providing a precise estimate of the sensitivity (SN) and specificity of the Sydney system in the diagnosis of lymph node aspirates is lacking.
The primary objective of this study was to estimate the diagnostic accuracy of FNAB in diagnosing malignancy in patients with nodal swelling. The secondary objective was the evaluation of the inadequacy rate of FNAB in nodal aspirates.
Materials and Methods
This systematic review with meta-analysis was registered in the Open Science Framework on August 4, 2023. The review protocol is available at https://osf.io/4aq2k. The studied population consisted of any patient presenting to the hospital with symptomatic or radiologically detected lymph nodal swelling. FNAB for a primary nodal swelling reported by the Sydney system was considered the index test. Histopathological examination by either core needle biopsy or incisional/excisional biopsy was considered the reference standard. However, adequate clinicoradiological follow-up was also accepted for the benign aspirates. The outcome measures evaluated were SN and specificity in diagnosing malignancy per lesion in adequate nodal aspirates for each category.
The malignant category comprised Hodgkin’s lymphoma, non-Hodgkin’s lymphoma, and metastatic carcinoma. The proportion of inadequate aspirates in each study was the outcome variable for the secondary objective. There were three possible decision cut-offs to decide if an FNAB reported by the Sydney system was positive for malignancy: (i) “atypical” or any higher risk category was considered a positive test result for malignancy, (ii) “suspicious of malignancy” or higher was considered positive for malignancy, and (iii) only “malignant” was considered positive.
The SN and specificity of each of these diagnostic cut-offs were assessed separately. The accuracy of the procedure was also evaluated using the area under the curve (AUC).
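The three cut-offs can be illustrated with a minimal sketch that collapses the category counts of one study into a 2×2 table at each threshold. The counts, the names `CATEGORIES`, `two_by_two`, and `sensitivity_specificity`, and the data layout are assumptions made for illustration; they are not taken from any included study or from the authors' analysis code.

```python
from typing import Dict, Tuple

# Sydney categories ordered from lowest to highest risk.
# Inadequate (L1) aspirates are assumed to have been excluded beforehand.
CATEGORIES = ["benign", "atypical", "suspicious", "malignant"]


def two_by_two(counts: Dict[str, Tuple[int, int]], cutoff: str) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) when `cutoff` and every higher category are test-positive.

    counts maps category -> (malignant on reference standard, benign on reference standard).
    """
    threshold = CATEGORIES.index(cutoff)
    tp = fp = fn = tn = 0
    for rank, name in enumerate(CATEGORIES):
        malignant, benign = counts[name]
        if rank >= threshold:   # category counts as a positive test
            tp += malignant
            fp += benign
        else:                   # category counts as a negative test
            fn += malignant
            tn += benign
    return tp, fp, fn, tn


def sensitivity_specificity(tp: int, fp: int, fn: int, tn: int) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical single-study counts (illustrative only).
example = {
    "benign": (5, 95),
    "atypical": (6, 4),
    "suspicious": (19, 1),
    "malignant": (70, 0),
}

for cutoff in ("atypical", "suspicious", "malignant"):
    sn, sp = sensitivity_specificity(*two_by_two(example, cutoff))
    print(f"{cutoff} and higher positive: SN={sn:.2f}, specificity={sp:.2f}")
```

Raising the threshold moves cases from the positive to the negative side of the table, so SN can only fall and specificity can only rise, which is the pattern the pooled estimates show.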
PubMed and Embase were searched for studies published from 2020 to August 4, 2023, which was the date of the final search. The search included studies from 2020 onward because the first proposal for the Sydney system was published online in July 2020 [1]. The authors were contacted if full texts could not be retrieved or for any supplementary information.
The search strategy was to find articles at the intersection of four main concepts: lymph node (the site of the lesion), FNAB (the index test), diagnostic accuracy (the type of study), and 2020–2023 (the period of interest). These four concepts were joined by the Boolean “AND” operator. Related search terms within each main concept were joined using the Boolean “OR” operator and used to expand the search. For example, the PubMed search was “(lymph node) AND (“fine needle aspiration biopsy” OR FNAB) OR (International system OR Sydney system).” The search results were imported into the Rayyan online software and screened blindly and independently by two investigators. Any differences in the screening results were resolved by consensus among the other investigators. Only journal articles were included. The full text of potentially eligible articles was then reviewed. Studies evaluating lymph node FNAB using the Sydney system in patients presenting to a hospital with a nodal swelling were included. The selection of studies was carried out independently by two authors. In case of conflict between the two independent reviewers, consensus was reached by discussion with the other authors, with at least three of the five authors needing to agree. The relevant data were extracted from the studies by a single investigator and checked by a second investigator.
Statistical analysis was performed using Stata (version 13) and RevMan 5.4. Each selected article was assessed for the risk of bias using the QUADAS-2 tool, and conflicts were resolved through discussion. The direction of bias was deduced. For flow and timing, the risk of bias for overestimating accuracy was determined. The meta-analysis for SN and specificity for each cut-off, that is, “atypical considered positive,” “suspicious of malignancy considered positive,” and “malignant considered positive” for the lesions, was carried out after excluding the inadequate samples in each study. To estimate the pooled risk of malignancy (ROM) of each category, a random-effects single-proportion model was applied, and the between-study heterogeneity (τ²) was estimated using the maximum likelihood method. To assess the diagnostic accuracy, summary receiver operating characteristic curves were constructed, and the diagnostic odds ratio (DOR) was pooled in both scenarios. The summary receiver operating characteristic curves, their summary points (false positive [FP] rate on the horizontal axis and SN on the vertical axis), and the pooled AUC value were estimated using the summary ROC model. The DORs were calculated from the true positive, true negative, FP, and false negative counts per study and were meta-analysed by applying a random-effects model using the inverse variance method and the restricted maximum likelihood estimator for τ². Heterogeneity was evaluated using the I² statistic. We used funnel plots to evaluate potential publication bias. The strength and consistency of the diagnostic accuracy estimates were assessed by a likelihood ratio scatter matrix. A two-tailed p < 0.05 was considered statistically significant.
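The DOR pooling described above can be sketched as follows. This is a simplified illustration, not the authors' Stata or RevMan workflow: it swaps in the closed-form DerSimonian-Laird estimator of τ² in place of restricted maximum likelihood, applies a 0.5 continuity correction when a cell is zero (an assumed convention), and uses hypothetical 2×2 counts. The function names are likewise assumptions.

```python
import math
from typing import Sequence, Tuple


def log_dor(tp: float, fp: float, fn: float, tn: float) -> Tuple[float, float]:
    """Log diagnostic odds ratio and its variance.

    Adds 0.5 to every cell when any cell is zero (assumed convention).
    """
    cells = [tp, fp, fn, tn]
    if 0 in cells:
        tp, fp, fn, tn = (c + 0.5 for c in cells)
    value = math.log((tp * tn) / (fp * fn))
    variance = 1 / tp + 1 / fp + 1 / fn + 1 / tn
    return value, variance


def pool_random_effects(effects: Sequence[float], variances: Sequence[float]):
    """Inverse-variance random-effects pooling with the DerSimonian-Laird tau^2.

    Returns (pooled effect, standard error, tau^2, I^2 in percent).
    """
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, se, tau2, i2


# Hypothetical (TP, FP, FN, TN) counts for three studies (illustrative only).
studies = [(95, 5, 5, 95), (180, 4, 20, 196), (60, 0, 10, 130)]
logs, variances = zip(*(log_dor(*s) for s in studies))
pooled, se, tau2, i2 = pool_random_effects(logs, variances)
low, high = math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)
print(f"pooled DOR={math.exp(pooled):.1f} (95% CI {low:.1f}-{high:.1f}), "
      f"tau^2={tau2:.3f}, I^2={i2:.1f}%")
```

Pooling is done on the log scale because the log DOR is approximately normally distributed; the result is exponentiated only for reporting. A restricted maximum likelihood estimate of τ² would generally differ somewhat from the DerSimonian-Laird value used here.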
Results
The PRISMA diagram for the study selection is presented in Figure 1. Nine studies were identified; the distribution of the FNAB categories, along with their risk of malignancy and inadequacy rates, is tabulated in Table 1. The true positive, true negative, FP, and false negative counts of each study for the three analyses, that is, “atypical considered positive,” “suspicious considered positive,” and “malignant considered positive,” are presented in online supplementary Table 1 (for all online suppl. material, see https://doi.org/10.1159/000535797). An attempt was made to separate the cases of lymphoma from the metastatic cancers; however, this was not possible for all studies owing to the design of the studies evaluated.
Table 1. Distribution of the FNAB categories along with their risk of malignancy and inadequacy rate
Author | Year | Inadequate: % of cases | Inadequate: ROM, % | Benign: % of cases | Benign: ROM, % | Atypical: % of cases | Atypical: ROM, % | Suspicious: % of cases | Suspicious: ROM, % | Malignant: % of cases | Malignant: ROM, % |
---|---|---|---|---|---|---|---|---|---|---|---|
Makarenko, 2022 [2] | 2021 | 6.9 | 58.3 | 31.2 | 6.4 | 14.9 | 69.2 | 8.6 | 96.7 | 38.4 | 99.3 |
Gupta, 2021 [3] | 2021 | 4.1 | 27.5 | 48.6 | 11.5 | 0.5 | 66.7 | 1.4 | 88 | 45.4 | 99.6 |
Vigliar, 2021 [4] | 2021 | 6.7 | 50 | 34.7 | 1.9 | 8.3 | 58.3 | 4.3 | 100 | 46 | 100 |
Caputo, 2022 [5] | 2021 | 0.6 | 66.7 | 49.1 | 9.4 | 1.6 | 28.6 | 3.9 | 100 | 44.8 | 99.8 |
Ahuja, 2022 [6] | 2021 | 4.4 | 9.1 | 40.5 | 1.5 | 0.8 | 37.5 | 22.8 | 96.9 | 31.5 | 98.2 |
Uzun, 2022 [7] | 2022 | 4.8 | 16.6 | 56.2 | 0.7 | 7.1 | 88.8 | 9.5 | 100 | 22.4 | 100 |
Kanhe, 2023 [8] | 2022 | 6 | 0 | 80.3 | 0.2 | 0.4 | 100 | 0.9 | 92.3 | 12.3 | 100 |
Juanita, 2023 [9] | 2023 | 3.7 | 66.7 | 20.1 | 15.6 | 8.2 | 76.9 | 10.7 | 94 | 57.2 | 98.9 |
Balasubramanian, 2023 [10] | 2023 | 10.6 | 26.3 | 54.6 | 7.2 | 4 | 76.9 | 3.7 | 82.3 | 27 | 100 |
The risk of bias assessed for each study is given in Figure 2a, and the summary of the risk of bias assessment across all included studies is given in Figure 2b. The risk of bias in patient selection was unclear in five of the nine studies. The risk of index test bias was unclear in six of the nine studies. Applicability concerns regarding patient selection, the index test, and the reference standard were low in all studies. None of the studies reported whether the FNAB was reported before the histopathological examination, so the risk of reference test bias was rated as unclear in all of them. Two of the nine studies had a high risk of flow bias.
Fig. 2. a Traffic light plot for the risk of bias of the included studies. b Summary of the risk of bias across all included studies.
Table 2 shows the pooled ROM associated with each category of the Sydney system. The “insufficient,” “benign,” “atypical,” “suspicious,” and “malignant” categories were associated with a pooled ROM of 34% (95% confidence interval [95% CI], 22–45%), 5% (95% CI, 2–8%), 65% (95% CI, 52–78%), 93% (95% CI, 89–98%), and 100% (95% CI, 99–100%), respectively. Heterogeneity was higher in the “insufficient” and “atypical” categories compared with the other categories.
Table 2. Pooled ROM associated with each category
Categories | Number of studies pooled | ROM, % | 95% CI, % | τ² | τ | I², % |
---|---|---|---|---|---|---|
Insufficient | 9 | 34 | 22–45 | 0.02 | 0.141 | 83.56 |
Benign | 9 | 5 | 2–8 | 0 | 0 | 98.43 |
Atypical | 9 | 65 | 52–78 | 0.03 | 0.173 | 80.53 |
Suspicious | 9 | 93 | 89–98 | 0 | 0 | 53.06 |
Malignant | 9 | 100 | 99–100 | 0 | 0 | 45.74 |
The forest plots for SN and specificity and bivariate SN-specificity (SROC) plots for the analysis considering “atypical and higher risk categories” as positive for malignancy are depicted in Figures 3a and 4a, respectively. The SN was 97% (95% CI, 95–99%), and the specificity was 96% (95% CI, 91–98%). The pooled AUC was 99%, indicating excellent diagnostic accuracy (Fig. 5a, Table 3).
Fig. 3. a Forest plot of SN and specificity for the threshold where the “atypical” category was considered positive for malignancy. b Forest plot of SN and specificity for the threshold where the “suspicious of malignancy” category was considered positive for malignancy. c Forest plot of SN and specificity for the threshold where the “malignant” category was considered positive for malignancy.
Fig. 4. a Bivariate sensitivity versus specificity plot showing the results of the meta-analysis for the accuracy when the “atypical” category was considered positive for malignancy. b Bivariate sensitivity versus specificity plot showing the results of the meta-analysis for the accuracy when the “suspicious of malignancy” category was considered positive for malignancy. c Bivariate sensitivity versus specificity plot showing the results of the meta-analysis for the accuracy when the “malignant” category was considered positive for malignancy.
Fig. 5. a Forest plot for the AUC with the results of fixed and random-effects meta-analysis for the accuracy when the “atypical” category was considered positive for malignancy. b Forest plot for the AUC with the results of fixed and random-effects meta-analysis for the accuracy when the “suspicious of malignancy” category was considered positive for malignancy. c Forest plot for the AUC with the results of fixed and random-effects meta-analysis for the accuracy when the “malignant” category was considered positive for malignancy.
Table 3. Meta-analysis of the AUC when the “atypical” and higher risk categories were considered positive for malignancy
Study | AUC | Standard error | 95% CI | z | p value |
---|---|---|---|---|---|
Makarenko, 2022 [2] | 0.91 | 0.0179 | 0.875–0.945 | ||
Gupta, 2021 [3] | 0.92 | 0.0128 | 0.895–0.945 | ||
Vigliar, 2021 [4] | 0.97 | 0.0128 | 0.945–0.995 | ||
Caputo, 2022 [5] | 0.95 | 0.0102 | 0.930–0.970 | ||
Ahuja, 2022 [6] | 0.96 | 0.0102 | 0.940–0.980 | ||
Uzun, 2022 [7] | 0.99 | 0.00765 | 0.975–1.000 | ||
Kanhe, 2023 [8] | 0.99 | 0.00765 | 0.975–1.000 | ||
Juanita, 2023 [9] | 0.9 | 0.0255 | 0.850–0.950 | ||
Balasubramanian, 2023 [10] | 0.94 | 0.0179 | 0.905–0.975 | ||
Total (fixed effects) | 0.966 | 0.00369 | 0.959–0.973 | 261.784 | <0.001 |
Total (random effects) | 0.952 | 0.00998 | 0.932–0.972 | 95.34 | <0.001 |
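The fixed-effects row of the table above can be checked with ordinary inverse-variance weighting of the per-study AUCs. The sketch below assumes that this is how the fixed-effects total was obtained, which the article does not state explicitly; the function name `fixed_effect` is illustrative, and the AUC and standard error values are copied from the table.

```python
import math
from typing import List, Tuple

# (AUC, standard error) per study, copied from the table above.
auc_se: List[Tuple[float, float]] = [
    (0.91, 0.0179), (0.92, 0.0128), (0.97, 0.0128),
    (0.95, 0.0102), (0.96, 0.0102), (0.99, 0.00765),
    (0.99, 0.00765), (0.90, 0.0255), (0.94, 0.0179),
]


def fixed_effect(estimates: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Inverse-variance fixed-effect pooled estimate and its standard error."""
    weights = [1 / se ** 2 for _, se in estimates]
    pooled = sum(w * auc for w, (auc, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se


pooled_auc, pooled_auc_se = fixed_effect(auc_se)
print(f"fixed-effect AUC = {pooled_auc:.3f}, SE = {pooled_auc_se:.5f}")
```

With the tabulated inputs this weighting gives approximately 0.966 with a standard error of about 0.0037, in line with the fixed-effects total. The random-effects total additionally depends on the between-study variance estimator, which is not reproduced here.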
The forest plots for SN and specificity and bivariate SN-specificity (SROC) plots for the analysis considering “suspicious of malignancy and higher risk categories” as positive for malignancy are depicted in Figures 3b and 4b, respectively. The SN was 91% (95% CI, 85–95%) and the specificity was 99% (95% CI, 97–100%). The pooled AUC was 99%, indicating excellent diagnostic accuracy (Fig. 5b, Table 4).
Table 4. Meta-analysis of the AUC when the “suspicious of malignancy” and higher risk categories were considered positive for malignancy
Study | AUC | Standard error | 95% CI | z | p value |
---|---|---|---|---|---|
Makarenko, 2022 [2] | 0.89 | 0.0179 | 0.855–0.925 | ||
Gupta, 2021 [3] | 0.91 | 0.0128 | 0.885–0.935 | ||
Vigliar, 2021 [4] | 0.97 | 0.0128 | 0.945–0.995 | ||
Caputo, 2022 [5] | 0.98 | 0.0051 | 0.970–0.990 | ||
Ahuja, 2022 [6] | 0.97 | 0.00765 | 0.955–0.985 | ||
Uzun, 2022 [7] | 0.91 | 0.0153 | 0.880–0.940 | ||
Kanhe, 2023 [8] | 0.97 | 0.0102 | 0.950–0.990 | ||
Juanita, 2023 [9] | 0.91 | 0.0255 | 0.860–0.960 | ||
Balasubramanian, 2023 [10] | 0.91 | 0.0179 | 0.875–0.945 | ||
Total (fixed effects) | 0.961 | 0.00334 | 0.955–0.968 | 287.551 | <0.001 |
Total (random effects) | 0.939 | 0.0114 | 0.917–0.961 | 82.119 | <0.001 |
The forest plots for SN and specificity and bivariate SN-specificity (SROC) plots for the analysis considering “malignant” as positive for malignancy are depicted in Figures 3c and 4c, respectively. The SN was 75% (95% CI, 65–84%) and the specificity was 100% (95% CI, 99–100%). The pooled AUC was 100%, indicating excellent diagnostic accuracy (Fig. 5c; Table 5).
Table 5. Meta-analysis of the AUC when only the “malignant” category was considered positive for malignancy
Study | AUC | Standard error | 95% CI | z | p value |
---|---|---|---|---|---|
Makarenko, 2022 [2] | 0.82 | 0.023 | 0.775–0.865 | ||
Gupta, 2021 [3] | 0.88 | 0.0128 | 0.855–0.905 | ||
Vigliar, 2021 [4] | 0.93 | 0.0179 | 0.895–0.965 | ||
Caputo, 2022 [5] | 0.95 | 0.0102 | 0.930–0.970 | ||
Ahuja, 2022 [6] | 0.72 | 0.0255 | 0.670–0.770 | ||
Uzun, 2022 [7] | 0.79 | 0.0204 | 0.750–0.830 | ||
Kanhe, 2023 [8] | 0.95 | 0.0102 | 0.930–0.970 | ||
Juanita, 2023 [9] | 0.85 | 0.0306 | 0.790–0.910 | ||
Balasubramanian, 2023 [10] | 0.86 | 0.023 | 0.815–0.905 | ||
Total (fixed effects) | 0.903 | 0.00518 | 0.893–0.913 | 174.363 | <0.001 |
Total (random effects) | 0.863 | 0.0243 | 0.816–0.911 | 35.535 | <0.001 |
The pooled DOR was 777.47 (95% CI, 263.23–2,296.32) for “atypical and higher risk categories” considered positive (Fig. 6a), 1,064.94 (95% CI, 337.31–3,362.21) for “suspicious and higher risk categories” considered positive (Fig. 6b), and 823.72 (95% CI, 249.63–2,718.02) for “malignant” considered positive (Fig. 6c), each indicating a high level of diagnostic accuracy.
Fig. 6. a DOR forest plot for detecting malignancy when the “atypical” category was considered positive for malignancy. b DOR forest plot for detecting malignancy when the “suspicious of malignancy” category was considered positive for malignancy. c DOR forest plot for detecting malignancy when the “malignant” category was considered positive for malignancy.
The likelihood ratio scatter matrix suggests substantial evidence that the cut-off “atypical and higher categories considered positive” is useful for both ruling in and ruling out malignancy, as the negative likelihood ratio was less than 0.1 and the positive likelihood ratio was greater than 10 (Fig. 7a). For the cut-off “suspicious of malignancy and higher risk category considered positive,” it suggests moderate evidence of usefulness in ruling in but not ruling out malignancy (Fig. 7b). The same holds for the cut-off “only malignant considered positive,” which has a positive likelihood ratio greater than 100 (Fig. 7c). Lastly, funnel plots constructed for “atypical and higher categories,” “suspicious of malignancy and higher risk category,” and “malignant” considered positive did not reveal publication bias (p = 0.65, p = 0.72, and p = 0.48, respectively) (Fig. 8).
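The thresholds quoted above (positive likelihood ratio above 10, negative likelihood ratio below 0.1) follow from the standard definitions LR+ = SN / (1 − specificity) and LR− = (1 − SN) / specificity. The sketch below applies them to the pooled point estimates reported for the “atypical and higher risk categories” cut-off. It is only a rough check: likelihood ratios derived from a bivariate model, with their confidence intervals, need not equal the ratio of the pooled point estimates, and the function name is an assumption.

```python
import math
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Positive and negative likelihood ratios from sensitivity and specificity."""
    # LR+ is unbounded when specificity is 1 (no false positives).
    lr_pos = math.inf if specificity >= 1.0 else sensitivity / (1.0 - specificity)
    # LR- is undefined when specificity is 0; report infinity in that case.
    lr_neg = math.inf if specificity <= 0.0 else (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Pooled point estimates for "atypical and higher risk categories" positive.
lr_pos, lr_neg = likelihood_ratios(0.97, 0.96)
rules_in = lr_pos > 10    # conventional threshold for ruling in
rules_out = lr_neg < 0.1  # conventional threshold for ruling out
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}, "
      f"rules in: {rules_in}, rules out: {rules_out}")
```

For this cut-off the point estimates give an LR+ of roughly 24 and an LR− of roughly 0.03, consistent with the reading of the scatter matrix.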
Fig. 7. Likelihood ratio scatter matrices for the different diagnostic thresholds. The points represent the studies and are numbered according to the order given in Table 1. The red diamond with lines represents the meta-analytic estimate of the likelihood ratios with confidence intervals. a Likelihood ratio scatter matrices for detecting malignancy when the “atypical” category was considered positive for malignancy. b Likelihood ratio scatter matrices for detecting malignancy when the “suspicious of malignancy” category was considered positive for malignancy. c Likelihood ratio scatter matrices for detecting malignancy when the “malignant” category was considered positive for malignancy.
Fig. 8. Funnel plot to evaluate the publication bias in the study for different thresholds. a “Atypical” category was considered positive for malignancy. b “Suspicious of malignancy” category was considered positive for malignancy. c “Malignant” category was considered positive for malignancy.
Discussion
To our knowledge, the present study is the first systematic review and meta-analysis regarding the utility of the Sydney system in reporting lymph node cytopathology. Previously, similar work has been done on the IAC Yokohama system and the Bethesda System for Reporting Thyroid Cytopathology [11‒13]. The pooled risk of malignancy was calculated for each category of the Sydney system utilising the data from the 9 published studies. The “insufficient,” “benign,” “atypical,” “suspicious,” and “malignant” categories were associated with a pooled ROM of 34%, 5%, 65%, 93%, and 100%, respectively. Heterogeneity was higher in the “insufficient” and “atypical” categories compared with the other categories. The reasons attributed to inadequate cases were predominantly deep-seated difficult-to-aspirate nodes. The time for follow-up for evaluation of the benign lesions ranged from 6 months to 1 year for different studies.
The present study demonstrated the role of FNAB in both ruling in and ruling out malignancy at different thresholds. The cut-off of “atypical and higher categories” had an SN of 97% (lower 95% confidence limit, 95%) and a specificity of 96%. The negative likelihood ratio was very low and the positive likelihood ratio was high; thus, this threshold is useful for both ruling in and ruling out malignancy.
The threshold of “suspicious of malignancy and higher categories” had an SN of 91% (lower 95% confidence limit, 85%) and a high specificity of 99% (lower 95% confidence limit, 97%). The cut-off of “malignant” had a low SN of 75% (lower 95% confidence limit, 65%) and a very high specificity of 100%. Both these thresholds therefore had a high positive likelihood ratio and are useful for ruling in the diagnosis of malignancy. This is consistent with the SNOUT and SPIN rules of thumb, whereby a negative result on a highly sensitive test helps to rule out a disease and a positive result on a highly specific test helps to rule it in [14].
The statistical analysis for assessment of SN, specificity, and diagnostic accuracy should be done after excluding the cases in the insufficient category, as they cannot be classified as either negative or positive for malignancy. However, two of the nine studies included insufficient cases in their statistical analysis [3, 5].
The Sydney system recommends the use of rapid on-site evaluation (ROSE) to assess the cellularity and adequacy of the sample. ROSE helps in reducing the inadequacy rate and also allows the division of the sample for ancillary studies like flow cytometry and cell block, if needed. While seven of the nine studies used ROSE to assess the adequacy of the sample, none of them performed a separate analysis to evaluate the utility of ROSE in lymph node fine needle aspirates [2, 4‒6, 8‒10]. The Sydney system proposes two diagnostic levels. The first level involves the categorisation of the lymph node aspirates into the five diagnostic categories, while the second level involves the application of ancillary techniques like flow cytometry and immunohistochemistry to arrive at a final specific diagnosis. All studies except Juanita et al. [9] performed ancillary techniques for further definitive categorisation of the nodal aspirates.
One of the problems of the Sydney system is its reproducibility in different labs. A recent study by Caputo et al. [15] evaluated the digital examination of lymph node cytopathology using the Sydney system. The overall interobserver concordance was moderate (Fleiss κ = 0.476). Category‐wise analysis revealed higher agreement for the inadequate, benign, and malignant categories and much less agreement for the atypical and suspicious categories (p < 0.0001 for all). The lower agreement for atypical and suspicious categories was expected because these cases represent a grey zone in which the diagnostic confidence is insufficient to render a clear‐cut diagnosis of benign or malignant categories.
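For readers unfamiliar with the agreement statistic quoted above, Fleiss κ compares the observed agreement among several raters with the agreement expected by chance. The following is a minimal sketch of the standard formula; the function name and the small ratings table are hypothetical and do not reproduce the data of Caputo et al. [15].

```python
from typing import List


def fleiss_kappa(table: List[List[int]]) -> float:
    """Fleiss kappa for a cases x categories table of rater counts.

    table[i][j] is the number of raters who assigned case i to category j.
    Every case must be rated by the same number of raters (at least two).
    """
    n_cases = len(table)
    n_raters = sum(table[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in table):
        raise ValueError("each case needs the same number (>= 2) of ratings")
    n_categories = len(table[0])
    # Share of all assignments that fall in each category.
    p_j = [sum(row[j] for row in table) / (n_cases * n_raters)
           for j in range(n_categories)]
    # Observed agreement within each case.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in table]
    p_bar = sum(p_i) / n_cases          # mean observed agreement
    p_e = sum(p * p for p in p_j)       # agreement expected by chance
    if p_e == 1.0:                      # all ratings in a single category
        return 1.0
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical example: 4 aspirates, 3 raters, 5 Sydney categories (L1..L5).
ratings = [
    [0, 3, 0, 0, 0],   # all three raters: benign
    [0, 1, 2, 0, 0],   # split between benign and atypical
    [0, 0, 1, 2, 0],   # split between atypical and suspicious
    [0, 0, 0, 0, 3],   # all three raters: malignant
]
print(f"Fleiss kappa = {fleiss_kappa(ratings):.3f}")
```

In this toy table the disagreement sits in the atypical and suspicious columns, mirroring the grey zone described above.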
The present systematic review and meta-analysis had a few limitations. Firstly, all the studies included were retrospective. Secondly, high levels of heterogeneity were observed in the assessment of the pooled ROM, especially for the insufficient, benign, and atypical categories.
Conclusion
This meta-analysis highlights the accuracy of the Sydney system in reporting lymph node aspirates. It exhibits the significance of the “suspicious” and “malignant” categories in diagnosing malignancy and of the “benign” category in excluding malignancy.
Statement of Ethics
An ethics statement is not applicable because this study is based exclusively on the published literature.
Conflict of Interest Statement
The authors have no conflicts of interest to declare.
Funding Sources
The authors received no funding for this study.
Author Contributions
Sana Ahuja contributed to writing of the protocol, assessment of the risk of bias, and primary writing of the manuscript; was an independent assessor in the screening of studies; and reviewed the extracted data. Adil Aziz Khan was an independent assessor in the screening of studies, checked the extracted data, checked and reviewed the risk of bias assessment, and critically reviewed the manuscript. Rhea Ahuja contributed to the literature search and statistics and critically reviewed the manuscript. Pragun Ahuja contributed to the literature search, helped to achieve consensus in case of conflict in screening, and critically reviewed the manuscript. Sufian Zaheer was involved in study ideation, checked the extracted data, checked and reviewed the risk of bias assessment, and critically reviewed the manuscript.
Data Availability Statement
All data generated in this study are included in this article and its supplementary files. Further enquiries can be directed to the corresponding author.