Objectives: The primary objective was to determine the accuracy of fine-needle aspiration biopsy (FNAB) in breast lesions reported according to the International Academy of Cytology (IAC) Yokohama system for reporting breast FNAB. The participants included any patient presenting with any breast lesion found suitable for FNAB. The target condition was breast cancer. The secondary objective was to study the proportion of inadequate FNAB in the selected studies. Methods: PubMed/MEDLINE and Embase were searched for studies published from 2017 to May 16, 2022, that matched all of the following key search terms: Breast AND FNAB AND Diagnostic Accuracy. The Cochrane and PROSPERO databases, citations of selected articles and articles citing the selected articles were also searched. Studies assessing the diagnostic accuracy of breast FNAB in diagnosing breast cancer, which had at least 75 subjects (and at least 20 subjects each in the benign and malignant FNAB groups), were selected. The reference standard was histopathology (or adequate clinical follow-up for benign disease). Studies were screened independently by two researchers, with a consensus reached among the authors in cases of conflict. The risk of bias and applicability were assessed using the QUADAS-2 tool. Sensitivity and specificity at each diagnostic cut-off were assessed by bivariate generalized linear mixed-model meta-analysis. The area under the receiver operating characteristics curve (AUC) and inadequacy rate were assessed by random-effects meta-analysis. The confidence intervals of sensitivity, specificity, and AUC were examined against a value of 0.95. Results: Twenty-two studies, all of which were cross-sectional single-gate studies, were selected, with a total of 10,886 subjects with a primary breast lesion having concurrent FNAB and reference standard reports.
Sensitivity and specificity, with 95% confidence intervals, were 0.978 [0.968, 0.985] and 0.832 [0.76, 0.886] for the diagnostic cut-off of “Atypical considered positive for malignancy,” 0.916 [0.892, 0.935] and 0.983 [0.97, 0.99] for the cut-off of “Suspicious of Malignancy considered positive,” and 0.763 [0.706, 0.812] and 0.999 [0.994, 1] for the cut-off of “Malignant considered positive.” The overall AUC was 0.975 [0.962, 0.984]. FNAB sampling without imaging guidance was associated with lower inadequacy. Discussion: There is strong evidence that the overall accuracy, sensitivity for “Atypical category considered positive” and specificity when “Suspicious or Malignant categories are considered positive” of FNAB are high when using the categories of the IAC Yokohama Reporting System, demonstrating the usefulness of FNAB in diagnosing breast cancer.

The International Academy of Cytology (IAC) Yokohama system for reporting breast fine-needle aspiration biopsy (FNAB) [1] has rapidly gained worldwide acceptance in reporting breast FNAB. This IAC Yokohama System (IAC System) proposes a standardized format for reporting and includes a cytopathological category which is linked to a risk of malignancy (ROM). The IAC Yokohama system is an internationally authored development of several previous recommendations from the National Mammographic Breast Cancer Screening Program in Australia and the National Cancer Institute (NCI) of the USA. After the IAC System was published, it was quickly followed by multiple studies that demonstrated its usefulness in diagnosing breast lesions among a cross section of the population [2‒23], as well as in specialized subsets of patients, including males and patients having suspicious radiological breast lesions [24, 25]. However, a systematic review and meta-analysis of the diagnostic accuracy of breast FNAB using the IAC System, which would collate these data into precise estimates of sensitivity and specificity, has not been performed. As a result, high-level evidence that breast FNAB meets strict criteria for accuracy, such as whether the lower limit of the confidence interval for specificity of a malignant FNAB diagnosis is greater than 99%, is missing.

We decided to undertake a systematic review and meta-analysis of the diagnostic accuracy of the IAC System. The primary research question was to estimate the accuracy of FNAB of the breast reported using the IAC System in diagnosing breast malignancies in patients with a breast lesion. The secondary question was to assess the inadequacy rate of FNAB in breast lesions.

Acceptable sensitivity/specificity was defined as a sensitivity/specificity of 0.95 for the “Atypical” and “Suspicious of Malignancy” cut-offs; the IAC System has three cut-offs: “Atypical,” “Suspicious of Malignancy,” and “Malignant.” A value of 0.95 for sensitivity was taken because a test with a sensitivity of 0.95 will have a negative likelihood ratio of 0.1 even if the specificity is at a noninformative value of 0.5. A value of 0.95 for specificity was taken because a test with a specificity of 0.95 will have a positive likelihood ratio of 10 even if the sensitivity is 0.5. A negative likelihood ratio of 0.1 and a positive likelihood ratio of 10 were thought to represent large changes from the pretest to the posttest probability, in accordance with published studies [26, 27]. For the “Malignant” cut-off, a specificity of 0.99 was considered acceptable according to NCI guidelines [28].
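The arithmetic behind these thresholds can be checked with a short Python sketch. The function name is ours and purely illustrative; the formulas are the standard definitions of the positive and negative likelihood ratios.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a binary test with the given accuracy."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Sensitivity at the 0.95 threshold with a noninformative specificity of 0.5
_, lr_neg_at_threshold = likelihood_ratios(sensitivity=0.95, specificity=0.5)

# Specificity at the 0.95 threshold with a sensitivity of only 0.5
lr_pos_at_threshold, _ = likelihood_ratios(sensitivity=0.5, specificity=0.95)

print(f"LR- = {lr_neg_at_threshold:.2f}, LR+ = {lr_pos_at_threshold:.2f}")
```

Any sensitivity above 0.95 pushes the negative likelihood ratio below 0.1 for the same specificity, which is why the lower confidence limit, rather than the point estimate, is compared against 0.95.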

At a comparatively late stage of the systematic review, we also decided to compare the accuracy of reporting FNAB by the IAC System against the accuracy of breast FNAB reported by other 5-tier systems, such as the NCI system [28] and the National Health Service Breast Screening Programme (NHSBSP) system of the UK [29]. The overall quality of the studies was also assessed by the QUADAS-2 tool [30].

This systematic review with meta-analysis was registered in the Open Science Framework on May 16, 2022. The review protocol is available at https://osf.io/puev5.

The studied population consisted of any patient presenting to the hospital or care center with a symptomatic or screen-detected breast lesion. Prior testing was not commented upon in the protocol due to the high variability in breast patients. FNAB for a primary breast lesion reported by the IAC System was considered the index test. The reference standard was preferably histopathological examination by any method including core needle biopsy, incisional biopsy, excisional biopsy, lumpectomy or mastectomy, but adequate clinical and radiological follow-up for benign lesions was also acceptable. The outcome measures studied were sensitivity and specificity in diagnosing malignant disease per lesion in adequate cases for each IAC System category cut-off along with the area under receiver operating characteristics (ROC) curve (AUC). Malignant disease included invasive carcinoma, sarcoma, lymphoma, malignant Phyllodes tumor, ductal carcinoma in situ (DCIS), and papillary carcinoma in situ. Borderline Phyllodes tumor, lobular carcinoma in situ (LCIS), and atypical ductal hyperplasia were considered nonmalignant. The proportion of inadequate FNAB reports in each study was the outcome variable for the secondary question.

There are three possible decision cut-offs to decide whether a FNAB reported by the IAC System is positive for malignancy. First, categorization in the “Atypical” or any higher risk category can be considered as a positive test result for malignancy. Alternatively, a category “Suspicious of Malignancy” or higher may be considered positive for malignancy. Finally, only reports categorizing a lesion as “Malignant” can be considered positive. Each of these diagnostic cut-offs will have different sensitivity and specificity, which were estimated separately. Different decision points may be used to rule out or rule in malignant disease. Also, the overall accuracy of the procedure needs to be assessed by an appropriate measure such as the AUC [31].
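How one set of categorized reports yields three different 2 × 2 tables can be sketched in Python as follows. The counts are hypothetical and chosen only for illustration; they are not taken from any included study, and the function name is ours.

```python
# Adequate IAC categories ordered from lowest to highest risk
# (inadequate samples are excluded before this step).
CATEGORIES = ["Benign", "Atypical", "Suspicious of Malignancy", "Malignant"]


def accuracy_at_cutoff(counts, cutoff):
    """Sensitivity and specificity when `cutoff` and higher categories are positive.

    counts maps category -> (malignant on reference, nonmalignant on reference).
    """
    threshold = CATEGORIES.index(cutoff)
    tp = fp = fn = tn = 0
    for rank, category in enumerate(CATEGORIES):
        malignant, nonmalignant = counts[category]
        if rank >= threshold:      # FNAB counted as positive
            tp += malignant
            fp += nonmalignant
        else:                      # FNAB counted as negative
            fn += malignant
            tn += nonmalignant
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical counts for one study (100 malignant, 200 nonmalignant lesions)
example_counts = {
    "Benign": (2, 180),
    "Atypical": (6, 15),
    "Suspicious of Malignancy": (14, 4),
    "Malignant": (78, 1),
}

for cutoff in CATEGORIES[1:]:
    sens, spec = accuracy_at_cutoff(example_counts, cutoff)
    print(f"{cutoff}: sensitivity={sens:.3f}, specificity={spec:.3f}")
```

Raising the cut-off moves lesions from the positive to the negative side of the table, so sensitivity can only fall and specificity can only rise.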

For this systematic review, PubMed/MEDLINE and EMBASE were searched for studies published from 2017 to May 16, 2022, which was the date of the final search. The search considered only studies from 2017 onward because the first proposal for the IAC System was published online in November 2016 [32]. The Cochrane Review database and PROSPERO database were also searched. The citations of all included studies and all studies citing the included studies and the base articles for the IAC System, which were obtained through Google Scholar, were also searched for suitable articles. The authors were contacted if screened full texts could not be retrieved or for missing information.

The search strategy was to find articles at the intersection of four main concepts: breast (the site of lesion), FNAB (the index test), diagnostic accuracy (the type of study), and 2017–2022 (the time period of interest). These four concepts were joined by the Boolean “AND” operator. Related search terms within each main concept were joined using the Boolean “OR” operator and used to expand the search. An example PubMed search was “(Breast) AND (“fine needle aspiration biopsy” OR “aspiration needle” OR “fine needle” OR “fine-needle” OR FNA OR FNAB OR FNAC) AND (“diagnostic accuracy” OR “sensitivity and specificity” OR “predictive value” OR “sensitivity” OR “specificity” OR “ppv” OR “npv”) AND [2017–2020]”. The search results were imported into the Rayyan online software and screened blindly and independently by two investigators. Any differences in the screen results were resolved by consensus among the other investigators. Only journal articles were included. The full text of potentially eligible articles was then reviewed.
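The way the concepts were combined can be illustrated with a minimal Python sketch that joins synonyms with OR inside each concept and joins the concepts with AND. The helper name and the quoting rule (only multi-word or hyphenated terms are quoted) are simplifying assumptions of ours. The date restriction is omitted because its exact syntax depends on the database.

```python
def or_group(terms):
    """Join the synonyms of one concept with OR, quoting phrases."""
    quoted = [f'"{t}"' if (" " in t or "-" in t) else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


concepts = [
    ["Breast"],
    ["fine needle aspiration biopsy", "aspiration needle", "fine needle",
     "fine-needle", "FNA", "FNAB", "FNAC"],
    ["diagnostic accuracy", "sensitivity and specificity", "predictive value",
     "sensitivity", "specificity", "ppv", "npv"],
]

# Concepts are intersected, so each retrieved record must match all of them.
query = " AND ".join(or_group(c) for c in concepts)
print(query)
```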

Studies evaluating breast FNAB using the IAC System in patients presenting in a hospital or tertiary care setting with a breast mass or having the breast lesion detected clinically or on screening were included. Studies dealing with just a nipple discharge were excluded. Studies focusing on just a subgroup of patients, for example, lesions less than 1 cm, male patients, cytopathologically indeterminate patients, or radiologically indeterminate patients, were excluded. Studies evaluating the diagnostic accuracy using five-tier systems other than the IAC System, such as the NCI system or the NHSBSP system, were also excluded from the primary analysis but were considered for an out-of-protocol analysis at a late stage in the study. Studies using a three-tier system or a four-tier system were excluded completely. The minimum sample size for any study to be selected was seventy-five, with a minimum of twenty malignant and twenty benign IAC System categories.

The selection of studies was carried out independently by two authors. In case of conflict between the two independent reviewers, consensus was reached by discussion with the independent reviewers and the other authors, with at least three out of the total of five authors needing to agree. Relevant data were extracted from the studies by a single investigator and checked by a second investigator. When the definition of the target condition did not match the definition of the present meta-analysis, data were extracted and classified to match the present meta-analysis. For example, the study by Wong et al. [7] classified LCIS as an in situ condition, but the cases representing LCIS in that study were categorized under the nonmalignant category for the same reference diagnosis in this meta-analysis. Similarly, two other studies considered borderline phyllodes tumor as a malignant condition [4, 8], but the data were extracted so as to categorize borderline phyllodes tumors in the nonmalignant group in the present study.

The selected articles were assessed for risk of bias with the QUADAS-2 tool by at least two authors for each study, with conflicts resolved through discussion. The direction of bias was deduced. For flow and timing, risk of bias for overestimating accuracy was determined. Studies having a low proportion of histopathological follow-up of “Benign” FNAB may underestimate the specificity since lesions categorized as “Benign” by FNAB with concordant clinical and radiological “Triple test” findings have little clinical need to be confirmed by histopathology. A malignant FNAB has a higher probability of having a follow-up biopsy, for example, for grading and immunohistochemistry testing, and therefore a low proportion of malignant FNAB with histopathological follow-up may represent a genuine loss to follow-up, with a higher risk of bias overestimating accuracy. We paid greater attention to studies showing high risk of bias in overestimating accuracy. Out of the seven items in QUADAS-2, if the risk in three or more items was assessed as unclear or worse, the overall risk was deemed to be high. If two items were assessed as unclear, the overall risk was deemed unclear. If only one or none of the items was assessed as unclear, the overall risk of bias of that study was assessed to be low. Visualization of the risk of bias was performed by the online implementation of the package “robvis” [33], which can be found at https://mcguinlu.shinyapps.io/robvis/.
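The rule for deriving the overall rating can be written as a small Python function. This is a sketch under one assumption of ours: an item rated “high” is counted in the same way as an item rated “unclear” at every step, since the text only states “unclear or worse” explicitly for the three-item rule.

```python
def overall_risk(item_ratings):
    """Overall risk of bias from the seven QUADAS-2 item ratings.

    item_ratings: list of seven strings, each 'low', 'unclear', or 'high'.
    Assumption: 'high' counts like 'unclear' when tallying flagged items.
    """
    if len(item_ratings) != 7:
        raise ValueError("QUADAS-2 has seven items")
    flagged = sum(1 for rating in item_ratings if rating in ("unclear", "high"))
    if flagged >= 3:
        return "high"
    if flagged == 2:
        return "unclear"
    return "low"


# Hypothetical study: unclear patient selection, unclear index test,
# high risk in flow and timing, all other items low.
print(overall_risk(["unclear", "unclear", "low", "high", "low", "low", "low"]))
```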

The meta-analysis for sensitivity and specificity for each cut-off, that is, “Atypical considered positive,” “Suspicious of Malignancy considered positive,” and “Malignant considered positive” for the lesions was carried out after excluding the inadequate samples in each study. We used the random-effects bivariate method proposed by Chu and Cole [34] and recommended by the Cochrane handbook [35]. This method is implemented by the package “Altmeta” [36] of R statistical software version 4.2 [37] and also the online app MetaDTA [38, 39]. The AUC of each individual study was estimated by the package “pROC” [40] using the defaults. The meta-analysis of the AUC was accomplished with the command “valmeta” in the package “metamisc” [41]. Forest plots for sensitivity and specificity at each diagnostic cut-off or “decision point” were visualized, as were forest plots for AUC, bivariate sensitivity-specificity plots for each cut-point and overall sROC plots. The diagnostic odds ratio for each study was estimated by the package “meta” and the publication bias was assessed by the Deeks method using the effective sample size [42]. The proportions of the inadequate category in each study were pooled with the package “meta” using a random-effects generalized linear mixed model.

Besides the thresholds for sensitivity and specificity, a value of 0.95 for AUC was taken, by consensus among the authors, to represent high accuracy. Statistical heterogeneity between studies for diagnostic accuracy at each diagnostic threshold was estimated by the I² statistic and Cochran Q (with the p value) of the diagnostic odds ratios for each study by the package “meta” using the function “metabin.”

Subgroup analysis was done separately for studies having low, moderate, and high overall risks of bias, as set out in the protocol. During the review process, it was decided to do subgroup analysis for the mode of FNAB sampling, including studies sampled under image guidance only; studies using only FNAB sampling by palpation; studies using both guided and palpation sampling, and studies where the method of FNAB sampling was unclear, and for indicators of bias in individual QUADAS-2 domains which included:

  • Studies categorized according to the proportion of cases categorized as “Benign” by FNAB having histopathological follow-up: studies having less than 20% of lesions categorized as “Benign” on FNAB being tested by histopathology versus studies having greater than 20% of lesions categorized as “Benign” on FNAB being tested by histopathology.

  • Studies categorized according to the proportion of cases categorized as “Malignant” on FNAB having histopathological follow-up: studies having greater than 70% histopathology follow-up versus studies having less than 70% follow-up following a “Malignant” FNAB.

  • Studies having unclear risk of bias in patient selection

  • Studies with a risk of bias in index test, that is, low risk of bias versus unclear risk of bias.

Subgroup analysis was also done for studies categorized according to the annual case number of breast FNAB as deduced from the study: studies with less than 100 or an unclear number of breast FNAB reported per year versus studies reporting between 100 and 200 breast FNAB per year versus studies reporting more than 200 breast FNAB per year.

The package “meta” was used for the analysis of the proportion of the inadequate category. Meta-regression was performed to study the effect of three factors: the method of FNAB sampling (image guided, by palpation, both image guided and by palpation, or unclear); the type of lesion in the study (palpable, palpable and impalpable, or unclear); and the case number of breast FNAB per year (greater than 200, between 100 and 200, less than 100, or unclear).

Out-of-protocol analysis of the diagnostic accuracy of studies assessing the accuracy of breast FNAB by other five-tier systems was performed by the same methods as the primary analysis. A leave-one-out meta-analysis, that is, performing multiple meta-analyses leaving one study out while including all the others, was performed as a sensitivity analysis. The strength and consistency of the diagnostic accuracy estimates were assessed by a likelihood ratio scatter matrix as described by Rubinstein et al. [43]. Finally, Fagan nomograms were constructed to estimate the posttest probabilities at pretest probabilities of 10% and 50% using the R package “biostatUZH” [44].
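The calculation that a Fagan nomogram performs graphically can be sketched in Python. This is an illustrative stand-in for, not a reproduction of, the “biostatUZH” implementation; the likelihood ratios of 10 and 0.1 used below are the benchmark values from the acceptability criteria, not the pooled estimates.

```python
def post_test_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a posttest probability via odds."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


for pretest in (0.10, 0.50):
    after_positive = post_test_probability(pretest, likelihood_ratio=10.0)
    after_negative = post_test_probability(pretest, likelihood_ratio=0.1)
    print(f"pretest {pretest:.0%}: positive -> {after_positive:.1%}, "
          f"negative -> {after_negative:.1%}")
```

Working on the odds scale is what makes the likelihood ratio a simple multiplier; the nomogram draws the same relationship as a straight line across three logarithmic axes.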

The PRISMA diagram for study selection is presented in Figure 1. Twenty-two studies were identified [2‒23], and their details are presented in Table 1. The full text of one screened study, which had a low probability of being included, could not be retrieved even after contacting the corresponding author. The distribution of the FNAB categories with histopathological correlation of these studies is given in the online supplementary Table 1 (see www.karger.com/doi/10.1159/000527346 for all online suppl. material). The 2 × 2 cross table information, that is, the true positive, true negative, false positive, and false negative, for each of these studies for each analysis, that is, the “Atypical considered positive,” “Suspicious of Malignancy considered positive,” and the “Malignant considered positive,” is given in the online supplementary Table 2. A further six studies were identified which reported the diagnostic accuracy of breast FNAB using other 5-tier systems and therefore were excluded from the primary analysis, but were nonetheless included in the out-of-protocol analysis [45‒50].

Table 1.

Characteristics of the studies selected for the systematic review and meta-analysis

Fig. 1.

PRISMA flow diagram for selection of studies. The author of the record not retrieved was contacted for the manuscript. Additionally, another author for the records screened was contacted for missing data.


The risk of bias assessed for each study is given in Figure 2a, and the summary of the risk of bias assessment across all included studies is given in Figure 2b. The risk of bias in patient selection was unclear in 20 out of 22 studies because information about patients having histopathology assessment but not FNAB was not provided. Fifteen out of 22 studies did not report whether the FNAB was reported before the histopathological examination, and a risk of bias category of unclear was given in all such cases. Most studies also did not report whether the person performing and reporting the reference test, that is, histopathology, had access to the FNAB results or slides. Twenty-one out of 22 studies had a high risk of flow bias in either direction since a large proportion of “Benign” categorized FNAB in these studies did not have histopathological correlation. All of these 21 studies had less than 70% of “Benign” category FNAB followed up by histopathology, and 13 of these studies had less than 50% of “Benign” FNAB followed up by histopathology. We found that a sharp fall in reported specificity occurred in studies in which less than 20% of “Benign” FNAB were also tested by histopathology, suggesting a very high risk of bias in underestimating specificity for these studies (online suppl. Fig. 1). The forest plots for sensitivity, specificity, and bivariate sensitivity-specificity (SROC) plots are given in Figure 3a–c, respectively, for the analysis considering “Atypical and higher risk categories” as positive for malignancy. The corresponding plots considering “Suspicious of Malignancy and higher risk category” as positive for malignancy are given in Figure 3d–f, respectively, and those for the analysis considering only the category “Malignant” as positive for malignancy are given in Figure 3g–i, respectively. The complete list of parameters of the meta-analysis for the three diagnostic cut-offs is given in the online supplementary Table 3.

Fig. 2.

a “Traffic light” plot for risk of bias of the included studies. b Summary of risk of bias assessed across all included studies.

Fig. 3.

a Forest plot of sensitivity for the scenario where the Atypical category was considered positive for malignancy. b Forest plot of specificity for the scenario where the Atypical category was considered positive for malignancy. c Bivariate sensitivity versus specificity plot showing the results of the meta-analysis for the accuracy when Atypical category was considered positive for malignancy. d Forest plot of sensitivity for the scenario where the Suspicious of Malignancy category was considered positive for malignancy. e Forest plot of specificity for the scenario where the Suspicious of Malignancy category was considered positive for malignancy. f Bivariate sensitivity versus specificity plot showing the results of the meta-analysis for the accuracy when Suspicious of Malignancy category was considered positive for malignancy. g Forest plot of sensitivity for the scenario where only the Malignant category was considered positive for malignancy. h Forest plot of specificity for the scenario where only the Malignant category was considered positive for malignancy. i Bivariate sensitivity versus specificity plot showing the results of the meta-analysis for the accuracy when only the Malignant category was considered positive for malignancy.


The forest plot and results for AUC are given in Figure 4. The high lower limit of the 95% confidence interval for the overall AUC suggests a high overall accuracy of FNAB using the IAC Yokohama system, considering all categories. For studies that followed other systems, such as the NCI or NHSBSP systems, rather than the IAC System, the performance was similar (Table 2).

Table 2.

Summary results of the primary meta-analyses and subgroup analyses

Fig. 4.

Forest plot for the AUC with the results of random-effects meta-analysis.


Analysis of the diagnostic odds ratio for heterogeneity revealed marked heterogeneity between the studies for all three diagnostic thresholds (I² = 75.5%, Cochran Q = 85.84 with 21 degrees of freedom and p value <0.001 for the threshold “Atypical and higher risk categories considered positive”; I² = 84.0%, Cochran Q = 131.02 with 21 degrees of freedom and p value <0.001 for the threshold “Suspicious of Malignancy and higher risk considered positive”; I² = 45.1%, Cochran Q = 38.26 with 21 degrees of freedom and p value of 0.01 for the threshold “Malignant considered positive”).
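The reported I² values follow directly from the Cochran Q statistics and their degrees of freedom through the usual relation I² = max(0, (Q − df)/Q). A short Python check, using only the figures quoted above (the function name is ours):

```python
def i_squared(q, df):
    """Higgins I-squared (in percent) from Cochran Q and its degrees of freedom."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


reported_q = {
    "Atypical and higher": 85.84,
    "Suspicious of Malignancy and higher": 131.02,
    "Malignant": 38.26,
}

for threshold, q in reported_q.items():
    print(f"{threshold}: Q = {q}, I² = {i_squared(q, df=21):.1f}%")
```

With 22 studies there are 21 degrees of freedom, and the three computed values round to the 75.5%, 84.0%, and 45.1% given in the text.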

The separate subgroup analysis for diagnostic categories for the accuracy indices for the non-IAC Systems is given in Table 2. The non-IAC systems perform as well as the IAC Yokohama system for all three categories. Subgroup analysis reveals that studies from centers having a low or unclear caseload are likely to have lower sensitivity compared to centers having a high caseload. Studies having a low histopathological follow-up for “Benign” FNAB have a lower specificity compared to other studies, especially at the threshold “Atypical and higher considered positive.” Studies in which FNAB sampling was performed exclusively under image guidance also showed lower specificity compared to other studies, again for the threshold “Atypical and higher considered positive.”

The diagnostic odds ratios for each study at each cut-off analyzed for small study effects and publication bias using the method of Deeks did not show statistically significant bias (p values for Deeks test = 0.16, 0.25 and 0.55 for the cut-offs “Atypical and higher considered positive,” “Suspicious of malignancy and higher considered positive,” and “Malignant considered positive”, respectively).

The meta-regression results for the proportion of inadequate FNAB in Table 3 show that studies in which all FNAB were image guided had a higher inadequacy rate compared to those in which FNAB were performed either both under image guidance and by palpation or only by palpation. There was a trend toward significant differences between studies where palpable and impalpable lesions were tested and studies where only palpable lesions were tested or the palpability of the lesions was unclear.

Table 3.

Analysis of inadequacy rate with meta-regression studying the effect of method of FNAB, type of lesion, and annual case load on the proportion of inadequate FNAB


Besides checking for the effects of subgroups, a leave-one-out meta-analysis was performed to see whether any single study had an undue influence on the results. The leave-one-out meta-analysis for AUC is given in online supplementary Table 4. The graphical summaries by the sensitivity-specificity plot for any undue influence on the sensitivity or specificity at the different cut-offs are also given in online supplementary Figure 2a–c. No single study had a marked effect on the confidence intervals of the estimates for any diagnostic accuracy analysis. However, the 95% prediction interval for the AUC and sensitivity for the diagnostic decision cut-off of “Atypical and higher considered positive” was markedly improved if a single study [5] was left out. Leaving out either of two studies [5, 9] markedly improved the 95% prediction interval of sensitivity and specificity at the diagnostic decision cut-off of “Suspicious of Malignancy and higher considered positive.” Both these studies were characterized by a low annual case load of FNAB during the study period.
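The leave-one-out procedure itself is simple to sketch in Python. To keep the example short, the pooling step below is a fixed-effect inverse-variance average of logit-transformed proportions with a 0.5 continuity correction. This is a deliberate simplification of ours and is not the bivariate random-effects model used in the analysis. The study counts are hypothetical.

```python
import math


def pooled_proportion(events, totals):
    """Fixed-effect inverse-variance pooling on the logit scale (simplified)."""
    weighted_sum = weight_total = 0.0
    for e, n in zip(events, totals):
        e_c = e + 0.5                  # continuity correction
        non_e_c = n - e + 0.5
        logit = math.log(e_c / non_e_c)
        variance = 1.0 / e_c + 1.0 / non_e_c
        weighted_sum += logit / variance
        weight_total += 1.0 / variance
    pooled_logit = weighted_sum / weight_total
    return 1.0 / (1.0 + math.exp(-pooled_logit))


def leave_one_out(events, totals):
    """Re-pool the estimate once per study, each time omitting that study."""
    estimates = []
    for i in range(len(events)):
        estimates.append(
            pooled_proportion(events[:i] + events[i + 1:],
                              totals[:i] + totals[i + 1:])
        )
    return estimates


# Hypothetical true positives and malignant lesions for three studies
true_positives = [95, 90, 60]
malignant_lesions = [100, 100, 100]

for omitted, estimate in enumerate(leave_one_out(true_positives, malignant_lesions)):
    print(f"omitting study {omitted + 1}: pooled sensitivity = {estimate:.3f}")
```

A study is flagged as influential when omitting it shifts the pooled estimate or its interval markedly relative to the other omissions, as happens here when the third hypothetical study is left out.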

The likelihood ratio scatter matrices are shown in Figure 5. These graphs suggest that there is substantial evidence that the cut-off “Atypical and higher categories considered positive” is useful in ruling out breast cancer while the cut-offs of “Suspicious of Malignancy and higher considered positive” and “Malignant considered positive” are useful in ruling in breast cancer. Studies whose likelihood ratios were most different from the meta-analytic estimates were excluded in a further sensitivity analysis (online suppl. Table 5) but did not show undue influence on the results. The Fagan Nomograms for calculating the posttest probabilities are given in online supplementary Figure 3.

Fig. 5.

Likelihood ratio scatter matrices for the different diagnostic thresholds. The points represent the studies and are numbered according to the order given in Table 1. The studies are colored according to the overall risk of bias after assessment by the QUADAS-2 tool (green for low overall risk of bias, light red for high overall risk of bias, gray for unclear overall risk of bias). The black diamond with lines represents the meta-analytic estimate of the likelihood ratios with confidence intervals. Most of the studies for the threshold “Atypical considered positive” show a negative likelihood ratio less than 0.1, suggesting good performance in ruling out disease in case of a negative result, that is, a benign categorization on FNAB. Most of the studies for the thresholds “Suspicious of Malignancy” and “Malignant” have a positive likelihood ratio greater than 10, suggesting good performance in confirming the disease for a “Suspicious of Malignancy” or “Malignant” categorization on FNAB.


The primary objective of this study was to demonstrate the accuracy of breast FNAB using the IAC System for reporting. The summary of the meta-analysis with our assessment of the certainty of the estimates is given in Table 4. The present analysis demonstrates that FNAB may be used both to rule out malignant disease and to rule in malignant disease, albeit using different thresholds. The decision threshold of “Atypical and higher categories” being considered positive has a sensitivity of almost 98%, with a lower 95% confidence limit of 96.8%. Studies following the NCI or NHSBSP systems also show similar point estimates, though the lower confidence limit is marginally below 95% due to the lower number of studies. There is a consistently high and precise sensitivity at the threshold “Atypical and higher category considered positive” across almost all subgroups, even in studies deemed at a high risk of bias. The negative likelihood ratio was very low, and therefore a diagnostic category of “Benign” is sufficiently accurate to rule out malignancy in lesions of the breast. This is also supported by the rule of thumb of SNOUT, which states that a high sensitivity helps in ruling out a diseased condition [51].

Table 4.

Summary of the analysis with certainty of evidence of the meta-analysis

Decision cut-offs of “Suspicious of Malignancy or higher” and “Malignant” have sufficiently high specificity, with lower bounds of the 95% confidence interval of 97% and 99.4%, respectively. The point estimates remain consistently high and precise in all subgroups, even in studies adjudged to be at a high risk of bias and in studies underestimating specificity due to a low histopathological follow-up of “Benign” FNAB. The positive likelihood ratios are very high, and these categories help in ruling in a diagnosis of cancer. This is supported by the SPIN rule of thumb, which states that a positive result on a highly specific test helps to rule in a diseased condition [51].
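The link between the pooled sensitivity and specificity and the likelihood ratios discussed above can be illustrated with a short calculation. The following is a minimal sketch, assuming the standard plug-in formulas LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity applied to the pooled point estimates quoted in the abstract; the function and variable names are ours, and the resulting values need not coincide exactly with the model-based meta-analytic likelihood ratios.

```python
# Pooled point estimates quoted in the abstract: (sensitivity, specificity).
pooled = {
    "Atypical considered positive": (0.978, 0.832),
    "Suspicious of Malignancy considered positive": (0.916, 0.983),
    "Malignant considered positive": (0.763, 0.999),
}


def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) computed from a sensitivity/specificity pair."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Plug-in likelihood ratios for each diagnostic threshold.
ratios = {name: likelihood_ratios(se, sp) for name, (se, sp) in pooled.items()}

for name, (lr_pos, lr_neg) in ratios.items():
    print(f"{name}: LR+ = {lr_pos:.1f}, LR- = {lr_neg:.3f}")
```

Under these assumptions, the negative likelihood ratio at the “Atypical considered positive” threshold falls below the conventional 0.1 benchmark, and the positive likelihood ratios at the two higher thresholds exceed 10, in line with the interpretation given in the text.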

Studies conducted in centers with low annual FNAB case numbers had lower sensitivity than those with a high FNAB load, though the effect on specificity was smaller. This was especially true for the diagnostic cut-offs “Suspicious of Malignancy and higher category considered positive” and “Malignant considered positive.” These studies are primarily responsible for the heterogeneity in estimates of sensitivity for the “Suspicious of Malignancy” or “Malignant” diagnostic thresholds. Centers with a case load of 100 or more cases per year appear to report diagnostic accuracy similar to that of centers with a higher case load.

While assessing the risk of bias, many studies were assessed as unclear for the patient selection and index test domains of QUADAS-2. Most studies did not provide information about the proportion of eligible patients who actually underwent FNAB, resulting in the risk of bias for patient selection being rated as “unclear.” Such information may be ascertainable, for example, by checking the number of core needle biopsies or excisions that were performed without a concomitant FNAB.

For risk of bias in the index test, only 7 out of 22 studies reported whether the FNAB was performed before or independently of histopathology, with 15 “unclear.” Reporting FNAB before the histopathological examination is common and routine practice; therefore, it is likely that these “Unclear” assessments are a result of inadequate reporting rather than an inadequate research design. Most of the studies did not report whether FNAB results were available while reporting the histopathology, but we did not consider that to result in significant bias for the reference standard since interpretation of all related test reports is a common practice which helps in accurate final diagnosis.

Almost all studies were at risk of bias in patient flow, mostly due to low histopathological follow-up in benign FNAB. However, the bias in such cases underestimates specificity and therefore accuracy. Studies with a low histopathological follow-up of “Malignant” FNAB diagnoses showed little difference in subgroup analysis from the rest of the studies.

It is unlikely that the studies rated as “Unclear” on patient selection or index test actually suffered from bias, despite the adverse QUADAS-2 rating. In the subgroup analysis performed separately and shown in Table 2, the sensitivity and specificity are very similar to the rest of the studies and concordant with the overall findings. Even the subgroups identified to have a high overall risk of bias performed similarly to the other subgroups. We suggest that the actual diagnostic performance of FNAB is adequately represented even in the studies assessed as “Unclear” in the QUADAS-2 assessment because most of the studies represent “real-world” data, being taken from actual FNAB and histopathology results in hospital practice.

Studies found to have a high risk of underestimating accuracy, that is, studies that had a <20% histopathological follow-up for “Benign” FNAB and a high risk of bias in the “Flow and timing” domain of QUADAS-2, showed substantially lower specificity compared to other studies on subgroup analysis, leading to heterogeneity of reported results. These studies were conducted in centers with a high annual FNAB case number, suggesting high trust in the FNAB report by treating surgeons and physicians. Additionally, most of these studies involved only image-guided FNAB, where it is probable that only cases with a clinico-radiologically nonconcordant “Benign” FNAB report were sent for histopathological confirmation, which falsely lowered the reported specificity of FNAB. Despite such a bias, the underestimation in reported specificity is considerably lessened when higher thresholds of “Suspicious of Malignancy or higher category” or “Malignant considered positive” are considered.

AUC gives an overall picture of accuracy, and hence is a test for overall effectiveness [52]. It is a form of the c-statistic used to gauge the predictive strength of a test. The overall accuracy as estimated by the AUC is consistently high across all subgroups, regardless of caseload or risk of bias. The estimate is similar for the out-of-protocol analysis for the studies not explicitly following the IAC system, suggesting that pathologists grade the ROM similarly using any five-tier system, regardless of the name.
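The interpretation of the AUC as a c-statistic can be made concrete with a small worked example. The following is a minimal sketch, assuming an ordinal FNAB scale and purely hypothetical counts that are not taken from any included study (the inadequate category is left out for simplicity). It computes the probability that a randomly chosen malignant lesion receives a more suspicious FNAB category than a randomly chosen benign lesion, with ties counted as one half. It is not the meta-analytic method used in this review.

```python
# Hypothetical counts per FNAB category, ordered from least to most suspicious:
# Benign, Atypical, Suspicious of Malignancy, Malignant.
# These numbers are illustrative only and do not come from the included studies.
benign_counts = [80, 15, 4, 1]      # lesions benign on the reference standard
malignant_counts = [2, 6, 15, 77]   # lesions malignant on the reference standard


def c_statistic(negatives, positives):
    """Probability that a random diseased case is ranked above a random
    non-diseased case on an ordinal scale; tied pairs count one half."""
    concordant = 0.0
    for i, n_neg in enumerate(negatives):
        for j, n_pos in enumerate(positives):
            if j > i:
                concordant += n_neg * n_pos
            elif j == i:
                concordant += 0.5 * n_neg * n_pos
    return concordant / (sum(negatives) * sum(positives))


auc = c_statistic(benign_counts, malignant_counts)
print(f"c-statistic (AUC) = {auc:.3f}")
```

A value of 0.5 would indicate that the categories carry no discriminating information, while a value close to 1 indicates that malignant lesions are almost always assigned a higher category than benign ones.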

We found marked heterogeneity in FNAB inadequacy rates between studies. Hospitals using image-guided FNAB exclusively seemed to have the highest inadequacy rates, while FNAB by palpation performed well. It is possible that smaller, less well-defined, unfixed, difficult-to-sample lesions, which often would not be detectable on palpation, are selected for image-guided FNAB. Also, the character of the breast lesions selected for image-guided FNAB may be markedly different, possibly more fibrotic and more prone to low cellularity. Breast lesions are also difficult to immobilize when one hand is holding the sampling device and the other the ultrasound probe. Thus, fixation and sampling by a second person other than the radiologist performing the examination may well be advisable, as could performing an additional unguided sample once the detected lesion is found to be palpable.

Expectedly, studies in which FNAB was performed for both palpable and impalpable lesions showed a trend toward a worse inadequacy rate than those dealing with just palpable lesions. Also, the sole study in this meta-analysis reported from a center with a low case load showed a higher proportion of inadequate samples, compared to the studies from centers with a higher FNAB load. However, the overall analytic estimate of inadequacy in this study may be biased since the selection of studies was made keeping diagnostic accuracy in mind. Inadequacy rate needs to be analyzed by a separate review with more appropriate selection questions.

The primary limitation of this meta-analysis is that most of the studies are retrospective, and many of these retrospective studies utilized retrospective recategorization. Only one study was prospective, and an additional four were retrospective analyses of diagnoses made on a five-tier scale. With a prospective study, it is easier to characterize any spectrum or verification bias that could arise and interpret accordingly. Therefore, more prospective studies of the accuracy of the IAC System in breast lesions are required. In addition, more studies estimating the accuracy of the IAC System for papillary lesions [53, 54], small size [55], and male sex [56] could be useful. Breast FNAB using rapid on-site evaluation (ROSE) [7, 14], or in populations with a lower risk such as those with a mammographic BIRADS 4a lesion [24], can diagnose a case as benign rapidly and economically, benefiting the patient by saving time, discomfort, and expense while effectively providing the same information as a core needle biopsy.

This study deals with the diagnostic accuracy of the IAC System. Many practicing pathologists are also interested in the ROM, which is presented in online supplementary Table 1 to supplement previous reviews of the ROM of various categories [57, 58]. However, the ROM is a predictive value, which is inherently dependent on the prevalence of malignancy in the particular study population and is thus highly variable; the current study therefore focuses on the prevalence-independent measures of sensitivity and specificity.
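The dependence of a predictive value on prevalence can be shown with a brief calculation. The following is a minimal sketch, assuming the pooled sensitivity and specificity quoted in the abstract for the threshold “Suspicious of Malignancy considered positive” and three hypothetical prevalences chosen only for illustration. It converts the pre-test probability to a post-test probability through the positive likelihood ratio; the function name is ours.

```python
def post_test_probability(prevalence, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = prevalence / (1.0 - prevalence)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Pooled values from the abstract for "Suspicious of Malignancy considered positive".
sensitivity, specificity = 0.916, 0.983
lr_positive = sensitivity / (1.0 - specificity)

# Hypothetical prevalences of malignancy, chosen only for illustration.
prevalences = [0.05, 0.20, 0.50]
rom_by_prevalence = {p: post_test_probability(p, lr_positive) for p in prevalences}

for p, rom in rom_by_prevalence.items():
    print(f"prevalence {p:.0%}: probability of malignancy after a positive FNAB = {rom:.1%}")
```

With sensitivity and specificity held fixed, the probability of malignancy after a positive result rises steadily with the prevalence, which is why the ROM reported by individual studies varies with the population studied.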

This meta-analysis demonstrates the accuracy of breast FNAB using the IAC System. More specifically, it demonstrates the usefulness of the “Benign” category in excluding malignancy and the “Suspicious of Malignancy” and “Malignant” categories in diagnosing malignancy despite many of the included studies having artefactually low reported specificity. There appears to be little difference in the accuracy of diagnosing breast lesions between the IAC System and other five-tier systems, although the IAC System does provide clear definitions, key cytopathological diagnostic criteria for the lesions and tumors commonly found in each category, a differential diagnosis discussion, and a ROM for each category closely tied to a diagnostic management recommendation. The accuracy of breast FNAB is also largely similar among all centers that report more than 100 breast FNAB cases per year. The high accuracy of breast FNAB strongly supports the continued routine use of breast FNAB in all clinical settings including low- and middle-income countries, and this conclusion may be supplemented by a valid cost-effectiveness analysis.

The authors gratefully acknowledge the help of the Associate Editor for making major changes in the manuscript to improve the cohesiveness and readability of the final article.

An ethics statement is not applicable because this study is based exclusively on published literature.

The authors have no conflicts of interest to declare.

The authors received no funding for this study.

Pranoy Paul: involved in study ideation; independent assessor in study screening and risk of bias assessment; reviewed the extracted data; and critically reviewed the manuscript. Shweta Azad: involved in study ideation, helped to achieve consensus in case of conflict in screening, checked the extracted data, checked and reviewed the risk of bias assessment, and critically reviewed the manuscript. Shruti Agrawal: involved in study ideation, helped to achieve consensus in case of conflict in screening, checked and reviewed the risk of bias assessment, and critically reviewed the manuscript. Shalinee Rao: involved in study ideation, helped to achieve consensus in case of conflict in screening, and critically reviewed the manuscript. Nilotpal Chowdhury: involved in study ideation and writing the protocol; independent assessor in study screening and risk of bias assessment; extracted the data; performed the statistical analysis; and was the primary writer of the manuscript.

All data generated or analyzed during this study are included in this article and its supplements. Further inquiries can be directed to the corresponding author.

1.
Field AS, Raymond WA, Schmitt F, editors. The International Academy of Cytology Yokohama System for reporting breast fine needle aspiration biopsy cytopathology. Cham, Switzerland: Springer International Publishing; 2020.
2.
Montezuma D, Malheiros D, Schmitt FC. Breast fine needle aspiration biopsy cytology using the newly proposed IAC Yokohama System for reporting breast cytopathology: the experience of a single institution. Acta Cytol. 2019 Feb;63(Suppl 4):274–9.
3.
Chauhan V, Pujani M, Agarwal C, Chandoke RK, Raychaudhuri S, Singh K, et al. IAC standardized reporting of breast fine-needle aspiration cytology, Yokohama 2016: a critical appraisal over a 2 year period. Breast Dis. 2019;38(3–4):109–15.
4.
Agarwal A, Singh D, Mehan A, Paul P, Puri N, Gupta P, et al. Accuracy of the International Academy of Cytology Yokohama System of breast cytology reporting for fine needle aspiration biopsy of the breast in a dedicated breast care setting. Diagn Cytopathol. 2021 Oct;49(2):195–202.
5.
McHugh KE, Bird P, Sturgis CD. Concordance of breast fine needle aspiration cytology interpretation with subsequent surgical pathology: An 18-year review from a single sub-Saharan African institution. Cytopathology. 2019 Sep;30(5):519–25.
6.
De Rosa F, Migliatico I, Vigliar E, Salatiello M, Pisapia P, Iaccarino A, et al. The continuing role of breast fine-needle aspiration biopsy after the introduction of the IAC Yokohama System for reporting breast fine needle aspiration biopsy cytopathology. Diagn Cytopathol. 2020 Aug;48(12):1244–53.
7.
Wong S, Rickard M, Earls P, Arnold L, Bako B, Field AS. The International Academy of Cytology Yokohama System for reporting breast fine needle aspiration biopsy cytopathology: a single institutional retrospective study of the application of the system categories and the impact of rapid onsite evaluation. Acta Cytol. 2019 Jun;63(4):280–91.
8.
Agrawal S, Anthony ML, Paul P, Singh D, Mehan A, Singh A, et al. Prospective evaluation of accuracy of fine-needle aspiration biopsy for breast lesions using the International Academy of Cytology Yokohama System for reporting breast cytopathology. Diagn Cytopathol. 2021;49(7):805–10.
9.
Dixit N, Trivedi S, Bansal VK. A retrospective analysis of 512 cases of breast fine needle aspiration cytology utilizing the recently proposed IAC Yokohama system for reporting breast cytopathology. Diagn Cytopathol. 2021;49(9):1022–31.
10.
Marabi M, Aphivatanasiri C, Jamidi SK, Wang C, Li JJ, Hung EH, et al. The International Academy of Cytology Yokohama System for reporting breast cytopathology showed improved diagnostic accuracy. Cancer Cytopathol. 2021 Nov;129(11):852–64.
11.
Wong YP, Vincent James EP, Mohammad Azhar MAA, Krishnamoorthy Y, Zainudin NA, Zamara F, et al. Implementation of the International Academy of Cytology Yokohama standardized reporting for breast cytopathology: an 8-year retrospective study. Diagn Cytopathol. 2021;49(6):718–26.
12.
Tejeswini V, Chaitra B, Renuka I, Laxmi K, Ramya P, Sowjanya K. Effectuation of international academy of cytology yokahama reporting system of breast cytology to assess malignancy risk and accuracy. J Cytol. 2021;38(2):69–73.
13.
Sundar PM, Shanmugasundaram S, Nagappan E. The role of the IAC Yokohama System for reporting breast fine needle aspiration biopsy and the ACR breast imaging-reporting and data system in the evaluation of breast lesions. Cytopathology. 2022 Mar;33(2):185–95.
14.
Nigam JS, Kumar T, Bharti S, Surabhi SR, Bhadani PP, et al. The International Academy of Cytology standardized reporting of breast fine-needle aspiration biopsy cytology: a 2 year’s retrospective study with application of categories and their assessment for risk of malignancy. Cytojournal. 2021 Nov;18:27.
15.
Agrawal N, Kothari K, Tummidi S, Sood P, Agnihotri M, Shah V. Fine-needle aspiration biopsy cytopathology of breast lesions using the International Academy of Cytology Yokohama System and rapid on-site evaluation: a single-institute experience. Acta Cytol. 2021;65(6):463–77.
16.
Sarangi S, Rao M, Elhence PA, Nalwa A, Bharti JN, Khera S, et al. Risk stratification of breast fine-needle aspiration biopsy specimens performed without radiologic guidance by application of the International Academy of Cytology Yokohama System for reporting breast fine-needle aspiration cytopathology. Acta Cytol. 2021 Nov;65(6):483–93.
17.
Nargund A, Mohan R, Pai MM, Sadasivan B, Dharmalingam P, Chennagiri P, et al. Demystifying breast fnac’s based on the international academy of cytology, yokohama breast cytopathology system- a retrospective study. J Clin Diagnostic Res. 2021;15(3):EC01–5.
18.
Ahuja S, Malviya A. Categorization of breast fine needle aspirates using the International Academy of Cytology Yokohama System along with assessment of risk of malignancy and diagnostic accuracy in a Tertiary Care Centre. J Cytol. 2021 Jul;38(3):158–63.
19.
Kamatar PV, Athanikar VS, Dinesh U. Breast fine needle aspiration biopsy cytology reporting using International Academy of Cytology Yokohama System-two year Retrospective Study in Tertiary Care Centre in Southern India. Natl J Lab Med. 2019;8(4):PO01–3.
20.
Apuroopa M, Chakravarthy VK, Rao DR. Application of yokohama system for reporting breast fine needle aspiration cytology in correlation with histopathological and radiological findings. Ann Pathol Lab Med. 2020 May;7(4):A210–215.
21.
Niaz M, Khan AA, Ahmed S, Rafi R, Salim H, Khalid K, et al. Risk of malignancy in breast FNAB categories, classified according to the newly proposed International Academy of Cytology (IAC) Yokohama System. Cancer Manag Res. 2022 May;14:1693.
22.
Deshpande S, Rao K, Sushma Y, Saikumar G. International academy of cytology guidelines based categorization of breast fine-needle aspiration cytology lesions and their histopathological correlation. J Datta Meghe Inst Med Sci Univ. 2021 Apr;16(2):334.
23.
Joshee A, Joshee R. Breast FNA cytology reporting using new proposed IAC Yokohama reporting system: a single institution retrospective study. Int J Adv Res Med. 2021;3(2):267–71.
24.
Agrawal S, Anthony ML, Paul P, Singh D, Agarwal A, Mehan A, et al. Accuracy of breast fine-needle aspiration biopsy using the International Academy of Cytology Yokohama System in clinico-radiologically indeterminate lesions: initial findings demonstrating value in lesions of low suspicion of malignancy. Acta Cytol. 2021;65(3):1–7.
25.
Field AS, Raymond WA, Schmitt F. The International Academy of Cytology Yokohama System for reporting breast fine-needle aspiration biopsy cytopathology: recent research findings and the future. Cancer Cytopathol. 2021;129(11):847–51.
26.
Grimes DA, Schulz KF. Refining clinical diagnosis with likelihood ratios. Lancet. 2005 Apr;365(9469):1500–5.
27.
Baeyens J-P, Serrien B, Goossens M, Clijsen R. Questioning the “SPIN and SNOUT” rule in clinical testing. Arch Physiother. 2019 Mar;9(1):1–6.
28.
The uniform approach to breast fine-needle aspiration biopsy. Diagn Cytopathol. 1997 Jan;16(4):295–311.
29.
Wells CA, Ellis IO, Zakhour HD, Wilson AR. Guidelines for cytology procedures and reporting on fine needle aspirates of the breast. Cytopathology. 1994 Oct;5(5):316–34.
30.
Whiting PF, Rutjes AWS, Westwood ME, Mallett S, Deeks JJ, Reitsma JB, et al. Quadas-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011;155(8):529–36.
31.
Robinson IA, Blackham RB. Is there a better way to assess performance in breast cytology? Cytopathology. 2001 Aug;12(4):227–34.
32.
Field AS, Schmitt F, Vielh P. IAC standardized reporting of breast fine-needle aspiration biopsy cytology. Acta Cytol. 2017 Feb;61(1):3–6.
33.
McGuinness LA, Higgins JPT. Risk-of-bias VISualization (robvis): An R package and Shiny web app for visualizing risk-of-bias assessments. Res Synth Methods. 2021;12:55–61.
34.
Chu H, Cole SR. Bivariate meta-analysis of sensitivity and specificity with sparse data: a generalized linear mixed model approach. J Clin Epidemiol. 2006 Dec;59(12):1331–2.
35.
Takwoingi Y, Dendukuri N, Schiller I, Rücker G, Jones H, Partlett C, et al. Chapter 11: undertaking meta-analysis. In: Deeks J, Bossuyt P, Leeflang M, Takwoingi Y, editors. Cochrane handbook for systematic reviews of diagnostic test accuracy version 2. London: Cochrane. p Draft version for inclusion.
36.
Lin L, Chu H. altmeta: alternative meta-analysis methods; 2022.
37.
R Core Team. R: a language and environment for statistical computing; 2022.
38.
Freeman SC, Kerby CR, Patel A, Cooper NJ, Quinn T, Sutton AJ. Development of an interactive web-based tool to conduct and interrogate meta-analysis of diagnostic test accuracy studies: MetaDTA. BMC Med Res Methodol. 2019 Apr;19(1):1–11.
39.
Patel A, Cooper N, Freeman S, Sutton A. Graphical enhancements to summary receiver operating characteristic plots to facilitate the analysis and reporting of meta-analysis of diagnostic test accuracy data. Res Synth Methods. 2021 Jan;12(1):34–44.
40.
Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez J-C, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011;12:77.
41.
Debray T, de Jong V. metamisc: Meta-Analysis of Diagnosis and Prognosis Research Studies, R package version 0.2.5; 2021.
42.
Deeks JJ, Macaskill P, Irwig L. The performance of tests of publication bias and other sample size effects in systematic reviews of diagnostic test accuracy was assessed. J Clin Epidemiol. 2005 Sep;58(9):882–93.
43.
Rubinstein ML, Kraft CS, Parrott JS. Determining qualitative effect size ratings using a likelihood ratio scatter matrix in diagnostic test accuracy systematic reviews. Diagnosis. 2018 Dec;5(4):205–14.
44.
Hailey SR, Held L, Meyer S, Rueeger S, Rufibach K, Schwab S. biostatUZH: Misc Tools of the Department of Biostatistics, R package version 1.8.0/r103. EBPI, University of Zurich; 2020.
45.
Arul P, Masilamani S. Application of National Cancer Institute recommended terminology in breast cytology. J Cancer Res Ther. 2017;13(1):91–6.
46.
Ogbuanya AU-O, Anyanwu SN, Iyare EF, Nwigwe CG. The role of fine needle aspiration cytology in triple assessment of patients with malignant breast lumps. Niger J Surg Off Publ Niger Surg Res Soc. 2020;26(1):35–41.
47.
Ibikunle DE, Omotayo JA, Ariyibi OO. Fine needle aspiration cytology of breast lumps with histopathologic correlation in Owo, Ondo State, Nigeria: a five-year review. Ghana Med J. 2017;51(1):1–5.
48.
Madubogwu CI, Ukah CO, Anyanwu S, GU C, Onyiaorah IV, Anyiam D. Sub-classification of breast masses by fine needle aspiration cytology. Eur J breast Heal. 2017 Oct;13(4):194–9.
49.
Stephen M, Daye S-F, Raphael S. Palpable breast masses in a tertiary institution of South-South Nigeria; fine-needle aspiration cytology versus histopathology: a correlation of diagnostic accuracy. New Niger J Clin Res. 2018;7(12):43–7.
50.
Mohan BP, Krishnan SK, Prasad P, Jose L, Das NM, Feroze M. Correlation of fine needle aspiration cytology (FNAC) with histopathology in palpable breast lesions: a study of 200 cases from a tertiary care center in South India. J Med Sci Clin Res. 2018 Jul;6(7).
51.
Sackett DL, Straus S. On some clinically useful measures of the accuracy of diagnostic tests. ACP J Club. 1998;129(2):A17–9.
52.
Debray TPA, Damen JAAG, Snell KIE, Ensor J, Hooft L, Reitsma JB, et al. A guide to systematic review and meta-analysis of prediction model performance. BMJ. 2017 Jan;356:6460.
53.
Jamidi SK, Li JJX, Aphivatanasiri C, Chow MBCY, Chan RCK, Ng JKM, et al. Papillary lesions of the breast: a systematic evaluation of cytologic parameters. Cancer Cytopathol. 2021 Aug;129(8):649–61.
54.
Gümrükçü G, Doğan M, Gürsan N, Boylu B, Ekren E, Aker FV. How accurately FNAC reflects the breast papillary lesions? J Cytol. 2022;39(1):30–6.
55.
de Cursi JAT, Marques MEA, de Assis Cunha Castro CAC, Schmitt FC, Soares CT. Fine-Needle Aspiration Cytology (FNAC) is a reliable diagnostic tool for small breast lesions (≤1.0 cm): a 20-year retrospective study. Surg Exp Pathol. 2020;3(1).
56.
Oosthuizen M, Razack R, Edge J, Schubert PT. Classification of male breast lesions according to the IAC Yokohama System for reporting breast cytopathology. Acta Cytol. 2021;65(2):132–9.
57.
Hoda RS, Brachtel EF. International Academy of Cytology Yokohama System for reporting breast fine-needle aspiration biopsy cytopathology: a review of predictive values and risks of malignancy. Acta Cytol. 2019;63(Suppl 4):292–301.
58.
Field AS, Raymond WA, Rickard M, Arnold L, Brachtel EF, Chaiwun B, et al. The International Academy of Cytology Yokohama System for reporting breast fine-needle aspiration biopsy cytopathology. Acta Cytol. 2019;63(Suppl 4):257–73.

Additional information

Shweta Azad and Shruti Agrawal are co-second authors. Registration: https://osf.io/puev5.