Background: The high-risk ‘suspicious for papillary thyroid carcinoma' (SPTC) is a clinically relevant diagnosis in the cytological interpretation of thyroid aspirates. While The Bethesda System for Reporting Thyroid Cytopathology (TBSRTC) has provided invaluable terminology standardization, a performance comparison for this diagnostic category has not been performed. Therefore, this study evaluates the SPTC diagnosis before and after the introduction of TBSRTC in a large meta-analysis and at a single institution. Materials and Methods: The meta-analysis analyzed publications of SPTC or similar diagnoses before and after the introduction of TBSRTC. Similarly our own institutional experience was analyzed for the 8 years surrounding the introduction of TBSRTC. A correlation of the cytopathology and surgical pathology diagnoses was performed. Results: The introduction of TBSRTC coincided with a significant decrease in the fraction of cases called SPTC in the meta-analysis (4.5-3.1%, p < 0.00001) and in the institutional review (1.7-0.9%, p = 0.005). Meanwhile, the malignancy risk for those cases increased significantly in the meta-analysis from 62.5 to 80.5% (p < 0.00001) and trended upwards in the institutional review from 69 to 79% (p = 0.4). The follow-up rate was similar in both time periods in the meta-analysis and the institutional review. Conclusions: The introduction of TBSRTC coincided with a decrease in the fraction of cases called SPTC and an increase in the malignancy risk associated with that diagnosis.
Thyroid nodules are very common and usually benign. However, since thyroid nodules may contain a malignancy, and the incidence of thyroid cancer appears to be increasing, thyroid nodules are routinely biopsied with fine-needle aspiration (FNA) [1, 3]. When the cytological diagnosis is malignant, the positive predictive value is greater than 99%, and when it is benign, the false-negative rate is typically less than 5% [4, 5]. While these performance characteristics are desirable, indeterminate categories are unavoidable, and confusion over nomenclature necessitated The Bethesda System for Reporting Thyroid Cytopathology (TBSRTC) [5, 7]. Before TBSRTC, terminologies for indeterminate diagnostic categories varied significantly among institutions [1, 3, 8] even though it was clear that the subset of indeterminate aspirates with features that were worrisome but not completely diagnostic for malignancy had a significantly higher risk than those with focal and less worrisome atypical features [9, 10, 11]. Thus, the ‘suspicious for malignancy' (SFM) category is an important component of TBSRTC and consists mainly of cases that are suspicious for papillary thyroid carcinoma (SPTC) but lack the necessary features to make a definite malignant diagnosis. While the category is similar in many ways to ‘suspicious' categories that had been used in thyroid cytopathology prior to TBSRTC, a comparison of patients and outcomes with these diagnoses before and after TBSRTC has not been performed. The objective of the present study is to evaluate whether the introduction of TBSRTC coincided with any change in the fraction of thyroid FNA called SPTC, the rate at which patients with that diagnosis underwent surgery, or the malignancy risk either in the published literature or at our institution.
A literature search was performed on the PubMed database with a boolean term including suspicious, malignancy, papillary thyroid carcinoma, cytopathology, and thyroid. Each paper was reviewed for the following variables: number of total biopsies, incidence of SPTC, number of SPTC cases with surgical follow-up, and number of resected SPTC cases with a malignant histological diagnosis. If a publication subdivided the SFM category into entities such as suspicious for medullary thyroid cancer or suspicious for lymphoma, then those non-SPTC cases were excluded. Some studies only considered the categorical diagnosis of SFM, and these cases were counted as SPTC cases given the rarity of other suspicious subcategories. The publications were classified as pre-TBSRTC if the studies were performed prior to the introduction of a risk-tiered system at that institution and as post-TBSRTC if they were performed after the introduction of a risk-tiered system at that institution. The studies were additionally classified as prospective and retrospective. A publication was classified as prospective if it reported diagnoses that were made preoperatively according to a predetermined diagnostic definition. A publication was classified as retrospective if it reported diagnoses that were made in a batch review of cases that were classified with a different classification scheme than the one in use at the time of original clinical evaluation.
After obtaining institutional review board approval, the pathology database was searched for the 6,998 thyroid FNA procedures performed at The Johns Hopkins Hospital over an 8-year period from January 1, 2005, to December 31, 2012. This period of time included the 4 years prior to the adoption of TBSRTC (January 1, 2005, to December 31, 2008) in which 3,364 procedures were performed and the 4 years after the adoption of TBSRTC (January 1, 2009, to December 31, 2012) in which 3,634 procedures were performed. These included some of the same patients already described in our preliminary report of the 26 months immediately after the institution of TBSRTC. A small group (n = 212) of patients had biopsies both before and after January 1, 2009. Patients who were biopsied prior to January 1, 2009, and underwent thyroidectomy after this date were counted in the pre-TBSRTC group.
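The cohort assignment rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `assign_period` and the constant `TBSRTC_ADOPTION` are hypothetical, and the rule encoded is the one stated in the text, namely that a case is classified by its biopsy date and not by the date of thyroidectomy.

```python
from datetime import date

# Adoption date of TBSRTC at the institution, as given in the text.
TBSRTC_ADOPTION = date(2009, 1, 1)

def assign_period(biopsy_date, thyroidectomy_date=None):
    """Classify a case as pre- or post-TBSRTC by biopsy date only.

    thyroidectomy_date is accepted to make explicit that it is ignored:
    a patient biopsied before adoption and resected afterwards stays
    in the pre-TBSRTC group.
    """
    if biopsy_date < TBSRTC_ADOPTION:
        return "pre-TBSRTC"
    return "post-TBSRTC"

# A biopsy in late 2008 followed by surgery in 2009 remains pre-TBSRTC.
print(assign_period(date(2008, 11, 3), date(2009, 2, 17)))
```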
Cytopathology and Cytological-Histological Correlation
All cases were obtained at The Johns Hopkins Hospital under ultrasound guidance with on-site evaluation by either a cytopathologist or a cytotechnologist as has been described. Paired direct smears were performed; one slide was stained with Diff-Quik® for on-site evaluation, and another slide was immediately immersed in 95% alcohol and stained with the Papanicolaou stain. Final diagnoses for all cases were made on both Diff-Quik-stained and Papanicolaou-stained direct smear slides by board-certified cytopathologists. In this study, surgical follow-up was available for cytological-histological correlation in 89 cases. The preoperative size and site of the nodule were matched with the findings of the surgical pathology for each biopsy site. In a single case, a correlation could not be made due to unresolved documentation about where the biopsy was performed, and this case was excluded. Papillary microcarcinoma discovered incidentally away from the FNA site was not counted as malignant follow-up. Demographic data such as age, gender, and the size of the nodule were collected. Statistical analysis was performed using native functions in the statistical programming language R (http://cran.r-project.org). Continuous variables were analyzed with the Student t test, and binomial variables were analyzed with the χ2 test or the Fisher exact test if any cell of the contingency table contained a count of less than 30.
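The test-selection rule for binomial variables can be sketched as follows. This is a minimal Python illustration and not the authors' R code. The function names (`chi_square_2x2`, `fisher_exact_2x2`, `compare_proportions`) are hypothetical. Two further assumptions are made: the χ2 test uses a continuity correction, as R's `chisq.test` does by default for 2 × 2 tables, and the two-sided Fisher p value sums the probabilities of all tables that are no more likely than the observed one. The example counts are those reported elsewhere in this paper.

```python
from math import comb, erfc, sqrt

def chi_square_2x2(a, b, c, d, yates=True):
    """p value of a chi-square test (1 df) on the table [[a, b], [c, d]].

    Assumes every row and column total is nonzero.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Continuity correction (an assumption of this sketch).
        diff = max(0.0, diff - n / 2)
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    stat = n * diff ** 2 / denom
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    return erfc(sqrt(stat / 2))

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    p = sum(prob(x) for x in range(low, high + 1)
            if prob(x) <= observed * (1 + 1e-9))
    return min(1.0, p)

def compare_proportions(events1, n1, events2, n2, small_count=30):
    """Pick Fisher's test when any cell is below small_count, else chi-square."""
    a, b = events1, n1 - events1
    c, d = events2, n2 - events2
    if min(a, b, c, d) < small_count:
        return "fisher", fisher_exact_2x2(a, b, c, d)
    return "chi-square", chi_square_2x2(a, b, c, d)

# SPTC call rate in the prospectively diagnosed studies (large counts).
print(compare_proportions(537, 12041, 341, 11175))
# Institutional malignancy rate of resected SPTC cases (small counts).
print(compare_proportions(29, 42, 22, 28))
```

With the threshold of 30 stated above, the first comparison is routed to the χ2 test and the second to the Fisher exact test.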
Meta-Analysis of Cytopathological Data
Studies published before and after the adoption of TBSRTC were analyzed to investigate any differences in the fraction of cases diagnosed as SPTC, the fraction of cases diagnosed as SPTC that were followed up with surgery, and the malignancy rate of those cases. The articles and their findings are summarized in table 1. There were 13 studies that included 51,863 thyroid FNA in the pre-TBSRTC period and 12 studies that included 50,192 thyroid FNA in the post-TBSRTC period. These numbers are minimum estimates because, as shown in table 1, only some publications included the total number of patients seen during their study periods. All publications in both pre- and post-TBSRTC periods provided the number of SPTC cases and the malignant rate of resected SPTC cases. The fraction of cases diagnosed as SPTC that were ultimately malignant after resection was significantly higher in the post-TBSRTC period than in the pre-TBSRTC period. The malignancy rate for SPTC cases in the pre-TBSRTC period was 67.8% (739/1,095, range 41.2-91.8), and it was 79.1% (704/889, range 53-97.3) in the post-TBSRTC period (p < 0.00001).
Five [14, 16, 18, 19, 23] of the pre-TBSRTC studies and 6 [9, 10, 11, 32, 33, 34] of the post-TBSRTC studies reported the total number of thyroid FNA cases along with the fraction of them that were diagnosed as SPTC. The pre-TBSRTC publications involved 12,041 FNA biopsies of which 537 (4.5%, range 2.5-8.7) were diagnosed as SPTC. This is higher than the post-TBSRTC publications that involved 27,918 FNA of which 674 (2.4%, range 1.6-3.5) were diagnosed as SPTC. The difference was significant (p < 0.00001). Interestingly, these 2 time periods differed significantly in the rate of surgical follow-up. After the introduction of TBSRTC, the overall surgical follow-up rate for thyroid cases decreased significantly from 26.5% (3,186/12,041, range 19.2-34.6) to 20.6% (5,747/27,920, range 15.3-29.0, p < 0.00001), and the follow-up rate for cases diagnosed as SPTC similarly decreased from 82.9% (445/537, range 45.2-91.7) to 76.8% (519/676, range 50-97.2).
To correct for any bias that may have arisen from including studies in which cases were diagnosed retrospectively, the subset of studies that were diagnosed prospectively was considered separately. This group of publications included 5 pre-TBSRTC studies [14, 16, 18, 19, 23] that collectively involved at least 12,041 patients and 3 post-TBSRTC studies [9, 11, 30] that collectively involved at least 11,175 patients. The summary statistics for these studies are shown in table 2. Similar to the meta-analysis including all studies, the limited set of prospectively diagnosed studies showed that the rate of SPTC diagnosis decreased significantly from 4.5% (537/12,041, range 2.5-8.7) to 3.1% (341/11,175, range 1.6-3.5, p < 0.00001). Also similar to the meta-analysis that included all studies, the malignancy rate for the publications that reported prospectively diagnosed cases increased from 62.5% (278/445, range 60.1-81.3) to 80.5% (231/287, range 77.9-91.4, p < 0.00001). Interestingly, after removing the retrospectively diagnosed cases, the surgery rate for cases diagnosed as SPTC actually increased marginally from 82.9% (445/537, range 45.2-91.7) to 84.2% (287/341, range 76.9-97.2) despite an overall decrease in the percentage of patients who underwent surgery: in the prospectively diagnosed studies, the overall surgical follow-up rate was 26.5% (3,186/12,041, range 19.2-34.6) before TBSRTC, and it was 20.7% (2,317/11,175, range 15.3-22.4) after TBSRTC. While the surgical follow-up rate for cases diagnosed as SPTC before and after TBSRTC was not significantly different (p = 0.7), it was substantially different from the decrease seen when all the publications were considered. Additionally, the marginal increase in the surgical follow-up rate for cases diagnosed as SPTC contrasts starkly with the clear decrease in the overall surgical follow-up rate observed among the same studies.
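The pooled percentages in the paragraph above follow directly from the pooled counts, and the arithmetic can be checked with a short sketch. The names `PROSPECTIVE`, `pct`, and `summarize` are hypothetical; the counts are copied from the text, and pooling by summing counts across studies is assumed to be how the reported fractions were formed.

```python
# (events, total) before and after TBSRTC, prospectively diagnosed studies only.
PROSPECTIVE = {
    "SPTC call rate": ((537, 12041), (341, 11175)),
    "surgical follow-up of SPTC": ((445, 537), (287, 341)),
    "malignancy rate of resected SPTC": ((278, 445), (231, 287)),
    "overall surgical follow-up": ((3186, 12041), (2317, 11175)),
}

def pct(events, total):
    """Pooled rate as a percentage, rounded to one decimal place."""
    return round(100 * events / total, 1)

def summarize(table):
    """Map each measure to its (pre, post) pooled percentages."""
    return {name: (pct(*pre), pct(*post)) for name, (pre, post) in table.items()}

# Internal consistency: SPTC cases are the denominator of SPTC follow-up,
# and resected SPTC cases are the denominator of the malignancy rate.
for period in (0, 1):
    called = PROSPECTIVE["SPTC call rate"][period][0]
    resected, followed = PROSPECTIVE["surgical follow-up of SPTC"][period]
    assert followed == called
    assert PROSPECTIVE["malignancy rate of resected SPTC"][period][1] == resected

for name, (pre, post) in summarize(PROSPECTIVE).items():
    print(f"{name}: {pre}% -> {post}% ({post - pre:+.1f} points)")
```

Laying the counts out this way makes the chain of denominators explicit: total FNA, then cases called SPTC, then resected SPTC cases, then malignant resections.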
Meta-Analysis of Cytopathological-Histopathological Correlation Data
To better understand the malignancy rate of the SPTC diagnosis before and after the introduction of TBSRTC, the malignant histological diagnoses were examined. Thirteen studies in the pre-TBSRTC period [13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25] and 8 studies in the post-TBSRTC period [11, 26, 27, 28, 29, 30, 32, 33] provided the specific histopathological diagnoses for their cases with malignant follow-up, and these are detailed in table 3. From these data, the rate of the follicular variant of papillary thyroid carcinoma (FV-PTC) as a fraction of all malignancy was equivalent, i.e. 9.8% (72/731, range 0-30.8) before and 8.9% (37/414, range 0-20.5) after TBSRTC (p = 0.73613). This equivalence is meaningful because the overall malignancy rate increased by a margin significantly greater than the change in FV-PTC (p = 0.00073). The increase in the malignancy rate was commensurate with the increase in cases that ultimately turned out to be classical PTC on resection (p = 0.44).
The review of our institutional experience covered 2 time periods: the 4 years before the adoption of TBSRTC in which 3,362 biopsies were performed and the 4 years after the adoption of TBSRTC in which 3,629 were performed. There were 46 cases of SPTC in the pre-TBSRTC period and 29 cases in the post-TBSRTC period. These comprised 1.7% (46/3,362) and 0.9% (29/3,629) of the cases before and after TBSRTC, respectively. The decrease was statistically significant (p = 0.00477). As seen in table 4, there were no significant differences in the basic demographic features of patients before and after the introduction of TBSRTC. The introduction of TBSRTC coincided with a nonsignificant upward trend in the surgical follow-up rate for cases diagnosed as SPTC (78 vs. 86%, p = 0.5); the upward trend was remarkable given the significant decrease in overall surgical follow-up from 15.8% (530/3,362) to 12.1% (441/3,629, p = 0.00001).
Paradoxically, this decrease in the SPTC call rate coincided with a significant increase in the fraction of thyroid FNA cases that were diagnosed as malignant from 3.4% (115/3,362) to 4.6% (167/3,629, p = 0.014) and an increase in the fraction of cases that were diagnosed as atypia of undetermined significance (AUS) from 9.1% (305/3,364) to 11.5% (419/3,640, p = 0.001). The increase in the AUS rate was statistically equivalent to the increase in the malignancy call rate (p = 0.8). As seen in table 5, the call rate for non-SPTC categories within SFM was low and equivalent for both time periods. The malignancy rate for the SPTC category trended upward after the introduction of TBSRTC, from 69% (29/42) to 78.6% (22/28), but the difference was not significant (p = 0.4). The fraction of FV-PTC cases was subjectively similar, although our institutional study was not sufficiently large to detect a difference in the FV-PTC rate.
The practical reason why SPTC is part of TBSRTC is to preserve the high positive predictive value of the malignant category by providing the cytopathologist with a less-binding diagnosis when some of the cytological criteria of malignancy are missing. On an intuitive level, SPTC is a necessary diagnostic category because of the morphologically heterogeneous nature of thyroid tumors and the intrinsic incompleteness of sampling that occurs in FNA. The findings of both a decrease in the SPTC call rate and an increase in the SPTC malignancy rate parallel the consensus in TBSRTC that the malignant category should have a high positive predictive value and the consensus on the AUS category that provides a lower-risk category for compromised specimens. Given that many of the papers reviewed here did not report the incidences of all categories, it is not clear from the meta-analysis how much the existence of the AUS category contributed to the decrease in the SFM call rate and the increase in the predictive value.
As in the meta-analysis, in our institutional experience the rate of SPTC decreased and the predictive rate for malignancy increased. Paradoxically, this coincided with a proportionate increase in both the fraction of cases that were cytologically malignant and the fraction of cases that were diagnosed as AUS. If the decrease in the SPTC call rate were solely attributable to the consensus acceptance of AUS, then the increase in the AUS rate would be expected to be greater than the increase in the malignancy rate. As this is not the case, it is unlikely that the decrease in the SFM call rate is solely attributable to cytopathologist factors. An alternative explanation could be that the rate of BRAF mutation-associated PTC is increasing, and the classical morphological findings in PTC are more pronounced in these tumors. While there is no evidence in this retrospective review to test such a hypothesis, the influence of such environmental and epidemiological factors on the morphology and the performance of FNA in the thyroid cannot be excluded and may have implications for the future of ancillary test development and use.
Before TBSRTC, various diagnostic terminologies were used to describe what is now called SFM. These included ‘atypical follicular proliferation, highly suspicious for papillary thyroid carcinoma', ‘thyroid neoplasm with features likely consistent with papillary thyroid carcinoma', and descriptive diagnoses. While the standardization of terminology within TBSRTC is obvious, the relationship between that standardization and performance metrics in published clinical or pathological practice is much less obvious. Interestingly, consolidating the different terminologies into the SPTC category coincided with a decrease in the use of that category. Our meta-analysis shows that the introduction of TBSRTC coincided with a decrease in the rate at which thyroid FNA biopsies were diagnosed as SFM but an increase in the fraction of those cases that were malignant. The impact on clinical management appears unchanged - most of the patients with this diagnosis underwent surgery regardless of the terminology with which they were diagnosed. There are two possible explanations for the decreased rate of SFM and the increased malignancy rate that we have shown in the meta-analysis. The first is that standardized terminology, especially the establishment of AUS, allowed for a seemingly less risky way to report specimens that lack the full complement of malignant findings. The second possible explanation is that SFM is a surrogate for non-classical genetic variants of PTC that are decreasing in abundance. The exact contribution of either of these two possibilities, if any, remains unclear.
The meta-analysis presented here intentionally separates studies reporting prospectively and retrospectively diagnosed material. This is done because studies with retrospectively assigned diagnoses introduce bias into some of the most important features of the data - overall and cytological diagnosis-specific surgical follow-up rates. This is not surprising from a practical perspective since this bias is the product of other basic sources of bias. First, as the studies involve reviewing slides from distant history, more material is available in the study than at the time of the original read - the resection diagnosis may even be known to the person reviewing the case. Second, retrospective reviews of multiple years of cases within a limited period of study time inherently lead to intrastudy training and fatiguing. This also occurs in prospectively diagnosed cases but to a much smaller extent due to the necessarily higher attention given in the first-time legally binding diagnostic process and the fact that multiple years' worth of cases are interpreted over a period of multiple years rather than in a concentrated review. Finally, retrospective reviews that are limited to enrich a particular diagnostic category intrinsically contain selection bias, and that is significant enough to lead to artificial estimates of follow-up disease. Despite these shortcomings, retrospectively reviewed case studies are a chief source of the cytopathology knowledge base since they enable a first and safe validation of diagnostic criteria and categories. Also, when they are performed in an exhaustive or sizeable randomly selected set, they provide a first approximation of the expected incidences and problems in new categories. When performed methodically, they can even lead to the formulation of diagnostic criteria. As such, series involving retrospectively reviewed material were included in the meta-analysis for purposes of estimating how diagnostic categories are distributed.
However, the biases involved in retrospectively reviewed studies led to some significantly misleading results - especially those relating to the surgical follow-up. Accordingly, all of our conclusions are derived from the prospectively diagnosed studies.
The meta-analysis and institutional study have the usual unavoidable biases of retrospective reviews: selection bias, referral bias, and the contribution of unknown confounding variables. The greatest limitation in this study is the lack of universal surgical follow-up. As a result of this limitation, the natural incidence of disease within this patient population remains unknown, and it is unclear if the changes described here are due to the consensus terminology in TBSRTC or underlying shifts in the molecular epidemiology of PTC. As an equivalent fraction of patients with a diagnosis of SFM underwent surgery in both time periods and in both the meta-analysis and the institutional study, it is likely that any selection biases introduced by a less-than-perfect surgical follow-up rate are minimal.
Standardized terminology in TBSRTC coincided with a decrease in the use of the SFM category and an increase in the fraction of cases with that diagnosis that were ultimately found to be malignant. These findings were consistent in both the meta-analysis and the institutional review.