Abstract
Background: Transbronchial needle aspiration (TBNA) is a safe and useful sampling technique for the diagnosis of mediastinal adenopathies/masses, but its accuracy seems to be influenced by selected clinical and procedural aspects. Objectives: We performed a systematic review to identify the main predictors of a successful transbronchial aspirate according to different clinical settings. Methods: We searched Medline and Embase for all studies evaluating predictors of TBNA diagnostic yield, published up to February 2012. Two authors reviewed all titles/abstracts and retrieved the full text of potentially relevant articles to identify studies according to predefined selection criteria. The methodological quality of studies was assessed through the revised Quality Assessment of Diagnostic Accuracy Studies tool. Evidence synthesis was graded according to overall number of studies, patients involved and methodological features. Results: Fifty-three studies, involving more than 8,000 patients and evaluating 23 potential predictive factors, were included. Major predictors in an unselected population, as well as in patients with suspected/known lung cancer, included lymph node size (short axis length ≥2 cm), presence of abnormal endoscopic findings, subcarinal and right paratracheal location, and the use of a histological needle by an experienced bronchoscopist. Stage I and sampling of more than one lymph node station were the only predictors of a successful TBNA result in patients with suspected sarcoidosis. Conclusions: The diagnostic yield of TBNA depends on selected clinical and procedural features. Knowledge of factors that predict a positive TBNA result may help optimize the diagnostic success of the procedure in different clinical settings.
Introduction
The role of transbronchial needle aspiration (TBNA) for the diagnosis of mediastinal adenopathies/masses, as well as for lung cancer staging, is well established. It is a safe, low-cost, minimally invasive sampling technique, performed while carrying out diagnostic bronchoscopy, avoiding separate staging procedures and some unnecessary surgical approaches. However, despite its very high specificity (96-100%), the sensitivity of TBNA varies greatly in the published literature. To date, two meta-analyses of data on TBNA accuracy for lung cancer staging have been performed: one reported an average pooled sensitivity of 76%, ranging from 14 to 100% [1], while the other, restricted to studies including patients with non-small cell lung cancer, provided two separate estimates according to low or high prevalence of mediastinal disease, 39 and 78%, respectively [2]. Such variability is all the more surprising when we consider that the included studies involved only subjects with suspected/known lung cancer. Therefore, it is likely that TBNA sensitivity for the diagnosis of lymphadenopathies/masses due to benign or still unknown conditions varies even more widely, because of the higher heterogeneity of the study population.
Besides the underlying clinical setting, the variability in TBNA accuracy appears to be related both to differences in methodology and to a number of other selected factors, evaluated as predictors of a successful aspirate in several investigations. These include selected baseline clinical characteristics, as well as procedural aspects. Further factors were analyzed in selected clinical conditions, such as lung cancer, sarcoidosis and tuberculosis.
However, the results on such predictors have often been conflicting and the real value of each one has not yet been defined. Therefore, the aim of the present systematic review is to summarize the available literature to identify the main predictive factors of a positive transbronchial aspirate according to different clinical conditions.
Materials and Methods
Search Strategy and Study Selection
We searched Medline and Embase for all original studies evaluating factors predicting TBNA diagnostic yield in patients with mediastinal lymphadenopathies/masses, published up to February 2012, using a combination of free text and MeSH terms related to TBNA and diagnostic studies. The electronic search was supplemented by hand searching the bibliographies of relevant articles.
The following criteria were established for inclusion: (1) observational/interventional studies evaluating factors influencing the yield of TBNA in mediastinal lymphadenopathies/masses; (2) studies with sample size ≥10; (3) studies with full text in English.
Exclusion criteria were: (1) observational/interventional studies evaluating factors influencing the yield of TBNA also for pulmonary/endobronchial lesions, where the outcome of interest was not available separately for mediastinal lymphadenopathies/masses; (2) observational/interventional studies on endobronchial ultrasound (EBUS)-TBNA or computed tomography fluoroscopy-guided TBNA.
Two authors first independently reviewed all titles/abstracts to identify potentially relevant articles. Then, the study selection, based on a full-text review, was performed according to the above predefined inclusion/exclusion criteria, and disagreements were resolved by discussion.
Data Extraction and Methodological Quality Assessment
Two reviewers independently extracted information on study design, population, number of subjects, primary outcome, size of needle utilized, predictive factors evaluated and relative results.
We assessed the studies for methodological quality using the revised Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool [3]. Since the original QUADAS tool was published in 2003, it has been widely used: more than 200 review abstracts in the Database of Abstracts of Reviews of Effects mention it, and it has been cited more than 500 times. The QUADAS tool is recommended for use in systematic reviews of diagnostic accuracy by the Agency for Healthcare Research and Quality, the Cochrane Collaboration, and the UK National Institute for Health and Clinical Excellence [3].
QUADAS-2 consists of four key domains: patient selection, index test, reference standard, and flow and timing (the flow of patients through the study and the timing of the index test and reference standard). Through specified questions, each domain is assessed for risk of bias, and the first three domains are also assessed in terms of concerns about applicability.
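The domain structure described above can be represented compactly in code. The following Python sketch is illustrative only: the domain keys, the `overall` and `summarize` helpers and the example judgments are hypothetical names and data, not taken from the included studies. It also assumes that a study is rated 'low' overall only when every relevant domain is rated 'low', which is not spelled out in the text.

```python
from collections import Counter

# The four QUADAS-2 domains; only the first three carry an applicability rating.
BIAS_DOMAINS = ("patient_selection", "index_test", "reference_standard", "flow_and_timing")
APPLICABILITY_DOMAINS = BIAS_DOMAINS[:3]


def overall(judgments, domains):
    """Collapse per-domain judgments ('low', 'high', 'unclear') into one label.

    Assumption: the overall label is 'low' only if every domain is 'low';
    any 'high' or 'unclear' domain makes the overall label 'high'.
    """
    return "low" if all(judgments[d] == "low" for d in domains) else "high"


def summarize(studies):
    """Count studies by overall risk of bias and overall applicability concern."""
    bias = Counter(overall(s["bias"], BIAS_DOMAINS) for s in studies)
    applicability = Counter(
        overall(s["applicability"], APPLICABILITY_DOMAINS) for s in studies
    )
    return bias, applicability


# Hypothetical judgments for two studies, for illustration only.
example_studies = [
    {
        "bias": {d: "low" for d in BIAS_DOMAINS},
        "applicability": {d: "low" for d in APPLICABILITY_DOMAINS},
    },
    {
        "bias": {**{d: "low" for d in BIAS_DOMAINS}, "reference_standard": "high"},
        "applicability": {d: "low" for d in APPLICABILITY_DOMAINS},
    },
]
bias_counts, applicability_counts = summarize(example_studies)
```

Keeping the per-domain judgments separate from the collapsed label makes it straightforward to tabulate each domain individually as well as the overall proportions.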
Synthesis of Evidence
Because of the absence of a standardized tool for our review question, we summarized evidence according to the overall number of studies, patients involved and methodological features (sample size and statistical analysis), into the following three groups: (1) evidence of a predictive role, when the majority of studies/patients involved suggested a predictive role; (2) evidence of no predictive role, when the majority of studies/patients involved reported no statistically significant results; (3) inconsistent/insufficient evidence, when a similar number of studies/patients involved reported conflicting results or data were too scarce (i.e. ≤1 study without statistical analysis and/or with very small sample size). Evidence of a predictive role was further divided into two levels: ‘strong’ when studies with statistical analysis were ≥2 and/or the number of patients involved was >500, and ‘weak’ in the remaining conditions. For the main sampled stations (7, 4R, 4L and 10/11) we calculated a weighted average of the overall pooled estimates of diagnostic yield.
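The grading rule above can be read as a small decision procedure. The Python sketch below is one possible reading, not the procedure actually used in the review: the `Study` class and `grade_evidence` function are hypothetical names, 'majority' is interpreted as a strict majority of studies (patient counts are ignored at that step), the 'strong' thresholds are applied to the supporting studies only, and the 'very small sample size' criterion is left out because no cut-off is stated.

```python
from dataclasses import dataclass


@dataclass
class Study:
    patients: int             # number of patients involved
    supports_predictor: bool  # True if the study suggested a predictive role
    has_statistics: bool      # True if a statistical analysis was reported


def grade_evidence(studies: list[Study]) -> str:
    """Grade the evidence for one candidate predictor.

    Returns 'strong', 'weak', 'no predictive role' or 'inconsistent/insufficient'.
    """
    # Data too scarce: no study, or a single study without statistical analysis.
    if not studies or (len(studies) == 1 and not studies[0].has_statistics):
        return "inconsistent/insufficient"

    supporting = [s for s in studies if s.supports_predictor]
    opposing = [s for s in studies if not s.supports_predictor]

    # Assumption: an equal split of studies counts as conflicting results.
    if len(supporting) == len(opposing):
        return "inconsistent/insufficient"
    if len(opposing) > len(supporting):
        return "no predictive role"

    # Assumption: the thresholds are applied to the supporting studies only.
    n_with_statistics = sum(s.has_statistics for s in supporting)
    n_patients = sum(s.patients for s in supporting)
    if n_with_statistics >= 2 or n_patients > 500:
        return "strong"
    return "weak"
```

For example, two supporting studies with statistical analysis against one opposing study would be graded 'strong' under these assumptions, whereas two small supporting studies without statistical analysis would be graded 'weak'.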
Results
Search Results and Characteristics of Included Studies
The first search identified 827 references. After initial screening based on title/abstracts, 743 articles were excluded because they were not relevant and the remaining papers were retrieved for detailed full-text evaluation. Of these, 31 did not fulfill eligibility criteria and, thus, 53 original studies were included (fig. 1). Their main characteristics and results are summarized in table 1. Five were randomized clinical trials, 29 were observational prospective studies and 19 were retrospective chart reviews. Statistical analyses were performed in 28 studies.
The application of the QUADAS-2 tool revealed an overall low methodological quality. Figure 2 presents the judgments on risk of bias (fig. 2a), concerns about applicability (fig. 2b) for each domain, and the final summarized proportion of studies deemed to be at ‘low’ or ‘high’ risk of bias and to have ‘low’ or ‘high’ concerns regarding applicability to the review question (fig. 2c). Overall, five studies were judged to be at ‘low risk of bias’ and seven to have ‘low concerns about applicability’ and, out of these, one study met both conditions.
Graphical display of the revised QUADAS-2 results according to risk of bias (a), applicability concerns (b) and overall (c).
Data Synthesis
The high heterogeneity and characteristics of studies prevented any data pooling; therefore, we have described results for major predictors according to different clinical contexts and provided a synthesis of evidence, as previously reported (table 2).
Lymph Node Station (ATS/IASLC)
Twelve studies involved unselected patients and, of these, five identified station 7 as the main predictor [4,5,6,7,8], four identified station 4R [5,8,9,10] and one identified hilar stations [11]. Another four studies [12,13,14,15] did not find any statistically significant difference. Eleven studies evaluated this factor in patients with suspected/known diagnosis of lung cancer. Station 7 was reported as the main predictor in three studies [16,17,18], station 4R in two studies [17,19], hilar stations in two studies [20,21], station 3p in one study [22] and station 4 (4R + 4L) in three studies [16,23,24]. Two studies [25,26] did not find any statistically significant difference. Six studies analyzed lymph node station as a predictor in patients with suspected sarcoidosis [27,28,29,30,31,32], but each one reported different results and the only investigation reporting statistical analysis showed no difference among the stations sampled [30]. One study was performed in patients with suspected tuberculosis and reported the highest accuracy for station 7 [33]. The weighted average of the overall pooled estimates of diagnostic yield for stations 7, 4R, 4L and 10/11 is provided in table 3.
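A weighted average of per-study diagnostic yields for a given station can be computed as in the minimal Python sketch below. The weighting scheme is not specified in the text, so weighting each study by its number of aspirates is an assumption; the `weighted_yield` function and the numbers are hypothetical and are not taken from table 3.

```python
def weighted_yield(studies):
    """Weighted average of diagnostic yield across studies.

    studies: list of (n_aspirates, diagnostic_yield) pairs, with the yield
    expressed as a fraction between 0 and 1.
    Assumption: each study is weighted by its number of aspirates.
    """
    total_n = sum(n for n, _ in studies)
    if total_n == 0:
        raise ValueError("no aspirates to average over")
    return sum(n * y for n, y in studies) / total_n


# Hypothetical per-study data for one station, for illustration only.
station_example = [(120, 0.80), (60, 0.65), (20, 0.50)]
pooled = weighted_yield(station_example)  # (96 + 39 + 10) / 200
```

Because larger studies dominate the result, such an average reflects the size of each study but not its design or quality.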
Lymph Node Size
Out of seven studies that assessed this factor in an unselected population, six reported higher sensitivity for lymph nodes ≥2 cm in the short axis, with a linear relationship with increasing size when this was evaluated [5,9,10,12,13,14]. Similar results were obtained in four out of seven studies involving patients with suspected/known lung cancer [20,21,22,25]. The remaining studies reported no significant difference in TBNA sensitivity according to lymph node size [34,35,36]. The only study performed on patients with suspected sarcoidosis did not detect any significant difference [30].
Type of Disease
Out of eight studies that evaluated the type of disease as a predictor, seven reported higher sensitivity for malignant lesions (excluding lymphoma) than for benign lesions, such as sarcoidosis and tuberculosis [10,13,14,37,38,39,40], while one study did not find any statistically significant difference based on the underlying disease [12].
Operator Experience
Three of the five studies [13,39,41,42,43] in unselected populations and both studies in a lung cancer setting [44,45] found higher sensitivity when TBNA was performed by an experienced bronchoscopist or after a training program. The only study performed in the context of suspected sarcoidosis reported no statistically significant difference between operators [30].
Needle Size
A histology needle (19/18-G) was reported as a better predictor of a positive aspirate than a cytological one (21/22-G) in three out of four studies in unselected populations [9,46,47], and in two [21,48] out of three investigations performed in lung cancer patients. The other two studies [12,17] did not find any statistically significant difference according to needle size.
Number of Needle Passes
Three studies investigated this factor as a predictor in an unselected population: one found that diagnostic yield increased after the second needle pass [9], one reported the maximum yield from the third until the fifth pass, reaching a plateau after the sixth [5], and another one did not observe any statistical differences with the increasing number of passes [40]. One study was performed in a lung cancer setting [20] and suggested the best diagnostic yield from the fourth to the seventh pass. The only study on patients with suspected sarcoidosis did not report any statistical difference [30].
Endoscopic Findings
The presence of abnormal endoscopic appearance was evaluated in nine studies [5,8,10,12,18,25,34,41,49], and six identified it as a relevant predictor.
ROSE
Two randomized clinical trials and one observational study assessed the impact of rapid on-site examination (ROSE) as the primary outcome, and all of them failed to detect any statistically significant difference in TBNA diagnostic yield with or without this technique [11,50]. The only study that evaluated ROSE in a lung cancer setting observed a significant increase in diagnostic yield when TBNA was performed with ROSE [20].
Type of Specimen with Histological Needle
Of the studies investigating type of specimen with histological needle, all six reported a higher diagnostic performance for combined (both histological and cytological) and histological specimens than that for cytological specimens [33,47,48,51,52,53].
Selected Predictive Factors in Patients with Suspected/Known Lung Cancer
Cancer Cell Type. Out of twelve studies, six reported small cell type as the best predictor [21,24,25,40,45,49], two identified non-small cell [18,23] and four did not detect any statistically significant difference in TBNA accuracy according to cell subtype [7,26,34,51].
Tumor Location. Two studies found a statistically higher yield when tumors were located on the right rather than on the left side [21,49], while another study did not report any significant difference. A PET SUV-max ≥5 [26] and a locally advanced tumor stage [34] were reported as significant predictors, while central or peripheral tumor location and lymph node stage [34] did not significantly influence TBNA results.
Selected Predictive Factors in Patients with Suspected Sarcoidosis
Out of five studies [27,29,30,54,55], four reported stage I as a better predictor than stage II, while another one did not detect any statistical difference [30]. The latter study found sampling ≥2 lymph node stations as the only significant predictor.
Selected Predictive Factors in Patients with Suspected Tuberculosis
One study assessed lymph node density on computed tomography scans as a predictor, but did not find any statistically significant difference [33].
Sensitivity Analysis
We performed a sensitivity analysis excluding the eight studies with staging purpose only from the category ‘suspected/known lung cancer’, in order to investigate the potential selection bias due to the different prevalence of disease in subjects with a previously known diagnosis of cancer. For some factors with previously ‘strong’ evidence of a predictive role, such as ‘endoscopic findings’, ‘type of specimen’, ‘needle size’ and ‘lymph node station’, the evidence became ‘weak’ or ‘insufficient’, since the reduced number of studies/patients involved meant that the results no longer met the evidence criteria, although there were no substantial changes in the direction of the findings.
Discussion
The present systematic review provides an extensive description and synthesis of the main results from all published studies evaluating TBNA yield predictors for the diagnosis of mediastinal lymphadenopathies/masses. Major predictors in an unselected population, as well as in a suspected/known lung cancer clinical setting, included: an increasing lymph node size, the presence of abnormal endoscopic findings, underlying malignant conditions, stations 4R and 7 as sampling sites, and the use of a histological needle by an experienced bronchoscopist. However, the type and duration of educational interventions evaluated varied widely among studies. Although the level of evidence for the last two ‘modifiable’ environmental features is similarly strong, we believe that, in daily practice, long-term endoscopic skills and in-depth knowledge of mediastinal anatomy are the most important predictor. The choice of needle size should depend on the clinical situation because, although the use of a histological needle has been shown to obtain a higher diagnostic yield, it requires greater technical expertise and should be employed only after the operator has completed a training period with cytology needles.
Although the weighted average of pooled estimates of diagnostic yield for hilar stations was only slightly lower than that obtained for 4R and 7, it did not influence the overall synthesis of evidence, as it was derived without taking into account the study design. With reference to the number of needle passes, although data were limited, it is reasonable to conclude that it is necessary to perform at least three needle passes, up to a maximum of five, to obtain the best accuracy. Other predictors in patients with suspected/known lung cancer included selected features of the primary tumor, such as the presence of a small cell subtype rather than non-small cell lung cancer, most likely due to the higher biological aggressiveness and lower adhesion of small cells, and also a right-side location. There was also little evidence on the role of ROSE in this setting. However, data from two randomized controlled clinical trials, primarily designed to assess the efficacy of ROSE for the diagnosis of hilar/mediastinal adenopathy in an unselected population, failed to detect a significant increase in diagnostic yield, although the median number of needle passes and bronchoscopy complication rates were significantly reduced when this technique was added [11,50].
Factors possibly influencing the TBNA results in patients with suspected sarcoidosis have been investigated in few small studies, and, of these, only one has evaluated this issue as the primary outcome, providing statistical analyses. In this study, sampling more than one lymph node station was the only variable significantly associated with the likelihood of a positive aspirate. This finding was indirectly confirmed by Tremblay and colleagues [56], who performed a randomized trial to primarily compare the yield of EBUS-TBNA versus TBNA in suspected sarcoidosis patients and suggested that the superiority of EBUS-TBNA could have been related to the greater average number of lymph node stations sampled.
The systematic and extensive search of the available literature as well as the large number of included studies and patients involved are the major strengths of this study. Moreover, the concordance of most results among studies is reassuring in terms of the reliability and validity of information obtained.
However, the present review has several limitations. First, there was a baseline high heterogeneity among studies in terms of design, size of sample and outcome measure, assessed using patients or lymph nodes as the unit of analysis, various cytopathological criteria for classification of specimens and different definitions of test performance, including ‘diagnostic yield’, ‘accuracy’ and ‘sensitivity’. Furthermore, most investigations were not primarily designed as diagnostic studies, but reported experiences from routine clinical practice, and only 18 studies evaluated the role of predictors as a main endpoint. Thus, it is likely that the poor methodological quality of studies, as reported by the QUADAS-2 results, affects the validity of our findings. With reference to selection bias, some studies did not state whether enrollment was consecutive and some others made inappropriate exclusions (e.g. patients with lymph node size <2.5 cm). Moreover, most investigations enrolled patients with suspected/known selected clinical diagnoses, leading to an overestimation of sensitivity, since the probability of obtaining a positive result is closely related to the prevalence of lymph node involvement [1,2]. Furthermore, several confounding factors could have affected the performance and interpretation of index tests, as TBNA was often performed within the same study by different operators with different needle types, sizes and number of passes, and ROSE was also occasionally added. Another relevant limitation is represented by the poor application of a reference standard test. Due to the high specificity of TBNA, positive results were generally assumed to be true positive and were not surgically confirmed; instead, negative results were verified by different reference tests (e.g. mediastinoscopy, mediastinotomy, thoracotomy, video-assisted thoracoscopy, median sternotomy), leading to a potential verification bias, or by clinical follow-up alone when surgical staging was not indicated.
In summary, conventional TBNA is a useful and safe diagnostic technique, but its accuracy has been suggested to be closely related to various underlying clinical and environmental factors. TBNA sensitivity can be excellent when the procedure is performed by an experienced bronchoscopist with a histology needle providing both cytological and histological specimens, in patients with enlarged lymph nodes (short axis length ≥2 cm) in paratracheal or subcarinal regions, abnormal endoscopic findings and clinical suspicion of lung cancer; conversely, it can be quite low if none of these conditions is met.
Despite the above-mentioned limitations, clinicians should take into account the information provided in the present review in order to choose the most appropriate diagnostic test according to the selected clinical setting. Although, in recent years, EBUS-TBNA has been suggested to improve the diagnostic yield of conventional TBNA [57], higher costs prevent its routine use in all bronchoscopy centers. Thus, TBNA still represents a very useful alternative procedure for the diagnosis of mediastinal lymphadenopathies/masses and it would be beneficial to obtain further data on its accuracy and predictors from large, high-quality investigations.
Financial Disclosure and Conflicts of Interest
None of the authors have potential conflicts of interest.