Abstract
Objective: Although transcriptomic assessments of small samples using high-throughput techniques are usually performed on fresh or frozen tissues, there is a growing demand for those performed on stained cellular specimens already used for diagnostic purposes. Study Design: The possibility of detecting mRNAs and microRNAs (miRNAs) from routinely processed cytological samples using nCounter® technology was explored. Fresh samples from pleural and peritoneal effusions were analyzed using 2 parallel methods: samples were smeared and routinely stained using the May-Grünwald-Giemsa or Diff-Quik® method and mounted using conventional methods, and they were also studied following a snap freezing method, in which samples were maintained at −80°C until use. mRNAs and miRNAs were assessed and compared after total RNA extraction from both routinely processed samples and their matched frozen controls. Results: A good concordance was found between the gene expression measured in routinely processed samples and their matched frozen controls for the majority of mRNAs and miRNAs tested. However, the standard deviation of low-expressed miRNA was high. Conclusions: Although nCounter® technology is a robust method to measure and characterize both mRNAs and miRNAs from routinely processed cytological samples, caution is recommended for the interpretation of low-expressed miRNA.
Introduction
In the era of personalized medicine, gene expression profiling of mRNA in human cancers, along with its regulation by short noncoding RNAs called microRNAs (miRNAs), is gaining increasing attention [1, 2]. As major players in tumor initiation, progression, and dissemination [3‒6], mRNAs and miRNAs hold promise not only as cancer biomarkers but also as potential drug targets [7‒9].
The quantity and quality of RNAs extracted from tumors are critical factors for gene profiling [10]. However, extracting high-quality RNAs is often challenging since ribonucleic acid molecules are unstable and prone to degradation [10]. The best currently available methods for total RNA extraction rely on fresh or frozen materials [10]. However, in the clinical setting, the most frequent type of material handled is either formalin-fixed paraffin-embedded tissues or routinely fixed and stained cellular specimens. Various pre-analytical factors must be carefully considered to ensure high-quality samples [10]. These factors include sampling technique, type of specimen, delays in fixation, type of fixation, method of RNA preparation, and type of molecular platform used for data analysis and interpretation. Type of fixation is by far the major drawback [11, 12] as most fixatives may partially degrade RNA molecules, resulting in recovery of fragments of various sizes after extraction; furthermore, fixed samples are not always candidates for all high-throughput techniques [10].
Several studies have demonstrated that commercially available DNA extraction methods can be applied to formalin-fixed paraffin-embedded tissue and cytological samples obtained from routine clinical diagnosis [13‒17]. A few studies have shown the use of nCounter® technology for detecting mRNA from cytological specimens [18‒20], but to the best of our knowledge, none have compared gene expression data obtained from routinely processed cytological specimens with their matched frozen controls.
The possibility of applying RNA expression methods to specimens previously processed for diagnostic purposes is of paramount importance, especially in clinical scenarios in which fresh or frozen samples are difficult to obtain. Examples include a tumor that cannot undergo core-needle biopsy owing to its small size or hard-to-reach location and for which the number of smears is limited, a patient presenting with a recurrent malignant effusion with inconclusive immunohistochemical findings, or a clinician ordering RNA profiling for a patient with resampling difficulties who was initially diagnosed on a cytological specimen.
In this proof-of-principle study, we first explored the feasibility of using cytology slides previously processed for routine diagnostic purposes for mRNA and miRNA profiling with a multiplex digital color-coded technology. We then evaluated the reliability of our results by comparing RNA values from routinely processed and stained cytological samples with those from matched frozen controls.
Materials and Methods
This study was conducted in accordance with the EQUATOR guidelines and the Declaration of Helsinki and was approved by our institution as a protocol for translational research.
Study Flow Chart
Figure 1 shows the flow chart of our study. Selected cases consisted of cellular specimens in which the corresponding stained smears showed at least 50% tumor cells, with the rest consisting of red blood cells and inflammatory cells. There was 1 case of malignant pleural effusion and 1 case of peritoneal effusion, collected from February to October 2014 at the Gustave Roussy Comprehensive Cancer Center (Villejuif, France). The malignant pleural (TE85 sample) and peritoneal (TE7 sample) effusions were from patients with metastatic colon and ovarian carcinomas, respectively. Two external samples from normal human prostate and liver tissues were used to monitor mRNA and miRNA analyses since these samples are known to express differential levels of both types of RNAs.
Sample Processing and Total RNA Extraction
After centrifugation of effusion specimens, 1 part of each sample was processed according to standard smearing, fixation, and staining procedures using May-Grünwald-Giemsa (MGG; 2 smears) and Diff-Quik® (DQ, 2 smears) methods, then mounted with conventional methods, while the other part, representing the matched frozen controls, was dry-pelleted and snap-frozen at −80°C (Fig. 1). Removal of coverslips was performed 1 week later and tumor cells, previously identified by an experienced cytopathologist, were harvested with a commercially available kit consisting of a blue matrix that captures the cells as it air-dries (Pinpoint Slide RNA Isolation System II®; Zymo Research, Irvine, CA, USA) (Fig. 1).
Total RNA extraction was performed following the manufacturer’s instructions (Pinpoint Slide RNA Isolation System II®; Zymo Research, Irvine, CA, USA). Total RNA from matched snap-frozen dry pellets was extracted using TRIzol reagent (Life Technologies, Carlsbad, CA, USA) (Fig. 1). In both cases, RNA recovery was quantified with a Qubit® 2.0 fluorometer (Life Technologies). Quality control (QC), including evaluations of RNA purity and integrity, was performed with a NanoDrop® ND 8000 (NanoDrop Technologies, Wilmington, DE, USA) and a Bioanalyzer Labchip® (Agilent Technologies, Santa Clara, CA, USA), respectively (Table 1).
Digital Multiplexed Gene Expression Analysis of mRNAs and miRNAs
A fully automated multiplexed digital platform (nCounter® Analysis System; Nanostring Technologies, Seattle, WA, USA) was used for mRNA and miRNA analysis. In brief, this platform uses a digital color-coded barcode [21] for precise and simultaneous multiplexed measurements of multiple analytes, such as DNA, RNA, and proteins, in a single reaction. Each color-coded barcode is attached to a single target-specific probe corresponding to a molecule of interest. When combined with invariant controls, the probes form a multiplex CodeSet. The use of molecular barcodes and single molecule imaging for direct hybridization without the need for enzymes or library preparation ensures highly robust and precise detection and quantification of hundreds of unique molecules simultaneously [21, 22].
Two commercially available kits were chosen to detect a large spectrum of markers in order to explore as many genes as possible: one was used for mRNA detection (nCounter® PanCancer Pathways panel; Nanostring Technologies) and the other for miRNA (nCounter® Human v3 miRNA Assay; Nanostring Technologies) measurements.
The PanCancer Pathways Panel is able to detect the expression of 770 genes, including 40 internal reference controls (housekeeping genes) from 13 cancer-associated canonical pathways (e.g., cell cycle, chromatin modeling, apoptosis, MAPK, and PI3K). Overall, 50 ng of total RNA from each sample and 2 normal human references (total RNA from frozen human prostate, ref: AM7988, and liver, ref: AM7960; Thermo Fisher, Courtaboeuf, France) were hybridized according to the manufacturer’s instructions. Raw data (counts) for each mRNA were normalized in 2 steps, as previously described [23, 24]. In the first step, positive and negative spike-in controls were taken into account to remove technical variations. For the second step, the nSolver™ analysis software (version 2.5; Nanostring Technologies) was used for biological normalization. Briefly, average counts (geometric mean) of relevant housekeeping genes of all samples were used to scale each sample, and scaling factors of housekeeping genes were monitored. The geNorm algorithm [25] was used to identify reference genes based on variation between successive normalization factors. Briefly, this function ranks genes on the V number (variation between successive normalization factors as reference genes are removed). Genes are excluded when their V number is equal to or less than the smallest V number for all the genes plus 1 (Vmin + 1). Fourteen housekeeping genes were defined as candidates in our analysis.
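The 2-step scaling described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the nSolver™ implementation: the function names (`geometric_mean`, `scaling_factors`, `normalize`) and the input layout are hypothetical, each sample is scaled to the average of the per-sample geometric means, and background handling based on negative controls is omitted.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive counts."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def scaling_factors(control_counts):
    """Return one factor per sample.

    control_counts maps a sample name to the counts of its control probes
    (positive spike-ins in step 1, housekeeping genes in step 2).
    Assumption: each sample is scaled to the average of the per-sample
    geometric means.
    """
    geo = {s: geometric_mean(c) for s, c in control_counts.items()}
    target = sum(geo.values()) / len(geo)
    return {s: target / g for s, g in geo.items()}


def normalize(raw, positive, housekeeping):
    """Apply technical scaling, then housekeeping (biological) scaling."""
    # Step 1: technical normalization on positive spike-in controls.
    tech = scaling_factors(positive)
    step1 = {s: [c * tech[s] for c in raw[s]] for s in raw}
    hk_step1 = {s: [c * tech[s] for c in housekeeping[s]] for s in housekeeping}
    # Step 2: content normalization on the technically scaled housekeeping genes.
    bio = scaling_factors(hk_step1)
    return {s: [c * bio[s] for c in step1[s]] for s in step1}


if __name__ == "__main__":
    # Toy counts: sample B carries half the RNA input of sample A.
    raw = {"A": [100, 200], "B": [50, 100]}
    positive = {"A": [1000, 100], "B": [1000, 100]}
    housekeeping = {"A": [400, 100], "B": [200, 50]}
    print(normalize(raw, positive, housekeeping))
```

In the toy data, the 2 samples differ only by RNA input, so the housekeeping step brings their gene counts onto the same scale.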
The human miRNA expression assay v3 CodeSet was used to detect expression of 800 miRNAs using 100 ng of total RNA. Specific and long oligos (bridge oligos) complementary to mature miRNAs were hybridized and used as an anchor to enable binding to a second long miRNA adaptor (miRNAtag sequence). This complex was then stabilized during a ligation step. Excess tags were removed by enzymatic degradation. The ligated miRNA/miRNAtag complexes were then counted through the classical Nanostring chemistry, which requires hybridization of capture probes and reporter probes at 65°C for 20 h. Specific miRNA assay ligation controls were introduced in dilution series during the ligation step to evaluate the efficiency of the enzymatic step. Briefly, 6 synthetic controls corresponding to small RNA constructs were introduced to verify ligation specificity: 3 positive ligation controls A, B, and C and 3 synthetic ligation controls A, B, and C. The digital nCounter® Analyzer was used to count individual fluorescent barcodes and to quantify miRNA molecules present in each sample in high-sensitivity mode (555 fields of view: FOV). The miRNA analysis was performed using nSolver™ software. The average (geometric mean) of the top 100 most highly expressed miRNAs was then calculated for each sample in order to scale each sample according to the geometric mean of our experiments [24]. In addition, technical replicates of reference RNAs (5 and 6 from the liver and prostate, respectively) were produced.
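The top-100 scaling step can be illustrated as follows. This is a hedged sketch rather than the nSolver™ procedure: the function name `top_n_scaling` is hypothetical, the top miRNAs are assumed to be ranked by their summed counts across all samples (a shared set), and counts are floored at 1 before taking logarithms so that zero counts do not break the geometric mean.

```python
import math


def top_n_scaling(counts, n=100):
    """Return one scaling factor per sample from the n most expressed miRNAs.

    counts maps sample name -> {miRNA name -> count}.
    Assumptions: the top-n set is shared across samples and ranked by
    summed counts; counts are floored at 1 before the log.
    """
    samples = list(counts)
    probes = list(counts[samples[0]])
    ranked = sorted(
        probes,
        key=lambda p: sum(counts[s][p] for s in samples),
        reverse=True,
    )
    top = ranked[:n]
    geo = {
        s: math.exp(sum(math.log(max(counts[s][p], 1)) for p in top) / len(top))
        for s in samples
    }
    # Scale every sample to the average geometric mean of the experiment.
    target = sum(geo.values()) / len(geo)
    return {s: target / geo[s] for s in samples}


if __name__ == "__main__":
    toy = {
        "A": {"miR-x": 400, "miR-y": 100, "miR-z": 1},
        "B": {"miR-x": 200, "miR-y": 50, "miR-z": 2},
    }
    print(top_n_scaling(toy, n=2))
```

Multiplying each sample's counts by its factor then places all samples on a common scale before replicates are compared.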
Statistical Analysis
A statistical analysis was conducted on technical replicates of samples processed several times under the same conditions (human liver and prostate RNAs) and based on a prior analysis of differently processed clinical samples. Log2 mean expression and standard deviations (SDs) were analyzed. All expression data were log-transformed (natural logarithms) to normalize distributions. For each gene, means and SDs of the replicates were calculated. For each gene, the coefficient of variation (CV) was estimated as
$$\mathrm{CV}\ (\%) = \sqrt{e^{\sigma^{2}} - 1} \times 100.$$
The relation between CV (%) and mean was then plotted. Corrected p values, based on the Benjamini-Yekutieli method, were calculated with the nSolver™ software. Frequencies of the plotted p values were used to evaluate whether gene variation was due to technique type or tumor type.
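As an illustration of the CV estimate, the sketch below computes the CV of a set of replicate counts from the SD of their natural-log values. It assumes that σ in the formula is the sample SD of the log-transformed replicates (the standard log-normal relation); the function name `cv_percent` and the example counts are hypothetical.

```python
import math
import statistics


def cv_percent(replicate_counts):
    """CV (%) of replicates, from the SD of their natural logarithms.

    Assumption: sigma is the sample SD of ln(count), so that
    CV = sqrt(exp(sigma^2) - 1) * 100.
    """
    logs = [math.log(c) for c in replicate_counts]
    sigma = statistics.stdev(logs)
    # expm1(x) computes exp(x) - 1 accurately for small x.
    return math.sqrt(math.expm1(sigma ** 2)) * 100


if __name__ == "__main__":
    # Hypothetical counts for one gene across 5 technical replicates.
    print(round(cv_percent([210, 195, 205, 220, 190]), 2))
```

For small σ the CV is close to σ × 100, which is why tightly reproducible genes show CV values of a few percent.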
Results
Quantity, Quality, and Integrity of RNA Extracted from Routinely Processed Smears
Table 1 shows the quantity, quality, and integrity of RNA extracted from all samples. In total RNA extracted from dry snap-frozen pellets of clinical effusions (TE7 and TE85) and from reference normal tissues (prostate and liver), 60–88% of RNA molecules were longer than 300 nucleotides. In routinely processed (fixed and stained) cytological specimens, the percentage of long RNA molecules ranged from 40 to 62%, indicating higher rates of RNA degradation in routinely processed clinical samples. However, total RNA purity (measured through R260/230 nm and R260/280 nm) was close to 1.9/2.0, except for 1 sample (TE7) stained with MGG (R260/280 nm = 1.18).
QC of the Experiments
Evaluation of the quality of the targeted transcriptome analysis using digital multiplex color-coded barcode technology relies on 2 parameters: imaging QC, which calculates the percentage of scanned FOV that are detected, and binding density, which corresponds to the number of molecules detected per square micron. In our experiments, as recommended by the manufacturer, 99% of the 555 scanned FOV were detected (see online suppl. Fig. 1a; see www.karger.com/doi/10.1159/000510174 for all online suppl. material), and binding density was homogeneous (online suppl. Fig. 1b). In terms of quality, no significant differences were observed between routinely processed samples and their matched frozen controls. Moreover, both mRNA and miRNA showed good-quality counts. This was partly due to the normalization of counts against positive and negative spike-in controls and housekeeping genes, which removed most of the technical biases and the differences in RNA content arising from biological variations. The spike-in controls were well detected and stable in the tested samples, as reflected by their concentration.
Specific QC of mRNA Quantification
Figure 2 shows the mRNA quantification QC results before and after normalization (Fig. 2a, b, respectively). Before normalization, average (geometric mean) housekeeping gene expression ranged between 270 and 456 for ovarian (TE7) and between 64 and 173 for colon (TE85) cancer effusion samples, regardless of technical conditions. After geNorm normalization [25], 14 housekeeping genes were defined as robust genes for count normalization. Geometric mean counts ranged between 180 and 201 for TE7 samples and between 201 and 230 for TE85 samples. As expected, normalization reduced gene expression variation among samples. Controls run in parallel, in particular normal RNAs extracted from human prostate and liver, yielded similar data (online suppl. Fig. 2).
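For readers unfamiliar with geNorm [25], the gene-stability measure M that underlies the algorithm can be sketched as follows: for each candidate gene, M is the average SD of its log2 expression ratios with every other candidate, and lower values indicate more stable genes. This is an illustrative Python sketch, not the implementation used in this study; the function name `genorm_m` and the toy counts are hypothetical, and the V-number exclusion rule described in the Methods is not reproduced here.

```python
import math
import statistics


def genorm_m(counts):
    """Return the geNorm-style stability measure M for each gene.

    counts maps gene name -> list of positive counts, one per sample,
    with samples in the same order for every gene.
    """
    genes = list(counts)
    m_values = {}
    for j in genes:
        pairwise_sd = []
        for k in genes:
            if k == j:
                continue
            # Log2 ratio of gene j to gene k in every sample.
            log_ratios = [math.log2(a / b) for a, b in zip(counts[j], counts[k])]
            pairwise_sd.append(statistics.stdev(log_ratios))
        m_values[j] = sum(pairwise_sd) / len(pairwise_sd)
    return m_values


if __name__ == "__main__":
    toy = {
        "HK1": [100, 200, 400],  # tracks RNA input
        "HK2": [50, 100, 200],   # tracks RNA input, same ratio to HK1
        "HK3": [100, 100, 100],  # does not track RNA input
    }
    print(genorm_m(toy))
```

In the toy data, HK1 and HK2 keep a constant ratio across samples and therefore obtain a lower M than HK3.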
Fig. 2. a, b Quality of mRNA extracted from routinely processed cytological smears is equivalent to that extracted from matched dry frozen pellets. a Average of all 40 housekeeping gene counts. b Average of robust housekeeping gene counts after geNorm normalization and gene filtering. Error bars represent SD. MGG, May-Grünwald-Giemsa; DQ, Diff-Quik; SD, standard deviation.
Specific QC of miRNA Quantification
Negative and positive controls were monitored during miRNA analysis to check the efficiency of the enzymatic step. As recommended by the manufacturer, negative ligation controls were detected at between 0 and 4 copies and positive ligation controls at between 11 and 16 copies (online suppl. Fig. 1d).
Statistical Analysis Exploring Robustness of mRNA and miRNA Measurements in Both Controls and Routinely Processed Clinical Cytological Specimens
mRNA measurements of reference human liver and prostate tissue replicates were highly correlated: Pearson correlation coefficients >0.96 and >0.97, respectively (online suppl. Table 1a, b). SD was higher than or equal to 30% when mRNA expression (log expressed mean values) was lower than 3.5 (Fig. 3a). SD was lower than 10% when mRNA values were higher than 6. Clinical sample replicates also showed a high degree of correlation: Pearson correlation coefficients >0.94 and >0.88 for TE7 and TE85 clinical cytological specimens, respectively (online suppl. Table 1c, d). Sampling distribution SDs showed some differences: SD was higher than 30% for TE7 and TE85 when mean mRNA expression values were 4 and 5, respectively (Fig. 3b).
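The replicate comparison reported above relies on the Pearson correlation coefficient, which can be computed as in the following sketch. Whether the coefficients in this study were computed on raw or log-transformed counts is not stated; the sketch assumes log2-transformed normalized counts, and the function names `pearson` and `log2_pearson` as well as the example values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def log2_pearson(counts_a, counts_b):
    """Correlate two replicates after log2 transformation (assumption)."""
    log_a = [math.log2(c) for c in counts_a]
    log_b = [math.log2(c) for c in counts_b]
    return pearson(log_a, log_b)


if __name__ == "__main__":
    # Hypothetical normalized counts for 5 genes in 2 replicates.
    replicate_1 = [120, 950, 15, 3300, 410]
    replicate_2 = [135, 900, 22, 3100, 450]
    print(round(log2_pearson(replicate_1, replicate_2), 3))
```

Working on the log scale keeps a few highly expressed genes from dominating the coefficient, which is one reason such a transformation is commonly applied to count data.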
Fig. 3. Distribution of SD and gene expression. mRNAs in control human liver and prostate fresh frozen tissues (a), in TE7 and TE85 clinical cytological samples (b), miRNAs in control human liver and prostate fresh frozen tissues (c), and in TE7 and TE85 specimens (d). CV, coefficient of variation; miRNA, microRNA; SD, standard deviation.
miRNA measurements from liver and prostate tissue replicates and from clinical cytological samples also revealed a high level of correlation for both tissues (online suppl. Table 1e, f) and cytological specimens (online suppl. Table 1g, h). For tissue replicates, SD was higher than or equal to 30% when miRNA expression (log expressed mean values) was lower than 3.5 (Fig. 3c); it was lower than 20% when miRNA values were higher than 6. For the clinical samples, SD values were lower than 30% when miRNA expression was higher than 2 (Fig. 3d).
Analysis of Cancer Pathways from Routinely Processed Smears
The feasibility of gene expression analysis from routinely processed smears was further investigated with a differential gene expression analysis of TE7, TE85, and normal RNA controls. A limited number of genes were significantly regulated and associated with fixation: 50 genes had a p value <0.05 (online suppl. Fig. 2a). More than 400 genes were significantly regulated and associated with tumor type (online suppl. Fig. 2b), suggesting that air-drying or methanol fixation had a limited impact on mRNA dosage in our experiments.
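The Benjamini-Yekutieli correction mentioned in the statistical analysis can be sketched as below. This is a generic illustration of the published procedure, not the nSolver™ code used in this study; the function name `benjamini_yekutieli` and the example p values are hypothetical.

```python
def benjamini_yekutieli(pvalues):
    """Return Benjamini-Yekutieli adjusted p values in the input order.

    Each sorted p value p_(i) is multiplied by m * c(m) / i, where m is the
    number of tests and c(m) = 1 + 1/2 + ... + 1/m, then monotonicity is
    enforced from the largest p value downward and values are capped at 1.
    """
    m = len(pvalues)
    c_m = sum(1.0 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = pvalues[idx] * m * c_m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw p values for 3 genes.
    print(benjamini_yekutieli([0.01, 0.04, 0.03]))
```

The factor c(m) makes this correction more conservative than the Benjamini-Hochberg procedure and keeps it valid under arbitrary dependence between tests, a relevant property when genes from the same pathway are co-regulated.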
To illustrate a more advanced feature, an elementary pathway analysis based on the quantification of the 770 genes was then conducted to investigate 13 cancer-associated pathways, both in fresh frozen reference tissues and in cytological clinical samples. The data from TE7 and TE85 samples were subsequently compared to normal human liver and prostate samples. Heatmaps (Fig. 4a, b) showed that the samples clustered according to their tissue of origin and not according to the type of specimen processing. A very good correlation was observed for TE7. Normal human liver and prostate samples appeared separate from colon cancer effusion (TE85) despite displaying a similar expression pattern. In our dataset, the most highly regulated pathways were the Wingless/integrated (int-1; Wnt) and the Hedgehog (HH) signaling pathways in TE7 (Fig. 4a).
Fig. 4. Pathway score heatmaps. a The graphic displays global significance scores of each sample; the global statistical significance indicates the extent of gene pathway differential expression with a covariate (tumor type), regardless of whether each gene is up- or downregulated. Yellow denotes gene pathways that exhibit extensive differential expression with the covariate and red denotes pathways with less differential expression. The 770 genes studied are grouped on their own specific pathway in this graphic obtained using normal liver tissue as the reference tissue. b Heatmap and c Hedgehog pathways score; the graphic shows the Hedgehog pathway (28 genes) heatmap in 4 different tissues. MGG, May-Grünwald-Giemsa; DQ, Diff-Quik.
Since the HH signaling pathway was highly regulated in the TE85 clinical sample, it was further analyzed in all samples. Figure 4b shows that the genes of this pathway were differentially expressed in all samples and that their aggregation was determined by sample type, and not by sample processing. A boxplot (Fig. 4c) illustrates the HH pathway activation level and highlights differences between the sample types, as compared with normal human prostate and liver tissues. To illustrate a pathway not significantly altered in our comparison, the MAPK pathway containing 157 genes is presented in online suppl. Figure 3 where samples classified first per tissue type and then according to sample preparation show less significant differences in gene expression.
Discussion/Conclusion
Our preliminary study suggests that routinely processed cytological specimens represent valuable surrogates for fresh or frozen materials to analyze cancer-related mRNA and miRNA gene expression. We also show that commercially available multiplexed digital color-based barcode assays appear to be highly efficient in detecting gene expression, especially from samples with low input.
We have presented evidence showing these tools can robustly detect mRNAs and miRNAs, especially when miRNAs are present in high copy numbers. The key advantage of these technologies is their high-throughput capacity to analyze multiple transcripts in a single tube without the need for enzymatic amplification. Other equally important advantages include high sensitivity and short turnaround time. For example, it took us only 3 days to analyze 770 mRNAs and 799 miRNAs from RNA preparation to primary analysis and roughly a week from sample preparation to analysis of minute amounts of RNAs. Remarkably, everything was accomplished at an affordable cost compared to the huge cost of targeted therapies (Table 2).
The initial cytological examination of routinely processed smears for tumor fraction is particularly critical. Undoubtedly, laser microdissection is the ideal technique for tumor fraction enrichment. However, it requires specialized microscopic equipment, is time consuming, and usually needs membrane slides. Moreover, some of the stains used in routine protocols and tested in our study, such as the MGG or DQ protocols, contain a methanol fixation step which makes laser capture microdissection difficult with some platforms (unpublished results). In our study, being aware of the fact that macrodissection is less precise than microdissection, we only tested smear material containing at least 50% tumor cells to minimize tumor RNA dilution with RNA from normal cells. The use of a very simple and commercially available kit allowed us to macrodissect the regions of interest and to extract and purify RNA within a day, thereby contributing to turnaround time reductions from sample reception to sign out report.
A noteworthy outcome of our study is that digital multiplexed gene expression analysis enabled us to accurately detect mRNAs and miRNAs from routinely processed cytological smears. Our analysis revealed a strong correlation between the RNA expression levels observed in routinely processed cytological samples and those measured in their respective snap-frozen dry pellet controls. This implies that although pre-analytical factors, such as fixation and staining of cytological smears, may partially degrade mRNA molecules, pathologists can exploit this approach to obtain good-quality mRNA yields. What makes this system even more efficient than the primer systems used for reverse transcription quantitative PCR (RT-qPCR) analysis is its barcoded probe system. Instead of the 2 primers that RT-qPCR requires to hybridize to 2 targets in the same fragment, barcoded probes require hybridization to only the single target of interest.
There are several advantages to using routinely processed and stained cytological samples. First, using macrodissected cytological specimens rather than sections of core-needle biopsies is critical when the tumor is difficult to reach, when it is too small, and when no biological material is available other than previously processed cellular specimens [26]. Second, it may help assess tumor gene expression profiling sequentially in patients with recurrences who received different chemotherapy regimens and study the potential genetic evolution of a tumor. Third, it may be used to look for molecular targets not detectable by immunohistochemical procedures in patients with tumors that have metastasized through malignant effusion without easy access to the initial tumor. In such clinical situations, the use of cytological specimens may be helpful for clinicians in treatment decision-making.
Two limitations of our exploratory study are the very small number of samples tested (n = 2), which needs to be extended, and the difficulty in quantifying low levels of miRNA expression. We found it difficult to mitigate the impact of treatment (fixation and staining) on miRNA output levels in TE7 and TE85 cells exhibiting expression values ranging between 1 and 3.6 (Fig. 3d). This implies that higher values of miRNA expression parallel smaller SD values.
In conclusion, the possibility for morphologists to exploit already processed and stained cytological samples for gene expression analyses in molecular cytopathology opens a new window in terms of diagnosis, prognosis, and therapeutic decisions. If our proof-of-principle is confirmed in additional, larger, multicentric, and independent studies, the routine implementation after previous validation of such high-throughput assays to evaluate DNA [13] and RNA [27] in pathology laboratories may benefit patients with advanced cancers and potential candidates for targeted treatments. The use of non-formalin-fixed and paraffin-embedded cytology, such as smears, touch preparations, and liquid-based cytology, has been shown to reliably replace cell blocks when no other material is available for patients with lung cancer [28]. Moreover, high-throughput assays for DNA and RNA extraction have been adopted in the recently updated molecular testing guidelines for the selection of patients with lung cancer for treatment with targeted tyrosine kinase inhibitors, jointly supported by the College of American Pathologists, the International Association for the Study of Lung Cancer, and the Association for Molecular Pathology [29]. However, considering the critical consequences that the analysis of cytological samples may have for patients in terms of diagnosis, prognosis, disease monitoring, and treatment outcomes, we strongly recommend that the implementation of molecular testing as a standard-of-care be performed only after in-house validation, considerations regarding the bias of expression level normalization [30, 31], and under very strict technical QCs, such as those developed for other high-throughput tools [31‒36].
Acknowledgements
The authors thank Charles Marcaillou (Integragen, Evry, France) and Nathalie Droin, PhD (INSERM U1009, Gustave Roussy, Villejuif, France) for their contributions to preliminary experiments in RNA sequencing and RT-qPCR using Taqman technology on miRNA. They are also grateful to Clara Nahmias, PhD (INSERM UMR U 981 Gustave Roussy, Villejuif, France) for her help and to Umberto Malapelle, PhD (Federico II University, Naples, Italy) for critical reading of the manuscript.
Statement of Ethics
Subjects gave their written informed consent to publish their case. The study protocol was approved by the institute’s committee on human research.
Conflict of Interest Statement
The authors have no conflicts of interest to declare.
Funding Sources
This publication had no financial support.
Author Contributions
David Gentien, Laure Piqueret-Stephan, and Philippe Vielh designed the study. Laure Piqueret-Stephan, Emilie Henry, Benoît Albaud, and Audrey Rapinat performed the experiments under the supervision of David Gentien. Serge Koscielny performed the statistical analysis. David Gentien, Laure Piqueret-Stephan, Serge Koscielny, Jean-Yves Scoazec, and Philippe Vielh wrote the manuscript.
References
Additional information
David Gentien and Laure Piqueret-Stephan contributed equally.