Abstract
The epithelial-mesenchymal (E/M) hybrid state has emerged as an important mediator of elements of cancer progression, facilitated by epithelial mesenchymal plasticity (EMP). We review here evidence for the presence, prognostic significance, and therapeutic potential of the E/M hybrid state in carcinoma. We further assess modelling predictions and validation studies to demonstrate stabilised E/M hybrid states along the spectrum of EMP, as well as computational approaches for characterising and quantifying EMP phenotypes, with particular attention to the emerging realm of single-cell approaches through RNA sequencing and protein-based techniques.
Epithelial-Mesenchymal Plasticity, E/M Hybrids, and Cancer Progression
E/M Hybrid States in Cancer
Epithelial-mesenchymal plasticity (EMP) defines a bi-directional axis of phenotypic change: epithelial-mesenchymal transition (EMT) and the reverse process of mesenchymal-epithelial transition (MET). These phenotypic changes have been implicated in embryonic and postnatal development, wound healing, and tissue repair, but are also exploited by cancer cells in avoidance of oncogene-induced senescence, immune escape, invasive progression, survival during dissemination, generation of cancer stem cells, the creation of metastases, and therapeutic resistance [Derynck and Weinberg, 2019; Dongre and Weinberg, 2019; Williams et al., 2019].
Although it is likely that cancer cells can be found to exist at any point across the full EMP spectrum, it is becoming increasingly clear that carcinoma cells often undergo a partial or incomplete EMT, which results in an E/M hybrid state, where carcinoma cells co-express both epithelial and mesenchymal markers [Huang et al., 2013; Zhang et al., 2014; Grosse-Wilde et al., 2015; Fustaino et al., 2017; Pastushenko et al., 2018; Saitoh, 2018; Kroger et al., 2019; McFaline-Figueroa et al., 2019] (Fig. 1). The possibility of an EMT phenotype has been suggested in several primary human cancers, including prostate, breast and lung cancer [Cattoretti et al., 1988; Livasy et al., 2006; Sarrió et al., 2008; Wang et al., 2017; Zacharias et al., 2018], although these studies have not specifically demonstrated co-expression of E and M markers in individual cells. Some studies showed evidence of both markers within the tumour parenchyma [Cattoretti et al., 1988; Livasy et al., 2006; Sarrió et al., 2008], but not necessarily in the same cell. Livasy et al. [2006] reported that 17/18 basal cancers expressed vimentin, a rate of vimentin positivity much higher than in other series, and showed that further sections of the same tumours expressed cytokeratins 5/6, 8, and 18. However, they did not show co-staining in the same cell, and no prognostic value was assigned to vimentin in this study. In Cattoretti et al. [1988], 58% of basal breast cancers were positive, and they demonstrated co-expression of vimentin and cytokeratin using serial sections. Wang et al. [2017] applied a gene expression signature developed from cell lines to data from The Cancer Genome Atlas (TCGA), in which it is not possible to know whether individual cells are co-expressing these factors, and in which stromal cells could presumably be included. Stromal cells may actually help confer an EMT skill set on tumour cells without them undergoing EMT per se, potentially conferring a hybrid-like situation. Zacharias et al. [2018] showed E-cadherin and vimentin positivity in sections of the same cancer.
The E/M hybrid phenotype has been more specifically evident in studies showing co-expression of epithelial (e.g., cytokeratins, E-cadherin, β-catenin) and mesenchymal (e.g., vimentin, N-cadherin, fibronectin) markers in various carcinoma types [Raviraj et al., 2012; Kolijn et al., 2015; Yamashita et al., 2018; Meyer et al., 2019; Navas et al., 2020]. With the advent of molecular profiling, the EMP continuum (and implicit E/M hybrid states) is increasingly well characterised. EMP scores show a spectrum of values derived by composite analysis of expression levels for a range of epithelial and mesenchymal genes in both cell lines and tumour samples, across a large range of cancer origins [e.g., Tan et al., 2014; McFaline-Figueroa et al., 2019]. Good concordance exists amongst these different scoring systems, each of which shows that cells can undergo varying extents of EMT across tumour types [Chakraborty et al., 2020], and thus a tumour can contain different proportions of cells with varied EMT-ness. These scores, and single cell RNA sequencing (scRNA-seq) data obtained across a range of cancer cell lines, indicate a continuum in the gene expression landscape of EMP in both tumours and tumour cell lines [Cursons et al., 2018; Karacosta et al., 2019; Cook and Vanderhyden, 2020].
Given that EMP often involves changes in cellular morphology, an EMP continuum in the morphological landscape is also being increasingly characterised [Leggett et al., 2016; Devaraj and Bose, 2019]. The trajectories of individual cells in these EMP landscape(s) need not be a cell-autonomous phenomenon; they can be influenced by various biophysical and biochemical modes of cell-cell communication [Mandal et al., 2020; Tripathi et al., 2020b] and by microenvironment-specific conditions such as hypoxia, acidity and nutritional levels that may vary considerably across a single tumour and can also influence movement along the EMP axis [Williams et al., 2019]. Novel analytical tools with single cell RNA-sequencing data are able to align cells along a “pseudospatial” trajectory between these 2 states [McFaline-Figueroa et al., 2019]. A further facet to this heterogeneity lies in the relationship to therapy response. The spectrum of E/M transition is likely to confer a similarly broad variation in drug sensitivity across a single tumour. Additionally, the post-therapy tumour may have even greater heterogeneity due to the potential for treatment to induce E/M change. This change may vary both due to variable E/M induction depending on the initial E/M status of individual cells as well as uneven distribution of drug through a tumour depending on variations in blood flow and stromal density [Trédan et al., 2007].
The term metastable state was coined to describe the instability of these E/M hybrid cells [Klymkowsky and Savagner, 2009; Jia et al., 2019], although it may not apply equally to all E/M hybrids. Although a continuum of states has been shown in clinical tumours as discussed above [Karacosta et al., 2019], certain E/M hybrid states appear to be favoured, and these have (i) relative stability and/or increased proportionality [Biswas et al., 2019] and (ii) enhanced malignant potential, with several studies now having demonstrated a higher malignant potential from E/M hybrid states than from more frankly mesenchymal variants [Celia-Terrassa et al., 2012, 2018; Huang et al., 2013; Bierie et al., 2017; Pastushenko et al., 2018; Kroger et al., 2019]. E/M hybrid subpopulations that remain more epithelial, with limited additional mesenchymal features, appear to have the greatest malignant and metastatic potential [Jolly et al., 2015; Pastushenko et al., 2018; Gupta et al., 2019].
The stability of certain E/M hybrid states is also implied by dynamics of EMP-positive cancer cell systems. Separate analysis of epithelial and mesenchymal transcriptomic components of breast cancer cell lines has shown that high epithelial gene expression often accompanies mesenchymal gene expression in cells designated as mesenchymal, reflecting a predominantly E/M hybrid phenotype, which was also seen in TCGA breast cancer specimens [Foroutan et al., 2018]; see section Computational Approaches, below. The observation of stability states across the continuum also parallels mathematical modelling predictions based on phenotypic plasticity stabilisation factors [Jolly et al., 2018]; see section Mechanism-Based Mathematical Models below. A further hypothesis to explain the dual observations of a continuum of E/M states in many clinical tumours and of metastable hybrid states is that perturbations or mutations in EMT-related signalling mechanisms within particular tumours lead to an interruption of progression along the E/M axis. This has been proposed by McFaline-Figueroa et al. [2019], who demonstrated that disruption of EMT-related pathways by genetic manipulation causes cells to accumulate at particular points along the axis.
Thus, the development of robust and quantifiable methods for estimating the type and extent of the E/M hybrid state in clinical specimens is becoming an important goal.
The E/M Hybrid Phenotype in Experimental Systems
EMP-related changes occupy a non-linear landscape in the multi-dimensional space of cancer progression [Jolly et al., 2017]. The clinical consequences of the physical and functional heterogeneity in cancers resulting from plasticity and changes across the EMP spectrum make it essential to explore the E/M hybrid phenotype, for which model systems including cell line systems, preclinical models (e.g., syngeneic mouse cancers), and patient-derived xenografts (PDX) have been devised. For example, tumours arising in a mouse model of metastatic castration-resistant prostate cancer, in which the PI3K/AKT and RAS/MAPK pathways are co-activated in the prostate epithelium, exhibit EMT as detected by a GFP reporter driven off the vimentin promoter [Ruscetti et al., 2016]. Three sub-populations of cells were identified: epithelial (EpCAM+Vim–), E/M hybrid (EpCAM+Vim+), and mesenchymal (EpCAM–Vim+). When cultured, the isolated epithelial and mesenchymal cells remained in their initial cell state, while the E/M hybrid tumour cells exhibited plasticity, transitioning into either epithelial or mesenchymal states within 24 h of plating. It has been shown that EMP state sub-populations isolated from systems with inherent EMP tend to revert to a stable equilibrium, as seen with EpCAM-low sub-populations of MMTV-PyMT [Beerling et al., 2016] and PMC42-LA [Bhatia et al., 2019] cells, and either epithelial (EpCAMhigh) or mesenchymal (EpCAMlow) sub-populations of the HCC-38 human breast cancer cell line [Yamamoto et al., 2017], all of which revert to a tightly controlled epithelial: mesenchymal ratio [Beerling et al., 2016]. In SCC9 cells, pEMThigh and pEMTlow sub-populations remained distinct at 4 and 24 h of culture after cell sorting, but resembled parental SCC9 cells after 4 days [Puram et al., 2017]. 
In head and neck squamous cell carcinoma (HNSCC) cells, segregated sub-populations returned to unsorted cell proportions over time, with a more rapid plasticity seen in E/M hybrid sub-populations [Pastushenko et al., 2018]. These data comprehensively suggest that complex dynamics exist amongst the sub-populations of EMP-positive carcinoma cells, supporting the hypothesis of stable E/M hybrid states.
Indeed, evaluation of cancer cell lines in vitro has revealed E/M hybrid states in lung, breast, colorectal, ovarian, prostate, and renal cancer cells [Hendrix et al., 1997; Huang et al., 2013; Bronsert et al., 2014; Andriani et al., 2016; Bierie et al., 2017; George et al., 2017]. E/M hybrid phenotypes have also been observed in PDX models of human lung, breast and colorectal cancer. In the context of PDX models, serial passages allow the human cancer stroma to be replaced with that of mouse, which provides advantages in clearly distinguishing the changes in the epithelial markers [Chao et al., 2017; Pastushenko et al., 2018; Mizukoshi et al., 2020]. Aiello et al. [2018], using a lineage-labelling strategy in the KPCY mouse model of pancreatic ductal adenocarcinoma (PDAC), reported that pEMT cells were more predominant/proficient in generating epithelial cells compared to cells from tumours that had undergone full EMT [Aiello et al., 2018]. A more comprehensive picture of the spectrum of EMT states is provided by the study from Pastushenko et al. [2018], where intermediate states were investigated using combinations of the CD106, CD51, and CD61 biomarkers. Enhanced metastatic competence was identified in relatively stable E/M hybrid states that retained more epithelial features. Further transplantation assays, scRNA-seq and methylome analysis from the sub-populations revealed the diversity in EMT features, tumour seeding, dissemination and metastatic capability [Pastushenko et al., 2018].
E/M Hybrid Circulating Tumour Cells
Circulating tumour cells (CTCs) represent a window into the metastatic process, with numbers of CTCs correlating strongly with prognosis and therapy response [Saxena et al., 2019; Burr et al., 2020]. Evidence for the relevance of E/M hybrid phenotypes and partial EMT states in CTC and CTC clusters as part of liquid biopsy assays is also building in terms of their metastatic competence, stemness and therapeutic resistance. The existence of E/M hybrid phenotypes has been observed in CTCs of clinical cancer patients at advanced stages [Theodoropoulos et al., 2010; Armstrong et al., 2011]. Observation of the E/M hybrid state has been particularly evident in CTCs, likely in part due to the utilisation of epithelial markers to isolate CTCs from a large background of non-epithelial blood cells, and the subsequent analysis of mesenchymal markers using various methods. Pioneering studies by Yu et al. [2013], from the Haber lab, employed RNA in situ hybridisation of epithelial versus mesenchymal pools of mRNA to show that E/M hybrid EMP states were more common than fully epithelial or mesenchymal states [Yu et al., 2013]. Interestingly, CTC clusters with an E/M hybrid phenotype undergo collective migration, in which cells migrate with retained contact junctions. The leading cells showed increased mesenchymal characteristics and actin-mediated mobility, while central cells within the cluster maintained polarity and intercellular junctions, migrating along with the traction forces generated by leader cells [Revenu and Gilmour, 2009; Aceto et al., 2014, 2015; Jolly et al., 2015]. Such clusters of cancer cells ready to disseminate are often reported in the invasive front of tumours [Yang et al., 2008]. 
Numerous studies have since confirmed such E/M hybrid enrichment in CTCs [reviewed in Francart et al., 2018; Saxena et al., 2019], and CTC phenotype analysis has shown an association between E/M axis CTC sub-types and poorer overall survival (OS) and disease-free survival (DFS) of PDAC patients [Sun et al., 2019].
As detailed more thoroughly in the next section, the last 5 years have seen a surge in single cell RNA-sequencing technologies, which more precisely identify the partial EMT or E/M hybrid states from cancer datasets [Sarioglu et al., 2015]. Dong et al. [2018] reported the presence of E/M hybrid states in scRNA-seq datasets of primary breast cancer and lung adenocarcinoma PDX [Dong et al., 2018]. Puram et al. [2017] compared the scRNA-seq data from the primary and metastatic HNSCCs and identified a subset of malignant cells with a partial EMT signature [Puram et al., 2017]. The E/M hybrid signature correlated with a malignant basal HNSCC subtype and lymph node metastasis, signifying that partial EMT promotes loco-regional invasion. In situ, these E/M hybrid cells were seen at the leading edge of tumours associated with cancer-associated fibroblasts [Puram et al., 2017]. Similarly, cells at different positions on the E/M axis were observed to associate with different stromal cell populations in a range of human tumours in a second study [Pastushenko et al., 2018]. The advent of more combinatorial single cell approaches utilising NGS and mass cytometry is now allowing the portrayal of integrated genome, transcriptome, DNA methylome [Barros-Silva et al., 2018], and proteome [Abouleila et al., 2019] information from individual cancer cells in relation to their EMT phenotypic state [Dey et al., 2015; Macaulay et al., 2017]. It is indeed an exciting time for exploration of the E/M hybrid state, attested to by these numerous excellent studies and the growing armamentarium of technologies, analytical tools and clinical implications.
Computational Approaches to Reveal Regulatory Mechanisms Underpinning E/M Plasticity and the E/M Hybrid Phenotype
Deriving Signatures with Differential Expression
Plastic phenotypes are often quantified using a workflow that combines a perturbation experiment in vitro with a differential expression analysis. To generate an EMT gene expression signature, an EMT programme is stimulated using an established EMT inducer such as TGFβ [Taube et al., 2010; Du et al., 2016; Cursons et al., 2018] or EGF [Cursons et al., 2015] to enable the identification of a group of genes that are differentially expressed in response to the stimulus within the given system and in correspondence with morphological changes. These differentially expressed genes establish a stimulus-specific gene set for EMT, which can also be separated into “epithelial” and “mesenchymal” components representing the up- and down-regulated genes [Cursons et al., 2018; Foroutan et al., 2018]. Along these lines, Figure 2 displays Cancer Cell Line Encyclopaedia (CCLE) breast cancer cell lines and TCGA tumour samples across an epithelial-mesenchymal landscape. The landscape shows that basal and claudin-low cell lines have lower epithelial scores but higher mesenchymal scores, while the luminal cell lines are less mesenchymal and more epithelial. We would expect that samples expressing an E/M hybrid phenotype would be found in the top right quadrant, with both a high mesenchymal and high epithelial score. Importantly, many basal and even some of the more fully mesenchymal claudin-low cell lines exhibit an E/M hybrid transcriptome, as do the vast majority of tumours. Application of these methods was used to categorise a number of E/M intermediate states (E, I0, I1, I2, M) in the TCGA breast cancer data and plot a pathway through these using temporal data from the TGFB-induced EMT in MCF10A cells [Panchy et al., 2020].
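As a minimal sketch of the workflow described above, the Python below splits a differential expression table into "epithelial" (down-regulated) and "mesenchymal" (up-regulated) components and then assigns a sample to a quadrant of the epithelial-mesenchymal landscape. The function names, the cut-offs, and the toy fold changes are assumptions made purely for illustration; they are not taken from any of the cited studies.

```python
def split_signature(de_results, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Split differential expression results into two signature components.

    de_results maps gene -> (log2 fold change after EMT induction, adjusted p-value).
    Genes up-regulated on induction form the "mesenchymal" component and
    genes down-regulated form the "epithelial" component.
    """
    epithelial, mesenchymal = [], []
    for gene, (lfc, padj) in de_results.items():
        if padj >= padj_cutoff:
            continue  # not significantly changed, so left out of both components
        if lfc >= lfc_cutoff:
            mesenchymal.append(gene)
        elif lfc <= -lfc_cutoff:
            epithelial.append(gene)
    return sorted(epithelial), sorted(mesenchymal)


def landscape_quadrant(e_score, m_score, e_cut=0.0, m_cut=0.0):
    """Name the quadrant of the E/M landscape a sample falls in (cut-offs are hypothetical)."""
    high_e, high_m = e_score > e_cut, m_score > m_cut
    if high_e and high_m:
        return "E/M hybrid"
    if high_e:
        return "epithelial"
    if high_m:
        return "mesenchymal"
    return "low/low"


# Invented values, for illustration only.
toy_de = {
    "CDH1": (-3.2, 1e-6), "EPCAM": (-2.1, 1e-4),
    "VIM": (2.8, 1e-5), "FN1": (3.5, 1e-8),
    "ACTB": (0.1, 0.9), "ZEB1": (1.5, 0.2),
}
epi_genes, mes_genes = split_signature(toy_de)
quadrant = landscape_quadrant(e_score=0.3, m_score=0.2)
```

Scoring the two components separately, rather than as one combined signature, is what allows a sample with high values on both axes to be recognised as a candidate E/M hybrid.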
Most experimental studies reviewed here can be used to derive epithelial and mesenchymal gene signatures; however, very few have developed gene signatures specific to the E/M hybrid state. Some approximation of this has been made by combining “epithelial” and “mesenchymal” components into one signature. Grosse-Wilde et al. [2015] produced an E/M hybrid signature by combining the 30 most extremely expressed genes from each of the E and M signatures from clonal HMLER human breast cancer cells and found it to predict poor OS in luminal A, HER2-enriched and basal breast cancer subtypes [Grosse-Wilde et al., 2015]. They found considerable overlap between the HMLER E/M signature and a lung E/M signature that was also compiled by combining “epithelial” and “mesenchymal” components, derived from a microarray dataset of 93 lung cancer cell lines [Loboda et al., 2011]. This lung E/M signature was also found to best differentiate 2 major unsupervised sub-populations of colon cancers, where the mesenchymal sub-population also correlated with poorer outcome, although the hybrid state was not studied [Loboda et al., 2011]. Signatures from scRNA-seq are discussed below.
Methods for Applying Signatures
The higher-level biological information captured by gene expression signatures can be mapped onto new datasets using gene set analysis methods, revealing the extent to which they express similar molecular phenotypes. Gene set analysis methods [reviewed and benchmarked recently by Geistlinger et al., 2021] can assess specific hypotheses, such as “Does this group of samples express an EMT programme?”, or competitively test a group of gene signatures, for example, “Which molecular programme is upregulated in these samples?”. When applied to differential analysis workflows, EMT gene expression signatures can evaluate the differential enrichment for an EMT programme between 2 groups of samples. Alternatively, single-sample gene set analysis methods such as ssGSEA [Barbie et al., 2009] or singscore [Foroutan et al., 2018; Bhuva et al., 2020] can be used to identify individual samples that are concordant with gene expression signatures and estimate the EMP phenotype of a single sample. In recent work, a refinement of the singscore method has been developed to remove the requirement for whole-transcriptome scale measurement in order to apply EMT signatures [Bhuva et al., 2020]. Bhuva et al. [2020] demonstrate that by adding as few as 5 stably expressed anchor genes to the genes included in an EMT signature, resolution of epithelial, mesenchymal and hybrid phenotypes can be achieved with low- or medium-throughput measurement methods such as qRT-PCR or NanoString. This refinement enables the translation of EMT signatures commonly used with RNA sequencing in an experimental research context into clinical application or application to archival FFPE tissues, where whole-transcriptome RNA sequencing is frequently not possible.
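To illustrate the idea behind rank-based single-sample scoring, the following Python sketch computes a centred mean-rank score for a gene set within one sample. It is a deliberately simplified stand-in written for this review, not the published ssGSEA or singscore implementation: the normalisation used by those methods, and the anchor-gene refinement, are omitted, and the function name and toy expression values are hypothetical.

```python
def rank_score(expression, gene_set):
    """Centred mean-rank score of a gene set within a single sample.

    expression maps gene -> expression value for ONE sample. Genes are ranked
    from lowest (rank 1) to highest expression; the mean rank of the gene set
    is divided by the number of genes and shifted by 0.5, so higher scores
    mean the set sits nearer the top of the sample's expression ranking.
    Ties are broken by input order, which is a simplification.
    """
    ordered = sorted(expression, key=expression.get)
    rank = {gene: i + 1 for i, gene in enumerate(ordered)}
    present = [g for g in gene_set if g in rank]
    if not present:
        raise ValueError("none of the signature genes were measured")
    mean_rank = sum(rank[g] for g in present) / len(present)
    return mean_rank / len(rank) - 0.5


# Invented expression values for one epithelial-like sample.
sample = {"CDH1": 9.0, "EPCAM": 8.0, "VIM": 1.0, "FN1": 2.0, "ACTB": 5.0, "GAPDH": 6.0}
e_score = rank_score(sample, ["CDH1", "EPCAM"])
m_score = rank_score(sample, ["VIM", "FN1"])
```

Because the score depends only on the ordering of genes within a sample, it can be computed for a single specimen without reference to a cohort, which is the property that makes this family of methods attractive for clinical use.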
Alternative Scoring of EMT Phenotypes
Similarly, various scoring methods have been developed for the quantification of EMT phenotypes in samples independent of a differential expression workflow. These approaches provide scores along a continuous phenotype based on the weighted sum (76 gene signature [76 GS] method) [Byers et al., 2013; Guo et al., 2019] or on the distribution of gene expression values, as in the Kolmogorov-Smirnov test-based method [Tan et al., 2014], for genes that are pro-mesenchymal or pro-epithelial. To place samples on a spectrum inclusive of an E/M hybrid state, a multinomial logistic regression-based method was developed using a small number of key regulators to predict a score between 0 and 2 (epithelial to mesenchymal), where a middle value of 1 indicates the E/M hybrid state [George et al., 2017]. Maximum variability in terms of EMT is associated with increased E/M hybrid signatures [Chakraborty et al., 2020], consistent with the higher plasticity ascribed to hybrid E/M phenotype(s). Intriguingly, higher heterogeneity in EMT scores was recently suggested as a potential biomarker for inflammatory breast cancer, a highly aggressive cancer with a 5-year survival rate of less than 30% [Chakraborty et al., 2021]. The multinomial logistic regression-based metric has potential advantages over the 76 gene signature score and the Kolmogorov-Smirnov test, as it is capable of distinguishing between “pure” individual E/M hybrid cells and mixtures of epithelial and mesenchymal cells, since it can assign values to modelled single cells and uses a small number of predictors to calculate the E/M score [Chakraborty et al., 2020].
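The two distribution-free scoring ideas mentioned above can be sketched in a few lines of Python. The first function is a generic weighted sum of expression values and the second is a signed Kolmogorov-Smirnov-style statistic comparing the epithelial and mesenchymal gene values within one sample. Both are illustrative approximations under our own assumptions: the weights, sign conventions, function names, and toy values are hypothetical and do not reproduce the published 76 GS or Tan et al. [2014] procedures.

```python
def weighted_sum_score(expression, weights):
    """Weighted sum of expression values over the genes that were measured."""
    return sum(w * expression[g] for g, w in weights.items() if g in expression)


def ks_style_score(expression, epi_genes, mes_genes):
    """Signed largest gap between the empirical CDFs of E and M gene values.

    Returns a value in [-1, 1]. Positive values mean the mesenchymal genes are
    shifted towards higher expression than the epithelial genes; negative
    values mean the reverse (a sign convention chosen for this sketch).
    """
    e = [expression[g] for g in epi_genes if g in expression]
    m = [expression[g] for g in mes_genes if g in expression]
    if not e or not m:
        raise ValueError("both gene sets need at least one measured gene")
    best = 0.0
    for x in sorted(set(e + m)):
        cdf_e = sum(v <= x for v in e) / len(e)
        cdf_m = sum(v <= x for v in m) / len(m)
        gap = cdf_e - cdf_m
        if abs(gap) > abs(best):
            best = gap
    return best


# Invented single-sample profiles.
epithelial_like = {"CDH1": 9.0, "EPCAM": 8.0, "VIM": 1.0, "FN1": 2.0}
mesenchymal_like = {"CDH1": 1.0, "EPCAM": 2.0, "VIM": 8.0, "FN1": 9.0}
toy_weights = {"CDH1": -1.0, "VIM": 1.0}  # hypothetical weights

ks_epi = ks_style_score(epithelial_like, ["CDH1", "EPCAM"], ["VIM", "FN1"])
ks_mes = ks_style_score(mesenchymal_like, ["CDH1", "EPCAM"], ["VIM", "FN1"])
ws_epi = weighted_sum_score(epithelial_like, toy_weights)
```

Neither score on its own separates a true hybrid cell from a mixture of epithelial and mesenchymal cells in a bulk sample, since both situations can yield an intermediate value.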
Gene Set Scoring in Single-Cell Data
scRNA-seq offers a powerful tool to identify epithelial, mesenchymal and E/M hybrid states in mixed cell populations within heterogeneous samples. Various single-cell specific methods are being developed for the phenotypic quantification of individual cells, as summarised in Table 1. Whether these tools were designed for cell type annotation or gene signature scoring, they apply different algorithms to quantify phenotypes of individual cells. Although these methods and their application to EMT phenotypes are still being actively developed, the resolution of scRNA-seq data supports the investigation of complex phenotypes, such as E/M hybrid states. For example, visualising the presence of different marker combinations in cell line models of endocrine therapy resistance in breast cancer [Hong et al., 2019] enables us to identify the functional conditions in which E/M hybrid cells are more prevalent, such as in CD44-high/oestrogen-depleted or long-term oestrogen-deprived subsets (Fig. 3). Gene set testing methods to characterise molecular phenotypes are well established for bulk analyses [Khatri et al., 2012; Geistlinger et al., 2021]; however, the knowledge bases and gene signature databases that were developed for bulk sequencing may not be appropriate for interpretation of single-cell data. Instead, to phenotypically characterise groups of cells in scRNA-seq data, various reference-based cell type annotation methods have been developed. These methods bypass comparison to bulk-derived gene sets by projecting individual cells into single-cell reference datasets using similarity metrics such as cosine similarity [Kiselev et al., 2018; Srivastava et al., 2018] or Spearman’s correlation [Aran et al., 2019], or by applying transfer learning [Lieberman et al., 2018]. However, many of these approaches are dependent upon clustering performance, which introduces a selection bias [Zhang et al., 2019], and may not necessarily infer cell states [Chen et al., 2019]. 
Alternative methods, such as marker-based phenotyping, rely on existing knowledge of gene expression garnered from protein measurement or bulk RNA sequencing. Additionally, marker-based approaches may not be appropriate for quantifying E/M hybrid phenotypes due to the problem of drop out in scRNA-seq data, where lowly expressed genes may be undetected and falsely quantified as “unexpressed” [Stegle et al., 2015].
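The reference-projection idea can be illustrated with a small Python sketch that assigns a cell to whichever reference profile it is most similar to by cosine similarity. This is only a toy under stated assumptions: the reference profiles, gene order, similarity threshold, and function names are invented, and the cited tools add feature selection and other steps that are not shown.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length expression vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector carries no directional information
    return dot / (norm_a * norm_b)


def annotate_cell(cell, references, min_similarity=0.7):
    """Return (label, similarity) for the closest reference, or 'unassigned'."""
    label, best = max(
        ((name, cosine_similarity(cell, profile)) for name, profile in references.items()),
        key=lambda pair: pair[1],
    )
    if best < min_similarity:
        return "unassigned", best
    return label, best


# Hypothetical reference profiles; gene order is CDH1, EPCAM, VIM, FN1.
references = {
    "epithelial": [8.0, 7.0, 0.0, 0.0],
    "mesenchymal": [0.0, 0.0, 8.0, 7.0],
    "hybrid": [6.0, 5.0, 6.0, 5.0],
}
# A hybrid-like cell in which EPCAM was not detected (simulated drop out).
cell = [5.0, 0.0, 6.0, 4.0]
label, similarity = annotate_cell(cell, references)
empty_label, _ = annotate_cell([0.0, 0.0, 0.0, 0.0], references)
```

In this toy case the cell is still placed with the hybrid reference despite one undetected epithelial marker, whereas a strict two-marker co-expression rule could miss it; whether that robustness holds on real data depends on the depth of sequencing and the choice of reference.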
Mapping to References in Single-Cell Data
Lähnemann et al. [2020] have recently reviewed single-cell “mapping” or cell type annotation, where the authors argue in favour of reference-based methods. Here, unsupervised clustering followed by cell type annotation is considered “reference-free”. The marker genes used in these “reference-free” methods are found on a correlative basis, where the functional relevance of the genes in specific phenotypes is not guaranteed [Yuan et al., 2017]. As discussed by Lähnemann et al. [2020], we need to enable quantification of the uncertainty in mapping cell type or cell state; however, this requires more sophisticated statistical methods. Although pseudo-bulk analyses [Lun et al., 2016] enable the application of the methods established for bulk sequencing [Tung et al., 2017; Crowell et al., 2020], these generalise across large, heterogeneous populations that could otherwise reveal cell-to-cell variation and novel phenotypes.
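The loss of cell-to-cell information in pseudo-bulk analysis can be seen in a short Python sketch that sums counts across the cells of each group. The function name, group labels, and counts are hypothetical, and real workflows operate on full count matrices rather than small dictionaries.

```python
from collections import defaultdict


def pseudobulk(cell_counts, groups):
    """Sum per-cell counts within each group (e.g. a sample or a cluster).

    cell_counts is a list of dicts mapping gene -> count, one dict per cell;
    groups is a list of group labels of the same length.
    """
    if len(cell_counts) != len(groups):
        raise ValueError("need exactly one group label per cell")
    totals = defaultdict(lambda: defaultdict(int))
    for counts, group in zip(cell_counts, groups):
        for gene, n in counts.items():
            totals[group][gene] += n
    return {group: dict(gene_totals) for group, gene_totals in totals.items()}


# Invented counts: sample s1 holds one epithelial-like and one mesenchymal-like
# cell, while sample s2 holds a single cell expressing both markers.
cells = [
    {"CDH1": 3, "VIM": 0},
    {"CDH1": 0, "VIM": 4},
    {"CDH1": 2, "VIM": 2},
]
bulk = pseudobulk(cells, ["s1", "s1", "s2"])
```

After aggregation, s1 shows both markers even though no individual cell in it co-expresses them, so a mixture and a genuine hybrid become indistinguishable at the pseudo-bulk level.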
Single-Sample Gene Set Testing in Single-Cell Data
Single sample gene set testing methods have been developed for bulk analyses that provide continuous phenotype scores for individual samples. If applied to scRNA-seq data these methods would phenotypically characterise individual cells, independent of clustering. However, previously developed bulk methods do not address transcript drop out and consequential sparsity of single cell data [Hicks et al., 2018]. Imputation methods have been developed [Lähnemann et al., 2020] but can introduce errors in downstream analyses [Andrews and Hemberg, 2018]. Current single sample methods cannot be directly applied to single-cell data, also due to the nuances of scRNA-seq such as the synchronisation of cell cycle stages [Stegle et al., 2015; Chen et al., 2019]. To overcome these challenges, many single-cell specific enrichment methods are starting to emerge.
A handful of single-cell single-sample gene set testing methods have been developed that model the count overdispersion and technical noise observed in single-cell data, such as PAGODA [Fan et al., 2016] or VAM [Frost, 2020]. Alternatively, Vision calculates signature scores by summing the expression of “up” genes minus the expression of “down” genes, then standardising based on the cell’s expression [DeTomaso et al., 2019]. Although Vision does not seem to penalise or account for gene drop out, the aggregate expression score avoids the per-gene technical noise problem. These methods have attempted to adjust gene set testing for scRNA-seq data; however, no benchmarking or comparison between methods has been performed. There is work being done to identify cell-type-specific drop out patterns [Qiu, 2020], suggesting that drop out perhaps cannot be modelled during cell-type discovery, and that we need an approach for characterising phenotype that is independent of drop out patterns.
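The aggregate "up minus down" score described above can be sketched as follows. The standardisation step here is our own assumption about one reasonable way to scale the raw difference by the cell's overall expression (the expectation and spread that a random gene set of the same size would have); it should not be read as the exact formula used by Vision, and the function name and toy values are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def signed_signature_score(cell, up_genes, down_genes):
    """Sum of 'up' genes minus sum of 'down' genes, scaled by the cell's own expression.

    cell maps gene -> normalised expression for ONE cell. The raw difference
    is centred on the value expected for randomly chosen genes and divided by
    the corresponding standard deviation (an assumed standardisation).
    """
    up = [cell[g] for g in up_genes if g in cell]
    down = [cell[g] for g in down_genes if g in cell]
    n = len(up) + len(down)
    values = list(cell.values())
    mu, sigma = mean(values), pstdev(values)
    if n == 0 or sigma == 0:
        return 0.0  # nothing to score, or a completely flat expression profile
    raw = sum(up) - sum(down)
    expected = (len(up) - len(down)) * mu
    return (raw - expected) / (sigma * sqrt(n))


# Invented values for one mesenchymal-like cell.
cell = {"CDH1": 0.0, "EPCAM": 1.0, "VIM": 5.0, "FN1": 4.0, "ACTB": 3.0, "GAPDH": 2.0}
score = signed_signature_score(cell, up_genes=["VIM", "FN1"], down_genes=["CDH1", "EPCAM"])
```

Because the score aggregates over many genes, the loss of any one transcript to drop out shifts it only modestly, which is the intuition behind the aggregate approach; it does not, however, model drop out explicitly.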
Gene Expression Signature Resources and the Hallmarks Gene Set
Gene expression signatures for a range of molecular phenotypes are available through the C2 collection of gene sets in the Molecular Signature Database (MSigDB) [Liberzon et al., 2015], which contain thousands of experimentally curated gene sets from the literature. Knowledgebases such as the Gene Ontology [Ashburner et al., 2000; Consortium., 2019], KEGG [Kanehisa and Goto, 2000] or Reactome [Fabregat et al., 2018] are also available through MSigDB and can provide sets of genes associated with particular pathways or molecular programmes. To summarise the large number of gene sets available in MSigDB and reduce redundancy, Liberzon et al. [2015] performed consensus clustering on >8,000 gene sets from the C1–C6 collections to identify 50 clusters of gene sets. The authors assigned biological themes to each cluster, then curated and refined clusters to produce 50 “Hallmark” gene sets. This includes a 200-gene EMT signature that is representative of 107 individual EMT-related gene sets in MSigDB, including experimentally derived expression signatures as well as pathways from knowledgebases.
Curated EMT Gene Signatures
Other EMT gene signatures have been defined using different meta-analysis methods and curation approaches [Groger et al., 2012; Tan et al., 2014; Foroutan et al., 2017]. For example, Foroutan et al. [2017] used 2 meta-analysis techniques to obtain a stimulus-specific EMT signature using 10 cell line microarray datasets that provided evidence for TGFβ-induced EMT across a range of cell lines. The signature was then used to assess cell lines from CCLE and patient samples from TCGA for evidence of a TGFβ-driven EMT. In the majority of tumour types, a more mesenchymal profile correlated with inferior survival, although exceptions existed. E/M hybrids were not assessed, however. Byers et al. [2013] curated a 76-gene EMT signature using a lung cancer cell line panel based on gene-gene correlations with 1 of 4 key EMT genes (CDH1, VIM, FN1 and CDH2). This expression signature was validated on independent datasets where their signature successfully identified cell lines that had undergone EMT, showing poorer disease control on EGFR inhibitors for tumours with a more mesenchymal profile. To obtain a directional cancer-specific EMT signature (with epithelial and mesenchymal components), Tan et al. [2014] stratified pan-cancer cell lines and patients according to known epithelial and mesenchymal markers. By applying single-sample gene set enrichment using EMT gene sets from MSigDB, the authors identified EMT signatures that were associated with the most epithelial and most mesenchymal samples. Although mesenchymal profiles correlated with poorer outcome in some tumours such as ovarian and colorectal cancer, this was not universal, with mesenchymal breast tumours having a better prognosis except in a post-chemotherapy cohort. Again, although this analysis lent itself to the potential study of the hybrid state, this was not explored.
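Correlation-based curation of the kind described for the 76-gene signature can be sketched as selecting genes whose expression across a cell line panel correlates strongly, in either direction, with at least one seed gene. The Python below is a simplified illustration under our own assumptions: the threshold, the use of Pearson correlation, the function names, and the toy expression values are not taken from Byers et al. [2013], and only 2 of the 4 seed genes are used to keep the example short.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / denom


def correlated_genes(matrix, seeds, min_abs_r=0.8):
    """Genes whose strongest absolute correlation with any seed reaches min_abs_r.

    matrix maps gene -> expression values across the same ordered cell lines.
    Returns gene -> strongest absolute correlation with a seed gene.
    """
    selected = {}
    for gene, values in matrix.items():
        if gene in seeds:
            continue
        strongest = max(abs(pearson(values, matrix[seed])) for seed in seeds)
        if strongest >= min_abs_r:
            selected[gene] = strongest
    return selected


# Invented expression values across four cell lines.
panel = {
    "CDH1": [9.0, 8.0, 2.0, 1.0],
    "VIM": [1.0, 2.0, 8.0, 9.0],
    "EPCAM": [8.0, 9.0, 1.0, 2.0],
    "ZEB1": [1.0, 1.0, 7.0, 8.0],
    "ACTB": [5.0, 6.0, 5.0, 6.0],
}
signature = correlated_genes(panel, seeds=["CDH1", "VIM"])
```

A housekeeping-like gene such as the toy ACTB profile is excluded because it does not track either seed, while genes that co-vary or anti-correlate with the seeds are retained.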
Need for New Gene Expression Signatures
Many of these EMT gene signatures have been derived from microarray datasets and may not best represent the data we now obtain from modern RNA sequencing platforms [Wang et al., 2014]. High-throughput RNA-sequencing of fluorescence-activated cell-sorted populations [Fustaino et al., 2017], or even new single-cell sequencing technologies, offer an opportunity for new EMT signatures to be derived. These experiments may better represent the datasets currently being used for EMT scoring, and more accurately reflect the true biology. For example, McFaline-Figueroa et al. [2019] used single-cell sequencing to demonstrate regulatory checkpoints along the EMT continuum. The authors were able to spatially segregate cells that had undergone EMT from epithelial colonies, and by applying their method, Monocle2, were able to align cells along a “pseudospatial” trajectory between these 2 states. Single-cell RNA sequencing experiments enable a finer characterisation of phenotypes and can reveal expression patterns of cells along an EMT continuum, making them suitable for deriving gene expression signatures in rarer, novel states such as the E/M hybrid state, and potentially in stable intermediates. Single-cell sequencing so far lacks the methodology for signature derivation.
Method development for single-cell data analysis remains an active area of research, where identifying expression patterns in individual cells has been particularly challenging due to the low number of genes detectable per cell [Hicks et al., 2018]. Consequently, there are no established workflows for deriving gene signatures from single-cell data. Using scRNA-seq approaches, E/M hybrid states have been identified based on the presence of both pro-epithelial and pro-mesenchymal markers in skin cancer [Dong et al., 2018], ovarian cancer [Gonzalez et al., 2018], HNSCC [Puram et al., 2017; Karacosta et al., 2019], and lung, mammary and skin cancer [Pastushenko et al., 2018]. Despite observations of E/M hybrid states, or partial EMT programs, there is a lack of gene expression signatures for this phenotype.
To instead identify expression signatures of cells in an unsupervised way, Puram et al. [2017] applied non-negative matrix factorisation clustering to reveal a partial EMT gene expression programme that expressed both ECM genes and EMT markers in a subset of cells. To establish the independence of this partial EMT signature, the authors confirmed the lack of correlation with known EMT signatures and used a highly expressed marker from this signature, TGFβ-induced (TGFBI), to sort SCC9 cells into pEMThigh and pEMTlow cells and validate the findings. The cells that expressed the partial EMT signature were more invasive, but less proliferative, and were found using immunohistochemistry analysis for the top 10 genes (plus the p63 HNSCC marker) near the leading edge of the tumour adjacent to cancer-associated fibroblast populations.
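As a rough illustration of how non-negative matrix factorisation separates expression programmes, the sketch below factorises a small cells-by-genes matrix V into W (cells by programmes) and H (programmes by genes) using the standard multiplicative update rules for the squared-error objective. It is a generic teaching example on invented data, not the pipeline used by Puram et al. [2017]; the matrix, the number of programmes, and all function names are assumptions.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frob_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh))

def nmf(v, k, n_iter=200, seed=0, eps=1e-9):
    """Factorise non-negative V (n x m) as W (n x k) times H (k x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    start = frob_error(v, w, h)
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h, start, frob_error(v, w, h)

# Toy matrix: rows are cells, columns are genes; two blocks of co-expressed genes.
v = [
    [5.0, 4.0, 0.0, 0.0],
    [4.0, 5.0, 0.0, 1.0],
    [0.0, 0.0, 6.0, 5.0],
    [1.0, 0.0, 5.0, 6.0],
]
w, h, start, end = nmf(v, k=2)
print(f"reconstruction error: {start:.2f} -> {end:.2f}")
```

Each row of H can be read as a gene programme and each row of W as the usage of those programmes by one cell; in practice the number of programmes and the stability of the factorisation across random starts would need to be assessed.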
Single-cell sequencing offers the opportunity to characterise the heterogeneity of EMP and to uncover the particular molecular signatures that regulate the E/M hybrid state. Deriving robust signatures for the E/M hybrid state requires analysis methods that account for dropout and can address the challenge of mapping between bulk and single-cell data. To ensure the quality and reproducibility of both the signature-derivation methods and the resulting gene expression signatures, it is important that these are accessible and well communicated so that they can eventually be benchmarked. The application of single-cell-derived signatures to bulk-sequencing data will also need to be validated, and such signatures will need to distinguish E/M hybrid subpopulations from heterogeneous mixtures of epithelial and mesenchymal populations.
Mechanism-Based Mathematical Models to Characterise Signatures and Implications of E/M Hybrid Phenotypes
Role of Mathematical Modelling in Identifying the E/M Hybrid State(s)
Mathematical models have played a major role in our understanding of the E/M hybrid state. As experiments typically tend to be performed at a single spatial scale (cell-level or population-level), it can be challenging to interpret results from these different spatial scales when processes also occur over multiple overlapping time scales. Mathematical models are a powerful tool with which to integrate the observations from different spatial and time scales, not only to interpret these experimental results, but also to generate new hypotheses that can be experimentally tested. This approach has proven extremely insightful in transforming the perception of EMT from a binary (all-or-none) process to a multi-stable process. Significantly, mathematical modelling studies were among the first to predict the existence of a stable E/M hybrid state characterised by stabilised co-expression of epithelial and mesenchymal markers [Lu et al., 2013; Tian et al., 2013]. This was later confirmed experimentally, both in vitro [Bao et al., 2013; Zhang et al., 2014; Pankova et al., 2016] and in vivo [Puram et al., 2017; Aiello et al., 2018; Pastushenko et al., 2018].
The two mathematical models, by Lu et al. [2013] and Tian et al. [2013], which initially predicted the existence of the stable E/M hybrid state, focused on signalling pathways within a single cell and built upon decades of observational and functional work by developmental and cancer biologists. Multiple signalling pathways have been implicated in EMT over many studies, but the activities of these pathways can be considered to converge to a core regulatory circuit composed of two families of transcription factors, SNAIL and ZEB, with two families of micro-RNAs, miR-34 and miR-200, respectively [Nieto et al., 2016]. These two mathematical models used ordinary differential equations to characterise the dynamical behaviour of the core EMT regulatory circuit and observed that a stable E/M hybrid state, distinct from the epithelial and mesenchymal states, was possible. Further extensions to these models have identified factors to stabilise E/M hybrid phenotypes and have since been experimentally validated [Hong et al., 2015; Jolly et al., 2016; Sha et al., 2020; Silveira and Mombach, 2020; Xin et al., 2020]. For instance, knockdown of these “phenotypic stability factors” such as OVOL1/2 can drive E/M hybrid cells to a “complete EMT” phenotype during mammary morphogenesis [Watanabe et al., 2014] and in lung cancer too [Jolly et al., 2016]. These models tend to display non-linear responses and hysteresis whereby cells en route through EMT and MET may take different paths.
Different models of hysteretic patterns in EMT dynamics have been presented. Here, for illustrative purposes, let us consider an example presented in Figure 4 [adapted from Jolly et al., 2015] where ZEB1 mRNA levels vary in response to an inducing signal, in this case the protein SNAIL. In Figure 4a, for a proposed miR-200/ZEB/SNAIL circuit, if a cell starts in an epithelial phenotype and its endogenous SNAIL levels are increased, then it first transitions to the E/M hybrid state, and if SNAIL is further increased the cell transitions to the mesenchymal state. In the reverse direction, by decreasing SNAIL, the cell directly transitions to the epithelial state from the mesenchymal state. In Figure 4b, by including OVOL, a cell initially in the epithelial state transitions to the E/M hybrid state, and then the mesenchymal state with increasing TGFβ. When decreasing TGFβ, the cell first transitions to the E/M hybrid state and then the epithelial state. However, the path followed in the two directions is still different, a phenomenon called “hysteresis”. Celia-Terrassa et al. [2012] further demonstrated mathematically, and validated experimentally by applying TGFβ treatment and then TGFβ withdrawal, that most but not all normal and tumour mammary epithelial cells exhibit hysteretic patterns in TGFβ-driven EMT. Statistically significant evidence of hysteresis is also found in TGFβ-treated lung cancer cells [Karacosta et al., 2019]. Interestingly, in both of the scenarios presented in Figures 4a, b, we see parameter regions where more than one phenotype can co-exist, suggesting that isogenic cells exposed to the same dose of EMT-inducing signal can nonetheless display “non-genetic” heterogeneity in their EMT status, for reasons such as stochasticity in the biochemical reactions underlying these networks [Huang, 2009].
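Hysteresis of this kind can be reproduced in a far simpler system than the circuits in Figure 4. The sketch below is a generic one-variable caricature, not the miR-200/ZEB/SNAIL model: a species x activates its own production through a Hill function, decays linearly, and receives a basal input s. The functional form, the parameter b, the Hill coefficient, and the sweep range are all assumptions chosen so that the system is bistable for intermediate s. Being bistable rather than tristable, it has no counterpart of the E/M hybrid state; it only shows why the forward and reverse paths differ.

```python
def rate(x, s, b=1.5):
    """dx/dt: basal input s, self-activation (Hill coefficient 4), linear decay."""
    return s + b * x**4 / (1.0 + x**4) - x

def relax(x, s, dt=0.05, steps=2000):
    """Forward-Euler integration at fixed signal s until (near) steady state."""
    for _ in range(steps):
        x += dt * rate(x, s)
    return x

signals = [i * 0.02 for i in range(31)]  # s from 0.00 to 0.60

# Increase the signal slowly, always starting from the previous steady state.
up, x = [], 0.0
for s in signals:
    x = relax(x, s)
    up.append(x)

# Decrease the signal again, continuing from the final high state.
down = []
for s in reversed(signals):
    x = relax(x, s)
    down.append(x)
down.reverse()  # align with the order of `signals`

for s, lo, hi in zip(signals, up, down):
    print(f"s={s:.2f}  increasing: {lo:.2f}  decreasing: {hi:.2f}")
```

At intermediate signal values the two sweeps report different steady states for the same s, so the state of the cell depends on its history as well as on the current dose of the signal.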
Alternative single-cell models have focused on larger networks describing the multiple signalling pathways implicated in EMT. While continuous models for large networks can provide information regarding temporal network dynamics, such as phenotypic plasticity and transition rates, it can be experimentally intractable to determine all of the relevant parameters; therefore, Boolean-logic approaches, where a gene is considered either active or inactive, have been considered. Using this approach, a study by Steinway et al. [2014] identified two stable fixed points corresponding to the epithelial and mesenchymal phenotypes. A subsequent study identified new fixed points that exhibit features of both epithelial and mesenchymal phenotypes, thought to be associated with E/M phenotypes, and identified putative targets that suppress hepatocellular carcinoma EMT. In vitro analysis indicated that many of the predicted targets did suppress TGFβ-driven EMT, and thus are potential therapeutic targets [Steinway et al., 2015]. Later work by Font-Clos et al. [2018] extended the work of Steinway et al. [2014] to identify multiple E/M fixed points between the more stable epithelial and mesenchymal fixed points. Hari et al. [2020] have since compared the Boolean-logic approach with RACIPE (Random Circuit Perturbation), where an ensemble of continuous models is generated with randomly chosen kinetic parameters for a given network topology, and steady state solutions are clustered to identify robust dynamical features of the given network. They show that multistability correlates with the net number of positive feedback loops. In contrast to traditional approaches where nodes in a network are targeted to control EMT plasticity, Hari et al. [2020] propose breaking the feedback loops by targeting their links rather than nodes as a possible route to restrict EMT plasticity.
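The notion of a fixed point in a Boolean network can be made concrete with a toy example. The three-node network below is hypothetical and far smaller than the networks analysed in the studies above: ZEB and miR-200 repress each other, and ZEB represses CDH1. Every possible state is enumerated and those left unchanged by a synchronous update are reported.

```python
from itertools import product

NODES = ("ZEB", "MIR200", "CDH1")

def update(state):
    """Synchronous update under hypothetical rules (illustration only)."""
    zeb, mir200, cdh1 = state
    return (
        int(not mir200),  # ZEB is on unless repressed by miR-200
        int(not zeb),     # miR-200 is on unless repressed by ZEB
        int(not zeb),     # CDH1 (E-cadherin) is on unless repressed by ZEB
    )

# A fixed point is a state that maps to itself.
fixed_points = [s for s in product((0, 1), repeat=len(NODES)) if update(s) == s]

for s in fixed_points:
    print(dict(zip(NODES, s)))
```

This mutual-inhibition motif yields exactly two fixed points, one epithelial-like (miR-200 and CDH1 on, ZEB off) and one mesenchymal-like (ZEB on, the others off). Additional fixed points with mixed features, as reported for larger networks, require more nodes and feedback loops than this toy contains.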
Population Dynamics of EMP
Experiments at the population scale have observed epithelial-mesenchymal heterogeneity where cells in the same tissue/population could be either epithelial, mesenchymal, or in an E/M hybrid state, and are able to switch between these states. Recent studies have demonstrated the ability of cells to switch states in the absence of EMT-inducing factors such as TGFβ. Ruscetti et al. [2016] showed that an initial population of E/M hybrid cells could quickly generate a mixture of epithelial, E/M hybrid, and mesenchymal cells. Further, Bhatia et al. [2019] demonstrated that an initial population consisting of either epithelial or mesenchymal cells had the ability to regain the phenotypic equilibrium of the parental population. Mathematical models at the population scale for EMT dynamics have tended to be phenomenological. Markov models, where the states of cells in the population at a future time depend only on their states at the current time, have been popular in describing experimental data [Gupta et al., 2011; Mandal et al., 2016; Risom et al., 2018; Chapman et al., 2019; Karacosta et al., 2019]. However, these models lack integration with the core regulatory network models previously discussed. Therefore, recent work has explored how to link the well-characterised dynamics of EMT regulatory networks to experimentally observed population-level behaviour. In an example of integrating experimental data with known interaction and regulatory networks, Cursons et al. [2018] used an experimental model of EMP driven by the well-known ZEB1-miR200 network [Gregory et al., 2011]. Exploring plasticity across an EMP landscape, they integrated experimental data with a known miR-mRNA interaction network and known protein interaction networks to characterise a co-targeted regulatory network regulating EMT and MET.
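A minimal sketch of such a Markov model is given below. The three states and the transition probabilities are invented for illustration and are not fitted to any of the datasets cited above. Starting from a purely epithelial or a purely mesenchymal population, repeated application of the same transition matrix drives both populations to the same phenotypic equilibrium, mirroring the behaviour described for sorted subpopulations.

```python
STATES = ("E", "E/M", "M")

# Hypothetical per-step transition probabilities; row i gives the
# probabilities of moving from state i to each state (rows sum to 1).
P = [
    [0.90, 0.08, 0.02],
    [0.10, 0.80, 0.10],
    [0.02, 0.08, 0.90],
]

def step(fractions, P):
    """One Markov step: new_j = sum_i fractions_i * P[i][j]."""
    n = len(P)
    return [sum(fractions[i] * P[i][j] for i in range(n)) for j in range(n)]

def evolve(fractions, P, n_steps):
    for _ in range(n_steps):
        fractions = step(fractions, P)
    return fractions

from_E = evolve([1.0, 0.0, 0.0], P, 200)  # sorted epithelial population
from_M = evolve([0.0, 0.0, 1.0], P, 200)  # sorted mesenchymal population

for name, a, b in zip(STATES, from_E, from_M):
    print(f"{name}: from E {a:.4f}, from M {b:.4f}")
```

The model is memoryless and treats cells as independent with equal proliferation across states, which is the kind of phenomenological simplification that motivates linking population models to the underlying regulatory circuits.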
One approach to incorporating core regulatory networks is to consider a population of cells where each cell contains a copy of the core EMT regulatory circuit [Tripathi et al., 2020a]. The dynamics of the EMT circuit can be calculated for each cell independently to determine the phenotype: epithelial, E/M hybrid, or mesenchymal. Motivated by the spontaneous EMT observed in Ruscetti et al. [2016], Tripathi et al. [2020a] developed a model which included cell proliferation, and showed that noise in the partitioning of signalling pathways during division could be a possible mechanism by which epithelial-mesenchymal heterogeneity can be maintained over extended periods of time involving multiple generations and passages.
Spatiotemporal Dynamics and Heterogeneity in EMP: Role of Cell-Cell Communication and Cell-ECM Crosstalk
An alternative approach to link the cell-level EMT behaviour is to explicitly consider spatial structure. Early evidence of EMT in clinical samples arose from differences in spatial localisation of β-catenin and E-cadherin across different parts of the tumour [Brabletz et al., 2001]. Related mathematical models tend to be multi-scale models where dynamics of intracellular E-cadherin and β-catenin signalling drive cellular level properties including cell-cell adhesion. This has been explored using both an off-lattice framework [Ramis-Conde et al., 2008] and an on-lattice cellular Potts model, by integrating CompuCell3D and Bionetsolver [Andasari et al., 2012]. Both models demonstrate properties associated with EMT where cells lose cell-cell adhesion, break off from a primary tumour body, and migrate through and invade surrounding tissues as part of the metastatic cascade. Tumour budding, where numerous finger-like projections, or buds, extend from the primary tumour towards neighbouring stroma in various cancers, has also been identified as a marker of invasive behaviour and a possible clinical readout of partial EMT [Bronsert et al., 2014; Grigore et al., 2016]. Other models include features associated with EMT states, such as the different survival times, migration speeds, and interactions with the environment, to develop a novel multi-organ model that explicitly accounts for EMT processes in individual cancer cells in the context of the invasion-metastasis cascade, and are able to produce biologically realistic outcomes regarding tumour shape and metastatic distribution [Franssen and Chaplain, 2020]. Bocci et al. [2019b], by coupling EMT and cell migration, consider one part of this process, the migration from the primary tumour, to explain individual and collective cancer cell migration. Their model robustly recapitulates circulating tumour cell cluster fractions and size distributions experimentally observed across several cancer types.
They identify that mechanisms that increase the population of the E/M hybrid state, rather than a complete EMT, are required to generate large clusters of five to ten cells.
Mesenchymal states and epithelial states of breast cancer stem cells have been observed located at the tumour invasive front and more centrally, respectively [Liu et al., 2013]. By integrating the core EMT regulatory circuit into each cell in a 2D hexagonal grid lattice, spatio-temporal models can be developed to explore the role of cell-cell communication, through Notch-Delta or Notch-Jagged signalling, to show that a gradient of EMT-inducing signal, for example TGFβ, can recapitulate the experimentally observed spatial segregation of different EMT phenotypes [Bocci et al., 2019a]. Mechanosensing through cell-extracellular matrix crosstalk can also influence EMT through an EMT-ECM stiffness feedback loop. Stiffer substrates can result in cells becoming more mesenchymal-like [Matte et al., 2019] and EMT can alter the stiffness of the ECM [Peng et al., 2017]. Kumar et al. [2014] used a system of ordinary differential equations to describe these relationships and illustrate that the scattering of a heterogeneous population depends on both the ECM density and the fraction of cells which have undergone EMT. By considering a cell contractility-ECM feedback loop, Ahmadzadeh et al. [2017], in experimental results and modelling, also find that the driving force underlying EMT is directly proportional to matrix stiffness and identify intermediate ECM stiffness as being optimal for cell invasion. Further experiments by Margaron et al. [2019] investigate the mechanical properties of single cells in different EMT states and interestingly find that the mechanical properties of E/M hybrid cells are not in between those of epithelial and mesenchymal cells. Instead, they find that E/M hybrid cells produce lower forces and, as a consequence of their lower contractility, move faster and have a higher invasive potential. Using these observations they identified that triple-negative breast cancer cells have E/M hybrid characteristics rather than mesenchymal characteristics.
Their results suggest that the mechanical and structural aspects of the E/M hybrid state are important to understand tumour-cell dissemination. Turning to clinical relevance, we found that culturing breast cancer cells in dense collagen led to cells with an E/M hybrid phenotype. Such cells were detectable in clinical specimens and corresponded to a higher rate of local relapse [Raviraj et al., 2012]. In keeping with this, we reviewed the impact of high mammographic density, a potential surrogate for tissue stiffness, on local outcome, with 5 of 7 studies linking high breast density with local breast cancer relapse [Shawky et al., 2019].
The EMT/MET status of cells can also be altered by communication with stromal cells in the tumour microenvironment, for instance, through the feedback loops formed among epithelial and mesenchymal cells and macrophages of varying polarization status (M1 and M2) [Li X et al., 2019]. Similar to EMP, the polarization status of macrophages is now being thought of as a continuum [Muraille et al., 2014], which can depend on microenvironmental conditions such as the extent and/or duration of the hypoxic response [Henze and Mazzone, 2016; Schaefer et al., 2017]. With recent modelling efforts to investigate the effects of the cyclic nature of hypoxia [Saxena and Jolly, 2019; Ardaševa et al., 2020], future models focused on investigating the crosstalk between microenvironmental factors such as hypoxia, metabolic crosstalk, and dynamic transitions among varying phenotypes of tumour and immune cells will be valuable in elucidating the dynamics of EMT in a tissue.
In summary, mathematical models have served as a powerful tool to unravel new biological insights into EMT/MET, and have driven the next set of experiments, for instance, those characterising the features of stable E/M hybrid state(s). Further development of these models, with close integration and validation with experimental data, will improve understanding of stability of the E/M hybrid states, their spatial patterns, and how they contribute to disease progression.
Clinical Utilisation of the E/M Hybrid State
Advantages of an increasing focus on the E/M hybrid state are 2-fold: (i) evidence emerging as described above that E/M hybrid cells associate better with carcinoma progression than frankly mesenchymal cells, and (ii) the increased certainty that the cells being identified are indeed carcinoma cells exhibiting EMP, since such bi-phenotypic cells are seldom observed outside of malignancy, unlike the abundance of cells exhibiting a mesenchymal phenotype, such as carcinoma-associated fibroblasts, endothelial cells, and various immune cells.
Immunohistochemical Studies
A number of studies have associated evidence of mesenchymal change (gain of mesenchymal markers and/or aberrant epithelial markers) in the tumour parenchyma with poor outcomes [e.g., Domagala et al., 1990]. However, many early studies measured mesenchymal and epithelial marker expression in separate specimens without exploring co-staining, as mentioned above, and so would have potentially classified both fully mesenchymal and E/M hybrid tumours as mesenchymal. Interestingly, one such early study in breast cancer found the worst prognosis from tumours with relatively equal proportions of epithelial and mesenchymal cells rather than a preponderance of either one [Thomas et al., 1999]. Mesenchymal change is especially prevalent in basal-like breast cancers, the majority of which are “triple-negative breast cancers” lacking all 3 predictive receptors (ER, PR, HER2) and have poorer survival than other subtypes [Thomas et al., 1999; Sarrió et al., 2008; Jeong et al., 2012; Karihtala et al., 2013].
Immunohistochemical analysis of co-expression of epithelial and mesenchymal markers provided a more specific assessment of the E/M hybrid phenotype. Such an approach allowed us to identify an association between co-expression of cytokeratin (CK)-19 and vimentin in cells located in the dense connective tissue stroma and poorer local relapse-free survival, but not distant metastasis [Raviraj et al., 2012]. A similar scenario has been reported of colon carcinoma tumour budding into the stroma and shown to represent partial EMT [Brabletz et al., 2001; Grigore et al., 2016; Meyer et al., 2019]. Although no report has explored the clinical significance of E/M hybrid occurrence at the invasive edge in colorectal cancer, loss of epithelial differentiation at the invasive front correlated with increased distant metastases in one early study [Ono et al., 1996], and ZEB2 expression at the invasive front in primary colorectal cancers correlated with cancer-specific survival in more recent work [Kahlert et al., 2011].
Novel Technologies
The impact of E/M axis positioning, inclusive of the hybrid state, on cancer pathobiology leads to the potential for EMP assessments of individual cancers to have prognostic/predictive clinical significance for patients, in terms of malignant potential, therapy resistance, and eventual survival (see Table 2 for summary of studies to date). Several studies have identified basal-like breast cancers co-expressing classical epithelial (e.g., cytokeratin 5/6) and mesenchymal markers (e.g., vimentin) [Thomas et al., 1999; Cakir et al., 2012], and such co-expression has also been seen in colon cancer [Meyer et al., 2019], high grade serous ovarian cancer [Gonzalez et al., 2018], and lung carcinoma by novel single-cell techniques such as DepArray and CyTOF, respectively. High resolution, sub-cellular microscopy has also been used to assess the E/M hybrid state in a variety of tumours [Godin et al., 2020]. Single-cell RNA-seq analysis has also shown that partial EMT was an independent predictor of nodal metastasis, higher grade, and adverse pathologic features (for example, lymphovascular invasion) in head and neck squamous cell carcinoma [Puram et al., 2017].
Grosse-Wilde et al. [2015] hypothesised that stemness was associated with E/M hybrid cells, and employed E, M and E/M signatures derived from stable, clonal HMLER sublines, and single cell qPCR analysis to link a mixed E/M signature with stemness in (i) individual cells, (ii) luminal and basal cell lines, (iii) in vivo xenograft mouse models, and (iv) in all breast cancer subtypes. Co-expression of E and M signatures was associated with poorest outcome in HER2 positive and basal breast cancer patients with 71% higher relapse and 41% higher mortality (Table 2) as well as with enrichment for stem-like cells in both E and M breast cell-lines. Importantly, the prognostic potential of the shared E/M signatures could be related to both intercellular (E:M) cooperativity, as well as enhanced malignancy of the E/M hybrid cells, each of which were proposed as novel therapeutic targets independent of breast cancer subtype. Huang et al. [2013] showed that intermediate mesenchymal and mesenchymal subsets were prognostic for worse progression-free survival in an ovarian cancer cohort, compared to either intermediate epithelial or epithelial.
As detailed in the Modelling section above, an inferential model built on NCI-60 cell line gene expression data screened multiple E/M hybrid predictors against TCGA data across multiple tumour types [George et al., 2017]. The VIM:CDH1 gene expression ratio combined with the expression of CLDN7 provided the best assignment of various tumour cells into 3 phenotypes: epithelial, E/M hybrid, and mesenchymal. Using this approach, metastatic breast tumours could be categorised as either having an epithelial or E/M hybrid phenotype with the E/M hybrid score, which has tumour type-specific predictive power. Interestingly, for breast cancer, most series showed better outcomes for partial or fully EMT transitioned tumours relative to epithelial ones when observed in cancer specimens taken before treatment, the exception being a cohort that received neoadjuvant chemotherapy where E/M hybrid tumours post-chemotherapy had lower DFS. This raises the possibility that E/M hybrids could have context-dependent significance, at least relative to chemotherapy in breast cancer.
Several recent studies have further demonstrated potential prognostic significance of the E/M hybrid state with various methodologies. Gonzalez et al. [2018] identified clades of E-cadherin/vimentin co-positive cells in high-grade serous ovarian cancer, which correlated with metastatic trajectory, and provided a link to a rarer subset of vimentin/cMyc/HE4 co-positive cells that correlated with poor prognosis. Karacosta et al. [2019] used mass cytometry time-course analysis in experimental systems to resolve lung cancer EMT and MET states. They constructed a lung cancer reference map of EMT and MET states referred to as the EMT-MET PHENOtypic STAte MaP (PHENOSTAMP), allowing them to characterise the EMT-MET profile of clinical lung cancers with single-cell resolution. They identified 3 epithelial states and 3 partial EMT states, 1 mesenchymal state and 1 MET state, and identified a “continuous sweep” of these states when assessing clinical NSCLC samples, which resonated with different NSCLC cell lines, although the impact of these findings on prognosis was not explored. A further study [Navas et al., 2020] used high-resolution digital microscopic immunofluorescence analysis (IFA) of β-catenin to quantitate and colocalise E-cadherin and vimentin at subcellular resolution, in an assay they dubbed EMT-IFA. EMT-IFA analysis of core-needle biopsies from various advanced metastatic carcinomas identified E/M hybrid cells, the proportions of which were often increased by different carcinoma-specific therapies. The EMT-IFA was proposed as a method for clinical monitoring of tumour adaptation to therapy. Focal adhesion kinase inhibition was identified in vitro as a potential therapy to prevent tumour regrowth from E/M hybrid cells. Most recently, Godin et al. [2020] reported a sequential chromogenic immunohistochemical multiplex (SCIM) technique to quantify E/M hybrid status through pan-CK and VIM analysis in urothelial carcinomas, in relation to cancer aggressiveness. Using this methodology, they demonstrated E/M hybrid phenotypes in urothelial carcinomas at the time of diagnosis, and found them to be strongly associated with poor prognosis independent of standard clinicopathological features. Although dichotomous scoring of the presence of the E/M hybrid was associated with worse OS (64 vs. 100%, p < 0.0001) and DFS (20 vs. 70%, p < 0.0001), more detailed quantitation of the E/M hybrid did not offer any further stratification.
CTC phenotype analysis has shown an association between E/M axis CTC sub-types and poorer OS and DFS in PDAC patients [Sun et al., 2019]. Epithelial (E), hybrid (H), mesenchymal (M) and total CTCs all correlated strongly with worse DFS and OS. In terms of OS, E-CTCs showed the strongest correlation, whereas H-CTCs had a slightly weaker but still significant association. In contrast, a similar but smaller study using the CanPatrol RNA-ISH methodology in colorectal cancers found that both total CTCs and M-CTCs, but not E- or E/M hybrid-CTCs, were significantly associated with unfavourable progression-free survival in univariate analysis, and only M-CTCs remained significantly associated in multivariate analysis [Hou et al., 2020]. Only M-CTCs were associated with OS, and then only in univariate analysis; however, with only 19 and 10 deaths in these studies, respectively, larger cohorts are required to fully delineate CTC sub-type implications. Nonetheless, CTCs are an abundant source of E/M hybrid cells with strong overall clinical implications, although biological impact may be tumour type-specific.
The concept that E/M hybrids contribute to tumour progression prompts the hypothesis that “phenotypic stability factors” that promote the E/M hybrid state, could also correlate with poor prognosis and, by extension, could prove useful therapeutic targets. In reality, this could depend on whether a given marker is just expressed in the hybrid state or whether it actually contributes to maintaining that state. In modelling, OVOL1/2 proteins inhibited EMT but variably expanded the E/M hybrid phenotype [Jia et al., 2015] or enhanced the movement to the epithelial state [Watanabe et al., 2014]. Possibly reflecting this variability, they were relatively highly expressed in non-small cell lung carcinoma E/M hybrid cells, which were resistant to EGFR blockade, expression levels only falling in fully mesenchymal cells, although the association of OVOL1/2 with survival was not reported in this study [Fustaino et al., 2017]. However, OVOL2A expression correlated with sensitivity to Palbociclib in breast cancer [Finn et al., 2009]. Further to this, high OVOL2 mRNA levels in breast cancer patients correlated with a better metastasis-free survival (HR 0.415, p = 0.0002) and overall survival (HR 0.465, p = 0.0046). It could be hypothesized that it is the prevention of full mesenchymal transition rather than the promotion of the E/M hybrid state that drives this association with better prognosis [Watanabe et al., 2014].
Also in breast cancer, intermediate levels of the pluripotency factor OCT4 (POU5F1) associate with stem cell behaviour and self-renewal [Niwa et al., 2000]. This range of OCT4 is enriched as cells move from either an E or M state to an E/M hybrid position [Pasani et al., 2020], and this hybrid E/M state correlates with worse prognosis [Grosse-Wilde et al., 2015]. Consistently, OCT4 also correlates with worse outcome in prostate cancer [Kerr and Hussain, 2014] and oesophageal cancer [Nagaraja and Eslick, 2014] but not in glioma [Dahlrot et al., 2013].
Integrin subunit beta 4 (ITGB4 - CD104) can identify mesenchymal-like cells in culture that have stem cell-like properties and reside in an intermediate E/M state [Bierie et al., 2017]. In patients with “triple-negative breast cancers” who received chemotherapy, elevated ITGB4 expression was associated with a worse 5-year relapse-free survival. This observation extended to inferior progression-free survival for high ITGB4 expression in lung adenocarcinoma, stage 4 serous ovarian cancer, and gastric cancer [Bierie et al., 2017]. A separate study found ITGB4 to also associate with poorer OS in colorectal carcinoma [Li M et al., 2019].
P-cadherin has also been postulated as an E/M hybrid marker. P-cadherin can interfere with epithelial cell-cell adhesion and promote invasion and metastasis, but only in the presence of E-cadherin. A 3-protein EMT signature applied to 500 breast cancers, using E-cadherin and vimentin combined with P-cadherin expression, showed that P-cadherin identified an intermediate state between E and M phenotypes, associating with the expression of breast cancer stem cell markers and a poor prognosis in breast cancer [Ribeiro and Paredes, 2015]. This concurred with an earlier study also showing that P-cadherin correlated with poorer 10-year OS, disease-specific survival and distant relapse-free interval in univariate analysis, although the association was not seen in multivariate analysis [Turashvili et al., 2011].
Taken together, there are relatively good data linking specific factors associated with the hybrid state with treatment resistance and poor outcome. However, whether the pathogenesis of the hybrid state relates to traits of the hybrid state itself, such as cancer stem cell features, or to the ability of hybrid cells to readily access both E and M phenotypes through plasticity, remains unknown. Further work is required to assess the utility of targeting or manipulating these molecules for therapeutic gain. Collectively, the E/M hybrid phenotype appears to offer considerable promise as a unique state with prognostic potential, and utility in monitoring therapy response, for which specific assays can be developed using different technologies, which are summarised in Table 3.
Conclusions, State of Play, and Potential Needs
Clearly, there is growing evidence for a preponderance of E/M hybrid states in carcinoma systems, and that these states often are associated with more aggressive malignant or metastatic capacity. As detailed above, a growing number of computational approaches have been developed to illustrate and characterise E/M hybrid states, and theoretical prediction and support for the stability of particular intermediates has been gleaned from mathematical modelling around some of the well-characterised molecular feedback systems. The E/M hybrid state, in a carcinoma context, is somewhat unique, such that opportunities exist in the prognostic and predictive biomarker setting that show promise, and new technologies are being developed to assess this. To our knowledge, whether these E/M hybrid states show specific sensitivity/resistance profiles to cancer therapies remains to be seen; however, it is likely that many of the studies implicating EMT in therapy resistance have been carried out with cells that exhibit an E/M hybrid phenotype. Studies that shed further light on the reasons behind the metastatic advantages of the E/M hybrid state, their biomarker potential based on unique co-expression of epithelial and mesenchymal markers, and opportunities for specificity in therapeutic targeting, will be helpful in determining and achieving the exploitation of EMP in cancer care.
Acknowledgement
The manuscript has been uploaded to a preprint server: doi: 10.20944/preprints202008.0023.v1 [Jolly et al., 2020].
Conflict of Interest Statement
None of the authors have any conflicts of interest.
Funding Sources
M.K.J. and E.W.T. were supported by a SPARC grant awarded by MHRD, Government of India (SPARC/2018-2019/P303/SL). M.J.D. and E.W.T. received foundational support from the National Breast Cancer Foundation (CG-10-04). M.J.D. is funded by Cancer Council Victoria APP1187825, Cure Brain Cancer and the National Breast Cancer Foundation (CBCNBCF-19-009), and the Betty Smyth Centenary Fellowship in Bioinformatics. S.B. was supported by a QUT Postgraduate Research Award scholarship. The Translational Research Institute is funded by a grant from the Australian Government.
Author Contributions
The review structure was conceived by E.W.T., M.K.J., M.J.D., A.R., and all authors contributed equally to the writing.
References
Additional information
M.J.D. and E.W.T. share senior authorship.