Chromosome 18 Loss of Heterozygosity in Small Intestinal Neuroendocrine Tumours: Multi-Omic and Tumour Composition Analyses

Introduction: Small intestinal neuroendocrine tumours (siNETs) are rare neoplasms which present with low mutational burden and can be subtyped based on copy number variation (CNV). Currently, siNETs can be molecularly classified as having chromosome 18 loss of heterozygosity (18LOH), multiple CNVs (MultiCNV), or no CNVs. 18LOH tumours have better progression-free survival when compared to MultiCNV and NoCNV tumours, however, the mechanism underlying this is unknown, and clinical practice does not currently consider CNV status. Methods: Here, we use genome-wide tumour DNA methylation (n = 54) and gene expression (n = 20 matched to DNA methylation) to better understand how gene regulation varies by 18LOH status. We then use multiple cell deconvolution methods to analyse how cell composition varies between 18LOH status and determine potential associations with progression-free survival. Results: We identified 27,464 differentially methylated CpG sites and 12 differentially expressed genes between 18LOH and non-18LOH (MultiCNV + NoCNV) siNETs. Although few differentially expressed genes were identified, these genes were highly enriched with the differentially methylated CpG sites compared to the rest of the genome. We identified differences in tumour microenvironment between 18LOH and non-18LOH tumours, including CD14+ infiltration in a subset of non-18LOH tumours which had the poorest clinical outcomes. Conclusions: We identify a small number of genes which appear to be linked to the 18LOH status of siNETs, and find evidence of potential epigenetic dysregulation of these genes. We also find a potential prognostic marker for worse progression-free outcomes in the form of higher CD14 infiltration in non-18LOH siNETs.


Introduction
Small intestinal neuroendocrine tumours (siNETs) represent a rare form of tumour which develops from neuroendocrine cells. siNETs present with unusually low mutational burden compared to most neoplasms, the only recurrent gene mutation being in CDKN1B, with the karger@karger.com www.karger.com/nen largest scale analysis of these siNET mutations finding 8.5% of tumours with pathogenic CDKN1B mutations [1]. Chromosomal arm-level copy number variation (CNV) has been used to subtype siNETs into chromosome 18 loss of heterozygosity (18LOH), multiple CNV (MultiCNV) excluding 18LOH, and individuals without any CNVs (NoCNV). Patients with 18LOH tumours are observed to have the best progression-free survival, however, the underlying mechanisms are not known and there are no targeted treatments based on 18LOH status currently used in the clinic [2].
DNA methylation (DNAm) is a reversible DNA modification that is known to influence gene expression levels, with increased DNAm at gene promoters commonly associated with the repression of gene activity [3]. DNAm disruption is also a hallmark in the development and progression of many cancers, through mechanisms such as the silencing of tumour suppressor genes [4] or the activation of oncogenes [5]. Although DNAm has been used to subtype siNETs, the role of DNAm in the development or progression of tumours within each subtype is unknown. Recently, the immune landscape of siNETs has been studied, with significant heterogeneity discovered [6], and avenues for immune-based treatment research options have been proposed [7]. However, little remains known about the immune landscape of siNETs, if this impacts clinical outcomes, or how it relates to 18LOH status.
Here, we integrate multi-omic data, DNAm, and gene expression, in primary tumour samples to investigate independent and combined contribution to siNET 18LOH status. We investigate tumour microenvironment with respect to 18LOH status using DNAm proxies of tumour cell type composition and its prognostic impact within non-18LOH tumours.

Methods
Study Population 54 primary siNET tumours were analysed from a previously published siNET study of individuals from the United Kingdom, the USA, and Canada [2]. Table 1 describes the cohort in detail.

Data Details
Illumina HumanMethylation450 BeadChip (450k) DNAm data and Illumina cDNA-mediated annealing, selection, extension, and ligation (DASL) gene expression data were used in the following analyses. Data preparation is described in Karpathakis et al. [2], and full DNAm data are available at the gene expression omnibus accession: GSE73832. Only primary tumour samples were used in our analyses: 54 DNAm profiles, and a subset of 20 of these tumours which also had overlapping DASL array data available.

DNAm Analysis
The pipeline in the CHAMP R package [8,9] was used to process the 450k DNAm data. DNAm levels were estimated as beta values, represented by a continuous value between 0 and 1. The pipeline filtered out probes with detection p value >0.01 or with <3 beads in at least 5% of samples, targeting non-CpG sites, or with non-specific binding [8]. BMIQ was used for normalization [10]. Following probe filtering, 381,052 CpG sites were retained for analysis.
An epigenome-wide association study (EWAS) was applied to test for differences in DNAm levels between CNV status of tumours. Linear modelling as implemented in the Limma R package [11] was used to test for associations, controlling for the covariates of age and sex. A false discovery rate <0.05 was used to identify associations at CpG sites called differentially methylated positions (DMPs). Missing age and sex values in 31% (n = 17) and 26% (n = 14) of patients, respectively, were imputed from DNAm array data. Sex was imputed using the EWASex R package [12] and age using the DNAm age estimate by the online Horvath Methylation Age Calculator [13]. Online supplementary Figure 1 (for all online suppl. material, see https://doi.org/10.1159/000530106) compares imputed ages against known ages in the samples, of these known ages there was a correlation of r = 0.46 (p = 0.004) using Pearson's product-moment correlation. Of 40 individuals with sex recorded, crossover between these analyses and the sex and age EWAS we use in the following analyses. The CNA function from the CHAMP package was used to estimate CNV from the 450k probe intensities followed by GIS-TIC2.0 [14] to determine arm-level copy number (length threshold = 80%, segment mean threshold = ±0.2). Tumours were subtyped into three copy number groups as described previously [2]; chromosome 18LOH (n = 23), NoCNV (n = 15), and MultiCNV (n = 16). The combined MultiCNV and NoCNV subtypes are referred to as non-18LOH (n = 31). The promoter region of genes was defined as between 1500bp upstream and 200bp downstream of the ENSEMBL (GRCh 37) transcriptional start site.

Gene Expression Analysis
Illumina DASL array RNA expression data were processed according to standard protocols as previously described [2]. The Limma R package was used to carry out differential expression analyses; no covariates were accounted for in regression models due to small sample sizes. An FDR threshold of <0.05 was used to identify differentially expressed genes. Sex chromosome genes were excluded from this analysis.

Gene Ontology Analysis
The MissMethyl R package [15] function "gometh" was used to test enrichment of DMPs with annotated Gene Ontology (GO) categories. Two enrichment analyses were run, one using DMPs with higher DNAm in the 18LOH group and another using DMPs with lower methylation in the 18LOH group. The background for this analysis was the 381,052 CpG site probes included in the EWAS. A FDR <0.05 was used as the threshold for GO term enrichment.

Differentially Methylated Regions
Since differential expression is more likely to be influenced by multiple rather than a single DMP in a gene promoter, DMRcate [16] (with default settings) was used to identify differentially methylated regions (DMRs) in the promoters of differentially expressed genes. DMRs of interest were defined as FDR <0.05 as calculated by DMRcate.

Significance-Based Modules Integrating the Transcriptome and Epigenome Analyses
The Significance-based Modules Integrating the Transcriptome and Epigenome (SMITE) R package [17] was used to integrate epigenetic and transcriptomic data together to identify networks of differentially regulated genes between 18LOH and non-18LOH primary tumours. We set the potential effect directions of methylation in the promoter and gene body and H3Kme1 to bidirectional (there is no H3Kme1 data present, SMITE incorporates methylation at chromosomal H3Kme1 regions into its analysis). Data types were weighted to increase focus on transcription and promoter methylation (methylation_promoter = 0.45, expression = 0.45, methylation_body = 0.05, methyl-ation_h3k4me1 = 0.05). We defined SMITE modules of interest as those including at least one differentially expressed gene (identified in gene expression analysis) and having a SMITE module p < 0.05. The SMITE p value for a module indicates the extent to which a connected subnetwork within the REACTOME protein-protein interaction network is enriched with differentially expressed and methylated genes [17]. Pathway enrichment for each module was performed by the SMITE package using the KEGG [18] database of genetic pathways.

Tumour Composition Analyses
We used a variety of tools to estimate tumour cell composition in the 54 primary siNETs from DNAm data. MEPurity [19,20] provided direct estimates of tumour purity. The epiSCORE R package [21] estimated proportions of broad cell types (e.g., immune cell, epithelial cell, etc.). For EpiSCORE, we were unable to find a siNET-specific reference, so we used the colon and pancreas reference datasets provided by EpiSCORE. The "constAVBetaTSS" function was used to calculate average DNAm across gene promoters for each individual, and "wRPC" was used to estimate cell type proportions. The MethylCIBERSORT R package [22] was also used to estimate cell type proportions using several built-in references including immune cell types. We used "V2" signatures for the large intestine, pancreas, thyroid, and liver. In total, proportions were estimated for 9 cell types: CD8+, CD14+, CD19+, CD56+, CD4 effector, eosinophils, neutrophils, regulatory T, and fibroblast cells.
Linear models were used to compare cell type proportions between 18LOH and non-18LOH subtypes. Cell type proportions were modelled as exposures, 18LOH status as the outcome, and sex and age included as covariates. An unadjusted p value <0.05 was used to identify cell types of interest. Differing variance between 18LOH and non-18LOH tumours for cell type predictions was analysed with F tests, using a p value threshold of <0.05.

Tumour Composition Survival Analysis
We use Cox proportional hazards models to analyse progression-free survival outcomes in non-18LOH tumours based on tumour composition subgroups. Using the 4 Methyl-CIBERSORT CD14+ estimates, we created a single CD14+ estimate via the addition of all CD14+ estimates for each individual. Individuals were then split on the median into two groups called "CD14+ Negative-Low" and "CD14+ High." A Cox proportional hazard model was used with the CD14+ groupings as exposures and progression-free survival data as the outcome, and a multivariable model was also used to control for age and sex as covariates.

Study Population Characteristics
The 54 siNET primary tumours with DNAm data included both 18LOH (n = 23) and non-18LOH (n = 31) tumours as defined by chromosome 18 copy number. Within the non-18LOH subtype, there were 15 tumours from the NoCNV subtype and 16 from the MultiCNV subtype. We note that in cases where an individual has MultiCNV, including 18LOH, they are only included in the 18LOH group. Table 1 describes the cohort.

DNAm Differences between siNET Subtypes
An EWAS comparing the 18LOH and non-18LOH subtypes identified 27,464 DMPs (FDR <0.05) (Fig. 1a), of which 6,258 were found in gene promoters. Most DMPs (25,279) were less methylated in 18LOH compared to non-18LOH tumours. DMPs were underrepresented on chromosome 18 compared to the rest of the genome. Only 1.9% (83) of all CpG sites tested on chromosome 18 were DMPs compared to 7.2% of CpG sites measured genome-wide.
Gene Ontology Enrichments among Sites Associated with 18LOH Status DMPs more methylated in 18LOH tumours were annotated to genes enriched for 15 Gene Ontology (GO) terms (FDR <0.05), and DMPs less methylated in 18LOH tumours were annotated to genes enriched for 77 GO terms. The sets of enriched terms appear to be quite different. For example, whereas less methylated DMP enrichments included one immune-related term (GO: 0006955 immune response, FDR = 0.002), more methylated DMP enrichments included none related to the immune system. Full results from the GO analyses are available in online supplementary Tables 1 and 2.

Gene Expression Differences between siNET Subtypes
A subset of 20 siNET tumours underwent gene expression analysis. We observed 12 differentially expressed genes (DEGs) between the 18LOH (n = 7) and non-18LOH (n = 13) subtypes (Fig. 1b), all of which had higher expression in 18LOH compared to non-18LOH.
None of these genes were located on chromosome 18. See online supplementary Figure 3 for gene specific expression plots.

Differentially Methylated and Expressed Genes
There were 94 CpG (43 promoter associated) sites with DNAm measurements annotated to the 12 DEGs associated with 18LOH status. These sites included 9 DMPs (3 promoter associated) from our EWAS of 18LOH status (see online suppl. Fig. 4), This DMP rate of 9.6% within the DEGs represents a 33% increase in DMP rate compared to the epigenome-wide DMP rate of 7.2%. We note that all 12 genes show higher expression in 18LOH tumours, whilst the expression of these 9 DMPs all show lower methylation in 18LOH tumours.
We also tested regional DNAm differences between 18LOH and non-18LOH tumours restricted to CpG sites with DNAm measurements annotated to the 12 DEGs. We identified three DMRs: SCRT1 (4 CpGs), AMPD3 (8 CPGs), GRAMD2 (10 CpGs). DNAm levels at individual CpG sites within the DMRs are available in online supplementary Table 3 and online supplementary Figure 5. subnetwork corresponding to a module including differentially expressed gene AMPD3. It contains 42 other genes, none of them DEGs, with 170 associated DMPs (including AMPD3 DMPs). Figure 2b shows the subnetwork of 19 genes containing DEG KCNMB2, containing 57 DMPs. Both subnetworks are enriched for KEGG metabolic pathways; details of the Kegg analysis of these networks are available in online supplementary Tables 4 and 5.
Immune Cell Type Differences between 18LOH and Non-18LOH MEPurity-based tumour purity estimates were not statistically different between 18LOH and non-18LOH tumours (p = 0.756), however, purity estimates were more variable in non-18LOH tumours (F = 0.415 [95% CI: 0.192, 0.942], p = 0.036) (online suppl. Fig. 6). epiSCORE provided proportion estimates for 5 cell types for the colon reference and another 6 for the pancreas reference. Estimated proportions were different between 18LOH and non-18LOH tumours for three of the 11 cell types (FDR <0.05, controlling for sex and age). Estimated epithelium proportions from the colon reference were higher in 18LOH tumours, immune cell proportions in the pancreas reference were higher in non-18LOH tumours, and endocrine cell proportions in the pancreas reference were higher in 18LOH tumours (Fig. 3a).
MethylCIBERSORT was used to estimate cell type proportions from four available references: large intestine, pancreas, thyroid, and liver. Each reference includes 9 cell types. Estimated CD14+ cell proportions were consistently higher in non-18LOH for all four references (p = 0.004-0.036). CD14+ estimates were also more variable in non-18LOH tumours (see Fig. 3b).

Immune Cell Type Variation among Non-18LOH Tumours
Estimated CD14+ cell type proportions by Methyl-CIBERSORT appeared bimodal, splitting non-18LOH tumours into two groups, one mode included estimates near zero [11] and the other estimates much larger than zero [20]. An EWAS comparing these two groups, however, identified no DMPs.

Discussion
Here, we integrate transcriptomic and DNAm data to carry out the first integrated multi-omic analysis of siNETs in the literature. We identified 12 differentially expressed genes and then determined whether the promoter DNAm of these genes was different between 18LOH and non-18LOH tumours. We identified 9 DMPs (3 promoter associated) within these gene regions (2 GRAMD2, 2 AMPD3, 2 FLYWCH2, 2 KCNMB2, 1 NR4A1) (Promoter associated: 2 FLYWCH2, 1 NR4A1), all inversely associated with gene expression levels, suggesting a role for DNAm in the regulation of these genes, most notably the inverse relationship between promoter expression in FLYWCH2 and NR4A1 and gene expression. We then also identified DMRs within three of these genes, a ten CpG DMR spanning the promoters of GRAMD2, an eight CpG DMR spanning the TSS200-GeneBody of AMPD3, and a three CpG DMR spanning the 3′UTR of SCRT1.  We further integrated DNAm and gene expression data using SMITE [17] to build gene modules that reflect altered cellular states between the 18LOH and non-18LOH tumours. We identified two modules of interest, which are built around the differentially expressed genes AMPD3 (Fig. 1a) and KCNMB2 (Fig. 1b). KEGG [18] based enrichment analysis of these two modules identified a number of metabolic pathways enriched with genes from both modules. The top result (as measured by over representation p value) for the AMPD3 module is "Purine metabolism -Homo sapiens (human)"; purines are involved in cell proliferation, and as such perturbations in purine metabolism have been associated with cancer progression [23]. In the KCNMB2 module, the top result is "cGMP-PKG signalling pathway -Homo sapiens (human)," a pathway which has been implicated in the inhibition of breast and colon cancer growth [24,25], and conversely, its suppression has also been shown to inhibit cervical cancer growth [26].
Taken together, these analyses suggest that AMPD3 and KCNMB2 may be differentially expressed between 18LOH and non-18LOH tumours due to an earlier event leading to it being differentially methylated. AMPD3 has not previously been linked to siNETs to our knowledge but is associated with gastrointestinal stromal tumours [27], and its methylation state has previously been identified as a biomarker for ovarian cancer [28]. The other differentially expressed and methylated genes have not yet been associated with cancer.
The immune system is often found to play a role in recognizing malignancies, and a number of therapies harness the immune system to treat cancer [29]. Programmed death-ligand 1 (PD-L1), has been shown to be differentially expressed among siNETs [6,30], and there is evidence that tumour-infiltrating lymphocytes can be activated by the presence of siNETs and lead to increased T lymphocyte degranulation [7]. There has, however, been little detail of the immune cell landscape of siNETs within the context of the CNV subtypes observed within siNETs.
Using a range of tools, we aimed to uncover tumour microenvironment differences between 18LOH and non-18LOH tumours. Average tumour purity estimates from MEPurity were similar between 18LOH and non-18LOH but were more variable in non-18LOH tumours. EpiSCORE was used to predict a range of cell types using colon and pancreatic cell references. We found that 18LOH tumours had higher predicted epithelium and endocrine cells counts than non-18LOH and that non-18LOH had higher predicted immune cell counts. The endocrine cell count prediction may be of particular interest given the nature of neuroendocrine tumours.
Given the EpiSCORE predictions of greater amounts of immune cells in the non-18LOH tumours, we then used methylCIBERSORT and CIBERSORTx [22,31] to estimate immune cell infiltration in the siNETs. We found that there is a significant difference between 18LOH and non-18LOH tumours for CD14+ cells, with higher values predicted in non-18LOH tumours (Fig. 4b).
Within non-18LOH tumours, there is also notably larger variance in CD14+ prediction, with individuals either predicted as having zero/near zero CD14+ cells (as is the case in 18LOH tumours), but there also appears to be a non-18LOH specific subgroup of individuals with nonzero predicted CD14+ cell counts. There is currently no literature implicating the presence of CD14+ cells in siNETs, but CD14+ cell infiltration is typically associated with poorer outcomes in some cancers [32][33][34], which we find to be the case in the non-18LOH tumours, although there is also evidence that infiltration leads to positive outcomes in colorectal cancer [35].
The initial evidence offered here for immune cellbased subtypes within non-18LOH tumours requires validation. Although we used a range of related nonetheless non-target cell type references to estimate cell type proportions, a siNET-specific reference is needed. Furthermore, small sample sizes within the non-18LOH survival analysis increase the chance of this finding being the result of a type 1 error. A higher powered study may also be able to examine further potential confounding effects, and/or stratify analyses by other measures of tumour composition such as grade, stage, and tumour location.
General limitations of our study include the relatively small sample size which we begin our study with, and the increasingly small size available as we begin to interrogate potential sub-subtypes present in the tumours. Use of bulk tissue analysis only provides us with averages for specific genes/CpG sites across the heterogeneous cellular landscape of the tumour analysed, which may mean that important, but rare cell types/states are not represented well in the analysis [36]. The use of bulk tissue-based analyses also limits our ability for biological interpretation of tumour microenvironment results, given they are estimates, and specific cell predictions may be including similar cell types, which could be restricted from analysis using single-cell technologies. Furthermore, we reiterate that cell composition predictions were made using references that were not specific to small intestinal/neuroendocrine tissues, and as such will require further validation, in particular to directly confirm the presence of CD14+ in the non-18LOH tumours. Due to these factors, it must be noted that this analysis is intended to be hypothesis generating, and all findings will require further evaluation in larger datasets of well-defined clinical samples.

Conclusions
Here, we have carried out the most detailed integrated analyses of siNETs to our knowledge in the literature, and we define a small number of potentially epigenetically regulated genes, which may be useful as drug targets in the future. We also offer initial evidence for tumour microenvironment differences between 18LOH and non-18LOH tumours and within non-18LOH tumours, including a potential immunological indicator of progression-free survival.

Statement of Ethics
All patients provided written informed consent for their tissue to be analysed in this study, which was approved by NHS Camden and Islington Community Research Ethics Committee (Ref: 09/ H0722/27).

Conflict of Interest Statement
No conflicts to report.

Funding Sources
S.W. is supported by Cancer Research UK (C18281/A30905). P.Y., C.R., M.S., and S.W. are supported via the following: Medical Research Council Integrative Epidemiology Unit at the University of Bristol (MC_UU_00011/5) and the Cancer Research UK Integrative cancer epidemiology programme (C18281/A29019). CT was supported via the Cancer Research UK -Clinician Scientist Grant (2011-2016) for the duration in which the data were procured and processed.

Author Contributions
S.W. and M.S. planned analyses. S.W. carried out all analyses and wrote the first draft. M.S., P.Y., C.R., A.W., and C.T. supervised the project and critically reviewed the manuscript.

Data Availability Statement
Raw data from the original study were deposited in GEO (accession number: GSE73832). Further enquiries can be directed to the corresponding author.