Abstract
The link between environmental exposure and onset of psychopathology has been well documented, yet the pathway is not fully understood. Epigenetic modifications are thought to play a role in the manifestation of disease as studies have shown that early environmental exposures can influence epigenetic variation in both humans and other animals. As a result, epigenetic epidemiology studies with a specific focus on psychopathology will play an important role in elucidating the pathway to disease onset. In order to gain a clear perspective of where this field currently stands, here we provide a brief review of important issues in epigenetic epidemiology studies of psychopathology, including causal inference, common study designs, challenges faced with current study designs, and the importance of a life course perspective. We provide the reader with relevant examples of studies when appropriate, with a particular focus on studies that have examined the epigenetic modification of DNA methylation. Implications for future research are also discussed.
Introduction
The onset of psychopathology has been linked to heritable factors and environmental influences [1]. Twin studies have shown the heritability of schizophrenia to be approximately 83%, with the remaining 17% due to individual environmental exposure(s) [2]. Similarly, twin studies have reported heritabilities of 30-42% for major depressive disorder (MDD) [3] and post-traumatic stress disorder (PTSD) [4,5], and 50-86% for anorexia nervosa [6] and alcohol dependence [7]. These findings suggest that genes and/or the environment play a role in the development of mental disorders [8]; yet to date, relatively few studies have identified specific genes that determine the risk for such disorders. Recent work indicates that epigenetic factors play an important role in shaping the risk for mental disorders, due in large part to their ability to translate environmental exposures into phenotypic outcomes. Epigenetic modifications include DNA methylation (DNAm), histone modifications, and noncoding RNA [9,10,11], all of which regulate DNA accessibility and subsequent transcription via chemical modifications that do not alter the underlying DNA sequence. A growing body of research has implicated epigenetic modifications in psychopathology [12,13,14,15]. Much of this evidence has been provided by studies grounded in epidemiology, which can provide multilayered datasets that assess both biological and social factors shaping the risk for mental disorders. The goal of this review is to provide a brief overview of epigenetic epidemiology studies focusing on psychopathology, including a discussion of causal inference, common study designs and their accompanying challenges, and the importance of a life course perspective. Studies discussed in this review will focus on DNAm, which has been the focus of the majority of work to date pertaining to psychopathology.
Causal Inference
The overall goal of epigenetic epidemiology studies focused on psychopathology is to examine the factors thought to possess a causal role in the disease. Challenges such as confounding and reverse causation [16] are important to consider, as epigenetic changes may in fact be a consequence of the disease rather than its cause (fig. 1) [17]. Animal studies have shown that epigenetic changes occur in response to experience and impact physiological reactivity [18]. The bidirectional influences of epigenetics and the environment pose challenges for epigenetic studies, and inferring cause versus consequence for mental disorders (as well as other complex diseases) will remain a challenge until a baseline is established for population-level epigenetic variation in future work. Despite the progress made by genome-wide association (GWAS) and candidate gene studies in establishing a partial genetic etiology for many psychiatric disorders [19,20,21,22], the variability associated with the risk for psychopathology is not yet fully accounted for, permitting well-designed epigenetic studies to make a substantial contribution toward characterizing this risk more completely.
Study Designs
One of the most common epigenetic epidemiology study designs is the retrospective case-control design, matching unrelated participants with known mental disorders to unrelated controls on characteristics such as age and gender. Many of these studies have utilized whole blood tissue samples to study DNAm in Alzheimer's disease [23], PTSD [24,25,26], schizophrenia [27], and depression [28], amongst others, reporting significant differences in DNAm between cases and controls. Other studies have used postmortem brain samples to examine DNAm patterns in schizophrenia [29,30,31,32], bipolar disorder (BD) [29,32,33], and suicide [34,35]. One drawback of the retrospective case-control design is that the measures are taken after ‘insult' or onset of disease, and researchers therefore have no baseline (pre-disease) measures for the affected individual(s). Epigenetic studies that include assessments across multiple tissues are one way to address this challenge, since similar epigenetic marks across tissues are likely indicative of disease risk, as they would have had to arise before disease onset early in development. For example, one study reported lower DNAm at five CpG sites in the human leukocyte antigen complex group 9 gene (HCG9), previously implicated in psychosis [32], associated with BD in brain, blood, and germline (sperm) tissues [33]. Another study reported MB-COMT DNAm differences in saliva from schizophrenia and BD patients paralleled those previously observed in the brain [36]. These examples demonstrate that examining multiple tissues within a case-control framework is a robust approach that can be used to strengthen the causal inferences of observed associations beyond typical case-control studies.
Monozygotic twins (MZT), discordant for the psychopathology of interest, are the focus of another epigenetic study design. For MZT, discordance suggests nonshared environmental factors - specifically, epigenetic factors - are essential to understanding the etiology of the disorders. A recent MZT study examining MDD reported significantly greater global DNAm variance in MDD-affected versus unaffected twins, suggesting widespread dysregulation of DNAm in those with MDD [37]. A genome-wide DNAm study linked SLC6A4 to BD, reporting significant DNAm differences between twins [38], while another found eight loci associated with BD and schizophrenia [39]. Other MZT studies have adopted a more focused candidate gene approach [40,41].
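The logic of the discordant MZT design can be illustrated with a within-pair comparison. Because co-twins share their DNA sequence, the affected-minus-unaffected difference at a locus removes genetic background from the contrast. The following is a minimal sketch, not the analysis used in any of the cited studies; the function name and the DNAm values are hypothetical, and a real analysis would also need to handle covariates and multiple loci.

```python
import math
import statistics


def paired_t_statistic(affected, unaffected):
    """Return (mean within-pair difference, paired t statistic).

    Each position in the two lists is one twin pair: the DNAm
    proportion of the affected twin and of the unaffected co-twin.
    """
    if len(affected) != len(unaffected) or len(affected) < 2:
        raise ValueError("need at least two complete twin pairs")
    # Within-pair differences cancel the shared genetic background.
    diffs = [a - u for a, u in zip(affected, unaffected)]
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff, mean_diff / std_err


# Hypothetical DNAm proportions at a single locus for four twin pairs.
AFFECTED = [0.52, 0.60, 0.47, 0.55]
UNAFFECTED = [0.50, 0.55, 0.46, 0.50]

mean_diff, t_stat = paired_t_statistic(AFFECTED, UNAFFECTED)
print(f"mean within-pair difference: {mean_diff:.4f}, t = {t_stat:.2f}")
```

The statistic would be compared against a t distribution with (number of pairs - 1) degrees of freedom; with only a handful of pairs, as here, such a test has little power.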
Longitudinal study designs are particularly advantageous as they allow for the identification of epigenetic variants associated with the disease prior to onset, thereby increasing the likelihood that the variants are etiological factors for the disease of interest. For example, the use of biobanked samples from a military cohort has led to the identification of relatively higher DNAm of Alu repetitive sequences in PTSD cases versus controls pre-deployment [42]. Examination of PTSD-associated genes revealed on average significantly lower DNAm pre-deployment in cases versus controls for IL18 [43]. These pre-deployment findings indicate that differences existed before exposure and therefore may be epigenetic risk factors for PTSD.
Included within the longitudinal framework are intervention studies, which allow researchers to measure participants pre- and post-intervention (frequently after the onset of disease). Few studies to date have employed this study design. One study of the dopamine transporter gene SLC6A3 reported no DNAm differences between alcohol-dependent patients and controls at baseline, and no differences between baseline and post-intervention (psychotherapy) measures [44]. Conversely, work on brain-derived neurotrophic factor (BDNF) revealed psychotherapy-associated changes in DNAm in borderline personality disorder (BPD) [45]. At baseline, BPD cases exhibited increased DNAm at two BDNF loci compared to controls. Over the course of treatment with intensive dialectical behavior therapy, significant decreases in depression measures and on average increased DNAm were observed for BPD patients; however, further analysis revealed that the increased DNAm resulted from nonresponders, whereas responders exhibited decreased DNAm [45]. Earlier BDNF studies reported similar heterogeneity in response to pharmacotherapeutic treatment [46,47]. For PTSD patients, Yehuda et al. [48] reported that pre-treatment DNAm levels in the glucocorticoid receptor (GR) gene NR3C1, but not in FKBP5, predicted the outcome of psychotherapy, and that a decrease in FKBP5 DNAm was associated with recovery. These studies highlight the potential to develop epigenetic biomarkers for treatment response based on both pharmacotherapeutic and psychotherapeutic interventions. Epigenetic epidemiology study designs and references to example studies relevant to psychopathology are summarized in table 1.
Shortcomings with Current Methods
Methodological considerations unique to epigenetic epidemiology are reviewed in this section. Initial studies examining portions of the human genome found tissue-specific DNAm [49,50]. Later studies showed that correction for cell-specific variation is imperative, as measuring average tissue-level epigenetic profiles may miss cell-specific variations [51,52] contributing to disease risk. Houseman et al. [53] developed an analytic strategy that adjusts for cell-specific variation by estimating the proportion of immune cells in unfractionated whole blood. This method does not require fresh blood samples, works on a DNAm microarray platform, and uses differentially methylated regions as markers of immune cell identity (independently validated [54]). This approach has not yet been used for mental disorders; however, it has been used in epigenome-wide association studies (EWAS) for arthritis [17] and smoking [55], providing support for the application of this method in blood-based DNAm studies aiming to elucidate etiological factors in psychopathology. Similar methods exist for brain tissues [56].
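The idea behind reference-based adjustment for cell composition can be sketched as follows: a whole-blood DNAm profile is treated as a mixture of cell-type-specific reference profiles, and the mixing proportions are estimated under a non-negativity constraint. The code below is a simplified illustration of that idea only, not the published implementation of Houseman et al. [53]; the solver (projected gradient descent), the function name, and all reference values are assumptions made for this example.

```python
def estimate_cell_proportions(reference, observed, n_iter=5000):
    """Estimate non-negative mixing weights w minimizing ||observed - reference @ w||^2.

    reference: list of rows, one per CpG; each row holds the DNAm level
               of that CpG in every reference cell type.
    observed:  DNAm level of each CpG in the mixed (whole-tissue) sample.
    """
    n_types = len(reference[0])
    # The squared Frobenius norm bounds the largest eigenvalue of R^T R,
    # so this step size is small enough for gradient descent to converge.
    frob_sq = sum(v * v for row in reference for v in row)
    step = 1.0 / (2.0 * frob_sq)
    weights = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        residual = [
            sum(r * w for r, w in zip(row, weights)) - y
            for row, y in zip(reference, observed)
        ]
        gradient = [
            2.0 * sum(row[j] * res for row, res in zip(reference, residual))
            for j in range(n_types)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        weights = [max(0.0, w - step * g) for w, g in zip(weights, gradient)]
    return weights


# Hypothetical reference DNAm levels: 6 CpGs (rows) x 3 cell types (columns).
REFERENCE = [
    [0.9, 0.1, 0.1],
    [0.1, 0.9, 0.1],
    [0.1, 0.1, 0.9],
    [0.8, 0.2, 0.5],
    [0.2, 0.7, 0.3],
    [0.5, 0.5, 0.1],
]
TRUE_PROPORTIONS = [0.6, 0.3, 0.1]
# Simulate a noise-free whole-tissue profile from the known proportions.
MIXED = [sum(r * p for r, p in zip(row, TRUE_PROPORTIONS)) for row in REFERENCE]

estimated = estimate_cell_proportions(REFERENCE, MIXED)
print([round(w, 3) for w in estimated])
```

In an association analysis, the estimated proportions would typically enter the model as covariates so that case-control differences in DNAm are not simply differences in cell composition.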
Tissue selection is an important consideration as the ideal tissue - brain - is not readily accessible within an epidemiologic context. Most population-based epigenetic studies rely on blood [24,26,28,31,42,43], with a smaller number of studies utilizing lymphoblastoid cell lines [57,58], saliva [36], and buccal samples [59] for DNA. Recently, blood and buccal samples were compared as a surrogate for EWAS, finding buccal cells were more significantly hypomethylated (verified by ENCODE data [60]), indicating potential hotspots in gene regulation. In addition, buccal cells had a greater overlap with hypomethylated sites from other tissues than did blood. This finding suggests that inclusion of buccal cells in epigenetic epidemiology studies stands to enhance EWAS investigations of non-blood-based diseases such as mental disorders. A recent study identified and adjusted for cellular heterogeneity in different cell types for DNAm studies without the use of a reference dataset [61], providing further support for the use of buccal cells in EWAS.
Epigenetic studies should ideally consider all layers of genomic information (DNA sequence, epigenetic, and gene expression) to elucidate the pathways involved in complex diseases. This approach has not yet been widely applied in psychiatric disorders; however, one recent study examined the impact of genomic variation on epigenetic marks in BD, reporting an enrichment of methylation quantitative trait loci (mQTL) and expression quantitative trait loci in cis to the implicated single-nucleotide polymorphism (SNP). Stronger associations were observed when mQTLs were used to restrict the number of SNPs examined [62], identifying one SNP previously associated with BD [63]. These results highlight the importance of integrated functional genomics in identifying relevant epigenetic variation of mental disorders. In the future, it will become increasingly important to identify epigenetic marks actively associated with disease phenotype and gene expression versus those passively linked to better define the biomarkers of disease and the targets for interventions [64].
A challenge not yet discussed is the intra- and interindividual epigenetic variation and its stability over time. Without a firm understanding of these parameters, the assessment of the expected effect size, i.e. the association of an epigenetic mark with the disease risk, will remain a challenge. Even though global DNAm has been shown to change over time [65], specific loci can be more stable [66,67]. Small (<5%) changes currently characterize DNAm differences associated with disease [68]; therefore, it is imperative to define the intra- and interindividual epigenetic variability of putative disease-associated loci. To establish expected ranges of epigenetic variation in normal and diseased states, epigenetic databases such as the epigenomics roadmap will be vital. In addition, future EWAS will likely aid in increasing our knowledge of epigenetic variation [69].
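How strongly detectability depends on the effect size and on interindividual variability can be made concrete with the standard normal-approximation formula for comparing two group means, n per group = 2 * ((z_(1-alpha/2) + z_power) * SD / delta)^2. This is a textbook approximation rather than a method proposed in the studies cited here, and the DNAm difference (delta) and standard deviation used below are hypothetical values chosen for illustration.

```python
import math
from statistics import NormalDist


def samples_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided two-sample comparison.

    delta: expected mean DNAm difference between cases and controls.
    sd:    interindividual standard deviation of DNAm at the locus
           (assumed equal in both groups).
    """
    if delta <= 0 or sd <= 0:
        raise ValueError("delta and sd must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)


# Hypothetical locus: 5% mean difference, SD of 10% across individuals.
n_single_test = samples_per_group(delta=0.05, sd=0.10)
# Same locus with a Bonferroni-style threshold for 450,000 tested sites.
n_array_wide = samples_per_group(delta=0.05, sd=0.10, alpha=0.05 / 450_000)
print(n_single_test, n_array_wide)
```

Because the required sample size scales with (SD / delta)^2, halving the expected difference roughly quadruples the number of participants needed, which is why estimates of locus-level variability matter for study planning.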
As mentioned above, small (<5%) DNAm differences are reported to be significant in many epigenetic studies; however, these small changes fall within the technical limitations of certain methods [39,70,71,72,73,74]. Specifically, both Pyromark [75] and Sequenom [76] report a 5% variability of assays, while high-throughput methods such as Illumina's HM450K platform advertise reliable detection of Δβ = 0.2 with a false-positive rate of <1% [77]. Limitations of reliable detectable differences make the study design, replication, and validation of results crucial in epigenetic studies. Including replicates within a study helps determine the variability of an assay as well as confirm results. Obtained results may then be validated with an additional method and/or replicated with the same method in an independent population. These approaches serve to increase confidence for small detected DNAm differences, while also providing vital data for establishing the variability within a particular dataset.
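One way to use technical replicates as described above is to pool the within-sample variance across replicated samples and compare an observed group difference against that estimate of assay noise. The sketch below assumes DNAm measured in percent; the function names, the example values, and the multiplier of 2 used as a screening threshold are illustrative assumptions, not a published standard.

```python
import math


def pooled_replicate_sd(replicates):
    """Pooled within-sample standard deviation across technical replicates.

    replicates: mapping from sample ID to a list of repeated DNAm
                measurements (in percent) of that same sample.
    """
    sum_sq = 0.0
    degrees_of_freedom = 0
    for values in replicates.values():
        if len(values) < 2:
            continue  # a single measurement carries no variance information
        mean = sum(values) / len(values)
        sum_sq += sum((v - mean) ** 2 for v in values)
        degrees_of_freedom += len(values) - 1
    if degrees_of_freedom == 0:
        raise ValueError("no sample has two or more replicates")
    return math.sqrt(sum_sq / degrees_of_freedom)


def exceeds_assay_noise(difference, assay_sd, multiplier=2.0):
    """Screen: is |difference| larger than multiplier * assay_sd?

    The multiplier is an arbitrary choice made for this sketch.
    """
    return abs(difference) > multiplier * assay_sd


# Hypothetical percent-methylation replicates for two samples.
REPLICATES = {"sample_1": [50.0, 52.0, 48.0], "sample_2": [30.0, 31.0, 29.0]}

assay_sd = pooled_replicate_sd(REPLICATES)
print(f"pooled replicate SD: {assay_sd:.2f} percentage points")
print("4% difference exceeds noise:", exceeds_assay_noise(4.0, assay_sd))
```

A difference that passes such a screen still requires validation with a second method or replication in an independent population, as noted above.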
Sample size and multiple hypothesis testing are serious challenges in epigenetic epidemiology studies. A broad range of sample sizes is common in such studies, even among those mentioned in this review. Smaller samples, however, may be underpowered to detect small effect sizes. In addition, the genome-scale coverage and relatively low cost of array-based methods have made them the method of choice in many population-based epigenetic studies. However, the risk of type I errors increases in such studies given the hundreds of thousands of loci (i.e. hypotheses) being tested. Increasing the sample size can provide more power to detect smaller effect sizes, and adjusting for the family-wise error rate or applying a false discovery rate can serve to screen out false-positives in array-based methods. A further approach to alleviate concerns regarding multiple hypothesis testing may be the development of epigenetic risk scores (ERS), modeled after genetic risk scores utilized by GWAS of psychiatric disorders over the last 5 years [78,79,80,81]. The creation and implementation of ERS for epigenetic studies would enable researchers to test the effect of multiple loci implicated in a disease of interest, but as a single, one-tailed hypothesis within their dataset, thus reducing the multiple hypothesis test burden. The successful use of genetic risk scores in GWAS in psychiatric studies [82,83,84,85] provides a plausible model for the development of ERS in psychiatric epigenetic epidemiology studies.
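The two correction strategies mentioned above, and the idea of collapsing many loci into a single score, can be sketched briefly. The Bonferroni and Benjamini-Hochberg adjustments below follow their standard definitions. The risk score is a hypothetical construction for illustration only: it assumes a weighted sum of DNAm values with per-locus weights taken from an independent discovery sample, by analogy with genetic risk scores, and the locus names, weights, and p values are invented.

```python
def bonferroni(p_values):
    """Family-wise error control: multiply each p value by the number of tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values (false discovery rate control)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def epigenetic_risk_score(dnam, weights):
    """Weighted sum of DNAm values over the loci named in `weights`.

    dnam:    mapping from locus ID to that individual's DNAm proportion.
    weights: mapping from locus ID to an effect-size weight (hypothetical,
             assumed to come from an independent discovery sample).
    """
    missing = [locus for locus in weights if locus not in dnam]
    if missing:
        raise KeyError(f"DNAm values missing for loci: {missing}")
    return sum(weights[locus] * dnam[locus] for locus in weights)


# Invented p values for four loci.
P_VALUES = [0.01, 0.04, 0.03, 0.005]
print("Bonferroni:", bonferroni(P_VALUES))
print("Benjamini-Hochberg:", benjamini_hochberg(P_VALUES))

# Invented loci and weights; one individual's score is a single number
# that can be tested as one hypothesis instead of one test per locus.
WEIGHTS = {"locus_a": 0.5, "locus_b": -1.0}
INDIVIDUAL = {"locus_a": 0.8, "locus_b": 0.3, "locus_c": 0.6}
print("ERS:", epigenetic_risk_score(INDIVIDUAL, WEIGHTS))
```

Scoring every participant this way and then comparing cases with controls yields one test per score rather than one per locus, which is the reduction in test burden described above.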
Life Course Perspective
Early-life experiences, specifically adverse events, increase the prevalence of mental disorders later in life [86,87,88], suggesting a biologic pathway between early-life adversity (ELA) and psychopathology. However, the epigenetic factors that may contribute to the link between ELA and subsequent mental disorders remain poorly understood, despite evidence of their likely importance in BPD [45], depression, and PTSD [25]. Animal studies have shown direct evidence for the influence of ELA on epigenetic factors that shape the risk for mental illness over the life course, in particular within stress-relevant genes such as the glucocorticoid receptor NR3C1, a key effector of hypothalamic-pituitary-adrenal axis function [18,89,90]. Subsequent studies in humans have found similar associations between ELA and NR3C1 DNAm [73,91,92,93,94,95]. Recent ELA-related studies have expanded to include the investigation of another hypothalamic-pituitary-adrenal axis gene, FKBP5, a negative regulator of GR activity [74,96,97]. Results show that childhood maltreatment is associated with reduced DNAm in glucocorticoid response elements within this locus [98], and that this association is moderated by genotype. These results help elucidate how ELA increases the risk of later psychopathology. As many of the reported epigenetic-ELA associations have been based on samples obtained during adulthood, future work would benefit from obtaining samples more proximate to the actual stressor(s), provided follow-up biological and psychopathological data into adulthood are also collected.
Conclusions
This brief review has provided an overview of epigenetic epidemiology studies pertaining to psychopathology, summarizing current literature and discussing challenges associated with these studies. Most population-based studies will continue to utilize DNAm as their epigenetic measure of interest, as it is a relatively stable epigenetic mark for which there exist affordable methods for obtaining genome-scale data. An ongoing challenge in the field is the selection of an appropriate sample size with respect to DNAm and other epigenetic measures. The power of these studies will depend on the spectrum of DNAm in cases and controls [69], which is currently unknown for both mental and other complex disorders. Future studies will aid in establishing this baseline and likely incorporate multiple measures of genomic variation to better understand the functional relevance of epigenetic variation [68]. Standards for reporting results, controlling for multiple hypotheses, and replication in independent cohorts need to be established for EWAS, as they have for GWAS of psychiatric disorders. Finally, existing data should be periodically reviewed as more information is learned about the extent of the intra- and interindividual epigenetic variation over time, across tissues, and in disease/non-disease states to more accurately identify epigenetic variation that is truly causative of disease.