Abstract
It is emerging that the environment, particularly in early life, can cause long-lasting epigenetic changes that are accompanied by latent effects on health outcomes. Furthermore, the reversibility of some epigenetic marks in animal models indicates that, once recognised in early life, such epigenetic changes may be reversible, providing potential avenues for disease prognosis, prevention and treatment. Twin studies, which were originally used to calculate gross components of genetic and environmental influence on phenotype, are now being used to associate specific genetic and epigenetic variants with specific human conditions and diseases. Epigenome-wide association studies of phenotypically discordant, genetically ‘identical' monozygotic twins have the power to focus solely on epigenetic association with disease and are providing information about gene-environment interaction and variable penetrance. Going beyond association, such studies can help generate epigenetic biomarkers to predict disease long before clinical onset, which has important implications for disease prevention. Furthermore, when epigenetic information on a genome scale is combined with other ‘omics', twin studies will be a strong force for the improvement of human health.
Epigenetics and Human Disease
Epigenetics is defined as both ‘the interactions of genes with their environment, which bring the phenotype into being' [1] and ‘the structural adaptation of chromosomal regions so as to register, signal or perpetuate altered activity states' [2]. The former is simplistic but encapsulates the essence of epigenetics, and the latter encompasses changes that happen within and between cell cycles and incorporates the various mechanisms involved, including factors affecting chromatin accessibility, gene regulation and three-dimensional nuclear organisation. Pathways to epigenetic change start with the binding of sequence-specific proteins or non-coding RNA to DNA, and/or chromatin remodelling events that loosen the interactions between DNA, RNA and proteins [3]. Epigenetic marks such as DNA methylation and covalent histone modifications can be perpetuated through cell division to provide an epigenetic ‘memory' of an initial event. Paradoxically, such perpetuation may be interrupted and even reversed by stochastic or environmental events [4,5]. Such events occur during human development at a frequency generally correlated to the rates of cell division and growth. These epigenetic changes may be neutral, they may contribute directly to human disease or they may represent latent effects that, through further environmental interactions, predispose to later disease. These ideas are central to the Developmental Origins of Health and Disease (DOHaD) hypothesis, for which there is growing evidence [6,7,8].
Epigenome-Wide Association Studies
Central to revealing associations between environment and disease are epigenome-wide association studies (EWAS). Like their genetic counterpart (genome-wide association studies), EWAS test the null hypothesis that each of thousands to millions of genomic locations is not associated with a disease, phenotype or environment, and the threshold for statistical significance is often adjusted to account for multiple testing [9]. DNA methylation at the CpG dinucleotide has been the most common epigenetic mark studied in EWAS, mainly because of its robustness to the method of storage. Through treatment with sodium bisulphite, which deaminates only unmethylated cytosines to uracil, methylation state can be analysed using a number of different downstream technologies [10]. Whole peripheral blood and buccal epithelium have been the most studied tissues, mainly due to the ease of collection. However, other tissues assayed include sorted subpopulations of blood cells [11,12], adipose tissue [13,14], muscle biopsies [15] and placenta [12].
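The bisulphite read-out described above can be illustrated with a minimal Python sketch. It assumes a toy sequence, a hypothetical set of methylated positions and invented read counts; the function names are illustrative and are not taken from any cited pipeline. After treatment and amplification, unmethylated cytosines are read as thymine while methylated cytosines are still read as cytosine, so the methylation level at a site can be estimated as the fraction of reads that retain a C.

```python
def bisulphite_convert(seq, methylated_positions):
    """Return the sequence read after bisulphite treatment and amplification.

    Unmethylated C is deaminated to U and read as T; methylated C is
    protected and still read as C. Positions are 0-based (an assumption
    of this sketch).
    """
    methylated = set(methylated_positions)
    converted = []
    for index, base in enumerate(seq.upper()):
        if base == "C" and index not in methylated:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def methylation_level(c_reads, t_reads):
    """Estimate methylation at one site as the fraction of reads retaining C."""
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return c_reads / total


# Toy example: only the cytosine at position 1 is methylated.
converted = bisulphite_convert("ACGTCCGA", methylated_positions=[1])
# Invented counts: 30 reads show C and 10 reads show T at the site.
level = methylation_level(c_reads=30, t_reads=10)
```

In this toy case the protected cytosine survives conversion, and the site is estimated to be 75% methylated. Real assays must also account for incomplete conversion and sequencing error, which this sketch ignores.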
EWAS have led to the discovery of disease-associated epigenetic changes in cancer (reviewed in [16,17]), in other complex diseases such as diabetes [18,19,20,21] and obesity [22], and in phenotypes such as post-traumatic stress disorder [23] and ageing [24,25]. EWAS have also been used to discover associations between the environment and the epigenome. A notable example is the association between smoking and methylation at the AHRR (aryl hydrocarbon receptor repressor) gene, reported both for smoking during pregnancy [26] and in adult smokers [27,28]. Other environmental EWAS have focused on the effects of in vitro fertilization on DNA methylation in placenta and cord blood [29]; plasma homocysteine and methylation in cord blood [30]; institutionalisation at an early age and methylation in whole blood [31]; early life parental stress [32] and, in adipose tissue, response to exercise in adults [13]. It is important to note that, apart from the association of smoking with AHRR methylation, very few EWAS associations have been replicated in multiple, independent studies. Cancers are the only diseases for which systematic reviews and meta-analyses have demonstrated consistent associations of methylation with disease [33,34,35,36].
Notes of Caution for EWAS
The issues associated with interpreting EWAS have been reviewed elsewhere [9,37,38]. The most important issue is whether a disease-associated epigenetic difference is the cause or consequence of disease. This can be overcome via longitudinal studies that assess the epigenetic state in individuals before and after the onset of disease [19], although this can be a costly approach. Alternatively, neonatal dried blood spots can be used as ‘before' samples [39,40,41,42]. However, caution is needed in interpreting findings from such studies because blood cell heterogeneity can change over time. Another way to address cause versus effect is by using Mendelian randomisation, in which the association of either an environment or an epigenetic state with a randomly segregating genetic polymorphism is used [43]. Epigenetic marks are also highly tissue-specific, so ideally, the tissue examined in EWAS will be associated with the disease under study, e.g. blood for immune-related disorders. However, for conditions such as obesity, in which adipocytes and pancreatic β islets would be the most appropriate tissues, or brain regions for psychiatric disorders, sampling the appropriate tissue is much less feasible unless post-mortem samples are used. Tissue heterogeneity may also be a problem if the level of cell heterogeneity differs between cases and controls, for example in immune disorders. Cell sorting of tissues such as blood to specific lineages can address this issue [24], as can adjusting for levels of heterogeneity in samples [44,45,46]. In addition, unlike DNA sequence, epigenetic profiles change over time, especially in early life, so it is important that age matching is performed wherever possible. Other notes of caution include where in the genome to look for epigenetic associations, which of the multitude of different epigenetic marks to study, the accuracy of the technology used, small effect sizes and finding the appropriate method of data analysis [9,47].
The Clinical Use of Epigenetic Biomarkers
A biomarker is any biological characteristic (molecular, physiological or biochemical) that can be measured objectively and act as a predictor or indicator of a normal or a diseased state [48,49]. The ultimate aim of studies of the relationship between specific environments and epigenetic state is to produce the first class of biomarker: biomarkers of such environments that can be used in the absence of environmental data. The second class of biomarker, and probably the most important for the study of human health, is the predictive biomarker. For this type of biomarker, forewarned is forearmed; if a disease is detected before the onset of overt symptoms, then attempts could be made to reverse the disease-associated epigenetic marks and/or to tailor early interventions that may lessen the severity of the disease. Similarly, epigenetic state can be used to aid in determining prognosis before disease onset, during disease or following treatment [16,48,49,50,51]. As mentioned above, some cancer biomarkers have been validated across many studies [34,35,36] and are destined for clinical trials. Outside cancer, notable examples that are yet to be fully validated include predictors of child adiposity [52], progression to overt cardiovascular disease in adults [53] and response to weight loss programs [54,55]. Finally, to capitalize on the reversibility of epigenetic marks, epigenetic therapies have been advocated and a number, based on manipulation of DNA methylation and histone modification, are undergoing clinical trials for cancer [56,57,58]. Notably, there may also be future opportunities for non-pharmaceutical interventions, for example diet [54], exercise [13] and even relaxation [59].
Twin Studies
Dizygotic (DZ) twins, who arise from two separate fertilisation events, share half their genetic polymorphisms on average, while monozygotic (MZ) twins, who result from the splitting of a single embryo, share all of theirs. MZ and DZ twins provide a natural study design for determining the contribution of nature (heredity) and nurture (environment) to the variation of complex disease. Classic twin studies enable partitioning of phenotypic variance within a population into additive genetic variation (also known as heritability), shared environmental and non-shared environmental variation. In twin pregnancies, shared environment can be viewed as factors such as nutrition that originate from the mother, and non-shared environment as factors specific to each foetus, umbilical cord or placenta [12,60]. MZ twins almost always have their own umbilical cord and amniotic sac, even though roughly two thirds of them share the same placenta. Twin studies have the unique advantage of providing insights into the importance of such non-shared risk factors; for example, birth weight discordance within MZ twins is more likely to reflect differences in placental function or blood and nutrient supply to the individual foetus rather than shared maternal factors [12].
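One simple way to perform this partitioning, shown here as a hedged sketch rather than as the method used in the cited studies, is Falconer's formula, which estimates the three components from the within-pair phenotypic correlations of MZ and DZ twins. The correlation values below are invented for illustration, and the function name is hypothetical. The estimates are not constrained to lie between 0 and 1, and the approach assumes that MZ and DZ pairs share their environments to the same degree.

```python
def falconer_components(r_mz, r_dz):
    """Partition phenotypic variance from twin correlations (Falconer's formula).

    r_mz, r_dz: within-pair phenotypic correlations for MZ and DZ twins.
    Returns additive genetic (A), shared environmental (C) and
    non-shared environmental (E) proportions of variance.
    """
    additive_genetic = 2 * (r_mz - r_dz)   # heritability, h^2
    shared_environment = 2 * r_dz - r_mz   # c^2
    non_shared_environment = 1 - r_mz      # e^2 (includes measurement error)
    return {
        "A": additive_genetic,
        "C": shared_environment,
        "E": non_shared_environment,
    }


# Invented correlations for illustration only.
components = falconer_components(r_mz=0.8, r_dz=0.5)
```

With these invented inputs the three proportions come to roughly 0.6, 0.2 and 0.2, and they sum to 1 by construction.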
Twins and Epigenetics
Twin studies have begun to move from identifying relative effects of genes and environment on phenotype to identifying specific genetic and environmental factors associated with human disease [61]. They have also highlighted some important phenomena related to disease causation and severity. These issues will be covered in the remainder of this review.
Twins, Disease Latency and Liability
Early studies of twins concordant for leukaemia but discordant for time of onset discovered that, in the majority of cases of paediatric leukaemia, the initiating chromosome translocations arise in utero and are succeeded by secondary genetic events that may occur at different time points, which can cause a discordance in the time of disease onset [62,63]. In addition, a pair of leukaemia-discordant twins has been described, in which the affected twin had a constitutive increase in methylation of BRCA1 (breast cancer 1, early onset) [64]. Similarly, in a pair of twins with the autosomal dominant neurofibromatosis type 1 (NF1), phenotypic severity was found to correlate with the level of methylation at the NF1 gene in lymphocytes [65]. Recently, a female twin pair was described in which both twins exhibited symptoms of Rett syndrome, an X-linked neurodevelopmental disorder, with only one twin meeting the full diagnostic criteria [66]. Although no evidence was found for genetic differences within the pair, a small number of differences in methylation and expression were found in fibroblasts in genes associated with brain function and skeletal development. The degree to which these phenomena are widespread is not yet known, although it has been proposed that environment-induced epigenetic change may cause the crossing of a disease-specific liability threshold, resulting in an apparently increased level of disease penetrance [61].
The Discordant MZ Twin Approach to Reveal Epigenetic Contribution to Disease
Studying epigenetic differences between disease-discordant MZ twins enables epigenetic factors to be studied while controlling for genetics, age, sex and shared environmental exposures [37,67,68,69]. Early studies in this area focused on candidate genes (reviewed by Bell and Saffery [68]). A notable example is the association between discordance for caudal duplication anomaly and methylation in the AXIN1 promoter within a pair of MZ twins [70]. Individuals with this disorder can have duplications of different organs in the caudal region (towards the base of the spine), and although no genetic differences in individuals with caudal duplication have been found within the human AXIN1 gene, mutations of Axin1 are associated with caudal duplication in mice. Of interest, both twins from the discordant pair had methylation levels higher than those of control twins, possibly indicating a high liability threshold reached only by the affected twin. In a second notable example, methylation at the PPARGC1A (peroxisome proliferator-activated receptor γ, coactivator 1 α) and HNF4A (hepatocyte nuclear factor 4, α) genes was associated with type 2 diabetes in muscle and adipose tissue, respectively [14].
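Because each affected twin is matched to a genetically identical co-twin, the natural unit of analysis in this design is the within-pair difference. The following Python sketch illustrates one possible paired analysis at a single locus, an exact sign-flip permutation test; it is not the method of any cited study, and the methylation values and function name are invented. Under the null hypothesis the disease label within each pair is arbitrary, so the sign of each within-pair difference can be flipped freely.

```python
from itertools import product


def paired_sign_flip_test(affected, unaffected):
    """Exact two-sided sign-flip test on within-pair methylation differences.

    affected, unaffected: methylation levels at one locus, one value per
    twin pair, listed in the same pair order.
    Returns (mean within-pair difference, exact p value).
    """
    if len(affected) != len(unaffected):
        raise ValueError("each affected twin needs a matched co-twin")
    diffs = [a - u for a, u in zip(affected, unaffected)]
    n = len(diffs)
    observed = abs(sum(diffs)) / n
    as_extreme = 0
    total = 0
    # Enumerate all 2^n ways of swapping labels within pairs.
    for signs in product((1, -1), repeat=n):
        stat = abs(sum(s * d for s, d in zip(signs, diffs))) / n
        if stat >= observed - 1e-12:  # tolerance for floating-point ties
            as_extreme += 1
        total += 1
    return sum(diffs) / n, as_extreme / total


# Invented methylation levels for six hypothetical discordant pairs.
affected = [0.62, 0.58, 0.71, 0.66, 0.60, 0.69]
unaffected = [0.55, 0.52, 0.63, 0.61, 0.54, 0.60]
mean_diff, p_value = paired_sign_flip_test(affected, unaffected)
```

Full enumeration is only practical for small numbers of pairs, since the number of sign assignments doubles with each pair; for larger cohorts a random sample of sign flips or a paired t-test would be the usual substitute.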
EWAS of Disease-Discordant MZ Twins
Candidate gene approaches for studying epigenetic differences in disease-discordant MZ twins can be informed by known genetic associations, animal models or known biology of disease. However, they frequently lead to negative results, making genome-scale approaches preferable [68]. Indeed, it has been suggested that all studies of the role of epigenetics in disease should start with those involving discordant MZ twins before case-control studies [9]. As in genome-wide association studies, most EWAS have involved adjusting p values for multiple testing and have generally found fewer than 50 differentially methylated loci (e.g. [18,23]), while in a minority of studies no associations have been found (e.g. [71]). Some studies have not adjusted for multiple testing and have typically identified hundreds of disease-associated loci (e.g. [19,20,31]).
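The gap between adjusted and unadjusted analyses can be illustrated with a short Python sketch. The Benjamini-Hochberg false discovery rate procedure is used here purely as an assumption, since the text does not state which correction each study applied, and the p values are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Invented p values for five hypothetical loci.
raw_p = [0.001, 0.008, 0.039, 0.041, 0.20]
adjusted_p = benjamini_hochberg(raw_p)

# Count loci called significant at 0.05 before and after adjustment.
hits_unadjusted = sum(p < 0.05 for p in raw_p)
hits_adjusted = sum(p < 0.05 for p in adjusted_p)
```

In this toy case four loci pass the nominal 0.05 threshold but only two survive adjustment. The effect is far larger when hundreds of thousands of sites are tested.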
Many disease-discordant twin EWAS have focused on autoimmune disorders such as systemic lupus erythematosus, rheumatoid arthritis, dermatomyositis and multiple sclerosis (all reviewed in Bell and Saffery [72]). In addition, Rakyan et al. [19] measured DNA methylation in monocytes from 15 MZ twin pairs discordant for type 1 diabetes and found 132 diabetes-associated loci. Although no adjustment for multiple testing was performed, the top differentially methylated genes were confirmed in an independent cohort and in presymptomatic individuals with diabetes-associated autoantibodies.
Neurodevelopmental and neurocognitive conditions have also featured strongly in disease-discordant twin EWAS. Two studies have focused on bipolar disorder, which is characterized by recurring manic and depressive episodes. Kuratomi et al. [73] studied DNA methylation in lymphoblastoid cell lines from a single discordant twin pair using a low resolution technique and replicated their top differentially methylated gene, PPIEL (peptidylprolyl isomerase E-like gene) in a case-control study of singletons; they also found an inverse correlation between expression and methylation. A larger study looked at methylation in DNA from peripheral blood from 11 pairs discordant for bipolar disorder, together with 11 pairs discordant for schizophrenia [45]. The ST6GALNAC1 (α-N-acetylgalactosaminide α-2,6-sialyltransferase) gene was found to be significantly differentially methylated in bipolar disorder and in both psychotic conditions combined. Results for this gene were verified using an independent technique and some evidence was presented for limited validation in post-mortem brains from an independent case-control psychosis cohort.
Autism spectrum disorders (ASD) are disorders of neural development that are characterized by impairments in social interaction, verbal and non-verbal communication, and by restricted, repetitive or stereotyped behaviour. Following studies showing RNA and microRNA expression differences in lymphoblastoid cell lines from 3 MZ pairs discordant for ASD symptoms, which identified known ASD-associated genes [74,75], Wong et al. [76] performed a discordant twin EWAS with a difference. In addition to analysing genome-scale methylation in whole blood from 10 ASD-discordant MZ pairs, they also studied MZ pairs discordant for the ASD-related phenotypes of social autistic traits (n = 9), autistic restricted repetitive behaviours and interests (n = 8) and communication autistic traits (n = 8), together with a case-control analysis in 100 individuals. Of interest, genes such as NFYC (nuclear transcription factor Y, γ) and MBD4 (methyl-CpG binding domain protein 4) were identified by multiple analyses. No parallel analysis was performed on post-mortem samples. However, as for the EWAS for psychosis [45], they found epigenetic heterogeneity between families, which correlates with previous findings of genetic heterogeneity in ASD [77].
Epigenetic Biomarkers from Discordant MZ Twin EWAS
As mentioned above, it is important that EWAS move beyond association studies to reveal epigenetic biomarkers that are predictive of disease in asymptomatic individuals. This has been done for type 1 diabetes [19] and recently for breast cancer [78]. Using whole blood from 15 MZ twin pairs discordant for breast cancer, a high resolution EWAS identified 403 differentially methylated CpG sites, many of which were associated with known breast cancer genes. The top differentially methylated gene was DOK7 (docking protein 7), which codes for a protein important for neuromuscular synaptogenesis, and results for this gene were validated in an independent cohort of discordant MZ twins. DOK7 was also differentially methylated in breast cancer tissues and cell lines and, most importantly, was differentially methylated in blood, collected on average 5 years prior to diagnosis. If validated in other cohorts, methylation of DOK7 will join a small number of validated cancer biomarkers that are currently undergoing clinical trials [34,35,36].
It is well known that low birth weight is associated with elevated risk for cardiovascular and metabolic disease [79,80]. In an EWAS of blood mononuclear cells, umbilical cord endothelial cells and placentas from 22 MZ and 12 DZ new-born twin pairs, regression analysis of methylation on birth weight revealed a general association between birth weight and methylation of genes involved in metabolism and biosynthesis [12]. Taken together with a candidate gene study linking methylation of RXRA (retinoid X receptor-α) to adiposity in late childhood in two independent cohorts [52], these findings suggest that biomarkers for childhood cardiovascular disease risk may one day be used to assess risk at or soon after birth.
Epigenetics and Development
MZ and DZ twins can be discordant for DNA methylation at birth, at the level of a single genetic locus [81] and throughout the whole genome [12]. This implies that a combination of stochastic and intrauterine environmental factors can influence the epigenome during pregnancy [4,82]. As twins age, evidence from cross-sectional [83] and longitudinal studies [84] has indicated that older twins could be more epigenetically discordant than younger twins. However, two longitudinal studies of DNA methylation in the buccal epithelium from twins have indicated that the story is not that simple, at least in childhood, with some pairs even becoming more similar with age, possibly influenced by environment [85,86]. Twins could converge epigenetically when transiting from discordant prenatal environments to a concordant postnatal environment [86] and twins could diverge if they experience discordant environments as they age [83]. Further, larger studies are needed to clarify these issues.
Heritability of DNA Methylation
Knowledge of epigenetic variance components will aid in discovering regions of the genome associated with specific shared and non-shared environments and possibly eliminating regions of high heritability from such studies. Recent genome-scale studies linking variation in DNA sequence with variation in gene expression [87], DNA methylation [88] and nuclease accessibility [89] have shown that 10-20% of epigenetic marks are under tight genetic control. In twin studies, epigenetic state at any particular locus can be viewed as a phenotype, and therefore the twin model can be used to partition its variance. Genome-wide analysis of DNA methylation in twins has found evidence for a lower and tissue-dependent level of heritability of DNA methylation in neonates [12], teenagers [90] and adults [91,92]. One such study found that at birth, the average heritability of DNA methylation was 5% in placenta, 7% in umbilical vein endothelial cells and 12% in mononuclear cells, with non-shared environment being the largest component of variation of DNA methylation [12]. However, studies are needed to show how variance components may vary between tissues and change with age.
Conclusions and Future Directions
Epigenome-wide studies of discordant MZ twins are essential to understand the effects of environmental and stochastic factors in human disease. Longitudinal studies will also enable the discovery of epigenetic biomarkers that can be used for prediction, diagnosis and prognosis of disease and will have the opportunity to improve the management and lessen the impact of disease. Such biomarkers, when validated across cohorts, will also have an important economic impact, with emphasis changing from treatment to detection and prevention. To reach this goal, other epigenetic marks need to be analysed in the same way as DNA methylation and integrated with other ‘omics' such as metabolomics [93,94], proteomics [95] and metagenomics [96].
Acknowledgements
J.M.C. is supported by the Australian National Health and Medical Research Council, the Murdoch Childrens Research Institute and by the Victorian Government's Operational Infrastructure Support Program. J.M.C. thanks Sara Hassan and Jane Loke for critical reading of the manuscript.