Abstract
Introduction: Sex steroid hormone fluctuations may underlie both reproductive disorders and sex differences in lifetime depression prevalence. Previous studies report high comorbidity among reproductive disorders and between reproductive disorders and depression. This study sought to assess the multivariate genetic architecture of reproductive disorders and their loading onto a common genetic factor, and to investigate whether this latent factor shares a common genetic architecture with female depression, including perinatal depression (PND). Method: Using UK Biobank and FinnGen data, genome-wide association meta-analyses were conducted for nine reproductive disorders, and genetic correlations between disorders were estimated. Genomic Structural Equation Modelling was used to identify a latent genetic factor underlying the disorders that accounts for their significant genetic correlations. SNPs significantly associated with both the latent factor and depression were identified. Results: A model with a single latent factor underlying five reproductive disorders showed excellent fit (χ2 (5) = 6.4; AIC = 26.4; CFI = 1.00; SRMR = 0.03), with high standardised loadings for menorrhagia (0.96, SE = 0.05); ovarian cysts (0.94, SE = 0.05); endometriosis (0.83, SE = 0.05); menopausal symptoms (0.77, SE = 0.10); and uterine fibroids (0.65, SE = 0.05). This latent factor was genetically correlated with PND (rG = 0.37, SE = 0.15, p = 1.4e−03), depression in females only (rG = 0.48, SE = 0.06, p = 7.2e−11), and depression in both males and females (MD) (rG = 0.35, SE = 0.03, p = 1.8e−30). The top locus for the latent factor was associated with FSHB/ARL14EP (rs11031006; p = 9.1e−33). SNPs intronic to ESR1, significantly associated with the latent factor, were also associated with PND, female depression, and MD. Conclusion: A common genetic factor, correlated with depression, underlies risk of reproductive disorders, with implications for aetiology and treatment.
Genetic variation in ESR1 is associated with reproductive disorders and depression, highlighting the importance of oestrogen signalling for both reproductive and mental health.
Introduction
Female reproductive disorders such as endometriosis and polycystic ovary syndrome (PCOS) are common debilitating conditions. Eugonadal reproductive disorders have a complex aetiology, which may include an abnormal response to hormones. In recent years, genome-wide association studies have been successful in identifying common genetic variants associated with risk for endometriosis and uterine leiomyomata, also known as uterine fibroids (UF) [1, 2], as well as PCOS and ovarian ageing [3]. In addition to providing insights into key biological pathways underlying each of these disorders, these studies have also demonstrated that there is shared aetiology between reproductive disorders. This is supported by epidemiological data, such as symptoms shared across the disorders (e.g., menstrual pain, heavy menstrual bleeding, gynaecological cysts) that can confuse diagnosis [4, 5]. Further understanding of the shared aetiology of reproductive disorders will have implications for diagnosis and treatment and can be leveraged to increase power of gene discovery.
Genetic studies have also found evidence of a shared genetic aetiology between reproductive disorders and depression [6, 7]. These findings build on previous literature showing evidence of comorbidity between depression and reproductive disorders, described as reproductive mood disorder [8‒13]. Differences in reproductive hormone levels have been suggested as an explanation for the disparity in lifetime prevalence of depression between males and females visible after the onset of puberty [14‒16], and depression is more common at certain points in a woman’s reproductive lifecycle, also encapsulated using the term “reproductive depression” [17] or, more recently, “reproductive-related depressive episodes” [18]. In particular, the perinatal period, a time of high hormonal fluctuation, is associated with increased risk of depression.
Genomic Structural Equation Modelling (GSEM) [19] is a recently developed multivariate method that evaluates the joint genetic architecture of complex traits and disorders from genome-wide association study (GWAS) summary statistics. This method has been used to identify latent genetic factor structures underlying risk to multiple disorders and traits, e.g., psychiatric disorders and externalising traits [20], substance abuse [21], and reading and language-related skills [22]. Understanding the shared genetic risk across traits or disorders can boost power of gene discovery and provide important mechanistic insights relevant for treatment of related disorders.
In this study, we further explore the shared aetiology among reproductive disorders and between reproductive disorders and depression by firstly evaluating the lifetime comorbidity among nine reproductive disorders and between reproductive disorders and depression using data from the UK Biobank (UKB). We then conduct GWAS analyses of the reproductive disorders in UKB and meta-analyse with results from FinnGen, two large publicly available genetically informative population-based studies, and compare the genetic correlations between individual reproductive disorders and lifetime depression and perinatal depression (PND). Lastly, we apply GSEM to investigate the shared genetic architecture among reproductive disorders, identify genetic variants associated with an underlying latent factor, and investigate the genetic correlation between this latent factor and lifetime depression and PND.
Methods
UKB Sample
The UKB is a community-based cohort established to investigate risk factors for the major diseases of middle and old age [23]. Upon recruitment, detailed information about sociodemographics, lifestyle, current health status, and diagnosed diseases was collected using a standardised questionnaire in a nurse-led interview, and blood samples were taken for genetic sampling. The UKB data are also linked to the Hospital Episode Statistics database, which provides ICD-10 [24] or ICD-9 [25] codes for all hospital admissions to NHS hospitals in the UK between 1997 and 2015. Data from a follow-up Mental Health Questionnaire (MHQ) [26], completed by 157,366 people, were made available in 2017.
UKB has approval from the North West Multi-Centre Research Ethics Committee (MREC) as a Research Tissue Bank (RTB). This approval means that researchers do not require separate ethical clearance and can operate under the RTB approval. It was initially granted in 2011 and is renewed every 5 years; UKB successfully applied to renew it in 2016 and 2021.
Reproductive Disorders: Cases and Controls
Cases with reproductive disorders were identified in UKB using summary ICD-9 and ICD-10 hospital inpatient admission diagnoses or by self-report during the nurse-led interview. Reproductive disorders were included in the analysis according to the following criteria: (1) sufficient sample size (at least 1,000 cases) and (2) clear diagnosis (i.e., disorders described using descriptors such as “unspecified” or “other” were omitted unless the diagnosis was clear). An exception to the 1,000-case criterion was made in the case of PCOS, and a further exception to clarity of diagnosis was made in the case of dysmenorrhoea, since women who had “pain and other conditions associated with female genital organs and menstrual cycle” were not differentiated. The nine reproductive disorders considered, with ICD-10 codes and comments, are PCOS; ovarian cysts; endometriosis; dysmenorrhoea; pelvic inflammatory disease; menopausal symptoms; menorrhagia; uterine fibroids; and polyps (online suppl. Tables S1, S2; for all online suppl. material, see https://doi.org/10.1159/000533413). Both ICD-10 and ICD-9 codes were used to specify the disorder. Criteria for cases were self-report of the disorder in the nurse-led interview or a record of hospital diagnosis of the disorder. No age range was applied, but only women of European ancestry were used in this analysis; ancestry was based on genetic similarity to predefined sets of individuals from the 1000 Genomes Project, using an analysis of forty principal components. To estimate the sample prevalence, case numbers were compared to the full set of females of European ancestry for whom data were available (n = 247,524), with the exception of menopausal symptoms, where case numbers were compared to the full set of UKB European female participants who had experienced menopause.
For each reproductive disorder apart from menopausal symptoms, the same set of controls was used for genetic analysis: European women who did not self-report or receive a hospital diagnosis for any gynaecological or reproductive disease/disorder (n = 93,712). For menopausal symptoms, the additional criterion of having experienced menopause at the baseline interview was applied to controls (n = 55,151).
Depression Cases and Controls
Only participants who had completed the Mental Health Questionnaire (MHQ) [26], which included the Composite International Diagnostic Interview – Short Form (CIDI-SF) [27], measuring lifetime depression, were eligible for inclusion. Female depression cases (n = 15,843) met the DSM-5 criteria for lifetime depression. Of women who had experienced at least one live birth, PND cases (n = 2,085) were female depression cases who stated that their depression was “probably related to childbirth” (data field 20445), while non-perinatal depression (NPD) cases (n = 9,094) were women who met criteria for lifetime depression but answered “no” to this question. No information was available with respect to the timing of depression within the perinatal period (before or after delivery). Depression controls (n = 25,016) were women who did not meet MHQ screening criteria for lifetime depression, did not have current depression, had no history of depression or bipolar disorder according to the derived data field 20126, had not seen a medical professional about their mental health, did not meet criteria for post-traumatic stress disorder using the Post-Traumatic Stress Disorder Check List – Civilian Short Version (PCL-S) [28] (data fields 20494–20498), and answered “no” to the question “Have you ever been prescribed medication for unusual or psychotic experiences?” (data field 20466). PND and NPD controls met the additional criterion of having at least one live birth (n = 19,931). The term “female depression” is used to encompass all women meeting criteria for major depression, including parous PND and NPD cases, as well as nulliparous cases.
Lifetime Comorbidity
Lifetime comorbidity of reproductive disorders was evaluated using univariate logistic regression to find the association of lifetime diagnosis of each reproductive disorder with every other reproductive disorder. Comorbidity for menopausal symptoms was assessed using only women who had experienced menopause. Comorbidity between each reproductive disorder and lifetime depression and PND was also assessed using univariate logistic regression, including age as a covariate for depression, and age and number of births as covariates for PND.
Genetic Analysis of Individual Reproductive Disorders
GWAS of Each Reproductive Disorder and Depression Groups
A GWAS was conducted in UKB for each of the nine reproductive disorders and for lifetime depression, PND, and NPD using fastGWA-GLMM-binary [29] (https://yanglab.westlake.edu.cn/software/gcta/index.html#fastGWA-GLMM), an efficient tool for genetic analysis of binary phenotypes that uses the saddlepoint approximation to adjust for inflation in test statistics due to case/control imbalance. Batch, assessment centre, and 40 genetic principal components were included as covariates, and cases and controls were restricted to white European ancestry.
SNP-Based Heritability
Individual-level genetic data were used with the Haseman-Elston method to estimate the SNP-based heritability (h2SNP) [30] (https://yanglab.westlake.edu.cn/software/gcta/index.html#Haseman-Elstonregression) of each reproductive disorder on the observed scale. SNP-based heritability on the liability scale was estimated using equation 23 in Lee et al. [31], which requires a user-provided estimate of population lifetime prevalence (online suppl. Table S2).
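The conversion from the observed to the liability scale can be illustrated with a minimal sketch. It assumes the widely used ascertainment-corrected form of the transformation, in which the observed-scale estimate is multiplied by K²(1 − K)² / (z²P(1 − P)), where K is the population lifetime prevalence, P is the proportion of cases in the sample, and z is the height of the standard normal density at the liability threshold. The function name and the input values are hypothetical and are not taken from the study.

```python
from statistics import NormalDist


def liability_scale_h2(h2_observed, population_prevalence, sample_prevalence):
    """Convert observed-scale SNP heritability to the liability scale.

    Assumed form: h2_l = h2_o * K^2 (1 - K)^2 / (z^2 * P * (1 - P)).
    """
    k = population_prevalence  # K: assumed population lifetime prevalence
    p = sample_prevalence      # P: proportion of cases in the analysed sample
    standard_normal = NormalDist()
    # Liability threshold t such that a fraction K of the population lies above it.
    threshold = standard_normal.inv_cdf(1.0 - k)
    # z: height of the standard normal density at the threshold.
    z = standard_normal.pdf(threshold)
    return h2_observed * (k * (1.0 - k)) ** 2 / (z ** 2 * p * (1.0 - p))


if __name__ == "__main__":
    # Illustrative inputs only; these are not estimates from this study.
    print(liability_scale_h2(h2_observed=0.05, population_prevalence=0.10, sample_prevalence=0.08))
```

Because K is user-provided, the same observed-scale estimate can yield quite different liability-scale values depending on the prevalence assumed.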
Reproductive Disorder Meta-Analysis: UKB and FinnGen Data Sets
The FinnGen project (https://www.finngen.fi/fi/tutkijalle) is an academic-industrial scientific research project that aims to analyse the health register and genotyping data of up to 500,000 Finnish biobank donors. Details of the project, including its ethical protocols, approved by the Coordinating Ethics Committee of the Hospital District of Helsinki and Uusimaa (HUS), have been described elsewhere [32]. Individuals were genotyped using Illumina and Affymetrix chip arrays, which were then imputed using the Finnish population-specific SISU reference panel built from six research cohorts, altogether comprising 3,775 whole genome sequences (https://finngen.gitbook.io/documentation/v/r7/methods/genotype-imputation/sisu-reference-panel). Results from GWAS of PCOS, ovarian cysts, endometriosis, UF, menorrhagia, menopausal symptoms, pelvic inflammatory disease, and genital polyps were available in the DF6 release (January 2022) [32]. In addition, summary statistics for “Pain and other conditions associated with female genital organs and menstrual cycle” were used to correspond with dysmenorrhoea. Summary statistics for each FinnGen GWAS were meta-analysed with the corresponding UKB summary statistics using Metal [33] with fixed effects, standard error-based model (https://genome.sph.umich.edu/wiki/METAL_Documentation). Quality control filtered out SNPs with multiple alleles and those not included in both cohorts. For each disorder, summary statistics for the meta-analysis of UKB and FinnGen GWAS were uploaded to FUMA [34] (Functional Mapping and Annotation of Genome-Wide Association Studies, https://fuma.ctglab.nl/, RRID:SCR_017521) for downstream identification of independent genomic loci and lead SNPs with genome-wide significance (p < 5.0e−08) and LD r2 < 0.1 over a range of 1,000 kb. 
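As an illustration of a fixed-effects, standard error-based meta-analysis for a single SNP, the sketch below applies inverse-variance weighting to per-cohort effect estimates. It is a simplified stand-in rather than the METAL implementation: it assumes that alleles are already aligned between cohorts, and the function name and example values are hypothetical.

```python
import math
from statistics import NormalDist


def fixed_effects_meta(betas, standard_errors):
    """Inverse-variance weighted fixed-effects meta-analysis for one SNP.

    betas and standard_errors hold one entry per cohort, with effects
    expressed for the same allele (an assumption of this sketch).
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided p-value; underflows to 0.0 for very large |z|.
    p_value = 2.0 * NormalDist().cdf(-abs(z))
    return beta, se, z, p_value


if __name__ == "__main__":
    # Hypothetical log-odds effects for one SNP in two cohorts.
    print(fixed_effects_meta(betas=[0.08, 0.05], standard_errors=[0.02, 0.03]))
```

Under this weighting, the cohort with the smaller standard error contributes more to the pooled estimate, and the pooled standard error is smaller than that of either cohort alone.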
A gene-based test of association was conducted using MAGMA [35] (MAGMA, http://snp-magma.sourceforge.net, RRID:SCR_005757) (version 1.08), with SNPs within a 10-kb window upstream or downstream assigned to a gene. Statistical significance was set at 0.05/number of genes tested (0.05/19,120 = 2.6e−06). MAGMA gene-set analysis was further used to investigate if there is enrichment of association signal in genes differentially expressed in tissues using GTExV8 RNA-seq data [36].
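The two numerical settings in this paragraph, the Bonferroni-corrected threshold and the 10-kb assignment window, can be sketched as follows. The helper names and the gene coordinates are hypothetical, and treating the window boundaries as inclusive is an assumption; MAGMA's own annotation step is not reproduced here.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def snp_in_gene_window(snp_bp, gene_start, gene_end, window_bp=10_000):
    """True if a SNP lies within the gene or within window_bp of either end.

    Positions are base-pair coordinates on the same chromosome;
    boundaries are treated as inclusive (an assumption of this sketch).
    """
    return gene_start - window_bp <= snp_bp <= gene_end + window_bp


if __name__ == "__main__":
    # 0.05 / 19,120 genes tested, as described in the text.
    print(bonferroni_threshold(0.05, 19_120))
    # Hypothetical gene spanning 100,000-120,000 bp.
    print(snp_in_gene_window(90_500, 100_000, 120_000))
```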
Genetic Correlation
Using summary statistics for each reproductive disorder meta-analysis, Linkage Disequilibrium Score Regression (LDSC) [37] was used to calculate the genetic correlation between each reproductive disorder and every other reproductive disorder, as well as between each reproductive disorder and each of four depression groups: PND, NPD, and all female depression cases (including PND, NPD, and nulliparous cases) using UKB data; and a recent MD meta-analysis [38] including both males and females. An unconstrained LDSC intercept corrected for sample overlap.
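For orientation, a genetic correlation is the genetic covariance between two traits scaled by the square root of the product of their SNP-based heritabilities. The sketch below shows only this final scaling step, assuming the covariance and heritability estimates have already been obtained; it does not perform LD score regression, and the function name and inputs are hypothetical.

```python
import math


def genetic_correlation(genetic_covariance, h2_trait_a, h2_trait_b):
    """Scale a genetic covariance to a genetic correlation (rG)."""
    if h2_trait_a <= 0 or h2_trait_b <= 0:
        raise ValueError("heritability estimates must be positive")
    return genetic_covariance / math.sqrt(h2_trait_a * h2_trait_b)


if __name__ == "__main__":
    # Illustrative values only; not estimates from this study.
    print(genetic_correlation(genetic_covariance=0.02, h2_trait_a=0.05, h2_trait_b=0.08))
```

Because the estimate is a ratio of noisy quantities, low heritability in either trait can produce unstable or out-of-range values, which is one reason standard errors are reported alongside each rG.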
Genetic Analysis of Underlying Latent Factor
Genomic Structural Equation Modelling
Given the high rates of comorbidity and genetic correlation between the reproductive disorders, structural equation modelling (SEM) analysis was applied to the UKB/FinnGen meta-analyses of all disorders, using GenomicSEM [19], to investigate whether a common genetic factor contributes to risk for all reproductive disorders. GenomicSEM uses a genetic covariance matrix, expanded to include individual SNP effects, to develop a structural equation model, which, applied to each SNP, defines the effect size of that SNP on latent genetic factors underlying all phenotypes. In addition, a measure of heterogeneity for each SNP, QSNP, indicates the extent to which the null hypothesis that the SNP acts entirely through the common factor is violated. Results were filtered to include only SNPs with non-genome-wide significant Q values (Q_Pval >5.0e−08) and restricted to MAF limits of 10% and 40%, as recommended to produce more stable estimates (https://github.com/GenomicSEM/GenomicSEM/wiki/5.-User-Specified-Models-with-SNP-Effects). FUMA [34] was used to identify loci with independent genome-wide significance (p < 5.0e−08; LD r2 < 0.1) across reproductive disorders, and to conduct a gene-based test of association using MAGMA. In addition, stratified LDSC was used with a wide range of tissue types from GTEx and the Franke lab to identify whether genes with the highest expression in specific tissues were located within regions of enriched heritability [39].
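The SNP filtering step described above can be sketched as a simple row filter. The dictionary keys ("SNP", "MAF", "Q_Pval") and the inclusive treatment of the MAF limits are assumptions made for illustration; they are not guaranteed to match the column names of GenomicSEM output files.

```python
def filter_latent_factor_snps(rows, q_p_threshold=5.0e-08, maf_min=0.10, maf_max=0.40):
    """Keep SNPs without genome-wide significant heterogeneity and within MAF limits.

    rows is an iterable of dicts with hypothetical keys "MAF" and "Q_Pval".
    """
    kept = []
    for row in rows:
        acts_through_factor = row["Q_Pval"] > q_p_threshold  # Q not genome-wide significant
        maf_in_range = maf_min <= row["MAF"] <= maf_max
        if acts_through_factor and maf_in_range:
            kept.append(row)
    return kept


if __name__ == "__main__":
    # Hypothetical records, not real results.
    example = [
        {"SNP": "rsA", "MAF": 0.20, "Q_Pval": 0.30},   # kept
        {"SNP": "rsB", "MAF": 0.05, "Q_Pval": 0.30},   # dropped: MAF below the lower limit
        {"SNP": "rsC", "MAF": 0.20, "Q_Pval": 1e-09},  # dropped: heterogeneous effect
    ]
    print([row["SNP"] for row in filter_latent_factor_snps(example)])
```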
Relationship of Latent Common Factor with Depression
To investigate whether SNPs significantly associated with depression are also significantly associated with the reproductive latent factor, summary statistics for the latent factor were filtered for the set of SNPs with nominal significance (p < 0.05) for each of three depression groups (PND, female depression, and general MD), and loci with genome-wide significance for the latent reproductive factor were identified for each set of SNPs. To investigate the aetiology of the relationship between reproductive disorders and depression, genes situated within these loci, associated with a specific depression group and the latent reproductive factor, were identified. Following the same procedure, a sensitivity analysis used a GWAS of male depression to identify SNPs with genome-wide significance for the latent reproductive factor and significantly associated with the UKB male depression group, selected according to the same criteria as the UKB female depression group (nCases = 8,047; nControls = 31,817).
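The selection logic can be illustrated with a short sketch that keeps SNPs reaching genome-wide significance for the latent factor among those nominally significant for a depression group. The mapping-based inputs and the function name are hypothetical, and the subsequent grouping of SNPs into independent loci is not shown.

```python
def shared_snps(latent_p, depression_p, genome_wide=5.0e-08, nominal=0.05):
    """Return SNP IDs significant for the latent factor and nominally for depression.

    latent_p and depression_p map SNP ID to p-value; SNPs absent from
    depression_p are treated as not nominally significant.
    """
    return sorted(
        snp
        for snp, p_latent in latent_p.items()
        if p_latent < genome_wide and depression_p.get(snp, 1.0) < nominal
    )


if __name__ == "__main__":
    # Hypothetical p-values, not real results.
    latent = {"rs1": 1e-09, "rs2": 1e-09, "rs3": 0.01}
    depression = {"rs1": 0.01, "rs2": 0.20, "rs3": 0.001}
    print(shared_snps(latent, depression))
```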
Statistical Methods
Table 1 provides a summary of all statistical methods utilised in this report. A more detailed description of each procedure has been provided within the individual sections. Figures were generated using ggplot2 [40], qqman [41], and gliffy [42] software.
| Section | Procedure | Method |
|---|---|---|
| Lifetime comorbidity | Association of reproductive disorders with each other and with female MD, PND, and NPD | Logistic regression using R version 4.0.5, including age as a covariate for female MD, and age and number of births as covariates for PND and NPD |
| Genetic analysis of reproductive disorders | Conducting GWAS for each reproductive disorder and depression group | FastGWA-binary [29] |
| | Assessing SNP-based heritability for each reproductive disorder | Haseman-Elston method [30] |
| | Meta-analyses of reproductive disorders using UKB and FinnGen summary statistics | Metal [33], using fixed effects, standard error-based model |
| | Downstream analyses of GWAS summary statistics: identification of significant loci and significantly associated genes/tissues | All summary statistics submitted to FUMA [34], which utilises MAGMA [35] for gene-based testing |
| | Genetic correlations between reproductive disorders, and between reproductive disorders and depression groups | LDSC [37] used with GWAS summary statistics for calculation of genetic correlations |
| Identifying underlying latent factor structure | Identifying latent factor structure | GenomicSEM [19] used to develop a model of an underlying genetic factor for 5 reproductive disorders |
| | Downstream analysis of latent factor summary statistics | Summary statistics submitted to FUMA [34]; stratified LDSC used to identify regions of enhanced heritability and tissues enriched in expression of associated genes |
| | Genetic correlations between latent factor and depression groups | LDSC [37] used with summary statistics for calculation of genetic correlations |
Results
Lifetime Comorbidity
Phenotype: Prevalence, Comorbidity, and Association with Female Depression/NPD/PND
After all quality control steps, the sample sizes for the nine reproductive disorders are listed in online supplementary Table S2. All sample sizes refer to cases with white European ancestry identified using UKB data and a lifetime diagnosis of the disorder, rather than current morbidity. Online supplementary Table S2 provides the lifetime prevalence of each disorder reported in the literature, as well as the sample prevalence of each disorder (number of cases/total number of female participants). Sample prevalence is generally much lower than literature estimates, which may reflect that the disorders are not being diagnosed in hospitals, as well as the generally above-average health of UKB participants [43, 44]. Lifetime comorbidities, including odds ratios and significance, are illustrated in Figure 1, with details provided in the online supplementary Table S3.
With the exception of the association of PCOS with menopausal symptoms, every reproductive disorder was significantly associated with every other reproductive disorder. The strongest lifetime comorbid association was for dysmenorrhoea and menorrhagia: of all women in UKB, those with lifetime experience of menorrhagia had 13.4 times the odds of having experienced dysmenorrhoea compared with those without menorrhagia (odds ratio [OR] = 13.4, CI = 12.7–14.2, p < 0.001). Other strong comorbid associations were menopausal symptoms with genital polyps (OR = 9.6, CI = 9.2–10.0, p < 0.001), endometriosis with ovarian cysts (OR = 7.7, CI = 7.3–8.2, p < 0.001), and pelvic inflammatory disease with both endometriosis (OR = 8.9, CI = 8.3–9.4, p < 0.001) and ovarian cysts (OR = 7.5, CI = 7.0–8.0, p < 0.001) (Fig. 1; online suppl. Table S3).
Every reproductive disorder was also significantly associated with lifetime risk of female depression, NPD, and PND, with the exception of PND with PCOS, for which there were substantially fewer cases than the other disorders (Fig. 2; online suppl. Table S4). There was no significant difference in the rate of comorbidity of PND compared to depression outside the perinatal period. Dysmenorrhoea had the highest OR for PND (OR = 2.6, CI = 1.9–3.4, p < 0.001).
Genetic Analysis of Individual Reproductive Disorders
SNP-Based Heritability
The SNP-based heritability of each reproductive disorder for observed data, provided in online supplementary Table S5, varied from 0.01 (SE = 0.003, p < 0.001) for PCOS to 0.10 (SE = 0.003, p < 0.001) for menorrhagia. Two values for SNP-based heritability on the liability scale were estimated, based on sample prevalence or prevalence reported in the literature (online suppl. Table S2). Based on sample prevalence, h2SNP (liability) ranged from 0.04 (SE = 0.006, p = 2.1e−12) for menopausal symptoms to 0.21 (SE = 0.05, p < 0.001) for PCOS; and using prevalence reported in the literature, h2SNP (liability) ranged from 0.05 (SE = 0.007, p = 6.9e−12) for pelvic inflammatory disease to 0.51 (SE = 0.05, p < 0.001) for PCOS. Online supplementary Figure S1 illustrates the changing value of h2SNP (liability) over the prevalence range from sample to literature estimate.
Meta-Analysis of Reproductive Disorders
The DF6 release of the FinnGen project included GWAS for the following reproductive disorders: PCOS, ovarian cysts, endometriosis, menopausal symptoms, menorrhagia, pelvic inflammatory disease, genital polyps, UF, and “pain associated with female genital organs and menstruation” which, as for UKB, was used as a representation of dysmenorrhoea. Table 2 shows the sample sizes and the number of SNPs included for each contributing GWAS and for the meta-analysis.
Disorder | UKB nSNPs | UKB cases | UKB controls | FinnGen nSNPs | FinnGen cases | FinnGen controls | Meta-analysis nSNPs
---|---|---|---|---|---|---|---
Ovarian cysts | 8,545,606 | 9,548 | 93,712 | 16,353,357 | 13,096 | 81,593 | 7,311,541 |
Endometriosis | 7,185,829 | 7,656 | 93,712 | 9,030,356 | 10,029 | 81,593 | 6,498,330 |
UF | 7,186,494 | 17,936 | 93,712 | 9,030,338 | 21,518 | 103,441 | 6,498,168 |
Menopausal symptoms | 8,546,268 | 12,948 | 70,970 | 16,352,545 | 1,491 | 81,593 | 7,312,291 |
Dysmenorrhoea | 8,545,631 | 5,863 | 93,714 | 16,352,700 | 4,062 | 81,593 | 6,831,522 |
Menorrhagia | 8,545,682 | 17,729 | 93,712 | 15,353,278 | 14,230 | 81,593 | 7,311,590 |
Pelvic inflammatory disease | 7,185,996 | 7,167 | 93,712 | 9,030,977 | 14,202 | 132,859 | 6,499,068 |
Genital polyps | 8,545,809 | 13,559 | 93,712 | 16,352,571 | 1,768 | 81,593 | 7,311,717 |
Polycystic ovarian syndrome | 8,545,748 | 563 | 93,712 | 16,354,669 | 797 | 140,558 | 6,832,454 |
The FinnGen summary statistics included rare variants with MAF <0.01, which were not included in the UKB summary statistics. In addition, because of Finland’s unique heritage, including a small founder population and little admixing, some risk variants are unique to Finns; these were not included in the meta-analysis. The Manhattan plot for each of the nine reproductive disorders is illustrated in online supplementary Figure S2. Of the disorders, endometriosis, UF, ovarian cysts, menorrhagia, polyps, menopausal symptoms, and PCOS had genome-wide significant loci. These disorders are listed in Table 3, which includes the number of loci found, the most significant locus with associated genes, and other disorders with genome-wide significance at the same locus.
| | Endometriosis | Fibroids | Menorrhagia | Ovarian cysts | Polyps | PCOS | Menopausal symptoms |
|---|---|---|---|---|---|---|---|
| Num. loci | 18 | 62 | 7 | 7 | 10 | 2 | 1 |
| Top locus: genomic region | 6q25.2 | 17p13.1 | 11p14.1 | 11p14.1 | 3q21.3 | 9q33.3 | 20p12.3 |
| Top SNP | rs58415480 | rs78378222 | rs7929660 | rs11031006 | rs2977562 | rs3945628 | rs16991615 |
| CHR | 6 | 17 | 11 | 11 | 3 | 9 | 20 |
| BP (hg19) | 152562271 | 7571752 | 30317914 | 30226528 | 128106267 | 126535553 | 5948227 |
| P | 6.78E−40 | 1.10E−102 | 7.51E−23 | 7.91E−17 | 1.45E−11 | 2.83E−11 | 1.13E−10 |
| Top locus: start | 151672185 | 7045529 | 29265662 | 30145730 | 127711997 | 126405933 | 5948227 |
| Top locus: finish | 153069721 | 8164352 | 31311001 | 30395871 | 128122820 | 126714111 | 5948227 |
| Associated genes | ESR1, SYNE1 | TP53 | FSHB, ARL14EP | FSHB, ARL14EP | EEFSEC | DENND1A | MCM8 |
| Gene function | Reproductive health and genome integrity | Genome integrity | Reproductive health, neuron connectivity | Reproductive health, neuron connectivity | Genome integrity | Reproductive health | Genome integrity |
| Other disorders with same locus | Fibroids | Menorrhagia, ovarian cysts | Fibroids, ovarian cysts | Fibroids, menorrhagia | Endometriosis, menorrhagia | | |
Large GWAS of endometriosis [45, 46], UF [1, 47‒50], and PCOS [51] have previously been conducted, and so the results for these meta-analyses are presented using supplementary tables and figures only, with details of their genome-wide significant loci and MAGMA-identified genes provided in online supplementary Tables S6–S10 and enriched tissues illustrated in online supplementary Figures S3, S4. Full details of the results for meta-analyses of ovarian cysts, menopausal symptoms, menorrhagia, pelvic inflammatory disease, genital polyps, and dysmenorrhoea are described in the supplementary note, with details of their genome-wide significant loci and MAGMA-identified genes provided in online supplementary Tables S11–S15 and enriched tissues illustrated in online supplementary Figures S5–S7.
Genetic Correlations
The genetic correlation of each reproductive disorder with every other reproductive disorder is illustrated in Figure 3, with details provided in online supplementary Table S16. With the exception of PCOS, all reproductive disorders are significantly genetically correlated with each other. The correlation of polyps with menopausal symptoms is particularly strong (rG = 0.97, SE = 0.17, p < 0.001), as well as dysmenorrhoea with pelvic inflammatory disease (rG = 0.95, SE = 0.12, p < 0.001), although this may be due to the non-specific nature of the diagnosis that was used for dysmenorrhoea. PCOS shows the least genetic correlation with other reproductive disorders.
Most reproductive disorders are also significantly genetically correlated with each of four depression groups (PND, female depression, NPD, and the MD meta-analysis) (Fig. 4; online suppl. Table S17). PND is significantly associated with ovarian cysts, dysmenorrhoea, endometriosis, pelvic inflammatory disease, and UF, and there was suggestive evidence that the genetic correlation for UF, endometriosis, and pelvic inflammatory disease with PND is higher than for depression outside the perinatal period. For all other disorders, the genetic correlation was similar across depression groups.
Genetic Analysis of Underlying Latent Factor
Common Genes
Table 2 identified genes associated with top SNPs that were also associated with genome-wide significant loci from other disorders (ESR1, SYNE1, TP53, FSHB, ARL14EP, and EEFSEC). Altogether, eleven genes were associated with genome-wide variants in at least two disorders (online suppl. Table S18). Of these, four were associated with three disorders: EEFSEC (endometriosis, menorrhagia, polyps); FSHB (UF, ovarian cysts, endometriosis); GREB1 (UF, ovarian cysts, endometriosis); and TP53 (UF, ovarian cysts, and menorrhagia). Of MAGMA-identified genes, 24 were common to at least 2 disorders, including four (SYNE1, GREB1, ARL14EP, and GNAT1) that were common to 3 disorders (online suppl. Table S18).
Structural Equation Modelling Results
Confirmatory factor analysis of the latent model underlying the nine reproductive disorders found moderate model fit (χ2 (27) = 90.08; AIC = 126.08; CFI = 0.96; SRMR = 0.10). Online supplementary Table S19 provides details of unstandardised and standardised factor loadings, based on genetic covariances and genetic correlations as input, respectively, including residual variances for each disorder after removing variance explained by the latent factor. Confirmatory factor analyses of different model permutations, with inclusions based on high factor loadings (ovarian cysts, menorrhagia, and endometriosis) as well as previously established genetic relationships (endometriosis and UF [1]), resulted in a common genetic factor loading on ovarian cysts, endometriosis, UF, menopausal symptoms, and menorrhagia, with excellent model fit (χ2 (5) = 6.4; AIC = 26.4; CFI = 1.00; SRMR = 0.03). Standardised factor loadings for the five-disorder model are illustrated in Figure 5, with details provided in online supplementary Table S19. Using standardised estimates, menorrhagia had the highest loading on the common factor (0.96, SE = 0.05, p = 1.7e−79), followed by ovarian cysts (0.94, SE = 0.05, p = 1.6e−73).
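The degrees of freedom and AIC reported for these models can be checked by hand. The following is a minimal sketch rather than the authors' code. It assumes a single common factor whose variance is fixed to 1, so that each disorder contributes one loading and one residual variance, and it assumes that AIC is computed as χ2 plus twice the number of free parameters. The function names are illustrative only.

```python
def one_factor_df(n_indicators: int) -> int:
    """Degrees of freedom for a one-factor model with unit factor variance.

    Assumption: the free parameters are one loading and one residual
    variance per indicator (2 * n_indicators in total).
    """
    unique_moments = n_indicators * (n_indicators + 1) // 2  # variances + covariances
    free_params = 2 * n_indicators
    return unique_moments - free_params


def model_aic(chisq: float, n_indicators: int) -> float:
    """AIC under the assumed definition: chi-square + 2 * (free parameters)."""
    return chisq + 2 * (2 * n_indicators)


# Nine-disorder model: chi-square value taken from the text above.
print(one_factor_df(9))                  # degrees of freedom
print(round(model_aic(90.08, 9), 2))     # AIC

# Five-disorder model: degrees of freedom only.
print(one_factor_df(5))
```

Under these assumptions the nine-disorder model has 27 degrees of freedom and an AIC of 126.08, and the five-disorder model has 5 degrees of freedom, with 20 added to its χ2 to give the AIC.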
Independent Genome-Wide Significant Loci
Summary statistics for the common genetic factor were filtered according to the minor allele frequency (MAF) of each SNP (0.4 > MAF > 0.1) and retained only SNPs with non-significant Q values; Q is a heterogeneity statistic, and a non-significant value indicates that the SNP's effects on the individual disorders are consistent with acting through the common factor. This process reduced the number of SNPs from 6,205,552 to 3,055,497. The Manhattan plot of p values associated with the remaining SNPs is provided in Figure 6. FUMA reported 20 independent genome-wide significant loci. Online supplementary Table S20 provides a summary of each locus, including the most significant lead SNP and its closest gene, as well as all genes included in the locus. The top SNP was rs11031006 (OR [G allele] = 1.07, CI = 1.06–1.08, p = 9.1e−33; FREQ = 0.14), located in the genomic region 11p14.1, upstream of both FSHB and ARL14EP, followed by rs7705526 (OR [C allele] = 0.96, CI = 0.95–0.97, p = 2.4e−22; FREQ = 0.34) intronic to TERT in the region 5p15.33 and rs4669753 (OR [G allele] = 1.04, CI = 1.03–1.05, p = 2.8e−19; FREQ = 0.39), exonic to GREB1 in the genomic region 2p25.1. The fourth locus (genomic region 6q25.2) included the genes ESR1 and SYNE1, with its top SNP, rs17803970 (OR [A allele] = 1.05, CI = 1.04–1.07, p = 3.1e−15; FREQ = 0.10), intronic to SYNE1. Other loci with genome-wide significance included WT1 transcription factor (WT1), EEFSEC, renalase (RNLS), dedicator of cytokinesis 5 (DOCK5), ski/dach domain containing 1 (SKIDA1), and runt-related transcription factor 1 (RUNX1). These results were echoed in the MAGMA-identified genes significantly associated with the latent factor (online suppl. Table S21). Stratified LDSC found regions surrounding genes highly expressed in the fallopian tube contribute significantly to the common genetic factor underlying the 5 reproductive disorders (p = 3.1e−06, online suppl. Fig. S8; online suppl. Table S22).
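The filtering step described at the start of this section can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study. The record layout, the example rows, and the function name are hypothetical, and the significance threshold applied to the Q p value (`q_alpha`) is an assumption, since the cut-off is not stated in this section.

```python
from dataclasses import dataclass


@dataclass
class SnpResult:
    rsid: str      # hypothetical identifier, not a real SNP
    maf: float     # minor allele frequency
    q_pval: float  # p value of the Q heterogeneity statistic


def passes_filters(snp: SnpResult, maf_low: float = 0.1, maf_high: float = 0.4,
                   q_alpha: float = 0.05) -> bool:
    """Keep SNPs with 0.1 < MAF < 0.4 and a non-significant Q statistic.

    q_alpha is an assumed threshold used only for illustration.
    """
    return maf_low < snp.maf < maf_high and snp.q_pval >= q_alpha


# Made-up rows covering each reason a SNP could be dropped.
snps = [
    SnpResult("rsA", 0.14, 0.62),   # kept
    SnpResult("rsB", 0.05, 0.80),   # MAF too low
    SnpResult("rsC", 0.30, 1e-9),   # significant Q
    SnpResult("rsD", 0.45, 0.50),   # MAF too high
]

kept = [s.rsid for s in snps if passes_filters(s)]
print(kept)  # ['rsA']
```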
Relationship with Depression: Genetic Correlation and Significant SNPs
The genetic correlation of the latent factor was 0.37 with PND (SE = 0.15, p = 1.4e−03), 0.48 with general female depression (SE = 0.06, p = 7.2e−11), and 0.35 with the MD meta-analysis, which included both females and males (SE = 0.03, p = 1.8e−30). Loci with genome-wide significance for the latent factor, filtered for SNPs with nominal significance for each of three depression groups (MD, female depression, and PND), are illustrated in Figure 6, with details provided in online supplementary Table S23.
The 6q25.2 locus was nominally associated with all of the depression groups, with the top SNPs intronic to ESR1 (PND: rs851981, pPND = 5.7e−03; female depression: rs3020339, pfemaleDep = 4.1e−03; MD: rs3020342, pMD = 2.0e−04). For PND and MD, this was also the most significant locus that had genome-wide significance for the latent factor. As a comparison, the latent factor was also filtered for SNPs nominally significant for male depression, using the UKB sample. The Chr6 locus was not apparent for this group, with only one SNP nominally associated with male depression (rs17803503), intronic to SYNE1, reporting genome-wide significance for the latent factor (Fig. 6). For female depression, the most significant locus occurred in the region 1p36.12 (rs12045139, pfemaleDep = 3.1e−03), upstream from WNT4. Two loci showed evidence of association specific to PND: 2p25.1 (rs974163, pPND = 2.6e−02) intronic to GREB1; and 10p12.31 (rs11012732, pPND = 3.0e−02) intronic to MLLT10.
Discussion
In this study, we investigated the common genetic underpinnings of reproductive disorders, finding high genetic correlation for each pair of disorders, as well as common risk variants and genes across disorders and a latent factor underpinning risk for five reproductive disorders. We also found a highly significant genetic correlation of depression with many reproductive disorders, as well as the common genetic factor.
With the exception of PCOS, we found high comorbidity between all disorders, echoed in high genetic correlations, and the identification of independent lead SNPs for the meta-analyses found common risk variants and genes across the disorders. While PCOS results are limited by their small sample size, they demonstrate a genetic profile distinct from other reproductive disorders, particularly endometriosis and dysmenorrhoea, as evidenced by low, non-significant genetic correlations, associated variants unique to the disorder, and a low factor loading in the nine-disorder structural equation model. These results are supported by a large meta-analysis that reported a hyperandrogenic phenotype for PCOS [51]. The most significant SNP for ovarian cysts (rs11031006), upstream from both FSHB and ARL14EP, has previously been associated with PCOS [3] and so provided an opportunity to distinguish the genetic underpinnings of general ovarian cysts from those of PCOS. The previously identified association of PCOS with rs11031006 [52] did not reach genome-wide significance in our small sample (OR [A allele] = 1.24, CI = 1.11–1.37, p = 7.5e−05), but its minor A allele has been previously associated with increased risk for PCOS, infertility, and premature ovarian failure [53]. Here, for ovarian cysts, the A allele was risk decreasing (OR [A allele] = 0.89, CI = 0.86–0.91), indicating a common genetic architecture with endometriosis, risk for which is decreased by one or more A alleles [54], rather than with PCOS. The inverse comorbidity model of endometriosis and PCOS [55, 56], with endometriosis and UF associated with elevated levels of oestrogen [57], and PCOS with hyperandrogenism [51, 58], is supported by this finding that shared associated genes have opposing directions of effect.
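Because effect sizes for rs11031006 are reported relative to the A allele here and relative to the G allele for the latent factor in the Results, it can help to re-express an odds ratio for the other allele. The sketch below shows the arithmetic only. It assumes a biallelic A/G SNP coded on the same strand throughout, and the function name is illustrative.

```python
def flip_odds_ratio(or_value: float, ci_low: float, ci_high: float):
    """Re-express an odds ratio and its confidence interval for the other allele.

    Taking reciprocals inverts the direction of effect; the interval
    bounds swap so that the lower bound stays below the upper bound.
    """
    return 1 / or_value, 1 / ci_high, 1 / ci_low


# Ovarian cysts, A allele (values from the paragraph above).
or_g, lo_g, hi_g = flip_odds_ratio(0.89, 0.86, 0.91)
print(f"{or_g:.2f} ({lo_g:.2f}-{hi_g:.2f})")  # 1.12 (1.10-1.16)
```

On this scale the G allele is risk increasing for ovarian cysts, which is the same direction as the G allele effect reported for the latent factor.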
The menorrhagia meta-analysis identified nine independent loci, with rs7929660, also upstream of both FSHB and ARL14EP, the most significant independent lead SNP. The same locus achieved genome-wide significance in a previous GWAS of heavy menstrual bleeding (HMB) [1]. FSHB, which encodes the beta subunit of follicle-stimulating hormone (FSH), plays a crucial role in steroidogenesis and folliculogenesis [53, 59]. Although analyses of this locus in the context of reproductive disorders have tended to focus on the role of FSHB, an adjacent gene, ARL14EP, associated with autoimmunity and neuronal connectivity, as well as MD, has also come under scrutiny [3, 6, 60, 61], and the suggestion that the specific effect of the risk variant may be tissue dependent [3] provides a pleiotropic explanation for the association between reproductive disease and depression. Another common variant is rs78378222, located in the 3′UTR of TP53, which reached genome-wide significance for both ovarian cysts and menorrhagia, as well as UF. TP53, which has been termed “the guardian of the genome,” plays an important role in the DNA damage control pathway, promoting cell cycle arrest so that damage can be repaired and apoptosis if this is not possible [62].
A major focus of this analysis was the identification of SNPs significantly associated with different reproductive disorders. That the same variants and associated genes underlie many reproductive traits and disorders has been previously observed [1, 3], and is also highlighted by the 27 genes found to be associated with at least 2 disorders. Here we used structural equation modelling to investigate the joint genetic architecture across reproductive disorders and identified a latent factor that underlies multiple disorders, indicating that many of the same genetic variants underlie risk for menorrhagia, ovarian cysts, endometriosis, menopausal symptoms, and UF. Previous studies of UF have noted that associated genes have functions related to genitourinary development or genome stability [63, 64], and SNPs associated with the latent reproductive factor also implicate genes that fulfil these functions. Here, the importance of rs11031006 and its associated genes, FSHB and ARL14EP, was highlighted by its being the SNP most significantly associated with the latent reproductive factor. FSHB, as well as GREB1, ESR1, SYNE1, WT1, WNT4, and TERT, whose effect is largely mediated through this latent reproductive factor, have a strong association with reproductive health [2, 65‒70], while RUNX1, RNLS, DOCK5, and EEFSEC promote genome stability [71‒75].
Genetic correlation between specific reproductive disorders and depression groups was significant for all groups and disorders with the exception of genital polyps and PCOS with PND, although given consistent findings of comorbidity with depression in a recent systematic literature review [13], PCOS results are likely due to its small sample size. It has been suggested that comorbidity between MD and reproductive disease is due to chronic pain, which induces depression [76]. However, a cross-disorder meta-analysis of MD and endometriosis [6], which reported a causal association of MD with reproductive disease, found a shared genetic aetiology that did not support the putative role of pain as the only determinant of depression/endometriosis comorbidity and implicated genes involved in sex steroid hormone pathways, including ESR1 and ARL14EP.
Gene expression analysis during late pregnancy for women who developed postpartum depression, compared to women who did not, noted that 39 of 116 differentially expressed genes were involved with oestrogen signalling, oestrogen metabolism, or were oestrogen responsive [77]. Since no significant difference in absolute oestrogen levels was found, the study concluded that the pertinent factor for PND was response to oestrogen. Later studies by the same group supported this conclusion [78, 79], and a recent study reported dysregulated cellular stress signalling genes following exposure to oestrogen within cell lines taken from women with a history of PND compared to controls [80]. The 11p14.1 locus, containing FSHB and ARL14EP and associated with the release of oestrogen, ranked highest for the latent reproductive factor and was nominally significant for both female depression and MD, and the 6q25.2 locus, containing ESR1, was nominally associated with three depression groups (MD, female depression, and PND). Neither locus was nominally significant for the exclusively male depression group.
Bound to oestrogen, ESR1 functions as a transcription factor, playing a crucial role in reproductive health through interaction with GREB1, TERT, FSHB [64], and WNT4 [81, 82], but can also disrupt genomic stability through inducing DNA damage via the production of oxidative metabolites, and downregulating damage response genes [83, 84]. ESR1 is also associated with neuronal differentiation [85], differentially expressed in the suicidal and non-suicidal brain [86], with risk variants associated with lifetime female depression [87]. Interestingly, the first genome-wide significant locus associated with postpartum depression, reported in both the DF7 and DF8 FinnGen releases (July 2022; December 2022), is located in the 6q25.2 genomic region, with its most significant SNP (rs2347923) intronic to ESR1.
Limitations, Strengths and Conclusion
An important limitation of this study is the sample size of the PND and female depression groups, as well as PCOS cases, which resulted in large standard errors in the calculation of genetic correlations. This was overcome by using the MD sample that excluded the UKB cohort, but this sample introduced sex heterogeneity, highlighting the need for differentiation by sex in depression research. PND was defined by self-report only as “probably related to childbirth”, and dates of all depression episodes and childbirths were not available for all participants for verification. There was also no information available with respect to the timing of PND onset: during pregnancy or postpartum. A further limitation is the substantial difference in prevalence of disorders between the UKB sample and those reported in the literature, highlighting the difference in health status between UKB participants and the general British population, although it is also likely that many UKB cases were undiagnosed, compromising the calculation of SNP-based heritability on the liability scale. The UKB endometriosis sample also did not exclude women who only reported endometriosis of the uterus (adenomyosis), which may have introduced heterogeneity into the sample [88‒90], and the dysmenorrhoea sample was also non-specific in its diagnosis. Since the study was of women of European ancestry only, conclusions may not be transferable to other populations, and, given the unique Finnish history, there may be important differences in risk variants and alleles between the FinnGen and UKB samples.
Nevertheless, our results for the meta-analyses of specific reproductive disorders are consistent with previous findings. This study draws on very large data sets to analyse a large number of reproductive disorders and utilises recent advances in genetic analyses to provide insight into the underlying genetic architecture that is common to these disorders. In addition, our investigation of a common aetiology for female reproductive disorders has identified some putative risk loci that may prove to be pertinent for female depression and PND, particularly highlighting the key role of ESR1.
Statement of Ethics
We thank the participants and investigators of the FinnGen and UK Biobank studies for their invaluable contributions to this work. This research was approved by The University of Queensland, Human Research Ethics Committee under Project: 2020/HE002938 – Statistical Methods and Algorithms for Analysis of and Application to Genetic Data Sets. The use and analysis of data from the UK Biobank has been conducted using the UK Biobank Resource under Application Number 12505.
Conflict of Interest Statement
All authors declare that they have no relevant financial or non-financial interests to disclose.
Funding Sources
This work was funded by NHMRC grant 1145645 and generously supported by a donation from the Axelsen family. J.K. has been supported by a UQ Research Training Program scholarship, and E.M.B. received funding from the NHMRC Centre for Research Excellence (APP1198304). S.M.B. receives sponsored research grants from Sage Therapeutics.
Author Contributions
Jacqueline Kiewa and Enda M. Byrne designed the study, and Jacqueline Kiewa analysed the data. Jacqueline Kiewa and Enda M. Byrne drafted the manuscript. Naomi R. Wray, Samantha Meltzer-Brody, Christel Middeldorp, and Sally Mortlock revised the manuscript for intellectual content. All authors have read and approved the final version.
Data Availability Statement
UKB and FinnGen data are available for public access. All significant findings, including genome-wide significant results for GWAS and GSEM analysis, are provided in online supplementary Tables. Full data are available from the corresponding author upon reasonable request.