Abstract
Objectives: Endometriosis (EM) is a chronic disease severely impacting reproductive health, with its exact cause still unclear. In-depth understanding of the etiology and pathogenesis of EM from the perspective of genetics and exploring individualized treatment strategies can improve the health and quality of life of patients. Design: This study combined genetic data analysis with experimental validation to provide novel biomarkers and drug targets for the diagnosis and treatment of EM. Participants/Materials, Setting, and Methods: Whole blood cis-expression quantitative trait loci (eQTL) data were used as exposure data, and data from the FinnGen database EM1-2 and EM3-4 were used as outcomes. Summary-data-based Mendelian randomization (SMR) methods were used to select genes with causal relationship to the disease. These genes were validated through bioinformatics analysis and real-time quantitative polymerase chain reaction (RT-qPCR) analysis of clinical samples, and potential diagnostic and drug targets were screened through colocalization and molecular docking. Results: Through SMR analysis, seven genes were selected as potential diagnostic markers of EM, namely Eukaryotic Elongation Factor, Selenocysteine-TRNA-Specific (EEFSEC), INO80 complex subunit E (INO80E), RAP1 GTPase-activating protein (RAP1GAP), Lipid Droplet-Associated Hydrolase (LDAH), Ring Finger And SPRY Domain Containing 1 (RSPRY1), HLA Complex Group 22 (Non-Protein Coding) (HCG22), and Adenosine Kinase (ADK). Colocalization analysis showed that EEFSEC, HCG22, INO80E, and RSPRY1 could be used as potential drug targets. Limitations: SMR is limited by dependence on publicly available summary data, potential confounding biases in genetic variant-phenotype associations, the presence of underlying horizontal pleiotropy, and issues related to insufficient statistical power. Colocalization analysis cannot assess undiscovered genetic variants. The in vitro experiments in this study utilized clinical samples but were validated only at the expression level. The accuracy of molecular docking analysis largely depends on the quality of protein structures and ligands. Conclusions: The study identifies potential diagnostic markers and drug targets for EM from a genetic perspective, providing new directions for drug development and precision medicine for EM treatment.
Introduction
Endometriosis (EM), a chronic inflammatory disorder, is defined as the presence of endometrial tissue outside the uterine cavity, leading to pelvic pain and infertility [1]. EM affects about 10%–15% of women of reproductive age and is considered a public health problem that affects women’s quality of life and poses a significant economic burden [2]. However, the pathogenesis of EM is still unclear, and the disease lacks effective treatment [3]. Traditional diagnostic methods include clinical symptom assessment, pelvic examination, ultrasound, magnetic resonance imaging, and other imaging examinations, as well as surgical biopsy confirmation. Currently, there is no single conclusive biomarker that can be highly sensitive and specific for the diagnosis of EM [4]. Because of the variety and complexity of its symptoms, diagnosing and treating EM remains a huge challenge [5]. Finding sensitive biomarkers and effective drug targets for EM is a challenging task at this stage.
Recent studies have found that epigenetic effects are crucial in the pathogenesis of EM. These genetic changes not only affect chromatin structure and deoxyribonucleic acid (DNA) methylation but also affect gene expression independently of gene sequence, which may play an important role in the development of the disease and the mechanism of infertility [6]. More and more evidence shows that epigenetic disorders and genetic mutations are thought to contribute to the development of EM [7]. Unfortunately, due to the diverse phenotypes of patients, it is difficult to make accurate diagnoses and identify specific mechanisms of this infertility; thus, research in this area is quite limited [8].
Fortunately, genetic studies provide a way to evaluate causality through Mendelian randomization (MR) and to find effective drug targets [9]. This approach is able to identify risk factors for EM and provide insights into the cause of the disease [9]. A recent genome-wide association study (GWAS) of 58,115 cases and 733,480 controls revealed 27 loci associated with EM, providing us with the possibility to genetically analyze EM [10]. Expression quantitative trait loci (eQTLs) have become a common tool to interpret the regulatory mechanisms of variants identified by GWAS. In particular, cis-eQTLs, where gene expression levels are affected by a gene-proximal (<1 megabases; Mb) single nucleotide polymorphism (SNP), have been widely used for this purpose [11]. GWAS and eQTL studies are systematically biased toward different types of variant and support the use of complementary functional approaches alongside the next generation of eQTL studies [12]. Some recent studies have used summary-data-based MR (SMR) to analyze the causal relationship between genes and diseases. For example, 87 genes in age-related macular degeneration have been identified using QTL mapping of human retinal DNA methylation [13]. One study used genome-wide survival studies to identify PARL as a novel site for clinical progression and neurodegeneration of Alzheimer’s disease [14], and these findings have given us direction for our research.
Our aim in this study is to analyze key genes in EM using SMR and verify them in clinical samples, and to verify the role of potential drug targets of these genes through colocalization analysis and molecular docking. While further research is needed to confirm these findings, these advances certainly offer new hope for patients and medical professionals alike.
Materials and Methods
The study design is shown in Figure 1.
Data Source
We selected the eQTLGen consortium (https://www.eqtlgen.org) [11] for exposure, which contained whole blood eQTL data from 31,684 samples. The GWAS data for EM were obtained from the FinnGen database (https://www.finngen.fi) [15] as an ending. We selected the data of EM stage 1–2 and EM stage 3–4, which contained the GWAS data of 223,920 and 221,995 samples, respectively (Table 1). Reanalysis of previously collected and published data was involved in this study and no further ethical approval was required. The use of publicly available data is exempt from additional ethical approval.
The data in the article
Exposure or outcome . | Data source . | Sample size . | Ancestry . | Data URL . |
---|---|---|---|---|
eQTL | eQTLGen | 31,684 | European | https://www.eqtlgen.org |
Endometriosis: ASRM stages 1, 2 | Finngen_R10 | 223,920 | European | https://www.finngen.fi |
Endometriosis: ASRM stages 3, 4 | Finngen_R10 | 221,995 | European | https://www.finngen.fi |
Exposure or outcome . | Data source . | Sample size . | Ancestry . | Data URL . |
---|---|---|---|---|
eQTL | eQTLGen | 31,684 | European | https://www.eqtlgen.org |
Endometriosis: ASRM stages 1, 2 | Finngen_R10 | 223,920 | European | https://www.finngen.fi |
Endometriosis: ASRM stages 3, 4 | Finngen_R10 | 221,995 | European | https://www.finngen.fi |
eQTL, expression quantitative trait locus; URL, uniform resource location.
Summary-Data-Based Mendelian Randomization Analysis
We download the SMR analysis software developed by the University of the West Lake (https://yanglab.westlake.edu.cn). The SMR software tool was originally developed to implement the SMR and heterogeneity in dependent instruments (HEIDI) methods to test for pleiotropic association between the expression level of a gene and a complex trait of interest using summary-level data from GWAS and eQTL studies [16]. Then, the eQTL data were used as exposure factors, and the EM stage 1–2 and stage 3–4 data from FinnGen database were used as results for SMR analysis. In order to make the result more accurate, our selection is very strict and three conditions need to be met at the same time, P-SMR <0.05, P-HEIDI >0.05, and false discovery rate <0.05. After selecting the genes that meet the conditions, we use R packages (magick) and R packages (TeachingDemos) to draw scatter plots of single genes and observe the causal relationship between genes and EM.
Gene Ontology and Kyoto Encyclopedia of Genes and Genomes Analysis
To further explore the biological mechanisms of the screened genes, we performed Gene Ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) annotation using the “ClusterProfiler” R software package [17].
Real-Time Quantitative Polymerase Chain Reaction
Ten patients diagnosed with EM stage 3–4 who underwent laparotomy or laparoscopic surgery at the Department of Obstetrics and Gynecology, Hangzhou Women’s Hospital from January 2023 to January 2024 were selected (shown in online suppl. Fig. S1; for all online suppl. material, see https://doi.org/10.1159/000543707). Samples of ectopic (EC) endometrium of ovary and eutopic (EU) endometrium were collected. All endometrial samples were pathologically confirmed to be in the proliferative phase, with evident endometrial glands and stromal cells in the ovarian ectopic lesions. EM staging was performed according to the revised American Fertility Society (rAFS) criteria. Exclusion criteria included adenomyosis, malignant diseases, recent hormone therapy within the previous 3 months, concurrent intrauterine lesions, irregular menstrual cycles, breastfeeding, or pregnancy within the last 6 months. The study was approved by the Ethics Committee of Hangzhou Women’s Hospital (Hangzhou, China) (protocol code: [2022] Medical Ethics Review A (7)-05; date of approval: October 12, 2022), and informed consent was obtained from all individual participants.
The tissue samples were ground in liquid nitrogen, cracked with Trizol, violently shaken with chloroform (ratio 5:1) for 30 s, centrifuged for supernatant, isopropyl alcohol precipitated ribonucleic acid (RNA), washed with 75% ethanol, dried, and dissolved in RNase-free water. The SPARKscript II RT Plus Kit (With gDNA Eraser) (Sparkjade, China) was used to remove the genome, and then the removed genomic DNA was reverse-transcribed into cDNA. Finally, real-time quantitative polymerase chain reaction (RT-qPCR) was detected using the SYBR Green method using the ABI7500 fluorescent quantitative PCR instrument (ABI, USA). After the RT-qPCR reaction, the relative levels of the target genes were compared with β-actin by the 2−ΔΔCt method.
Colocalization Analysis
For the screened genes, colocalization analysis of EM risk was conducted using the R package “coloc” [18]. The default prior probabilities were set as follows: p1 = 1E−4, p2 = 1E−4, and p12 = 1E−5. Here, p1, p2, and p12 represent the predefined probabilities of significant associations between SNPs in the test region and either gene expression, EM risk, or both. Posterior probabilities from the colocalization analysis correspond to one of five hypotheses: posterior probability hypothesis (PPH)0, where the SNP is unrelated to either feature; PPH1, where the SNP is associated with gene expression but not EM risk; PPH2, where the SNP is associated with EM risk but not gene expression; PPH3, where the SNP is associated with both EM risk and gene expression but driven by different SNPs; and PPH4, where the SNP is associated with both EM and gene expression, driven by a common SNP. A significance threshold for colocalization was set at PPH4 >0.80, and genes colocalized with EM were considered potential drug target genes. Graphs were generated using the R package “locuscomparer.”
Candidate Drug Prediction
Assessing the interaction between proteins and drugs is crucial for understanding whether target genes can serve as practical drug targets. In this study, the goal was achieved by utilizing the database of the food and drug administration (FDA). The FDA database contains extensive information on compounds, intended to support drug development and related research. It collects data on approximately 1,729 known compounds, including structural details, physicochemical properties, and toxicological data. Researchers can utilize this database to identify potential drug candidate compounds, assess drug toxicity and safety, and conduct other relevant studies. The FDA compound library plays a vital role in drug development by expediting the drug discovery process and enhancing the safety and efficacy of medications.
Molecular Docking and Prediction of Drug Targets
This study employed Autodock tools and Openbabel software for the preprocessing and molecular docking simulations of proteins and small molecules. Initially, the protein structures underwent preprocessing, which included adding hydrogen ions, removing metal salts and water molecules, and conformational optimization. Similar preprocessing was conducted for the small molecules from the FDA compound library. Subsequently, two rounds of molecular docking were performed. First, blind docking was conducted to identify the optimal binding sites, followed by the selection of the lowest binding energy 20 compounds for each target protein in the first round screening (exhaustiveness = 10). Then, the lowest binding energy 3 compounds were selected for a second round of more refined molecular docking (exhaustiveness = 25). Finally, drug candidate compounds with high binding affinity and favorable interaction patterns were selected through the analysis of simulation results.
Results
Summary-Data-Based Mendelian Randomization Analysis
First, SMR analysis was conducted using eQTL data and EM stage 1–2 data. After our condition screening, it was found that there were no genes that met the requirements (online suppl. Table 1). Then, SMR analysis was conducted on EM stage 3–4 data, and 7 genes were found to meet our screening conditions. They are, respectively, Eukaryotic Elongation Factor, Selenocysteine-TRNA Specific (EEFSEC), INO80 complex subunit E (INO80E), RAP1 GTPase-activating protein (RAP1GAP), Lipid Droplet-Associated Hydrolase (LDAH), Ring Finger And SPRY Domain Containing 1 (RSPRY1), HLA Complex Group 22 (Non-Protein Coding) (HCG22), and Adenosine Kinase (ADK) (Fig. 2), meeting the three conditions of P-SMR <0.05, P-HEIDI >0.05, and false discovery rate <0.05 (Table 2). Furthermore, based on the β values from SMR, the odds ratios (95% CI) for these 7 genes were calculated as follows: EEFSEC 0.533 (0.298–0.768), INO80E 0.787 (0.683–0.891), RAP1GAP 1.105 (1.060–1.150), LDAH 0.878 (0.817–0.938), RSPRY1 1.584 (1.366–1.801), HCG22 1.152 (1.083–1.220), and ADK 0.928 (0.892–0.964).
Manhattan plots for associations of eQTL data with EM in SMR analysis. Labeled and red genes refer to MR findings with P-SMR <0.05, P-HEIDI >0.05, and FDR <0.05.
Manhattan plots for associations of eQTL data with EM in SMR analysis. Labeled and red genes refer to MR findings with P-SMR <0.05, P-HEIDI >0.05, and FDR <0.05.
The SMR results of eQTLs for 7 genes with EM
Gene . | topSNP . | B-SMR . | P-SMR . | P-HEIDI . | FDR . |
---|---|---|---|---|---|
EEFSEC | rs6414310 | −0.62809 | 1.64E−07 | 0.406096 | 0.000459 |
INO80E | rs9925915 | −0.23899 | 7.09E−06 | 0.083118 | 0.012435 |
RAP1GAP | rs17423390 | 0.100087 | 1.30E−05 | 0.090135 | 0.017094 |
LDAH | rs10170771 | −0.12989 | 2.42E−05 | 0.309448 | 0.026080 |
RSPRY1 | rs11076181 | 0.460048 | 3.33E−05 | 0.094359 | 0.031177 |
HCG22 | rs126505 | 0.141713 | 4.97E−05 | 0.115826 | 0.03875 |
ADK | rs67594352 | −0.07389 | 5.40E−05 | 0.159026 | 0.039899 |
Gene . | topSNP . | B-SMR . | P-SMR . | P-HEIDI . | FDR . |
---|---|---|---|---|---|
EEFSEC | rs6414310 | −0.62809 | 1.64E−07 | 0.406096 | 0.000459 |
INO80E | rs9925915 | −0.23899 | 7.09E−06 | 0.083118 | 0.012435 |
RAP1GAP | rs17423390 | 0.100087 | 1.30E−05 | 0.090135 | 0.017094 |
LDAH | rs10170771 | −0.12989 | 2.42E−05 | 0.309448 | 0.026080 |
RSPRY1 | rs11076181 | 0.460048 | 3.33E−05 | 0.094359 | 0.031177 |
HCG22 | rs126505 | 0.141713 | 4.97E−05 | 0.115826 | 0.03875 |
ADK | rs67594352 | −0.07389 | 5.40E−05 | 0.159026 | 0.039899 |
SMR, summary-data-based Mendelian randomization; eQTL, expression quantitative trait locus; EM, endometriosis; P, p value; HEIDI, heterogeneity in dependent instruments; FDR, false discovery rate.
With a scatter plot of single genes, we learn the causal relationship between each gene and EM. We found that there are four genes that have a protective effect on the disease, namely ADK, EEFSEC, INO80E, and LDAH; that is, with the increase of gene content, the risk of EM is reduced. The other three genes, HCG22, RAP1GAP, and RSPRY1, promote EM. With the increase of gene content, the risk of EM morbidity also increases (Fig. 3). The single gene scatter map also clearly shows the location of each top SNP.
Scatter plot of causal relationship between 7 genes and EM. a Scatter plot of negative correlation between ADK gene and EM. b Scatter plot of negative correlation between EEFSEC gene and EM. c Scatter plot of positive correlation between HCG22 gene and EM. d Scatter plot of negative correlation between INO80E gene and EM. e Scatter plot of negative correlation between LDAH gene and EM. f Scatter plot of positive correlation between RAP1GAP gene and EM. g Scatter plot of positive correlation between RSPRY1 gene and EM.
Scatter plot of causal relationship between 7 genes and EM. a Scatter plot of negative correlation between ADK gene and EM. b Scatter plot of negative correlation between EEFSEC gene and EM. c Scatter plot of positive correlation between HCG22 gene and EM. d Scatter plot of negative correlation between INO80E gene and EM. e Scatter plot of negative correlation between LDAH gene and EM. f Scatter plot of positive correlation between RAP1GAP gene and EM. g Scatter plot of positive correlation between RSPRY1 gene and EM.
GO and KEGG Analysis
The 7 genes obtained were preliminarily analyzed by bioinformatics method. The distribution of these 7 genes in biological process, cellular component, and molecular function, as well as the proportion of each gene in them, could be seen through the circle diagram of GO (Fig. 4a). Through the bubble map of GO, we could see the biological processes that the 7 genes are mainly involved, such as microvilli tissue regulation, nucleotide synthesis and metabolism, and specific mechanisms of protein synthesis. The cellular component is mainly located in the nucleus and is involved in complexes ranging from the basic structure of the cell (such as neuronal cell bodies and nuclear chromosomes) to specific protein complexes and organelles (such as adenosine triphosphate [ATP] synthase complex and lipid droplets). Their molecular functions mainly focus on energy generation and DNA transcription, including enzyme activity, molecular binding activity, and regulatory activity during protein synthesis (Fig. 4b). In the bubble map of KEGG, seven gene sets were enriched in purine metabolism, ATP-dependent chromatin remodeling, and nucleotide metabolism. It is not difficult to see that these gene sets play an important role in regulating cellular metabolic processes and gene expression. In particular, enrichment of purine metabolism might point to functions associated with DNA and RNA synthesis, while ATP-dependent chromatin remodeling might be associated with gene regulatory mechanisms (Fig. 4c).
GO and KEGG analysis of 7 genes. a The circle graph of GO shows the distribution of these seven genes in biological process, cellular component, and molecular function. b The bubble map of GO shows the details of the 7 genes involved in biological processes, cellular components, and molecular functions. c The bubble map of KEGG shows the pathways involved in enrichment of these 7 genes.
GO and KEGG analysis of 7 genes. a The circle graph of GO shows the distribution of these seven genes in biological process, cellular component, and molecular function. b The bubble map of GO shows the details of the 7 genes involved in biological processes, cellular components, and molecular functions. c The bubble map of KEGG shows the pathways involved in enrichment of these 7 genes.
RT-qPCR Analysis
The results of RT-qPCR indicated that there were differences in the expression of 7 genes between the EU group and the EC group. Compared with the EU group, the expressions of four gene phases, ADK, EEFSEC, INO80E, and LDAH, were decreased in the EC group (Fig. 5a–c, e). On the contrary, HCG22, RSPRY1, and RAP1GAP showed increased expression in the EC group (Fig. 5d, f, g). This is consistent with the trend we obtained through single-gene SMR analysis.
RT-qPCR analysis 7 gene expression difference between the EU group and the EC group. a The expression of ADK gene was decreased in the EC group. b The expression of EEFSEC gene was decreased in the EC group. c The expression of INO80E gene was decreased in EC group. d The expression of HCG22 gene was increased in the EC group. e The expression of LDAH gene was decreased in the EC group. f The expression of RSPRY1 gene was increased in the EC group. g The expression of RAP1GAP gene was increased in the EC group.
RT-qPCR analysis 7 gene expression difference between the EU group and the EC group. a The expression of ADK gene was decreased in the EC group. b The expression of EEFSEC gene was decreased in the EC group. c The expression of INO80E gene was decreased in EC group. d The expression of HCG22 gene was increased in the EC group. e The expression of LDAH gene was decreased in the EC group. f The expression of RSPRY1 gene was increased in the EC group. g The expression of RAP1GAP gene was increased in the EC group.
Colocalization Analysis
Through colocalization analysis, four genes were selected from the seven genes, namely EEFSEC, HCG22, INO80E, and RSPRY1 (Table 3), according to the screening of the colocalization significance threshold PPH4 >0.80. These four genes could be considered as potential drug target genes. In addition, a graph was drawn with R package (locuscomparer) to show the relationship between the expression level of each gene and the risk of EM disease, and to colocate the information and location of the SNP with the highest expression. EEFSEC is located on chromosome 3, and the SNP colocated with EM is rs2811479 (Fig. 6a). HCG22 is located on chromosome 6, and the SNP colocated with EM is rs2428516 (Fig. 6b). INO80E is located on chromosome 16, and the SNP colocated with EM is rs3814883 (Fig. 6c). RSPRY1 is located on chromosome 16, and the SNP colocated with EM is rs7197204 (Fig. 6d).
Colocalization results of eQTLs for 4 genes with EM-associated SNPs
Id.exposure . | Gene . | Outcome . | H0 . | H1 . | H2 . | H3 . | H4 . |
---|---|---|---|---|---|---|---|
eqtl-a-ENSG: 00000132394 | EEFSEC | Endometriosis | 3.95e−64 | 1.03e−04 | 4.57e−61 | 1.18e−01 | 8.82e−01 |
eqtl-a-ENSG: 00000228789 | HCG22 | Endometriosis | 4.17e−180 | 4.05e−03 | 1.30e−178 | 1.26e−01 | 8.70e−01 |
eqtl-a-ENSG: 00000169592 | INO80E | Endometriosis | 1.30e−196 | 6.06e−02 | 2.38e−196 | 1.10e−01 | 8.30e−01 |
eqtl-a-ENSG: 00000159579 | RSPRY1 | Endometriosis | 1.04e−71 | 5.13e−02 | 2.05e−72 | 9.17e−03 | 9.40e−01 |
Id.exposure . | Gene . | Outcome . | H0 . | H1 . | H2 . | H3 . | H4 . |
---|---|---|---|---|---|---|---|
eqtl-a-ENSG: 00000132394 | EEFSEC | Endometriosis | 3.95e−64 | 1.03e−04 | 4.57e−61 | 1.18e−01 | 8.82e−01 |
eqtl-a-ENSG: 00000228789 | HCG22 | Endometriosis | 4.17e−180 | 4.05e−03 | 1.30e−178 | 1.26e−01 | 8.70e−01 |
eqtl-a-ENSG: 00000169592 | INO80E | Endometriosis | 1.30e−196 | 6.06e−02 | 2.38e−196 | 1.10e−01 | 8.30e−01 |
eqtl-a-ENSG: 00000159579 | RSPRY1 | Endometriosis | 1.04e−71 | 5.13e−02 | 2.05e−72 | 9.17e−03 | 9.40e−01 |
eQTL, expression quantitative trait locus; EM, endometriosis; SNPs, single nucleotide polymorphisms; PPH0–PPH4, posterior probabilities of different hypotheses; PPH4 >0.8, strong colocalization between gene expression and EM risk.
The colocalization map shows the relationship between SNPs expression levels of 4 genes and EM disease risk. a EEFSEC colocates results with EM on chromosome 3. b HCG22 colocates results with EM on chromosome 6. c INO80E colocates results with EM on chromosome 16. d RSPRY1 colocates results with EM on chromosome 16.
The colocalization map shows the relationship between SNPs expression levels of 4 genes and EM disease risk. a EEFSEC colocates results with EM on chromosome 3. b HCG22 colocates results with EM on chromosome 6. c INO80E colocates results with EM on chromosome 16. d RSPRY1 colocates results with EM on chromosome 16.
Molecular Docking and Prediction of Drug Targets
By screening the FDA library, the top three small compounds with the highest scores capable of binding to EEFSEC were netupitant, bedaquiline, and probucol, with binding energies (kcal/mol) of −7.1, −6.2, and −5.8, respectively (Fig. 7a–c). The top three small compounds with the highest scores capable of binding to HCG22 were avapritinib, bedaquiline, and probucol, with binding energies (kcal/mol) of −6.2, −5.8, and −5.8, respectively (Fig. 7d–f). The top three small compounds capable of binding to INO80E were midostaurin, bedaquiline, and probucol, with binding energies (kcal/mol) of −6.8, −5.9, and −5.4, respectively (Fig. 7g–i). Finally, the top three small compounds capable of binding to RSPRY1 were ubrogepant, midostaurin, and fexofenadine, with binding energies (kcal/mol) of −9.8, −8.9, and −8.7, respectively (Fig. 7j–l). Notably, probucol and bedaquiline were found to bind to the EEFSEC, HCG22, and INO80E proteins, while midostaurin was found to bind to the INO80E and RSPRY1 proteins (Table 4).
Docking results of 4 proteins small molecules. a EEFSEC docking netupitant. b EEFSEC docking bedaquiline. c EEFSEC docking probucol. d HCG22 docking avapritinib. e HCG22 docking bedaquiline. f HCG22 docking probucol. g INO80E docking midostaurin. h INO80E docking bedaquiline. i INO80E docking probucol. j RSPRY1 docking ubrogepant. k RSPRY1 docking midostaurin. l RSPRY1 docking fexofenadine.
Docking results of 4 proteins small molecules. a EEFSEC docking netupitant. b EEFSEC docking bedaquiline. c EEFSEC docking probucol. d HCG22 docking avapritinib. e HCG22 docking bedaquiline. f HCG22 docking probucol. g INO80E docking midostaurin. h INO80E docking bedaquiline. i INO80E docking probucol. j RSPRY1 docking ubrogepant. k RSPRY1 docking midostaurin. l RSPRY1 docking fexofenadine.
Docking results of four proteins with small molecules
Target . | Structure ID . | Drug . | PubChem . | Binding energy, kcal/mol . |
---|---|---|---|---|
EEFSEC | 5IZK (PDB) | Netupitant | 6451149 | −7.1 |
Bedaquiline | 5388906 | −6.2 | ||
Probucol | 4912 | −5.8 | ||
HCG22 | HCG22 (AlphaFold) | Avapritinib | 118023034 | −6.2 |
Bedaquiline | 5388906 | −5.8 | ||
Probucol | 4912 | −5.8 | ||
INO80E | INO80E (AlphaFold) | Midostaurin | 9829523 | −6.8 |
Bedaquiline | 5388906 | −5.9 | ||
Probucol | 4912 | −5.4 | ||
RSPRY1 | RSPRY1 (AlphaFold) | Ubrogepant | 68748835 | −9.8 |
Midostaurin | 9829523 | −8.9 | ||
Fexofenadine | 63002 | −8.7 |
Target . | Structure ID . | Drug . | PubChem . | Binding energy, kcal/mol . |
---|---|---|---|---|
EEFSEC | 5IZK (PDB) | Netupitant | 6451149 | −7.1 |
Bedaquiline | 5388906 | −6.2 | ||
Probucol | 4912 | −5.8 | ||
HCG22 | HCG22 (AlphaFold) | Avapritinib | 118023034 | −6.2 |
Bedaquiline | 5388906 | −5.8 | ||
Probucol | 4912 | −5.8 | ||
INO80E | INO80E (AlphaFold) | Midostaurin | 9829523 | −6.8 |
Bedaquiline | 5388906 | −5.9 | ||
Probucol | 4912 | −5.4 | ||
RSPRY1 | RSPRY1 (AlphaFold) | Ubrogepant | 68748835 | −9.8 |
Midostaurin | 9829523 | −8.9 | ||
Fexofenadine | 63002 | −8.7 |
The lower the binding energy, the better the binding effect and the higher the affinity.
ID, identifier.
Discussion
Using whole blood cis-eQTL data as exposure and FinnGen EM1-2 and EM3-4 data as outcomes, our SMR analysis identified 7 genes (EEFSEC, INO80E, RAP1GAP, LDAH, RSPRY1, HCG22, and ADK) potentially causally linked to endometriosis. Among these, 4 genes, ADK, EEFSEC, INO80E, and LDAH, showed a negative correlation with EM, indicating a protective role. The remaining 3 genes, HCG22, RAP1GAP, and RSPRY1, exhibited a positive correlation with EM, suggesting a promotive role. These findings were supported by RT-qPCR analysis of clinical samples. We propose that these 7 genes could serve as potential diagnostic markers in a clinical setting. Through colocalization analysis (PPH4 >0.80), we selected 4 genes from the initial 7, namely EEFSEC, HCG22, INO80E, and RSPRY1, as potential drug target genes. Finally, screening and molecular docking in the FDA library led to the identification of three important compounds: probucol, bedaquiline, and midostaurin.
EEFSEC, a protein-coding gene, is a special translation extension factor in eukaryotes that promotes the incorporation of selenocysteine into selenoproteins [19]. Selenoproteins are essential for protecting cell membranes and DNA from oxidative damage, maintaining REDOX balance and selenium homeostasis, and aiding protein folding and gene expression [20]. Seryl-tRNA synthetase (SerRS) can bind and mediate TR of specific mRNAs by recruiting certain components of the selenocysteine incorporation machinery, including tRNASec and the corresponding elongation factor, EEFSEC, for ribosomal delivery [21]. The elongation of eukaryotic selenoproteins relies on a poorly understood process of interpreting in-frame UGA stop codons as selenocysteine (Sec) [22]. Selenoprotein deficiency and selenoprotein mutations can cause systemic disease in humans, often fatal disease [23]. A phenome-wide association study demonstrates associations of EEFSEC with anthropometric, cardiovascular, diabetes, bone health, and sex hormone traits that are related to endometrial cancer risk factors [24]. Our SMR analysis suggests that EEFSEC has a protective effect on EM, and as gene expression decreases, the risk of disease increases, and the underlying mechanism is likely selenoprotein deficiency and selenoprotein mutation, which affect the function of resisting oxidative stress and maintaining a stable intracellular environment. Through colocalization analysis, H4 was 0.882, which can be used as a drug target.
HCG22 is an RNA gene, and is affiliated with the lncRNA class, which plays a role in the process of gene epigenetic inheritance, gene translation, gene transcription, and posttranscriptional regulation and is closely related to many diseases [25]. A study has revealed HCG22 as a novel genetic susceptibility locus, emphasizing the role of genetically driven inflammatory processes in the pathogenesis of idiopathic dilated cardiomyopathy [26]. Another study indicates that it can also alleviate symptoms of lupus nephritis by affecting inflammation and immune complex deposition [27]. Research has shown that inflammation and immune function play a crucial role in the development of EM [5]. Our study demonstrates a positive correlation between HCG22 and the risk of EM. HCG22 likely participates in inflammation and immune function by acting as a “molecular sponge” for miRNA, and further exploration through molecular experiments is necessary. In colocalization analysis, H4 has a value of 0.87, indicating its potential as a candidate drug target.
INO80E is a protein-coding gene located in the nucleolar and nucleoplasm, part of the INO80 complex [28]. The INO80 complex is involved in the regulation of gene transcription, DNA repair, chromatin remodeling, and various other biological processes [29]. A study has indicated that the INO80 chromatin-remodeling complex plays a crucial role in controlling the inheritance of epigenetic traits and maintaining genome stability by regulating histone modification transmission. This research has also opened up a new avenue for a deeper understanding of the regulation mechanism of epigenetic memory [30]. Specifically, the INO80 complex plays a significant role in DNA replication and repair processes through its alteration of chromatin structure [31]. The mechanism of its influence on EM may be closely related to the cellular processes involved in chromatin remodeling, DNA repair, and transcriptional regulation. Our study suggests that INO80E has a protective effect on EM. Through colocalization analysis, the H4 value is 0.83, indicating its potential as a drug target, and it exhibits high binding affinity with three important compounds: probucol, bedaquiline, and midostaurin.
RSPRY1 is a protein-coding gene that encodes a glycoprotein. It consists of two structural domains: the ring finger domain, located at the N-terminus, which possesses RNA-binding activity, and the SPRY domain, located at the C-terminus, which exhibits phosphorylation activity [32]. The primary function of RSPRY1 is to bind to RNA and participate in signal transduction pathways. It is involved in various signaling pathways within cells, including cell cycle regulation, cell proliferation, apoptosis, and cell survival. During the cell cycle, RSPRY1 binds to RNA in the G1 and G2 phases, contributing to cell growth and DNA replication. In apoptosis, RSPRY1 participates in the apoptotic signaling pathway, promoting cell apoptosis [33]. Our study suggests that elevated expression of the RSPRY1 gene may increase the risk of EM development. Through SMR analysis, a strong causal relationship between the two has been identified, indicating its potential as a diagnostic marker. Through the colocalization analysis, the H4 value of 0.94 is the highest among four genes, highlighting its potential as a valuable drug target.
The ADK gene plays a crucial role in genetic regulation and cellular metabolism. It catalyzes the transfer of phosphate between ATP and adenosine, controlling the cellular concentration of adenosine [34]. More importantly, it ensures the smooth progress of intracellular methylation reactions by participating in maintaining methylation reactions, thereby avoiding the accumulation of methylation products, which is essential for maintaining normal DNA methylation status [35]. It also regulates hormone secretion and fat metabolism, modulates insulin function, and influences the onset of many diseases [36]. A study found that ADK can increase liver DNA methylation, decrease fatty acid oxidation, and enhance macrophage proinflammatory polarization by increasing interferon-stimulated gene expression [37]. Although there is limited research on the association between ADK and EM at present, our study suggests that a decrease in ADK expression increases the risk of developing EM, indicating its protective role in EM. The potential mechanism involves a decrease in the regulatory role of ADK in methylation and macrophage polarization, resulting in changes in hormone levels and exacerbation of inflammation as ADK expression decreases, thereby increasing the likelihood of EM occurrence. This suggests that ADK may serve as a promising diagnostic marker in clinical practice.
LDAH is a protein-coding gene located in the endoplasmic reticulum that enables lipase activity and is involved in lipid storage [38]. Lipid droplets are dynamic organelles that accumulate under different pathological conditions, especially in cells associated with inflammatory stress [39]. Upregulation of LDAH increases the hydrolysis and efflux of cholesterol esters in macrophage lipoproteins, promoting atherosclerotic lesions [40]. EM, as a chronic inflammatory disease, is closely related to the pathophysiology of macrophages [41]. LDAH affects the function of macrophages, thereby influencing the outcome of chronic inflammation. This is likely to be an important factor in the occurrence and development of EM, although further experimental validation is needed.
RAP1GAP is a protein-coding gene that encodes a GTPase-activating protein (GAP), with its main function being the downregulation of the activity of RAP1 proteins associated with the Ras family. It plays a crucial role in various physiological and pathological processes such as cell growth, differentiation, migration, and tumor development [42]. As a kinase in the RAP1GAP/MEK/ERK/HIF-1α signaling pathway, it exerts important functions in colorectal cancer [43]. Increasing evidence suggests a close relationship between RAP1GAP and gynecological diseases, as it can inhibit the tumor progression of endometrial cancer by affecting cell migration and invasion [44]. In cervical cancer cells, Rap1GAP may undergo autophagic degradation and affect cervical cancer through the ubiquitin-proteasome pathway mediated by E6AP [45]. Butyrate esters can inhibit the RAP1 signaling pathway in epithelial cells, thereby suppressing the growth of EM cells [46]. This is consistent with our conclusion that the increased expression of RAP1GAP gene reduces the incidence of EM, which may be related to the downregulation of RAP1 protein activity related to Ras family.
It is worth noting that, through virtual screening and molecular docking, we identified three important compounds: probucol, bedaquiline, and midostaurin. Probucol is an antioxidant primarily used to lower cholesterol levels and combat oxidative stress, thereby contributing to the prevention of cardiovascular diseases. Bedaquiline is an ATP synthase inhibitor used to treat multidrug-resistant tuberculosis by disrupting bacterial energy metabolism and inhibiting growth. Midostaurin is a tyrosine kinase inhibitor specifically targeting Fms-Related Receptor Tyrosine Kinase 3 (FLT3) gene mutations in acute myeloid leukemia, slowing down cancer cell proliferation. These compounds may play a role in the prevention and treatment of EM, although further experimental and clinical trials are necessary for validation.
Our study identified 7 genes associated with EM, of which 4 genes were pinpointed as potential drug targets. We also selected some compounds that could potentially be developed into drugs. Meaningful research has been conducted in exploring the etiology and pathogenesis of EM, discovering new therapeutic targets and personalized treatment strategies, as well as improving diagnostic methods and preventive measures. However, we must acknowledge the limitations of our research. Although SMR is a causal inference method based on genetic instrumental variables that can help determine causal relationships between variables and has important potential applications in causal inference and biological mechanism exploration [16], it also has its shortcomings. These include data limitations due to reliance on publicly available summary data, potential confounding bias from genetic variant-phenotype associations, presence of potential horizontal pleiotropy, and issues related to insufficient statistical power [47]. Most datasets are concentrated on populations of European descent, lacking GWAS data from nonethnic and different gender groups. Considering the differences in genetic backgrounds and linkage disequilibrium patterns, these population background discrepancies may introduce potential biases in MR effect estimation. Furthermore, our colocalization analysis is constrained by the availability of existing research findings, meaning that we cannot examine genetic variants that have not yet been discovered. Additionally, our in vitro experiments, although utilizing clinical samples, were only validated at the expression level without further in-depth investigation. Lastly, the accuracy of molecular docking analysis largely depends on the quality of protein structures and ligands [48]. While this method identifies potential drug targets, it does not guarantee their effectiveness in a clinical setting. Subsequent experimental validation and clinical trials are necessary to elucidate the therapeutic potential of the identified targets.
Conclusions
In summary, using the SMR method, we identified 7 genes (EEFSEC, INO80E, RAP1GAP, LDAH, RSPRY1, HCG22, ADK) that affect the outcome of EM and can serve as potential diagnostic markers. Colocalization analysis revealed that 4 of these genes (EEFSEC, HCG22, INO80E, RSPRY1) could be potential drug targets. Finally, through virtual screening and molecular docking, we identified three important compounds, probucol, bedaquiline, and midostaurin, which may be used for the prevention or treatment of EM. This research has identified potential diagnostic markers and drug targets for EM from a genetic perspective. However, further research is needed to fully elucidate the underlying mechanisms involved.
Acknowledgments
We want to acknowledge the participants and investigators in the original research of this MR study.
Statement of Ethics
The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki. The study was approved by the Ethics Committee of Hangzhou Women’s Hospital (Hangzhou, China) (protocol code: [2022] Medical Ethics Review A (7)-05; date of approval: October 12, 2022) and written informed consent was obtained from all individual participants.
Conflict of Interest Statement
The authors have no conflicts of interest to declare.
Funding Sources
Grant support was provided by the Zhejiang Provincial Medical and Health Technology Plan (2023KY205), Zhejiang Provincial Natural Science Foundation of China (Q24H040012), Foundation of Zhejiang Provincial Education Department (Y202351214), and Administration of Traditional Chinese Medicine of Zhejiang Province, China (2024ZL740).
Author Contributions
Y.Z., L.Z., and X.Y. contributed to the material preparation, data collection, and analysis. F.C. and M.H. prepared the diagrams. Y.Z. and F.C. drafted the manuscript. W.W. conceived and designed the research and reviewed the manuscript. All authors reviewed the manuscript and approved the version to be published.
Additional Information
Yingjia Zhu and Feng Cheng contributed equally to this work.
Data Availability Statement
The original contributions presented in the study are included in the article/online supplementary material. Further inquiries can be directed to the corresponding authors.