Abstract
Background: This study aimed to screen and validate the crucial genes involved in osteoarthritis (OA) and explore its potential molecular mechanisms. Methods: Four expression profile datasets related to OA were downloaded from the Gene Expression Omnibus (GEO). The differentially expressed genes (DEGs) from 4 microarray patterns were identified by the meta-analysis method. The weighted gene co-expression network analysis (WGCNA) method was used to investigate stable modules most related to OA. In addition, a protein-protein interaction (PPI) network was built to explore hub genes in OA. Moreover, OA-related genes and pathways were retrieved from Comparative Toxicogenomics Database (CTD). Results: A total of 1,136 DEGs were identified from 4 datasets. Based on these DEGs, WGCNA further explored 370 genes included in the 3 OA-related stable modules. A total of 10 hub genes were identified in the PPI network, including AKT1, CDC42, HLA-DQA2, TUBB, TWISTNB, GSK3B, FZD2, KLC1, GUSB, and RHOG. Besides, 5 pathways including “Lysosome,” “Pathways in cancer,” “Wnt signaling pathway,” “ECM-receptor interaction” and “Focal adhesion” in CTD and enrichment analysis and 5 OA-related hub genes (including GSK3B, CDC42, AKT1, FZD2, and GUSB) were identified. Conclusion: In this study, the meta-analysis was used to screen the central genes associated with OA in a variety of gene expression profiles. Three OA-related modules (green, turquoise, and yellow) containing 370 genes were identified through WGCNA. It was discovered through the gene-pathway network that GSK3B, CDC42, AKT1, FZD2, and GUSB may be key genes related to the progress of OA and may become promising therapeutic targets.
Introduction
Osteoarthritis (OA) is a chronic degenerative disease that causes joint failure, pain and disability [1]. OA currently affects about 25% of people over the age of 18 and may be the biggest cause of disability among patients over the age of 40 [2, 3]. There are many factors for the development of OA, including abnormal mechanical stress, aging, obesity and genetic factors [4]. Articular cartilage degeneration, synovial inflammation and atypical bone formation are the main pathological features of OA [5]. Studies have shown that synovial tissue plays an important role in OA [6, 7]. Moreover, synovial lesions have been identified in many joint diseases, which may play a role in promoting the occurrence and development of the disease [8].
At present, more and more evidence indicate that the occurrence and development of OA may be mediated by many genes and signaling pathways [9]. In order to formulate clearer diagnostic criteria and more effective treatment options, the molecular mechanism of OA must be fully understood. With the development of bioinformatics, a lot of researches have been conducted on the gene expression profile of OA [10-12]. Although previous studies have used microarrays to identify some differentially expressed genes (DEGs) [13, 14], there may be some inconsistencies between the studies due to differences in sample size and quality. The meta-analysis method is a systematic method of researching and combining different public datasets, which can improve the accuracy and reliability of data analysis.
Weighted gene co-expression network analysis (WGCNA) is a novel system biology method for molecular interaction mechanism analysis and related network analysis [15-17]. Based on high-throughput microarray data, WGCNA can be used to find modules for co-expression and explore the relationship between gene networks and clinical phenotypes at the transcriptome level [18, 19], thereby providing novel insights for further OA research.
In this study, several microarray profiles of synovial tissue samples from OA patients and normal human synovial tissue samples were downloaded, and feature factors (DEGs) were screened by meta-analysis. WGCNA was further performed on DEGs to identify important modules and genes. Besides, a protein-protein interaction (PPI) network was established, and key genes that may be involved in the OA process were identified.
Materials and Methods
Processing of Microarray Data
The expression profile dataset was retrieved from the Gene Expression Omnibus (GEO) [20] (http://www.ncbi.nlm.nih.gov/geo/) with “osteoarthritis” and “Homo sapiens” as keywords (downloaded on May 3, 2020). The inclusion criteria were as follows: (i) synovial tissue in patients with OA; (ii) the samples were clearly grouped (OA vs. normal); and (iii) total number of samples ≥15. Finally, a total of 4 microarray datasets were screened, including GSE55457, GSE55235, GSE82107, and GSE12021 (Table 1).
Screening of Feature Genes
In order to screen out more reliable genes related to OA, a meta-analysis was performed on the above 4 microarray datasets. First, MetaQC package [21] was used to control the quality of the dataset based on the following standards: (i) Internal quality control (IQC, Homogeneity test of co-expression structure between studies); (ii) external quality control (EQC, consistency of co-expression pattern with pathway database); (iii) accuracy quality control (AQCg or AQCp, the accuracy of differentially expressed gene or enriched pathway identification); (iv) consistency quality control (CQCg or CQCp, consistency of differentially expressed gene detection or enriched pathway ranking). In addition, principal component analysis (PCA) and the standardized mean rank (SMR) were used to evaluate and filter the microarray datasets. By definition, 0 < SMR ≤1 and large SMR represented a likely problematic study [21].
Then, MetaDE. ES in the MetaDE package [22] was used to screen for significantly DEGs. The MetaDE.ES first performs a heterogeneity test on the expression value of each gene under different sequencing platforms to obtain τ2, and Q value. Q value >0.05 or τ2 = 0 indicated the expression of genes was homogenous and unbiased. Then, the differential expression of genes between groups was calculated to obtain p value, false discovery rate (FDR) and logFC (fold change). Finally, the DEGs were screened based on the following criteria: (i) the expression of genes was homogenous and unbiased (τ2 = 0 and Q value >0.05); (ii) FDR <0.05; and (iii) logFC of all dataset was consistently >0 or consistently <0. Based on DAVID 6.8 [23, 24], enrichment analysis of gene ontology (GO) categories and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways were performed on the selected DEGs, and p < 0.05 was chosen as the screening threshold.
Screening Disease-Associated Modules Using WGCNA
WGCNA is a bioinformatics algorithm that builds a co-expression network to identify modules associated with disease and screen out important pathogenic mechanisms or potential therapeutic targets [25]. First, the GSE55457 with a relatively large number of samples and clinical information was set as the training set, and the remaining 3 data sets (GSE55235, GSE12021, and GSE82107) were regarded as the validation sets. Then, the WGCNA Version 1.61 package [16] (https://cran.r-project.org/web/packages/WGCNA/index.html) in the R3.4.1 language was used to screen for significant stable modules related to disease across datasets. Based on clustering and dynamic pruning, highly related genes were clustered into modules. The threshold was set to the number of RNAs in the module set ≥30 and cut height = 0.99. Finally, the WGCNA modules related to OA were determined.
PPI Network Construction
The STRING database (Version:10.0, http://string-db.org/) was used to search for the interactions between DEGs encoded proteins and those with interaction scores >0.6 were selected to construct a PPI network. The network was visualized through Cytoscape Version 3.6.1 (http://www.cytoscape.org/). Based on DAVID 6.8, GO functional annotation and KEGG pathway enrichment analysis were performed on DEGs-encoded proteins. p < 0.05 was selected as the significance threshold.
Then, the topology structure of the PPI network was analyzed. By calculating the topological structure parameters of Degree (the connection with other genes), betweenness centrality (BC), closeness centrality (CC), and Path length in the PPI network, the hub genes that play important roles in network connection were screened. Then, the descending order was sorted according to degree.
The Gene-Pathway Network Directly Correlated with OA Construction
Using “osteoarthritis” as a keyword, the KEGG pathways directly related to OA were searched from the Comparative Toxicogenomics Database Update 2019 database [26] (http://ctd.mdibl.org/). Then, the retrieved pathways were compared with pathways in which genes in the PPI network participate to obtain overlapping pathways. Genes involved in overlapping pathways were extracted from the PPI network to construct a gene-pathway network related to OA.
Results
Feature Gene Selection
A total of 11,812 common genes were detected on multiple platforms from the GEO. Table 2 shows the QC results for all 4 microarray datasets. The SMRs of the 4 datasets were very close, indicating that the datasets had low bias and high consistency. Further, the two-dimensional graph of PCA showed that the cumulative contribution rate of 1 and 2 principal components was >80%, and the 4 microarray patterns were all located in the parameter group, with little difference in position (Fig. 1a). These results indicated the 4 datasets were of good quality and were comparable.
Using the MetaDE software package, a total of 1,136 DEGs were identified from 4 microarray patterns. Subsequently, based on the expression values of DEGs in each data set, and bidirectional hierarchical clustering analysis was performed. The clustering analysis showed that DEGs could clearly distinguish the OA samples from the control samples (Fig. 1b), indicating that the expression of DEGs were significantly different. Besides, it can also be found from Figure 1b that the gene expression patterns of DEGs were relatively consistent in the 4 datasets.
Functional Analysis of Feature Genes
GO functional and KEGG pathway enrichment analysis was performed on DEGs. As a result, 109 significantly related GO annotations and 12 KEGG pathways were obtained. The results showed that the DEGs were mainly involved in the biological processes of “positive regulation of cell proliferation” (p = 6.420E−06), “regulation of cell motion” (p = 6.460E−06), and “regulation of locomotion” (p = 1.610E−05) (Fig. 2a). The most significant enrichment pathways were “pathways in cancer” (p = 1.790E−02), “adipocytokine signaling pathway” (p = 3.313E−03), and “lysosome” (p = 1.598E−03) (Fig. 2b).
Identification of Modules Related to OA Using WGCNA
In order to confirm whether the genes in each dataset were comparable, the gene expression levels and connectivity patterns in any 2 datasets were compared. As shown in Figure 3, the gene expression levels and connectivity patterns between each pair of datasets have a significant positive correlation (p < 0.05). Therefore, the data were comparable.
The dataset GSE55457 with relatively large number of samples and clinical information was used as the training set, and the remaining 3 datasets (GSE55235, GSE12021, and GSE82107) were used as the validation set. Then, the value of adjacency matrix weighting (power) was explored by using the samples in the training set of GSE55457. The relative balance of scale independence and average connectivity of WGCNA was determined. As shown in Figure 4, power = 7 was selected when the square value of the correlation coefficient reaches 0.9 for the first time. Next, the dissimilarity coefficients of DEGs were calculated and a systematic clustering tree was obtained. According to the standard of mixed dynamic shearing trees, the least gene number of each gene network was set to 30 and the cut-height was set to 0.99. As shown in the clustering dendrograms of Figure 5a, a total of 10 modules related to OA were finally obtained in the training set. In order to assess the robustness of the modules identified in the training set, the co-expression module generated in the training dataset were reproduced in the verification datasets (Fig. 5b–d) and the module preservation was among the 4 datasets calculated by summary preservation statistics in WGCNA. The Z-summary statistics of the 3 modules (green, turquoise, and yellow) were all >5 and p ≤ 0.05, indicating these modules were relatively stable (Table 3). This indicates that the 3 modules have been saved between the training data set and the verification data set.
To explore the clinical significance of the module, the correlation between module eigengenes (MEs) and gender, age, and status was analyzed. Figure 6 shows the heat map of the correlation between MEs and the disease factors of OA. The results showed that the gray (correlation = −0.87, p < 0.001), pink (correlation = −0.49, p = 1E−70), red (correlation = −0.64, p = 4E−132), brown (correlation = −0.52, p = 3E−79), and turquoise (correlation = −0.7, p = 8E−171) modules were negatively correlated with disease status, while magenta (correlation = 0.63, p = 6E−127), blue (correlation = 0.62, p = 6E−123), green (correlation = 0.7, p = 6E−165), black (correlation = 0.59, p = 3E−109), and yellow (correlation = 0.6, p = 2E−112) were positively correlated with disease status. These 3 modules, which contained a total of 370 genes, were selected as clinically important modules for further analysis.
Generation and Annotation of OA-Associated Module Network
The STRING database was used to search the interactions among the DEG-encoded proteins contained in the 3 modules, and proteins with an interaction score >0.6 were selected to construct a PPI network (Fig. 7). In addition, the top 10 hub genes were identified according to ascending order of Degree, including AKT1, CDC42, HLA-DQA2, TUBB, TWISTNB, GSK3B, FZD2, KLC1, GUSB, and RHOG (Table 4).
All these proteins have undergone enrichment analysis of GO function and KEGG pathway, and a total of 48 significantly related GO annotations and 6 KEGG pathways were obtained. Figure 8a shows the top 10 GO function entries sorted in ascending order of FDR value. The results showed that DEGs were mainly involved in the biological processes of “wnt receptor signaling pathway” (p = 5.160E−06), “protein amino acid phosphorylation” (p = 1.305E−03) and “cell surface receptor linked signal transduction” (p = 2.831E−03). As for the cellular component, the DEGs were mainly enriched in “vacuole” (p = 9.500E−04), “endoplasmic reticulum” (p = 1.147E−03) and “endomembrane system” (p = 2.705E−03). Regarding molecular function, the DEGs were mainly enriched in “purine ribonucleotide binding” (p = 2.100E−04), “ribonucleotide binding” (p = 2.100E−04), and “purine nucleotide binding” (p = 5.150E−04). Moreover, KEGG pathway enrichment analysis suggests that “lysosome” (p = 7.930E−04), “glycosaminoglycan degradation” (p = 5.149E−03), and “pathways in cancer” (p = 9.377E−03) were the most critical pathway in OA (Fig. 8b).
The Gene-Pathway Network Directly Correlated with OA
A total of 171 KEGG pathways directly related to OA were obtained from the CTD database. A total of 5 overlapping pathways including “lysosome,” “pathways in cancer,” “Wnt signaling pathway,” “ECM-receptor interaction,” and “focal adhesion” were obtained by comparing the retrieved pathways with those enriched from the PPI network. Based on overlapping pathways, an interaction network directly related to OA was extracted (Fig. 9). In addition, 5 hub genes related to OA (GSK3B, CDC42, AKT1, FZD2, and GUSB) were also included in this network. Among them, the GSK3B was simultaneously involved in multiple pathways, suggesting its crucial role in the network as well as the development of OA.
Discussion
In this study, a total of 1,136 DEGs were identified from 4 databases through MetaDE. Based on these DEGs, 3 OA-related modules (green, turquoise, and yellow) containing 370 genes were identified by WGCNA. The STRING database was used to search the interactions among the DEG-encoded proteins contained in the 3 modules, and proteins with an interaction score >0.6 were selected to construct a PPI network. Ten hub genes were detected in the PPI network, including AKT1, CDC42, HLA-DQA2, TUBB, TWISTNB, GSK3B, FZD2, KLC1, GUSB, and RHOG. The DEGs in the PPI network were enriched with 48 significantly related GO annotations and 6 KEGG pathways. Five pathways in the gene-pathway network, including “lysosome,” “pathways in cancer,” “Wnt signaling pathway,” “ECM-receptor interaction,” and “focal adhesion,” comprised 5 hub genes (GSK3B, CDC42, AKT1, FZD2, and GUSB) that were related to OA.
There are 5 top 10 hub genes in the gene-pathway network, namely GSK3B, CDC42, AKT1, FZD2, and GUSB. GSK3B is an essential regulator of articular cartilage destruction and plays multiple functions in the pathological process of osteoarthritis [27]. Studies have shown that GSK3B can regulate various cell activities including cell growth, cell survival, metabolism, and apoptosis gene expression [28, 29]. Wu et al. [30] showed that inhibition of GSK3B might prevent cartilage formation, and our results also indicate that GSK3B is down-regulated in OA. Therefore, GSK3B may play an important role in the cartilage formation of OA. CDC42 belongs to the small GTP-binding protein in the Ras superfamily [31]. Hu et al. [32] found that the genetic destruction of CDC42, knocking down CDC42 expression or inhibiting CDC42 activity can significantly restore the increase in the number of mesenchymal stem cells, osteoprogenitor cells, osteoblasts, osteoclasts, and new blood vessels forming. In this study, CDC42 was significantly down-regulated, which indicates CDC42 may play a role in the growth and development of OA cells.
AKT1 is a serine/threonine-specific protein kinase that mainly controls cell growth and survival [33]. In human cells, AKT1 activation occurs in response to growth factor stimulation. AKT1 can promote the cartilage calcification by inhibiting the accumulation of inorganic pyrophosphate in chondrocytes, thereby controlling the bone growth and osteophyte formation in OA [34]. FZD2 is a member of the Fzd receptor family of cell membranes on the Wnt pathway [35], which is involved in the activation and repair of skeletal muscle satellite cells [36, 37]. GUSB is a lysosomal enzyme that is important in the normal gradual degradation of glycosaminoglycans. Studies have shown that GUSB deficiency causes lysosomal storage disease mucopolysaccharide storage disease VII (MPS VII), and MPS VII can cause mental retardation, hepatomegaly and splenomegaly, and skeletal dysplasia [27]. AKT1, FZD2, and GUSB were all upregulated in OA, which may play an important role in the occurrence and development of OA.
The Wnt signaling pathway is a key regulator of endochondral ossification, which is the process of osteochondral formation. The Wnt signaling pathway is normally inhibited, but it will promote the development of OA when it is activated [38]. In addition, activation of the Wnt signaling pathway leads to increased chondrocyte catabolism, hypertrophy-like changes, and cartilage degradation [39]. Therefore, Wnt signal pathways may provide a new key to control the cartilage formation of OA to understand the conduction mechanism. Focal adhesion is a transmembrane protein complex, which can maintain multiple links with intracellular organelles through the cytoskeleton, and can attach chondrocytes to the surrounding extracellular matrix at the same time [40]. Studies have shown that the ECM-receptor interaction pathway and focal adhesion pathway are involved in the pathogenesis of OA [41]. Jang et al. [40] found that most acute chondrocyte death occurs in a focal adhesion kinase-dependent and Src family kinase-dependent manner, so inhibition of adhesion plaques can prevent cartilage cell death. Therefore, ECM-receptor interaction and focal adhesion pathway may help to inhibit chondrocyte death in OA. In addition, in this study, 4 top 10 hub genes (GSK3B, CDC42, AKT1, and FZD2) were involved in “pathways in cancer,” which indicates that the progression of OA is significantly related to the “pathways in cancer.” The limitation of this study is that the selected genes and pathways have not been experimentally verified, which may become the focus of future research.
Conclusion
In this study, the meta-analysis was used to screen the central genes associated with OA in a variety of gene expression profiles. Three OA-related modules (green, turquoise, and yellow) containing 370 genes were identified through WGCNA. In addition, 10 hub genes were identified in the PPI network, namely AKT1, CDC42, HLA-DQA2, TUBB, TWISTNB, GSK3B, FZD2, KLC1, GUSB, and RHOG. Finally, it was discovered through the gene-pathway network that GSK3B, CDC42, AKT1, FZD2, and GUSB may be key genes related to the progression of OA and may become promising therapeutic targets.
Acknowledgements
We are very grateful to the individuals who participated in this study.
Statement of Ethics
Not applicable because this study did not involve any in vitro and in vivo experiments.
Conflict of Interest Statement
The authors have no conflicts of interest to declare.
Funding Sources
The authors did not receive any funding.
Author Contributions
X.Z. performed the literature review, created the figures, and wrote the review article; W.G. contributed content expertise and performed critical revisions necessary for the final version of the review article.