Abstract
Introduction: Chronic kidney diseases (CKD) encompass a spectrum of complex pathophysiological processes. While numerous genome-wide association studies (GWASs) have focused on individual traits such as albuminuria, estimated glomerular filtration rate (eGFR), and eGFR change, few genetic studies have integrated these traits for a comprehensive evaluation. Methods: We performed individual GWASs for albuminuria, baseline eGFR, and eGFR slope using data from non-diabetic individuals enrolled in the Taiwan Biobank (TWB). We then applied principal component analysis to transform these three quantitative traits into principal components (PCs) and performed a GWAS on each PC (PC-based GWAS). Results: The individual GWASs of albuminuria, baseline eGFR, and eGFR slope identified 10, 210, and 13 candidate loci, respectively, of which 2, 99, and 3 were previously reported loci. PC-based GWAS identified 20 additional novel candidate loci linked to CKD (p values ranging from 5.8 × 10⁻⁷ to 9.1 × 10⁻⁶). Notably, 4 of these 20 single nucleotide polymorphisms (rs9332641, rs10737429, rs7147621, and rs73360624) exhibited significant associations with kidney expression quantitative trait loci. Conclusion: To our knowledge, this study represents the first PC-based GWAS integrating albuminuria, baseline eGFR, and eGFR slope. Our approach found 20 novel candidate loci suggestively associated with CKD, underscoring the value of integrating multiple kidney traits in unraveling the pathophysiology of this complex disorder.
Plain Language Summary
Genome-wide association studies (GWASs) of chronic kidney disease (CKD) traits have been widely performed, but a comprehensive approach that integrates these individual traits is lacking. We applied principal component analysis to obtain principal components (PCs) of albuminuria, estimated glomerular filtration rate (eGFR), and eGFR slope from the Taiwan Biobank. Using these PCs as pseudotraits, we performed the first PC-based GWAS to evaluate genetic factors associated with CKD and found twenty novel candidate loci. Our results provide new understanding of possible common pathogeneses of CKD.
Introduction
Chronic kidney disease (CKD) is an important medical concern with complex underlying pathophysiology, affecting more than 10% of the general population worldwide [1]. Many genome-wide association studies (GWASs) have identified risk loci for CKD, generally defined as estimated glomerular filtration rate (eGFR) <60 mL/min/1.73 m², or for eGFR per se [2‒10]. However, many patients with stage 3–5 CKD have rather stable kidney function for years, and thus the change in eGFR over time is more important than a cross-sectional eGFR for predicting long-term prognosis [11]. Among the factors affecting eGFR trajectory, the degree of albuminuria is a major determinant [12]. The National Kidney Foundation has suggested using albuminuria and eGFR slope to replace the conventional outcome of 30% reduction of eGFR or kidney failure in clinical trials [13]. Some GWASs have focused on identifying genetic factors associated with albuminuria [14‒16], and a few have searched for risk loci for eGFR change [17‒19], with or without a clearly described time interval. In prior studies these traits were analyzed separately rather than as a whole, possibly because previous univariate GWASs of eGFR, albuminuria, and eGFR change revealed associations with different loci.
A single quantitative trait offers only a partial understanding of a complex disease; therefore, a GWAS focusing on one trait can uncover only a fraction of the SNPs linked to the disease. Even when all univariate GWAS results are combined, some true loci associated with the disease will still be missed because they may not reach significance in any GWAS of the individual traits. Since albuminuria, cross-sectional eGFR, and eGFR slope are quantitative traits, we can apply principal component analysis (PCA) to transform them into principal components (PCs). Using these PCs as pseudotraits to perform GWAS (PC-based GWAS), we expect to identify additional candidate loci associated with CKD.
This study aimed to discover novel risk loci for CKD. Using data from non-diabetic participants of the Taiwan Biobank (TWB), we performed univariate GWASs for albuminuria, baseline eGFR, and eGFR slope, followed by the multiple-trait PC-based GWAS. Twenty novel candidate loci associated with kidney function were found by PC-based GWAS.
Methods
Participants
The TWB is a prospective study of volunteers 30–70 years of age residing in Taiwan [20]. A total of 140,070 participants were recruited between January 2012 and December 2021. Biological specimens and personal and clinical information were analyzed as delinked data. This study was approved by the Institutional Review Board of the National Taiwan University Hospital (No.: 202312110RINA). All subjects provided written informed consent, and all methods were carried out in accordance with the Declaration of Helsinki and relevant regulations. Individuals with diabetes mellitus (either stated in the questionnaire or with HbA1c ≥ 6.5%) were excluded from this study because diabetes substantially affects eGFR.
Albuminuria Measurement
Urine albumin levels were measured using automatic quantitative turbidimetric immunoassays while urine creatinine levels were measured using the compensated Jaffe method and a chemistry analyzer (Hitachi LST008). Albuminuria is presented as urine albumin-to-creatinine ratio (UACR) and is log-transformed as it is not normally distributed.
Serum Creatinine Measurement, Cross-Sectional eGFR, and eGFR Slope Definition
Serum creatinine level was measured using a chemistry analyzer (Hitachi LST008) with the compensated Jaffe method. eGFR was calculated using the 2021 CKD-EPI creatinine equation: $\mathrm{eGFR}\ (\mathrm{mL/min/1.73\ m^2}) = 142 \times \min(\mathrm{Scr}/\kappa, 1)^{\alpha} \times \max(\mathrm{Scr}/\kappa, 1)^{-1.2} \times 0.9938^{\mathrm{Age}} \times 1.012\ [\text{if female}]$, where Scr is serum creatinine, κ is 0.7 for females and 0.9 for males, α is −0.241 for females and −0.302 for males, min indicates the minimum of Scr/κ or 1, and max indicates the maximum of Scr/κ or 1 [21]. eGFR slope was defined as the annual change in eGFR, that is, the latest follow-up eGFR (eGFR2) minus the baseline eGFR (eGFR1), divided by the follow-up duration in years: $(\mathrm{eGFR}_2 - \mathrm{eGFR}_1)/\text{duration}$.
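As a minimal sketch of the two definitions above, the following Python functions compute the 2021 CKD-EPI creatinine-based eGFR and the eGFR slope. The function and parameter names are illustrative and not taken from the study, and serum creatinine is assumed to be in mg/dL, a unit the text does not state.

```python
def egfr_ckd_epi_2021(scr_mg_dl: float, age_years: float, female: bool) -> float:
    """2021 CKD-EPI creatinine equation, in mL/min/1.73 m^2.

    Assumption: serum creatinine is given in mg/dL.
    """
    kappa = 0.7 if female else 0.9
    alpha = -0.241 if female else -0.302
    ratio = scr_mg_dl / kappa
    value = (
        142.0
        * min(ratio, 1.0) ** alpha
        * max(ratio, 1.0) ** -1.2
        * 0.9938 ** age_years
    )
    # The 1.012 factor applies to females only.
    return value * 1.012 if female else value


def egfr_slope(egfr_baseline: float, egfr_followup: float, years: float) -> float:
    """Annual eGFR change: (eGFR2 - eGFR1) / follow-up duration in years."""
    if years <= 0:
        raise ValueError("follow-up duration must be positive")
    return (egfr_followup - egfr_baseline) / years


if __name__ == "__main__":
    # Hypothetical participant: 50-year-old woman, creatinine 0.8 mg/dL.
    baseline = egfr_ckd_epi_2021(0.8, 50, female=True)
    print(round(baseline, 1))
    # Hypothetical follow-up: eGFR falls from 100 to 95 over 5 years.
    print(egfr_slope(100.0, 95.0, 5.0))
```

A negative slope indicates declining kidney function, which matches the sign convention of the definition above.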
Genotyping, Imputation, and Quality Control
The TWB genotyped participants using two custom single-nucleotide polymorphism (SNP) arrays designed to enhance genotyping efficiency and accuracy. To harmonize the data from the two arrays, genotypes from the TWB v1.0 array were lifted over from Genome Reference Consortium Human Build 37 (GRCh37) to GRCh38 to be concordant with the TWB v2.0 array. SHAPEIT4 [22] and IMPUTE2 [23] were used for phasing and imputation, with both the TWB dataset and the 1000 Genomes reference panel serving as imputation reference panels. This approach allowed genotype information from the two custom chips to be integrated. For quality control, markers located on the sex chromosomes were excluded. SNPs with a genotype call rate (GCR) below 95% or a minor allele frequency (MAF) lower than 1% were filtered out, as were SNPs failing the Hardy-Weinberg equilibrium test with a p value <1 × 10⁻⁸. After quality control, a total of 3,637,470 SNPs remained for the subsequent association analyses.
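The marker-level filters described above can be sketched as a simple predicate over per-SNP summary statistics. This is an illustration only, not the pipeline used in the study: the `SnpStats` record and its field names are hypothetical, and the call-rate cutoff is written as a minimum genotype call rate of 95%, which is an assumption about how the GCR criterion should be read.

```python
from dataclasses import dataclass


@dataclass
class SnpStats:
    """Hypothetical per-SNP summary used only for this illustration."""
    rsid: str
    chrom: str        # "1".."22", "X", or "Y"
    call_rate: float  # fraction of samples with a genotype call
    maf: float        # minor allele frequency
    hwe_p: float      # Hardy-Weinberg equilibrium test p value


def passes_qc(
    snp: SnpStats,
    min_call_rate: float = 0.95,  # assumption: GCR read as a 95% minimum
    min_maf: float = 0.01,        # MAF lower than 1% is removed
    min_hwe_p: float = 1e-8,      # HWE p value < 1e-8 is removed
) -> bool:
    """Return True if the SNP survives all marker-level filters."""
    if snp.chrom in {"X", "Y"}:   # sex-chromosome markers are excluded
        return False
    if snp.call_rate < min_call_rate:
        return False
    if snp.maf < min_maf:
        return False
    if snp.hwe_p < min_hwe_p:
        return False
    return True


if __name__ == "__main__":
    # Made-up records, not real TWB statistics.
    candidates = [
        SnpStats("rsA", "1", 0.99, 0.20, 0.5),
        SnpStats("rsB", "X", 0.99, 0.20, 0.5),
        SnpStats("rsC", "2", 0.90, 0.20, 0.5),
        SnpStats("rsD", "3", 0.99, 0.005, 0.5),
        SnpStats("rsE", "4", 0.99, 0.20, 1e-12),
    ]
    print([s.rsid for s in candidates if passes_qc(s)])
```

Keeping the thresholds as keyword arguments makes it easy to see how each criterion contributes to the final marker count.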
Statistical Analyses
Age, eGFR, eGFR slope, and follow-up duration are expressed as mean and standard deviation. UACR is expressed as median and interquartile range. GWASs were performed using PLINK v1.9 [24] under an additive genetic model. We used linear regression adjusted for age, sex, and the first ten principal components of ancestry to test the association of each SNP with logUACR, baseline eGFR, and eGFR slope. Because the Bonferroni correction would produce a substantial number of false negative results, the p value threshold for candidate loci was set at 1.0 × 10⁻⁵ in this study. Manhattan plots and quantile-quantile (Q-Q) plots were generated using the qqman R package [25]. GRCh38 was used for gene annotation. Principal component analyses were performed using the prcomp function in R [26].
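To illustrate the additive model described above, the sketch below fits an ordinary least-squares regression of a trait on allele dosage (0, 1, or 2 copies) plus covariates and returns the per-allele effect. It is a plain-Python teaching example with hypothetical function names and toy data, not the PLINK implementation, and it omits the standard error and p value that an association test would also report.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def additive_effect(trait, dosage, covariates):
    """Per-allele effect of a SNP on a quantitative trait.

    Each design row is [1, dosage, *covariates]; the coefficient of the
    dosage column is the additive effect adjusted for the covariates.
    """
    rows = [
        [1.0, float(g)] + [float(c) for c in cov]
        for g, cov in zip(dosage, covariates)
    ]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, trait)) for i in range(k)]
    return solve_linear_system(xtx, xty)[1]


if __name__ == "__main__":
    # Toy data: covariates are (age, sex); the study also adjusts for
    # ten ancestry principal components, omitted here for brevity.
    ages = [30, 40, 50, 60, 35, 45, 55, 65]
    sexes = [0, 1, 0, 1, 1, 0, 1, 0]
    dosage = [0, 1, 2, 0, 1, 2, 1, 0]
    trait = [100 - 0.5 * a + 2 * s - 1.5 * g
             for a, s, g in zip(ages, sexes, dosage)]
    print(round(additive_effect(trait, dosage, list(zip(ages, sexes))), 6))
```

Because the toy trait is generated without noise, the estimate recovers the simulated per-allele effect of −1.5.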
Results
Characteristics of Study Population
After quality control, the numbers of participants available for the albuminuria, baseline eGFR, eGFR slope, and PC-based GWASs were 11,280, 85,460, 29,480, and 10,887, respectively (Fig. 1). The demographics for each univariate GWAS and the PC-based GWAS are shown in Table 1.
Characteristic | Albuminuria GWAS | Baseline eGFR GWAS | eGFR slope GWAS | PC-based GWAS |
---|---|---|---|---|
Participants, N | 11,280 | 85,460 | 29,480 | 10,887 |
Age, years | 48.8±10.4 | 49.1±10.7 | 49.7±10.3 | 48.9±10.3 |
Male, N (%) | 4,285 (38.0) | 31,040 (36.3) | 9,912 (33.6) | 3,950 (36.3) |
UACRᵃ, mg/g | 8.355 (4.819–9.554) | - | - | 8.384 (4.839–9.587) |
Baseline eGFR, mL/min/1.73 m² | - | 105.9±13.2 | 105.1±13.0 | 106.5±12.2 |
eGFR slope, mL/min/1.73 m²/year | - | - | −0.76±1.38 | −0.73±1.24 |
ᵃ Expressed as median (Q1–Q3).
Univariate GWASs Results
The Manhattan plots of the albuminuria, baseline eGFR, and eGFR slope GWASs are shown in Figure 2 (the respective Q-Q plots are in online suppl. Fig. S1; for all online suppl. material, see https://doi.org/10.1159/000541982). The individual GWASs of albuminuria, baseline eGFR, and eGFR slope identified 10, 210, and 13 candidate loci, respectively, of which 2, 99, and 3 were previously reported to be associated with CKD. These candidate loci are listed in online supplementary Tables S1–S3.
PC-Based GWAS Results
A total of 10,887 participants had complete data on albuminuria, baseline eGFR, and eGFR slope. PCA of logUACR, baseline eGFR, and eGFR slope yielded three principal components (PC1–PC3), which together explain the total variance. The vectors representing the original three kidney traits on PC1-PC2 are illustrated in Figure 3. Manhattan plots of the PC1-PC3 GWASs are presented in Figure 4 (the respective Q-Q plots are in online suppl. Fig. S2), with the top candidate loci listed in Table 2; the implicated genes include AGMAT, F5, OBSCN, TRIML2, AGO2, APBA1, BAZ1A, GCKR, ZNF512, AKAP6, CACNA2D3, SEC24B, RNF157, and PTPRT. Of the thirty top SNPs from the PC-based GWAS, twenty were seen neither in our univariate GWAS results nor in previously published GWASs of albuminuria, cross-sectional eGFR, or eGFR slope (Table 3). Among these twenty novel loci, excluding the intergenic SNPs, there was one near the F5 gene on chromosome 1 (rs9332641; p = 8.39 × 10⁻⁶), another near the OBSCN gene on chromosome 1 (rs10737429; p = 6.66 × 10⁻⁶), a third near the CACNA2D3 gene on chromosome 3 (rs76352818; p = 4.54 × 10⁻⁶), a fourth near the SEC24B gene on chromosome 4 (rs139161904; p = 4.21 × 10⁻⁶), a fifth near the TRIML2 gene on chromosome 4 (rs67785655; p = 1.96 × 10⁻⁶), a sixth near the AGO2 gene on chromosome 8 (rs4961274; p = 1.62 × 10⁻⁶), a seventh near the APBA1 gene on chromosome 9 (rs74445275; p = 1.84 × 10⁻⁶), an eighth near the BAZ1A gene on chromosome 14 (rs7147621; p = 5.02 × 10⁻⁶), a ninth near the RNF157 gene on chromosome 17 (rs73360624; p = 6.19 × 10⁻⁶), and a tenth near the PTPRT gene on chromosome 20 (rs8119626; p = 2.99 × 10⁻⁶).
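The pseudotrait construction used for the PC-based GWAS can be sketched as follows. The example standardizes each trait, builds the correlation matrix, and extracts the first principal component by power iteration; the resulting per-participant scores are what would serve as the GWAS phenotype. This is an assumption-laden illustration: the study used R's prcomp, the text does not say whether traits were scaled, and the function names and toy values here are hypothetical.

```python
import math


def standardize(column):
    """Center a trait and scale it to unit sample standard deviation."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
    return [(x - mean) / sd for x in column]


def first_principal_component(columns, iterations=500):
    """Return (loadings, scores, variance_share) for PC1.

    Assumption: traits are standardized first, so the PCA is run on the
    correlation matrix. Power iteration finds the leading eigenvector.
    """
    z = [standardize(c) for c in columns]
    k, n = len(z), len(z[0])
    corr = [
        [sum(a * b for a, b in zip(z[i], z[j])) / (n - 1) for j in range(k)]
        for i in range(k)
    ]
    # Asymmetric start vector, to avoid starting orthogonal to PC1.
    v = [1.0 / (i + 1) for i in range(k)]
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[i] * sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)
    )
    # PC1 score per participant: the pseudotrait value.
    scores = [sum(v[i] * z[i][r] for i in range(k)) for r in range(n)]
    # The eigenvalues of a k x k correlation matrix sum to k.
    return v, scores, eigenvalue / k


if __name__ == "__main__":
    # Made-up values for six participants, not TWB data.
    log_uacr = [0.9, 1.2, 0.7, 1.5, 1.1, 0.8]
    egfr = [110.0, 98.0, 115.0, 90.0, 101.0, 112.0]
    slope = [-0.5, -1.0, -0.2, -1.6, -0.9, -0.4]
    loadings, scores, share = first_principal_component([log_uacr, egfr, slope])
    print([round(x, 3) for x in loadings], round(share, 3))
```

With three input traits there are exactly three components, so PC1 to PC3 together account for all of the variance; the remaining components can be obtained by repeating the procedure after removing the PC1 contribution.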
Pseudotrait | Chr | BP | SNP | A1 | A2 | MAF | p value | Nearby gene |
---|---|---|---|---|---|---|---|---|
PC1 | 1 | 15,586,492 | rs10159261 | G | T | 0.433 | 1.8E-07 | AGMAT |
PC1 | 1 | 169,524,475 | rs9332641 | T | C | 0.020 | 8.4E-06 | F5 |
PC1 | 1 | 228,258,445 | rs10737429 | A | G | 0.434 | 6.7E-06 | OBSCN |
PC1 | 2 | 123,677,383 | rs10171475 | C | T | 0.227 | 1.8E-06 | - |
PC1 | 4 | 188,111,676 | rs67785655 | T | C | 0.188 | 2.0E-06 | TRIML2 |
PC1 | 8 | 140,598,947 | rs4961274 | T | C | 0.087 | 1.6E-06 | AGO2 |
PC1 | 9 | 69,531,003 | rs74445275 | T | A | 0.020 | 1.8E-06 | APBA1 |
PC1 | 12 | 97,658,889 | rs4762397 | T | C | 0.399 | 8.6E-07 | - |
PC1 | 14 | 34,806,498 | rs7147621 | A | G | 0.217 | 5.0E-06 | BAZ1A |
PC1 | 18 | 70,501,770 | rs76982055 | C | T | 0.034 | 2.4E-06 | - |
PC2 | 1 | 101,315,274 | rs77814797 | A | G | 0.056 | 5.1E-06 | - |
PC2 | 1 | 218,026,147 | rs193129422 | G | T | 0.011 | 6.7E-06 | LOC105372922 |
PC2 | 2 | 27,512,105 | rs6547692 | G | A | 0.488 | 6.4E-06 | GCKR |
PC2 | 2 | 27,598,615 | rs12989678 | T | C | 0.465 | 1.3E-06 | ZNF512 |
PC2 | 2 | 51,728,241 | rs190931819 | G | C | 0.051 | 4.1E-06 | LOC730100 |
PC2 | 4 | 60,669,935 | rs56188710 | T | C | 0.022 | 2.6E-06 | - |
PC2 | 4 | 116,637,209 | rs10007743 | G | A | 0.384 | 6.7E-06 | - |
PC2 | 4 | 126,787,643 | rs139816356 | C | T | 0.022 | 8.7E-06 | - |
PC2 | 4 | 138,579,033 | rs1376095 | A | G | 0.209 | 1.0E-06 | - |
PC2 | 4 | 138,588,684 | rs10461249 | C | G | 0.361 | 7.3E-06 | - |
PC2 | 8 | 82,182,829 | rs79425178 | T | C | 0.031 | 7.7E-06 | LOC105375930 |
PC2 | 14 | 32,571,580 | rs1155069 | C | T | 0.038 | 6.9E-07 | AKAP6 |
PC3 | 3 | 54,613,722 | rs76352818 | T | C | 0.022 | 4.5E-06 | CACNA2D3 |
PC3 | 4 | 109,504,697 | rs139161904 | G | A | 0.021 | 4.2E-06 | SEC24B |
PC3 | 6 | 54,486,360 | rs78828070 | G | A | 0.090 | 6.5E-07 | - |
PC3 | 6 | 54,558,447 | rs146677015 | G | C | 0.015 | 2.9E-07 | - |
PC3 | 6 | 141,371,205 | rs117153205 | G | A | 0.069 | 9.1E-06 | - |
PC3 | 10 | 103,868,924 | rs117231653 | A | G | 0.078 | 5.8E-07 | - |
PC3 | 17 | 76,196,746 | rs73360624 | G | C | 0.047 | 6.2E-06 | RNF157 |
PC3 | 20 | 42,732,473 | rs8119626 | T | C | 0.013 | 3.0E-06 | PTPRT |
Chr | BP | SNP | A1 | A2 | MAF | p value | Nearby gene | Kidney eQTL¹ | Involved gene expression in kidney tissue |
---|---|---|---|---|---|---|---|---|---|
1 | 101,315,274 | rs77814797 | A | G | 0.056 | 5.14E-06 | - | No | |
1 | 169,524,475 | rs9332641 | T | C | 0.02 | 8.39E-06 | F5 | Yes | SELL (glo); ATP1B1 (ti) |
1 | 228,258,445 | rs10737429 | A | G | 0.434 | 6.66E-06 | OBSCN | Yes | TRIM17, OBSCN-AS1, IBA57-AS1, MRPL55 (ti) |
2 | 123,677,383 | rs10171475 | C | T | 0.227 | 1.82E-06 | - | No | |
3 | 54,613,722 | rs76352818 | T | C | 0.022 | 4.54E-06 | CACNA2D3 | No | |
4 | 60,669,935 | rs56188710 | T | C | 0.022 | 2.61E-06 | - | No | |
4 | 109,504,697 | rs139161904 | G | A | 0.021 | 4.21E-06 | SEC24B | No | |
4 | 116,637,209 | rs10007743 | G | A | 0.384 | 6.73E-06 | - | No | |
4 | 126,787,643 | rs139816356 | C | T | 0.022 | 8.72E-06 | - | No | |
4 | 138,588,684 | rs10461249 | C | G | 0.361 | 7.29E-06 | - | No | |
4 | 188,111,676 | rs67785655 | T | C | 0.188 | 1.96E-06 | TRIML2 | No | |
6 | 141,371,205 | rs117153205 | G | A | 0.069 | 9.14E-06 | - | No | |
8 | 140,598,947 | rs4961274 | T | C | 0.087 | 1.62E-06 | AGO2 | No | |
9 | 69,531,003 | rs74445275 | T | A | 0.02 | 1.84E-06 | APBA1 | No | |
10 | 103,868,924 | rs117231653 | A | G | 0.078 | 5.82E-07 | - | No | |
12 | 97,658,889 | rs4762397 | T | C | 0.399 | 8.64E-07 | - | No | |
14 | 34,806,498 | rs7147621 | A | G | 0.217 | 5.02E-06 | BAZ1A | Yes | CFL2 (ti) |
17 | 76,196,746 | rs73360624 | G | C | 0.047 | 6.19E-06 | RNF157 | Yes | ST6GALNAC2, UBE2O (glo); CYGB (ti) |
18 | 70,501,770 | rs76982055 | C | T | 0.034 | 2.43E-06 | - | No | |
20 | 42,732,473 | rs8119626 | T | C | 0.013 | 2.99E-06 | PTPRT | No |
¹ nephQTL (Gillies 2018 [27]).
glo, glomerulus; ti, tubulointerstitium.
Discussion
In this first PC-based GWAS of albuminuria, cross-sectional eGFR, and eGFR slope, we identified thirty candidate loci associated with CKD, twenty of which were novel. Several genes implicated by the ten non-novel loci, which were identified concomitantly in the univariate GWASs and the PC-based GWAS, are already known to be associated with kidney injury or disease. Agmatinase (AGMAT) is significantly differentially expressed between different kidney disease entities [28]. Glucokinase Regulator (GCKR) polymorphism is known to be associated with CKD, end-stage kidney disease (ESKD), and serum uric acid level [29‒31]. Zinc Finger Protein 512 (ZNF512) has been suggested as a candidate gene for hyperuricemia and gout [32]. A-Kinase Anchoring Protein 6 (AKAP6) has been shown to be associated with nephrolithiasis [33].
Among the twenty novel loci, some of the nearby genes have also been implicated in kidney injury and CKD in the literature. F5 (coagulation factor V) is produced by macrophages in diverse circumstances, including acute kidney injury, and has thus been associated with kidney fibrosis [34]. Obscurin (OBSCN) is involved in sarcoplasmic reticulum handling of intracellular calcium. Individuals with bi-allelic loss-of-function mutations in OBSCN suffer from recurrent rhabdomyolysis with myalgia, muscle weakness, or dark urine. Serum creatinine level is also elevated because of the release of creatine and phosphocreatine from myocytes and tubular necrosis secondary to obstruction of the kidney tubules by myoglobin [35‒37]. Argonaute RISC Catalytic Component 2 (AGO2) is an RNA-binding protein of the Argonaute family that assembles the miRNA-induced silencing complex (miRISC) by binding both miRNA and its target mRNA. miR-429-3p attenuates branched-chain amino acid catabolism in kidney proximal tubules, causing an oxidative-stress-induced form of cell death known as ferroptosis [38]. Thus, AGO2 may be involved in kidney injury and CKD.
Four of the novel loci also had significant kidney eQTL results in the nephQTL database [27]: rs9332641 was associated with SELL expression in kidney glomeruli and ATP1B1 expression in the tubulointerstitium; rs10737429 was associated with tubulointerstitial expression of TRIM17, OBSCN-AS1, IBA57-AS1, and MRPL55; rs7147621 was associated with tubulointerstitial expression of CFL2; and rs73360624 was associated with ST6GALNAC2 and UBE2O expression in kidney glomeruli and CYGB expression in the tubulointerstitium. SELL encodes selectin L, which regulates the inflammatory response in different organs, including the kidney [39]. miR-192-5p targets ATP1B1 mRNA, and in rats with miR-192-5p knockdown, subsequent ATP1B1 knockdown attenuates salt-sensitive hypertension [40]. CFL2 mutation causes nemaline rod myopathy [41]. Malnutrition, myopathy, and other conditions that reduce muscle mass all affect serum creatinine level [42]. ST6GALNAC2 is associated with IgA1 sialylation. Desialylation of IgA1 is one of the pathomechanisms of IgA nephropathy, and one study showed that the frequency of haplotype ADG in the promoter region of the ST6GALNAC2 gene was significantly higher in patients with IgA nephropathy [43]. Reduced podocyte numbers are found in Cygb knockout mice and correlate with kidney function decline. Analysis of the CYGB-dependent transcriptome revealed dysregulation of genes involved in redox balance, apoptosis, and CKD [44]. There was also an intergenic SNP on chromosome 10 (rs117231653, p = 5.82 × 10⁻⁷) that was significantly associated with COL17A1 expression in whole blood according to GTEx [45]. Collagen XVII has been identified in both human and murine kidney, where immunoelectron microscopy localized it to the foot processes of podocytes and the glomerular basement membrane; it is thus possibly related to podocyte maturation and glomerular filtration [46].
PCA identifies vectors (PCs) that best describe the characteristics of a dataset by reducing the dimensionality of the data while retaining its maximal variation [47]. Since logUACR, cross-sectional eGFR, and eGFR slope are all quantitative traits, we applied PC-based GWAS in search of additional loci that could not be identified by conventional univariate GWAS. PC-based GWAS is uncommon in the current literature, and there is only one previous PC-based GWAS of kidney traits/CKD [48]. Tran et al. selected serum creatinine level, eGFR, and blood urea nitrogen (BUN) as primary traits for their PC-based GWAS. However, eGFR is simply a function of serum creatinine, so the two traits overlap heavily, and BUN is not a specific biomarker of kidney function because individuals with normal kidney function can have elevated BUN when they are relatively dehydrated or on a high-protein diet. Given these issues with the primary traits selected by Tran et al. [48], the representativeness of their results is questionable.
"Missing heritability" is an issue in many GWASs [49]. Studies of different diseases and complex traits have noted that low-frequency variants in certain populations play a substantial role [50‒55]. The phenomenon of missing heritability could be at least partially explained by the fact that conventional GWASs identify only common variants with MAF higher than 5% while leaving out rarer variants. Nine of the twenty novel candidate loci discovered in this study had MAF between 1 and 5% and thus may not have been detectable in previous GWASs.
Diabetes is the leading cause of CKD worldwide and strongly affects eGFR change [56]. The trajectory of eGFR is extremely complex and heterogeneous in patients with type 2 diabetes [57‒59]. In addition, a high proportion of patients with diabetic kidney disease experience kidney hyperfiltration; in other words, eGFR is elevated rather than decreased during this stage [60]. For these reasons, we considered it necessary to exclude diabetic individuals in advance, rather than adjusting for diabetes as a covariate during association analysis, to fully remove the effect of diabetes on kidney function.
Our study has several limitations. Firstly, the number of individuals with all three primary kidney traits was slightly above 10,000, which is small compared to the sample sizes of many large-scale GWASs of individual traits. Secondly, the use of p values below 10⁻⁵ to identify suggestively associated loci increases the likelihood of capturing false positive associations. Thirdly, although we excluded individuals with diabetes from this study, it is still likely that a portion of the remaining individuals had non-diabetic kidney disease. Moreover, most biobanks, including TWB, typically assess serum creatinine but not cystatin C because of the higher cost of the latter. However, creatinine-based formulas for GFR estimation have inherent limitations, and there is a growing trend towards incorporating serum cystatin C levels to improve the precision of GFR estimation [61]. Furthermore, current eQTL databases such as nephQTL and GTEx are not based on East Asian populations, and thus the eQTL information may not be precise for the Taiwanese population. Future studies would therefore benefit from the establishment of a Taiwanese kidney transcriptomic database to provide more accurate eQTL results for the candidate loci identified in our study.
In conclusion, we have conducted the first PC-based GWAS of albuminuria, cross-sectional eGFR, and eGFR slope in the non-diabetic Taiwanese population in search of genetic loci associated with kidney function decline/CKD. Twenty novel candidate loci were identified; several of them had significant kidney eQTL results, and the involved genes are already known to be associated with kidney function or kidney disease. Future functional studies may further support our findings.
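The trade-off behind the suggestive threshold mentioned above can be made concrete with a short calculation using the marker count reported in the Methods. The sketch assumes a conventional family-wise error rate of 0.05, which the text does not state, and treats all tests as independent, which overstates the burden because neighboring SNPs are correlated through linkage disequilibrium.

```python
N_SNPS = 3_637_470           # markers remaining after quality control
ALPHA = 0.05                 # assumption: conventional family-wise error rate
CANDIDATE_THRESHOLD = 1.0e-5 # suggestive threshold used in this study

# Bonferroni-corrected per-test threshold.
bonferroni_threshold = ALPHA / N_SNPS

# Expected number of SNPs below the suggestive threshold in one GWAS
# if no SNP were truly associated and all tests were independent.
expected_null_hits = N_SNPS * CANDIDATE_THRESHOLD

print(f"{bonferroni_threshold:.2e}")  # prints 1.37e-08
print(f"{expected_null_hits:.1f}")    # prints 36.4
```

Under these simplifying assumptions, the suggestive threshold is roughly three orders of magnitude more lenient than the Bonferroni threshold, which is why candidate loci found at this level call for replication.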
Acknowledgments
The authors would like to thank all participants in the Taiwan Biobank for providing the data and for all their support.
Statement of Ethics
This study was approved by the Institutional Review Board of the National Taiwan University Hospital Research Ethics Committee (Approval No. 202312110RINA). All subjects provided written informed consent, and all methods were carried out in accordance with the Declaration of Helsinki and relevant regulations.
Conflict of Interest Statement
The authors declare no competing interests.
Funding Sources
This study is funded by intramural grants from the National Taiwan University Hospital (NTUH110-M4813 and NTUH 112-S0093). The funding agency had no role in the study design, data collection, data analysis, data interpretation, writing of the report, or the decision to submit the report for publication.
Author Contributions
G.-T.C. conceptualized the research idea and study design. G.-T.C. and T.P.-H.C. were responsible for data curation and formal analysis. G.-T.C. and Y.-C.C. contributed to funding acquisition. C.-N.H. provided statistical consultation and data interpretation. G.-T.C. drafted the initial manuscript. Y.-C.C. provided critical feedback during data analysis and manuscript drafting. All authors read and approved the final manuscript.
Data Availability Statement
Data that support the results of this study are available from the Taiwan Biobank. Because of restrictions on data availability, the data can only be used under license for the current study and are not publicly available. Further enquiries can be directed to the corresponding author.