Abstract
Introduction: Polygenic score (PGS) is a valuable method for assessing the estimated genetic liability to a given outcome or genetic variability contributing to a quantitative trait. While polygenic risk scores are widely used for complex traits, their application in uncovering shared genetic predisposition between phenotypes, i.e., when genetic variants influence more than one phenotype, remains limited. Methods: We developed an R package, comorbidPGS, which facilitates a systematic evaluation of shared genetic effects among (cor)related phenotypes using PGSs. The comorbidPGS package takes as input a set of single nucleotide polymorphisms along with their established effects on the original phenotype (Po), referred to as Po-PGS. It generates a comprehensive summary of effect(s) of Po-PGS on target phenotype(s) (Pt) with customisable graphical features. Results: We applied comorbidPGS to investigate the shared genetic predisposition between phenotypes defining elevated blood pressure (systolic blood pressure, SBP; diastolic blood pressure, DBP; pulse pressure) and several cancers (breast cancer; pancreatic cancer, PanC; kidney cancer, KidC; prostate cancer, PrC; colorectal cancer, CrC) using the European ancestry UK Biobank individuals and GWAS meta-analyses summary statistics from independent set of European ancestry individuals. We report a significant association between elevated DBP and the genetic risk of PrC (β [SE] = 0.066 [0.017], p value = 9.64 × 10−5), as well as between CrC PGS and both, lower SBP (β [SE] = −0.10 [0.029], p value = 3.83 × 10−4) and lower DBP (β [SE] = −0.055 [0.017], p value = 1.05 × 10−3). Our analysis highlights two nominally significant relationships for individuals with genetic predisposition to elevated SBP leading to higher risk of KidC (OR [95% CI] = 1.04 [1.0039–1.087], p value = 2.82 × 10−2) and PrC (OR [95% CI] = 1.02 [1.003–1.041], p value = 2.22 × 10−2). Conclusion: Using comorbidPGS, we underscore mechanistic relationships between blood pressure regulation and susceptibility to three comorbid malignancies. This package offers valuable means to evaluate shared genetic susceptibility between (cor)related phenotypes through polygenic scores.
Introduction
Since 2006, the polygenic score (PGS), sometimes known as polygenic risk score (PRS), genetic or genomic risk score (GRS) when applied to binary phenotype [1‒3], has become a widely implemented approach in follow-ups from genome-wide association study (GWAS) meta-analyses [4‒6]. PGS quantifies an individual’s genetic predisposition to a given trait [7].
Although PRSs for common phenotypes, such as breast cancer (BrC) or type 2 diabetes (T2D), are extensively studied, and numerous analytical tools for PGS construction have been developed [4, 8‒10], the application of PGS to uncover multi-phenotype effects remains an undeveloped area [8]. To address this gap, we built a novel user-friendly R package, comorbidPGS, designed to facilitate association analyses between a set of pre-defined PRSs, Po-PGS, and a set of target phenotype(s), Pt.
We tested comorbidPGS on several phenotypes using the UK BioBank (UKBB) dataset and PGS distributions constructed from up-to-date GWAS for blood pressure (BP) and five cancers. The cancers were selected for their epidemiological relationships with higher BP [11‒15]. Specifically, we used: (i) three common cancers, BrC, colorectal (CrC), and prostate (PrC); (ii) two rare cancers: kidney (KidC) and pancreatic (PanC); and (iii) three continuous blood pressure phenotypes used to diagnose hypertension, namely systolic (SBP), diastolic (DBP), and pulse pressure.
High BP is one of the most frequent conditions around the globe increasing the risk of stroke and cardiovascular diseases [16]. However, the nature of the relationship between hypertension and cancer development is mostly unknown. The interest in the underlying factors and the relationship between cancer and high BP surged following the discovery of an association between calcium channel blockers and multiple cancer types [17]. The epidemiological studies suggest complex relationships between elevated BP, particularly SBP and DBP, and cancers, such as KidC, comprising renal cell carcinoma, CrC, and BrC [11, 14, 18]. Except for KidC, these epidemiological associations are inconsistent and lack careful mechanistic interpretation [19, 20].
Several biological hypotheses exist about the direct relationship between high BP and higher cancer risk. Patients with high BP have elevated levels of plasma vascular endothelial growth factor (VEGF), inducing and dysregulating angiogenesis that could promote cancer development [16, 21‒23]. Another possible mechanism is the remodelling of the extracellular matrix (ECM), particularly ECM stiffening [24]. The latter is associated with hypertension and malignant phenotypes that disrupt molecules’ cell-to-cell junctions, boosting carcinogenesis [25]. ECM stiffening leads to changes in arterial walls through oxidative stress, promoting malignant transformation [26]. Additionally, a higher risk of KidC, related to elevated BP, could be attributed to chronic renal hypoxia [27] and lipid peroxidation formation of reactive oxygen species [28]. The acceleration of senescence due to elevated BP is also related to cancer development [26]. This phenomenon, observed in hypertensive rats, could activate a higher turnover of cells and reduction in DNA half-life, leading to telomere fragments shortening in kidneys [26]. While elevated BP is a predominant age-related condition, it is still unclear whether hypertension is a cancer risk factor or whether the reported associations arise through alternative mechanisms [15].
Materials and Methods
Description of comorbidPGS
comorbidPGS is an R package with two main analytical and visualisation purposes. Firstly, it evaluates the association between original phenotype’s (Po) PGS(s) in the user’s dataset, Po-PGS(s), and Pt (Fig. 1), again in the user’s dataset. Secondly, the package provides customisable graphical features to visualise the estimated distributions of individual Pt-PGS(s) and respective associations (Fig. 2).
comorbidPGS pipeline: from PGS construction using GWAS summary statistics and a dataset with genetic information; to the evaluation the genetic liability of Po-PGS on Pt. PGS, polygenic score; Po, original phenotype; Pt, target phenotype; Po-PGS, PGS based on the original phenotype.
comorbidPGS pipeline: from PGS construction using GWAS summary statistics and a dataset with genetic information; to the evaluation the genetic liability of Po-PGS on Pt. PGS, polygenic score; Po, original phenotype; Pt, target phenotype; Po-PGS, PGS based on the original phenotype.
Diagram of analytical and visualisation tools integrated in the comorbidPGS package: the association analysis using assoc(), multiphenassoc(), multiassoc(), and assocplot(); and the visualisation component, with the functions densityplot(), centileplot(), and decileboxplot().
Diagram of analytical and visualisation tools integrated in the comorbidPGS package: the association analysis using assoc(), multiphenassoc(), multiassoc(), and assocplot(); and the visualisation component, with the functions densityplot(), centileplot(), and decileboxplot().
comorbidPGS requires R environment version 3.5 or higher with the stats, utils, and ggplot2 packages. Most functions within the comorbidPGS package operate on an R data.frame containing a minimum of three columns: (1) a unique identifier for each sample/individual (“ID”), (2) at least one Po-PGS distribution (default column name: “SCORESUM,” in PLINK format [29]), and (3) at least one Pt (default column name: “Phenotype”). The Po-PGS distribution must be in numeric format and can be scaled prior to association using the argument scale = TRUE. We highly recommend to scale distributions, especially when using multiple PRSs, to avoid spurious associations resulting from changes in medians and standard deviations. The package provides a simulated mock dataset called “comorbidData” to facilitate testing and exploration.
comorbidPGS supports and automatically detects four distinct phenotype definitions (Pt), each with its own properties:
- (i)
Binary or case/control Pt: in this case, the phenotype column has 2 unique values. The column can be represented as logical (TRUE/FALSE), numeric (0/1), or characters/factors. In the case of characters, controls are considered the first level to appear in alphabetic order, e.g., “0,” while cases represent the remaining level, i.e., “1”. If the phenotype is already a factor data.frame, controls are the reference level. The columns named “brc,” “t2d,” and “hypertension” within the comorbidData dataset are considered binary Pt.
- (ii)
Categorical Pt: this type applies when the phenotype column contains characters/factors with at least three different values without inherent order – e.g., affective psychosis categories [30], or five depressive symptoms chronicity categories [31]. The column named “ethnicity” within comorbidData contains four distinct values (“1”, “2”, “3,” and “4”), this column is initially classified as categorical Pt. Upon transformation into ordered R factors (with 4 ordered levels), this column is considered as ordered categorical Pt (below).
- (iii)
Ordered categorical Pt: this type is relevant for ordered R factors with at least three unique values. The column named “sbp_cat” from comorbidData contains factors with three unique levels (“low”, “normal,” and “high”); this column is considered as ordered categorical Pt.
- (iv)
Continuous Pt: this type encompasses all other numeric values. For this phenotype, the package performs a systematic check using the Shapiro test to evaluate the normality of the distribution. A warning will be displayed if the Shapiro test detects a nonnormal distribution. Within this type of variable, the test consider the R NA and “-9” as values for missing data. “ldl,” “log_ldl,” “bmi,” and ‘sbp’ within comorbidData are classified as continuous Pt.
To test for the association between Po-PGS and related Pt, comorbidPGS provides several functions (Fig. 1): assoc() tests the association between one Po-PGS distribution and one Pt ; multiphenassoc() if testing multiple Pt (given in an R list or vector); multiassoc() supports multiple Po-PGS and multiple Pt (given in an R matrix, or a data.frame). multiassoc() offers the possibility to parallelise the associations, making the analysis much faster. However, due to its system architecture, the R parallelisation is not supported in Windows operating systems; therefore, analysis will take longer as compared to other operating systems. All three association functions output an R data.frame with properties given in online supplementary Table 1 (for all online suppl. material, see https://doi.org/10.1159/000539325).
The association functions in comorbidPGS automatically determine the variable type of input phenotype and apply an appropriate regression method. The package uses linear regression for the association between a continuous trait Pt and a Po-PGS distribution. In the case of a binary phenotype (case/control Pt), a logistic regression is applied for association. Categorical traits are analysed using multinomial logistic regression, while ordered categorical traits are assessed using ordinal logistic regression.
The output data.frame generated by these association functions varies depending on the regression method. For linear regression associations, the output includes betas and standard errors, without N sample size, while logistic regression reports include odds ratio (OR), 95% confidence intervals, and N_cases/N_controls/N total sample size. In the case of multinomial logistic regression, the results are divided into v − 1 rows, where v represents the number of different values for the given trait. Each row reports the association between the reference level and each of the subsequent levels. assocplot() allows to visualise association(s) from assoc(), multiphenassoc(), or multiassoc(). This function creates a forest plot using a customisable ggplot2 object (Fig. 2). If the association table contains different result types, such as betas and ORs, assocplot() generates two separate forest plots, stored in $continuous_phenotype and $discrete_phenotype, respectively.
In the comorbidPGS package, we offer several graphical representations of Po-PGS and Pt distributions using three functions (Fig. 2): densityplot(), centileplot(), and decileboxplot(). Each function is commonly used in PGS analysis to examine genetic predisposition [32, 33]. They require an R data.frame containing IDs, Po-PGS, and Pt as input. All graphical functions generate a customisable ggplot object from the ggplot2 package. densityplot() visualises the distribution of Po-PGS in relation to a given Pt (online suppl. Fig. 1A). This representation is particularly useful when examining categorical (plotting one distribution for each category) or cases/controls Pt (plotting two distributions) alongside the Po-PGS distribution. This function can be used with continuous Pt if a numerical threshold argument is provided. In such cases, the trait is transformed into Cases/Controls phenotype by applying the given threshold as a delimiter.
The centileplot() (online suppl. Fig. 1B) and decileboxplot() (online suppl. Fig. 1C) functions are used to display centiles and deciles, respectively, of Po-PGS in relation to a given Pt. When a cases/controls Pt is provided, the y-axis represents the prevalence of the phenotype. If Pt is continuous, these functions allow to define the mean or median value of the phenotype for each centile/decile. The centileplot() function and decileboxplot() function do not support categorical and ordered categorical Pt.
Constructing the PGS
We gathered established loci, from up-to-date and publicly available GWAS summary statistics, for SBP, DBP, PP [34], and the 5 cancers of interest [35‒39] (Table 1). We selected GWAS if they harboured many participants (N ≥ 10, 000) composed of European ancestry individuals or multi-ancestry including Europeans. To avoid false positive associations, only genome-wide significant (association p < 5 × 10−8) independent (r2 < 0.2) SNPs were used in our analysis. The loci with their associated weights from GWAS (e.g., for BrC Po-PGS, we used 212 BrC SNPs weighted by their respective effect size from the BrC GWAS) were stored [9] for PGS construction (full list of SNPs and respective weights is available in online supplementary Table 2) and represent the base data of PGS [8]. All studies were approved by Local Ethics Committees and all participants gave informed consent.
Established associations and respective original publications used in analysis of comorbidity elevated between blood pressure and five malignancies
Phenotype . | Breast cancer . | Colorectal cancer . | KidC . | PanC . | PrC . | SBP/DBP/PP (hypertension) . |
---|---|---|---|---|---|---|
NSNPs | 212 | 143 | 19 | 22 | 269 | 445/338/437 |
GWAS | Zhang et al., 2020 | Huyghe et al., 2019 | Scelo et al., 2017 | Klein et al., 2018 | Conti et al., 2021 | Evangelou et al., 2018 using ICBP only GWAS |
Ncases | 133,384 | 12,696 | 10,784 | 9,040 | 79,148 | NA |
Ncontrols | 113,789 | 15,113 | 20,406 | 12,496 | 61,106 | 299,024 |
Phenotype . | Breast cancer . | Colorectal cancer . | KidC . | PanC . | PrC . | SBP/DBP/PP (hypertension) . |
---|---|---|---|---|---|---|
NSNPs | 212 | 143 | 19 | 22 | 269 | 445/338/437 |
GWAS | Zhang et al., 2020 | Huyghe et al., 2019 | Scelo et al., 2017 | Klein et al., 2018 | Conti et al., 2021 | Evangelou et al., 2018 using ICBP only GWAS |
Ncases | 133,384 | 12,696 | 10,784 | 9,040 | 79,148 | NA |
Ncontrols | 113,789 | 15,113 | 20,406 | 12,496 | 61,106 | 299,024 |
SBP, systolic blood pressure; DBP, diastolic blood pressure; PP, pulse pressure.
For our analysis, we used the UKBB, a population-based prospective cohort study with genetic and phenotypic data on approximately half a million individuals from the United Kingdom, aged 39–73 [40]. Individuals were recruited between 2006 and 2010, and their phenotypic data were collected. It includes biological samples, physical measurements, a questionnaire, an interview, and medical records. UKBB dataset derived from genome-wide imputed data analysed in this work includes 459,247 individuals of European descent. We used combined hospital admission data, self-reports, and ICD10 to define the 5 cancers (online suppl. Table 3). Three BP continuous metrics of blood pressure were used, including SBP, DBP, and PP. There are automated and manual records for each of them in the UKBB [34]. We used the mean value when more than one value was found for a given individual and adjusted for medication use by adding 15 mm Hg to SBP and 10 mm Hg to DBP for individuals with reported data on blood pressure-lowering medication [41] (online suppl. Table 3). Among the UKBB individuals, we identified 1,425 individuals with PanC, 2,436 KidC, 8,201 CrC, 11,825 PrC, and 18,676 with BrC (including 13,355 with post-menopausal BrC). Additionally, 230,737 individuals were hypertensive patients (Table 2). We built PGS with the UKBB dataset as target data, using the count of SNPs previously gathered [8].
Baseline characteristics of individuals of European ancestry in the UKBB
Characteristics . | Females . | Males . | p value (statistical difference between females and males) . | All . |
---|---|---|---|---|
Genotypic info | 249,297 | 209,950 | NA | 459,247 |
Age | 57.09 (7.954) | 57.5 (8.122) | <2.20 × 10−16a | 57.27 (8.034) |
Individuals on blood pressure-lowering medication | 43,042 (17.27%) | 52,670 (25.09%) | 1.97 × 10−7b | 95,712 (20.84%) |
BMI | 27.01 (5.142) | 27.85 (4.242) | <2.20 × 10−16a | 27.4 (4.77) |
SBP | 140.1 (22.02) | 146.9 (20.18) | <2.20 × 10−16a | 143.2 (21.46) |
DBP | 82.29 (11.55) | 86.46 (11.42) | <2.20 × 10−16a | 84.2 (11.68) |
PP | 57.84 (15.94) | 60.43 (14.37) | <2.20 × 10−16a | 59.03 (15.29) |
Hypertensive patients | 102,755 (41.22%) | 114,844 (54.70%) | <2.20 × 10−16b | 217,599 (47.38%) |
Breast cancer patients | 18,589 (7.46%) | 87 (0.04%) | <2.20 × 10−16b | 18,676 (7.49%) |
Colorectal cancer patients | 3,593 (1.44%) | 4,608 (2.20%) | <2.20 × 10−16b | 8,201 (1.79%) |
KidC patients | 883 (0.40%) | 1,553 (0.85%) | <2.20 × 10−16b | 2,436 (0.60%) |
PanC patients | 654 (0.26%) | 771 (0.37%) | <2.20 × 10−16b | 1,425 (0.31%) |
Prostate cancer patients | 0 (0%) | 11,825 (5.64%) | <2.20 × 10−16b | 11,825 (2.57%) |
Characteristics . | Females . | Males . | p value (statistical difference between females and males) . | All . |
---|---|---|---|---|
Genotypic info | 249,297 | 209,950 | NA | 459,247 |
Age | 57.09 (7.954) | 57.5 (8.122) | <2.20 × 10−16a | 57.27 (8.034) |
Individuals on blood pressure-lowering medication | 43,042 (17.27%) | 52,670 (25.09%) | 1.97 × 10−7b | 95,712 (20.84%) |
BMI | 27.01 (5.142) | 27.85 (4.242) | <2.20 × 10−16a | 27.4 (4.77) |
SBP | 140.1 (22.02) | 146.9 (20.18) | <2.20 × 10−16a | 143.2 (21.46) |
DBP | 82.29 (11.55) | 86.46 (11.42) | <2.20 × 10−16a | 84.2 (11.68) |
PP | 57.84 (15.94) | 60.43 (14.37) | <2.20 × 10−16a | 59.03 (15.29) |
Hypertensive patients | 102,755 (41.22%) | 114,844 (54.70%) | <2.20 × 10−16b | 217,599 (47.38%) |
Breast cancer patients | 18,589 (7.46%) | 87 (0.04%) | <2.20 × 10−16b | 18,676 (7.49%) |
Colorectal cancer patients | 3,593 (1.44%) | 4,608 (2.20%) | <2.20 × 10−16b | 8,201 (1.79%) |
KidC patients | 883 (0.40%) | 1,553 (0.85%) | <2.20 × 10−16b | 2,436 (0.60%) |
PanC patients | 654 (0.26%) | 771 (0.37%) | <2.20 × 10−16b | 1,425 (0.31%) |
Prostate cancer patients | 0 (0%) | 11,825 (5.64%) | <2.20 × 10−16b | 11,825 (2.57%) |
BMI, body mass index; SBP, systolic blood pressure; DBP, diastolic blood pressure; PP, pulse pressure.
aStatistical significance in mean difference (for normal distribution) between sexes using unpaired Student’s t test after multiple testing correction.
bStatistical significance in contingency between sexes using χ2 test after multiple testing correction; .
When comorbidPGS analysis showed significant reciprocal risk prediction, we investigated further the relationship between Po and Pt using a two-sample mendelian randomization (MR) study design [43, 44]. MR is a natural experiment that uses genetic mutations among a population to simulate a randomised controlled trial (RCT). It allows making inferences about the causal effect of an exposure X on an outcome Y, using genetic mutation(s) G as instrumental variables (IVs) to measure the exposure effect. G must follow three assumptions to be considered valid IV for MR: 1. G is strongly associated with X, 2. G is not associated with Y through a confounder, and 3. G affects Y only through X [45]. We used the same aforementioned GWASs with given SNPs, the TwoSampleMR R package [46] with inverse-variance weighted, simple mode, weighted mode, weighted median, and MR Egger methods to report our MR results. Pleiotropy and heterogeneity were assessed using the MR Egger intercept and Cochran’s Q test, respectively.
Results
Validation of comorbidPGS
We first validated comorbidPGS on the mock dataset provided, comorbidData, including 39,380 simulated subjects, three Po-PGS (BrC_PGS, T2D_PGS, LDL_PGS) and five Pt (BrC and T2D [cases/controls phenotypes]; log_LDL and BMI [continuous phenotypes]; and SBP_cat [ordered categorical phenotype]). We tested comorbidPGS on both a Windows personal computer (referred to as PC, 8 threads, 16 GB of memory) and a Linux High Performance Computing (referred to as HPC, 256 threads, 64 GB memory). We used the function multiassoc() with the same arguments [df = comorbidData, assoc_table = expand.grid(c(“brc_PGS”, “t2d_PGS”, “ldl_PGS”), c(“brc”, “t2d”, “log_ldl”, “bmi”, “sbp_cat”)), covar_col = c(“sex”, “gen_array”, “age”, “ethnicity”)] except for the parallelisation argument to test the memory used and computing time of the association tests. For 15 Po-PGS – Pt associations, 736 MB were used, and computing time (median of 10 tests reported, with standard deviation in parenthesis) was 2.69 (0.14) seconds for HPC with parallelisation, 8.88 (0.22) seconds for HPC without parallelisation, and 12.67 (0.36) seconds for PC without parallelisation. We validated comorbidPGS by verifying the expected significant associations (BrC_PGS with BrC, T2D_PGS with T2D, and LDL_PGS with log_LDL), thereby confirming the efficacy of the methodology employed in the package (online suppl. Table 4).
Validation of the Constructed PRSs
We then validated the constructed Po-PGS for SBP, DBP, PP, BrC, CrC, KidC, PanC, and PrC by checking if they were predictive for their Po in UKBB data. We used the aforementioned HPC, comorbidPGS with parallelisation, and the UKBB data. All eight associations Po-PGS – Po were significantly associated after Bonferroni correction (online suppl. Table 5; online suppl. Fig. 1). The eight tests used 736 MB of memory and took 53.24 (3.14) seconds to run.
Application of comorbidPGS to Dissect Relationship between BP Traits and Cancers
After validating both PGS and comorbidPGS methods, we tested the Po-PGS on the Pt. We ran 30 associations using HPC, comorbidPGS with parallelisation, and the UKBB data. The 30 association tests used 736 MB of memory and took 122.53 (3.20) seconds to run.
We found an association between the risk of developing KidC and higher SBP PGS (OR [95% CI] = 1.04 [1.003–1.087], p value = 4.34 × 10−2). Similarly, we observed a direct association between high SBP and PrC (OR [95% CI] = 1.02 [1.003–1.041], p value = 2.22 × 10−2). However, these two signals are only nominally associated and do not survive the stringent Bonferroni correction (p valueBonferroni = 3.33 × 10−3) (Fig. 3a, online suppl. Table 6).
Using cancer PRSs, we found a significant association between PrC PGS and higher DBP (β [SE] = 0.066 [0.017], p value = 9.64 × 10−5; p valueBonferroni = 2.50 × 10−3). Furthermore, CrC PGS is nominally associated with lower PP (β [SE] = −0.049 [0.020], p value = 1.72 × 10−2), and significantly associated at the Bonferroni threshold with lower SBP (β [SE] = −0.10 [0.029], p value = 3.83 × 10−4; p valueBonferroni = 3.33 × 10−3) and DBP (β [SE] = −0.055 [0.017], p value = 1.05 × 10−3; p valueBonferroni = 3.33 × 10−3) (Fig. 3b, online suppl. Table 7).
a The prediction of cancer through SBP, DBP, and PP PGS. p valueBonferroni = 3.33 × 10−3. b The prediction of high BP through cancer PGS. p valueBonferroni = 2.50 × 10−3. Signals with an asterisk (*) are significant after Bonferroni correction. BrC, breast cancer; CrC, colorectal cancer; KidC, kidney cancer; LungC, lung cancer; PanC, pancreatic cancer; PrC, prostate cancer; BP, blood pressure; DBP, diastolic blood pressure; PP, pulse pressure; SBP, systolic blood pressure; PGS, polygenic score.
a The prediction of cancer through SBP, DBP, and PP PGS. p valueBonferroni = 3.33 × 10−3. b The prediction of high BP through cancer PGS. p valueBonferroni = 2.50 × 10−3. Signals with an asterisk (*) are significant after Bonferroni correction. BrC, breast cancer; CrC, colorectal cancer; KidC, kidney cancer; LungC, lung cancer; PanC, pancreatic cancer; PrC, prostate cancer; BP, blood pressure; DBP, diastolic blood pressure; PP, pulse pressure; SBP, systolic blood pressure; PGS, polygenic score.
The MR follow-up analysis was performed on the significant results of comorbidPGS, namely PrC to DBP, CrC to DBP and CrC to SBP associations. We were not able to confirm or refute any causal association between those phenotypes (online suppl. Table 8; online suppl. Fig. 2). The heterogeneity in all MR was high (online suppl. Table 8), suggesting a potential horizontal pleiotropy of the IV selected.
Discussion
We developed comorbidPGS, an R package improving the evaluation of shared genetic susceptibility between (cor)related phenotypes through PGS. By combining the use of comorbidPGS with a robust PGS PLINK tool output [29], we highlighted relationships between BP traits and susceptibility to three cancers, suggesting intertwined biological mechanisms between KidC, CrC, and PrC cancers and BP regulation.
Linkage disequilibrium (LD) score serves a crucial role in investigating the genetic correlation between phenotypes [47], and MR provides evaluation of the causality between phenotypes [45]. However, there was no tool designed to assess both, the multi-morbid status and the reciprocal genetic prediction of phenotypes. The introduction of comorbidPGS fills this gap, allowing a fast and automatic association analysis between genetic predisposition and phenotypes.
This tool proves valuable in scenarios where a high genetic correlation or shared genetic loci between phenotypes are detected. It serves as an exploratory analysis to find or validate traits having shared genetics. Leveraging comorbidPGS to explore the genetic reciprocal risk prediction and comorbidity is a novel way to deepen the understanding of relationships between phenotypes beyond epidemiological observations. Moreover, the parallel processing feature of comorbidPGS considerably accelerates the delivery of results by reducing computing time, by more than threefold when testing the included mock dataset. Another scenario would be to use comorbidPGS with a linear combination of PGSs to improve prediction of a disease, as developed in the multiPRS approach [48]. Nonetheless, comorbidPGS is not a standalone analysis tool but rather should be complemented with other methodological approaches.
The MR paradigm leverages genetic mutations within a population as randomised and uses those mutations as IVs to assess whether an exposure is causal to an outcome [45]. comorbidPGS provides a reciprocal genetic risk prediction between phenotypes and can be used prior to MR to support with additional evidence the intricate relationship between phenotypes of interest. To our knowledge, despite the validity of allele scores as IV [49], there is currently a lack of tools for performing MR on individual-level data with PGS, or one-sample MR design using external PGS as IV. To address this gap, the future version of comorbidPGS will employ the assoc() function in conjunction with a two-stage least squares regression to propose an MR function.
Our study reveals that a genetic predisposition to CrC (high CrC PGS) serves as a protective factor against elevated BP, indicating that a higher genetic risk of CrC associates with lower SBP/DBP. Notably, previous large-scale epidemiological meta-analyses failed to clearly predict BP in individuals with CrC [20]. We did not identify other cancers as protective against higher BP. However, it is essential to consider potential survival bias, as there is a suggested higher risk of early CrC development among hypertensive men, coupled with increased mortality rates [50, 51]. Similar trend has been observed in other cancers [52‒54], indicating that high BP might mediate age-related decreases in survival time [55].
Epidemiological reports suggest that the rise in BP might be linked to initial medication, such as anti-VEGF monoclonal antibody, bevacizumab, in metastatic CrC [56]. However, antiangiogenic medications [57] are not reported in UKBB medication records, and this drug-induced elevation cannot be detected in our genetic data analysis. Moreover, hypertensive patients display a higher rate of partial CrC remission compared to non-hypertensive patients, hinting at bevacizumab-induced hypertension as a potential predictive marker for the efficacy of antiangiogenic treatment and subsequent survival [58]. Nevertheless, as bevacizumab is not exclusively specific to CrC [59], it alone cannot explain the observed protective effect. To our knowledge, the association between a genetic predisposition to CrC and lower SBP/DBP has not been reported previously, shedding light on pleiotropy in BP regulation and shared biological mechanisms between CrC and BP regulation. The MR follow-up analysis suggested horizontal pleiotropy of genetic instruments, we used, suggesting a need for further research to address the complex relationship between risk of CrC and BP regulation.
We report a nominal association between SBP PGS and an increased risk of PrC as well as reciprocal significant risk of elevated DBP in individuals with high PrC PGS. The MR follow-up analysis was inconclusive regarding the causality. Previous studies did not provide consistent results about the relationship between elevated BP and the risk of PrC [20, 60]. At the same time, the inverse genetic relationship between another metabolic condition, type 2 diabetes and PrC is widely acknowledged [61]. Studies supporting evidence of an association between PrC and elevated BP suggested chronic inflammation among the potential mechanisms, driving their comorbidity [62‒64]. We observed stronger supportive evidence of PrC genetic predisposition association with elevated DBP. The reasons behind the association of high DBP, rather than SBP, with systemic inflammation are yet to be elucidated [65]. It might be related to the substantial heterogeneity of prostate carcinogenesis, especially in its genetics [12, 66]. Despite inconsistent findings, BP levels are used in PrC prognosis [67] and are monitored in PrC patients, as a recent study suggested men with first-line palliative treatment had higher rates of ischaemic stroke and heart failure [68]. Further research is needed to determine if raised BP is driving PrC susceptibility, or if premalignant cells are causing the elevation of BP through inflammation.
We also detected nominal genetic association between the higher SBP PGS and the risk of KidC. However, this association lacks statistical power to detect a significant relationship, when applying Bonferroni correction for multiple testing. This limitation can potentially be due to the relatively low number of cases with either KidC or PanC within UKBB.
Our study has several limitations. BP and cancers are complex phenotypes, with many contributing factors. The present study focused on the shared genetic components between BP traits and five malignancies. We recognise the limited predictive capability of PGSs [69] and do not intend, in this paper, to suggest causal relationships between BP and the risk of cancer. We provide an overview of genetic predisposition prediction across several (cor)related conditions to mechanistically dissect genetic variability contributing to their comorbidity. Further research is needed to enhance the understanding of the comorbidity between cancers and hypertension, such as fine mapping of the associated phenotypes, we studied, followed by MR and colocalisation analysis [70]. Furthermore, the heterogeneity of those complex traits can lead to under or overestimates for some associations (e.g., for CrC [58]). The relatively small sample sizes for KidC and PanC in both the UKBB and the GWAS also limit power of the association analysis for such relatively rare diseases. Finally, we acknowledge the well-documented “healthy volunteer” bias within the UKBB [71‒73], that can impede our capacity to achieve statistically powered associations in this study. Overall, the new R package comorbidPGS will support further seamless and user-friendly exploration into the genetic underpinnings of comorbid conditions.
Acknowledgments
The authors thank all the investigators from different consortia that built and shared the GWAS meta-analysis summary statistics used in this study, as well as the UK Biobank participants and dedicated staff. The authors would like to extend their special thanks to Mickaël Canouil, whose valuable advice on R proved instrumental.
Statement of Ethics
This research has been conducted using the UK Biobank Resource under Application Number 37685. A Statement of Ethics is not applicable because this study is based exclusively on published literature.
Conflict of Interest Statement
There are no competing interests.
Funding Sources
This project was in part funded by the World Cancer Research Fund (WCRF UK) and World Cancer Research Fund International (2017/1641), Diabetes UK (BDA number: 20/0006307), the European Union’s Horizon 2020 research and innovation programme (LONGITOOLS, H2020-SC1-2019–874739), Agence Nationale de la Recherche (PreciDIAB, ANR-18-IBHU-0001), by the European Union through the “Fonds européen de développement regional” (FEDER), by the “Conseil Régional des Hauts-de-France” (Hauts-de-France Regional Council) and by the “Métropole Européenne de Lille” (MEL, European Metropolis of Lille).
Author Contributions
V. Pascat developed comorbidPGS. I. Prokopenko designed the research project. V. Pascat, M. Kaakinen, A. Ulrich, L. Zudina contributed to the study design for blood pressure traits and cancer comorbidity analyses. Z. Balkhiyarova, M. Kaakinen, defined the phenotypes of interest in the UK BioBank dataset. J. Gichohi, I. Pupko and A. Demirkan provided and formatted the curated publicly available GWAS datasets. A. Bonnefond and P. Froguel contributed to evaluation of the results. V. Pascat, and I. Prokopenko led the manuscript writing. All authors contributed to manuscript writing.
Data Availability Statement
The R package comorbidPGS and its user guide are available on GitHub https://github.com/VP-biostat/comorbidPGS and CRAN https://cran.r-project.org/package=comorbidPGS. Further enquiries can be directed to the corresponding authors.