Abstract
Introduction: Although breast and prostate cancers arise in different organs and each predominantly affects a different sex, multiple studies have reported an association between family histories of the two cancers. Analysis of single nucleotide polymorphism data, based on distant relatives, has revealed a small positive genetic correlation between these cancers explained by common variants. The estimate of genetic correlation based on close relatives reveals the extent to which shared genetic risks are explained by both common and rare variants. This estimate is unknown for breast and prostate cancer. Method: We estimated the relative risks, heritability, and genetic correlation of breast cancer and prostate cancer based on the Minnesota Breast and Prostate Cancer Study, a family study of 141 families ascertained for breast cancer. Results: Heritability was 0.34 (95% credible interval: 0.23–0.49) for breast cancer and 0.65 (95% credible interval: 0.36–0.97) for prostate cancer, and the genetic correlation was 0.23. In terms of odds ratios, these values correspond to 1.3 times higher odds of breast cancer among probands whose brother has prostate cancer. Conclusion: This study shows the inherent relation between prostate cancer and breast cancer; an occurrence of one in a family increases the risk of developing the other. The large difference between estimates of genetic correlation from distant and close relatives, if replicated, suggests that rare variants contribute to the shared genetic risk of breast and prostate cancer. However, the difference could stem from genotype-by-family effects shared between the two types of cancers.
Introduction
Breast cancer is the most common cancer among women; it also occurs in men, but much less frequently. The lifetime risk of a woman in the United States developing breast cancer is about 13% [1]. Prostate cancer is the most common cancer among men in the US [2]. The incidence rate of prostate cancer increases with age, reaching 1 in 52 among men aged 50–59 years [3, 4].
It is believed that both genetic and environmental factors contribute to the risk of developing breast and prostate cancer. The heritability of breast cancer is estimated to be 18–27% [5, 6], whereas the heritability of prostate cancer is around 42% [5]. Breast cancer and prostate cancer are multifactorial disorders, and among men, they are rarely reported in the same individual. However, there is growing evidence for family clustering of these two types of cancers. A meta-analysis of 18 studies demonstrated reliable evidence that a history of female breast cancer in first-degree relatives was associated with an increased risk of prostate cancer (relative risk 1.18; 95% CI, 1.12–1.25) [7]. In another study, men with familial breast cancer had a 21% higher risk of prostate cancer (95% CI, 1.10–1.34) [8]. In that study, the risk of prostate cancer was higher when there was a history of both prostate and breast cancers in the family. These studies suggest a shared genetic etiology between breast cancer and prostate cancer. The shared etiology between these cancers can be due to pleiotropy, i.e., a genetic variation that impacts the risk for two or more disorders, or genetic correlation. For an individual to be at increased genetic risk for two diseases, the pleiotropic variation must be aligned; that is to say, the disorders must be positively genetically correlated. Indeed, using genome-wide association study summary statistics of distant relatives, a positive genetic correlation between breast cancer and prostate cancer has been reported (genetic correlation = 0.072, p < 0.05) [9]. An estimate of genetic correlation from distant relatives reveals the extent to which shared genetic risks are explained by common variants, whereas an estimate from close relatives reveals the extent to which the risks are explained by both common and rare variants.
In this study, we estimated the relative risk, heritability, and genetic correlation between breast cancer and prostate cancer using both close and distant relatives. The estimate of heritability and genetic correlation using close and distant relatives enables us to estimate the joint effects of genetic correlation and the environment. In addition, by comparing the single nucleotide polymorphism (SNP)-based heritability (distant relatives) with family-based heritability (close relatives), we can assess whether rare variants contribute to the shared genetic risk of breast and prostate cancer. However, the difference between estimates of genetic correlation from distant and close relatives could also stem from gene-environment interactions.
Understanding the shared genetic etiology of breast and prostate cancers will contribute to novel insights into the etiology of each specific disease, leading to the development of improved preventive measures. In the absence of such knowledge, a full realization of the therapeutic potentials of preventive measures will likely remain difficult.
Materials and Methods
Study Population
We used data from the Minnesota Breast and Prostate Cancer Study available in the kinship2 package [10] in R. The data were collected as follows. Between 1944 and 1952, a family study of breast cancer was initiated at the Dight Institute for Human Genetics at the University of Minnesota by Anderson and colleagues [11] (1960) to investigate whether relatives of breast cancer patients may see an increase in their risk of developing cancer. Sellers et al. [12] (1995) and Sellers et al. [13] (1999) performed a follow-up study to examine the heredity of breast cancer risk. Families were contacted to extend the number of relatives in the analysis [12, 13]. A total of 118 families were excluded, prior to the construction of the dataset, due to little or no information regarding relatives (for further details, see [13]). The subjects selected to be part of the study were the proband’s relatives.
The study was further extended by Grabrick et al. [14] (2003); the authors collected prostate cancer data in a subset of families of the original breast cancer study via questionnaires. The questionnaires were sent to men over 40 years, and 118 incidents of prostate cancer were found. Here, we will refer to this substudy as the Minnesota Breast and Prostate Cancer Study (MBPCS). The data have been used in several studies of breast and prostate cancer [12]. The data include 141 families and are described in more detail in Table 1.
Outcomes
We had binary (dichotomous) outcomes for two variables, breast cancer diagnosis and prostate cancer diagnosis, both taken from the minnbreast dataset.
Statistical Analysis
To investigate the co-inheritance of breast and prostate cancer, we used Fisher’s infinitesimal model which assumes that a large number of genes, each having a small effect, affects the probability of having breast and prostate cancer [18]. The binary outcome (response variable) was fitted using a linear mixed model under Gaussian assumptions and using the liability threshold model described below.
To estimate the heritability of a single phenotype (either breast cancer or prostate cancer), we used the following linear mixed model [18]:
$$Y = \mu + Z b + e \qquad \text{(Eq. 1)}$$
where $Y$ is the $N$-vector of binary outcomes, $\mu$ is the intercept term, $Z$ is the $N \times q$ design matrix for estimating the genetic random effects for each individual, $q$ is the total number of individuals in the pedigree, $b$ is a $q$-vector of random genetic effects, and $e$ is a vector of residuals. The two vectors $b$ and $e$ are independent and multivariate normal with $b \sim N(0, A\sigma_b^2)$ and $e \sim N(0, \sigma^2 I)$, where $\sigma^2$ is the residual variance, $I$ is the identity matrix, $\sigma_b^2$ is the variance of the genetic effects, and $A$ is the relationship matrix [19], with the elements:
$$a_{ii} = 1$$
$$a_{ij} = 0.5 \times \left(a_{\text{mother of } i,\, j} + a_{\text{father of } i,\, j}\right)$$
The narrow-sense heritability is $h^2 = \sigma_b^2/(\sigma_b^2 + \sigma^2)$, which affects how often cancer is inherited within families compared to between families. For large $h^2$, individuals with cancer are clustered within families and cancer is found more frequently among close relatives.
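The recursion for the relationship matrix can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the code used in the study: the pedigree is non-inbred (so every diagonal element is 1, as in the text), parents are listed before their children, and the function name `relationship_matrix` and the toy pedigree are hypothetical.

```python
def relationship_matrix(pedigree):
    """Additive relationship matrix A for a non-inbred pedigree.

    `pedigree` is a list of (id, mother, father) tuples ordered so that
    parents appear before their children; founders have None as parents.
    """
    ids = [pid for pid, _, _ in pedigree]
    index = {pid: k for k, pid in enumerate(ids)}
    n = len(ids)
    A = [[0.0] * n for _ in range(n)]
    for i, (_, mother, father) in enumerate(pedigree):
        A[i][i] = 1.0  # a_ii = 1 (assumes no inbreeding)
        for j in range(i):
            # a_ij = 0.5 * (a_{mother of i, j} + a_{father of i, j})
            a_m = A[index[mother]][j] if mother is not None else 0.0
            a_f = A[index[father]][j] if father is not None else 0.0
            A[i][j] = A[j][i] = 0.5 * (a_m + a_f)
    return index, A


# Hypothetical toy pedigree: two grandparents, their two children,
# two unrelated spouses, and one grandchild per couple (first cousins).
toy_pedigree = [
    ("gm", None, None), ("gf", None, None),
    ("p1", "gm", "gf"), ("p2", "gm", "gf"),
    ("s1", None, None), ("s2", None, None),
    ("c1", "p1", "s1"), ("c2", "s2", "p2"),
]
idx, A = relationship_matrix(toy_pedigree)
sibling_coef = A[idx["p1"]][idx["p2"]]
cousin_coef = A[idx["c1"]][idx["c2"]]
print(sibling_coef, cousin_coef)
```

In this toy pedigree the full siblings and the first cousins recover the coefficients of relationship 0.5 and 0.125 that are used later for the odds ratios.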
Bivariate Model for Breast and Prostate Cancer
We used a bivariate linear mixed model to analyze breast and prostate cancer jointly. The bivariate response variable $Y_{bp}$ is of length $2N$ and is equal to $(Y_b, Y_p)$, where $Y_b$ is the vector of outcomes for breast cancer, and $Y_p$ is the vector of outcomes for prostate cancer. Here, the vector of random genetic effects is $b_{bp}$, which is twice as long as the random effect $b$ in Eq. 1, and the residuals are $e_{bp}$. $b_{bp}$ and $e_{bp}$ are assumed to be multivariate normally distributed with
$$\begin{pmatrix} b_{bp} \\ e_{bp} \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} G & 0 \\ 0 & R \end{pmatrix} \right) \qquad \text{(Eq. 2)}$$
where $G$ (size $2q \times 2q$) and $R$ (size $2N \times 2N$) are the (co)variance matrices of the random effects and residuals, respectively. From Eq. 2, it can be noted that the residuals are assumed to be independent of the genetic effects; a standard assumption to enable estimation of variance components for both genetic effects and residuals in pedigree studies [20]. However, the genetic effects are allowed to be correlated between traits
$$G = \begin{pmatrix} \sigma_{b1}^2 & \rho\,\sigma_{b1}\sigma_{b2} \\ \rho\,\sigma_{b1}\sigma_{b2} & \sigma_{b2}^2 \end{pmatrix} \otimes A \qquad \text{(Eq. 3)}$$
and the residuals can also be correlated between traits
$$R = \begin{pmatrix} \sigma_{1}^2 & r\,\sigma_{1}\sigma_{2} \\ r\,\sigma_{1}\sigma_{2} & \sigma_{2}^2 \end{pmatrix} \otimes I \qquad \text{(Eq. 4)}$$
The genetic correlation $\rho$ is a measure of how often breast cancer and prostate cancer are inherited together among relatives, whereas $\sigma_{b1}^2$ and $\sigma_{b2}^2$ are the genetic variances for breast and prostate cancer, respectively. Furthermore, $\sigma_1^2$ and $\sigma_2^2$ are the residual variances for the two traits, and $r$ is the correlation between the residuals of the two traits.
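To make the structure of the genetic (co)variance matrix concrete, the following Python sketch builds it as the Kronecker product of a 2 × 2 between-trait matrix and the relationship matrix A, which is the usual form for a bivariate pedigree model and is assumed here to be the one intended. The variance values and the two-sibling relationship matrix are hypothetical, and the helper names are ours.

```python
import math


def trait_covariance(var_1, var_2, correlation):
    """2 x 2 between-trait covariance matrix from two variances and a correlation."""
    cov = correlation * math.sqrt(var_1 * var_2)
    return [[var_1, cov], [cov, var_2]]


def kronecker(V, A):
    """Kronecker product V (x) A for matrices stored as lists of lists."""
    return [
        [v * a for v in v_row for a in a_row]
        for v_row in V
        for a_row in A
    ]


# Hypothetical values: genetic variances for the two traits and rho = 0.23.
V_g = trait_covariance(0.004, 0.0004, 0.23)
# Relationship matrix for two full siblings (q = 2).
A_sibs = [[1.0, 0.5], [0.5, 1.0]]
G = kronecker(V_g, A_sibs)  # size 2q x 2q = 4 x 4

# Covariance between the breast cancer effect of sibling 1 and the
# prostate cancer effect of sibling 2: a_12 * rho * sigma_b1 * sigma_b2.
cross_sibling_cov = G[0][3]
print(cross_sibling_cov)
```

The off-diagonal block of G is what carries information about ρ: relatives contribute to it in proportion to their entry in A.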
Bayesian Inference Using MCMCglmm
We used a Bayesian framework to estimate the variance components. The posterior mean was used as the point estimate of each variance component. We reported the results with 95% credible intervals (CrI), using the Bayesian highest posterior density interval, which is analogous to a two-sided 95% confidence interval in frequentist statistics. We used the MCMCglmm package to fit the bivariate linear mixed model [21]. This package is suitable for the analysis because it can deal with large pedigree data, and it provides estimates of the correlation parameters ρ and r. The package uses Markov Chain Monte Carlo (MCMC) methods combined with Gibbs sampling, slice sampling, and Metropolis-Hastings updates. The prior for the intercept term is Gaussian, whereas inverse-Wishart priors are used for the variance and covariance parameters.
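As an illustration of how a posterior mean and a highest posterior density interval are obtained from MCMC output, the sketch below computes both from a list of draws. It is a generic Python illustration, not the MCMCglmm implementation; the draws are simulated and the function name `hpd_interval` is hypothetical.

```python
import math
import random


def hpd_interval(samples, mass=0.95):
    """Narrowest interval that contains at least `mass` of the draws."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of draws inside the interval
    # Slide a window of k consecutive sorted draws and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


# Simulated stand-in for posterior draws of one parameter (hypothetical values).
rng = random.Random(1)
draws = [rng.gauss(0.0014, 0.001) for _ in range(597)]

posterior_mean = sum(draws) / len(draws)
lower, upper = hpd_interval(draws)
print(posterior_mean, lower, upper)
```

For a skewed posterior, such as that of a variance component close to zero, this interval is generally shorter than the equal-tailed interval with the same coverage.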
The bivariate model was fitted using 600,000 iterations in the MCMC algorithm. The MCMC sample size after thinning (every 1,000 iterations) and removing the burn-in period of 3,000 was 597. Thinning was set to a large value (1,000) to ensure an autocorrelation of less than 0.1 (for more details, see [22]). Four independent MCMC chains were run to assess the stability of the results.
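The retained sample size follows directly from the three settings quoted above, as this small Python check shows (the variable names are ours):

```python
n_iterations = 600_000  # total MCMC iterations
burn_in = 3_000         # iterations discarded at the start
thinning = 1_000        # keep one draw every `thinning` iterations

n_kept = (n_iterations - burn_in) // thinning
print(n_kept)
```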
Liability Threshold Model
The liability threshold model is the most common approach for analyzing binary phenotypes where each individual has a hypothetical continuous liability composed of latent genetic and environmental factors [23]. Using the liability threshold model, it is possible to analyze the data based on the Gaussian assumption and then transform the estimates to those expected from a binomial probit model in a generalized linear mixed model (GLMM). In other words, the liability threshold model is mathematically equivalent to a probit-like risk model [24, 25].
Let $h_0^2$ be the heritability estimate obtained from the Gaussian linear mixed model with observed binary outcomes. The expected heritability on the underlying scale $h_1^2$ under probit GLMM assumptions is (Fig. 1):
$$h_1^2 = h_0^2 \, \frac{K(1-K)}{z^2} \qquad \text{(Eq. 5)}$$
where $K$ is the proportion of cancer in the dataset, and $z$ is the height of the standard normal probability density function at the threshold cutoff point in the probit model. The genetic correlation $\rho$ is unaffected by the transformation from the observed to the underlying scale [26].
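The transformation can be sketched in a few lines of Python. The sketch assumes the standard observed-to-liability conversion, h²₁ = h²₀ · K(1 − K)/z², with the threshold placed at the (1 − K) quantile of the standard normal distribution. The input values are hypothetical and are not the study's estimates.

```python
from statistics import NormalDist


def liability_heritability(h2_observed, K):
    """Convert observed-scale heritability to the liability scale.

    K is the proportion of affected individuals; z is the standard normal
    density at the threshold that cuts off an upper tail of size K.
    """
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - K)
    z = std_normal.pdf(threshold)
    return h2_observed * K * (1.0 - K) / z ** 2


# Hypothetical inputs: observed-scale heritability 0.05, proportion affected 5%.
h2_liability = liability_heritability(0.05, 0.05)
print(h2_liability)
```

The scaling factor K(1 − K)/z² grows as the trait becomes rarer, so two traits with similar observed-scale estimates can differ substantially on the liability scale.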
Calculations of Odds Ratios and Relative Risk for Pairs of Female-Male Relatives
The estimates of fixed effects (covariates) and random effects from the linear mixed model can be used to calculate odds ratios and relative risks. Given these parameters, the underlying bivariate normal distribution (of having breast/prostate cancer) for pairs of related male-female individuals is known. The correlation of this bivariate distribution is $c\,\rho\,h_{l,bc}\,h_{l,pc}$, where $c$ is the coefficient of relationship, $\rho$ is the genetic correlation, $h_{l,bc}$ is the square root of the heritability of breast cancer on the underlying liability scale, and $h_{l,pc}$ is the square root of the heritability of prostate cancer on the underlying liability scale. Furthermore, the underlying bivariate normal distribution has unit marginal variances (following the assumptions of the probit model), and the two means in the bivariate distribution are given by the cancer proportions in the population.
Fig. 2. Fitted bivariate underlying distribution for male-female relatives with a coefficient of relationship of 0.5 (e.g., full siblings, mother-son, father-daughter). The liability of getting breast cancer is shown on the y-axis and the liability of prostate cancer on the x-axis. The four proportions of observed cancer cases are: P(neither breast nor prostate cancer) = P0 equal to the proportion of the bivariate distribution in the bottom-left quadrant, P(both breast and prostate cancer) = P12 in the top-right quadrant, P(breast cancer but no prostate cancer) = P1 in the top-left quadrant, and P(no breast cancer but prostate cancer) = P2 in the bottom-right quadrant. The odds ratio of getting cancer given the relative’s status is (P0 × P12)/(P1 × P2).
We estimated the following four probabilities by calculating the proportion of the density distribution in the four quadrants of a two-dimensional coordinate system (Fig. 2):
P0 = P(neither breast nor prostate cancer),
P12 = P(both breast and prostate cancer),
P1 = P(breast cancer but no prostate cancer),
P2 = P(no breast cancer but prostate cancer).
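The odds ratio of getting cancer given the relative’s status is
$$\mathrm{OR} = \frac{P_0 \times P_{12}}{P_1 \times P_2},$$
and the relative risk of breast cancer given the status of the male relative is
$$\mathrm{RR}_{bc} = \frac{P_{12}/(P_{12} + P_2)}{P_1/(P_1 + P_0)},$$
and the relative risk of prostate cancer given the status of the female relative is
$$\mathrm{RR}_{pc} = \frac{P_{12}/(P_{12} + P_1)}{P_2/(P_2 + P_0)}.$$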
The R code for fitting the bivariate model and for computing the odds ratios and relative risks is available at our GitHub repository: https://github.com/Adrcalvo/Genetic_correlation_between_breast_and_prostate_cancer.
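Independently of the authors' R code, the quadrant calculation can be sketched in Python with only the standard library. The sketch treats the two liabilities as standard bivariate normal with thresholds at the (1 − K) quantiles, and obtains P12 by one-dimensional Simpson integration. The relative risks are taken as the ratio of the risk when the relative is affected to the risk when the relative is unaffected. The prevalences below are the pooled cohort proportions from the Results, which is an assumption on our part. The prevalences used by the authors may differ, so the output need not reproduce Table 2.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def quadrant_probabilities(K_bc, K_pc, corr, steps=4000):
    """Return (P0, P1, P2, P12) for a female-male pair of relatives."""
    t_bc = STD.inv_cdf(1.0 - K_bc)  # breast cancer liability threshold
    t_pc = STD.inv_cdf(1.0 - K_pc)  # prostate cancer liability threshold
    s = math.sqrt(1.0 - corr ** 2)

    def integrand(x):
        # density of breast liability x times P(prostate liability > t_pc | x)
        return STD.pdf(x) * (1.0 - STD.cdf((t_pc - corr * x) / s))

    # Simpson's rule on [t_bc, upper]; the tail beyond `upper` is negligible.
    upper = max(t_bc, 0.0) + 10.0
    h = (upper - t_bc) / steps  # `steps` must be even
    total = integrand(t_bc) + integrand(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(t_bc + k * h)
    p12 = total * h / 3.0
    p1 = K_bc - p12
    p2 = K_pc - p12
    p0 = 1.0 - p1 - p2 - p12
    return p0, p1, p2, p12


def odds_ratio_and_relative_risks(K_bc, K_pc, c, rho, h2_bc, h2_pc):
    corr = c * rho * math.sqrt(h2_bc) * math.sqrt(h2_pc)
    p0, p1, p2, p12 = quadrant_probabilities(K_bc, K_pc, corr)
    odds_ratio = (p0 * p12) / (p1 * p2)
    rr_bc = (p12 / (p12 + p2)) / (p1 / (p1 + p0))
    rr_pc = (p12 / (p12 + p1)) / (p2 / (p2 + p0))
    return odds_ratio, rr_bc, rr_pc


# Heritabilities and genetic correlation from the text; prevalences assumed.
or_sibling, rr_bc_sibling, rr_pc_sibling = odds_ratio_and_relative_risks(
    K_bc=0.0431, K_pc=0.011, c=0.5, rho=0.23, h2_bc=0.34, h2_pc=0.65)
or_cousin, _, _ = odds_ratio_and_relative_risks(
    K_bc=0.0431, K_pc=0.011, c=0.125, rho=0.23, h2_bc=0.34, h2_pc=0.65)
print(or_sibling, or_cousin)
```

Because the correlation scales with the coefficient of relationship c, the odds ratio moves toward 1 as the relatives become more distant.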
Results
The cohort contained 141 families with 11,474 individuals, of which 126 (1.10%) were diagnosed with prostate cancer, and 495 (4.31%) were diagnosed with breast cancer. There were 1,147 (10%) individuals with a missing value for sex. Families contained information for individuals up to five generations. Table 1 describes the characteristics of different family types.
The bivariate linear mixed model was fitted to the MBPCS data. Detailed results with trace plots of the MCMC chain are presented in online supplementary material 1 (see www.karger.com/doi/10.1159/000521215 for all online suppl. material). Results for univariate analyses of prostate and breast cancer cases are presented in online supplementary material 2, showing that similar heritability estimates were obtained from the univariate analyses.
The posterior mean heritability on the observed scale was 0.095 (95% CrI, 0.085–0.106) for breast cancer and 0.031 (95% CrI, 0.024–0.037) for prostate cancer. On the liability scale, we estimated a heritability of 0.34 (95% CrI, 0.23–0.49) for breast cancer and 0.65 (95% CrI, 0.36–0.97) for prostate cancer. In summary, the estimated heritability was more than three times as large for breast cancer compared to prostate cancer on the observed scale, whereas the estimates on the liability scale were almost twice as large for prostate cancer compared to breast cancer. The reason for this discrepancy is that the observed scale is highly affected by the incidence rate of the two types of cancers, whereas this is corrected for on the liability scale (see Eq. 5). The estimates on the liability scale (breast cancer 0.34; prostate cancer 0.65) are, therefore, the ones reflecting the true narrow-sense heritability for the two traits.
We observed a genetic covariance of 0.0014 (95% CrI, –0.0004 to 0.0034) corresponding to a genetic correlation of 0.23. Over 95% of the sampled posterior distribution was greater than zero for the genetic covariance (Fig. 3). Therefore, we observed reliable evidence that there was a positive genetic correlation between the two cancers.
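The two summaries used in this paragraph, the genetic correlation implied by a covariance and the share of posterior draws above zero, can be computed per MCMC draw as in the Python sketch below. The draws are simulated with hypothetical means and spreads, so the printed numbers are illustrative only and are not the study's posterior.

```python
import math
import random


def genetic_correlation(cov_12, var_1, var_2):
    """Correlation implied by a genetic covariance and two genetic variances."""
    return cov_12 / math.sqrt(var_1 * var_2)


# Simulated stand-ins for joint posterior draws (hypothetical values).
rng = random.Random(0)
n_draws = 597
posterior = [
    (rng.gauss(0.0003, 0.00018),   # genetic covariance
     rng.gauss(0.004, 0.0003),     # genetic variance, trait 1
     rng.gauss(0.0004, 0.00005))   # genetic variance, trait 2
    for _ in range(n_draws)
]

rho_draws = [genetic_correlation(c, v1, v2) for c, v1, v2 in posterior]
rho_mean = sum(rho_draws) / n_draws
share_positive = sum(c > 0 for c, _, _ in posterior) / n_draws
print(rho_mean, share_positive)
```

Computing the correlation draw by draw, rather than from the posterior means of the three components, keeps the uncertainty in the variances in the summary of ρ.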
Odds Ratios and Relative Risks for Pairs of Relatives
We estimated the probability of developing breast and prostate cancer for a female-male pair of individuals from the underlying bivariate distribution. The underlying distribution and the observed proportions are shown in Figure 2 for male-female relatives with a coefficient of relationship of 0.5 (full siblings).
These probabilities depend on the two estimated heritabilities and the genetic correlation. Consequently, these estimated values can be summarized in terms of odds ratios and relative risks, which are easier to interpret in practice. The odds ratio was higher for closer relatives (Table 2). For instance, females with a brother (full sibling; coefficient of relationship = 0.5) with prostate cancer had an odds ratio of 1.276 for getting breast cancer. For a female with a male cousin (coefficient of relationship = 0.125) with prostate cancer, the odds ratio of getting breast cancer was 1.06. The credible interval of the odds ratio is a direct function of the credible intervals of the heritability estimates, but it was not feasible to compute since it depends on a bivariate integral.
Discussion
In this study, we investigated the association between prostate cancer and breast cancer using MBPCS data. We observed 1.28 times higher odds of breast cancer among probands with a brother with prostate cancer. Using multivariate GLMM and under the liability threshold assumption, we estimated that 65% (95% CrI, 36–97%) and 34% (95% CrI, 23–49%) of the variance in risk of prostate cancer and breast cancer, respectively, were due to direct additive genetics (narrow-sense heritability). Other studies reported similar heritability estimates. In particular, a large Nordic study by Lichtenstein et al. [5] reported 57% heritability for prostate cancer and 31% for breast cancer [7].
We observed reliable evidence for a positive genetic correlation of 0.23 between the two types of cancers: over 95% of the sampled posterior distribution of the genetic covariance was greater than zero (Fig. 3). Our estimate of the genetic correlation, using family-based data, was about three-fold higher than the previously estimated SNP-based genetic correlation (0.23 vs. 0.072) [9]. The difference in these estimates suggests that rare variants contribute to the shared genetic risk of breast and prostate cancer. However, this statement should be interpreted with caution since the SNP-based genetic correlation may differ from the family-based genetic correlation due to other factors (e.g., genotype-by-family effects shared between the two types of cancers).
Rare variants, in particular protein-truncating variants (PTVs), tend to have larger effect sizes and dramatically change gene expression and function. Studies have shown that rare PTVs, in general, are more damaging than common PTVs. For example, for breast cancer, findings suggest that rare and common breast cancer susceptibility loci are differentially associated with tumor characteristics and survival [27]. Similarly, for prostate cancer, a differential burden of rare variants is identified between metastatic and nonaggressive cases [28]. Due to higher genetic correlations among relatives, it is more likely that rare damaging variants linked to breast cancer are observed in an individual if someone in the family has been affected with breast cancer. And due to the genetic correlation between breast and prostate cancer, then, it is more likely that rare damaging variants linked to breast cancer are observed in an individual if someone in the family has been affected with prostate cancer. The same argument is valid for prostate cancer.
Understanding the genetic correlation between prostate and breast cancer will eventually pave the way towards a better understanding of the causal relationships between these two types of cancers. We also presented an interpretation of the estimated heritabilities and genetic correlation in terms of odds ratios and relative risk, which we hope will be useful as guidelines for applied researchers and practitioners (Table 2). We observed that the odds of developing breast cancer were increased by 28%, 13%, and 6% if a brother, half-sibling, or cousin had prostate cancer, respectively.
Polygenic risk scores aggregate the effects of many genetic variants across the human genome and have been proposed as a tool of clinical medicine to measure a person’s genetic risk of a certain disorder. However, polygenic risk scores are still not ready for clinical use and further research is needed in this area. A recent study reported that the information of relatives would substantially increase the efficiency of genetic prediction [29].
The current study has several strengths. Although relative risk and heritability of breast and prostate cancer were estimated in previous studies based on both family-based and SNP-based data (distant relatives), to the best of our knowledge, genetic correlation has previously only been estimated based on SNP-based data (distant relatives). For the first time, we report the family-based estimation of the genetic correlation. Another strength of this work is the use of robust statistical methodologies to estimate heritability and genetic correlation. Different approaches to estimating the heritability of disease with dichotomous variables have been suggested [30]. The most common approach in human genetics, Falconer’s method, is based on comparing the tetrachoric correlations from relatives to a random sample from the general population. We used GLMM, which is a more flexible model and can treat complex pedigrees of variable size and structure to estimate the heritabilities and the genetic correlations of multiple traits [29].
The results of this study should be interpreted in the context of some limitations. First, the data are likely right-censored (some individuals may have received a diagnosis after study follow-up, and, therefore, these diagnoses would be missing from the data), in particular for individuals born in the later years of the study. Right-censoring could lead to undercounting of breast and prostate cancer diagnoses and thereby decrease the observed prevalence. Second, the individuals in the first generation might, to some degree, be selected based on cancer prevalence within families, which would, in that case, decrease the genetic variation among these individuals and bias our heritability estimates downwards. However, the bias is likely to be small since the selection of first-generation individuals was not very strong. Third, this study lacks an analysis of environmental and lifestyle risk factors. Future studies with larger samples will be critical to elucidate further the role of genetics and environmental factors in the risk of prostate and breast cancer.
Conclusion
This study sheds light on the association between prostate cancer and breast cancer; an occurrence of breast or prostate cancer in a family increases the risk of developing the other. The findings underline the importance of shared genetic variation as a risk factor for breast and prostate cancer and the importance of heritability-based analysis to understand their etiology. Therefore, it is important to ask patients, both men and women, about the history of both breast cancer and prostate cancer in the family. These findings have translational relevance for cancer risk prediction in men and women. Patients with a high risk of developing cancer should be monitored more frequently, thereby increasing the likelihood of early detection.
Statement of Ethics
The data were publicly available and did not require ethical approval.
Conflict of Interest Statement
The authors do not have any conflict of interest to declare.
Funding Sources
No funding was used for this study.
Author Contributions
L.R. and A.C.C. designed the study and performed the data analysis. All authors contributed to writing the manuscript, study conception, interpretation of data, and critical revision of the manuscript for important intellectual content.
Data Availability Statement
The data are publicly available as the minnbreast dataset in the kinship2 R package: https://rdrr.io/github/mayoverse/kinship2/man/minnbreast.html.