Background: Many cancer types show considerable heritability, and extensive research has been done to identify germline susceptibility variants. Linkage studies have discovered many rare high-risk variants, and genome-wide association studies (GWAS) have discovered many common low-risk variants. However, it is believed that a considerable proportion of the heritability of cancer remains unexplained by known susceptibility variants. The “rare variant hypothesis” proposes that much of the missing heritability lies in rare variants that cannot reliably be detected by linkage analysis or GWAS. Until recently, high sequencing costs have precluded extensive surveys of rare variants, but technological advances have now made it possible to analyze rare variants on a much greater scale. Objectives: In this study, we investigated associations between rare variants and 14 cancer types. Methods: We ran association tests using whole-exome sequencing data from The Cancer Genome Atlas (TCGA) and validated the findings using data from the Pan-Cancer Analysis of Whole Genomes Consortium (PCAWG). Results: We identified four significant associations in TCGA, only one of which was replicated in PCAWG (BRCA1 and ovarian cancer). Conclusions: Our results provide little evidence in favor of the rare variant hypothesis. Much larger sample sizes may be needed to detect undiscovered rare cancer variants.

Germline variants can lead to an increased risk of cancer. A large twin study estimated the heritability of cancer (proportion of variance in cancer risk due to genetic factors) to be 33% and found significant heritability for skin, prostate, ovarian, kidney, breast, and uterine cancer, with estimates ranging from 27% for uterine cancer to 58% for melanoma [1]. Extensive research has been done to identify susceptibility variants, with important discoveries emerging from genetic linkage and genome-wide association studies (GWAS). Linkage analysis of familial cancer has identified many cancer susceptibility genes with rare high-penetrance variants, including variants in the breast and ovarian cancer genes BRCA1 and BRCA2 [2, 3]. GWAS, which typically focus on common single-nucleotide polymorphisms (SNPs) with minor allele frequency (MAF) >1%, have discovered many low-penetrance variants [4]. To date, over 100 moderate- to high-risk cancer susceptibility genes [5] and over 1,000 low-risk variants [6] have been identified. However, much of the heritability of cancer remains unexplained by known variants [7, 8]. For example, established susceptibility genes have been estimated to account for less than 25% of the familial risk of breast cancer [9], while low-risk SNPs identified through GWAS account for 18% of the familial risk of breast cancer [10].

The “rare variant hypothesis” proposes that some of the missing heritability lies in rare variants of moderate to high risk that are not penetrant enough to be detected by linkage analysis and not prevalent enough to be genotyped or imputed using GWAS arrays [11, 12]. The rationale is that variants conferring substantial disease risk are likely to be rare due to negative selection [11]. While many theory- and simulation-based arguments have been made for and against the hypothesis [13], empirical evidence is lacking in most disease settings [14, 15]. Until recently, the resources required to sequence large numbers of individuals have precluded extensive surveys of rare variants. However, advances in next-generation sequencing have now made it possible to conduct much more comprehensive analyses of rare variants. Large-scale pan-cancer sequencing projects, including The Cancer Genome Atlas (TCGA) and the Pan-Cancer Analysis of Whole Genomes Consortium (PCAWG), have generated datasets of thousands of deeply sequenced exomes and genomes, providing new opportunities to study the role of rare variants in cancer susceptibility. While current sample sizes typically provide insufficient power to characterize the effects of individual rare variants, rare variant association methods address this issue by aggregating variants by gene or by other features [16].

A limited number of studies have evaluated the contribution of rare germline variants to cancer risk using a systematic exome-wide or genome-wide approach rather than targeted analysis of candidate genes or variants, and few of these studies have had large sample sizes. A recent review [17] of whole-exome and whole-genome studies of genetic susceptibility to cancer through 2018 found 16 case-control discovery studies that did not restrict the analyses to a priori candidate loci and that included more than 100 cases [18-33]. These studies, which each focused on a single cancer type, found varying amounts of evidence for rare variant associations. Some analyses yielded no statistically significant associations [20, 29, 30, 31] while others discovered and replicated novel associations [19, 23, 25, 26, 28, 32]. Two exome-wide studies published in 2020 [34, 35] used UK Biobank data to analyze rare variants in 49,960 individuals against thousands of phenotypes, including various types of cancer. Neither of the studies found any novel statistically significant associations for the cancer phenotypes. Additionally, a few pan-cancer whole-exome studies have examined the tumor site specificity of germline variants, but they have focused on either common variants [36] or variants restricted to a few hundred established cancer susceptibility genes [37-39].

In this study, we investigated the rare variant hypothesis through a pan-cancer analysis of rare germline variants (MAF ≤0.1%) across the entire exome, using a case-only approach to evaluate associations with 14 cancer types. That is, for each cancer type, we compared the relative frequencies of rare variants per gene with those in a control group consisting of an ensemble of other cancer types whose specimens were sequenced at the same sequencing center(s). We ran four types of gene-based association tests on whole-exome sequencing data from 8,719 individuals with cancer from TCGA: the weighted burden test [40], the sequence kernel association test (SKAT) [41], the optimal unified test (SKAT-O) [42], and the optimally weighted combination of variants test (TOW) [43]. We then validated the significant gene-cancer associations in an independent dataset of 2,522 individuals with cancer from PCAWG.
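The case-only, one-versus-rest comparison can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the `Sample` record, its field names, and the function name are hypothetical. The control restrictions follow the TCGA rules described later in this section (a control's sequencing center, WGA status, and specimen type must each occur among the cases; OV and UCEC are restricted to women and PRAD to men). For PCAWG, only the specimen-type restriction was applied.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Sample:
    # Hypothetical per-sample record; field names are illustrative only.
    sample_id: str
    cancer_type: str   # e.g. "OV", "BRCA"
    center: str        # sequencing center label
    wga: bool          # whole-genome amplified before sequencing?
    specimen: str      # "blood", "tissue", "buccal", or "bone marrow"
    sex: str           # "female" or "male"


def one_vs_rest(samples: List[Sample], cancer_type: str,
                sex: Optional[str] = None) -> Tuple[List[Sample], List[Sample]]:
    """Split samples into cases (the cancer of interest) and "controls"
    (all other cancer types), keeping only controls whose sequencing center,
    WGA status, and specimen type each occur among the cases."""
    # Optional sex restriction (women for OV/UCEC, men for PRAD).
    pool = [s for s in samples if sex is None or s.sex == sex]
    cases = [s for s in pool if s.cancer_type == cancer_type]
    centers = {s.center for s in cases}
    wga_states = {s.wga for s in cases}
    specimens = {s.specimen for s in cases}
    controls = [
        s for s in pool
        if s.cancer_type != cancer_type
        and s.center in centers
        and s.wga in wga_states
        and s.specimen in specimens
    ]
    return cases, controls
```

Because the control pool is an ensemble of all other eligible cancer types, the same function serves every one of the 14 tested cancer types; only `cancer_type` and `sex` change.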

Following the data access procedures described in Huang et al. [38] and ICGC/TCGA 2020 [44], we downloaded variant call format (VCF) files containing whole-exome germline variants for 10,389 TCGA participants and whole-genome germline variants for 2,583 PCAWG participants. The variant calling and quality control pipelines for these datasets are described in the corresponding papers [38, 44]. From the VCF files, we excluded variants that did not pass the quality filters. In the TCGA data, we additionally excluded SNPs with FS >60, QD <2, SOR >3, MQ <40, MQRankSum <–12.5, ReadPosRankSum <–8, or BaseQRankSum <–3; indels with FS >200, QD <2, ReadPosRankSum <–20, or BaseQRankSum <–3; and genotype calls with read depth <10 or >1,000. Also, since the TCGA samples were not all sequenced with the same exome capture kit (online suppl. Table S1; for all online suppl. Material, see www.karger.com/doi/10.1159/000519355), we restricted the TCGA variants to those in the intersection of the capture regions across the six kits that were used (with a 100-bp buffer around each capture region).
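The TCGA-specific filters above can be sketched as simple predicates, together with the padded intersection of capture regions. This is a minimal Python sketch under stated assumptions, not the pipeline used in the study: INFO annotations are assumed to be already parsed into a dictionary, a missing INFO annotation is treated as passing and a missing read depth as failing (both assumptions), and capture regions are assumed to be BED-style, 0-based half-open intervals. All function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hard-filter thresholds quoted in the text: a site FAILS if any rule is true.
SNP_RULES = [("FS", ">", 60.0), ("QD", "<", 2.0), ("SOR", ">", 3.0),
             ("MQ", "<", 40.0), ("MQRankSum", "<", -12.5),
             ("ReadPosRankSum", "<", -8.0), ("BaseQRankSum", "<", -3.0)]
INDEL_RULES = [("FS", ">", 200.0), ("QD", "<", 2.0),
               ("ReadPosRankSum", "<", -20.0), ("BaseQRankSum", "<", -3.0)]


def passes_site_filters(info: Dict[str, float], is_snp: bool) -> bool:
    """Return False if any threshold is violated (assumption: an absent
    annotation, e.g. no rank-sum test at a site, is not a failure)."""
    for key, op, cutoff in (SNP_RULES if is_snp else INDEL_RULES):
        value = info.get(key)
        if value is None:
            continue
        if (op == ">" and value > cutoff) or (op == "<" and value < cutoff):
            return False
    return True


def passes_genotype_depth(dp: Optional[int]) -> bool:
    """Keep a genotype call only if 10 <= read depth <= 1,000
    (assumption: a missing depth fails)."""
    return dp is not None and 10 <= dp <= 1000


Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def pad(intervals: List[Interval], buffer: int = 100) -> List[Interval]:
    """Extend each capture region by `buffer` bp on both sides."""
    return merge([(c, max(0, s - buffer), e + buffer) for c, s, e in intervals])


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Two-pointer intersection of two sorted, merged interval lists."""
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        (ca, sa, ea), (cb, sb, eb) = a[i], b[j]
        if ca != cb:
            # Both lists use the same (string) chromosome ordering.
            if ca < cb:
                i += 1
            else:
                j += 1
            continue
        lo, hi = max(sa, sb), min(ea, eb)
        if lo < hi:
            out.append((ca, lo, hi))
        # The interval that ends first cannot overlap anything later.
        if ea < eb:
            i += 1
        else:
            j += 1
    return out


def shared_capture_regions(kits: List[List[Interval]], buffer: int = 100) -> List[Interval]:
    """Intersect the padded capture regions of every kit."""
    result = pad(kits[0], buffer)
    for kit in kits[1:]:
        result = intersect(result, pad(kit, buffer))
    return result
```

Padding each kit's regions before intersecting is one reading of "a 100-bp buffer around each capture region"; padding only the final intersection would give slightly different boundaries.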

TCGA clinical and demographic data were downloaded from the Genomic Data Commons Data Portal via the TCGAbiolinks R package, while PCAWG clinical and demographic data were obtained from online supplementary Table 1 of ICGC/TCGA 2020 [44]. We excluded individuals with a prior malignancy at an unknown site or at a site different from that of the primary tumor. Since some TCGA participants also contributed samples to PCAWG, we excluded the overlapping individuals from the TCGA discovery dataset. We also excluded TCGA participants with mesothelioma because it was the only cancer type sequenced at the Sanger Institute, and we chose to compare only cancer types sequenced at the same center(s) to reduce the impact of technical variation. The final discovery dataset consisted of 8,719 samples representing 32 cancer types, and the final validation dataset consisted of 2,522 samples representing 37 cancer types. We performed association tests only for the 14 cancer types with at least 300 TCGA cases, but the remaining cancer types were still included in the analyses as “controls” (more details on the definition of the control groups are provided later). The 14 selected cancer types were breast invasive carcinoma (BRCA), colon adenocarcinoma (COAD), glioblastoma multiforme (GBM), head and neck squamous cell carcinoma (HNSC), clear cell renal cell carcinoma (KIRC), low-grade glioma (LGG), lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), ovarian serous cystadenocarcinoma (OV), prostate adenocarcinoma (PRAD), skin cutaneous melanoma (SKCM), stomach adenocarcinoma (STAD), thyroid carcinoma (THCA), and uterine corpus endometrial carcinoma (UCEC). In the TCGA dataset, the number of cases ranged from 316 for KIRC to 918 for BRCA, while in the PCAWG dataset, it ranged from 18 for LGG to 206 for BRCA (Table 1). Most of the samples were from individuals of European ancestry (83% of TCGA samples and 77% of PCAWG samples). Most samples were also obtained from blood (84% of TCGA samples and 78% of PCAWG samples). A small proportion (11%) of TCGA samples underwent whole-genome amplification (WGA) prior to sequencing. Additional details on the cohorts are provided in online supplementary Tables S1 and S2.

Table 1.

TCGA and PCAWG sample sizes for the 14 cancer types against which we ran association tests

Our analyses focused on rare exonic variants, defined as variants with MAF ≤0.1% in the Genome Aggregation Database (gnomAD) v2.1.1 [45], which has compiled and harmonized variants from 125,748 germline exomes from various population genetic and disease-specific studies, including TCGA studies. We excluded variants absent from gnomAD, variants flagged as low-quality by gnomAD, and variants in low-complexity/segmental duplication/decoy regions. We performed an analysis with functional variants as well as an analysis further restricted to loss-of-function variants. Functional variants were defined as variants predicted to have a high or moderate impact on protein function based on the Ensembl Variant Effect Predictor (VEP) [46], and loss-of-function variants were defined as variants predicted to have a high impact based on VEP. As a sensitivity analysis, we repeated the analyses using a MAF threshold of 1%.
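The variant qualification rules above reduce to a short predicate. The sketch below assumes each variant has already been annotated with its gnomAD allele frequency, gnomAD quality flag, difficult-region flag, and VEP IMPACT class; the `Variant` record, its field names, and the function name are hypothetical, and taking the MAF as the smaller of AF and 1 − AF is an assumption. Setting `maf_threshold=0.01` reproduces the sensitivity analysis.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    # Hypothetical annotated-variant record; field names are illustrative.
    gene: str
    gnomad_af: Optional[float]   # alternate-allele frequency in gnomAD; None if absent
    gnomad_low_quality: bool     # flagged as low quality by gnomAD
    difficult_region: bool       # low-complexity / segmental duplication / decoy
    vep_impact: str              # VEP IMPACT: "HIGH", "MODERATE", "LOW", or "MODIFIER"


def qualifies(v: Variant, maf_threshold: float = 0.001,
              loss_of_function: bool = False) -> bool:
    """Return True if the variant counts toward the gene-level tests."""
    # Variants absent from gnomAD, flagged by gnomAD, or in difficult regions are dropped.
    if v.gnomad_af is None or v.gnomad_low_quality or v.difficult_region:
        return False
    # Assumption: MAF is the smaller of AF and 1 - AF.
    maf = min(v.gnomad_af, 1.0 - v.gnomad_af)
    if maf > maf_threshold:
        return False
    # Functional = HIGH or MODERATE impact; loss-of-function = HIGH only.
    allowed = {"HIGH"} if loss_of_function else {"HIGH", "MODERATE"}
    return v.vep_impact in allowed
```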

In the TCGA dataset, the median number of qualifying functional variants per sample was 125 (interquartile range [IQR] 109–160), the median number of qualifying loss-of-function variants per sample was 8 (IQR 6–11), the median number of unique qualifying functional variants per gene was 30 (IQR 16–52), and the median number of unique qualifying loss-of-function variants per gene was 2 (IQR 1–5) (online suppl. Fig. S1). In the PCAWG dataset, the median number of qualifying functional variants per sample was 179 (IQR 157–240), the median number of qualifying loss-of-function variants per sample was 11 (IQR 8–14), the median number of unique qualifying functional variants per gene was 14 (IQR 7–25), and the median number of unique qualifying loss-of-function variants per gene was 2 (IQR 1–3). Coverage information and variant counts by cancer type are provided in online supplementary Tables S3 and S4. The overall distributions of variants per sample and carrier frequencies per gene did not vary substantially by cancer type (online suppl. Figs. S2–S5).

To identify associations between rare variants and specific cancer types, we ran gene-based burden [40], SKAT [41], SKAT-O [42], and TOW tests [43]. We ran the first three tests using the SkatBinary function from the SKAT R package, weighting variants by their inverse variance, 1/[MAF(1 − MAF)], as in Madsen and Browning’s weighted burden test [40]. The code for running the TOW test was adapted from https://pages.mtu.edu/~shuzhang/software/TOW-script.R. We first ran the TOW test using 10,000 permutations to calculate each p value, then increased the number of permutations, up to a maximum of 100 million, for small p values (more details are provided in the online supplementary material). For each of the 14 selected cancer types, we used a one-versus-rest approach, treating individuals with the cancer of interest as cases and all other individuals as “controls.” For OV and UCEC, we restricted the analyses to women, and for PRAD, we restricted the analyses to men. To minimize the effects of technical variation and population stratification, we stratified on and adjusted for several covariates, as follows. Heterogeneous workflows were used to generate the TCGA whole-exome sequencing data [47], and previous studies have observed batch effects related to sequencing center and WGA [47, 48]; in our TCGA analyses, we therefore restricted the control group to samples whose sequencing center and WGA status were present among the case samples. We also restricted the control group to samples whose specimen type (blood, tissue, buccal, or bone marrow) was present among the case samples, and we adjusted for sequencing center, specimen type, and WGA status whenever they were non-constant among the case samples. Furthermore, we adjusted for the top five principal components computed from a set of 37,501 linkage disequilibrium-pruned common SNPs (details are provided in the online supplementary material). In our PCAWG analyses, we restricted the control group to samples whose specimen type was present among the case samples, and we adjusted for specimen type when it was non-constant among the case samples, as well as for the top five principal components based on the same set of SNPs used in the TCGA analysis. The numbers of cases and controls for the 14 selected cancer types are given in Table 1.
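The adaptive permutation scheme used for TOW (10,000 permutations first, more only for small p values, capped at 100 million) can be sketched as follows. The exact rule for adding permutations is given in the online supplementary material and is not reproduced here: the stopping rule in this sketch (stop once at least `min_exceed` permuted statistics reach the observed one, otherwise grow the total tenfold) is an assumption, and the mean-difference statistic is a placeholder, not the TOW statistic. Covariates are ignored, and the p value uses the common (b + 1)/(B + 1) estimator. Function names are hypothetical.

```python
import random
from typing import Callable, Sequence


def mean_difference(labels: Sequence[int], scores: Sequence[float]) -> float:
    """|mean score in cases - mean score in controls| (placeholder statistic)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def adaptive_permutation_p(labels: Sequence[int], scores: Sequence[float],
                           statistic: Callable[[Sequence[int], Sequence[float]], float] = mean_difference,
                           initial: int = 10_000, max_perm: int = 100_000_000,
                           min_exceed: int = 10, seed: int = 0) -> float:
    """Permutation p value that adds permutations only while p looks small."""
    rng = random.Random(seed)
    observed = statistic(labels, scores)
    perm = list(labels)
    n_done = n_exceed = 0
    batch = initial
    while True:
        for _ in range(batch):
            rng.shuffle(perm)  # permute case/control labels, keeping scores fixed
            if statistic(perm, scores) >= observed:
                n_exceed += 1
        n_done += batch
        # Stop once enough exceedances pin down p, or the cap is reached.
        if n_exceed >= min_exceed or n_done >= max_perm:
            break
        batch = min(9 * n_done, max_perm - n_done)  # grow the total tenfold
    return (n_exceed + 1) / (n_done + 1)
```

The point of the adaptive design is cost: most gene-cancer pairs are clearly null and stop after the first batch, while only the few with tiny p values pay for the large permutation counts needed to resolve thresholds near 2.2e-7.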

The significant results from the main analysis based on a MAF threshold of 0.1% are summarized in Tables 2 and 3, and quantile-quantile plots of the p values from the tests are provided in online supplementary Figures S6–S7. In our analysis of functional variants, we applied each testing method to 16,537 genes, resulting in tests for 227,364 gene-cancer pairs (we did not run tests for gene-cancer pairs in which none of the TCGA cases or controls had a qualifying variant). Applying a Bonferroni cut-off of 0.05/227364 = 2.2e-7 to the TCGA results from each type of test, we found four statistically significant gene-cancer associations (Table 2): BRCA1-OV was significant based on the burden test and SKAT-O, CCNL2-OV based on SKAT, CETN1-OV based on all tests except the burden test, and RIMS2-KIRC based on SKAT-O. The direction of association (based on the burden model, which collapses all variants within a gene into a single score that is regressed on the outcome) was positive for all four gene-cancer pairs. In the PCAWG validation dataset, only one of these pairs, BRCA1-OV, a well-established association, showed a direction of association consistent with the TCGA results and achieved statistically significant replication (based on a cut-off of 0.05). CETN1-OV has been studied to a limited extent, with suggestive but inconclusive results [49]; however, in the PCAWG analysis subset for CETN1-OV, only one donor (who did not have ovarian cancer) carried a qualifying CETN1 variant (Table 3), and the association was not validated.
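To make the burden model concrete, the sketch below collapses each person's qualifying variants into one weighted score, tests it with a logistic score test, and computes the Bonferroni cut-offs used above. It is a simplified illustration, not the SKAT package's implementation: it ignores covariates (principal components, center, specimen type, WGA), and its p value relies on the asymptotic chi-square(1) approximation, which can be poor for very rare variants in small samples. Scaling genotypes by √w_j = 1/√(MAF_j(1 − MAF_j)), the square root of the weight quoted in the text, is our assumption about how that weight enters a linear burden score (Madsen and Browning scale genotypes by 1/√(n q (1 − q))). Function names are hypothetical.

```python
import math
from typing import List, Sequence


def variant_weights(mafs: Sequence[float]) -> List[float]:
    """w_j = 1 / (MAF_j * (1 - MAF_j)), the weight quoted in the text."""
    return [1.0 / (p * (1.0 - p)) for p in mafs]


def burden_scores(genotypes: Sequence[Sequence[int]], mafs: Sequence[float]) -> List[float]:
    """Collapse each person's minor-allele counts (0/1/2) into one score,
    scaling variant j by sqrt(w_j) = 1 / sqrt(MAF_j (1 - MAF_j))."""
    scale = [math.sqrt(w) for w in variant_weights(mafs)]
    return [sum(a * g for a, g in zip(scale, row)) for row in genotypes]


def burden_score_test(y: Sequence[int], scores: Sequence[float]) -> float:
    """Score test of beta = 0 in logit P(y = 1) = alpha + beta * score
    (no covariates); the statistic is referred to chi-square(1)."""
    n = len(y)
    ybar = sum(y) / n
    sbar = sum(scores) / n
    u = sum((yi - ybar) * s for yi, s in zip(y, scores))
    var_u = ybar * (1.0 - ybar) * sum((s - sbar) ** 2 for s in scores)
    if var_u == 0.0:
        return 1.0  # no variation in the burden score: nothing to test
    t = u * u / var_u
    return math.erfc(math.sqrt(t / 2.0))  # P(chi2_1 > t)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance cut-off controlling the family-wise error rate."""
    return alpha / n_tests
```

A positive sign of `u` corresponds to the positive direction of association reported for all four TCGA pairs (cases carry a higher weighted burden than controls).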

Table 2.

Gene-cancer pairs with at least one significant test result from the TCGA discovery analysis

Table 3.

Numbers of variants in cases and controls, stratified by sequencing center and dataset, for the significant gene-cancer pairs from the TCGA analysis (Table 2)

In our analysis of loss-of-function variants, we ran burden, SKAT, SKAT-O, and TOW tests for 12,283 genes, resulting in tests for 138,316 gene-cancer pairs. Applying a Bonferroni cut-off of 0.05/138316 = 3.6e-7 to the TCGA results, we found two significant gene-cancer associations (Table 2), BRCA1-OV and CETN1-OV, both of which were also significant in the TCGA functional variant analysis. Again, BRCA1-OV achieved replication in the PCAWG dataset with respect to both direction of association and significance. It was not possible to validate CETN1-OV in the PCAWG dataset because no donors had a qualifying loss-of-function variant in CETN1.

Finally, we repeated the analyses of functional and loss-of-function variants using a MAF threshold of 1% instead of 0.1% (online suppl. Tables S5–S6 and Figures S8–S9). In the analysis of functional variants, three additional gene-cancer pairs reached significance in the TCGA dataset. Two of these pairs, MTUS1-STAD and PLIN4-KIRC, showed the same direction of association in PCAWG as in TCGA, but were not significant in PCAWG, while the other pair failed to replicate with respect to either direction of association or significance (online suppl. Table S5). There were no changes to the loss-of-function findings.

Overall, our results provide little, if any, evidence in favor of the rare variant hypothesis. No novel associations were identified that achieved replication in the validation dataset. In fact, of the four significant gene-cancer pairs identified in the discovery analysis, three did not even show the same direction of association in the validation analysis.

The lack of findings may be due to low statistical power. To investigate how likely it is to detect a significant gene-based association given the sample sizes in our analyses and various assumptions about genetic architecture, we performed analytic power calculations for the burden, SKAT, and SKAT-O tests using the Power_Logistic_R function from the SKAT R package. We focused on these three tests because their power can be calculated analytically [50]; TOW was not included, although a previous simulation study showed that TOW is more powerful than the burden test and SKAT in various scenarios with rare causal variants [43]. The parameters of the power calculation include the number of qualifying variants in the gene (set to either the mean number of functional variants or the mean number of loss-of-function variants per gene in TCGA), the disease prevalence (we considered various values based on lifetime risk estimates from the SEER database [51]), the proportion of causal variants (varied), the proportion of protective causal variants (set to the default value of 0), the relationship between each variant’s odds ratio and MAF (we used the default setting, which assumes that the log odds ratio is proportional to |log10(MAF)|), and the maximum odds ratio across variants (varied). Additional details on the power calculation are provided in the online supplementary material. Given a gene with 41 variants (the mean number of qualifying functional variants per gene in TCGA), Figure 1 shows the power of the burden, SKAT, and SKAT-O tests as a function of the maximum odds ratio (2–12) for different proportions of causal variants (20–80%) and different case-control numbers based on the TCGA dataset, while Figure 2 shows the power for case-control numbers based on the PCAWG dataset. Power increases quickly with both the maximum odds ratio and the proportion of causal variants, and the burden and SKAT-O tests are almost always more powerful than SKAT under the scenarios considered. For example, given 918 cases and 1,513 controls (the sample sizes from the TCGA BRCA analyses), a significance threshold of 2.2e-7, a disease prevalence of 6.6% (the lifetime risk of breast cancer from SEER), and a maximum odds ratio of 8, as the proportion of causal variants increases from 20% to 50% to 80%, the power of the burden test increases from <1% to 48% to 99%, the power of SKAT-O from <1% to 61% to >99%, and the power of SKAT from <1% to 16% to 69% (Fig. 1). Overall, the power calculations suggest that the discovery analyses had little power to detect genes with a large proportion of non-causal variants or genes with only moderate-risk variants. Therefore, much larger sample sizes may be needed to discover such susceptibility genes, if they exist. The power calculations also suggest that the validation dataset had low power to evaluate replicability under many of the genetic architecture scenarios considered, so further validation may be warranted for the TCGA findings that did not replicate. Online supplementary Figures S10 and S11 provide additional power plots for a gene with four variants (the mean number of qualifying loss-of-function variants per gene in TCGA). These plots show low discovery and replication power under all of the scenarios considered. Therefore, while restricting to loss-of-function variants is likely to lead to a higher proportion of causal variants, the resulting sparsity considerably reduces power.
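The genetic-architecture assumptions behind the power calculation can be made concrete by generating per-variant odds ratios. The sketch below is illustrative and does not call the SKAT package: the relationship log(OR) = c·|log10(MAF)| follows the text, but anchoring c so that the rarest variant in the gene (or a user-supplied MAF) attains the maximum odds ratio, and choosing causal variants uniformly at random, are assumptions. The function name is hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence


def variant_odds_ratios(mafs: Sequence[float], max_or: float,
                        causal_fraction: float, maf_at_max: Optional[float] = None,
                        seed: int = 0) -> List[float]:
    """Assign each variant an odds ratio under log(OR) = c * |log10(MAF)|.

    Assumptions (illustrative, not the SKAT package's exact convention):
    c is chosen so a variant with MAF == maf_at_max (default: the rarest
    variant in the gene) has OR == max_or; a random subset of
    round(causal_fraction * len(mafs)) variants is causal, all causal
    variants increase risk (no protective variants), and non-causal
    variants have OR = 1.
    """
    if maf_at_max is None:
        maf_at_max = min(mafs)
    c = math.log(max_or) / abs(math.log10(maf_at_max))
    rng = random.Random(seed)
    n_causal = round(causal_fraction * len(mafs))
    causal = set(rng.sample(range(len(mafs)), n_causal))
    return [math.exp(c * abs(math.log10(m))) if j in causal else 1.0
            for j, m in enumerate(mafs)]
```

For example, with 41 variants and a causal proportion of 20%, only 8 variants receive an odds ratio above 1, and under this model the rarer a variant is, the larger its effect. The remaining 33 variants add noise to the gene-level statistic, which is consistent with the low power reported above for genes with many non-causal variants.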

Fig. 1.

Power calculations for discovery analysis of functional variants, based on TCGA sample sizes. The plots show the power of the burden, SKAT, and SKAT-O tests for a single gene with 41 qualifying variants (mean number of functional variants per gene in TCGA), given a significance threshold of 2.2e-7 and three sample size scenarios representative of TCGA. n1, number of cases; n0, number of controls; % Causal, percentage of causal variants; BUR, burden. From left to right, the sample sizes correspond to TCGA sample sizes for BRCA, OV, and KIRC. The disease prevalence parameter in the power calculation was set to the SEER lifetime risk for the corresponding cancer type (6.6% for breast, 1.2% for ovarian, 1.7% for kidney).

Fig. 2.

Power calculations for validation analysis of functional variants, based on PCAWG sample sizes. The plots show the power of the burden, SKAT, and SKAT-O tests for a single gene with 41 qualifying variants (mean number of functional variants per gene in TCGA), given a significance threshold of 0.05 and three sample size scenarios representative of PCAWG. n1, number of cases; n0, number of controls; BUR, burden; % Causal, percentage of causal variants. From left to right, the sample sizes correspond to PCAWG sample sizes for BRCA, OV, and KIRC. The disease prevalence parameter in the power calculation was set to the SEER lifetime risk for the corresponding cancer type (6.6% for breast, 1.2% for ovarian, 1.7% for kidney).

The UK Biobank analyses conducted by Cirulli et al. [34] and Van Hout et al. [35], which had a much larger overall sample size than our study but smaller numbers of cases for most of the cancer types analyzed in both their studies and ours, also did not identify any novel cancer susceptibility genes. A key difference between their studies and ours, besides sample size, is that theirs were conventional “case-control” studies in which the control group comprised mainly individuals without cancer. In our case-only design, germline variants that increase the risk of cancer in general, rather than of one specific cancer type, would be carried at similar frequencies by cases and by the cancer-patient controls and could therefore be missed; comparing cases with a given cancer type to non-cancer controls eliminates this possibility.

Another issue that can hinder the discovery of associations is technical noise. There was considerable heterogeneity in the TCGA data preparation and sequencing pipelines [47], and various sources of technical variation (including sequencing center, WGA, exome capture kit, and sequencing technology) could potentially have induced spurious signals or obscured real ones. We took measures to minimize batch effects by restricting the TCGA analyses to the intersection of the capture regions across all capture kits and controlling for sequencing center and WGA, but our results may still have been affected by residual technical variation.

While gene-based aggregation is the most common approach for analyzing rare variants, other aggregation strategies, such as collapsing multiple genes or aggregating variants based on functional annotations, can potentially lead to higher power. Improved prediction of functional effects can also lead to higher power by increasing the chances of including causal variants in the analyses. Furthermore, while most known high-penetrance variants are in coding regions, expanding the scope of analysis from the exome to the genome may yield additional discoveries. These approaches, combined with growing sequencing efforts, will enable us to better understand the complexity of the genetic architecture of cancer risk.

The study used existing controlled access data from TCGA and PCAWG and was approved by dbGaP and ICGC.

The authors have no conflicts of interest to declare.

The study was supported by the National Cancer Institute, awards CA251339, CA206980, and CA008748.

Z.G., R.S., and C.B.B. designed the study. Z.G. analyzed the data and drafted the manuscript. R.S. and C.B.B. revised the manuscript.

1. Mucci LA, Hjelmborg JB, Harris JR, Czene K, Havelick DJ, Scheike T, et al.; Nordic Twin Study of Cancer (NorTwinCan) Collaboration. Familial risk and heritability of cancer among twins in Nordic countries. JAMA. 2016 Jan;315(1):68–76.
2. Miki Y, Swensen J, Shattuck-Eidens D, Futreal PA, Harshman K, Tavtigian S, et al. A strong candidate for the breast and ovarian cancer susceptibility gene BRCA1. Science. 1994 Oct;266(5182):66–71.
3. Wooster R, Bignell G, Lancaster J, Swift S, Seal S, Mangion J, et al. Identification of the breast cancer susceptibility gene BRCA2. Nature. 1995 Dec;378(6559):789–92.
4. Sud A, Kinnersley B, Houlston RS. Genome-wide association studies of cancer: current insights and future perspectives. Nat Rev Cancer. 2017 Nov;17(11):692–704.
5. Rahman N. Realizing the promise of cancer predisposition genes. Nature. 2014 Jan;505(7483):302–8.
6. Qing T, Mohsen H, Marczyk M, Ye Y, O’Meara T, Zhao H, et al. Germline variant burden in cancer genes correlates with age at diagnosis and somatic mutation burden. Nat Commun. 2020 May;11(1):2438.
7. Sampson JN, Wheeler WA, Yeager M, Panagiotou O, Wang Z, Berndt SI, et al. Analysis of heritability and shared heritability based on genome-wide association studies for 13 cancer types. J Natl Cancer Inst. 2015 Oct;107(12):djv279.
8. Dai J, Shen W, Wen W, Chang J, Wang T, Chen H, et al. Estimation of heritability for nine common cancers using data from genome-wide association studies in Chinese population. Int J Cancer. 2017 Jan;140(2):329–36.
9. Antoniou AC, Easton DF. Models of genetic susceptibility to breast cancer. Oncogene. 2006 Sep;25(43):5898–905.
10. Zhang H, Ahearn TU, Lecarpentier J, Barnes D, Beesley J, Qi G, et al.; kConFab Investigators; ABCTB Investigators; EMBRACE Study; GEMO Study Collaborators. Genome-wide association study identifies 32 novel breast cancer susceptibility loci from overall and subtype-specific analyses. Nat Genet. 2020 Jun;52(6):572–81.
11. Pritchard JK. Are rare variants responsible for susceptibility to complex diseases? Am J Hum Genet. 2001 Jul;69(1):124–37.
12. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, et al. Finding the missing heritability of complex diseases. Nature. 2009 Oct;461(7265):747–53.
13. Gibson G. Rare and common variants: twenty arguments. Nat Rev Genet. 2012;13(2):135–45.
14. Zuk O, Schaffner SF, Samocha K, Do R, Hechter E, Kathiresan S, et al. Searching for missing heritability: designing rare variant association studies. Proc Natl Acad Sci. 2014;111(4):E455–64.
15. Wray NR, Wijmenga C, Sullivan PF, Yang J, Visscher PM. Common disease is more complex than implied by the core gene omnigenic model. Cell. 2018 Jun;173(7):1573–80.
16. Lee S, Abecasis GR, Boehnke M, Lin X. Rare-variant association analysis: study designs and statistical tests. Am J Hum Genet. 2014 Jul;95(1):5–23.
17. Rotunno M, Barajas R, Clyne M, Hoover E, Simonds NI, Lam TK, et al. A systematic literature review of whole exome and genome sequencing population studies of genetic susceptibility to cancer. Cancer Epidemiol Biomarkers Prev. 2020 Aug;29(8):1519–34.
18. Kanchi KL, Johnson KJ, Lu C, McLellan MD, Leiserson MD, Wendl MC, et al. Integrated analysis of germline and somatic variants in ovarian cancer. Nat Commun. 2014;5(1):3156.
19. Cybulski C, Carrot-Zhang J, Kluźniak W, Rivera B, Kashyap A, Wokołorczyk D, et al. Germline RECQL mutations are associated with breast cancer susceptibility. Nat Genet. 2015 Jun;47(6):643–6.
20. Rand KA, Rohland N, Tandon A, Stram A, Sheng X, Do R, et al.; African Ancestry Prostate Cancer GWAS Consortium; ELLIPSE/GAME-ON Consortium. Whole-exome sequencing of over 4100 men of African ancestry and prostate cancer risk. Hum Mol Genet. 2016 Jan;25(2):371–81.
21. Roberts NJ, Norris AL, Petersen GM, Bondy ML, Brand R, Gallinger S, et al. Whole genome sequencing defines the genetic heterogeneity of familial pancreatic cancer. Cancer Discov. 2016 Feb;6(2):166–75.
22. Dai W, Zheng H, Cheung AK, Tang CSM, Ko JMY, Wong BWY, et al. Whole-exome sequencing identifies MST1R as a genetic susceptibility gene in nasopharyngeal carcinoma. Proc Natl Acad Sci. 2016;113(12):3317–22.
23. Chubb D, Broderick P, Dobbins SE, Frampton M, Kinnersley B, Penegar S, et al. Rare disruptive mutations and their contribution to the heritable risk of colorectal cancer. Nat Commun. 2016 Jun;7(1):11883.
24. Koboldt DC, Kanchi KL, Gui B, Larson DE, Fulton RS, Isaacs WB, et al. Rare variation in TET2 is associated with clinically relevant prostate carcinoma in African Americans. Cancer Epidemiol Biomarkers Prev. 2016 Nov;25(11):1456–63.
25. Karyadi DM, Geybels MS, Karlins E, Decker B, McIntosh L, Hutchinson A, et al. Whole exome sequencing in 75 high-risk families with validation and replication in independent case-control studies identifies TANGO2, OR5H14, and CHAD as new prostate cancer susceptibility genes. Oncotarget. 2017 Jan;8(1):1495–507.
26. Litchfield K, Levy M, Dudakia D, Proszek P, Shipley C, Basten S, et al. Rare disruptive mutations in ciliary function genes contribute to testicular cancer susceptibility. Nat Commun. 2016 Dec;7(1):13840.
27. Tiao G, Improgo MR, Kasar S, Poh W, Kamburov A, Landau DA, et al. Rare germline variants in ATM are associated with chronic lymphocytic leukemia. Leukemia. 2017 Oct;31(10):2244–7.
28. Dicks E, Song H, Ramus SJ, Oudenhove EV, Tyrer JP, Intermaggio MP, et al. Germline whole exome sequencing and large-scale replication identifies FANCM as a likely high grade serous ovarian cancer susceptibility gene. Oncotarget. 2017 Mar;8(31):50930–40.
29. Grant RC, Denroche RE, Borgida A, Virtanen C, Cook N, Smith AL, et al. Exome-wide association study of pancreatic cancer risk. Gastroenterology. 2018 Feb;154(3):719–722.e3.
30. Yu Y, Hu H, Chen JS, Hu F, Fowler J, Scheet P, et al. Integrated case-control and somatic-germline interaction analyses of melanoma susceptibility genes. Biochim Biophys Acta Mol Basis Dis. 2018 Jun;1864(6 Pt B):2247–54.
31. Litchfield K, Loveday C, Levy M, Dudakia D, Rapley E, Nsengimana J, et al. Large-scale sequencing of testicular germ cell tumour (TGCT) cases excludes major TGCT predisposition gene. Eur Urol. 2018 Jun;73(6):828–31.
32. Artomov M, Stratigos AJ, Kim I, Kumar R, Lauss M, Reddy BY, et al. Rare variant, gene-based association study of hereditary melanoma using whole-exome sequencing. J Natl Cancer Inst. 2017 Dec;109(12):djx083.
33. Mijuskovic M, Saunders EJ, Leongamornlert DA, Wakerell S, Whitmore I, Dadaev T, et al. Rare germline variants in DNA repair genes and the angiogenesis pathway predispose prostate cancer patients to develop metastatic disease. Br J Cancer. 2018 Jul;119(1):96–104.
34. Cirulli ET, White S, Read RW, Elhanan G, Metcalf WJ, Tanudjaja F, et al. Genome-wide rare variant analysis for thousands of phenotypes in over 70,000 exomes from two cohorts. Nat Commun. 2020 Jan;11(1):542.
35. Van Hout CV, Tachmazidou I, Backman JD, Hoffman JD, Liu D, Pandey AK, et al.; Geisinger-Regeneron DiscovEHR Collaboration; Regeneron Genetics Center. Exome sequencing and characterization of 49,960 individuals in the UK Biobank. Nature. 2020 Oct;586(7831):749–56.
36. Carter H, Marty R, Hofree M, Gross AM, Jensen J, Fisch KM, et al. Interaction landscape of inherited polymorphisms with somatic events in cancer. Cancer Discov. 2017 Apr;7(4):410–23.
37. Lu C, Xie M, Wendl MC, Wang J, McLellan MD, Leiserson MD, et al. Patterns and functional implications of rare germline variants across 12 cancer types. Nat Commun. 2015 Dec;6(1):10086.
38. Huang KL, Mashl RJ, Wu Y, Ritter DI, Wang J, Oh C, et al.; Cancer Genome Atlas Research Network. Pathogenic germline variants in 10,389 adult cancers. Cell. 2018 Apr;173(2):355–370.e14.
39. Oak N, Cherniack AD, Mashl RJ, Hirsch FR, Ding L, Beroukhim R, et al.; TCGA Analysis Network. Ancestry-specific predisposing germline variants in cancer. Genome Med. 2020 May;12(1):51.
40. Madsen BE, Browning SR. A groupwise association test for rare mutations using a weighted sum statistic. PLoS Genet. 2009 Feb;5(2):e1000384.
41. Wu MC, Lee S, Cai T, Li Y, Boehnke M, Lin X. Rare-variant association testing for sequencing data with the sequence kernel association test. Am J Hum Genet. 2011 Jul;89(1):82–93.
42. Lee S, Emond MJ, Bamshad MJ, Barnes KC, Rieder MJ, Nickerson DA, et al.; NHLBI GO Exome Sequencing Project—ESP Lung Project Team. Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies. Am J Hum Genet. 2012 Aug;91(2):224–37.
43. Sha Q, Wang X, Wang X, Zhang S. Detecting association of rare and common variants by testing an optimally weighted combination of variants. Genet Epidemiol. 2012 Sep;36(6):561–71.
44. ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature. 2020 Feb;578(7793):82–93.
45. Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, Wang Q, et al.; Genome Aggregation Database Consortium. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020 May;581(7809):434–43.
46. McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GR, Thormann A, et al. The Ensembl variant effect predictor. Genome Biol. 2016 Jun;17(1):122.
47. Buckley AR, Standish KA, Bhutani K, Ideker T, Lasken RS, Carter H, et al. Pan-cancer analysis reveals technical artifacts in TCGA germline variant calls. BMC Genomics. 2017 Jun;18(1):458.
48. Rasnic R, Brandes N, Zuk O, Linial M. Substantial batch effects in TCGA exome sequences undermine pan-cancer analysis of germline variants. BMC Cancer. 2019 Aug;19(1):783.
49. Rohozinski J, Diaz-Arrastia C, Edwards CL. Do some epithelial ovarian cancers originate from a fallopian tube ciliate cell lineage? Med Hypotheses. 2017 Sep;107:16–21.
50. Lee S, Wu MC, Lin X. Optimal tests for rare variant effects in sequencing association studies. Biostatistics. 2012 Sep;13(4):762–75.
51. Hankey BF, Ries LA, Edwards BK. The Surveillance, Epidemiology, and End Results Program: a national resource. Cancer Epidemiol Biomarkers Prev. 1999 Dec;8(12):1117–21.