Introduction: Many missense variants in G protein-coupled receptors (GPCRs) involved in the neuroendocrine regulation of reproduction have been identified by phenotype-driven or large-scale exome sequencing. Computational functional prediction analysis is commonly performed to evaluate their impact on receptor function. Methods: To assess the performance and outcome of functional prediction analyses for these GPCRs, we statistically analyzed the predictions of SIFT and PolyPhen-2 for variants with documented biological function as well as for variants retrieved from Ensembl. Missense variants with documented biological function testing, all from patients with reproductive disorders, were identified through a comprehensive literature search. Missense variants from individuals with known reproductive disorders were retrieved from the Human Gene Mutation Database, and missense variants from the general population were retrieved from the Ensembl genome database. Results: The accuracies of SIFT and PolyPhen-2 were 83 and 85%, respectively. Both prediction tools performed better in predicting loss-of-function variants (positive predictive value, SIFT: 92%; PolyPhen-2: 95%) than in predicting variants that did not affect function (negative predictive value, SIFT: 54%; PolyPhen-2: 57%). Concordance between SIFT and PolyPhen-2 did not improve accuracy. Surprisingly, approximately half of the variants retrieved from Ensembl were predicted to be loss-of-function variants by SIFT (47%) and PolyPhen-2 (54%). Conclusion: Our findings provide new guidance for interpreting the results and limitations of computational functional prediction analyses for GPCRs and will help determine which variants require biological function testing. In addition, our findings raise important questions regarding the link between genotype and phenotype in the general population.
Phenotype-driven exome sequencing and large-scale next-generation DNA sequencing of the general population have revealed numerous rare nonsynonymous missense variants in protein-coding DNA sequences. Identifying the causal missense variants that alter human phenotypes, in particular those that cause disease, is one of the fundamental goals of human genetics: it provides crucial insights into the biology connecting genotype and phenotype and may help predict disease onset. Biological testing of the effect of a missense variant on the function of the encoded protein usually produces reliable results but is laborious and time-consuming. Alternative, reliable methods for assigning the effects of novel variants on protein function are therefore of primary importance. Various computational in silico tools are available for predicting the functional effect of variants, using information derived from sequence similarity and phylogenetic profiles. However, the performance of these tools varies between proteins in different functional categories [5,6,7,8]. This challenge applies to delineating the effects of missense variants in G protein-coupled receptors (GPCRs), a large family of receptors involved in signal transduction and cellular responses to extracellular signals, and the most common drug targets (drugs targeting GPCRs account for 27% of all FDA-approved drugs). Sequence alignment of 94 GPCRs revealed many highly conserved amino acids, suggesting that mutations at these positions are likely to be damaging. Bioinformatics approaches have also been used to predict intrinsically disordered regions of GPCRs (regions that lack a stable three-dimensional structure and contribute to intra- and extracellular plasticity and protein-protein interactions) and regions that determine G protein coupling specificity [11,12].
Within the GPCR family, the gonadotropin-releasing hormone receptor (GnRHR), kisspeptin receptor (KISS1R), prokineticin receptor 2 (PROKR2), and tachykinin receptor 3 (TACR3) play key roles in the central neuroendocrine regulation of reproductive function. More than 300 missense variants have been found in these GPCRs, either in patients with reproductive disorders or through large-scale genomic sequencing of general populations. Some of the variants identified in patients with reproductive disorders have undergone in vitro biological function testing [13,14], which has aided in interpreting the pathogenic relationship between genotype and phenotype. Computational predictions from Sorting Intolerant from Tolerant (SIFT) and Polymorphism Phenotyping v2 (PolyPhen-2) are available for most of these variants, but the correlation between in silico prediction and in vitro biological function for these GPCR missense variants has not been well established.
Materials and Methods
Data and Materials
The missense variants of GnRHR, KISS1R, PROKR2, and TACR3 identified in patients with known phenotypes were acquired from the Human Gene Mutation Database (HGMD) (http://www.hgmd.org). Missense variants in representative populations were obtained from the Ensembl genome database (http://useast.ensembl.org/index.html). Data from HGMD and Ensembl were retrieved in July 2014. Missense variants with documented biological function testing were identified through a comprehensive PubMed literature search of articles published between 1997 and July 2014.
Computational functional prediction results for each variant with documented biological function testing were obtained using SIFT and PolyPhen-2. Predicted function was classified according to each program's score thresholds. SIFT classifies a variant as tolerated or deleterious based on its SIFT score (range 0-1): a score ≤0.05 is deleterious, and a score >0.05 is tolerated. PolyPhen-2 classifies a variant, based on a score from 0 to 1, as benign (score 0-0.5), possibly damaging (score >0.5-0.9), or probably damaging (score >0.9). For in vitro biological function tests, we classified function as normal (maximum response of the variant ≥80% of the maximum response of the corresponding wild-type receptor), partial loss of function (maximum response between 20 and 80% of that of the wild-type receptor), or complete loss of function (maximum response <20% of that of the wild-type receptor).
To perform the statistical analyses, we collapsed both the computational predictions and the biological function test results into two categories, benign or damaging. Benign was defined as tolerated in SIFT, benign in PolyPhen-2, and normal (≥80% of wild type) in biological function tests. Damaging was defined as deleterious in SIFT, possibly or probably damaging in PolyPhen-2, and partial or complete loss of function (<80% of wild type) in biological function tests. Concordance for the in silico analyses was defined as both computational prediction tools giving the same functional prediction. True-positive (TP), false-positive (FP), true-negative (TN), and false-negative (FN) counts were used to calculate sensitivity = TP/(TP + FN), specificity = TN/(TN + FP), positive predictive value (PPV) = TP/(TP + FP), negative predictive value (NPV) = TN/(TN + FN), accuracy (AC) = (TP + TN)/(TP + FP + TN + FN), and the Matthews correlation coefficient (MCC) = (TP × TN − FP × FN)/√[(TP + FP)(TP + FN)(TN + FP)(TN + FN)].
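The classification rules above can be sketched as small lookup functions, which makes the boundary cases explicit. This is a minimal Python sketch, not the authors' code: the function and constant names are hypothetical, the thresholds are those stated in this section, and a response of exactly 80% of wild type is treated as normal, following the "at least 80%" definition of normal function.

```python
# Minimal sketch: thresholds transcribed from Materials and Methods.
# All names here are hypothetical, chosen for illustration only.
SIFT_DELETERIOUS_MAX = 0.05   # SIFT score <= 0.05 -> deleterious
PP2_BENIGN_MAX = 0.5          # PolyPhen-2 score <= 0.5 -> benign
PP2_POSSIBLY_MAX = 0.9        # >0.5 to 0.9 -> possibly damaging; >0.9 -> probably damaging
NORMAL_MIN_PCT = 80.0         # >= 80% of wild-type maximum response -> normal
COMPLETE_LOF_MAX_PCT = 20.0   # < 20% of wild-type maximum response -> complete loss


def sift_call(score: float) -> str:
    """Two-level SIFT call from a score in [0, 1]."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"SIFT score must lie in [0, 1], got {score}")
    return "deleterious" if score <= SIFT_DELETERIOUS_MAX else "tolerated"


def polyphen2_call(score: float) -> str:
    """Three-level PolyPhen-2 call from a score in [0, 1]."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"PolyPhen-2 score must lie in [0, 1], got {score}")
    if score <= PP2_BENIGN_MAX:
        return "benign"
    if score <= PP2_POSSIBLY_MAX:
        return "possibly damaging"
    return "probably damaging"


def biological_call(max_response_pct: float) -> str:
    """Three-level in vitro call; input is the variant's maximum response
    expressed as a percentage of the wild-type maximum response."""
    if max_response_pct >= NORMAL_MIN_PCT:
        return "normal"
    if max_response_pct >= COMPLETE_LOF_MAX_PCT:
        return "partial loss of function"
    return "complete loss of function"


# Binary collapse used for the statistics: every call is benign or damaging.
TO_BINARY = {
    "tolerated": "benign",
    "benign": "benign",
    "normal": "benign",
    "deleterious": "damaging",
    "possibly damaging": "damaging",
    "probably damaging": "damaging",
    "partial loss of function": "damaging",
    "complete loss of function": "damaging",
}

# Hypothetical scores for one variant, not taken from table 1.
calls = (sift_call(0.02), polyphen2_call(0.72), biological_call(35.0))
print([TO_BINARY[c] for c in calls])  # ['damaging', 'damaging', 'damaging']
```

Using an explicit dictionary for the binary collapse, rather than a default branch, means a misspelled or unexpected category raises a `KeyError` instead of silently being counted as damaging.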
Results
Summary of Variants in GnRHR, KISS1R, PROKR2, and TACR3 with Documented Biological Function Test Results and Their Correlation with the Computational Functional Prediction Tools
A total of 52 missense variants with documented biological function testing were identified in GnRHR, KISS1R, PROKR2, and TACR3 through a literature search in PubMed. All of these variants were identified in patients with reproductive disorders. Table 1 summarizes the nucleotide (position and exon) and amino acid (position and receptor domain) changes, the computational function predictions by SIFT and PolyPhen-2, and the biological function test results. Nineteen of the 52 variants were in GnRHR, 7 in KISS1R, 17 in PROKR2, and 9 in TACR3. Ten of the variants had normal function on biological testing, 21 had partial loss of function, and 21 had complete loss of function (table 1). All but one of the 10 variants with normal biological function were in PROKR2 or TACR3; the remaining one was in KISS1R.
To evaluate the performance of the individual computational prediction tools, we analyzed the outcome and compared the findings with the results of biological function tests. Figure 1 summarizes the outcome of SIFT and PolyPhen-2 computational prediction testing and the comparison with biological function test results. Among variants that were predicted by SIFT to be tolerated, 53% had normal results on biological function testing (fig. 1a), while 57% of the variants predicted by PolyPhen-2 to be benign had normal results on biological function testing (fig. 1b). When SIFT predicted the variants to be deleterious, 93% of them had impaired function, based on documented in vitro biological testing (fig. 1a). Similarly, among variants predicted by PolyPhen-2 to be possibly or probably damaging, 100 and 94% had impaired function based on in vitro biological test results (partial or complete loss of function), respectively (fig. 1b). Overall, both tools performed better in predicting loss-of-function variants. In contrast, the rate for correctly predicting a variant to have normal function was only slightly above 50% for both prediction programs.
Concordance between the Prediction Tools
Since no single tool can guarantee correct computational prediction of the biological function of a GPCR variant, many scientists use more than one prediction tool, expecting that concordance between or among the tools increases the accuracy of the computational predictions. To evaluate whether using more than one prediction tool improved performance, we analyzed the concordance between the prediction tools and matched the concordant predictions with the results of biological function testing. Figure 2 shows the results of the concordance analysis. Concordance rates were calculated as the number of variants predicted by SIFT and PolyPhen-2 to have the same functional outcome, divided by the total of 52 variants with verified in vitro biological function. The overall concordance rate was 98%, but it decreased to 83% after the concordant computational predictions were matched with biological function test results (fig. 2). PolyPhen-2 and SIFT concordantly predicted 25% of the 52 variants to be benign; this decreased to 13.5% for concordance between the two computational programs and the documented biological function test results (fig. 2). Both tools concordantly predicted 73% of the 52 variants to be damaging. This rate again decreased, to 69.2%, upon matching the concordance with biological function test results (fig. 2).
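Because every concordance rate above uses the same denominator (all variants with a biological result), a rate that additionally requires agreement with the biological result can never exceed the raw concordance rate. The sketch below illustrates the calculation; the `Variant` class and `concordance_summary` function are hypothetical, and the eight variants are invented for illustration, not taken from this study.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Variant:
    """Binary calls ('benign' or 'damaging') for one variant."""
    sift: str
    polyphen2: str
    biology: str  # in vitro result, used as the reference standard


def concordance_summary(variants: List[Variant]) -> Dict[str, float]:
    """Concordance rates, all expressed over the total number of variants."""
    n = len(variants)
    if n == 0:
        raise ValueError("no variants supplied")
    concordant = [v for v in variants if v.sift == v.polyphen2]
    # Concordant predictions that also agree with the biological result.
    matched = [v for v in concordant if v.sift == v.biology]

    def share(group: List[Variant], call: Optional[str] = None) -> float:
        if call is not None:
            group = [v for v in group if v.sift == call]
        return len(group) / n

    return {
        "concordant": share(concordant),
        "concordant_and_matched": share(matched),
        "concordant_benign": share(concordant, "benign"),
        "concordant_benign_matched": share(matched, "benign"),
        "concordant_damaging": share(concordant, "damaging"),
        "concordant_damaging_matched": share(matched, "damaging"),
    }


# Hypothetical eight-variant example (not data from this study).
B, D = "benign", "damaging"
toy = [
    Variant(D, D, D), Variant(D, D, D), Variant(D, D, D),
    Variant(D, D, B),  # concordant prediction, wrong
    Variant(B, B, D),  # concordant prediction, wrong
    Variant(B, B, B), Variant(B, B, B),
    Variant(D, B, B),  # discordant prediction
]
for name, rate in concordance_summary(toy).items():
    print(f"{name}: {rate:.3f}")
```

In this toy set, two of the seven concordant predictions disagree with the biological result, so the matched rate (5/8) falls below the raw concordance rate (7/8), the same pattern as in figure 2.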
Statistical Analyses of the Performance of the Computational Prediction Tools Compared to Biological Function Test Outcomes
We analyzed sensitivity, specificity, PPV, NPV, accuracy, and MCC to further evaluate the performance of the prediction programs. Damaging (possibly or probably) or deleterious predictions were defined as positive (loss of function), and benign or tolerated predictions as negative (normal function). Table 2 summarizes the results of these statistical analyses. The concordant prediction group consists of the variants for which both in silico tools predicted the same function. Among the 42 variants with impaired biological function in in vitro testing, each prediction program alone, as well as the concordant prediction from both tools, predicted 36 to be damaging. Among the 10 variants with normal biological function, PolyPhen-2 predicted 8 to be benign and SIFT predicted 7; concordant predictions from both tools correctly identified 7 as benign. One variant with normal biological function received discordant predictions from the two in silico tools.
Sensitivity (the ability to identify variants with loss of function) was 86% in all groups. Specificity (the ability to identify variants with normal function) was 70, 80, and 78% for the SIFT, PolyPhen-2, and concordant prediction groups, respectively. PPVs (the probability that a variant predicted to be damaging showed loss of function in biological testing) were 92, 95, and 95%, respectively. NPVs (the probability that a variant predicted to be benign showed normal function in biological testing) were 54, 57, and 54%, respectively. Prediction accuracies were 83, 85, and 84%, and MCCs were 0.5, 0.6, and 0.6, respectively.
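The percentages above follow from the counts in the preceding paragraph. As a check, the sketch below recomputes each measure from a 2 × 2 confusion matrix using the standard definitions (assumed, not confirmed, to be those behind table 2). The counts come from the text; the `Confusion` class is a hypothetical helper. Rounded to whole percentages (and the MCC to one decimal), the output matches the values reported for all three groups.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Confusion:
    """2 x 2 counts, with 'damaging' as the positive class."""
    tp: int  # predicted damaging, loss of function in vitro
    fp: int  # predicted damaging, normal in vitro
    tn: int  # predicted benign, normal in vitro
    fn: int  # predicted benign, loss of function in vitro

    def metrics(self) -> Dict[str, float]:
        tp, fp, tn, fn = self.tp, self.fp, self.tn, self.fn
        mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        return {
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "ppv": tp / (tp + fp),
            "npv": tn / (tn + fn),
            "accuracy": (tp + tn) / (tp + fp + tn + fn),
            # By convention the MCC is set to 0 when any marginal total is 0.
            "mcc": (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0,
        }


# Counts derived from the text (42 loss-of-function and 10 normal variants);
# the concordant group omits the one discordantly predicted normal variant.
GROUPS = {
    "SIFT": Confusion(tp=36, fp=3, tn=7, fn=6),
    "PolyPhen-2": Confusion(tp=36, fp=2, tn=8, fn=6),
    "Concordant": Confusion(tp=36, fp=2, tn=7, fn=6),
}

for name, counts in GROUPS.items():
    m = counts.metrics()
    summary = ", ".join(f"{k} {v:.2f}" for k, v in m.items())
    print(f"{name}: {summary}")
```

Laying the counts out this way also makes clear why sensitivity is identical across groups: all three share the same 36 true positives and 6 false negatives, and differ only in how the 10 normal variants are called.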
Proportion and Distribution of Missense Variants with Documented or Unknown Phenotype and Biological Function, and Functional Prediction Results
We retrieved nonsynonymous missense variants in GNRHR, KISS1R, PROKR2, and TACR3 from Ensembl, one of the largest databases of human variants. After excluding duplicates and variants with known pathogenic clinical significance, 322 missense variants remained. Of these, 16% (52), 12% (40), 30% (98), and 41% (132) were in GnRHR, KISS1R, PROKR2, and TACR3, respectively. In comparison, 95 variants identified by phenotype-driven sequencing were retrieved from HGMD, of which 36% (34), 13% (12), 38% (36), and 14% (13) were in GnRHR, KISS1R, PROKR2, and TACR3, respectively. Of the 52 variants identified by phenotype-driven sequencing with documented in vitro biological function test results, 37% (19), 13% (7), 33% (17), and 17% (9) were in GnRHR, KISS1R, PROKR2, and TACR3, respectively (table 3).
We took advantage of the availability of SIFT and PolyPhen-2 prediction results in Ensembl and analyzed the computational functional predictions of the nonsynonymous missense variants retrieved from Ensembl. Interestingly, SIFT and PolyPhen-2 predicted 53% (171/322) and 46% (146/316) of the variants to be benign, respectively, with the other 47% (151/322) and 54% (170/316) as deleterious or damaging (table 4). PolyPhen-2 prediction results for 6 variants were not available. We compared the concordance between the two prediction programs. Among 169 variants that were predicted to be benign (tolerated) by SIFT, 72% (122/169) were concordantly predicted to be benign by PolyPhen-2. Among 150 variants that SIFT predicted to be damaging (deleterious), 145 (97%) were predicted to be damaging by PolyPhen-2 as well. In turn, when we calculated the concordance based on the PolyPhen-2 predictions, 86% (126/146) and 82% (155/170) were concordantly predicted by SIFT to be benign (tolerated) and damaging (deleterious), respectively (online suppl. table S1; for all suppl. material, see www.karger.com/doi/10.1159/000435884).
Discussion
With advances in genetic sequencing methodologies, the list of new nonsynonymous missense variants continues to grow rapidly. Accurately determining the effect of each missense variant on protein function is a fundamental step in connecting genotype and phenotype. Biological function testing is an effective way to determine the pathogenicity of a newly identified missense variant and can lead to major advances in our knowledge of human physiology and pathophysiology. For example, after a novel Leu148Ser variant of KISS1R was identified in a patient with hypogonadotropic hypogonadism (HH), our laboratory performed biological function testing and confirmed that the variant caused loss of function of KISS1R. This finding revealed the key role of KISS1R and its ligand kisspeptin in the regulation of reproductive function and advanced our understanding of reproductive control upstream of GnRH. Measuring changes in second messengers involved in GPCR-mediated signal transduction is an effective way to assess the impact of variants on receptor function. GnRHR, KISS1R, PROKR2, and TACR3 are Gq-coupled receptors; their activation is reflected by changes in inositol phosphate (IP) production, intracellular calcium concentration ([Ca2+]i), and ERK phosphorylation. Assays for these second messengers are well established [17,18,19]. However, biological function testing can be costly, labor-intensive, and time-consuming. As shown in this study, only about half (52/95) of the missense variants identified to date by phenotype-driven sequencing have undergone biological function testing (table 3). As an alternative, computational functional prediction tools have been widely used to predict the impact of missense variants on protein function and to identify disease-causing variants. However, the accuracy of each prediction tool, while impressive, is not yet ideal.
In a recent comprehensive analysis, the accuracies of nine widely used prediction methods were in the range of 60-82%.
In this study, we analyzed the performance of two frequently used prediction tools, SIFT and PolyPhen-2, in predicting the functional effects of missense variants of GPCRs involved in the central neuroendocrine regulation of reproduction. Another important reason for selecting these two programs is that SIFT and PolyPhen-2 predictions for GPCR variants are available in the Ensembl database, so analyzing their performance will help investigators interpret the computational predictions for variants retrieved from Ensembl. Additional computational prediction programs are available, such as Panther (http://www.pantherdb.org/tools/csnpScoreForm.jsp) and Mutation Taster (http://www.mutationtaster.org). Computational prediction programs principally use attributes related to protein structure, evolutionary conservation, phylogeny, biophysical characteristics of the substitution, secondary structure, and chain flexibility. The programs share some similarities but also have distinct features. SIFT infers from sequence similarity whether a substitution is tolerated, based on how conserved the amino acid is at that position in an alignment of homologous sequences, whereas PolyPhen-2 combines sequence- and structure-based attributes to describe an amino acid substitution and predicts its effect with a naive Bayes classifier [15,20].
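To illustrate the kind of conservation signal SIFT relies on, the sketch below computes a toy score in the spirit of SIFT's scaled probability: the frequency of the substituted amino acid at one alignment position divided by the frequency of the most common amino acid there, with the ≤0.05 cutoff used in this study. It is a deliberately simplified, hypothetical illustration (no homolog search, sequence weighting, or pseudocounts) and will not reproduce actual SIFT scores.

```python
from collections import Counter

DELETERIOUS_MAX = 0.05  # same cutoff as the SIFT threshold used in this study


def toy_sift_score(column: str, substitute: str) -> float:
    """Frequency of `substitute` in one alignment column, scaled by the
    frequency of the most common residue in that column.

    Greatly simplified: real SIFT builds the alignment from a homology search,
    weights sequences, and adds pseudocounts, none of which is done here.
    """
    residues = [aa for aa in column.upper() if aa != "-"]  # drop gap characters
    if not residues:
        raise ValueError("alignment column contains only gaps")
    counts = Counter(residues)
    top_count = counts.most_common(1)[0][1]
    return counts[substitute.upper()] / top_count  # unseen residues count as 0


def toy_sift_call(column: str, substitute: str) -> str:
    score = toy_sift_score(column, substitute)
    return "deleterious" if score <= DELETERIOUS_MAX else "tolerated"


# Hypothetical column from ten aligned homologs: eight Leu, one Ile, one Val.
column = "LLLLLLLLIV"
for aa in "LIS":
    print(aa, round(toy_sift_score(column, aa), 3), toy_sift_call(column, aa))
```

Here a Leu-to-Ile change is tolerated because Ile already occurs at this position among homologs, whereas Ser, never observed, is called deleterious; a purely conservation-based score has no way to recognize a substitution that is new to the alignment but functionally harmless.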
Although the two programs use different prediction methods, both performed well in predicting loss-of-function variants (fig. 1). In contrast, both SIFT and PolyPhen-2 were less able to identify variants with normal function: more than 40% of the variants predicted to be benign by each method had impaired receptor function in biological testing (fig. 1). This is an important consideration because investigators may stop pursuing biological function tests if in silico programs predict a novel variant to be benign. Consistent with this, only a small proportion (13/52 for SIFT, 14/52 for PolyPhen-2) of the variants with documented biological function testing were predicted to be benign (table 1), whereas about half of all missense variants from Ensembl were predicted to be benign by each program (table 4). Alternatively, this difference between the two datasets may arise because the variants in the first set were identified in patients with reproductive phenotypes and were therefore more likely to be damaging, whereas the variants from large-scale sequencing were found in general populations with unknown phenotypes. Because HH and central precocious puberty are rare, variants found in the general population are unlikely to come from individuals with an underlying reproductive phenotype and should therefore be more likely to be benign. The tendency not to pursue biological function testing of variants predicted to be benign may also explain why the reported benign variants with documented biological function were primarily in PROKR2 and TACR3 rather than GnRHR: most GnRHR studies were performed earlier than the more recent PROKR2 and TACR3 studies, at a time when reporting benign variants was less accepted.
In addition to loss-of-function mutations reported in these GPCRs in patients with GnRH deficiency, a gain-of-function mutation, R386P KISS1R, has been reported in association with central precocious puberty. However, neither SIFT nor PolyPhen-2 is designed to predict variants with gain of function.
To increase the accuracy of computational functional prediction, a common approach is to apply more than one prediction tool. Interestingly, in this study, when concordant SIFT and PolyPhen-2 predictions were matched with biological function testing outcomes, their performance was not superior to that of either program alone (table 2). Evidently, both programs can concordantly predict an incorrect function for a variant: the concordance rate between the two in silico tools was always higher than the rate obtained when the concordant predictions were further matched with biological function testing outcomes, although these differences did not reach statistical significance (fig. 2). Interpreting the function of a variant is even more difficult when the programs' predictions are discordant.
In our study, sensitivity represents the ability to identify loss-of-function variants. With biological function testing as the gold standard, both in silico prediction tools had the same sensitivity (table 2). Although biological testing itself may not be perfect, no better reference standard is available. In contrast, our statistical analysis showed that the ability of in silico prediction to identify variants with normal function (specificity and NPV) was inferior to its ability to identify loss-of-function variants (sensitivity and PPV). Because more than 40% of the variants predicted to be benign were deleterious/damaging in biological function testing, a biological function test may still be indicated for a novel GPCR variant predicted to be benign by in silico tools, particularly if there is a strong phenotype correlation. On the other hand, biological function testing may not be necessary for a variant predicted to be damaging or deleterious, as more than 90% of the variants predicted to be damaging were confirmed as loss-of-function variants in biological function assays (fig. 1; table 2). The interpretation of concordant prediction analysis is complex. After exclusion of the one variant with discordant predictions, the concordant prediction analysis by SIFT and PolyPhen-2 performed comparably to the single-program predictions (table 2). The accuracies were 83% for SIFT and 85% for PolyPhen-2, which, interestingly, is higher than in an earlier comprehensive analysis.
The MCC is an important statistical parameter because it uses all four cells of the confusion matrix and is therefore less distorted by unequal proportions of benign and damaging variants, whereas PPV and NPV depend on the prevalence of damaging variants in the population studied. For this reason, the MCC gives a more balanced assessment of performance than the other measures. The MCC ranges from −1 to +1: a coefficient of +1 represents perfect prediction, 0 represents random prediction, and −1 represents complete disagreement between prediction and observation. Compared with an earlier study, the MCCs in this study (table 2) indicate fair performance of SIFT, PolyPhen-2, and concordant predictions, with little difference among the three analyses.
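The dependence of PPV and NPV on the population studied follows from Bayes' rule. The sketch below is illustrative only: it assumes that SIFT's sensitivity and specificity in this dataset would hold unchanged in other populations, which this study did not test, and the lower proportions of damaging variants are hypothetical. At the proportion observed here (42/52), it recovers the reported PPV and NPV; as the damaging fraction falls, PPV drops sharply while NPV rises.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """PPV and NPV (Bayes' rule) for a classifier applied to a population in
    which a fraction `prevalence` of variants are truly damaging."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    tp = sensitivity * prevalence              # expected true positives
    fn = (1 - sensitivity) * prevalence        # expected false negatives
    tn = specificity * (1 - prevalence)        # expected true negatives
    fp = (1 - specificity) * (1 - prevalence)  # expected false positives
    return tp / (tp + fp), tn / (tn + fn)


# SIFT sensitivity (36/42) and specificity (7/10) from this study, assumed
# here to carry over unchanged to other populations.
SENS, SPEC = 36 / 42, 7 / 10
for prevalence in (42 / 52, 0.5, 0.1):  # study proportion, then hypothetical ones
    ppv, npv = predictive_values(SENS, SPEC, prevalence)
    print(f"damaging fraction {prevalence:.2f}: PPV {ppv:.2f}, NPV {npv:.2f}")
```

Because sensitivity and specificity are held fixed, any change in PPV or NPV in this sketch comes only from the proportion of damaging variants, which is why PPV and NPV from a phenotype-selected dataset may not transfer directly to general-population variants.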
Ensembl is one of several well-known genome browsers for retrieving genomic information. We chose Ensembl to retrieve missense variants of GPCRs involved in the neuroendocrine regulation of reproductive function because it also provides SIFT and PolyPhen-2 functional predictions. Analysis of these predictions revealed the surprising finding that about half of the variants were predicted to be deleterious/damaging by SIFT and PolyPhen-2 (table 4). Interpreting these predictions is even more difficult in this setting because most of the missense variants were found by large-scale genomic sequencing of general, unphenotyped populations rather than by phenotype-driven sequencing (table 3). Given that more than 90% of the variants predicted by SIFT or PolyPhen-2 to be deleterious/damaging had documented impaired function (fig. 1; table 2), it is likely that most variants retrieved from Ensembl and predicted by SIFT and PolyPhen-2 to be deleterious/damaging do have impaired function. Linking these genotype findings to phenotype is difficult because most of the Ensembl variants were not identified by phenotype-driven sequencing. In the general population, the prevalence of HH is extremely low, estimated at about 0.01-0.025%. Individuals harboring loss-of-function variants may have a subclinical phenotype or no phenotype at all, given the diverse pathogenetic mechanisms and inheritance patterns of GPCR loss-of-function variants (autosomal recessive, autosomal dominant, haploinsufficiency, digenic inheritance [27,28], and oligogenic inheritance). We speculate that a variant that co-segregates with an HH phenotype in a family is more likely to be deleterious, but we did not specifically test this hypothesis in this study.
In conclusion, we found that the computational prediction programs SIFT and PolyPhen-2 are effective in predicting loss of function for variants of GPCRs involved in the neuroendocrine regulation of reproduction. In contrast, both programs were less effective in predicting benign variants: more than 40% of the variants predicted to be benign showed loss of function in biological function tests. Based on these findings, we recommend biological function testing even for variants predicted to be benign, especially if there is a close phenotype correlation. The surprising finding that about half of the variants identified by large-scale sequencing were predicted to be loss-of-function variants poses an immense challenge for interpreting their clinical significance.
This research was supported by the Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health (NICHD/NIH), through cooperative agreement U54 HD28138 as part of the Specialized Cooperative Centers Program in Reproduction and Infertility Research and by R01 HD19938 (to U.B.K.), and by NICHD/NIH K08 HD070957 (to L.M.).
The authors have nothing to disclose.