Abstract
Background: Age at menarche and menstrual cycle characteristics are indicators of endocrine function and may be risk factors for diseases such as reproductive cancers. The progesterone receptor gene (PGR) has been identified as a candidate gene for age at menarche and menstrual function. Methods: Women office workers ages 19–41 self-reported age at menarche and participated in a prospective study of menstrual function and fertility. First-morning urine was used as the DNA source. 444 women were genotyped for a functional variant in PGR, rs1042838 (Val660Leu), and 264 women were also genotyped for 29 other SNPs across the extended gene region. Results: Genetic variation across PGR was associated with age at menarche using a global score statistic (p = 0.03 among non-Hispanic whites). Women carrying two copies of the Val660Leu variant experienced menarche 1 year later than women carrying one or no copies of the variant (13.6 ± 0.5 vs. 12.6 ± 0.1; p = 0.03). The Val660Leu variant was also associated with decreased odds of short menstrual cycles (17–24 days) (OR, 95% CI: 0.54 [0.36, 0.80]; p = 0.002). Conclusion: Genetic variation in PGR was associated with age at menarche and menstrual cycle length in this population. Further investigation of these associations in a replication dataset is warranted.
Introduction
Age at menarche is highly heritable, with at least 50–60% of the variance attributable to genetic factors [1,2]. Menstrual cycle characteristics, such as menstrual cycle length, may also have a genetic component [3]. A recent genomewide linkage scan for loci affecting age at menarche identified three genomic regions with significant LOD scores [1]. One of these regions contains the progesterone receptor gene (PGR). Genetic variants in the promoter and coding regions of this gene have been associated with breast, endometrial, and ovarian cancer, although there is some inconsistency among the studies [4,5,6,7,8,9].
A genetic variant of PGR coined ‘PROGINS’ consists of a 320-bp Alu insertion in an intron, which is in 100% linkage disequilibrium (LD) with a functional (nonsynonymous) variant Val660Leu and a synonymous variant His770His. An in vitro study showed that the PROGINS variant may have decreased response to progestins, and is not as efficient at opposing estrogen’s proliferative effects, due to decreased mRNA stability and protein activity [10]. Carriers of the Val660Leu variant were more likely to be nulliparous, infertile, and experience irregular menstrual cycles, and were less likely to have premenstrual weight gain or breast pain among controls in a case-control study of ovarian cancer [11]. The Val660Leu variant was also significantly associated with spontaneous abortion in a case-control study [12].
No studies were identified that examined the association between progesterone receptor polymorphisms and age at menarche. Because puberty and menstruation are complex processes that are dependent on feedback mechanisms involving the action of progesterone [13], we hypothesized that variation in the progesterone receptor may influence age at menarche and menstrual cycle characteristics. To investigate this hypothesis, we examined whether there was any association between the Val660Leu variant and age at menarche or menstrual cycle characteristics in a population of women office workers. We also genotyped a set of single-nucleotide polymorphisms (SNPs) that tag variation within the extended gene region of PGR and assessed whether such SNPs were associated with age at menarche.
Methods
Population
Women office workers in New York, New Jersey and Massachusetts were enrolled in a study of reproductive health from 1990 through 1994 [14]. A total of 4,640 women completed self-administered questionnaires including questions on reproductive health and current birth control practices. Women were eligible for a prospective study of fertility if they were between the ages of 18 and 40 and had been sexually active in the month prior to completing the questionnaire while using inconsistent or no birth control (n = 855). The study required first-morning urine collection at least 2 days each cycle at the onset of menstrual bleeding. 603 eligible women agreed to participate. Of these, 524 women collected at least one urine sample, which is the source of DNA for this study. The women reported age at menarche, as well as covariates such as race, ethnicity, and year of birth, during an interview at the onset of the prospective study. Three women who did not report age at menarche and 6 women who did not report either race or ethnicity were excluded, resulting in a starting sample size of 515 for the study of age at menarche.
Of the 524 women who collected urine samples, 470 women were eligible for the menstrual cycle analysis based upon the following criteria: completed follow-up for at least one menstrual cycle; no history of hysterectomy, polycystic ovaries, or tubal ligation; not currently infertile; and a partner who had not had a vasectomy. These women completed daily diaries, which included information on menstrual bleeding as well as covariates such as intercourse, birth control use, smoking, and alcohol and caffeine consumption.
Selection of Single Nucleotide Polymorphisms
The Progesterone Receptor is Entrez gene NM_000926, chromosome 11 q22–23, position 100414313-100506465. SNP data were downloaded from the CEPH population of the International HapMap Project, phase II [15]. The CEPH population consists of Utah residents with ancestry from northern and western Europe. We identified 293 SNPs in the extended gene region of the progesterone receptor, which includes the region starting 20 kb upstream of the gene to 10 kb downstream. We used Haploview [16] to identify SNPs with a minor allele frequency greater than 5% (145 SNPs). Because some SNPs are correlated with each other, the program Tagger within Haploview was used to identify linkage disequilibrium (LD)-tagging SNPs, requiring that tagging SNPs be highly correlated (r² ≥ 0.95) with SNPs not genotyped [17]. This reduced the number to 37 LD-tagging SNPs, including the Val660Leu variant, rs1042838, which was the only functional variant identified.
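The two filtering steps described above (a minor allele frequency cutoff followed by LD-based tag selection) can be sketched in a few lines. This is only an illustration under stated assumptions: genotypes are coded as 0/1/2 copies of one allele, LD is approximated by the squared correlation of those allele counts rather than the haplotype-based r² that Haploview computes from phased data, and the greedy selection is a simplification, not Tagger's actual implementation. The function names are hypothetical.

```python
def minor_allele_frequency(genotypes):
    """Minor allele frequency from 0/1/2 allele counts (None = missing call)."""
    called = [g for g in genotypes if g is not None]
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)


def dosage_r2(a, b):
    """Squared Pearson correlation of allele counts (an approximation to LD r^2)."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return sxy ** 2 / (sxx * syy)


def greedy_tags(snps, maf_min=0.05, r2_min=0.95):
    """Pick tag SNPs so every common SNP has r^2 >= r2_min with some tag.

    snps maps SNP name -> list of 0/1/2 genotypes in the same sample order.
    """
    common = {name: g for name, g in snps.items()
              if minor_allele_frequency(g) > maf_min}
    remaining = set(common)
    tags = []
    while remaining:
        # Choose the SNP that captures the most still-untagged SNPs;
        # sorting first makes ties resolve alphabetically.
        best = max(
            sorted(remaining),
            key=lambda s: sum(dosage_r2(common[s], common[t]) >= r2_min
                              for t in remaining),
        )
        tags.append(best)
        remaining -= {t for t in remaining
                      if dosage_r2(common[best], common[t]) >= r2_min}
    return tags
```

With real HapMap data, the input would be the 293 SNPs in the extended gene region; the thresholds `maf_min=0.05` and `r2_min=0.95` mirror the values stated in the text.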
DNA Extraction and Genotyping
DNA from the frozen, stored urine samples was extracted, amplified and genotyped in the Emory Biomarker Service Center. Urine samples were extracted in duplicate on 20% of the women. Five-milliliter aliquots of urine were centrifuged for 5 min at 3,000 rpm to pellet cells and debris. DNA was extracted using the Qiagen MagAttract DNA Mini M48 kit in combination with the BioRobot M48 workstation.
The Beckman-Coulter GenomeLab SNPStream system was used to genotype the women for the 37 SNPs using primers designed by autoprimer.com [18]. Up to 6 ng of DNA was used for genotyping. SNPs passed or failed genotyping based on default parameters in the GenomeLab SNPStream Genotyping System Software Suite v2.3 and manual quality control (signal intensity and clustering pattern). Genotyping success was defined as the number of genotypes called by the software, divided by the number of genotypes attempted. To assess genotyping accuracy, we examined the concordance among duplicate genotypes. We investigated whether the SNPs were in Hardy-Weinberg equilibrium using the calculator in Haploview, which uses an exact test [16,19]. To be included in our study, we required that each SNP be in Hardy-Weinberg equilibrium, have a >90% genotyping success rate, and have >98% genotyping accuracy.
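The three inclusion criteria above can be expressed as a short quality-control sketch. The success-rate and concordance definitions follow the text; the Hardy-Weinberg function is a commonly used exact-test formulation and is assumed, not confirmed, to be comparable to the test in Haploview. The significance cutoff `hwe_alpha` is a hypothetical parameter, since the text does not state the threshold used, and all function names are illustrative.

```python
def success_rate(n_called, n_attempted):
    """Genotypes called by the software divided by genotypes attempted."""
    return n_called / n_attempted


def concordance(n_concordant, n_duplicate_pairs):
    """Share of duplicate genotype pairs that agree."""
    return n_concordant / n_duplicate_pairs


def hwe_exact_p(n_hom1, n_het, n_hom2):
    """Exact Hardy-Weinberg test p value from the three genotype counts."""
    n = n_hom1 + n_het + n_hom2
    if n == 0:
        return 1.0
    hom_rare, hom_common = min(n_hom1, n_hom2), max(n_hom1, n_hom2)
    rare = 2 * hom_rare + n_het  # copies of the rarer allele
    probs = [0.0] * (rare + 1)   # relative probability of each het count

    # Start at a heterozygote count near its expectation, matching parity.
    mid = rare * (2 * n - rare) // (2 * n)
    if mid % 2 != rare % 2:
        mid += 1
    probs[mid] = 1.0

    # Recurrence downward: fewer heterozygotes, more homozygotes.
    het, hr = mid, (rare - mid) // 2
    hc = n - het - hr
    while het > 1:
        probs[het - 2] = probs[het] * het * (het - 1) / (4.0 * (hr + 1) * (hc + 1))
        het, hr, hc = het - 2, hr + 1, hc + 1

    # Recurrence upward: more heterozygotes, fewer homozygotes.
    het, hr = mid, (rare - mid) // 2
    hc = n - het - hr
    while het <= rare - 2:
        probs[het + 2] = probs[het] * 4.0 * hr * hc / ((het + 2) * (het + 1))
        het, hr, hc = het + 2, hr - 1, hc - 1

    # p value: total probability of outcomes no more likely than the observed one.
    total = sum(probs)
    return min(1.0, sum(p for p in probs if p <= probs[n_het]) / total)


def passes_qc(rate, accuracy, hwe_p, hwe_alpha=0.001):
    """Apply the >90% success and >98% accuracy rules plus an assumed HWE cutoff."""
    return rate > 0.90 and accuracy > 0.98 and hwe_p >= hwe_alpha
```

As a usage example, the duplicate counts reported in the Results (2,190 concordant of 2,194) give `concordance(2190, 2194)`, which exceeds the 98% threshold.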
Given the large body of literature for the rs1042838 SNP, we attempted to maximize our sample size by genotyping this SNP again from additional samples for every woman on a separate SNPStream chip.
Age at Menarche Analyses
We examined whether the mean age at menarche varied by rs1042838 genotype using an ANOVA for a 3-way comparison across genotypes among non-Hispanic whites. Linear regression was also conducted to estimate the effect of each SNP on age at menarche in the total population, adjusting for the effects of race (white, black or African-American, Asian, or other) and ethnicity (Hispanic or non-Hispanic) to control for potential population stratification. An important epidemiologic predictor of age at menarche is nutrition during childhood, which can be measured as childhood body mass index (BMI) or central adiposity [20,21,22]. Because the study included adult women, we were unable to control for childhood BMI. As a crude surrogate, we explored models including adult BMI, but recognize that the temporal sequence of this relationship may be erroneous if age at menarche influences adult BMI.
We then conducted a global test to determine the association between all SNPs in the gene and age at menarche. We fit a semiparametric regression model using the method of least-squares kernel machines (LSKM), which yields a single global score statistic that measures the association between all 30 SNPs and age at menarche [23]. This method has been shown to be more powerful than testing SNPs individually. However, the method does not permit missing data; thus, the analysis could only accommodate those women who had 100% genotyping success, thereby leading to a reduced sample size.
Menstrual Function Analysis
For women who were followed for at least one full menstrual cycle (n = 470), cycle length was calculated from the daily diaries by taking the number of days from the first day of menstrual bleeding of one cycle until the first day of menstrual bleeding of the next cycle. Cycles less than 17 days in length were excluded on the basis that spotting may have been misinterpreted as a menstrual bleed. Cycles longer than 99 days were excluded because they are indicative of an anovulatory condition. Cycles were categorized as short (17–24 days), standard (25–35 days), or long (36–99 days); these cutoffs were based on the top and bottom deciles of cycle length and are consistent with other definitions in the literature. The standard deviation of cycle length was used as the measure of cycle variability. Therefore, a woman had to have completed at least two menstrual cycles to be included in the cycle variability analysis.
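The cycle-length definition and the category cutoffs above translate directly into code. The following is a minimal sketch, assuming that each woman's diary has already been reduced to a list of dates on which a menstrual bleed began; the function names and the example dates are hypothetical.

```python
from datetime import date


def cycle_lengths(bleed_onsets):
    """Days from the first day of one bleed to the first day of the next."""
    onsets = sorted(bleed_onsets)
    return [(later - earlier).days for earlier, later in zip(onsets, onsets[1:])]


def categorize(length_days):
    """Return 'short', 'standard', 'long', or None for excluded cycles."""
    if length_days < 17 or length_days > 99:
        return None  # excluded: possible spotting, or likely anovulatory
    if length_days <= 24:
        return "short"
    if length_days <= 35:
        return "standard"
    return "long"


# Made-up onset dates for one woman, for illustration only.
onsets = [date(1991, 1, 1), date(1991, 1, 23), date(1991, 2, 20), date(1991, 4, 1)]
lengths = cycle_lengths(onsets)
categories = [categorize(n) for n in lengths]
```

Returning `None` for excluded cycles keeps the exclusion rule separate from the three analysis categories, so excluded cycles can be dropped before modeling.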
Generalized estimating equations were used to model the effect of the rs1042838 genotype and covariates on the probability of experiencing long or short cycles, adjusting for the correlation of cycle lengths within a single woman assuming an exchangeable correlation structure. A woman was considered to have highly variable cycles when the standard deviation of her cycle length was in the top quartile. The probability of having highly variable cycles was modeled using logistic regression. Mixed linear models were used to model cycle length, including a random effect for each woman. All menstrual function analyses were either adjusted for self-reported race (white, black, or other) and ethnicity (Hispanic or non-Hispanic), or restricted to non-Hispanic whites, to reduce potential confounding by population stratification. Other established predictors of menstrual function in the literature include age, BMI, smoking, alcohol, and caffeine use. These were explored as covariates.
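The definition of highly variable cycles can be illustrated as follows. This sketch assumes the sample standard deviation and one particular percentile convention (the default of Python's `statistics.quantiles`); the text does not specify how the quartile boundary was computed, so the cutoff method, the function names, and the data layout are assumptions.

```python
from statistics import quantiles, stdev


def cycle_sd_by_woman(cycles):
    """Standard deviation of cycle length per woman.

    cycles maps a woman's identifier to her list of cycle lengths in days.
    Women with fewer than two cycles have no defined SD and are skipped.
    """
    return {woman: stdev(lengths)
            for woman, lengths in cycles.items() if len(lengths) >= 2}


def flag_highly_variable(cycles):
    """True for women whose cycle-length SD falls above the 75th percentile."""
    sds = cycle_sd_by_woman(cycles)
    cutoff = quantiles(sds.values(), n=4)[2]  # third quartile boundary
    return {woman: sd > cutoff for woman, sd in sds.items()}
```

The resulting true/false flag is the kind of binary outcome that a logistic regression on genotype and covariates would then take as its response.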
The Emory Institutional Review Board approved the study protocol after complete de-identification of all samples, surveys and interviews. SAS v. 9.2 (Cary, N.C., USA) was used for statistical analyses.
Results
Genotyping Results
Thirty-one SNPs were successfully genotyped of the 37 attempted. This is within the expected range for the Beckman SNPStream system [Tang, pers. commun.]. Of the 515 women genotyped for these 31 SNPs, 118 failed genotyping completely in the first round of genotyping, likely due to the age of the urine samples and the lower DNA concentration in some urine. The mean concentration of DNA in the urine samples was 11.2 ng/µl. 188 women had concentrations exceeding 10 ng/µl, and 407 women had concentrations exceeding 3 ng/µl. Genotypes were called for 397 women (mean DNA concentration = 12.6 ng/µl). After limiting the analysis to these women, one SNP failed our predetermined genotyping success rate threshold (<90%) and was excluded. Genotyping accuracy was greater than 99%, as calculated by determining the percent of duplicate genotypes that were concordant (n = 2,190 of 2,194). The few discordant genotypes (n = 4) were set to missing. Genotypes were obtained on all 30 SNPs for 264 women; therefore, the global LSKM analysis had a reduced sample size of n = 264. None of the 30 SNPs violated Hardy-Weinberg equilibrium. Among the non-Hispanic whites, the minor allele frequencies (MAF) for all SNPs were very similar to those reported in the CEPH population in HapMap.
During the second round of genotyping (in which the PROGINS SNP, rs1042838, was re-genotyped using DNA extracted from additional urine samples from each woman, and again in duplicate for 20% of the women), additional genotypes were obtained for this SNP, resulting in a total of 444 women available for the PROGINS analyses.
Population Characteristics
The study population was mostly white and non-Hispanic (table 1). The mean age at interview was 31.3 ± 0.2 (median = 31), and the mean age at menarche was 12.6 ± 0.1 (median = 13). There was no association of race or ethnicity with age at menarche in this population: the mean age at menarche among non-Hispanic whites was 12.7 ± 0.1; among blacks, 12.5 ± 0.2; among Hispanics, 12.8 ± 0.3. In addition, there was not a secular trend of age at menarche during this time period in this population (data not shown).
Age at Menarche Results
Carriers of the TT genotype (n = 13) of rs1042838 experienced menarche a year later than those with the GT or GG genotypes (table 2). This association was significant in unadjusted analyses, assuming a recessive genetic model (p = 0.03 in a t test comparing TT vs. others); in analyses restricted to non-Hispanic whites (p = 0.04, TT vs. others); and in a linear regression model adjusted for race and ethnicity (TT vs. others: β = 1.00 ± 0.46, p = 0.03). The linear regression parameter β is interpreted as the additional years to menarche for TT individuals compared to GT or GG individuals. Removing the outliers in age at menarche (2 girls who experienced menarche at ages 8 and 19, respectively) did not affect the results. An inverse normal transformation was then applied because the distribution of age at menarche was leptokurtic, with many individuals experiencing menarche at ages 12 and 13. The results of the multivariate regression model using the inverse-normalized data were consistent with the previous model (TT vs. others, p = 0.03). Additional adjustment for adult BMI did not alter the association. An additive genetic model was also explored; each additional T allele resulted in an increase of 0.15 ± 0.15 years in age at menarche (p = 0.31).
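The difference between the recessive and additive models lies only in how the genotype is coded before regression. The sketch below shows both codings and an unadjusted least-squares slope; under the recessive coding that slope equals the difference in mean age at menarche between TT and the other genotypes, which is the interpretation of β given above. The ages and genotypes are made-up toy values, not study data, and the function names are hypothetical. Covariate adjustment for race and ethnicity is omitted.

```python
def encode(genotype, model):
    """Code an rs1042838 genotype string such as 'GT' for regression."""
    t_count = genotype.count("T")
    if model == "additive":
        return t_count                    # 0, 1, or 2 copies of T
    if model == "recessive":
        return 1 if t_count == 2 else 0   # TT vs. GT or GG
    raise ValueError(f"unknown genetic model: {model}")


def ols_slope(x, y):
    """Slope from simple (one-predictor) least-squares regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Toy data for illustration only.
genotypes = ["GG", "GT", "TT", "GG", "TG", "TT"]
age_at_menarche = [12, 13, 14, 12, 13, 14]

beta_recessive = ols_slope([encode(g, "recessive") for g in genotypes], age_at_menarche)
beta_additive = ols_slope([encode(g, "additive") for g in genotypes], age_at_menarche)
```

In the toy data, the recessive slope is the mean age among TT women minus the mean age among all others, while the additive slope is the estimated change in age per additional T allele.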
Missing data are not permitted for the LSKM method that tests the gene-wide association described in Kwee et al. [23], and therefore only women with data for all 30 SNPs could be included in the LSKM analysis (n = 264). There was a weak association between global variation in PGR and age at menarche, adjusted for race and ethnicity (p = 0.09). When the population was restricted to non-Hispanic whites (n = 181 with genotype data for all SNPs), the association achieved statistical significance (p = 0.03).
Cycle Length Results
Of the 470 women in the study who had menstrual cycle data, 382 were also successfully genotyped for rs1042838, and contributed a total of 2,553 complete menstrual cycles. Most women in this study (n = 338) were followed for more than one menstrual cycle, and therefore most women contributed cycles of various lengths to the analysis. 129 women contributed at least one ‘short’ cycle (17–24 days in length); 128 women contributed at least one ‘long’ cycle (36–99 days); and 365 women contributed at least one ‘standard’ cycle (25–35 days).
The mean menstrual cycle lengths were not significantly different across the three rs1042838 genotypes (table 3; linear regression β = 0.30 ± 0.29 for each additional T allele in a mixed linear model adjusted for race and ethnicity; p = 0.30). However, each additional T allele was associated with significantly decreased odds of having a short cycle (17–24 days) in a dose-response manner (table 3). Similar results were obtained assuming a recessive genetic model: those with the TT genotype had reduced odds of having short menstrual cycles compared to those with the GT or GG genotypes (OR = 0.16; 95% CI, 0.02–1.08). These findings were robust to adjustment for potential confounders including age at interview, smoking, caffeine, alcohol, and BMI (BMI was modeled categorically, using the categories shown in table 1). While age at interview and BMI >30 were significant predictors of cycle length in our data, including them or any other covariates in the regression models did not alter any of the associations between rs1042838 genotype and menstrual cycle length or variability. There was no association of rs1042838 genotype with either the odds of having long cycles or variable cycles (table 3).
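As a reading aid, an odds ratio, its confidence interval, and its p value can be checked against one another. The sketch below assumes the interval is a Wald interval that is symmetric on the log-odds scale, which is a common convention but is not stated in the text. It uses the per-allele estimate for short cycles given in the abstract (OR 0.54, 95% CI 0.36 to 0.80); the function name is hypothetical.

```python
from math import erfc, log, sqrt


def wald_from_ci(odds_ratio, lower, upper, z_crit=1.959964):
    """Recover the log-odds standard error, z statistic, and two-sided p value.

    Assumes the interval was built as exp(log(OR) +/- z_crit * SE).
    """
    se = (log(upper) - log(lower)) / (2 * z_crit)
    z = log(odds_ratio) / se
    p = erfc(abs(z) / sqrt(2))  # two-sided normal tail probability
    return se, z, p


se, z, p = wald_from_ci(0.54, 0.36, 0.80)
```

Under these assumptions the recovered p value is close to the reported p = 0.002. Applying the same check to the recessive-model interval (0.02–1.08) is less informative, because the lower bound is rounded to a single significant digit.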
While age at menarche was positively correlated with cycle length in this population (partial correlation after adjusting for age at interview, ρ = 0.09, p = 0.05), this did not explain the association between rs1042838 genotype and P (short cycles); the regression coefficients and p value for the effect of this SNP on P (short cycles) were unchanged by the addition of age at menarche to the model (data not shown).
Discussion
We observed a significant association of the Val660Leu ‘PROGINS’ variant with a 1-year increase in age at menarche and with a decreased probability of short menstrual cycles. We also observed a global association of variation across the progesterone receptor gene with age at menarche. This finding is consistent with a whole genome linkage scan for age at menarche, in which the progesterone receptor was identified as a candidate gene [1].
The precise sequence of physiological events that result in menarche is unknown, and the mechanism is likely complex. Progesterone is thought to be an important factor. Progesterone and estrogen levels increase at the inception of puberty, and progesterone production and subsequent withdrawal are necessary for a menstrual bleed to occur [13]. In addition, high adrenal progesterone levels can prevent menarche [24]. Therefore, it is plausible that genetic variation in progesterone or the progesterone receptor may affect pubertal development and age at menarche. The stability and transcriptional activity of the PROGINS variant of PGR differ from those of the wild-type receptor [10,25]; thus, the response to progesterone (and the threshold levels of circulating progesterone required for menstruation and menarche) could vary according to PGR genotype. Variants in both coding and non-coding regions could be responsible for variation in transcriptional, translational, or functional activity of PGR.
Three studies have found that the PROGINS variant is associated with a modestly increased risk of breast cancer, possibly due to its reduced opposition to estrogen’s mitogenic effects [5,9,26]. In a meta-analysis of ovarian cancer cases and controls, the rs1042838 SNP was associated with a significantly increased risk of endometrioid ovarian cancer [27]. Either this SNP may have pleiotropic effects on multiple reproductive outcomes, or its effects on cancer risk may be mediated through age at menarche. However, this study does not support the latter hypothesis, since the variant was associated with older age at menarche, but has been shown to increase the risk of reproductive cancers.
A strength of this study was the use of prospectively collected daily diary data on menstrual cycles. We found that the PROGINS variant was associated with a decreased risk of short menstrual cycles. This association may be due to chance; however, the observed dose-response relationship increases the plausibility of a causal association. In vitro studies or animal studies investigating feedback mechanisms between progesterone receptor activity and progesterone and estrogen levels could help elucidate the biological mechanism underlying this association.
A unique aspect of this study was the demonstration of urine as a DNA source for epidemiological studies. Studies have shown that urine can be as valid as blood or other sources for genotyping in women [28,29]. Our DNA yield was lower than in these studies; we were only able to extract usable DNA from 444 of 515 women who contributed urine samples. However, our genotyping accuracy was over 99%, as measured by the concordance of genotypes among duplicate samples, and all of our SNPs were in Hardy-Weinberg equilibrium. The minor allele frequencies for all SNPs in our dataset were almost identical to those found in HapMap, providing further evidence that there was no bias related to genotyping success.
Limitations of this study included the small sample size, particularly after stratifying by race/ethnicity, and the marginally significant results, which could be attributable to type I error. Replication of our findings is therefore important.
We did not observe an association between race or ethnicity and age at menarche in this study. The women in this study experienced menarche mainly in the 1960s and 1970s. Consistent with our study, the National Health and Nutrition Survey shows no significant difference of age at menarche among races in the 1960s [30]. In addition, no association between year of birth and age at menarche was observed. Although a secular trend of decreasing age at menarche was observed over the course of the 20th century [30], it may not be apparent in this study because of the limited time frame.
In conclusion, variation in the progesterone receptor was associated with age at menarche and menstrual cycle length in this population. Understanding the genetic contributions to menstruation and menarche can help elucidate the biological pathways and causal mechanisms involved, and can clarify the role of genetic variation in hormones and their receptors. A more complete picture of the factors affecting age at menarche may eventually help identify those at risk for disorders and chronic diseases associated with menarche. Likewise, understanding the influences of genes on menstruation will add to the body of knowledge concerning menstrual dysfunction and associated morbidities such as infertility. Replication of this study, with particular attention to the Val660Leu variant, is needed to clarify the relationships between progesterone, progesterone receptor genotypes, age at menarche, menstrual function, and risk of reproductive cancers.
Acknowledgments
This research was supported by funding from the National Institutes of Health (R01HD24618, R03HD55176, R01HG03618), and the University Research Committee of Emory University. This manuscript was developed under a STAR Research Assistance Agreement No. 916891 awarded by the US Environmental Protection Agency. It has not been formally reviewed by the EPA. The views expressed in the manuscript are solely those of Kira Creswell Taylor and the EPA does not endorse any products or commercial services mentioned in the manuscript.