Abstract
Background: In longitudinal epidemiological studies, some individuals with rich phenotype data die or are lost to follow-up before providing DNA for genetic studies. Often, the genotypic and phenotypic data of their relatives are available. Two strategies for analyzing the incomplete data are to exclude ungenotyped subjects from analysis (the complete-case method, CC) and to include phenotyped but ungenotyped individuals by using relatives' genotypes for genotype imputation (GI). Neither strategy uses the information in the phenotypic data to handle the missing-genotype problem. Methods: We propose a phenotypically enriched genotypic imputation (PEGI) method that uses an expectation-maximization (EM)-based maximum likelihood method to incorporate observed phenotypes into genotype imputation. Results: Our simulations with genotypes missing completely at random show that, for a single-nucleotide polymorphism (SNP) with a moderate to strong effect on a phenotype, PEGI improves power more than GI does, without excess type I errors. Using the Framingham Heart Study data set, we compare the ability of PEGI, GI, and CC to detect associations between 5 SNPs and age at natural menopause. Conclusion: The PEGI method may improve power to detect an association over both CC and GI under many circumstances.
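
The idea summarized under Methods, letting observed phenotypes inform the imputation of missing genotypes through EM, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single SNP coded 0/1/2, a normally distributed phenotype with an additive genotype effect, unrelated residuals, and per-individual prior genotype probabilities that are taken as given (for example, derived from relatives' genotypes). The names `e_step`, `m_step`, and `em_fit`, and all data values, are hypothetical.

```python
import math

def normal_pdf(y, mean, var):
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def e_step(y, prior, mu, beta, var):
    # Posterior genotype weights: prior (assumed to come from relatives)
    # multiplied by the phenotype likelihood under each genotype 0, 1, 2.
    w = [p * normal_pdf(y, mu + beta * g, var) for g, p in enumerate(prior)]
    total = sum(w)
    return [x / total for x in w]

def m_step(ys, weights):
    # Weighted least squares for y = mu + beta * g over all
    # (individual, genotype) pairs, weighted by the posterior weights.
    n = len(ys)
    eg = [sum(g * w for g, w in enumerate(ws)) for ws in weights]
    eg2 = [sum(g * g * w for g, w in enumerate(ws)) for ws in weights]
    mean_g, mean_y = sum(eg) / n, sum(ys) / n
    sxy = sum(e * y for e, y in zip(eg, ys)) / n - mean_g * mean_y
    sxx = sum(eg2) / n - mean_g ** 2
    beta = sxy / sxx
    mu = mean_y - beta * mean_g
    var = sum(w * (y - mu - beta * g) ** 2
              for y, ws in zip(ys, weights)
              for g, w in enumerate(ws)) / n
    return mu, beta, var

def em_fit(ys, priors, n_iter=50):
    # Start from no genotype effect, so the first E-step returns the priors.
    mean_y = sum(ys) / len(ys)
    mu, beta = mean_y, 0.0
    var = sum((y - mean_y) ** 2 for y in ys) / len(ys)
    for _ in range(n_iter):
        weights = [e_step(y, p, mu, beta, var) for y, p in zip(ys, priors)]
        mu, beta, var = m_step(ys, weights)
    return mu, beta, var, weights

# Toy data (invented): six genotyped individuals, two ungenotyped.
ys = [1.0, 1.2, 2.1, 1.9, 3.0, 3.1, 2.9, 1.1]
observed = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]]
missing = [[0.25, 0.5, 0.25], [0.25, 0.5, 0.25]]  # assumed relative-based priors
mu, beta, var, weights = em_fit(ys, observed + missing)
```

In this toy example the two ungenotyped individuals start with identical priors, and their phenotypes (2.9 and 1.1) shift the posterior weights toward genotypes 2 and 0, respectively. An imputation that ignores phenotypes would leave both at the prior.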