      Polygenic Modeling with Bayesian Sparse Linear Mixed Models

      research-article
      PLoS Genetics
      Public Library of Science

          Abstract

          Both linear mixed models (LMMs) and sparse regression models are widely used in genetics applications, including, recently, polygenic modeling in genome-wide association studies. These two approaches make very different assumptions, so are expected to perform well in different situations. However, in practice, for a given dataset one typically does not know which assumptions will be more accurate. Motivated by this, we consider a hybrid of the two, which we refer to as a “Bayesian sparse linear mixed model” (BSLMM) that includes both these models as special cases. We address several key computational and statistical issues that arise when applying BSLMM, including appropriate prior specification for the hyper-parameters and a novel Markov chain Monte Carlo algorithm for posterior inference. We apply BSLMM and compare it with other methods for two polygenic modeling applications: estimating the proportion of variance in phenotypes explained (PVE) by available genotypes, and phenotype (or breeding value) prediction. For PVE estimation, we demonstrate that BSLMM combines the advantages of both standard LMMs and sparse regression modeling. For phenotype prediction it considerably outperforms either of the other two methods, as well as several other large-scale regression methods previously suggested for this problem. Software implementing our method is freely available from http://stephenslab.uchicago.edu/software.html.
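          The contrast between the two modeling assumptions, and the PVE quantity estimated in the abstract, can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' BSLMM implementation or their MCMC algorithm: it generates phenotypes in which every SNP has a small effect (the LMM-style assumption) and a few SNPs have larger effects (the sparse-regression assumption), then computes the realized proportion of phenotypic variance explained by genotypes. The function names, parameter names, and default values are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate_hybrid_phenotype(n=200, p=500, n_large=5, sigma_small=0.02,
                              sigma_large=0.5, sigma_noise=1.0, seed=0):
    """Simulate a phenotype with both polygenic and sparse genetic effects.

    All names and defaults here are illustrative assumptions.
    Returns the standardized genotype matrix X, the phenotype y,
    and the true genetic value g = X @ beta.
    """
    rng = np.random.default_rng(seed)

    # Genotypes coded as 0/1/2 allele counts, one allele frequency per SNP.
    maf = rng.uniform(0.05, 0.5, size=p)
    X = rng.binomial(2, maf, size=(n, p)).astype(float)

    # Center and scale each SNP column; guard against constant columns.
    X -= X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0
    X /= sd

    # LMM-style component: every SNP has a small normally distributed effect.
    beta_small = rng.normal(0.0, sigma_small, size=p)

    # Sparse component: a few randomly chosen SNPs get an extra, larger effect.
    beta_large = np.zeros(p)
    idx = rng.choice(p, size=n_large, replace=False)
    beta_large[idx] = rng.normal(0.0, sigma_large, size=n_large)

    g = X @ (beta_small + beta_large)          # total genetic value
    y = g + rng.normal(0.0, sigma_noise, size=n)  # add environmental noise
    return X, y, g


def realized_pve(g, y):
    """Proportion of phenotypic variance explained by the true genetic value."""
    return np.var(g) / np.var(y)


if __name__ == "__main__":
    X, y, g = simulate_hybrid_phenotype()
    print(f"realized PVE: {realized_pve(g, y):.3f}")
```

          Setting `n_large=0` gives a purely polygenic trait, while setting `sigma_small=0.0` gives a purely sparse one; a hybrid model of the kind described in the abstract is meant to cover both settings and the range in between.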

          Author Summary

          The goal of polygenic modeling is to better understand the relationship between genetic variation and variation in observed characteristics, including variation in quantitative traits (e.g. cholesterol level in humans, milk production in cattle) and disease susceptibility. Improvements in polygenic modeling will help improve our understanding of this relationship and could ultimately lead to, for example, changes in clinical practice in humans or better breeding/mating strategies in agricultural programs. Polygenic models present important challenges, both at the modeling/statistical level (what modeling assumptions produce the best results) and at the computational level (how should these models be effectively fit to data). We develop novel approaches to help tackle both these challenges, and we demonstrate the gains in accuracy that result in both simulated and real data examples.

                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS Genetics (PLoS Genet)
                Public Library of Science (San Francisco, USA)
                ISSN: 1553-7390, 1553-7404
                Published: 7 February 2013
                Volume 9, Issue 2, e1003264
                Affiliations
                [1] Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
                [2] Department of Statistics, University of Chicago, Chicago, Illinois, United States of America
                The University of Queensland, Australia
                Author notes

                The authors have declared that no competing interests exist.

                Conceived and designed the experiments: XZ PC MS. Performed the experiments: XZ. Analyzed the data: XZ. Contributed reagents/materials/analysis tools: XZ. Wrote the paper: XZ MS. Developed the algorithm and implemented the software used in analysis: XZ.

                Article
                Manuscript ID: PGENETICS-D-12-02193
                DOI: 10.1371/journal.pgen.1003264
                PMC ID: 3567190
                PubMed ID: 23408905
                532f2b0d-be3b-4967-9114-f69b2f54a082
                Copyright © 2013

                This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 29 August 2012
                Accepted: 5 December 2012
                Page count
                Pages: 14
                Funding
                This work was supported by NIH grant HG02585 to MS and by NIH grant HL092206 (PI Y Gilad) and a cross-disciplinary postdoctoral fellowship from the Human Frontiers Science Program to PC. Funding for the Wellcome Trust Case Control Consortium project was provided by the Wellcome Trust under awards 076113 and 085475. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Biology
                Genetics
                Heredity
                Complex Traits
                Quantitative Traits
                Human Genetics
                Genetic Association Studies
                Genome-Wide Association Studies
                Animal Genetics
                Genetics of Disease
                Genome-Wide Association Studies
                Mathematics
                Statistics
                Biostatistics
                Statistical Methods
