      Estimation of allele frequency and association mapping using next-generation sequencing data


          Abstract

          Background

          Estimation of allele frequency is of fundamental importance in population genetic analyses and in association mapping. In most studies using next-generation sequencing, a cost-effective approach is to use medium- or low-coverage data (e.g., <15X). However, SNP calling and allele frequency estimation in such studies are associated with substantial statistical uncertainty because of varying coverage and high error rates.

          Results

          We evaluate a new maximum likelihood method for estimating allele frequencies in low and medium coverage next-generation sequencing data. The method is based on integrating over uncertainty in the data for each individual rather than first calling genotypes. This method can be applied to directly test for associations in case/control studies. We use simulations to compare the likelihood method to methods based on genotype calling, and show that the likelihood method outperforms the genotype calling methods in terms of: (1) accuracy of allele frequency estimation, (2) accuracy of the estimation of the distribution of allele frequencies across neutrally evolving sites, and (3) statistical power in association mapping studies. Using real re-sequencing data from 200 individuals obtained from an exon-capture experiment, we show that the patterns observed in the simulations are also found in real data.
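The idea of integrating over each individual's genotype uncertainty, rather than calling genotypes first, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that per-individual genotype likelihoods P(data | g) for g = 0, 1, 2 copies of the allele are already available, that genotype frequencies follow Hardy-Weinberg proportions, and that a simple EM iteration is an acceptable optimizer (the abstract does not say which optimizer is used). The case/control test shown is an assumed likelihood ratio test with one degree of freedom. All function names are hypothetical.

```python
import math

def hwe_prior(f):
    """Genotype probabilities for 0, 1, 2 allele copies, assuming Hardy-Weinberg."""
    return ((1.0 - f) ** 2, 2.0 * f * (1.0 - f), f ** 2)

def log_likelihood(gls, f, eps=1e-12):
    """Log-likelihood of frequency f, summing over the three genotypes per individual.

    gls is a list of (P(data|g=0), P(data|g=1), P(data|g=2)) tuples.
    f is clamped away from 0 and 1 to avoid log(0) at the boundary.
    """
    f = min(max(f, eps), 1.0 - eps)
    prior = hwe_prior(f)
    return sum(math.log(sum(gl[g] * prior[g] for g in range(3))) for gl in gls)

def estimate_frequency(gls, f0=0.2, tol=1e-10, max_iter=10000):
    """EM estimate of the allele frequency without calling genotypes."""
    f = f0
    for _ in range(max_iter):
        prior = hwe_prior(f)
        expected_copies = 0.0
        for gl in gls:
            # Unnormalized posterior over genotypes for this individual.
            weights = [gl[g] * prior[g] for g in range(3)]
            total = sum(weights)
            # Posterior expected number of allele copies (0 to 2).
            expected_copies += (weights[1] + 2.0 * weights[2]) / total
        f_new = expected_copies / (2.0 * len(gls))
        if abs(f_new - f) < tol:
            return f_new
        f = f_new
    return f

def association_lrt(case_gls, control_gls):
    """Likelihood ratio test: separate case/control frequencies vs. one shared frequency.

    Returns (statistic, p_value); the p-value assumes a chi-square
    distribution with 1 degree of freedom under the null.
    """
    pooled = list(case_gls) + list(control_gls)
    ll_alt = (log_likelihood(case_gls, estimate_frequency(case_gls))
              + log_likelihood(control_gls, estimate_frequency(control_gls)))
    ll_null = log_likelihood(pooled, estimate_frequency(pooled))
    stat = max(0.0, 2.0 * (ll_alt - ll_null))
    # Upper tail of a chi-square(1) variable: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Usage with made-up genotype likelihoods for four individuals.
example = [(0.90, 0.09, 0.01), (0.30, 0.60, 0.10), (0.50, 0.45, 0.05), (0.05, 0.25, 0.70)]
print(estimate_frequency(example))
```

When every genotype is known with certainty, the estimate reduces to plain allele counting, which is a quick sanity check for a sketch like this.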

          Conclusions

          Overall, our results suggest that association mapping and estimation of allele frequencies should not be based on genotype calling in low to medium coverage data. Furthermore, if genotype calling methods are used, it is usually better not to filter genotypes based on the call confidence score.
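For contrast, a genotype-calling estimator with an optional confidence filter might look like the sketch below. It is illustrative only and makes its own assumptions: genotypes are called by maximum normalized likelihood (a uniform prior), the "call confidence score" is taken to be that normalized likelihood, and the names `call_genotype` and `frequency_from_calls` are hypothetical. The abstract does not describe the calling procedure or the score actually used.

```python
def call_genotype(gl, min_confidence=None):
    """Call the most likely genotype (0, 1 or 2 allele copies).

    gl is (P(data|g=0), P(data|g=1), P(data|g=2)). Returns None when a
    confidence threshold is given and the best call falls below it.
    """
    total = sum(gl)
    scores = [x / total for x in gl]  # normalized likelihoods, uniform prior
    best = max(range(3), key=lambda g: scores[g])
    if min_confidence is not None and scores[best] < min_confidence:
        return None
    return best

def frequency_from_calls(gls, min_confidence=None):
    """Allele frequency by counting alleles in the called (and kept) genotypes."""
    kept = [c for c in (call_genotype(gl, min_confidence) for gl in gls) if c is not None]
    if not kept:
        return None  # every call was filtered out
    return sum(kept) / (2 * len(kept))

# Made-up likelihoods: the second individual is ambiguous and is dropped by the filter.
example = [(0.9, 0.1, 0.0), (0.5, 0.45, 0.05), (0.1, 0.8, 0.1), (0.0, 0.1, 0.9)]
print(frequency_from_calls(example))
print(frequency_from_calls(example, min_confidence=0.6))
```

Filtering changes which individuals contribute to the count, so the two estimates can differ even on the same data.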

                Author and article information

                Journal: BMC Bioinformatics
                Publisher: BioMed Central
                ISSN: 1471-2105
                Published: 11 June 2011
                Volume: 12
                Article number: 231
                Affiliations
                [1] Departments of Integrative Biology and Statistics, UC Berkeley, Berkeley CA 94720, USA
                [2] Bioinformatics Centre, University of Copenhagen, Copenhagen, Denmark
                [3] Beijing Genomics Institute, Shenzhen 518083, China
                [4] Department of Biology, University of Copenhagen, Copenhagen, Denmark
                [5] Beijing Institute of Genomics, Chinese Academy of Science, Beijing 101300, China
                [6] The Graduate University of Chinese Academy of Sciences, Beijing 100062, China
                [7] Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark
                [8] Hagedorn Research Institute, Copenhagen, Denmark
                [9] Steno Diabetes Center, Gentofte, Denmark
                [10] Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark
                [11] Research Centre for Prevention and Health, Glostrup University Hospital, Glostrup, Denmark
                [12] Faculty of Health Sciences, University of Southern Denmark, Odense, Denmark
                [13] Faculty of Health Sciences, University of Aarhus, Aarhus, Denmark
                [14] Institute of Biomedical Sciences, University of Copenhagen, Copenhagen, Denmark
                Article
                Publisher ID: 1471-2105-12-231
                DOI: 10.1186/1471-2105-12-231
                PMC ID: 3212839
                PMID: 21663684
                Copyright ©2011 Kim et al; licensee BioMed Central Ltd.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Received: 24 February 2011
                Accepted: 11 June 2011
                Categories
                Research Article

                Bioinformatics & Computational biology
