
      SNP Calling, Genotype Calling, and Sample Allele Frequency Estimation from New-Generation Sequencing Data

      research-article


          Abstract

          We present a statistical framework for estimation and application of sample allele frequency spectra from New-Generation Sequencing (NGS) data. In this method, we first estimate the allele frequency spectrum using maximum likelihood. In contrast to previous methods, the likelihood function is calculated using a dynamic programming algorithm and numerically optimized using analytical derivatives. We then use a Bayesian method for estimating the sample allele frequency in a single site, and show how the method can be used for genotype calling and SNP calling. We also show how the method can be extended to various other cases including cases with deviations from Hardy-Weinberg equilibrium. We evaluate the statistical properties of the methods using simulations and by application to a real data set.
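The dynamic-programming likelihood and the single-site Bayesian step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes diploid individuals, Hardy-Weinberg equilibrium, precomputed per-individual genotype likelihoods, and a caller-supplied prior that stands in for the estimated allele frequency spectrum. All function names are hypothetical, and the SNP-calling rule at the end is one simple choice among several.

```python
from math import comb


def site_allele_count_likelihoods(genotype_likelihoods):
    """Likelihood of each sample allele count k = 0..2n at a single site.

    genotype_likelihoods: one (p0, p1, p2) tuple per diploid individual,
    where pg is P(reads | individual carries g copies of the allele).
    Assumes Hardy-Weinberg equilibrium (alleles paired at random).
    """
    n = len(genotype_likelihoods)
    h = [1.0]  # with zero individuals, only k = 0 is possible
    for p0, p1, p2 in genotype_likelihoods:
        new = [0.0] * (len(h) + 2)
        for k, value in enumerate(h):
            new[k] += p0 * value            # individual adds 0 copies
            new[k + 1] += 2.0 * p1 * value  # heterozygote: two orderings
            new[k + 2] += p2 * value        # individual adds 2 copies
        h = new
    # Divide by the number of ways to place k copies among 2n chromosomes.
    return [h[k] / comb(2 * n, k) for k in range(2 * n + 1)]


def posterior_allele_count(likelihoods, prior):
    """Bayes' rule: posterior over k is proportional to likelihood * prior."""
    weighted = [lik * p for lik, p in zip(likelihoods, prior)]
    total = sum(weighted)
    return [w / total for w in weighted]


def prob_site_is_variable(posterior):
    """Assumed SNP-calling score: 1 - P(k = 0) - P(k = 2n)."""
    return 1.0 - posterior[0] - posterior[-1]
```

A quick sanity check under these assumptions: if every individual has uninformative genotype likelihoods such as (1, 1, 1), the returned likelihood is flat across all allele counts, so the posterior simply equals the prior.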


          Most cited references (13)


          Resequencing of 200 human exomes identifies an excess of low-frequency non-synonymous coding variants.

          Targeted capture combined with massively parallel exome sequencing is a promising approach to identify genetic variants implicated in human traits. We report exome sequencing of 200 individuals from Denmark with targeted capture of 18,654 coding genes and sequence coverage of each individual exome at an average depth of 12-fold. On average, about 95% of the target regions were covered by at least one read. We identified 121,870 SNPs in the sample population, including 53,081 coding SNPs (cSNPs). Using a statistical method for SNP calling and an estimation of allelic frequencies based on our population data, we derived the allele frequency spectrum of cSNPs with a minor allele frequency greater than 0.02. We identified a 1.8-fold excess of deleterious, non-synonymous cSNPs over synonymous cSNPs in the low-frequency range (minor allele frequencies between 2% and 5%). This excess was more pronounced for X-linked SNPs, suggesting that deleterious substitutions are primarily recessive.

            Benchmarking next-generation transcriptome sequencing for functional and evolutionary genomics.

            Next-generation sequencing has opened the door to genomic analysis of nonmodel organisms. Technologies generating long-sequence reads (200-400 bp) are increasingly used in evolutionary studies of nonmodel organisms, but the short-sequence reads (30-50 bp) that can be produced at lower cost are thought to be of limited utility for de novo sequencing applications. Here, we tested this assumption by short-read sequencing the transcriptomes of the tropical disease vectors Aedes aegypti and Anopheles gambiae, for which complete genome sequences are available. Comparison of our results to the reference genomes allowed us to accurately evaluate the quantity, quality, and functional and evolutionary information content of our "test" data. We produced more than 0.7 billion nucleotides of sequenced data per species that assembled into more than 21,000 test contigs larger than 100 bp per species and covered approximately 27% of the Aedes reference transcriptome. Remarkably, the substitution error rate in the test contigs was approximately 0.25% per site, with very few indels or assembly errors. Test contigs of both species were enriched for genes involved in energy production and protein synthesis and underrepresented in genes involved in transcription and differentiation. Ortholog prediction using the test contigs was accurate across hundreds of millions of years of evolution. Our results demonstrate the considerable utility of short-read transcriptome sequencing for genomic studies of nonmodel organisms and suggest an approach for assessing the information content of next-generation data for evolutionary studies.

              SNP detection and genotyping from low-coverage sequencing data on multiple diploid samples.

              Reductions in the cost of sequencing have enabled whole-genome sequencing to identify sequence variants segregating in a population. An efficient approach is to sequence many samples at low coverage, then to combine data across samples to detect shared variants. Here, we present methods to discover and genotype single-nucleotide polymorphism (SNP) sites from low-coverage sequencing data, making use of shared haplotype (linkage disequilibrium) information. For each population, we first collect SNP candidates based on independent sequence calls per site. We then use MARGARITA with genotype or phased haplotype data from the same samples to collect 20 ancestral recombination graphs (ARGs). We refine the posterior probability of SNP candidates by considering possible mutations at internal branches of the 40 marginal ancestral trees inferred from the 20 ARGs at the left and right flanking genotype sites. Using a population genetic prior distribution on tree-branch length and Bayesian inference, we determine a posterior probability of the SNP being real and also the most probable phased genotype call for each individual. We present experiments on both simulation data and real data from the 1000 Genomes Project to prove the applicability of the methods. We also explore the relative tradeoff between sequencing depth and the number of sequenced samples.

                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS ONE
                Public Library of Science (San Francisco, USA)
                ISSN: 1932-6203
                2012
                24 July 2012
                31 July 2012
                Volume: 7
                Issue: 7
                Article: e37558
                Affiliations
                [1] BGI-Shenzhen, Shenzhen, China
                [2] Departments of Integrative Biology and Statistics, University of California, Berkeley, California, United States of America
                [3] Department of Biology, University of Copenhagen, Copenhagen, Denmark
                University of Montreal, Canada
                Author notes

                Competing Interests: The authors have declared that no competing interests exist.

                Conceived and designed the experiments: RN TK AA YL JW. Performed the experiments: RN TK AA. Analyzed the data: RN TK AA. Contributed reagents/materials/analysis tools: YL WJ. Wrote the paper: RN.

                Article
                Manuscript: PONE-D-11-26050
                DOI: 10.1371/journal.pone.0037558
                PMCID: PMC3404070
                PMID: 22911679
                9aac97b8-aeef-434e-8edb-a6ae95f65d16
                Copyright © 2012

                This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 27 December 2011
                Accepted: 25 April 2012
                Page count
                Pages: 10
                Funding
                Funding was provided by United States National Institutes of Health grant R01-HG003229. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Biology
                Computational Biology
                Population Genetics
                Genetic Polymorphism
                Sequence Analysis
                Evolutionary Biology
                Genetics
                Heredity
                Genotypes
                Genomics
                Genome Analysis Tools
                Theoretical Biology
                Mathematics
                Statistics
                Biostatistics
