
      Genotyping Polyploids from Messy Sequencing Data

      Genetics
      Genetics Society of America


          Abstract

          Gerard et al. highlight several issues encountered when genotyping polyploid organisms from next-generation sequencing data, including allelic bias, overdispersion, and outlying observations. They present modeling solutions and software to account for these issues...

          Detecting and quantifying the differences in individual genomes (i.e., genotyping) plays a fundamental role in most modern bioinformatics pipelines. Many scientists now use reduced representation next-generation sequencing (NGS) approaches for genotyping. Genotyping diploid individuals using NGS is a well-studied field, and similar methods for polyploid individuals are just emerging. However, there are many aspects of NGS data, particularly in polyploids, that remain unexplored by most methods. Our contributions in this paper are fourfold: (i) We draw attention to, and then model, common aspects of NGS data: sequencing error, allelic bias, overdispersion, and outlying observations. (ii) Many datasets feature related individuals, and so we use the structure of Mendelian segregation to build an empirical Bayes approach for genotyping polyploid individuals. (iii) We develop novel models to account for preferential pairing of chromosomes, and harness these for genotyping. (iv) We derive oracle genotyping error rates that may be used for read depth suggestions. We assess the accuracy of our method in simulations, and apply it to a dataset of hexaploid sweet potato (Ipomoea batatas). An R package implementing our method is available at https://cran.r-project.org/package=updog.
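          To make three of the modeled features in contribution (i) concrete, the following is a minimal Python sketch, not the updog implementation or its API. It assumes one particular parameterization: sequencing error flips a read's allele with probability `seq_error`, allelic bias `bias` is the relative chance of observing an alternative read versus a reference read, and overdispersion `rho` enters through a beta-binomial likelihood. All function and parameter names are hypothetical, and outlier handling and the Mendelian-segregation prior are left out.

```python
import math


def log_beta(a, b):
    """Log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_betabinom(x, n, mu, rho):
    """Log pmf of a beta-binomial with mean mu and overdispersion rho in (0, 1)."""
    # Keep mu strictly inside (0, 1) so both shape parameters stay positive.
    mu = min(max(mu, 1e-9), 1 - 1e-9)
    scale = (1 - rho) / rho
    a, b = mu * scale, (1 - mu) * scale
    log_choose = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
    return log_choose + log_beta(x + a, n - x + b) - log_beta(a, b)


def ref_read_prob(dosage, ploidy, seq_error, bias):
    """Expected fraction of reference reads for a given reference-allele dosage."""
    p = dosage / ploidy
    # Sequencing error: each read's allele is misread with probability seq_error.
    p_err = p * (1 - seq_error) + (1 - p) * seq_error
    # Allelic bias (assumed form): bias > 1 favors alternative reads.
    return p_err / (bias * (1 - p_err) + p_err)


def genotype_posterior(ref_reads, total_reads, ploidy,
                       seq_error=0.005, bias=1.0, rho=0.01, prior=None):
    """Posterior over dosages 0..ploidy; prior entries must be positive."""
    if prior is None:
        prior = [1.0 / (ploidy + 1)] * (ploidy + 1)  # uniform prior (assumption)
    logs = [
        math.log(prior[k])
        + log_betabinom(ref_reads, total_reads,
                        ref_read_prob(k, ploidy, seq_error, bias), rho)
        for k in range(ploidy + 1)
    ]
    top = max(logs)  # subtract the max before exponentiating for stability
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hexaploid individual with 50 reference reads out of 100.
    for bias in (1.0, 2.0):
        post = genotype_posterior(50, 100, ploidy=6, bias=bias)
        best = max(range(7), key=lambda k: post[k])
        print(f"bias={bias}: most probable dosage = {best}")
```

          Under these assumptions, the same read counts point to different dosages once the bias parameter moves away from 1, which is why ignoring allelic bias can shift genotype calls.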


                Author and article information

                Journal: Genetics
                Publisher: Genetics Society of America
                ISSN: 0016-6731, 1943-2631
                Dates: November 06 2018; November 2018; September 05 2018
                Volume: 210
                Issue: 3
                Pages: 789-807
                DOI: 10.1534/genetics.118.301468
                PMCID: 6218231
                PMID: 30185430
                © 2018
