Genotyping Polyploids from Messy Sequencing Data


Abstract

Gerard et al. highlight several issues encountered when genotyping polyploid organisms from next-generation sequencing data, including allelic bias, overdispersion, and outlying observations. They present modeling solutions and software to account for these issues...

Detecting and quantifying the differences in individual genomes (i.e., genotyping) plays a fundamental role in most modern bioinformatics pipelines. Many scientists now use reduced representation next-generation sequencing (NGS) approaches for genotyping. Genotyping diploid individuals using NGS is a well-studied field, and similar methods for polyploid individuals are just emerging. However, there are many aspects of NGS data, particularly in polyploids, that remain unexplored by most methods. Our contributions in this paper are fourfold: (i) We draw attention to, and then model, common aspects of NGS data: sequencing error, allelic bias, overdispersion, and outlying observations. (ii) Many datasets feature related individuals, and so we use the structure of Mendelian segregation to build an empirical Bayes approach for genotyping polyploid individuals. (iii) We develop novel models to account for preferential pairing of chromosomes, and harness these for genotyping. (iv) We derive oracle genotyping error rates that may be used for read depth suggestions. We assess the accuracy of our method in simulations, and apply it to a dataset of hexaploid sweet potato (Ipomoea batatas). An R package implementing our method is available at https://cran.r-project.org/package=updog.
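The abstract names the modeled features without giving formulas, so the following is only a minimal sketch of how sequencing error, allelic bias, overdispersion, and a Mendelian-segregation prior might be combined to genotype one individual at one locus. Everything in it is an assumption made for illustration: the function names, the default parameter values, the particular bias parameterization, the beta-binomial form of overdispersion, and the hypergeometric gamete model (which ignores preferential pairing and treats the parental dosages as known rather than estimated). It is written in Python for self-containment and is not the updog implementation; outlying observations are not modeled here.

```python
import math

def read_prob(dosage, ploidy, seq_error=0.005, bias=1.0):
    """Assumed probability that a read shows the reference allele.

    dosage is the number of reference copies (0..ploidy). In this sketch,
    bias > 1 favors the alternative allele and bias = 1 means no bias.
    """
    p = dosage / ploidy
    f = p * (1 - seq_error) + (1 - p) * seq_error  # adjust for sequencing error
    return f / (bias * (1 - f) + f)                # adjust for allelic bias

def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def betabinom_logpmf(x, n, mu, rho):
    """Log pmf of a beta-binomial with mean mu and overdispersion rho.

    Requires 0 < mu < 1 and 0 < rho < 1 (so use seq_error > 0).
    """
    a = mu * (1 - rho) / rho
    b = (1 - mu) * (1 - rho) / rho
    log_choose = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
    return log_choose + log_beta(x + a, n - x + b) - log_beta(a, b)

def gamete_dist(parent_dosage, ploidy):
    """Hypergeometric gamete dosages, assuming no preferential pairing."""
    half = ploidy // 2
    total = math.comb(ploidy, half)
    return [math.comb(parent_dosage, j)
            * math.comb(ploidy - parent_dosage, half - j) / total
            for j in range(half + 1)]

def f1_prior(dosage_a, dosage_b, ploidy):
    """Offspring dosage distribution for a cross of two known parents."""
    prior = [0.0] * (ploidy + 1)
    for i, pa in enumerate(gamete_dist(dosage_a, ploidy)):
        for j, pb in enumerate(gamete_dist(dosage_b, ploidy)):
            prior[i + j] += pa * pb
    return prior

def genotype_posterior(ref_count, depth, ploidy, prior=None,
                       seq_error=0.005, bias=1.0, overdispersion=0.01):
    """Posterior over dosages 0..ploidy given reference-read counts."""
    if prior is None:
        prior = [1.0 / (ploidy + 1)] * (ploidy + 1)  # uniform fallback
    logs = []
    for k, pk in enumerate(prior):
        if pk <= 0:
            logs.append(-math.inf)  # dosage excluded by the prior
            continue
        mu = read_prob(k, ploidy, seq_error, bias)
        logs.append(math.log(pk)
                    + betabinom_logpmf(ref_count, depth, mu, overdispersion))
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]
    norm = sum(weights)
    return [w / norm for w in weights]

if __name__ == "__main__":
    # Hexaploid offspring of two parents that each carry 3 reference copies.
    prior = f1_prior(3, 3, ploidy=6)
    post = genotype_posterior(ref_count=30, depth=60, ploidy=6, prior=prior)
    print([round(v, 3) for v in post])
```

Separating the read-level likelihood from the family-level prior keeps the two ideas independent: the same likelihood can be paired with a uniform prior for unrelated individuals or with the segregation prior for full siblings.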


Author and article information

Journal

Title: Genetics

Abbreviated Title: Genetics

Publisher: Genetics Society of America

ISSN (Print): 0016-6731

ISSN (Electronic): 1943-2631

Publication date Created: November 06 2018

Publication date Created: November 2018

Publication date (Print): November 2018

Publication date (Electronic): September 05 2018

Volume: 210

Issue: 3

Pages: 789-807

Article

DOI: 10.1534/genetics.118.301468

PMC ID: 6218231

PubMed ID: 30185430

SO-VID: 0aea44a7-6150-44e3-b7b1-3e9909615fd5
