Efficient de novo assembly of highly heterozygous genomes from whole-genome shotgun short reads

Abstract

Although many de novo genome assembly projects have recently been conducted using high-throughput sequencers, assembling highly heterozygous diploid genomes is a substantial challenge due to the increased complexity of the de Bruijn graph structure predominantly used. To address the increasing demand for sequencing of nonmodel and/or wild-type samples, in most cases inbred lines or fosmid-based hierarchical sequencing methods are used to overcome such problems. However, these methods are costly and time consuming, forfeiting the advantages of massively parallel sequencing. Here, we describe a novel de novo assembler, Platanus, that can effectively manage high-throughput data from heterozygous samples. Platanus assembles DNA fragments (reads) into contigs by constructing de Bruijn graphs with automatically optimized k-mer sizes, followed by the scaffolding of contigs based on paired-end information. The complicated graph structures that result from the heterozygosity are simplified not only during the contig assembly step but also during the scaffolding step. We evaluated the assembly results on eukaryotic samples with various levels of heterozygosity. Compared with other assemblers, Platanus yields assembly results that have a larger scaffold NG50 length without any accompanying loss of accuracy in both simulated and real data. In addition, Platanus recorded the largest scaffold NG50 values for two of the three low-heterozygosity species used in the de novo assembly contest, Assemblathon 2. Platanus therefore provides a novel and efficient approach for the assembly of gigabase-sized highly heterozygous genomes and is an attractive alternative to the existing assemblers designed for genomes of lower heterozygosity.
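The abstract compares assemblers by scaffold NG50. As a reminder of what that metric measures, the following is a minimal sketch of the standard NG50 definition: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the estimated genome size. The function name `ng50` and the example lengths are illustrative assumptions and are not taken from the Platanus source code or from the paper's evaluation scripts.

```python
def ng50(scaffold_lengths, genome_size):
    """Return the NG50 of an assembly.

    Scaffolds are visited from longest to shortest; NG50 is the length of the
    scaffold at which the running total first reaches half of genome_size.
    Returns 0 if the assembly covers less than half of the genome.
    (Illustrative helper, not part of any assembler's API.)
    """
    half = genome_size / 2
    running_total = 0
    for length in sorted(scaffold_lengths, reverse=True):
        running_total += length
        if running_total >= half:
            return length
    return 0


# Hypothetical scaffold lengths (in bp) for a genome estimated at 200 bp.
example_lengths = [50, 40, 30, 20, 10]
example_ng50 = ng50(example_lengths, genome_size=200)
```

Passing the total assembly length as `genome_size` gives the ordinary N50 instead. Because NG50 is normalized by the estimated genome size rather than the assembly size, it allows assemblies of different total lengths to be compared on the same scale.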

Author and article information

Journal

Journal ID (nlm-ta): Genome Res

Journal ID (iso-abbrev): Genome Res

Journal ID (hwp): genome

Journal ID (pmc): genome

Journal ID (publisher-id): GENOME

Title: Genome Research

Publisher: Cold Spring Harbor Laboratory Press

ISSN (Print): 1088-9051

ISSN (Electronic): 1549-5469

Publication date (Print): August 2014

Publication date PMC-release: August 2014

Volume: 24

Issue: 8

Pages: 1384-1395

Affiliations

[1 ]Graduate School of Bioscience and Biotechnology, Tokyo Institute of Technology, Meguro-ku, Tokyo 152-8550, Japan;

[2 ]AXIOHELIX Co. Ltd., Chuo-ku, Tokyo 103-0015, Japan;

[3 ]Advanced Genomics Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan;

[4 ]Center for Information Biology, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan;

[5 ]Division of Microbial Genomics, Frontier Science Research Center, University of Miyazaki, Miyazaki 889-1692, Japan;

[6 ]Division of Microbiology, Faculty of Medicine, University of Miyazaki, Miyazaki 889-1692, Japan;

[7 ]Division of Parasitology, Faculty of Medicine, University of Miyazaki, Miyazaki 889-1692, Japan;

[8 ]Genetic Strains Research Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan

Author notes

Corresponding author: takehiko@bio.titech.ac.jp

Article

Medline ID: 9518021

DOI: 10.1101/gr.170720.113

PMC ID: 4120091

PubMed ID: 24755901

SO-VID: fba042dd-48c6-47c9-878f-e0617f3a2d6d

License:

This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.

History

Date received : 6 December 2013

Date accepted : 21 April 2014

Page count

Pages: 12
