Benchmarking of long-read assemblers for prokaryote whole genome sequencing

Abstract

Background: Data sets from long-read sequencing platforms (Oxford Nanopore Technologies and Pacific Biosciences) allow for most prokaryote genomes to be completely assembled – one contig per chromosome or plasmid. However, the high per-read error rate of long-read sequencing necessitates different approaches to assembly than those used for short-read sequencing. Multiple assembly tools (assemblers) exist, which use a variety of algorithms for long-read assembly.

Methods: We used 500 simulated read sets and 120 real read sets to assess the performance of six long-read assemblers (Canu, Flye, Miniasm/Minipolish, Raven, Redbean and Shasta) across a wide variety of genomes and read parameters. Assemblies were assessed on their structural accuracy/completeness, sequence identity, contig circularisation and computational resources used.

Results: Canu v1.9 produced moderately reliable assemblies but had the longest runtimes of all assemblers tested. Flye v2.6 was more reliable and did particularly well with plasmid assembly. Miniasm/Minipolish v0.3 was the only assembler which consistently produced clean contig circularisation. Raven v0.0.5 was the most reliable for chromosome assembly, though it did not perform well on small plasmids and had circularisation issues. Redbean v2.5 and Shasta v0.3.0 were computationally efficient but more likely to produce incomplete assemblies.

Conclusions: Of the assemblers tested, Flye, Miniasm/Minipolish and Raven performed best overall. However, no single tool performed well on all metrics, highlighting the need for continued development on long-read assembly algorithms.

Related collections

Microbial Genomics

Author and article information

Contributors

Ryan R. Wick: Roles: Conceptualization, Data Curation, Formal Analysis, Investigation, Methodology, Software, Writing – Original Draft Preparation

ORCID: https://orcid.org/0000-0001-8349-0778

Kathryn E. Holt: Roles: Conceptualization, Supervision, Writing – Review & Editing

Journal

Journal ID (nlm-ta): F1000Res

Journal ID (iso-abbrev): F1000Res

Journal ID (pmc): F1000Research

Title: F1000Research

Publisher: F1000 Research Limited (London, UK)

ISSN (Electronic): 2046-1402

Publication date (Electronic): 23 December 2019

Publication date Collection: 2019

Volume: 8

Electronic Location Identifier: 2138

Affiliations

[1] Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, VIC, 3004, Australia

[2] Department of Infection Biology, London School of Hygiene & Tropical Medicine, London, WC1E 7HT, UK

[1] Department of Electronic Systems and Information Processing, Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia

[2] Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia

[3] Genome Institute of Singapore, A*STAR, Singapore

[1] Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, USA

[2] Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA

[3] Department of Computer Science, Whiting School of Engineering, Johns Hopkins University, Baltimore, Maryland, USA

[4] Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, USA

Author notes

[a] rrwick@gmail.com

Competing interests: No competing interests were disclosed.

Author information

Ryan R. Wick https://orcid.org/0000-0001-8349-0778

Article

DOI: 10.12688/f1000research.21782.1

PMC ID: 6966772

PubMed ID: 31984131

SO-VID: b551ade6-1823-4938-8eb2-23321b636236

License:

This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date accepted: 18 December 2019

Funding

Funded by: Sylvia and Charles Viertel Charitable Foundation

Funded by: Bill and Melinda Gates Foundation

Award ID: OPP1175797

Funded by: Department of Education, Employment and Workplace Relations, Australian Government

This work was supported by the Bill & Melinda Gates Foundation, Seattle (grant number OPP1175797) and an Australian Government Research Training Program Scholarship. KEH is supported by a Senior Medical Research Fellowship from the Viertel Foundation of Victoria.

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
