      Benchmarking of long-read assemblers for prokaryote whole genome sequencing

      research-article


          Abstract

          Background: Data sets from long-read sequencing platforms (Oxford Nanopore Technologies and Pacific Biosciences) allow for most prokaryote genomes to be completely assembled – one contig per chromosome or plasmid. However, the high per-read error rate of long-read sequencing necessitates different approaches to assembly than those used for short-read sequencing. Multiple assembly tools (assemblers) exist, which use a variety of algorithms for long-read assembly.

          Methods: We used 500 simulated read sets and 120 real read sets to assess the performance of six long-read assemblers (Canu, Flye, Miniasm/Minipolish, Raven, Redbean and Shasta) across a wide variety of genomes and read parameters. Assemblies were assessed on their structural accuracy/completeness, sequence identity, contig circularisation and computational resources used.

          Results: Canu v1.9 produced moderately reliable assemblies but had the longest runtimes of all assemblers tested. Flye v2.6 was more reliable and did particularly well with plasmid assembly. Miniasm/Minipolish v0.3 was the only assembler which consistently produced clean contig circularisation. Raven v0.0.5 was the most reliable for chromosome assembly, though it did not perform well on small plasmids and had circularisation issues. Redbean v2.5 and Shasta v0.3.0 were computationally efficient but more likely to produce incomplete assemblies.

          Conclusions: Of the assemblers tested, Flye, Miniasm/Minipolish and Raven performed best overall. However, no single tool performed well on all metrics, highlighting the need for continued development of long-read assembly algorithms.

                Author and article information

                Contributors
                Roles: Conceptualization, Data Curation, Formal Analysis, Investigation, Methodology, Software, Writing – Original Draft Preparation
                Roles: Conceptualization, Supervision, Writing – Review & Editing
                Journal
                F1000Res
                F1000Research
                F1000 Research Limited (London, UK)
                2046-1402
                23 December 2019
                2019; 8: 2138
                Affiliations
                [1] Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, VIC, 3004, Australia
                [2] Department of Infection Biology, London School of Hygiene & Tropical Medicine, London, WC1E 7HT, UK
                [1] Department of Electronic Systems and Information Processing, Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia
                [2] Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia
                [3] Genome Institute of Singapore, A*STAR, Singapore
                [1] Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, USA
                [2] Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
                [3] Department of Computer Science, Whiting School of Engineering, Johns Hopkins University, Baltimore, Maryland, USA
                [4] Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, USA
                Author notes

                No competing interests were disclosed.

                Author information
                https://orcid.org/0000-0001-8349-0778
                Article
                DOI: 10.12688/f1000research.21782.1
                PMC ID: 6966772
                PubMed ID: 31984131
                b551ade6-1823-4938-8eb2-23321b636236
                Copyright: © 2019 Wick RR and Holt KE

                This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                18 December 2019
                Funding
                Funded by: Sylvia and Charles Viertel Charitable Foundation
                Funded by: Bill and Melinda Gates Foundation
                Award ID: OPP1175797
                Funded by: Department of Education, Employment and Workplace Relations, Australian Government
                This work was supported by the Bill & Melinda Gates Foundation, Seattle (grant number OPP1175797) and an Australian Government Research Training Program Scholarship. KEH is supported by a Senior Medical Research Fellowship from the Viertel Foundation of Victoria.
                The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Articles

                assembly, long-read sequencing, oxford nanopore technologies, pacific biosciences, microbial genomics, benchmarking
