207
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation

      research-article

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Long-read single-molecule sequencing has revolutionized de novo genome assembly and enabled the automated reconstruction of reference-quality genomes. However, given the relatively high error rates of such technologies, efficient and accurate assembly of large repeats and closely related haplotypes remains challenging. We address these issues with Canu, a successor of Celera Assembler that is specifically designed for noisy single-molecule sequences. Canu introduces support for nanopore sequencing, halves depth-of-coverage requirements, and improves assembly continuity while simultaneously reducing runtime by an order of magnitude on large genomes versus Celera Assembler 8.2. These advances result from new overlapping and assembly algorithms, including an adaptive overlapping strategy based on tf-idf weighted MinHash and a sparse assembly graph construction that avoids collapsing diverged repeats and haplotypes. We demonstrate that Canu can reliably assemble complete microbial genomes and near-complete eukaryotic chromosomes using either Pacific Biosciences (PacBio) or Oxford Nanopore technologies and achieves a contig NG50 of >21 Mbp on both human and Drosophila melanogaster PacBio data sets. For assembly structures that cannot be linearly represented, Canu provides graph-based assembly outputs in graphical fragment assembly (GFA) format for analysis or integration with complementary phasing and scaffolding techniques. The combination of such highly resolved assembly graphs with long-range scaffolding information promises the complete and automated assembly of complex genomes.

          Related collections

          Most cited references76

          • Record: found
          • Abstract: found
          • Article: found
          Is Open Access

          Pilon: An Integrated Tool for Comprehensive Microbial Variant Detection and Genome Assembly Improvement

          Advances in modern sequencing technologies allow us to generate sufficient data to analyze hundreds of bacterial genomes from a single machine in a single day. This potential for sequencing massive numbers of genomes calls for fully automated methods to produce high-quality assemblies and variant calls. We introduce Pilon, a fully automated, all-in-one tool for correcting draft assemblies and calling sequence variants of multiple sizes, including very large insertions and deletions. Pilon works with many types of sequence data, but is particularly strong when supplied with paired end data from two Illumina libraries with small e.g., 180 bp and large e.g., 3–5 Kb inserts. Pilon significantly improves draft genome assemblies by correcting bases, fixing mis-assemblies and filling gaps. For both haploid and diploid genomes, Pilon produces more contiguous genomes with fewer errors, enabling identification of more biologically relevant genes. Furthermore, Pilon identifies small variants with high accuracy as compared to state-of-the-art tools and is unique in its ability to accurately identify large sequence variants including duplications and resolve large insertions. Pilon is being used to improve the assemblies of thousands of new genomes and to identify variants from thousands of clinically relevant bacterial strains. Pilon is freely available as open source software.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: found
            Is Open Access

            Fast and accurate de novo genome assembly from long uncorrected reads

            The assembly of long reads from Pacific Biosciences and Oxford Nanopore Technologies typically requires resource-intensive error-correction and consensus-generation steps to obtain high-quality assemblies. We show that the error-correction step can be omitted and that high-quality consensus sequences can be generated efficiently with a SIMD-accelerated, partial-order alignment–based, stand-alone consensus module called Racon. Based on tests with PacBio and Oxford Nanopore data sets, we show that Racon coupled with miniasm enables consensus genomes with similar or better quality than state-of-the-art methods while being an order of magnitude faster.
              Bookmark
              • Record: found
              • Abstract: found
              • Article: not found

              Velvet: algorithms for de novo short read assembly using de Bruijn graphs.

              We have developed a new set of algorithms, collectively called "Velvet," to manipulate de Bruijn graphs for genomic sequence assembly. A de Bruijn graph is a compact representation based on short words (k-mers) that is ideal for high coverage, very short read (25-50 bp) data sets. Applying Velvet to very short reads and paired-ends information only, one can produce contigs of significant length, up to 50-kb N50 length in simulations of prokaryotic data and 3-kb N50 on simulated mammalian BACs. When applied to real Solexa data sets without read pairs, Velvet generated contigs of approximately 8 kb in a prokaryote and 2 kb in a mammalian BAC, in close agreement with our simulated results without read-pair information. Velvet represents a new approach to assembly that can leverage very short reads in combination with read pairs to produce useful assemblies.
                Bookmark

                Author and article information

                Journal
                Genome Res
                Genome Res
                genome
                genome
                GENOME
                Genome Research
                Cold Spring Harbor Laboratory Press
                1088-9051
                1549-5469
                May 2017
                May 2017
                : 27
                : 5
                : 722-736
                Affiliations
                [1 ]Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA;
                [2 ]Invincea Incorporated, Fairfax, Virginia 22030, USA;
                [3 ]J. Craig Venter Institute, Rockville, Maryland 20850, USA;
                [4 ]National Biodefense Analysis and Countermeasures Center, Frederick, Maryland 21702, USA
                Author notes
                [5]

                These authors contributed equally to this work.

                Corresponding author: adam.phillippy@ 123456nih.gov
                Author information
                http://orcid.org/0000-0003-2983-8934
                Article
                9509184
                10.1101/gr.215087.116
                5411767
                28298431
                0ef173b8-c7c3-4242-a154-4197c9e4249f
                © 2017 Koren et al.; Published by Cold Spring Harbor Laboratory Press

                This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.

                History
                : 23 August 2016
                : 3 March 2017
                Funding
                Funded by: National Human Genome Research Institute http://dx.doi.org/10.13039/100000051
                Funded by: National Institutes of Health http://dx.doi.org/10.13039/100000002
                Award ID: HSHQDC-07-C-00020
                Funded by: US Department of Homeland Security http://dx.doi.org/10.13039/100000180
                Funded by: National Science Foundation http://dx.doi.org/10.13039/100000001
                Award ID: NSF IOS-1237993
                Categories
                Method

                Comments

                Comment on this article