      Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads


          Abstract

          The Illumina DNA sequencing platform generates accurate but short reads, which can be used to produce accurate but fragmented genome assemblies. Pacific Biosciences and Oxford Nanopore Technologies DNA sequencing platforms generate long reads that can produce complete genome assemblies, but the sequencing is more expensive and error-prone. There is significant interest in combining data from these complementary sequencing technologies to generate more accurate “hybrid” assemblies. However, few tools exist that truly leverage the benefits of both types of data, namely the accuracy of short reads and the structural resolving power of long reads. Here we present Unicycler, a new tool for assembling bacterial genomes from a combination of short and long reads, which produces assemblies that are accurate, complete and cost-effective. Unicycler builds an initial assembly graph from short reads using the de novo assembler SPAdes and then simplifies the graph using information from short and long reads. Unicycler uses a novel semi-global aligner to align long reads to the assembly graph. Tests on both synthetic and real reads show Unicycler can assemble larger contigs with fewer misassemblies than other hybrid assemblers, even when long-read depth and accuracy are low. Unicycler is open source (GPLv3) and available at github.com/rrwick/Unicycler.
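The abstract mentions semi-global alignment of long reads without describing it. As a rough illustration only, the sketch below scores one common textbook variant: the whole read must be aligned, while unaligned reference sequence before and after the read is free. This is not Unicycler's aligner, which aligns reads to an assembly graph rather than to a single linear sequence. The function name, the scoring values and the linear `ref` string are all assumptions made for this example.

```python
def semi_global_score(read, ref, match=1, mismatch=-1, gap=-1):
    """Score the best alignment of all of `read` somewhere inside `ref`.

    Hypothetical helper for illustration. Returns (score, end), where `end`
    is the exclusive 0-based position in `ref` at which the alignment stops.
    """
    # Row 0: skipping any prefix of the reference costs nothing.
    prev = [0] * (len(ref) + 1)
    for i in range(1, len(read) + 1):
        # Column 0: read bases placed before the reference starts are gaps.
        curr = [prev[0] + gap] + [0] * len(ref)
        for j in range(1, len(ref) + 1):
            same = read[i - 1] == ref[j - 1]
            diag = prev[j - 1] + (match if same else mismatch)
            # Best of: align the two bases, gap in ref, gap in read.
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    # Last row: skipping any suffix of the reference is also free,
    # so the best score may sit in any column.
    end = max(range(len(prev)), key=prev.__getitem__)
    return prev[end], end


if __name__ == "__main__":
    print(semi_global_score("ACGT", "TTACGTTT"))
```

A purely global alignment would charge gap penalties for the flanking `TT` sequence on both sides of the read; making those end gaps free is what lets a short sequence be placed inside a longer one without penalty.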


          Most cited references 49


          Fast gapped-read alignment with Bowtie 2.

          As the rate of sequencing increases, greater throughput is demanded from read aligners. The full-text minute index is often used to make alignment very fast and memory-efficient, but the approach is ill-suited to finding longer, gapped alignments. Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.

            SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing.

            The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V-SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online ( http://bioinf.spbau.ru/spades ). It is distributed as open source software.

              T-Coffee: A novel method for fast and accurate multiple sequence alignment.

              We describe a new method (T-Coffee) for multiple sequence alignment that provides a dramatic improvement in accuracy with a modest sacrifice in speed as compared to the most commonly used alternatives. The method is broadly based on the popular progressive approach to multiple alignment but avoids the most serious pitfalls caused by the greedy nature of this algorithm. With T-Coffee we pre-process a data set of all pair-wise alignments between the sequences. This provides us with a library of alignment information that can be used to guide the progressive alignment. Intermediate alignments are then based not only on the sequences to be aligned next but also on how all of the sequences align with each other. This alignment information can be derived from heterogeneous sources such as a mixture of alignment programs and/or structure superposition. Here, we illustrate the power of the approach by using a combination of local and global pair-wise alignments to generate the library. The resulting alignments are significantly more reliable, as determined by comparison with a set of 141 test cases, than any of the popular alternatives that we tried. The improvement, especially clear with the more difficult test cases, is always visible, regardless of the phylogenetic spread of the sequences in the tests. Copyright 2000 Academic Press.

                Author and article information

                Contributors
                Role: Editor
                Journal: PLoS Computational Biology (PLoS Comput Biol)
                Publisher: Public Library of Science (San Francisco, CA, USA)
                ISSN: 1553-734X (print), 1553-7358 (electronic)
                Published: 8 June 2017
                Volume: 13
                Issue: 6
                Affiliations
                Department of Biochemistry and Molecular Biology, Bio21 Molecular Science and Biotechnology Institute, The University of Melbourne, Victoria, Australia
                National Human Genome Research Institute, United States
                Author notes

                The authors have declared that no competing interests exist.

                • Conceptualization: RRW KEH.

                • Data curation: CLG.

                • Formal analysis: RRW.

                • Funding acquisition: KEH.

                • Investigation: RRW LMJ.

                • Methodology: RRW.

                • Project administration: KEH.

                • Resources: LMJ KEH.

                • Software: RRW.

                • Supervision: KEH.

                • Validation: RRW.

                • Visualization: RRW.

                • Writing – original draft: RRW.

                • Writing – review & editing: RRW LMJ CLG KEH.

                Article
                Manuscript ID: PCOMPBIOL-D-17-00068
                DOI: 10.1371/journal.pcbi.1005595
                PMCID: PMC5481147
                PMID: 28594827
                © 2017 Wick et al

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                Counts
                Figures: 9, Tables: 2, Pages: 22
                Funding
                Funded by: National Health and Medical Research Council (funder-id http://dx.doi.org/10.13039/501100000925)
                Award IDs: 1043822, 1061409
                This work was funded by the NHMRC of Australia (project #1043822 and Fellowship #1061409 to KEH). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                • Biology and Life Sciences > Computational Biology > Genome Analysis > Sequence Assembly Tools
                • Biology and Life Sciences > Genetics > Genomics > Genome Analysis > Sequence Assembly Tools
                • Biology and Life Sciences > Microbiology > Bacteriology > Bacterial Genetics > Bacterial Genomics
                • Biology and Life Sciences > Genetics > Microbial Genetics > Bacterial Genetics > Bacterial Genomics
                • Biology and Life Sciences > Genetics > Genomics > Microbial Genomics > Bacterial Genomics
                • Biology and Life Sciences > Microbiology > Microbial Genomics > Bacterial Genomics
                • Research and Analysis Methods > Database and Informatics Methods > Bioinformatics > Sequence Analysis > Sequence Alignment
                • Biology and Life Sciences > Organisms > Bacteria > Klebsiella > Klebsiella Pneumoniae
                • Biology and Life Sciences > Microbiology > Medical Microbiology > Microbial Pathogens > Bacterial Pathogens > Klebsiella > Klebsiella Pneumoniae
                • Medicine and Health Sciences > Pathology and Laboratory Medicine > Pathogens > Microbial Pathogens > Bacterial Pathogens > Klebsiella > Klebsiella Pneumoniae
                • Biology and Life Sciences > Computational Biology > Genome Analysis > Genomic Libraries
                • Biology and Life Sciences > Genetics > Genomics > Genome Analysis > Genomic Libraries
                • Biology and Life Sciences > Cell Biology > Chromosome Biology > Chromosomes
                • Biology and Life Sciences > Cell Biology > Chromosome Biology > Chromosomes > Chromosome Pairs
                • Biology and Life Sciences > Molecular Biology > Molecular Biology Techniques > Sequencing Techniques > Genome Sequencing
                • Research and Analysis Methods > Molecular Biology Techniques > Sequencing Techniques > Genome Sequencing
                Custom metadata
                vor-update-to-uncorrected-proof: 2017-06-22
                Data availability: All reference genomes used for simulation data are available from the NCBI assembly database (accession numbers in Table 1). E. coli sequence files are publicly available (links in Table 2). Klebsiella sequence files are available from the NCBI Sequence Read Archive database (accession numbers ERX1087708, ERX1087759, SRX2874872 and SRX2874871).

                Quantitative & Systems biology
