      SKESA: strategic k-mer extension for scrupulous assemblies

      research-article

          Abstract

          SKESA is a de Bruijn graph-based de novo assembler designed for assembling reads of microbial genomes sequenced using Illumina. Comparison with SPAdes and MegaHit shows that SKESA produces assemblies with high sequence quality and contiguity, handles low-level contamination in reads, is fast, and produces an identical assembly for the same input across repeated runs, whether on the same or different compute resources. SKESA has been used for assembling over 272,000 read sets in the Sequence Read Archive at NCBI and for real-time pathogen detection. Source code for SKESA is freely available at https://github.com/ncbi/SKESA/releases.

          Electronic supplementary material

          The online version of this article (10.1186/s13059-018-1540-z) contains supplementary material, which is available to authorized users.

                Author and article information

                Contributors
                souvorov@ncbi.nlm.nih.gov
                agarwala@ncbi.nlm.nih.gov
                lipman@ncbi.nlm.nih.gov
                Journal
                Genome Biology (Genome Biol)
                BioMed Central (London)
                ISSN: 1474-7596, 1474-760X
                4 October 2018
                Volume 19, article 153
                Affiliations
                [1] NCBI/NLM/NIH/DHHS, 8600 Rockville Pike, Bethesda, MD 20894, USA
                [2] Impossible Foods, impossiblefoods.com, Redwood City, CA 94063, USA
                Author information
                http://orcid.org/0000-0002-5518-9723
                Article
                1540
                DOI: 10.1186/s13059-018-1540-z
                PMCID: PMC6172800
                PMID: 30286803
                6db0db02-f75e-4ee3-a938-da6003de4f3a
                © The Author(s) 2018

                Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 8 May 2018
                Accepted: 12 September 2018
                Funding
                Funded by: FundRef http://dx.doi.org/10.13039/100000092, U.S. National Library of Medicine
                Categories
                Software
                Genetics
                illumina reads, de-novo assembly, debruijn graphs, sequence quality, contamination
