Blog
About

140
views
0
recommends
+1 Recommend
0 collections
    4
    shares
      • Record: found
      • Abstract: found
      • Article: not found

      Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.

      Nature Methods

      methods, Sequence Analysis, DNA, Repetitive Sequences, Nucleic Acid, Humans, Genome, Bacterial, Gene Library, genetics, Escherichia coli, Chromosomes, Artificial, Bacterial

      Read this article at

      ScienceOpenPublisherPubMed
      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          We present a hierarchical genome-assembly process (HGAP) for high-quality de novo microbial genome assemblies using only a single, long-insert shotgun DNA library in conjunction with Single Molecule, Real-Time (SMRT) DNA sequencing. Our method uses the longest reads as seeds to recruit all other reads for construction of highly accurate preassembled reads through a directed acyclic graph-based consensus procedure, which we follow with assembly using off-the-shelf long-read assemblers. In contrast to hybrid approaches, HGAP does not require highly accurate raw reads for error correction. We demonstrate efficient genome assembly for several microorganisms using as few as three SMRT Cell zero-mode waveguide arrays of sequencing and for BACs using just one SMRT Cell. Long repeat regions can be successfully resolved with this workflow. We also describe a consensus algorithm that incorporates SMRT sequencing primary quality values to produce de novo genome sequence exceeding 99.999% accuracy.

          Related collections

          Most cited references 23

          • Record: found
          • Abstract: found
          • Article: not found

          GAGE: A critical evaluation of genome assemblies and assembly algorithms.

          New sequencing technology has dramatically altered the landscape of whole-genome sequencing, allowing scientists to initiate numerous projects to decode the genomes of previously unsequenced organisms. The lowest-cost technology can generate deep coverage of most species, including mammals, in just a few days. The sequence data generated by one of these projects consist of millions or billions of short DNA sequences (reads) that range from 50 to 150 nt in length. These sequences must then be assembled de novo before most genome analyses can begin. Unfortunately, genome assembly remains a very difficult problem, made more difficult by shorter reads and unreliable long-range linking information. In this study, we evaluated several of the leading de novo assembly algorithms on four different short-read data sets, all generated by Illumina sequencers. Our results describe the relative performance of the different assemblers as well as other significant differences in assembly difficulty that appear to be inherent in the genomes themselves. Three overarching conclusions are apparent: first, that data quality, rather than the assembler itself, has a dramatic effect on the quality of an assembled genome; second, that the degree of contiguity of an assembly varies enormously among different assemblers and different genomes; and third, that the correctness of an assembly also varies widely and is not well correlated with statistics on contiguity. To enable others to replicate our results, all of our data and methods are freely available, as are all assemblers used in this study.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            Quake: quality-aware detection and correction of sequencing errors

            We introduce Quake, a program to detect and correct errors in DNA sequencing reads. Using a maximum likelihood approach incorporating quality values and nucleotide specific miscall rates, Quake achieves the highest accuracy on realistically simulated reads. We further demonstrate substantial improvements in de novo assembly and SNP detection after using Quake. Quake can be used for any size project, including more than one billion human reads, and is freely available as open source software from http://www.cbcb.umd.edu/software/quake.
              Bookmark
              • Record: found
              • Abstract: not found
              • Article: not found

              Genomics. Genome project standards in a new era of sequencing.

                Bookmark

                Author and article information

                Journal
                23644548
                10.1038/nmeth.2474

                Comments

                Comment on this article