124
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Genome assembly forensics: finding the elusive mis-assembly

      product-review
      1 , 1 , 1 ,
      Genome Biology
      BioMed Central

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          A collection of software tools is combined for the first time in an automated pipeline for detecting large-scale genome assembly errors and for validating genome assemblies.

          Abstract

          We present the first collection of tools aimed at automated genome assembly validation. This work formalizes several mechanisms for detecting mis-assemblies, and describes their implementation in our automated validation pipeline, called amosvalidate. We demonstrate the application of our pipeline in both bacterial and eukaryotic genome assemblies, and highlight several assembly errors in both draft and finished genomes. The software described is compatible with common assembly formats and is released, open-source, at http://amos.sourceforge.net.

          Related collections

          Most cited references32

          • Record: found
          • Abstract: found
          • Article: not found

          Consed: a graphical tool for sequence finishing.

          Sequencing of large clones or small genomes is generally done by the shotgun approach (Anderson et al. 1982). This has two phases: (1) a shotgun phase in which a number of reads are generated from random subclones and assembled into contigs, followed by (2) a directed, or finishing phase in which the assembly is inspected for correctness and for various kinds of data anomalies (such as contaminant reads, unremoved vector sequence, and chimeric or deleted reads), additional data are collected to close gaps and resolve low quality regions, and editing is performed to correct assembly or base-calling errors. Finishing is currently a bottleneck in large-scale sequencing efforts, and throughput gains will depend both on reducing the need for human intervention and making it as efficient as possible. We have developed a finishing tool, consed, which attempts to implement these principles. A distinguishing feature relative to other programs is the use of error probabilities from our programs phred and phrap as an objective criterion to guide the entire finishing process. More information is available at http:// www.genome.washington.edu/consed/consed. html.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            Fast algorithms for large-scale genome alignment and comparison.

            We describe a suffix-tree algorithm that can align the entire genome sequences of eukaryotic and prokaryotic organisms with minimal use of computer time and memory. The new system, MUMmer 2, runs three times faster while using one-third as much memory as the original MUMmer system. It has been used successfully to align the entire human and mouse genomes to each other, and to align numerous smaller eukaryotic and prokaryotic genomes. A new module permits the alignment of multiple DNA sequence fragments, which has proven valuable in the comparison of incomplete genome sequences. We also describe a method to align more distantly related genomes by detecting protein sequence homology. This extension to MUMmer aligns two genomes after translating the sequence in all six reading frames, extracts all matching protein sequences and then clusters together matches. This method has been applied to both incomplete and complete genome sequences in order to detect regions of conserved synteny, in which multiple proteins from one organism are found in the same order and orientation in another. The system code is being made freely available by the authors.
              Bookmark
              • Record: found
              • Abstract: not found
              • Article: not found

              The Staden package, 1998.

                Bookmark

                Author and article information

                Journal
                Genome Biol
                Genome Biology
                BioMed Central
                1465-6906
                1465-6914
                2008
                14 March 2008
                : 9
                : 3
                : R55
                Affiliations
                [1 ]Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA
                Article
                gb-2008-9-3-r55
                10.1186/gb-2008-9-3-r55
                2397507
                18341692
                773f0448-2fe3-40ce-8a1e-9c0d9fd774a6
                Copyright © 2008 Phillippy et al.; licensee BioMed Central Ltd.

                This is an open access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                : 16 October 2007
                : 10 January 2008
                : 14 March 2008
                Categories
                Software

                Genetics
                Genetics

                Comments

                Comment on this article