Prodigal: prokaryotic gene recognition and translation initiation site identification

      Abstract

      Background

      The quality of automated gene prediction in microbial organisms has improved steadily over the past decade, but there is still room for improvement. Increasing the number of correct identifications, both of genes and of the translation initiation sites for each gene, and reducing the overall number of false positives, are all desirable goals.

      Results

      With our years of experience in manually curating genomes for the Joint Genome Institute, we developed a new gene prediction algorithm called Prodigal (PROkaryotic DYnamic programming Gene-finding ALgorithm). With Prodigal, we focused specifically on the three goals of improved gene structure prediction, improved translation initiation site recognition, and reduced false positives. We compared the results of Prodigal to existing gene-finding methods to demonstrate that it met each of these objectives.

      Conclusion

      We built a fast, lightweight, open source gene prediction program called Prodigal (http://compbio.ornl.gov/prodigal/). Prodigal achieved good results compared to existing methods, and we believe it will be a valuable asset to automated microbial annotation pipelines.

      Most cited references 15

      Basic local alignment search tool.

      A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score. Recent mathematical results on the stochastic properties of MSP scores allow an analysis of the performance of this method as well as the statistical significance of alignments it generates. The basic algorithm is simple and robust; it can be implemented in a number of ways and applied in a variety of contexts including straightforward DNA and protein sequence database searches, motif searches, gene identification searches, and in the analysis of multiple regions of similarity in long DNA sequences. In addition to its flexibility and tractability to mathematical analysis, BLAST is an order of magnitude faster than existing sequence comparison tools of comparable sensitivity.

        Identifying bacterial genes and endosymbiont DNA with Glimmer.

        The Glimmer gene-finding software has been successfully used for finding genes in bacteria, archaea and viruses representing hundreds of species. We describe several major changes to the Glimmer system, including improved methods for identifying both coding regions and start codons. We also describe a new module of Glimmer that can distinguish host and endosymbiont DNA. This module was developed in response to the discovery that eukaryotic genome sequencing projects sometimes inadvertently capture the DNA of intracellular bacteria living in the host. The new methods dramatically reduce the rate of false-positive predictions, while maintaining Glimmer's 99% sensitivity rate at detecting genes in most species, and they find substantially more correct start sites, as measured by comparisons to known and well-curated genes. We show that our interpolated Markov model (IMM) DNA discriminator correctly separated 99% of the sequences in a recent genome project that produced a mixture of sequences from the bacterium Prochloron didemni and its sea squirt host, Lissoclinum patella. Glimmer is OSI Certified Open Source and available at http://cbcb.umd.edu/software/glimmer.

          Artemis: sequence visualization and annotation.

          Artemis is a DNA sequence visualization and annotation tool that allows the results of any analysis or sets of analyses to be viewed in the context of the sequence and its six-frame translation. Artemis is especially useful in analysing the compact genomes of bacteria, archaea and lower eukaryotes, and will cope with sequences of any size from small genes to whole genomes. It is implemented in Java, and can be run on any suitable platform. Sequences and annotation can be read and written directly in EMBL, GenBank and GFF format. Availability: Artemis is available under the GNU General Public License from http://www.sanger.ac.uk/Software/Artemis

            Author and article information

            Affiliations
            [1] Computational Biology and Bioinformatics Group, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
            [2] Genome Science and Technology Graduate School, The University of Tennessee, Knoxville, TN 37996, USA
            [3] DOE Joint Genome Institute, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
            Journal: BMC Bioinformatics
            Publisher: BioMed Central
            ISSN: 1471-2105
            Published: 8 March 2010
            Volume: 11
            Article number: 119
            PMCID: 2848648
            Publisher ID: 1471-2105-11-119
            PMID: 20211023
            DOI: 10.1186/1471-2105-11-119
            Copyright ©2010 Hyatt et al; licensee BioMed Central Ltd.

            This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

            Categories
            Software

            Bioinformatics & Computational biology