
      A novel method for accurate operon predictions in all sequenced prokaryotes


          Abstract

          We combine comparative genomic measures and the distance separating adjacent genes to predict operons in 124 completely sequenced prokaryotic genomes. Our method automatically tailors itself to each genome using sequence information alone, and thus can be applied to any prokaryote. For Escherichia coli K12 and Bacillus subtilis, our method is 85 and 83% accurate, respectively, which is similar to the accuracy of methods that use the same features but are trained on experimentally characterized transcripts. In Halobacterium NRC-1 and in Helicobacter pylori, our method correctly infers that genes in operons are separated by shorter distances than they are in E.coli, and its predictions using distance alone are more accurate than distance-only predictions trained on a database of E.coli transcripts. We use microarray data from six phylogenetically diverse prokaryotes to show that combining intergenic distance with comparative genomic measures further improves accuracy and that our method is broadly effective. Finally, we survey operon structure across 124 genomes, and find several surprises: H.pylori has many operons, contrary to previous reports; Bacillus anthracis has an unusual number of pseudogenes within conserved operons; and Synechocystis PCC 6803 has many operons even though it has unusually wide spacings between conserved adjacent genes.
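The distance feature described in the abstract can be illustrated with a minimal sketch. This is not the authors' method: it uses a single fixed gap threshold (`max_gap=50`, a hypothetical value chosen for illustration) instead of a genome-specific model, and it leaves out the comparative genomic measures entirely. The `Gene` record and the function names are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def intergenic_distance(upstream, downstream):
    """Gap in base pairs between two adjacent genes (negative if they overlap)."""
    return downstream.start - upstream.end - 1


def predict_operons(genes, max_gap=50):
    """Group adjacent same-strand genes whose gap is at most max_gap.

    max_gap is a hypothetical fixed threshold; the abstract describes a
    method that adapts to each genome, which this sketch does not do.
    """
    operons = []
    current = []
    for gene in sorted(genes, key=lambda g: g.start):
        if current:
            prev = current[-1]
            same_strand = prev.strand == gene.strand
            close = intergenic_distance(prev, gene) <= max_gap
            if same_strand and close:
                current.append(gene)
                continue
            operons.append(current)  # close the previous group
        current = [gene]
    if current:
        operons.append(current)
    return [[g.name for g in operon] for operon in operons]


genes = [
    Gene("a", 1, 100, "+"),
    Gene("b", 120, 300, "+"),   # 19 bp after a, same strand: joined with a
    Gene("c", 500, 700, "+"),   # 199 bp after b: starts a new group
    Gene("d", 710, 900, "-"),   # close to c but on the opposite strand
]
print(predict_operons(genes))
```

A fixed threshold is the weakest form of this idea. The abstract notes that operon gene spacing differs between genomes (shorter in Halobacterium NRC-1 and H.pylori than in E.coli), which is the reason a per-genome distance model is preferable to one cutoff.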


          Most cited references (26)


          The codon Adaptation Index--a measure of directional synonymous codon usage bias, and its potential applications.

          P. Sharp, W. Li (1987)
          A simple, effective measure of synonymous codon usage bias, the Codon Adaptation Index, is detailed. The index uses a reference set of highly expressed genes from a species to assess the relative merits of each codon, and a score for a gene is calculated from the frequency of use of all codons in that gene. The index assesses the extent to which selection has been effective in moulding the pattern of codon usage. In that respect it is useful for predicting the level of expression of a gene, for assessing the adaptation of viral genes to their hosts, and for making comparisons of codon usage in different organisms. The index may also give an approximate indication of the likely success of heterologous gene expression.
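As a rough illustration of the index summarized above, the sketch below computes a relative adaptiveness weight for each codon from a reference set (its count divided by the count of the most used synonymous codon) and scores a gene as the geometric mean of those weights. The function names, the standard genetic code table, the exclusion of Met, Trp and stop codons, and the `pseudocount=0.5` for unobserved codons are assumptions of this sketch, not details given in the abstract.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed here).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(seq):
    """Split a coding sequence into complete codons, ignoring a trailing partial one."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_genes, pseudocount=0.5):
    """Weight of each codon relative to the most used synonym in the reference set."""
    counts = Counter(c for gene in reference_genes for c in codons(gene))
    weights = {}
    # Met and Trp have a single codon and stops are not scored (assumption).
    for aa in set(AMINO) - {"*", "M", "W"}:
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        # Unobserved codons get a small pseudocount so that log() is defined.
        observed = {c: counts[c] or pseudocount for c in family}
        top = max(observed.values())
        for c in family:
            weights[c] = observed[c] / top
    return weights


def cai(gene, weights):
    """Geometric mean of the codon weights over the scored codons of a gene."""
    logs = [math.log(weights[c]) for c in codons(gene) if c in weights]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


# Toy reference set: lysine is encoded by AAA three times and AAG once.
w = relative_adaptiveness(["AAAAAAAAAAAG"])
print(cai("AAAAAA", w))  # only the preferred codon is used
print(cai("AAAAAG", w))  # one preferred and one rare codon
```

In practice the reference set would be a curated list of highly expressed genes from the species, as the abstract states; the one-sequence reference above exists only to make the arithmetic easy to follow.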

            The genome sequence of the food-borne pathogen Campylobacter jejuni reveals hypervariable sequences.

            Campylobacter jejuni, from the delta-epsilon group of proteobacteria, is a microaerophilic, Gram-negative, flagellate, spiral bacterium, properties it shares with the related gastric pathogen Helicobacter pylori. It is the leading cause of bacterial food-borne diarrhoeal disease throughout the world. In addition, infection with C. jejuni is the most frequent antecedent to a form of neuromuscular paralysis known as Guillain-Barré syndrome. Here we report the genome sequence of C. jejuni NCTC11168. C. jejuni has a circular chromosome of 1,641,481 base pairs (30.6% G+C) which is predicted to encode 1,654 proteins and 54 stable RNA species. The genome is unusual in that there are virtually no insertion sequences or phage-associated sequences and very few repeat sequences. One of the most striking findings in the genome was the presence of hypervariable sequences. These short homopolymeric runs of nucleotides were commonly found in genes encoding the biosynthesis or modification of surface structures, or in closely linked genes of unknown function. The apparently high rate of variation of these homopolymeric tracts may be important in the survival strategy of C. jejuni.

              Assigning protein functions by comparative genome analysis: protein phylogenetic profiles.

              Determining protein functions from genomic sequences is a central goal of bioinformatics. We present a method based on the assumption that proteins that function together in a pathway or structural complex are likely to evolve in a correlated fashion. During evolution, all such functionally linked proteins tend to be either preserved or eliminated in a new species. We describe this property of correlated evolution by characterizing each protein by its phylogenetic profile, a string that encodes the presence or absence of a protein in every known genome. We show that proteins having matching or similar profiles strongly tend to be functionally linked. This method of phylogenetic profiling allows us to predict the function of uncharacterized proteins.
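The profile comparison described above can be sketched in a few lines. Each protein is represented as a string of `1` (present) and `0` (absent) across a fixed, ordered list of genomes, and pairs whose profiles differ in at most `max_mismatches` positions are reported as candidate functional links. The protein names, the toy profiles, and the use of a simple mismatch count are assumptions made for illustration; the abstract does not specify a similarity measure.

```python
from itertools import combinations


def hamming(p, q):
    """Number of genomes in which two presence/absence profiles disagree."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genomes")
    return sum(a != b for a, b in zip(p, q))


def linked_pairs(profiles, max_mismatches=0):
    """Pairs of proteins whose profiles differ in at most max_mismatches genomes."""
    return [
        (a, b)
        for a, b in combinations(sorted(profiles), 2)
        if hamming(profiles[a], profiles[b]) <= max_mismatches
    ]


# Hypothetical profiles over four genomes (1 = homolog present, 0 = absent).
profiles = {
    "P1": "1101",
    "P2": "1101",
    "P3": "0110",
    "P4": "1100",
}
print(linked_pairs(profiles))                    # identical profiles only
print(linked_pairs(profiles, max_mismatches=1))  # also near-matches
```

With only four genomes, matching profiles arise easily by chance; the approach becomes informative as the number of genomes in the profile grows.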

                Author and article information

                Journal
                Nucleic Acids Res
                Nucleic Acids Research
                Oxford University Press
                0305-1048
                1362-4962
                2005
                8 February 2005
                Volume: 33
                Issue: 3
                Pages: 880-892
                Affiliations
                1. Lawrence Berkeley National Lab, 1 Cyclotron Road, Mailstop 939R704, Berkeley, CA 94720, USA
                2. Howard Hughes Medical Institute, Berkeley, CA, USA
                3. Department of Bioengineering, University of California, Berkeley, USA
                Author notes
                *To whom correspondence should be addressed. Tel: +1 510 843 1794; Fax: +1 510 486 6059; Email: ejalm@lbl.gov
                Article
                DOI: 10.1093/nar/gki232
                PMCID: PMC549399
                PMID: 15701760
                5d95853c-7291-4108-baf5-d1afac64e5c5
                © The Author 2005. Published by Oxford University Press. All rights reserved

                The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact journals.permissions@oupjournals.org

                History
                Received: 14 September 2004
                Revised: 25 October 2004
                Accepted: 20 January 2005
                Categories
                Article

                Genetics
