47
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Phyletic Profiling with Cliques of Orthologs Is Enhanced by Signatures of Paralogy Relationships

      research-article

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          New microbial genomes are sequenced at a high pace, allowing insight into the genetics of not only cultured microbes, but a wide range of metagenomic collections such as the human microbiome. To understand the deluge of genomic data we face, computational approaches for gene functional annotation are invaluable. We introduce a novel model for computational annotation that refines two established concepts: annotation based on homology and annotation based on phyletic profiling. The phyletic profiling-based model that includes both inferred orthologs and paralogs—homologs separated by a speciation and a duplication event, respectively—provides more annotations at the same average Precision than the model that includes only inferred orthologs. For experimental validation, we selected 38 poorly annotated Escherichia coli genes for which the model assigned one of three GO terms with high confidence: involvement in DNA repair, protein translation, or cell wall synthesis. Results of antibiotic stress survival assays on E. coli knockout mutants showed high agreement with our model's estimates of accuracy: out of 38 predictions obtained at the reported Precision of 60%, we confirmed 25 predictions, indicating that our confidence estimates can be used to make informed decisions on experimental validation. Our work will contribute to making experimental validation of computational predictions more approachable, both in cost and time. Our predictions for 998 prokaryotic genomes include ∼400000 specific annotations with the estimated Precision of 90%, ∼19000 of which are highly specific—e.g. “penicillin binding,” “tRNA aminoacylation for protein translation,” or “pathogenesis”—and are freely available at http://gorbi.irb.hr/.

          Author Summary

          While both the number and the diversity of sequenced prokaryotic genomes grow rapidly, the number of specific assignments of gene functions in the databases remains low and skewed toward the model prokaryote Escherichia coli. To aid in understanding the full set of newly sequenced genes, we created a computational model for assignment of function to prokaryotic genomes. The result is an innovative framework for orthology and paralogy-aware phyletic profiling that provides a large number of computational annotations with high predictive accuracy in train/test evaluations. Our predictions include annotations for 1.3 million genes with the estimated Precision of 90%; these, and many more predictions for 998 prokaryotic genomes are freely available at http://gorbi.irb.hr/. More importantly, we show a proof of principle that our functional annotation model can be used to generate new biological hypotheses: we performed experiments on 38 E. coli knockout mutants and showed that our annotation model provides realistic estimates of predictive accuracy. With this, our work will contribute to making experimental validation of computational predictions more approachable, both in cost and time.

          Related collections

          Most cited references27

          • Record: found
          • Abstract: found
          • Article: not found

          Gene Ontology: tool for the unification of biology

          Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web (http://www.geneontology.org) are being constructed: biological process, molecular function and cellular component.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            Reorganizing the protein space at the Universal Protein Resource (UniProt)

            The mission of UniProt is to support biological research by providing a freely accessible, stable, comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and querying interfaces. UniProt is comprised of four major components, each optimized for different uses: the UniProt Archive, the UniProt Knowledgebase, the UniProt Reference Clusters and the UniProt Metagenomic and Environmental Sequence Database. A key development at UniProt is the provision of complete, reference and representative proteomes. UniProt is updated and distributed every 4 weeks and can be accessed online for searches or download at http://www.uniprot.org.
              Bookmark
              • Record: found
              • Abstract: not found
              • Article: not found

              Distinguishing homologous from analogous proteins.

                Bookmark

                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS Comput Biol
                PLoS Comput. Biol
                plos
                ploscomp
                PLoS Computational Biology
                Public Library of Science (San Francisco, USA )
                1553-734X
                1553-7358
                January 2013
                January 2013
                3 January 2013
                7 January 2013
                : 9
                : 1
                : e1002852
                Affiliations
                [1 ]ETH Zurich, Computer Science, Zurich, Switzerland
                [2 ]Swiss Institute of Bioinformatics, Zurich, Switzerland
                [3 ]Ruđer Bošković Institute, Division of Electronics, Zagreb, Croatia
                [4 ]Mediterranean Institute for Life Sciences, Split, Croatia
                [5 ]Jozef Stefan Institute, Department of Knowledge Technologies, Ljubljana, Slovenia
                [6 ]Jozef Stefan International Postgraduate School, Ljubljana, Slovenia
                [7 ]Centre of Excellence for Integrated Approaches in Chemistry and Biology of Proteins, Ljubljana, Slovenia
                University of Chicago, United States of America
                Author notes

                The authors have declared that no competing interests exist.

                Conceived and designed the experiments: NŠ SDž TS FS. Performed the experiments: NŠ PP FS. Analyzed the data: NŠ MB PP TS FS. Contributed reagents/materials/analysis tools: NŠ AK PP SDž FS. Wrote the paper: NŠ FS. Performed the experiments on Escherichia coli: AK Designed and implemented the Web site with the results: NŠ MB FS.

                Article
                PCOMPBIOL-D-12-01145
                10.1371/journal.pcbi.1002852
                3536626
                23308060
                ac8f3bcd-0fb3-4fa9-b1c0-fa4bd8b4c044
                Copyright @ 2013

                This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                : 16 July 2012
                : 5 November 2012
                Page count
                Pages: 14
                Funding
                This work was supported by the Croatian Ministry of Science, Education and Sport [098-0982560-2563, 098-0000000-3168, iProjekt 2008-058]. SD and PP were supported by the Slovenian Research Agency (Grants P2-0103 and J2-2285), the European Commission (Grants ICT-2010-266722 and ICT-2011-287713), and Operation no. OP13.1.1.2.02.0005 financed by the European Regional Development Fund (85%) and the Slovenian Ministry of Higher Education, Science and Technology (15%). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Biology
                Computational Biology
                Genomics
                Comparative Genomics
                Functional Genomics
                Bio-Ontologies
                Genetics
                Gene Function
                Genomics
                Comparative Genomics
                Computer Science
                Computer Applications
                Web-Based Applications

                Quantitative & Systems biology
                Quantitative & Systems biology

                Comments

                Comment on this article