75
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      tRNA Signatures Reveal a Polyphyletic Origin of SAR11 Strains among Alphaproteobacteria

      research-article

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Molecular phylogenetics and phylogenomics are subject to noise from horizontal gene transfer (HGT) and bias from convergence in macromolecular compositions. Extensive variation in size, structure and base composition of alphaproteobacterial genomes has complicated their phylogenomics, sparking controversy over the origins and closest relatives of the SAR11 strains. SAR11 are highly abundant, cosmopolitan aquatic Alphaproteobacteria with streamlined, A+T-biased genomes. A dominant view holds that SAR11 are monophyletic and related to both Rickettsiales and the ancestor of mitochondria. Other studies dispute this, finding evidence of a polyphyletic origin of SAR11 with most strains distantly related to Rickettsiales. Although careful evolutionary modeling can reduce bias and noise in phylogenomic inference, entirely different approaches may be useful to extract robust phylogenetic signals from genomes. Here we develop simple phyloclassifiers from bioinformatically derived tRNA Class-Informative Features (CIFs), features predicted to target tRNAs for specific interactions within the tRNA interaction network. Our tRNA CIF-based model robustly and accurately classifies alphaproteobacterial genomes into one of seven undisputed monophyletic orders or families, despite great variability in tRNA gene complement sizes and base compositions. Our model robustly rejects monophyly of SAR11, classifying all but one strain as Rhizobiales with strong statistical support. Yet remarkably, conventional phylogenetic analysis of tRNAs classifies all SAR11 strains identically as Rickettsiales. We attribute this discrepancy to convergence of SAR11 and Rickettsiales tRNA base compositions. Thus, tRNA CIFs appear more robust to compositional convergence than tRNA sequences generally. Our results suggest that tRNA-CIF-based phyloclassification is robust to HGT of components of the tRNA interaction network, such as aminoacyl-tRNA synthetases. We explain why tRNAs are especially advantageous for prediction of traits governing macromolecular interactions from genomic data, and why such traits may be advantageous in the search for robust signals to address difficult problems in classification and phylogeny.

          Author Summary

          If gene products work well in the networks of foreign cells, their genes may transfer horizontally between unrelated genomes. What factors dictate the ability to integrate into foreign networks? Different RNAs and proteins must interact specifically in order to function well as a system. For example, tRNA functions are determined by the interactions they have with other macromolecules. We have developed ways to predict, from genomic data alone, how tRNAs distinguish themselves to their specific interaction partners. Here, as proof of concept, we built a robust computational model from these bioinformatic predictions in seven lineages of Alphaproteobacteria. We validated our model by classifying hundreds of diverse alphaproteobacterial taxa and tested it on eight strains of SAR11, a phylogenetically controversial group that is highly abundant in the world's oceans. We found that different strains of SAR11 are more distantly related, both to each other and to mitochondria, than widely believed. We explain conflicting results about SAR11 as an artifact of bias created by the variability in base contents of alphaproteobacterial genomes. While this bias affects tRNAs too, our classifier appears unexpectedly robust to it. More broadly, our results suggest that traits governing macromolecular interactions may be more faithfully vertically inherited than the macromolecules themselves.

          Related collections

          Most cited references67

          • Record: found
          • Abstract: found
          • Article: not found

          The Bioperl toolkit: Perl modules for the life sciences.

          The Bioperl project is an international open-source collaboration of biologists, bioinformaticians, and computer scientists that has evolved over the past 7 yr into the most comprehensive library of Perl modules available for managing and manipulating life-science information. Bioperl provides an easy-to-use, stable, and consistent programming interface for bioinformatics application programmers. The Bioperl modules have been successfully and repeatedly used to reduce otherwise complex tasks to only a few lines of code. The Bioperl object model has been proven to be flexible enough to support enterprise-level applications such as EnsEMBL, while maintaining an easy learning curve for novice Perl programmers. Bioperl is capable of executing analyses and processing results from programs such as BLAST, ClustalW, or the EMBOSS suite. Interoperation with modules written in Python and Java is supported through the evolving BioCORBA bridge. Bioperl provides access to data stores such as GenBank and SwissProt via a flexible series of sequence input/output modules, and to the emerging common sequence data storage format of the Open Bioinformatics Database Access project. This study describes the overall architecture of the toolkit, the problem domains that it addresses, and gives specific examples of how the toolkit can be used to solve common life-sciences problems. We conclude with a discussion of how the open-source nature of the project has contributed to the development effort.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            Fast UniFrac: Facilitating high-throughput phylogenetic analyses of microbial communities including analysis of pyrosequencing and PhyloChip data

            Next-generation sequencing techniques, and PhyloChip, have made simultaneous phylogenetic analyses of hundreds of microbial communities possible. Insight into community structure has been limited by the inability to integrate and visualize such vast datasets. Fast UniFrac overcomes these issues, allowing integration of larger numbers of sequences and samples into a single analysis. Its new array-based implementation offers orders of magnitude improvements over the original version. New 3D visualization of principal coordinates analysis (PCoA) results, with the option to view multiple coordinate axes simultaneously, provides a powerful way to quickly identify patterns that relate vast numbers of microbial communities. We demonstrate the potential of Fast UniFrac using examples from three data types: Sanger-sequencing studies of diverse free-living and animal-associated bacterial assemblages and from the gut of obese humans as they diet, pyrosequencing data integrated from studies of the human hand and gut, and PhyloChip data from a study of citrus pathogens. We show that a Fast UniFrac analysis using a reference tree recaptures patterns that could not be detected without considering phylogenetic relationships and that Fast UniFrac, coupled with BLAST-based sequence assignment, can be used to quickly analyze pyrosequencing runs containing hundreds of thousands of sequences, revealing patterns relating human and gut samples. Finally, we show that the application of Fast UniFrac to PhyloChip data could identify well-defined subcategories associated with infection. Together, these case studies point the way towards a broad range of applications and demonstrate some of the new features of Fast UniFrac.
              Bookmark
              • Record: found
              • Abstract: found
              • Article: not found

              Genome streamlining in a cosmopolitan oceanic bacterium.

              The SAR11 clade consists of very small, heterotrophic marine alpha-proteobacteria that are found throughout the oceans, where they account for about 25% of all microbial cells. Pelagibacter ubique, the first cultured member of this clade, has the smallest genome and encodes the smallest number of predicted open reading frames known for a free-living microorganism. In contrast to parasitic bacteria and archaea with small genomes, P. ubique has complete biosynthetic pathways for all 20 amino acids and all but a few cofactors. P. ubique has no pseudogenes, introns, transposons, extrachromosomal elements, or inteins; few paralogs; and the shortest intergenic spacers yet observed for any cell.
                Bookmark

                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS Comput Biol
                PLoS Comput. Biol
                plos
                ploscomp
                PLoS Computational Biology
                Public Library of Science (San Francisco, USA )
                1553-734X
                1553-7358
                February 2014
                27 February 2014
                : 10
                : 2
                : e1003454
                Affiliations
                [1]Program in Quantitative and Systems Biology, University of California, Merced, Merced, California, United States of America
                The Centre for Research and Technology, Hellas, Greece
                Author notes

                The authors have declared that no competing interests exist.

                Conceived and designed the experiments: KCHA WDS DHA. Performed the experiments: KCHA WDS DHA. Analyzed the data: KCHA WDS DHA. Contributed reagents/materials/analysis tools: KCHA WDS DHA. Wrote the paper: KCHA WDS DHA.

                [¤]

                Current address: Department of Biological Sciences, Northern Illinois University, DeKalb, Illinois, United States of America.

                Article
                PCOMPBIOL-D-13-00752
                10.1371/journal.pcbi.1003454
                3937112
                24586126
                faeff251-01d2-4360-b3d1-32bebfb86ce5
                Copyright @ 2014

                This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                : 1 May 2013
                : 10 December 2013
                Page count
                Pages: 13
                Funding
                This work was supported by UC Merced. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Biology
                Computational biology
                Genomics
                Comparative genomics
                Macromolecular structure analysis
                Macromolecular complex analysis
                Sequence analysis
                Systems biology
                Evolutionary biology
                Organismal evolution
                Microbial evolution
                Comparative genomics
                Genomics
                Genome evolution
                Microbiology
                Bacteriology
                Bacterial evolution
                Microbial evolution
                Systems biology

                Quantitative & Systems biology
                Quantitative & Systems biology

                Comments

                Comment on this article