      Protein domain architectures provide a fast, efficient and scalable alternative to sequence-based methods for comparative functional genomics

          Abstract

          A functional comparative genome analysis is essential to understand the mechanisms underlying bacterial evolution and adaptation. Detection of functional orthologs using standard global sequence similarity methods faces several problems: the need to define arbitrary acceptance thresholds for similarity and alignment length, lateral gene acquisition, and the high computational cost of finding bi-directional best matches at a large scale.

          We investigated the use of protein domain architectures for large scale functional comparative analysis as an alternative method. The performance of both approaches was assessed through functional comparison of 446 bacterial genomes sampled at different taxonomic levels.

          We show that protein domain architectures provide a fast and efficient alternative to methods based on sequence similarity for identifying groups of functionally equivalent proteins within and across taxonomic boundaries. Because the computational cost scales linearly rather than quadratically with the number of genomes, the approach is suitable for large-scale comparative analysis. Running both methods in parallel pinpoints potential functional adaptations that may add to bacterial fitness.
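
          The scaling argument in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes that a domain architecture can be represented as the ordered tuple of a protein's domain identifiers, and the genome names, protein names and domain labels (`D1`, `D2`, `D3`) are hypothetical. Grouping by architecture visits each protein once, so the work grows linearly with the number of genomes, whereas bi-directional best matches require a comparison for every pair of genomes.

```python
from collections import defaultdict


def group_by_architecture(genomes):
    """Group proteins that share the same domain architecture.

    genomes maps genome id -> {protein id -> ordered tuple of domain ids}.
    Each protein is visited exactly once, so the cost is linear in the
    total number of proteins (and hence in the number of genomes).
    """
    groups = defaultdict(list)
    for genome_id, proteins in genomes.items():
        for protein_id, domains in proteins.items():
            # Assumption: the ordered tuple of domains is the grouping key.
            groups[tuple(domains)].append((genome_id, protein_id))
    return dict(groups)


def pairwise_genome_comparisons(n_genomes):
    """Number of genome pairs an all-against-all comparison must cover."""
    return n_genomes * (n_genomes - 1) // 2


# Hypothetical toy input; the identifiers are placeholders, not real annotations.
toy_genomes = {
    "genomeA": {"a1": ("D1", "D2"), "a2": ("D3",)},
    "genomeB": {"b1": ("D1", "D2")},
    "genomeC": {"c1": ("D3",), "c2": ("D2", "D1")},
}

groups = group_by_architecture(toy_genomes)
for architecture, members in groups.items():
    print(architecture, members)

# Genome pairs needed for an all-against-all comparison of 446 genomes.
print(pairwise_genome_comparisons(446))
```

          In this sketch the order of domains matters, so `("D1", "D2")` and `("D2", "D1")` fall into different groups. Whether order should be part of the key is a design choice that the abstract does not specify.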


                Author and article information

                Journal
                F1000Research (London, UK)
                ISSN 2046-1402
                15 August 2016
                2016; 5: 1987
                Affiliations
                [1] Laboratory of Systems and Synthetic Biology, Wageningen University and Research, Stippeneng, Netherlands
                [2] LifeGlimmer GmbH, Berlin, Germany
                [1] Department of Biomedical Engineering, The University of Iowa, Iowa City, IA, USA
                [1] European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
                [1] Department of Chemistry "Ugo Schiff", University of Florence, Sesto Fiorentino, Italy
                Author notes

                JJK, MSD, ES and PJS participated in setting up the research. JJK and MSD were responsible for the analysis. JJK, ES, PJS, MSD and VdS wrote the manuscript. All authors critically revised the manuscript.

                Competing interests: No competing interests were disclosed.


                Article
                DOI: 10.12688/f1000research.9416.1
                5031134
                Copyright: © 2016 Koehorst JJ et al.

                This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                12 August 2016
                Funding
                Funded by: European Union’s Horizon 2020
                Award ID: 635536
                This work was partly supported by the European Union’s Horizon 2020 research and innovation programme (EmPowerPutida, Contract No. 635536, granted to Vitor A P Martins dos Santos).
                The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Articles
                Genomics
                Microbial Evolution & Genomics

                bacterial genomics, bacterial functionome, orthology, horizontal gene transfer, clustering, semantic annotation
