      An assessment of the amount of untapped fold level novelty in under-sampled areas of the tree of life

      research-article
      Scientific Reports
      Nature Publishing Group

          Abstract

          Previous studies of protein fold space suggest that fold coverage is plateauing. However, sequence sampling has been, and to a large extent remains, heavily biased towards culturable phyla. Sustained technological developments have fuelled the advent of metagenomics and single-cell sequencing, which might correct this sequencing bias. The extent to which these efforts affect structural diversity remains unclear, although preliminary results suggest that uncultured organisms could constitute a source of new folds. We investigate to what extent genomes from uncultured and under-sampled phyla, accessed through single-cell sequencing, metagenomics and high-throughput culturing efforts, have the potential to expand protein fold space, and conclude that i) genomes from under-sampled phyla appear enriched in sequences not covered by current protein family and fold profile libraries, ii) this enrichment is linked to an excess of short (and possibly partly spurious) sequences in some of the datasets, and iii) the discovery rate of novel folds among sequences not covered by current fold and family profile libraries may be as high as 36%, but would ultimately translate into only a marginal increase in the global discovery of novel folds. Thus, genomes from under-sampled phyla should have a rather limited impact on increasing novelty at the level of coarse-grained tertiary structure.
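Point iii) can look paradoxical: how can a discovery rate of up to 36% yield only a marginal global gain? One simple way to see it, under our own simplifying assumption rather than the authors' actual model, is that the overall yield is the product of the share of sequences not covered by profile libraries and the discovery rate within that subset. The function name `overall_novel_fraction` and the uniform-rate model below are hypothetical illustrations; only the 36% figure comes from the abstract.

```python
def overall_novel_fraction(uncovered_share: float, discovery_rate: float = 0.36) -> float:
    """Share of all sequences expected to reveal a novel fold.

    Hypothetical simplification (not the authors' model): novel folds are
    found only among sequences not covered by current family and fold
    profile libraries, at a uniform `discovery_rate` within that subset.
    """
    for name, value in (("uncovered_share", uncovered_share),
                        ("discovery_rate", discovery_rate)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    # Overall yield is the size of the uncovered subset times its per-sequence rate.
    return uncovered_share * discovery_rate


if __name__ == "__main__":
    # 0.05 is an arbitrary, hypothetical input, not a value from the study.
    print(f"{overall_novel_fraction(0.05):.3f}")
```

With a hypothetical uncovered share of 5%, the sketch prints 0.018, i.e. fewer than 2 in 100 sequences, even at the 36% rate. The product form only makes the arithmetic explicit; the paper's own estimate of the global effect may rest on different quantities.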

                Author and article information

                Journal: Scientific Reports (Sci Rep), Nature Publishing Group
                ISSN: 2045-2322
                Published: 05 October 2015
                Volume 5, article 14717
                Affiliations
                [1] Laboratoire de Génomique et Biochimie du Métabolisme, Genoscope, Institut de Génomique, Commissariat à l’Energie Atomique et aux Energies Alternatives, Evry, Essonne, 91057, France
                [2] UMR 8030 – Génomique Métabolique, Centre National de la Recherche Scientifique, Evry, Essonne, 91057, France
                [3] Département de Biologie, Université d’Evry-Val-d’Essonne, Evry, Essonne, 91000, France
                [4] PRES UniverSud Paris, Saint-Aubin, Essonne, 91190, France
                [5] Institut de Biologie Computationnelle, LIRMM, CNRS, Université de Montpellier, Montpellier, 34095, France
                [6] Centre de Recherche de Biochimie Macromoléculaire, CNRS-UMR 5237, Montpellier, 34293, France
                Article ID: srep14717
                DOI: 10.1038/srep14717
                PMCID: PMC4592975
                PMID: 26434770
                Copyright © 2015, Macmillan Publishers Limited

                This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

                History
                Received: 27 May 2015
                Accepted: 07 September 2015