      Identification and Analysis of Natural Building Blocks for Evolution-Guided Fragment-Based Protein Design

      research-article


          Abstract

          Natural evolution has generated an impressively diverse protein universe via duplication and recombination from a set of protein fragments that served as building blocks. The application of these concepts to the design of new proteins using subdomain-sized fragments from different folds has proven to be experimentally successful. To better understand how evolution has shaped our protein universe, we performed an all-against-all comparison of protein domains representing all naturally existing folds and identified conserved homologous protein fragments. Overall, we found more than 1000 protein fragments of various lengths among different folds through similarity network analysis. These fragments are present in very different protein environments and represent versatile building blocks for protein design. These data are available in our web server called F(old P)uzzle (fuzzle.uni-bayreuth.de), which allows users to filter the dataset individually and create customized networks for folds of interest. We believe that our results serve as an invaluable resource for structural and evolutionary biologists and as raw material for the design of custom-made proteins.
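
The similarity network idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the hit tuples, domain and fold identifiers, the thresholds `min_prob` and `min_len`, and the function names are all hypothetical and chosen only for illustration. The sketch assumes that pairwise hits from an all-against-all comparison are already available, keeps only hits that link domains from different folds, and groups the linked domains into connected components.

```python
from collections import defaultdict, deque

# Hypothetical pairwise hits from an all-against-all comparison:
# (domain_a, fold_a, domain_b, fold_b, probability, aligned_length).
# All identifiers and values are invented; they are not Fuzzle data.
HITS = [
    ("d1", "foldA", "d2", "foldB", 0.95, 40),
    ("d2", "foldB", "d3", "foldC", 0.80, 35),
    ("d4", "foldA", "d5", "foldA", 0.99, 120),  # same fold, ignored
    ("d6", "foldD", "d7", "foldE", 0.30, 25),   # below default threshold
]


def build_network(hits, min_prob=0.7, min_len=10):
    """Build an undirected graph linking domains from different folds.

    min_prob and min_len are assumed cut-offs, not values from the article.
    """
    graph = defaultdict(set)
    for dom_a, fold_a, dom_b, fold_b, prob, length in hits:
        if fold_a == fold_b:
            continue  # only fragments shared across folds are of interest
        if prob < min_prob or length < min_len:
            continue  # drop weak or very short hits
        graph[dom_a].add(dom_b)
        graph[dom_b].add(dom_a)
    return graph


def connected_components(graph):
    """Return the connected components of the graph as a list of sets."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in graph.get(node, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


if __name__ == "__main__":
    for component in connected_components(build_network(HITS)):
        print(sorted(component))
```

Lowering `min_prob` admits weaker hits and therefore more or larger components, which is the kind of filtering the web server is described as offering; the specific cut-offs here are placeholders.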

          Highlights

          • Nature has created an impressive diversity of proteins via the recombination, replication and differentiation of a set of protein fragments that act as building blocks.

          • We have performed an all-against-all comparison of proteins in structural databases using sensitive homology detection methods and identified fragments that appear across the protein universe in very different environments.

          • We have identified more than 1000 subdomain-sized fragments that Nature has reused to create new proteins and that offer an innovative route for protein design.

          • The results are publicly available in our Fuzzle database at fuzzle.uni-bayreuth.de.


                Author and article information

                Contributors
                Journal: Journal of Molecular Biology (J. Mol. Biol.)
                Publisher: Elsevier
                ISSN: 0022-2836, 1089-8638
                Published: 12 June 2020
                Volume: 432
                Issue: 13
                Pages: 3898-3914
                Affiliations
                [1] Department of Biochemistry, University of Bayreuth, Bayreuth, Germany
                [2] Max Planck Institute for Developmental Biology, Tübingen, Germany
                [3] Computational Biochemistry, University of Bayreuth, Bayreuth, Germany
                Author notes
                [*] Corresponding authors at: Department of Biochemistry, University of Bayreuth, Bayreuth, Germany. steffen.schmidt@uni-bayreuth.de, birte.hoecker@uni-bayreuth.de
                [4] Current address: D. Lemm, MARVEL, Department of Chemistry, University of Basel, Switzerland.
                [5] Current address: J.A. Farías-Rico, Systems Biology and Synthetic Biology Laboratory, Center for Genomic Sciences, UNAM, Cuernavaca, Mexico.

                Article
                PII: S0022-2836(20)30300-4
                DOI: 10.1016/j.jmb.2020.04.013
                PMCID: PMC7322520
                PMID: 32330481
                ScienceOpen ID: 8ae4b0df-ed25-407e-9e5f-c979db62c521
                © 2020 The Authors

                This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

                History
                Received: 23 December 2019
                Revised: 12 April 2020
                Accepted: 13 April 2020
                Categories
                Article

                Molecular biology
                Abbreviations: HMM, hidden Markov model; HisF, imidazole glycerol phosphate synthase; TPR, tetratricopeptide repeat; MIT, microtubule interacting and trafficking
                Keywords: protein design, evolution, protein recombination, protein fragments
