
      Reduced representation of protein structure: implications on efficiency and scope of detection of structural similarity

      BMC Bioinformatics
      BioMed Central


          Abstract

          Background

          Computational comparison of two protein structures is the starting point of many methods that build on existing knowledge, such as structure modeling (including modeling of protein complexes and conformational changes), molecular replacement, or annotation by structural similarity. In a commonly used strategy, significant effort is invested in matching two sets of atoms. In a complementary approach, a global descriptor is assigned to the overall structure, thus losing track of the substructures within.

          Results

          Using a small set of geometric features, we define a reduced representation of protein structure, together with an optimizing function for matching two representations, to provide a pre-filtering stage in a database search. We show that, in a straightforward implementation, the representation performs well both in its resolution in the space of protein structures and in its ability to make new predictions.

          Conclusions

          Perhaps unexpectedly, a substantial discriminating power already exists at the level of main features of protein structure, such as directions of secondary structural elements, possibly constrained by their sequential order. This can be used toward efficient comparison of protein (sub)structures, allowing for various degrees of conformational flexibility within the compared pair, which in turn can be used for modeling by homology of protein structure and dynamics.
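
          To make the idea of comparing structures by the directions of their secondary structural elements (SSEs) concrete, here is a minimal Python sketch. It is not the authors' representation or optimizing function, which the abstract does not specify. It assumes that each SSE is reduced to a unit vector from its start to its end point, that SSEs are paired strictly by sequential index, and that two structures are compared through the cosines between their SSE directions, which do not change under rigid rotation. The function names and the tolerance `tol` are hypothetical.

```python
import math


def unit(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("zero-length vector has no direction")
    return tuple(c / norm for c in v)


def sse_direction(start, end):
    """Direction of one SSE, taken here as the unit vector from start to end."""
    return unit(tuple(e - s for s, e in zip(start, end)))


def cosine_matrix(directions):
    """Pairwise cosines between SSE directions (unchanged by rigid rotation)."""
    return [[sum(a * b for a, b in zip(u, v)) for v in directions]
            for u in directions]


def order_constrained_score(dirs_a, dirs_b, tol=0.3):
    """Fraction of SSE pairs whose mutual cosine agrees within tol.

    SSEs are paired by sequential index (an assumption of this sketch);
    only the first min(len(dirs_a), len(dirs_b)) elements are compared.
    """
    n = min(len(dirs_a), len(dirs_b))
    if n < 2:
        return 0.0
    cos_a = cosine_matrix(dirs_a[:n])
    cos_b = cosine_matrix(dirs_b[:n])
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    agree = sum(1 for i, j in pairs if abs(cos_a[i][j] - cos_b[i][j]) <= tol)
    return agree / len(pairs)
```

          Because such a score needs only a handful of vectors per structure, it could serve as a cheap pre-filter ahead of a full atom-level alignment. Widening `tol` would be one crude way to tolerate conformational flexibility within the compared pair.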


          Most cited references (52)


          Pfam: clans, web tools and services

          Pfam is a database of protein families that currently contains 7973 entries (release 18.0). A recent development in Pfam has enabled the grouping of related families into clans. Pfam clans are described in detail, together with the new associated web pages. Improvements to the range of Pfam web tools and the first set of Pfam web services that allow programmatic access to the database and associated tools are also presented. Pfam is available on the web in the UK, the USA, France and Sweden.

            Protein structure alignment by incremental combinatorial extension (CE) of the optimal path.

            A new algorithm is reported which builds an alignment between two protein structures. The algorithm involves a combinatorial extension (CE) of an alignment path defined by aligned fragment pairs (AFPs) rather than the more conventional techniques using dynamic programming and Monte Carlo optimization. AFPs, as the name suggests, are pairs of fragments, one from each protein, which confer structure similarity. AFPs are based on local geometry, rather than global features such as orientation of secondary structures and overall topology. Combinations of AFPs that represent possible continuous alignment paths are selectively extended or discarded thereby leading to a single optimal alignment. The algorithm is fast and accurate in finding an optimal structure alignment and hence suitable for database scanning and detailed analysis of large protein families. The method has been tested and compared with results from Dali and VAST using a representative sample of similar structures. Several new structural similarities not detected by these other methods are reported. Specific one-on-one alignments and searches against all structures as found in the Protein Data Bank (PDB) can be performed via the Web at http://cl.sdsc.edu/ce.html.

              CATH--a hierarchic classification of protein domain structures.

              Protein evolution gives rise to families of structurally related proteins, within which sequence identities can be extremely low. As a result, structure-based classifications can be effective at identifying unanticipated relationships in known structures and in optimal cases function can also be assigned. The ever increasing number of known protein structures is too large to classify all proteins manually, therefore, automatic methods are needed for fast evaluation of protein structures. We present a semi-automatic procedure for deriving a novel hierarchical classification of protein domain structures (CATH). The four main levels of our classification are protein class (C), architecture (A), topology (T) and homologous superfamily (H). Class is the simplest level, and it essentially describes the secondary structure composition of each domain. In contrast, architecture summarises the shape revealed by the orientations of the secondary structure units, such as barrels and sandwiches. At the topology level, sequential connectivity is considered, such that members of the same architecture might have quite different topologies. When structures belonging to the same T-level have suitably high similarities combined with similar functions, the proteins are assumed to be evolutionarily related and put into the same homologous superfamily. Analysis of the structural families generated by CATH reveals the prominent features of protein structure space. We find that nearly a third of the homologous superfamilies (H-levels) belong to ten major T-levels, which we call superfolds, and furthermore that nearly two-thirds of these H-levels cluster into nine simple architectures. A database of well-characterised protein structure families, such as CATH, will facilitate the assignment of structure-function/evolution relationships to both known and newly determined protein structures.

                Author and article information

                Journal: BMC Bioinformatics
                Publisher: BioMed Central
                ISSN: 1471-2105
                Published: 26 March 2010
                Volume: 11
                Article number: 155
                Affiliations
                [1 ]Bioinformatics Institute, A*STAR, 30 Biopolis Street, #07-01 Matrix, Singapore 138671
                Article
                1471-2105-11-155
                10.1186/1471-2105-11-155
                3098053
                20338066
                97471e86-a442-4ba9-8c43-5f43239c74d5
                Copyright ©2010 Zhang et al; licensee BioMed Central Ltd.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                : 27 October 2009
                : 26 March 2010
                Categories
                Research Article

                Bioinformatics & Computational biology
