      Exploring the limits of fold discrimination by structural alignment: A large scale benchmark using decoys of known fold

          Highlights

          ► Structure alignment methods are used to assign proteins to fold groups.
          ► The accuracy of this procedure is difficult to test because the definition of folds is open to debate.
          ► By defining folds topologically and building decoys of defined fold, the accuracy of fold assignments can be tested.
          ► Structure alignments between decoys of different fold can be assigned high significance, leading to errors in fold classification.
          ► This observation can be extended to comparisons between decoy models and real protein structures.

          Abstract

          Protein structure comparison by pairwise alignment is commonly used to identify highly similar substructures in pairs of proteins and to provide a measure of structural similarity based on the size and geometric similarity of the match. These scores are routinely applied in analyses of protein fold space under the assumption that high statistical significance is equivalent to a meaningful relationship. The truth of this assumption has previously been difficult to test, however, since there is a lack of automated methods that do not rely on the same underlying principles. To resolve this, we present a method based on topological descriptions of global protein structure, providing an independent means to assess the ability of structural alignment to maintain meaningful structural correspondences on a large scale.

          Using a large set of decoys of specified global fold, we benchmark three widely used methods for structure comparison (SAP, TM-align and DALI) and test the degree to which this assumption is justified for each. Applying a topological edit distance measure as a scale of the degree of fold change shows that, while there is a broad correlation between high structural alignment scores and low edit distances, there remain many pairs with highly significant scores that differ by core strand swaps and are therefore structurally different on a global level. Possible causes of this problem and its implications for present assessments of protein fold space are discussed.
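          The idea of checking alignment scores against an independent topological distance can be illustrated with a minimal sketch. It is not the measure used in the article: it assumes, purely for illustration, that a fold is encoded as the order of strand labels across a sheet and that plain Levenshtein distance stands in for the topological edit distance. The names `edit_distance` and `flag_discordant`, the toy scores and the 0.5 cutoff are all hypothetical.

```python
from typing import List, Sequence, Tuple

def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two sequences of strand labels."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

Pair = Tuple[str, float, Sequence[str], Sequence[str]]

def flag_discordant(pairs: Sequence[Pair], score_cutoff: float,
                    max_distance: int = 0) -> List[str]:
    """Names of pairs whose alignment score passes the cutoff although
    their (assumed) topology strings differ by more than max_distance."""
    return [name for name, score, topo_a, topo_b in pairs
            if score >= score_cutoff
            and edit_distance(topo_a, topo_b) > max_distance]

# Hypothetical toy data: strand order across a sheet.
native = ["B", "A", "C", "D"]
swapped = ["A", "B", "C", "D"]  # two core strands exchanged
pairs = [
    ("same_fold", 0.82, native, native),
    ("strand_swap", 0.71, native, swapped),
    ("low_score", 0.30, native, ["D", "C", "B", "A"]),
]
print(flag_discordant(pairs, score_cutoff=0.5))
```

          Only the pair that scores well yet differs in strand order is reported, which is the kind of case the abstract describes as highly significant but globally different.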

          Most cited references (60)

          CATH--a hierarchic classification of protein domain structures.

          Protein evolution gives rise to families of structurally related proteins, within which sequence identities can be extremely low. As a result, structure-based classifications can be effective at identifying unanticipated relationships in known structures and in optimal cases function can also be assigned. The ever increasing number of known protein structures is too large to classify all proteins manually, therefore, automatic methods are needed for fast evaluation of protein structures. We present a semi-automatic procedure for deriving a novel hierarchical classification of protein domain structures (CATH). The four main levels of our classification are protein class (C), architecture (A), topology (T) and homologous superfamily (H). Class is the simplest level, and it essentially describes the secondary structure composition of each domain. In contrast, architecture summarises the shape revealed by the orientations of the secondary structure units, such as barrels and sandwiches. At the topology level, sequential connectivity is considered, such that members of the same architecture might have quite different topologies. When structures belonging to the same T-level have suitably high similarities combined with similar functions, the proteins are assumed to be evolutionarily related and put into the same homologous superfamily. Analysis of the structural families generated by CATH reveals the prominent features of protein structure space. We find that nearly a third of the homologous superfamilies (H-levels) belong to ten major T-levels, which we call superfolds, and furthermore that nearly two-thirds of these H-levels cluster into nine simple architectures. A database of well-characterised protein structure families, such as CATH, will facilitate the assignment of structure-function/evolution relationships to both known and newly determined protein structures.

            How significant is a protein structure similarity with TM-score = 0.5?

            Protein structure similarity is often measured by root mean squared deviation, global distance test score and template modeling score (TM-score). However, the scores themselves cannot provide information on how significant the structural similarity is. A quantitative relation between the scores and conventional fold classifications is also lacking. This article aims to answer two questions: (i) what is the statistical significance of TM-score? (ii) What is the probability of two proteins having the same fold given a specific TM-score? We first made an all-to-all gapless structural match on 6684 non-homologous single-domain proteins in the PDB and found that the TM-scores follow an extreme value distribution. The data allow us to assign each TM-score a P-value that measures the chance of two randomly selected proteins obtaining an equal or higher TM-score. With a TM-score at 0.5, for instance, its P-value is 5.5 × 10^-7, which means we need to consider at least 1.8 million random protein pairs to acquire a TM-score of no less than 0.5. Second, we examine the posterior probability of the same fold proteins from three datasets: SCOP, CATH and the consensus of SCOP and CATH. It is found that the posterior probability from different datasets has a similar rapid phase transition around TM-score = 0.5. This finding indicates that TM-score can be used as an approximate but quantitative criterion for protein topology classification, i.e. protein pairs with a TM-score > 0.5 are mostly in the same fold while those with a TM-score < 0.5 are mainly not in the same fold.
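            The step from the quoted P-value to "1.8 million random protein pairs" is a reciprocal, which a short sketch makes explicit. Only the value 5.5 × 10^-7 comes from the abstract; the function names are hypothetical, and treating the comparisons as independent is an assumption made here for illustration.

```python
import math

P_VALUE_AT_0_5 = 5.5e-7  # P-value quoted in the abstract for TM-score = 0.5

def expected_pairs_for_one_hit(p_value: float) -> float:
    """Number of random pairs needed to expect one score at or above the threshold."""
    return 1.0 / p_value

def prob_at_least_one_hit(p_value: float, n_pairs: int) -> float:
    """1 - (1 - p)^n, assuming independent comparisons (an assumption)."""
    return -math.expm1(n_pairs * math.log1p(-p_value))

millions = expected_pairs_for_one_hit(P_VALUE_AT_0_5) / 1e6
print(round(millions, 1))  # 1.8
```

            Under the same independence assumption, `prob_at_least_one_hit` gives the chance of seeing at least one such score by chance among a given number of comparisons.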

              Fold change in evolution of protein structures.

              Typically, protein spatial structures are more conserved in evolution than amino acid sequences. However, the recent explosion of sequence and structure information accompanied by the development of powerful computational methods led to the accumulation of examples of homologous proteins with globally distinct structures. Significant sequence conservation, local structural resemblance, and functional similarity strongly indicate evolutionary relationships between these proteins despite pronounced structural differences at the fold level. Several mechanisms such as insertions/deletions/substitutions, circular permutations, and rearrangements in beta-sheet topologies account for the majority of detected structural irregularities. The existence of evolutionarily related proteins that possess different folds brings new challenges to the homology modeling techniques and the structure classification strategies and offers new opportunities for protein design in experimental studies. Copyright 2001 Academic Press.

                Author and article information

                Journal
                Computational Biology and Chemistry (Comput Biol Chem)
                Publisher: Elsevier
                ISSN: 1476-9271, 1476-928X
                Published: June 2011
                Volume 35, Issue 3, Pages 174-188
                Affiliations
                [a] Department of Informatics, University of Bergen, Bergen, Norway
                [b] Division of Mathematical Biology, MRC National Institute for Medical Research, London NW7 1AA, UK
                [c] Computational Biology Unit, University of Bergen, Bergen, Norway
                Author notes
                [* ]Corresponding author. Tel.: +44 0 208 816 2587. msadows@ 123456nimr.mrc.ac.uk
                Article
                CBAC6173
                DOI: 10.1016/j.compbiolchem.2011.04.008
                PMCID: PMC3145973
                PMID: 21704264
                35e6b15c-b01a-49de-a4db-4a77850b2eed
                © 2011 Elsevier Ltd.

                This document may be redistributed and reused, subject to certain conditions.

                History
                : 21 January 2011
                : 23 April 2011
                Categories
                Research Article

                Computational chemistry & Modeling
                Keywords: protein structure alignment, DALI, TM-align, protein structure comparison, SAP, decoy model, protein fold, TM-score
