      Is Open Access

      InterPro in 2017—beyond protein family and domain annotations

      research-article
      Nucleic Acids Research
      Oxford University Press


          Abstract

          InterPro (http://www.ebi.ac.uk/interpro/) is a freely available database used to classify protein sequences into families and to predict the presence of important domains and sites. InterProScan is the underlying software that allows both protein and nucleic acid sequences to be searched against InterPro's predictive models, which are provided by its member databases. Here, we report recent developments with InterPro and its associated software, including the addition of two new databases (SFLD and CDD), and the functionality to include residue-level annotation and prediction of intrinsic disorder. These developments enrich the annotations provided by InterPro, increase the overall number of residues annotated and allow more specific functional inferences.
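          The abstract notes that the new member databases and residue-level annotations increase the overall number of residues annotated. As a rough illustration of how such a coverage figure can be computed, the following Python sketch merges overlapping match intervals on a single protein and counts the residues covered by at least one match. It is not InterPro or InterProScan code: the function names, the 1-based inclusive coordinate convention and the match positions are all hypothetical.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation: a match is a (start, end) pair of
# 1-based, inclusive residue positions on one protein sequence.
Match = Tuple[int, int]


def merge_matches(matches: Iterable[Match]) -> List[Match]:
    """Merge overlapping or adjacent intervals into disjoint ones."""
    merged: List[Match] = []
    for start, end in sorted(matches):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous interval: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def residues_covered(matches: Iterable[Match]) -> int:
    """Count residues covered by at least one match."""
    return sum(end - start + 1 for start, end in merge_matches(matches))


def coverage_fraction(matches: Iterable[Match], length: int) -> float:
    """Fraction of a protein of the given length covered by matches."""
    return residues_covered(matches) / length


if __name__ == "__main__":
    # Hypothetical matches on a 300-residue protein.
    existing = [(10, 120), (100, 180)]      # two overlapping hits
    new_source = [(150, 220), (260, 290)]   # hits from a newly added source
    print(residues_covered(existing))               # 171
    print(residues_covered(existing + new_source))  # 242
```

          Merging before counting matters because matches from different member databases often overlap; simply summing match lengths would count shared residues more than once.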


                Author and article information

                Journal: Nucleic Acids Research (Nucleic Acids Res; journal code nar)
                Publisher: Oxford University Press
                ISSN: 0305-1048 (print), 1362-4962 (electronic)
                Published online 28 November 2016; issue date 04 January 2017
                Volume 45, Database issue, pages D190-D199
                Affiliations
                [1] European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
                [2] School of Computer Science, University of Manchester, UK
                [3] Department of Bioengineering & Therapeutic Sciences, University of California, San Francisco, CA 94143, USA
                [4] European Molecular Biology Laboratory, Biocomputing, Meyerhofstrasse 1, 69117 Heidelberg, Germany
                [5] Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland
                [6] MTA-ELTE Lendület Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Pázmány Péter sétány 1/c, Budapest, Hungary
                [7] Computer Science Department, University of Bristol, Woodland Road, Bristol BS8 1UB, UK
                [8] Bioinformatics Department, J. Craig Venter Institute, 9714 Medical Center Drive, Rockville, MD 20850, USA
                [9] Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA
                [10] Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90033, USA
                [11] Biobyte Solutions GmbH, Bothestr. 142, 69126 Heidelberg, Germany
                [12] National Center for Biotechnology Information, National Library of Medicine, NIH Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
                [13] Georgetown University Medical Center, 3300 Whitehaven St, NW, Washington, DC 20007, USA
                [14] Department of Biomedical Sciences and CRIBI Biotech Center, University of Padua, via U. Bassi 58/b, 35131 Padua, Italy
                [15] Structural and Molecular Biology, University College London, Darwin Building, London WC1E 6BT, UK
                [16] CNR Institute of Neuroscience, via U. Bassi 58/b, 35131 Padua, Italy
                Author notes
                [*] To whom correspondence should be addressed. Tel: +44 1223 492 679; Fax: +44 1223 494 46; Email: rdf@ebi.ac.uk
                Author information
                http://orcid.org/0000-0002-6982-4660
                http://orcid.org/0000-0002-6731-6398
                http://orcid.org/0000-0001-9508-8065
                http://orcid.org/0000-0003-4525-7793
                Article
                DOI: 10.1093/nar/gkw1107
                PMCID: PMC5210578
                PMID: 27899635
                © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History: 27 October 2016; 24 October 2016
                Page count: 10
                Category: Database Issue

                Genetics
