      INGA 2.0: improving protein function prediction for the dark proteome

      research-article
      Nucleic Acids Research
      Oxford University Press


          Abstract

          Our current knowledge of complex biological systems is stored in a computable form through the Gene Ontology (GO), which provides a comprehensive description of gene function. Prediction of GO terms from sequence remains, however, a challenging task, which is particularly critical for novel genomes. Here we present INGA 2.0, a new version of the INGA software for protein function prediction. INGA exploits homology, domain architecture, interaction networks and information from the ‘dark proteome’, such as transmembrane and intrinsically disordered regions, to generate a consensus prediction. INGA was ranked in the top ten methods on both CAFA2 and CAFA3 blind tests. The new algorithm can process entire genomes in a few hours, or even less when additional input files are provided. The new interface provides a better user experience by integrating filters and widgets to explore the graph structure of the predicted terms. The INGA web server, databases and benchmarking are available at https://inga.bio.unipd.it/.
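          The idea of a consensus prediction over several evidence sources can be illustrated with a minimal sketch. This is not INGA's actual algorithm, which the abstract does not specify. It assumes each source yields a score in [0, 1] per GO term, combines the sources with a noisy-OR rule, and propagates each score to the term's ancestors so that a parent is never scored below its child. The source names, the scores, the `consensus` and `ancestors` functions and the three-term graph are all hypothetical.

```python
from math import prod

# Toy is_a edges (child -> parents). A real run would load the full GO graph.
PARENTS = {
    "GO:0005515": ["GO:0005488"],  # protein binding -> binding
    "GO:0005488": ["GO:0003674"],  # binding -> molecular_function
    "GO:0003674": [],
}


def ancestors(term):
    """Return every ancestor of `term` in the toy graph."""
    seen = set()
    stack = list(PARENTS.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def consensus(per_source):
    """Combine {source: {go_term: score}} into one score per GO term.

    Assumption: sources are treated as independent and merged with a
    noisy-OR, 1 - prod(1 - score). A missing term counts as score 0.
    """
    terms = {t for scores in per_source.values() for t in scores}
    combined = {
        t: 1.0 - prod(1.0 - scores.get(t, 0.0) for scores in per_source.values())
        for t in terms
    }
    # Propagate upwards: an ancestor gets at least its descendant's score.
    propagated = dict(combined)
    for term, score in combined.items():
        for anc in ancestors(term):
            propagated[anc] = max(propagated.get(anc, 0.0), score)
    return propagated


if __name__ == "__main__":
    # Hypothetical per-source scores for a single protein.
    example = {
        "homology": {"GO:0005515": 0.5},
        "domains": {"GO:0005515": 0.5, "GO:0005488": 0.2},
    }
    for term, score in sorted(consensus(example).items()):
        print(term, round(score, 3))
```

          The propagation step reflects the graph structure of GO: annotating a protein with a specific term implies all of that term's ancestors, so scores should not decrease when moving towards the root.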


          Most cited references (16)


          An expanded evaluation of protein function prediction methods shows an improvement in accuracy

          Background: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging.

          Results: We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1, with regards to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2.

          Conclusions: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent.

          Electronic supplementary material: The online version of this article (doi:10.1186/s13059-016-1037-6) contains supplementary material, which is available to authorized users.

            More than the sum of their parts: on the evolution of proteins from peptides.

            Despite their seemingly endless diversity, proteins adopt a limited number of structural forms. It has been estimated that 80% of proteins will be found to adopt one of only about 400 folds, most of which are already known. These folds are largely formed by a limited 'vocabulary' of recurring supersecondary structure elements, often by repetition of the same element and, increasingly, elements similar in both structure and sequence are discovered. This suggests that modern proteins evolved by fusion and recombination from a more ancient peptide world and that many of the core folds observed today may contain homologous building blocks. The peptides forming these building blocks would not in themselves have had the ability to fold, but would have emerged as cofactors supporting RNA-based replication and catalysis (the 'RNA world'). Their association into larger structures and eventual fusion into polypeptide chains would have allowed them to become independent of their RNA scaffold, leading to the evolution of a novel type of macromolecule: the folded protein. Copyright 2003 Wiley Periodicals, Inc.

              Unexpected features of the dark proteome.

              We surveyed the "dark" proteome, that is, regions of proteins never observed by experimental structure determination and inaccessible to homology modeling. For 546,000 Swiss-Prot proteins, we found that 44-54% of the proteome in eukaryotes and viruses was dark, compared with only ∼14% in archaea and bacteria. Surprisingly, most of the dark proteome could not be accounted for by conventional explanations, such as intrinsic disorder or transmembrane regions. Nearly half of the dark proteome comprised dark proteins, in which the entire sequence lacked similarity to any known structure. Dark proteins fulfill a wide variety of functions, but a subset showed distinct and largely unexpected features, such as association with secretion, specific tissues, the endoplasmic reticulum, disulfide bonding, and proteolytic cleavage. Dark proteins also had short sequence length, low evolutionary reuse, and few known interactions with other proteins. These results suggest new research directions in structural and computational biology.

                Author and article information

                Journal
                Nucleic Acids Res
                nar
                Nucleic Acids Research
                Oxford University Press
                0305-1048
                1362-4962
                02 July 2019
                10 May 2019
                Volume: 47
                Issue: W1
                Pages: W373-W378
                Affiliations
                [1] Department of Biomedical Sciences, University of Padua, Padua, Italy
                [2] CNR Institute of Neuroscience, Padua, Italy
                Author notes
                To whom correspondence should be addressed. Tel: +39 498276269; Fax: +39 498276260; Email: damiano.piovesan@unipd.it
                Author information
                http://orcid.org/0000-0001-8210-2390
                http://orcid.org/0000-0003-4525-7793
                Article
                gkz375
                10.1093/nar/gkz375
                6602455
                31073595
                0975705a-f4dc-4276-bc50-2e4369e78432
                © The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

                History
                Accepted: 30 April 2019
                Revised: 29 April 2019
                Received: 13 February 2019
                Page count
                Pages: 6
                Funding
                Funded by: Marie Skłodowska-Curie
                Award ID: 778247
                Categories
                Web Server Issue

                Genetics
