
      ProtDCal: A program to compute general-purpose-numerical descriptors for sequences and 3D-structures of proteins

      research-article


          Abstract

          Background

          The exponential growth of protein structural and sequence databases is enabling multifaceted approaches to understanding the long-sought sequence-structure-function relationship. Advances in computation now make it possible to apply well-established data mining and pattern recognition techniques to these data to learn models that effectively relate structure and function. However, extracting meaningful numerical descriptors of protein sequence and structure remains a key issue that requires an efficient and widely available solution.

          Results

          Here we introduce ProtDCal, a new software suite capable of generating tens of thousands of features, covering both sequence-based and 3D-structural descriptors. We demonstrate, by means of principal component analysis and Shannon entropy tests, how ProtDCal’s sequence-based descriptors provide new and more relevant information not encoded by currently available servers for sequence-based protein feature generation. The wide diversity of the 3D-structure-based features generated by ProtDCal is shown to provide additional complementary information, which effectively completes its general protein encoding capability. As a demonstration of the utility of ProtDCal’s features, prediction models of N-linked glycosylation sites are trained and evaluated. Classification performance compares favourably with that of contemporary predictors of N-linked glycosylation sites, in spite of not using domain-specific features as input information.
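To make the idea of a Shannon entropy test on descriptors concrete, the following is a minimal Python sketch, not ProtDCal's implementation. It assumes that a descriptor's information content is estimated by discretising its values over a set of proteins into equal-width bins and computing the entropy of the resulting distribution; the function name `shannon_entropy` and the parameter `n_bins` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def shannon_entropy(values, n_bins=10):
    """Entropy in bits of one descriptor after equal-width binning.

    Illustrative assumption: the abstract does not state how the
    descriptor values are discretised.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant descriptor carries no information.
        return 0.0
    width = (hi - lo) / n_bins
    # Assign each value to a bin; the maximum value goes in the last bin.
    bins = Counter(min(int((v - lo) / width), n_bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in bins.values())
```

Under this sketch, a descriptor whose values are spread evenly across the bins scores close to log2(n_bins) bits, while a descriptor that is nearly constant across the data set scores close to zero.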

          Conclusions

          ProtDCal provides a friendly, cross-platform graphical user interface developed in the Java programming language, and is freely available at: http://bioinf.sce.carleton.ca/ProtDCal/. ProtDCal introduces local and group-based encoding, which enhances the diversity of the information captured by the computed features. Furthermore, we have shown that adding structure-based descriptors contributes non-redundant additional information to the feature-based characterization of polypeptide systems. This software is intended to provide a useful tool for general-purpose encoding of protein sequences and structures for applications in protein classification, similarity analyses and function prediction.
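The notion of group-based encoding can be illustrated with a short Python sketch. It is an assumption-laden example rather than ProtDCal's actual algorithm: it uses the Kyte-Doolittle hydropathy scale as the per-residue property, three illustrative residue groups, and the arithmetic mean as the aggregation operator. The names `KD_HYDROPATHY`, `GROUPS` and `group_descriptors` are hypothetical.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Illustrative residue groups (an assumption, not ProtDCal's definitions).
GROUPS = {
    "aromatic": set("FWY"),
    "aliphatic": set("AVLI"),
    "charged": set("DEKR"),
}


def group_descriptors(sequence, prop=KD_HYDROPATHY, groups=GROUPS):
    """Return one feature per group: the mean property over its residues."""
    features = {}
    for name, members in groups.items():
        vals = [prop[res] for res in sequence if res in members]
        # A group with no residues in the sequence is encoded as 0.0 here.
        features[name] = sum(vals) / len(vals) if vals else 0.0
    return features
```

For example, `group_descriptors("AVF")` yields one value per group, computed only from the residues that belong to that group, so a single property produces several distinct features for the same sequence.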

          Electronic supplementary material

          The online version of this article (doi:10.1186/s12859-015-0586-0) contains supplementary material, which is available to authorized users.


                Author and article information

                Contributors
                yasserrb@uclv.edu.cu
                waldopaz@uclv.cu
                jrgreen@sce.carleton.ca
                ymarrero77@yahoo.es
                Journal
                BMC Bioinformatics
                BioMed Central (London)
                ISSN: 1471-2105
                Published: 16 May 2015
                Volume: 16
                Article number: 162
                Affiliations
                Unit of Computer-Aided Molecular “Biosilico” Discovery and Bioinformatic Research (CAMD-BIR Unit), Facultad de Química y Farmacia, Universidad Central “Marta Abreu” de Las Villas, Road to Camajuani km 5 ½, Santa Clara, CP: 54830, Villa Clara, Cuba
                Department of Systems and Computer Engineering, Carleton University, Ottawa, ON, Canada
                Centre of Informatics Studies (CEI), Universidad Central “Marta Abreu” de Las Villas, Road to Camajuani km 5 ½, Santa Clara, CP: 54830, Villa Clara, Cuba
                Grupo de Investigación Microbiología y Ambiente (GIMA), Programa de Bacteriología, Facultad Ciencias de la Salud, Universidad de San Buenaventura, Calle Real de Ternera, Cartagena (Bolivar), Colombia
                Article
                Article ID: 586
                DOI: 10.1186/s12859-015-0586-0
                PMCID: PMC4432771
                PMID: 25982853
                ScienceOpen record ID: 81ce35f8-e4b5-4b84-abbe-142ed9fe8564
                © Ruiz-Blanco et al.; licensee BioMed Central. 2015

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 15 January 2015
                Accepted: 22 April 2015
                Categories
                Software

                Bioinformatics & Computational biology
                Keywords: ProtDCal, protein feature generation, protein descriptors, data mining, protein function modelling
