
      iDNA-Prot: Identification of DNA Binding Proteins Using Random Forest with Grey Model

      research-article


          Abstract

          DNA-binding proteins play crucial roles in various cellular processes. Developing high throughput tools for rapidly and effectively identifying DNA-binding proteins is one of the major challenges in the field of genome annotation. Although many efforts have been made in this regard, further effort is needed to enhance the prediction power.

          We propose a new predictor, called iDNA-Prot, that identifies uncharacterized proteins as DNA-binding or non-DNA-binding proteins based on their amino acid sequence information alone. It incorporates features extracted from protein sequences via the “grey model” into the general form of pseudo amino acid composition and adopts random forest as the operation engine. In jackknife tests on a newly constructed stringent benchmark dataset, in which no protein exceeds the dataset's pairwise sequence identity cutoff with any other protein in the same subset, iDNA-Prot achieved an overall success rate of 83.96%. Besides achieving a high success rate, iDNA-Prot needs remarkably less computational time than the relevant existing predictors. It is therefore anticipated that iDNA-Prot may become a useful high throughput tool for large-scale analysis of DNA-binding proteins.
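The jackknife test mentioned above can be illustrated with a minimal sketch: each protein is left out once, a classifier is trained on the rest, and the success rate is the fraction of left-out proteins predicted correctly. Everything in the block is an assumption made for illustration and is not the authors' implementation: plain amino acid composition stands in for the grey-model pseudo amino acid features, a nearest-neighbor rule stands in for the random forest, and the toy sequences and function names are invented.

```python
from collections import Counter
from math import dist

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the 20-dimensional amino acid composition of a sequence."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS) or 1
    return [counts[a] / total for a in AMINO_ACIDS]


def nearest_neighbor_predict(train_x, train_y, x):
    """Stand-in classifier (not a random forest): label of the closest training vector."""
    best = min(range(len(train_x)), key=lambda i: dist(train_x[i], x))
    return train_y[best]


def jackknife_success_rate(features, labels, predict):
    """Leave each sample out once, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


# Toy sequences invented for illustration only (1 = DNA-binding, 0 = non-binding).
toy = [
    ("KKRRKKRRKK", 1),
    ("RKRKRKRKAA", 1),
    ("KRKKRRGKRK", 1),
    ("DDEEDDEELL", 0),
    ("EDEDEDLLVV", 0),
    ("DEEDLLDEVV", 0),
]
features = [composition(s) for s, _ in toy]
labels = [y for _, y in toy]
rate = jackknife_success_rate(features, labels, nearest_neighbor_predict)
print(f"jackknife success rate: {rate:.2%}")
```

Because `jackknife_success_rate` takes the prediction function as an argument, a random forest or any other classifier could be substituted without changing the evaluation loop.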

          iDNA-Prot is freely accessible to the public as a user-friendly web server at http://icpr.jci.edu.cn/bioinfo/iDNA-Prot or http://www.jci-bioinfo.cn/iDNA-Prot. For the convenience of experimental scientists, a step-by-step guide explains how to use the web server to obtain the desired results.


                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS ONE
                Public Library of Science (San Francisco, USA)
                ISSN: 1932-6203
                Published: 15 September 2011
                Volume 6, Issue 9, e24756
                Affiliations
                [1] Information Science and Technology School, Donghua University, Shanghai, China
                [2] Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen, China
                [3] Gordon Life Science Institute, San Diego, California, United States of America
                University of South Florida College of Medicine, United States of America
                Author notes

                Conceived and designed the experiments: WZL XX KCC. Performed the experiments: WZL JAF. Analyzed the data: WZL KCC. Contributed reagents/materials/analysis tools: XX. Wrote the paper: WZL KCC.

                Article
                Manuscript: PONE-D-11-14058
                DOI: 10.1371/journal.pone.0024756
                PMCID: PMC3174210
                PMID: 21935457
                e8d07b4d-127a-4458-bb2e-1fb0a7daca89
                Lin et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
                History
                Received: 24 July 2011
                Accepted: 16 August 2011
                Page count
                Pages: 7
                Categories
                Research Article
                Biology
                Biochemistry
                Proteins
                DNA-binding proteins
                Computational Biology
                Genomics
                Genome Analysis Tools
                Genetics
                Genomics
                Genome Analysis Tools
                Proteomics
                Computer Science
                Computer Applications
                Web-Based Applications
                Computer Modeling
