      PPero, a Computational Model for Plant PTS1 Type Peroxisomal Protein Prediction


          Abstract

          Well-defined motifs often make it easy to investigate protein function and localization. In plants, peroxisomal proteins are guided to peroxisomes mainly by a conserved type 1 (PTS1) or type 2 (PTS2) targeting signal, and the PTS1 motif is commonly used to predict peroxisome-targeted proteins. Current computational prediction of peroxisome-targeted PTS1-type proteins is mostly based on the three-amino-acid PTS1 motif and an adjacent sequence of fewer than 14 amino acid residues. The potential contribution of adjacent sequences beyond this short region has never been well investigated in plants. In this work, we develop a bi-profile Bayesian SVM method to extract and learn position-based amino acid features for both PTS1 motifs and their extended adjacent sequences in plants. Our proposed model outperformed other implementations with similar applications and achieved the highest accuracy, 93.6% for Arabidopsis and 92.6% for other plant species. A large-scale analysis of the Arabidopsis, rice, maize, potato, wheat, and soybean proteomes was conducted using the proposed model, and a batch of candidate PTS1 proteins was predicted. The DNA segments corresponding to the C-terminal sequences of 9 selected candidates were cloned and transformed into Arabidopsis for experimental validation, and 5 of them demonstrated peroxisome targeting.
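The position-based feature extraction mentioned in the abstract can be illustrated with a minimal sketch of bi-profile Bayes encoding. This is not the authors' implementation: the function names, the 6-residue window, and the toy sequences are all assumptions made up for illustration, and the window length and SVM settings used by PPero are not stated in this record. Each C-terminal window is encoded as the per-position frequency of its residues in a positive (peroxisomal) training set, followed by the same frequencies in a negative set. The resulting vector would then be passed to an SVM classifier (for example scikit-learn's `SVC`, not shown here).

```python
from collections import Counter


def c_terminal_window(protein, length):
    """Return the last `length` residues of a protein sequence."""
    if len(protein) < length:
        raise ValueError("sequence is shorter than the requested window")
    return protein[-length:]


def position_profile(windows):
    """Estimate per-position residue frequencies from equal-length windows."""
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    profile = []
    for i in range(length):
        counts = Counter(w[i] for w in windows)
        profile.append({aa: n / len(windows) for aa, n in counts.items()})
    return profile


def bi_profile_features(window, pos_profile, neg_profile):
    """Encode a window as [positive-set frequencies] + [negative-set frequencies]."""
    pos = [pos_profile[i].get(aa, 0.0) for i, aa in enumerate(window)]
    neg = [neg_profile[i].get(aa, 0.0) for i, aa in enumerate(window)]
    return pos + neg


# Hypothetical toy training windows (not real data from the paper).
positives = ["AKLSKL", "GRVSRL", "PKLARL"]  # end in PTS1-like tripeptides
negatives = ["GDEAVK", "LTPEEG", "KDSAVE"]

pos_profile = position_profile(positives)
neg_profile = position_profile(negatives)

# Hypothetical query protein; only its C-terminal 6 residues are encoded.
window = c_terminal_window("MSTTAKLSKL", 6)
features = bi_profile_features(window, pos_profile, neg_profile)
print(window, features)
```

A window of length n yields a 2n-dimensional vector, so extending the adjacent sequence beyond the tripeptide simply lengthens the vector without changing the encoding scheme.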


                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS ONE
                Public Library of Science (San Francisco, CA, USA)
                ISSN: 1932-6203
                3 January 2017
                Volume 12, Issue 1
                Affiliations
                [1] School of Life Sciences and State Key Lab of Agrobiotechnology, The Chinese University of Hong Kong, Hong Kong
                [2] Department of Medical Genetics, Shenzhen University Health Science Center, Shenzhen, China
                Wuhan Botanical Garden, CHINA
                Author notes

                Competing Interests: The authors declare that they have no competing interests.

                • Methodology: JW YW.

                • Validation: JW CG.

                • Writing – review & editing: DG LJ.

                Article
                PONE-D-16-29512
                10.1371/journal.pone.0168912
                5207514
                28045983
                © 2017 Wang et al

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                Counts
                Figures: 3, Tables: 2, Pages: 12
                Funding
                This work is supported by a grant from Shenzhen Science and Technology Committee (grant no. JCYJ20140425184428456), and partially by a grant from the Hong Kong Research Grants Council (project no. CUHK3/CRF/11G). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Research and Analysis Methods / Database and Informatics Methods / Bioinformatics / Sequence Analysis / Sequence Motif Analysis
                Biology and Life Sciences / Cell Biology / Cellular Structures and Organelles / Peroxisomes
                Research and Analysis Methods / Experimental Organism Systems / Model Organisms / Arabidopsis Thaliana
                Research and Analysis Methods / Model Organisms / Arabidopsis Thaliana
                Biology and Life Sciences / Organisms / Plants / Brassica / Arabidopsis Thaliana
                Research and Analysis Methods / Experimental Organism Systems / Plant and Algal Models / Arabidopsis Thaliana
                Biology and Life Sciences / Molecular Biology / Molecular Biology Techniques / Sequencing Techniques / Protein Sequencing
                Research and Analysis Methods / Molecular Biology Techniques / Sequencing Techniques / Protein Sequencing
                Biology and Life Sciences / Biochemistry / Proteins / Luminescent Proteins / Yellow Fluorescent Protein
                Research and Analysis Methods / Experimental Organism Systems / Model Organisms / Maize
                Research and Analysis Methods / Model Organisms / Maize
                Biology and Life Sciences / Agriculture / Crop Science / Crops / Cereal Crops / Maize
                Biology and Life Sciences / Organisms / Plants / Grasses / Maize
                Research and Analysis Methods / Experimental Organism Systems / Plant and Algal Models / Maize
                Biology and Life Sciences / Organisms / Plants / Solanum / Potato
                Biology and Life Sciences / Agriculture / Crop Science / Crops / Vegetables / Potato
                Biology and Life Sciences / Organisms / Plants / Vegetables / Potato
                Biology and Life Sciences / Agriculture / Crop Science / Crops / Cereal Crops / Wheat
                Biology and Life Sciences / Organisms / Plants / Grasses / Wheat
                Custom metadata
                The training data and all prediction results can be found in the supplemental files. The source code, binary files of our model, and example data can be downloaded from GitHub (https://github.com/WangJueCUHK/PPero2.0). These data can also be downloaded from our web server (http://biocomputer.bio.cuhk.edu.hk/PP).
