      PRGminer: harnessing deep learning for the prediction of resistance genes involved in plant defense mechanisms


          Abstract

          Plant resistance genes are crucial to plant defense against a variety of diseases and pests. These plant-specific genes encode proteins that recognize particular molecular patterns associated with invading pathogens. When activated, resistance genes initiate a sequence of molecular processes that culminates in defensive responses such as the synthesis of antimicrobial chemicals, cell wall strengthening, and programmed cell death in infected cells. Plant resistance genes are exceedingly varied, with several classes and subclasses found across a wide range of plant species. Identifying new resistance genes (Rgenes) is a critical component of disease resistance breeding. Nonetheless, identifying Rgenes in wild species and near relatives of plants is both challenging and time-consuming. In this study, we present PRGminer, a deep learning-based, high-throughput Rgene prediction tool. PRGminer is implemented in two phases: Phase I classifies input protein sequences as Rgenes or non-Rgenes, and Phase II classifies the Rgenes predicted in Phase I into eight classes. Among all the sequence representations tested, dipeptide composition gave the best prediction performance. In Phase I, it achieved an accuracy of 98.75% in a k-fold training/testing procedure and 95.72% on independent testing, with Matthews correlation coefficient (MCC) values of 0.98 and 0.91, respectively. In Phase II, the overall accuracy was 97.55% in k-fold training/testing and 97.21% on independent testing, with MCC values of 0.93 and 0.92, respectively. PRGminer is freely accessible as a webserver at https://kaabil.net/prgminer/ and is available for download as a standalone tool at https://github.com/usubioinfo/PRGminer. PRGminer will help researchers accelerate the discovery of new Rgenes, understand the genetic basis of plant resistance, and develop new strategies for breeding plants that are resistant to diseases and pests.
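
          The abstract names two technical ingredients, dipeptide composition as the sequence representation and the Matthews correlation coefficient as an evaluation metric. The following is a minimal Python sketch of both under their standard textbook definitions; it is not taken from the PRGminer code base. The function names `dipeptide_composition` and `binary_mcc`, the skipping of pairs that contain non-standard residues, and the 1/0 label coding are assumptions made here for illustration only.

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# All 400 ordered residue pairs, in a fixed order that defines the feature layout.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-element list of dipeptide frequencies for one protein."""
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        # Assumption: pairs containing non-standard residues (e.g. X) are skipped.
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


def binary_mcc(y_true, y_pred):
    """Matthews correlation coefficient for labels coded 1 (Rgene) and 0 (non-Rgene)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when the denominator vanishes.
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    features = dipeptide_composition("MKVLAAGMKV")  # toy sequence, not real data
    print(len(features), features[DIPEPTIDES.index("MK")])
    print(binary_mcc([1, 1, 0, 0], [1, 0, 0, 0]))
```

          A vector like `features` would be the input to a Phase I style binary classifier, and `binary_mcc` would score its predictions. An eight-class Phase II evaluation would need the multiclass form of the coefficient, which this sketch does not cover.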

                Author and article information

                Contributors
                URI: https://loop.frontiersin.org/people/1508108/overview
                URI: https://loop.frontiersin.org/people/493406/overview
                Journal
                Front Plant Sci
                Frontiers in Plant Science
                Frontiers Media S.A.
                1664-462X
                03 June 2025
                2025
                Volume: 16
                Article: 1411525
                Affiliations
                [1] Bioinformatics Facility, Center for Integrated BioSystems, Utah State University, Logan, UT, United States
                [2] Department of Plants, Soils, and Climate, College of Agriculture and Applied Science, Utah State University, Logan, UT, United States
                Author notes

                Edited by: Dinesh Pandey, G. B. Pant University of Agriculture and Technology, India

                Reviewed by: Xupo Ding, Guangxi Minzu University, China

                Shu Wang, Southwest Forestry University, China

                *Correspondence: Rakesh Kaundal, rkaundal@usu.edu
                Article
                DOI: 10.3389/fpls.2025.1411525
                PMCID: PMC12170542
                PMID: 40530297
                6a84390e-f7b3-41ad-a63c-a44a886b9eca
                Copyright © 2025 Duhan and Kaundal

                This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

                History
                Received: 03 April 2024
                Accepted: 24 April 2025
                Page count
                Figures: 9, Tables: 4, Equations: 12, References: 61, Pages: 15, Words: 7502
                Funding
                The author(s) declare that financial support was received for the research and/or publication of this article. The authors acknowledge support for this study from faculty start-up funds to RK from the Center for Integrated BioSystems/Department of Plants, Soils, and Climate, USU. This research was also supported by the Utah Agricultural Experiment Station (UAES), USU, and approved as journal paper number 9889. The funders had no role in the design of the study; in the collection, analysis, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
                Categories
                Plant Science
                Original Research
                Custom metadata
                Plant Systems and Synthetic Biology

                Plant science & Botany
                plants, resistance genes, rgenes, deep learning, cnn, defense mechanism
