
      Machine learning-guided discovery and design of non-hemolytic peptides

      research-article


          Abstract

          Reducing hurdles to clinical trials without compromising the therapeutic promise of peptide candidates is an essential step in peptide-based drug design. Machine-learning models are cost-effective and time-saving strategies for predicting biological activities from primary sequences. Their limitations lie in the diversity of peptide sequences and biological information within these models. Additional outlier detection methods are needed to set the boundaries for reliable predictions, known as the applicability domain. Antimicrobial peptides (AMPs) constitute an extensive library of peptides offering promising avenues against antibiotic-resistant infections. Most AMPs in clinical trials are administered topically because of their hemolytic toxicity. Here we developed machine learning models and outlier detection methods that ensure robust predictions for the discovery of AMPs and the design of novel peptides with reduced hemolytic activity. Our best models, gradient boosting classifiers, predicted the hemolytic nature of any peptide sequence with 95–97% accuracy. Nearly 70% of AMPs were predicted to be hemolytic. Applying multivariate outlier detection models, we found that 273 AMPs (~9%) could not be predicted reliably. Our combined approach led to the discovery of 34 high-confidence non-hemolytic natural AMPs, the de novo design of 507 non-hemolytic peptides, and guidelines for non-hemolytic peptide design.
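
The applicability-domain idea in the abstract can be illustrated with a small sketch. This is not the authors' method: the abstract names gradient boosting classifiers and multivariate outlier detection models but does not specify the descriptors or detectors used. As a stand-in, the sketch below describes each peptide by its amino-acid composition and flags a query as outside the domain when its standardized Euclidean distance to the training centroid exceeds a cutoff learned from the training set. The function names (`composition`, `fit_domain`, `in_domain`) and the 0.95 quantile are assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Return the fraction of each standard amino acid in a peptide."""
    counts = Counter(seq.upper())
    n = len(seq)
    return [counts[a] / n for a in AMINO_ACIDS]


def distance(x, mu, sd):
    """Standardized Euclidean distance of feature vector x from the centroid."""
    return sqrt(sum(((xi - m) / s) ** 2 for xi, m, s in zip(x, mu, sd)))


def fit_domain(train_seqs, quantile=0.95):
    """Learn per-feature mean/std and a distance cutoff from training peptides.

    The cutoff is the training distance at the given quantile (an assumed,
    illustrative choice, not a value taken from the article).
    """
    X = [composition(s) for s in train_seqs]
    cols = list(zip(*X))
    mu = [mean(c) for c in cols]
    sd = [pstdev(c) or 1.0 for c in cols]  # avoid division by zero
    dists = sorted(distance(x, mu, sd) for x in X)
    cutoff = dists[min(len(dists) - 1, int(quantile * len(dists)))]
    return mu, sd, cutoff


def in_domain(seq, domain):
    """True if the peptide lies within the learned applicability domain."""
    mu, sd, cutoff = domain
    return distance(composition(seq), mu, sd) <= cutoff
```

In a workflow like the one the abstract describes, a classifier's prediction would be reported only for peptides where `in_domain` returns `True`; the remainder would be set aside as not reliably predictable.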


                Author and article information

                Contributors
                fabien.plisson@cinvestav.mx
                Journal
                Sci Rep
                Scientific Reports
                Nature Publishing Group UK (London)
                2045-2322
                6 October 2020
                2020; 10: 16581
                Affiliations
                [1] CONACYT, Unidad de Genómica Avanzada, Laboratorio Nacional de Genómica para la Biodiversidad (Langebio), Centro de Investigación y de Estudios Avanzados del IPN, 36824 Irapuato, Guanajuato, Mexico (GRID grid.418275.d; ISNI 0000 0001 2165 8782)
                [2] Unidad de Genómica Avanzada, Laboratorio Nacional de Genómica para la Biodiversidad (Langebio), Centro de Investigación y de Estudios Avanzados del IPN, 36824 Irapuato, Guanajuato, Mexico (GRID grid.418275.d; ISNI 0000 0001 2165 8782)
                Article
                73644
                10.1038/s41598-020-73644-6
                7538962
                33024236
                cb379214-d9b9-491d-a9cc-4fa123a49d38
                © The Author(s) 2020

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

                History
                Received: 19 July 2020
                Accepted: 18 September 2020
                Funding
                Funded by: FundRef http://dx.doi.org/10.13039/501100003141, Consejo Nacional de Ciencia y Tecnología;
                Funded by: Cátedras fellowship
                Funded by: Ciencia de Frontera Fc-2016-2604
                Funded by: National postgraduate scholarship
                Categories
                Article

                Uncategorized
                protein sequence analyses, cheminformatics, peptides, data acquisition, data mining, data processing, machine learning, protein design, statistical methods
