
      Classifying natural products from plants, fungi or bacteria using the COCONUT database and machine learning

          Abstract

          Natural products (NPs) represent one of the most important resources for discovering new drugs. Here we asked whether the origin of an NP can be assigned from its molecular structure, using the subset of 60,171 NPs in the recently reported Collection of Open Natural Products (COCONUT) database that are assigned to plants, fungi, or bacteria. In an interactive tree-map (TMAP) of this subset computed with MAP4 (MinHashed atom pair fingerprint), NPs clustered according to their assigned origin (https://tm.gdb.tools/map4/coconut_tmap/). A support vector machine (SVM) trained on MAP4 fingerprints correctly assigned the origin of 94% of plant, 89% of fungal, and 89% of bacterial NPs in this subset. An online tool based on an SVM trained on the entire subset assigned the origin of additional NPs with similar performance (https://np-svm-map4.gdb.tools/). Origin information might be useful when searching for the biosynthetic genes of NPs that are isolated from plants but produced by endophytic microorganisms.
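
          As its name states, MAP4 is a MinHashed atom pair fingerprint: each molecule is described by a set of atom-pair features that is compressed with MinHash, so structurally similar molecules share many MinHash values. This property is one reason a tree-map built on such fingerprints can place similar NPs close together. The Python sketch below illustrates only the general MinHash step, under stated assumptions: the shingle strings are hypothetical placeholders loosely styled after atom-pair features (not output of the real MAP4 encoder), and the seeded SHA-1 hash family, the function names `minhash`, `estimated_jaccard` and `exact_jaccard`, and the default of 256 hash functions are illustrative choices, not the MAP4 implementation.

```python
from __future__ import annotations

import hashlib


def _hash(shingle: str, seed: int) -> int:
    """Deterministic 64-bit hash of a shingle under one of many seeded hash functions.

    Assumption: SHA-1 salted with the seed stands in for a family of random
    hash functions; this is an illustrative choice, not MAP4's own hashing.
    """
    digest = hashlib.sha1(f"{seed}:{shingle}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash(shingles: set[str], num_perm: int = 256) -> list[int]:
    """MinHash signature: for each seeded hash function, keep the minimum over the set."""
    if not shingles:
        raise ValueError("cannot MinHash an empty shingle set")
    return [min(_hash(s, seed) for s in shingles) for seed in range(num_perm)]


def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of positions where two signatures agree, an estimate of Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(a: set[str], b: set[str]) -> float:
    """Exact Jaccard similarity |A & B| / |A | B|, for comparison with the estimate."""
    union = a | b
    if not union:
        raise ValueError("Jaccard similarity is undefined for two empty sets")
    return len(a & b) / len(union)


if __name__ == "__main__":
    # Hypothetical shingles for two toy molecules; 3 of 5 distinct shingles are shared.
    mol_a = {"C|1|O", "C|2|N", "CC|3|OC", "N|1|C"}
    mol_b = {"C|1|O", "C|2|N", "CC|3|OC", "Cl|2|C"}
    print("exact Jaccard:", exact_jaccard(mol_a, mol_b))  # exact Jaccard: 0.6
    print("MinHash estimate:", estimated_jaccard(minhash(mol_a), minhash(mol_b)))
```

          With k hash functions, the standard error of the MinHash estimate is roughly sqrt(J(1 - J)/k), about 0.03 for J = 0.6 and k = 256. Comparing fixed-length signatures in this way is much cheaper than comparing the full feature sets, which matters when tens of thousands of NPs must be arranged in one map.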

          Supplementary Information

          The online version contains supplementary material available at 10.1186/s13321-021-00559-3.

                Author and article information

                Contributors
                jean-louis.reymond@dcb.unibe.ch
                Journal: Journal of Cheminformatics (J Cheminform), Springer International Publishing, Cham; ISSN 1758-2946
                Published: 18 October 2021
                Volume 13, Article 82
                Affiliations
                Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland (GRID grid.5734.5, ISNI 0000 0001 0726 5157)
                Author information
                http://orcid.org/0000-0003-2724-2942
                Article identifiers: 559; DOI 10.1186/s13321-021-00559-3; 8524952; 34663470; 502bb8f2-9f9f-4777-88ec-9a6f6d5b464b
                © The Author(s) 2021

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

                History
                Received: 25 June 2021
                Accepted: 2 October 2021
                Funding
                Funded by: Swiss National Science Foundation (Schweizerischer Nationalfonds zur Förderung der wissenschaftlichen Forschung), FundRef http://dx.doi.org/10.13039/501100001711; Award ID: 200020_178998
                Funded by: H2020 European Research Council, FundRef http://dx.doi.org/10.13039/100010663; Award ID: 885076
                Categories
                Research Article; Chemoinformatics
                Keywords
                natural products, cheminformatics, chemical space, visualization, molecular fingerprints, machine learning, support vector machine