
      Expansion of RiPP biosynthetic space through integration of pan-genomics and machine learning uncovers a novel class of lanthipeptides

      research-article


          Abstract

          Microbial natural products constitute a wide variety of chemical compounds, many of which have antibiotic, antiviral, or anticancer properties that make them of clinical interest. Natural product classes include polyketides (PKs), nonribosomal peptides (NRPs), and ribosomally synthesized and post-translationally modified peptides (RiPPs). While variants of biosynthetic gene clusters (BGCs) for known classes of natural products are easy to identify in genome sequences, BGCs for new compound classes escape attention. In particular, evidence is accumulating that for RiPPs, the subclasses known thus far may represent only the tip of the iceberg. Here, we present decRiPPter (Data-driven Exploratory Class-independent RiPP TrackER), a RiPP genome mining algorithm aimed at the discovery of novel RiPP classes. DecRiPPter combines a Support Vector Machine (SVM) that identifies candidate RiPP precursors with pan-genomic analyses to identify which of these are encoded within operon-like structures that are part of the accessory genome of a genus. Subsequently, it prioritizes such regions based on the presence of new enzymology and on patterns of gene cluster and precursor peptide conservation across species. We then applied decRiPPter to mine 1,295 Streptomyces genomes, which led to the identification of 42 new candidate RiPP families that could not be found by existing programs. One of these was studied further and elucidated as a representative of a novel subfamily of lanthipeptides, which we designate class V. The 2D structure of the new RiPP, which we name pristinin A3 (1), was solved using nuclear magnetic resonance (NMR), tandem mass spectrometry (MS/MS) data, and chemical labeling. Two previously unidentified modifying enzymes are proposed to create the hallmark lanthionine bridges. Taken together, our work highlights how novel natural product families can be discovered by methods going beyond sequence similarity searches to integrate multiple pathway discovery criteria.
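
          The filtering idea in the abstract (keep candidate precursors that score well with a classifier and that sit in the accessory rather than the core genome of a genus) can be illustrated with a minimal sketch. This is not decRiPPter's implementation: the `Candidate` fields, the ortholog-group presence table, and both thresholds (`min_score`, `max_core_fraction`) are hypothetical names and values chosen only for illustration, and the scores are hard-coded rather than produced by a trained SVM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A putative precursor gene. All fields are illustrative assumptions."""
    gene_id: str
    ortholog_group: str  # hypothetical ortholog-group label for this gene
    svm_score: float     # hypothetical classifier score in [0, 1]


def group_frequencies(presence, n_genomes):
    """Fraction of genomes in the genus that carry each ortholog group."""
    return {group: len(genomes) / n_genomes for group, genomes in presence.items()}


def select_candidates(candidates, presence, n_genomes,
                      min_score=0.9, max_core_fraction=0.5):
    """Keep high-scoring candidates whose ortholog group is rare in the genus.

    Groups absent from `presence` are treated as frequency 0.0 (accessory).
    Both thresholds are assumed values, not taken from the paper.
    """
    freqs = group_frequencies(presence, n_genomes)
    kept = [
        c for c in candidates
        if c.svm_score >= min_score
        and freqs.get(c.ortholog_group, 0.0) <= max_core_fraction
    ]
    # Highest-scoring candidates first.
    return sorted(kept, key=lambda c: c.svm_score, reverse=True)


# Toy data: four genomes, three ortholog groups.
presence = {
    "og1": {"g1", "g2", "g3", "g4"},  # present everywhere: core
    "og2": {"g1"},                    # present in one genome: accessory
    "og3": {"g2", "g3"},              # present in half: accessory at this cutoff
}
candidates = [
    Candidate("a", "og1", 0.95),  # high score but core genome
    Candidate("b", "og2", 0.97),  # high score, accessory
    Candidate("c", "og3", 0.92),  # high score, accessory
    Candidate("d", "og2", 0.40),  # accessory but low score
]
selected = select_candidates(candidates, presence, n_genomes=4)
print([c.gene_id for c in selected])
```

          In this toy run, candidates "b" and "c" pass both filters, while "a" is dropped as core-genome and "d" as low-scoring. The later prioritization steps described in the abstract (new enzymology, conservation patterns across species) are not modeled here.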

          Abstract

          This study shows that decRiPPter, an algorithmic approach combining pan-genomics and machine learning, can discover novel types of ribosomally synthesized and post-translationally modified peptide (RiPP) natural products, including a new class of lanthipeptides.


                Author and article information

                Contributors
                Roles: Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing
                Roles: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Validation, Writing – review & editing
                Roles: Formal analysis, Investigation, Methodology
                Roles: Formal analysis, Investigation
                Roles: Formal analysis, Investigation, Methodology, Software
                Roles: Conceptualization, Investigation, Methodology
                Roles: Conceptualization, Formal analysis, Investigation, Methodology, Supervision
                Roles: Conceptualization, Formal analysis, Funding acquisition, Project administration, Supervision, Writing – original draft, Writing – review & editing
                Roles: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Supervision, Validation, Writing – original draft, Writing – review & editing
                Role: Academic Editor
                Journal
                PLoS Biol
                PLoS Biology
                Public Library of Science (San Francisco, CA, USA)
                ISSN: 1544-9173, 1545-7885
                Published: 22 December 2020
                Volume: 18
                Issue: 12
                Article number: e3001026
                Affiliations
                [1] Institute of Biology, Leiden University, the Netherlands
                [2] Verily Life Sciences, South San Francisco, CA, United States of America
                [3] DOE Joint Genome Institute, Walnut Creek, CA, United States of America
                [4] Department of Molecular Biology, Princeton University, NJ, United States of America
                [5] Department of Bioengineering, Stanford University, CA, United States of America
                [6] Netherlands Institute for Ecology (NIOO-KNAW), Wageningen, the Netherlands
                [7] Bioinformatics group, Wageningen University, the Netherlands
                Universität zu Köln, Germany
                Author notes

                I have read the journal's policy and the authors of this manuscript have the following competing interests: P.C. is currently an employee of Verily Life Sciences. M.H. is currently an employee of LifeMine Therapeutics. M.S.D. is a member of the Scientific Advisory Board of DeepBiome Therapeutics. M.A.F. is a cofounder and director of Federation Bio. M.H.M. is on the scientific advisory board of Hexagon Bio and co-founder of Design Pharmaceuticals.

                [¤] Current address: LifeMine Therapeutics, Cambridge, Massachusetts, United States of America

                Author information
                https://orcid.org/0000-0002-2802-7649
                https://orcid.org/0000-0003-3837-6137
                https://orcid.org/0000-0003-3447-5293
                https://orcid.org/0000-0002-9604-2912
                https://orcid.org/0000-0003-0341-1561
                https://orcid.org/0000-0002-2191-2821
                Article
                Manuscript ID: PBIOLOGY-D-20-01457
                DOI: 10.1371/journal.pbio.3001026
                PMCID: PMC7794033
                PMID: 33351797
                © 2020 Kloosterman et al

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 19 May 2020
                Accepted: 7 December 2020
                Page count
                Figures: 4, Tables: 4, Pages: 38
                Funding
                Funded by: Nederlandse Organisatie voor Wetenschappelijk Onderzoek (funder ID: http://dx.doi.org/10.13039/501100003246)
                Award ID: 731.014.206
                The work of AK was funded by a grant to GPvW from the Netherlands Organization for Scientific Research (NWO), project nr 731.014.206. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Methods and Resources
                Biology and Life Sciences > Genetics > Genomics
                Biology and Life Sciences > Biochemistry > Enzymology > Enzyme Precursors
                Biology and Life Sciences > Microbiology > Medical Microbiology > Microbial Pathogens > Fungal Pathogens > Streptomyces
                Medicine and Health Sciences > Pathology and Laboratory Medicine > Pathogens > Microbial Pathogens > Fungal Pathogens > Streptomyces
                Biology and Life Sciences > Mycology > Fungal Pathogens > Streptomyces
                Biology and Life Sciences > Biochemistry > Proteins > Protein Domains
                Physical Sciences > Chemistry > Chemical Compounds > Organic Compounds > Amino Acids > Sulfur Containing Amino Acids > Cysteine
                Physical Sciences > Chemistry > Organic Chemistry > Organic Compounds > Amino Acids > Sulfur Containing Amino Acids > Cysteine
                Biology and Life Sciences > Biochemistry > Proteins > Amino Acids > Sulfur Containing Amino Acids > Cysteine
                Biology and Life Sciences > Computational Biology > Genome Analysis
                Biology and Life Sciences > Genetics > Genomics > Genome Analysis
                Biology and Life Sciences > Computational Biology > Genome Analysis > Gene Prediction
                Biology and Life Sciences > Genetics > Genomics > Genome Analysis > Gene Prediction
                Medicine and Health Sciences > Clinical Medicine > Signs and Symptoms > Dehydration (Medicine)
                The source code of decRiPPter is freely available online at https://github.com/Alexamk/decRiPPter. Results of the data analysis are available online at https://decrippter.bioinformatics.nl. All training data and code used to generate these, as well as outputs of the data analyses, are available on Zenodo at doi: 10.5281/zenodo.3834818. NMR data will be made available on http://www.np-mrd.org/ when this database opens up for submissions.
