Chromosome‐level genome assembly of Prunella vulgaris L. provides insights into pentacyclic triterpenoid biosynthesis

SUMMARY

Prunella vulgaris is one of the best‐selling and most widely used medicinal herbs. The Chinese Pharmacopoeia records it as a leading medicine for cleansing and protecting the liver, and it has been a main constituent of many herbal tea formulas in China for centuries. It is also a traditional folk medicine in Europe and in other Asian countries. Pentacyclic triterpenoids are a major class of bioactive compounds produced in P. vulgaris, but their biosynthetic mechanism remains to be elucidated. Here, we report a chromosome‐level reference genome of P. vulgaris assembled by combining Illumina, ONT, and Hi‐C technologies. The assembly is 671.95 Mb in size, with a scaffold N50 of 49.10 Mb and a BUSCO completeness of 98.45%. About 98.31% of the sequence was anchored to 14 pseudochromosomes. Comparative genome analysis revealed a recent whole‐genome duplication (WGD) in P. vulgaris. Genome‐wide analysis identified 35 932 protein‐coding genes (PCGs), of which 59 encode enzymes involved in 2,3‐oxidosqualene biosynthesis. In addition, 10 PvOSC, 358 PvCYP, and 177 PvUGT genes were identified, of which 5 PvOSCs, 25 PvCYPs, and 9 PvUGTs were predicted to be involved in the biosynthesis of pentacyclic triterpenoids. Biochemical activity assays of recombinant PvOSC2, PvOSC4, and PvOSC6 proteins showed that they are a mixed amyrin synthase (MAS), a lupeol synthase (LUS), and a β‐amyrin synthase (BAS), respectively. These results provide a solid foundation for further elucidating the biosynthetic mechanism of pentacyclic triterpenoids in P. vulgaris.
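For readers unfamiliar with the contiguity metric quoted above, the following is a minimal sketch of how a scaffold N50 is conventionally computed: the length of the scaffold at which the running total of lengths, taken from longest to shortest, first reaches half of the assembly size. The function name `scaffold_n50` and the toy lengths are illustrative assumptions, not the authors' pipeline or the actual P. vulgaris scaffolds.

```python
def scaffold_n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bp).

    Hypothetical helper for illustration only; assembly tools report
    this statistic directly.
    """
    if not lengths:
        raise ValueError("no scaffold lengths given")
    ordered = sorted(lengths, reverse=True)   # longest scaffold first
    half_total = sum(ordered) / 2             # half of the assembly size
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:             # first scaffold to cross 50%
            return length


# Toy lengths (arbitrary units), NOT the P. vulgaris scaffolds.
toy_lengths = [50, 40, 30, 20, 10]
print(scaffold_n50(toy_lengths))
```

A large N50 relative to the assembly size, as reported here, indicates that most of the sequence sits in a small number of long scaffolds, which is what one would expect when nearly all of it is anchored to 14 pseudochromosomes.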

Significance Statement

This study reports the first chromosome‐level reference genome for Prunella vulgaris, which is also the first for the genus Prunella. Fifty‐nine genes related to 2,3‐oxidosqualene biosynthesis and 39 PvOSC, PvCYP, and PvUGT genes related to pentacyclic triterpenoid biosynthesis were systematically analyzed. PvOSC2, PvOSC4, and PvOSC6 were experimentally verified to be a mixed amyrin synthase, a lupeol synthase, and a β‐amyrin synthase, respectively. The results provide a solid foundation for further elucidating the biosynthetic mechanism of bioactive compounds in the widely used medicinal plant P. vulgaris.

Author and article information

Contributors

Shanfa Lu

Journal

Title: The Plant Journal

Abbreviated Title: The Plant Journal

Publisher: Wiley

ISSN (Print): 0960-7412

ISSN (Electronic): 1365-313X

Publication date Created: May 2024

Publication date (Electronic): January 16 2024

Publication date (Print): May 2024

Volume: 118

Issue: 3

Pages: 731-752

Affiliations

[1] Key Lab of Chinese Medicine Resources Conservation, State Administration of Traditional Chinese Medicine of the People's Republic of China, Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100193, China

[2] Engineering Research Center of Chinese Medicine Resource, Ministry of Education, Beijing 100193, China

Article

DOI: 10.1111/tpj.16629

SO-VID: 1a205e7d-a408-4eb3-b392-7289cb96d188

License:

http://onlinelibrary.wiley.com/termsAndConditions#vor
