A comparison across non-model animals suggests an optimal sequencing depth for de novo transcriptome assembly

There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

Abstract

Background

The lack of genomic resources can present challenges for studies of non-model organisms. Transcriptome sequencing offers an attractive method to gather information about genes and gene expression without the need for a reference genome. However, it is unclear what sequencing depth is adequate to assemble the transcriptome de novo for these purposes.

Results

We assembled transcriptomes of animals from six different phyla (Annelids, Arthropods, Chordates, Cnidarians, Ctenophores, and Molluscs) at regular increments of reads using Velvet/Oases and Trinity to determine how read count affects the assembly. This included an assembly of mouse heart reads because we could compare those against the reference genome that is available. We found qualitative differences in the assemblies of whole-animals versus tissues. With increasing reads, whole-animal assemblies show rapid increase of transcripts and discovery of conserved genes, while single-tissue assemblies show a slower discovery of conserved genes though the assembled transcripts were often longer. A deeper examination of the mouse assemblies shows that with more reads, assembly errors become more frequent but such errors can be mitigated with more stringent assembly parameters.

Conclusions

These assembly trends suggest that representative assemblies are generated with as few as 20 million reads for tissue samples and 30 million reads for whole-animals for RNA-level coverage. These depths provide a good balance between coverage and noise. Beyond 60 million reads, the discovery of new genes is low and sequencing errors of highly-expressed genes are likely to accumulate. Finally, siphonophores (polymorphic Cnidarians) are an exception and possibly require alternate assembly strategies.

Related collections

Most cited references 17

Record: found
Abstract: found
Article: found

Is Open Access

De Novo Assembly of Chickpea Transcriptome Using Short Reads for Gene Discovery and Marker Identification

Rohini Garg, Ravi K. Patel, Akhilesh K. Tyagi … (2011)

Chickpea ranks third among the food legume crops production in the world. However, the genomic resources available for chickpea are still very limited. In the present study, the transcriptome of chickpea was sequenced with short reads on Illumina Genome Analyzer platform. We have assessed the effect of sequence quality, various assembly parameters and assembly programs on the final assembly output. We assembled ∼107million high-quality trimmed reads using Velvet followed by Oases with optimal parameters into a non-redundant set of 53 409 transcripts (≥100 bp), representing about 28 Mb of unique transcriptome sequence. The average length of transcripts was 523 bp and N50 length of 900 bp with coverage of 25.7 rpkm (reads per kilobase per million). At the protein level, a total of 45 636 (85.5%) chickpea transcripts showed significant similarity with unigenes/predicted proteins from other legumes or sequenced plant genomes. Functional categorization revealed the conservation of genes involved in various biological processes in chickpea. In addition, we identified simple sequence repeat motifs in transcripts. The chickpea transcripts set generated here provides a resource for gene discovery and development of functional molecular markers. In addition, the strategy for de novo assembly of transcriptome data presented here will be helpful in other similar transcriptome studies.

0 comments Cited 200 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: found
Article: found

Is Open Access

Optimizing de novo transcriptome assembly from short-read RNA-Seq data: a comparative study

Qiong-Yi Zhao, Yi Wang, Yi-Meng Kong … (2011)

Background With the fast advances in nextgen sequencing technology, high-throughput RNA sequencing has emerged as a powerful and cost-effective way for transcriptome study. De novo assembly of transcripts provides an important solution to transcriptome analysis for organisms with no reference genome. However, there lacked understanding on how the different variables affected assembly outcomes, and there was no consensus on how to approach an optimal solution by selecting software tool and suitable strategy based on the properties of RNA-Seq data. Results To reveal the performance of different programs for transcriptome assembly, this work analyzed some important factors, including k-mer values, genome complexity, coverage depth, directional reads, etc. Seven program conditions, four single k-mer assemblers (SK: SOAPdenovo, ABySS, Oases and Trinity) and three multiple k-mer methods (MK: SOAPdenovo-MK, trans-ABySS and Oases-MK) were tested. While small and large k-mer values performed better for reconstructing lowly and highly expressed transcripts, respectively, MK strategy worked well for almost all ranges of expression quintiles. Among SK tools, Trinity performed well across various conditions but took the longest running time. Oases consumed the most memory whereas SOAPdenovo required the shortest runtime but worked poorly to reconstruct full-length CDS. ABySS showed some good balance between resource usage and quality of assemblies. Conclusions Our work compared the performance of publicly available transcriptome assemblers, and analyzed important factors affecting de novo assembly. Some practical guidelines for transcript reconstruction from short-read RNA-Seq data were proposed. De novo assembly of C. sinensis transcriptome was greatly improved using some optimized methods.

0 comments Cited 200 times – based on 0 reviews      Review now

Bookmark

Record: found
Abstract: found
Article: found

Is Open Access

Deep sequencing of the Camellia sinensis transcriptome revealed candidate genes for major metabolic pathways of tea-specific compounds

Cheng-Ying Shi, Hua Yang, Chao-Ling Wei … (2011)

Background Tea is one of the most popular non-alcoholic beverages worldwide. However, the tea plant, Camellia sinensis, is difficult to culture in vitro, to transform, and has a large genome, rendering little genomic information available. Recent advances in large-scale RNA sequencing (RNA-seq) provide a fast, cost-effective, and reliable approach to generate large expression datasets for functional genomic analysis, which is especially suitable for non-model species with un-sequenced genomes. Results Using high-throughput Illumina RNA-seq, the transcriptome from poly (A)+ RNA of C. sinensis was analyzed at an unprecedented depth (2.59 gigabase pairs). Approximate 34.5 million reads were obtained, trimmed, and assembled into 127,094 unigenes, with an average length of 355 bp and an N50 of 506 bp, which consisted of 788 contig clusters and 126,306 singletons. This number of unigenes was 10-fold higher than existing C. sinensis sequences deposited in GenBank (as of August 2010). Sequence similarity analyses against six public databases (Uniprot, NR and COGs at NCBI, Pfam, InterPro and KEGG) found 55,088 unigenes that could be annotated with gene descriptions, conserved protein domains, or gene ontology terms. Some of the unigenes were assigned to putative metabolic pathways. Targeted searches using these annotations identified the majority of genes associated with several primary metabolic pathways and natural product pathways that are important to tea quality, such as flavonoid, theanine and caffeine biosynthesis pathways. Novel candidate genes of these secondary pathways were discovered. Comparisons with four previously prepared cDNA libraries revealed that this transcriptome dataset has both a high degree of consistency with previous EST data and an approximate 20 times increase in coverage. Thirteen unigenes related to theanine and flavonoid synthesis were validated. Their expression patterns in different organs of the tea plant were analyzed by RT-PCR and quantitative real time PCR (qRT-PCR). Conclusions An extensive transcriptome dataset has been obtained from the deep sequencing of tea plant. The coverage of the transcriptome is comprehensive enough to discover all known genes of several major metabolic pathways. This transcriptome dataset can serve as an important public information platform for gene expression, genomics, and functional genomic studies in C. sinensis.

0 comments Cited 189 times – based on 0 reviews      Review now

Bookmark

All references

Author and article information

Contributors

Warren R Francis

Lynne M Christianson

Rainer Kiko

Meghan L Powers

Nathan C Shaner

Steven H D Haddock

Journal

Journal ID (nlm-ta): BMC Genomics

Journal ID (iso-abbrev): BMC Genomics

Title: BMC Genomics

Publisher: BioMed Central

ISSN (Electronic): 1471-2164

Publication date Collection: 2013

Publication date (Electronic): 12 March 2013

Volume: 14

Page: 167

Affiliations

[1 ]Monterey Bay Aquarium Research Institute, , 7700 Sandholdt Rd, Moss Landing, CA 95039, USA

[2 ]Department of Ocean Sciences, University of California Santa Cruz, Santa Cruz, CA, USA

[3 ]Helmholtz Center for Ocean Research Kiel, GEOMAR, , Hohenbergstr. 2, Kiel, 24105, Germany

[4 ]The Scintillon Institute, , 9924 Mesa Rim Rd., San Diego, CA, 92121, USA

Article

Publisher ID: 1471-2164-14-167

DOI: 10.1186/1471-2164-14-167

PMC ID: 3655071

PubMed ID: 23496952

SO-VID: 4f6777e3-5100-4a4a-8a00-5fd087cf2e6c

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

A comparison across non-model animals suggests an optimal sequencing depth for de novo transcriptome assembly

Read this article at

Abstract

Background

Results

Conclusions

Related collections

Ctenophores

Most cited references 17

De Novo Assembly of Chickpea Transcriptome Using Short Reads for Gene Discovery and Marker Identification

Optimizing de novo transcriptome assembly from short-read RNA-Seq data: a comparative study

Deep sequencing of the Camellia sinensis transcriptome revealed candidate genes for major metabolic pathways of tea-specific compounds

Author and article information

Contributors

Journal

Affiliations

Article

History

Categories

Comments

Comment on this article

Similar content 43

Cited by 42

Most referenced authors 1,095