Transcriptome walking: a laboratory-oriented GUI-based approach to mRNA identification from deep-sequenced data

Abstract

Background

Deep sequencing technology provides efficient and economical production of large numbers of randomly positioned, relatively short estimates of base identities in DNA molecules. Applying this technology to mRNA samples allows rapid examination of the transcriptome, the molecular genetic environment of individual cells or tissues. However, assembling such short sequences into complete mRNA is a challenge that limits the usefulness of the technology, particularly when little or no genomic data is available. Several approaches to this problem have been developed, but there is still no general method for rapidly obtaining an mRNA sequence from deep-sequence data when a specific molecule, or family of molecules, is of interest. A frequent requirement is to identify specific mRNA molecules from tissues that are being investigated by methods such as electrophysiology, immunocytology and pharmacology. To be widely useful, any approach must be relatively simple to use in the laboratory by operators without extensive statistical or bioinformatics knowledge, and must run on readily available hardware.

Findings

An approach was developed that allows de novo assembly of individual mRNA sequences in two linked stages: sequence discovery and sequence completion. Both stages rely on computer-assisted, Graphical User Interface (GUI)-guided user interaction with the data, but proceed relatively efficiently once discovery is complete. The method grows a discovered sequence in a series of steps by repeated passes through the complete raw data, and is hence termed ‘transcriptome walking’. All of the operations required for transcriptome analysis are combined in one program that presents a relatively simple user interface and runs on a standard desktop or laptop computer, taking advantage of multi-core processors when available. Complete mRNA sequence identifications usually require less than 24 hours. This approach has already identified previously unknown mRNA sequences in two animal species that currently lack any significant genome or transcriptome data.
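
The abstract does not describe the program's internals, so the following is only a minimal sketch of the general idea of growing a sequence by repeated passes through the raw reads. It is not the authors' implementation. It assumes error-tolerant majority voting, extension at the 3' end only, and reads on the same strand as the seed. The names `extend_once`, `walk_right`, `min_overlap` and `max_passes` are hypothetical.

```python
from collections import Counter


def extend_once(contig, reads, min_overlap):
    """One pass over all reads: return consensus bases extending the 3' end."""
    tail = contig[-min_overlap:]
    overhangs = []
    for read in reads:
        pos = read.find(tail)
        while pos != -1:
            end = pos + len(tail)
            k = min(end, len(contig))
            # Accept the read only if it agrees with the contig over the
            # whole overlap and has bases beyond the contig's current end.
            if read[end - k:end] == contig[-k:] and end < len(read):
                overhangs.append(read[end:])
            pos = read.find(tail, pos + 1)
    consensus = []
    longest = max((len(o) for o in overhangs), default=0)
    for i in range(longest):
        # Majority vote per column among overhangs that reach this column.
        column = Counter(o[i] for o in overhangs if len(o) > i)
        consensus.append(column.most_common(1)[0][0])
    return "".join(consensus)


def walk_right(seed, reads, min_overlap=6, max_passes=1000):
    """Grow `seed` by repeated passes until no read extends it further."""
    contig = seed
    for _ in range(max_passes):
        extension = extend_once(contig, reads, min_overlap)
        if not extension:
            break
        contig += extension
    return contig


if __name__ == "__main__":
    # Toy data (illustrative only): overlapping 12-base reads from one transcript.
    transcript = "ATGGCGTACGTTAGCCTAGGATCCGATTACA"
    toy_reads = [transcript[i:i + 12] for i in range(0, 20, 4)] + [transcript[19:]]
    print(walk_right(transcript[:8], toy_reads))
```

Each call to `extend_once` scans every read, which mirrors the "repeated passes through the complete raw data" only in spirit. A practical tool would also need to handle the 5' end, reverse-complement reads, sequencing errors and repeats.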

Conclusions

As deep-sequencing data becomes more widely available, accessible methods for extracting useful sequence information in the biological or medical laboratory will be of increasing importance. The approach described here does not rely on detailed knowledge of bioinformatics algorithms. It allows users with basic knowledge of molecular biology and standard laboratory computing equipment, but limited software or bioinformatics experience, to extract complete gene sequences from deep-sequencing data.

Author and article information

Journal

Journal ID (nlm-ta): BMC Res Notes

Journal ID (iso-abbrev): BMC Res Notes

Title: BMC Research Notes

Publisher: BioMed Central

ISSN (Electronic): 1756-0500

Publication date (Collection): 2012

Publication date (Electronic): 5 December 2012

Volume: 5

Page: 673

Affiliations

[1] Department of Physiology and Biophysics, Dalhousie University, PO Box 15000, Halifax, NS B3H 4R2, Canada

Article

Publisher ID: 1756-0500-5-673

DOI: 10.1186/1756-0500-5-673

PMC ID: 3538525

PubMed ID: 23217191

SO-VID: a363bc97-fe0f-45fb-ae13-fd30df92c6bf

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
