
      RNA-Seq Assembly – Are We There Yet?

      review-article


          Abstract

          Transcriptomic sequence resources represent invaluable assets for research, in particular for non-model species without a sequenced genome. To date, the Next Generation Sequencing technologies 454/Roche and Illumina have been used to generate transcriptome sequence databases by mRNA-Seq for more than fifty different plant species. While some of the databases were successfully used for downstream applications, such as proteomics, the assembly parameters indicate that the assemblies do not yet accurately reflect the actual plant transcriptomes. Two different assembly strategies have been used: overlap-consensus-based assemblers for long reads and Eulerian path/de Bruijn graph assemblers for short reads. In this review, we discuss the challenges and solutions to the transcriptome assembly problem. A list of quality control parameters and the scripts needed to produce them are provided.
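The de Bruijn graph strategy named in the abstract can be illustrated with a minimal sketch. It is not taken from the review or from any particular assembler: the function name `de_bruijn_graph`, the toy reads, and the choice k = 4 are all assumptions made for illustration. Each read is cut into overlapping k-mers, and every k-mer becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix.

```python
from collections import defaultdict

def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the list of (k-1)-mers that follow it in some read."""
    graph = defaultdict(list)
    for read in reads:
        # Slide a window of width k over the read, one base at a time.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge: prefix (first k-1 bases) -> suffix (last k-1 bases).
            graph[kmer[:-1]].append(kmer[1:])
    return graph

# Two hypothetical overlapping short reads drawn from "ATGGCGTGCA".
reads = ["ATGGCGT", "GCGTGCA"]
graph = de_bruijn_graph(reads, k=4)

for node in sorted(graph):
    print(node, "->", sorted(set(graph[node])))
```

In this toy case the two reads share the k-mer GCGT, so their paths join at the node GCG and the graph spells out the original sequence as a single path. Real assemblers add error correction, coverage handling, and graph simplification, none of which is shown here.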


          Most cited references (60)


          TIGR Gene Indices clustering tools (TGICL): a software system for fast clustering of large EST datasets.

          TGICL is a pipeline for analysis of large Expressed Sequence Tags (EST) and mRNA databases in which the sequences are first clustered based on pairwise sequence similarity, and then assembled by individual clusters (optionally with quality values) to produce longer, more complete consensus sequences. The system can run on multi-CPU architectures including SMP and PVM.

            BLAT--the BLAST-like alignment tool.

            W. Kent (2002)
            Analyzing vertebrate genomes requires rapid mRNA/DNA and cross-species protein alignments. A new tool, BLAT, is more accurate and 500 times faster than popular existing tools for mRNA/DNA alignments and 50 times faster for protein alignments at sensitivity settings typically used when comparing vertebrate sequences. BLAT's speed stems from an index of all nonoverlapping K-mers in the genome. This index fits inside the RAM of inexpensive computers, and need only be computed once for each genome assembly. BLAT has several major stages. It uses the index to find regions in the genome likely to be homologous to the query sequence. It performs an alignment between homologous regions. It stitches together these aligned regions (often exons) into larger alignments (typically genes). Finally, BLAT revisits small internal exons possibly missed at the first stage and adjusts large gap boundaries that have canonical splice sites where feasible. This paper describes how BLAT was optimized. Effects on speed and sensitivity are explored for various K-mer sizes, mismatch schemes, and number of required index matches. BLAT is compared with other alignment programs on various test sets and then used in several genome-wide applications. http://genome.ucsc.edu hosts a web-based BLAT server for the human genome.
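The index of nonoverlapping K-mers described in this abstract can be sketched as follows. This is only an illustration and not BLAT's implementation: the function names `build_index` and `seed_hits`, the toy sequences, and K = 4 are hypothetical, and scanning the query with overlapping K-mers is an assumption about how such an index would be used to find candidate homologous regions.

```python
from collections import defaultdict

def build_index(genome, k):
    """Index the non-overlapping k-mers of `genome`: k-mer -> start positions."""
    index = defaultdict(list)
    # Stepping by k means consecutive indexed k-mers never overlap.
    for start in range(0, len(genome) - k + 1, k):
        index[genome[start:start + k]].append(start)
    return index

def seed_hits(query, index, k):
    """Look up every overlapping k-mer of the query.

    Returns (query_position, genome_position) pairs for exact k-mer matches.
    """
    hits = []
    for q in range(len(query) - k + 1):
        for g in index.get(query[q:q + k], []):
            hits.append((q, g))
    return hits

genome = "ACGTTGCAAGGCTTAACCGG"   # toy sequence, not real data
index = build_index(genome, k=4)
hits = seed_hits("TTGCAAGGCT", index, k=4)
print(hits)
```

Both hits in this toy case lie on the same diagonal (genome position minus query position equals 3), which is the kind of clustering a later alignment stage could extend. The index is built once per genome and holds roughly one entry per K bases, which is why it can stay small.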

              High-throughput gene and SNP discovery in Eucalyptus grandis, an uncharacterized genome

              Background: Benefits from high-throughput sequencing using 454 pyrosequencing technology may be most apparent for species with high societal or economic value but few genomic resources. Rapid means of gene sequence and SNP discovery using this novel sequencing technology provide a set of baseline tools for genome-level research. However, it is questionable how effective the sequencing of large numbers of short reads for species with essentially no prior gene sequence information will support contig assemblies and sequence annotation.

              Results: With the purpose of generating the first broad survey of gene sequences in Eucalyptus grandis, the most widely planted hardwood tree species, we used 454 technology to sequence and assemble 148 Mbp of expressed sequences (EST). EST sequences were generated from a normalized cDNA pool comprised of multiple tissues and genotypes, promoting discovery of homologues to almost half of Arabidopsis genes, and a comprehensive survey of allelic variation in the transcriptome. By aligning the sequencing reads from multiple genotypes we detected 23,742 SNPs, 83% of which were validated in a sample. Genome-wide nucleotide diversity was estimated for 2,392 contigs using a modified theta (θ) parameter, adapted for measuring genetic diversity from polymorphisms detected by randomly sequencing a multi-genotype cDNA pool. Diversity estimates in non-synonymous nucleotides were on average 4x smaller than in synonymous, suggesting purifying selection. Non-synonymous to synonymous substitutions (Ka/Ks) among 2,001 contigs averaged 0.30 and was skewed to the right, further supporting that most genes are under purifying selection. Comparison of these estimates among contigs identified major functional classes of genes under purifying and diversifying selection in agreement with previous researches.

              Conclusion: In providing an abundance of foundational transcript sequences where limited prior genomic information existed, this work created part of the foundation for the annotation of the E. grandis genome that is being sequenced by the US Department of Energy. In addition we demonstrated that SNPs sampled in large-scale with 454 pyrosequencing can be used to detect evolutionary signatures among genes, providing one of the first genome-wide assessments of nucleotide diversity and Ka/Ks for a non-model plant species.

                Author and article information

                Journal
                Frontiers in Plant Science (Front. Plant Sci.)
                Publisher: Frontiers Research Foundation
                ISSN: 1664-462X
                Published: 25 September 2012
                Volume: 3
                Article: 220
                Affiliations
                [1] Center of Excellence on Plant Sciences (CEPLAS), Institute for Plant Biochemistry, Heinrich Heine University Düsseldorf, Germany
                [2] Center of Excellence on Plant Sciences (CEPLAS), Institute for Plant Developmental and Molecular Biology, Heinrich Heine University Düsseldorf, Germany
                Author notes

                Edited by: Bjoern Usadel, Rheinisch-Westfaelische Technische Hochschule Aachen University, Germany

                Reviewed by: Jose M. Jimenez-Gomez, Max Planck Institute for Plant Breeding, Germany; Marc Lohse, Max Planck Institute of Molecular Plant Physiology, Germany

                *Correspondence: Andrea Bräutigam, Institute for Plant Biochemistry, 26.03.01.Room 32, Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany. e-mail: andrea.braeutigam@uni-duesseldorf.de

                This article was submitted to Frontiers in Plant Systems Biology, a specialty of Frontiers in Plant Science.

                Article
                DOI: 10.3389/fpls.2012.00220
                PMC ID: 3457010
                PubMed ID: 23056003
                041f8707-d958-4341-8fec-f0166cec6870
                Copyright © 2012 Schliesky, Gowik, Weber and Bräutigam.

                This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and subject to any copyright notices concerning any third-party graphics etc.

                History
                Received: 06 August 2012
                Accepted: 05 September 2012
                Page count
                Figures: 2, Tables: 2, Equations: 0, References: 74, Pages: 12, Words: 10797
                Categories
                Plant Science
                Review Article

                Plant science & Botany
                plant, assembly, next generation sequencing, ngs, rna-seq, transcriptome
