1,319
views
0
recommends
+1 Recommend
0 collections
    41
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      TopHat: discovering splice junctions with RNA-Seq

      research-article
      1 , * , 2 ,   1
      Bioinformatics
      Oxford University Press

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Motivation: A new protocol for sequencing the messenger RNA in a cell, known as RNA-Seq, generates millions of short sequence fragments in a single run. These fragments, or ‘reads’, can be used to measure levels of gene expression and to identify novel splice variants of genes. However, current software for aligning RNA-Seq data to a genome relies on known splice junctions and cannot identify novel ones. TopHat is an efficient read-mapping algorithm designed to align reads from an RNA-Seq experiment to a reference genome without relying on known splice sites.

          Results: We mapped the RNA-Seq reads from a recent mammalian RNA-Seq experiment and recovered more than 72% of the splice junctions reported by the annotation-based software from that study, along with nearly 20 000 previously unreported junctions. The TopHat pipeline is much faster than previous systems, mapping nearly 2.2 million reads per CPU hour, which is sufficient to process an entire RNA-Seq experiment in less than a day on a standard desktop computer. We describe several challenges unique to ab initio splice site discovery from RNA-Seq reads that will require further algorithm development.

          Availability: TopHat is free, open-source software available from http://tophat.cbcb.umd.edu

          Contact: cole@ 123456cs.umd.edu

          Supplementary information: Supplementary data are available at Bioinformatics online.

          Related collections

          Most cited references10

          • Record: found
          • Abstract: found
          • Article: not found

          Whole-genome sequencing and variant discovery in C. elegans.

          Massively parallel sequencing instruments enable rapid and inexpensive DNA sequence data production. Because these instruments are new, their data require characterization with respect to accuracy and utility. To address this, we sequenced a Caernohabditis elegans N2 Bristol strain isolate using the Solexa Sequence Analyzer, and compared the reads to the reference genome to characterize the data and to evaluate coverage and representation. Massively parallel sequencing facilitates strain-to-reference comparison for genome-wide sequence variant discovery. Owing to the short-read-length sequences produced, we developed a revised approach to determine the regions of the genome to which short reads could be uniquely mapped. We then aligned Solexa reads from C. elegans strain CB4858 to the reference, and screened for single-nucleotide polymorphisms (SNPs) and small indels. This study demonstrates the utility of massively parallel short read sequencing for whole genome resequencing and for accurate discovery of genome-wide polymorphisms.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: found
            Is Open Access

            SeqAn An efficient, generic C++ library for sequence analysis

            Background The use of novel algorithmic techniques is pivotal to many important problems in life science. For example the sequencing of the human genome [1] would not have been possible without advanced assembly algorithms. However, owing to the high speed of technological progress and the urgent need for bioinformatics tools, there is a widening gap between state-of-the-art algorithmic techniques and the actual algorithmic components of tools that are in widespread use. Results To remedy this trend we propose the use of SeqAn, a library of efficient data types and algorithms for sequence analysis in computational biology. SeqAn comprises implementations of existing, practical state-of-the-art algorithmic components to provide a sound basis for algorithm testing and development. In this paper we describe the design and content of SeqAn and demonstrate its use by giving two examples. In the first example we show an application of SeqAn as an experimental platform by comparing different exact string matching algorithms. The second example is a simple version of the well-known MUMmer tool rewritten in SeqAn. Results indicate that our implementation is very efficient and versatile to use. Conclusion We anticipate that SeqAn greatly simplifies the rapid development of new bioinformatics tools by providing a collection of readily usable, well-designed algorithmic components which are fundamental for the field of sequence analysis. This leverages not only the implementation of new algorithms, but also enables a sound analysis and comparison of existing algorithms.
              Bookmark
              • Record: found
              • Abstract: not found
              • Article: not found

              Replacing suffix trees with enhanced suffix arrays

                Bookmark

                Author and article information

                Journal
                Bioinformatics
                bioinformatics
                bioinfo
                Bioinformatics
                Oxford University Press
                1367-4803
                1460-2059
                1 May 2009
                16 March 2009
                16 March 2009
                : 25
                : 9
                : 1105-1111
                Affiliations
                1Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742 and 2Department of Mathematics, University of California, Berkeley, CA 94720, USA
                Author notes
                *To whom correspondence should be addressed.

                Associate Editor: Ivo Hofacker

                Article
                btp120
                10.1093/bioinformatics/btp120
                2672628
                19289445
                01882866-8513-4e09-93e6-c337d6ece027
                © 2009 The Author(s)

                This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License ( http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                : 23 October 2008
                : 24 February 2009
                : 26 February 2009
                Categories
                Original Papers
                Sequence Analysis

                Bioinformatics & Computational biology
                Bioinformatics & Computational biology

                Comments

                Comment on this article