97
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: not found

      Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms

      research-article
      1 , 2 , 3 , 1 , *
      Nature biotechnology

      Read this article at

      ScienceOpenPublisherPMC
      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          We introduce Sailfish, a computational method for quantifying the abundance of previously annotated RNA isoforms from RNA-seq data. Because Sailfish entirely avoids mapping reads, a time-consuming step in all current methods, it provides quantification estimates much faster than do existing approaches (typically 20 times faster) without loss of accuracy. By facilitating frequent reanalysis of data and reducing the need to optimize parameters, Sailfish exemplifies the potential of lightweight algorithms for efficiently processing sequencing reads.

          Related collections

          Most cited references13

          • Record: found
          • Abstract: found
          • Article: not found

          Trinity: reconstructing a full-length transcriptome without a genome from RNA-Seq data

          Massively-parallel cDNA sequencing has opened the way to deep and efficient probing of transcriptomes. Current approaches for transcript reconstruction from such data often rely on aligning reads to a reference genome, and are thus unsuitable for samples with a partial or missing reference genome. Here, we present the Trinity methodology for de novo full-length transcriptome reconstruction, and evaluate it on samples from fission yeast, mouse, and whitefly – an insect whose genome has not yet been sequenced. Trinity fully reconstructs a large fraction of the transcripts present in the data, also reporting alternative splice isoforms and transcripts from recently duplicated genes. In all cases, Trinity performs better than other available de novo transcriptome assembly programs, and its sensitivity is comparable to methods relying on genome alignments. Our approach provides a unified and general solution for transcriptome reconstruction in any sample, especially in the complete absence of a reference genome.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: found
            Is Open Access

            NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy

            The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of genomic, transcript and protein sequence records. These records are selected and curated from public sequence archives and represent a significant reduction in redundancy compared to the volume of data archived by the International Nucleotide Sequence Database Collaboration. The database includes over 16 000 organisms, 2.4 × 106 genomic records, 13 × 106 proteins and 2 × 106 RNA records spanning prokaryotes, eukaryotes and viruses (RefSeq release 49, September 2011). The RefSeq database is maintained by a combined approach of automated analyses, collaboration and manual curation to generate an up-to-date representation of the sequence, its features, names and cross-links to related sources of information. We report here on recent growth, the status of curating the human RefSeq data set, more extensive feature annotation and current policy for eukaryotic genome annotation via the NCBI annotation pipeline. More information about the resource is available online (see http://www.ncbi.nlm.nih.gov/RefSeq/).
              Bookmark
              • Record: found
              • Abstract: found
              • Article: not found

              Streaming fragment assignment for real-time analysis of sequencing experiments

              We present eXpress, a software package for highly efficient probabilistic assignment of ambiguously mapping sequenced fragments. eXpress uses a streaming algorithm with linear run time and constant memory use. It can determine abundances of sequenced molecules in real time, and can be applied to ChIP-seq, metagenomics and other large-scale sequencing data. We demonstrate its use on RNA-seq data, showing greater efficiency than other quantification methods.
                Bookmark

                Author and article information

                Journal
                9604648
                20305
                Nat Biotechnol
                Nat. Biotechnol.
                Nature biotechnology
                1087-0156
                1546-1696
                16 April 2014
                20 April 2014
                May 2014
                01 November 2014
                : 32
                : 5
                : 462-464
                Affiliations
                [1 ]Lane Center for Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
                [2 ]Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, USA
                [3 ]Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, USA
                Author notes
                [* ]Corresponding author carlk@ 123456cs.cmu.edu
                Article
                NIHMS572812
                10.1038/nbt.2862
                4077321
                24752080
                c7fecda4-bd67-4aca-8de5-1fe3484a6961
                History
                Categories
                Article

                Biotechnology
                Biotechnology

                Comments

                Comment on this article