      A survey of best practices for RNA-seq data analysis

          RNA-sequencing (RNA-seq) has a wide variety of applications, but no single analysis pipeline can be used in all cases. We review all of the major steps in RNA-seq data analysis, including experimental design, quality control, read alignment, quantification of gene and transcript levels, visualization, differential gene expression, alternative splicing, functional analysis, gene fusion detection and eQTL mapping. We highlight the challenges associated with each step. We discuss the analysis of small RNAs and the integration of RNA-seq with other functional genomics techniques. Finally, we discuss the outlook for novel technologies that are changing the state of the art in transcriptomics.

          Electronic supplementary material

          The online version of this article (doi:10.1186/s13059-016-0881-8) contains supplementary material, which is available to authorized users.

                Author and article information

                Genome Biol
                Genome Biology
                BioMed Central (London)
                26 January 2016
                Volume: 17
                Institute for Food and Agricultural Sciences, Department of Microbiology and Cell Science, University of Florida, Gainesville, FL 32603 USA
                Centro de Investigación Príncipe Felipe, Genomics of Gene Expression Laboratory, 46012 Valencia, Spain
                Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA UK
                Wellcome Trust-Medical Research Council Cambridge Stem Cell Institute, Anne McLaren Laboratory for Regenerative Medicine, Department of Surgery, University of Cambridge, Cambridge, CB2 0SZ UK
                Department of Applied Statistics, Operations Research and Quality, Universidad Politécnica de Valencia, 46020 Valencia, Spain
                Unit of Computational Medicine, Department of Medicine, Karolinska Institutet, Karolinska University Hospital, 171 77 Stockholm, Sweden
                Center for Molecular Medicine, Karolinska Institutet, 17177 Stockholm, Sweden
                Unit of Clinical Epidemiology, Department of Medicine, Karolinska University Hospital, L8, 17176 Stockholm, Sweden
                Science for Life Laboratory, 17121 Solna, Sweden
                Systems Biology Laboratory, Institute of Biomedicine and Genome-Scale Biology Research Program, University of Helsinki, 00014 Helsinki, Finland
                School of Computing Science, Simon Fraser University, Burnaby, V5A 1S6 BC Canada
                Department of Bioinformatics, Institute of Molecular Biology and Biotechnology, Adam Mickiewicz University in Poznań, 61-614 Poznań, Poland
                Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, FI-20520 Turku, Finland
                Key Lab of Bioinformatics/Bioinformatics Division, TNLIST and Department of Automation, Tsinghua University, Beijing, 100084 China
                School of Life Sciences, Tsinghua University, Beijing, 100084 China
                Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA 92697-2300 USA
                Center for Complex Biological Systems, University of California, Irvine, Irvine, CA 92697 USA
                © Conesa et al. 2016

                Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

                Funded by: FundRef, Seventh Framework Programme
                Award ID: 36000 - STATegra
                Funded by: National Basic Research Program of China
                Award ID: 2012CB316504
                Funded by: JDRF
                Award ID: 2-2013-32
                Funded by: FundRef, Sigrid Juséliuksen Säätiö
                © The Author(s) 2016