
      A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor


          Abstract

          Single-cell RNA sequencing (scRNA-seq) is widely used to profile the transcriptome of individual cells. This provides biological resolution that cannot be matched by bulk RNA sequencing, at the cost of increased technical noise and data complexity. The differences between scRNA-seq and bulk RNA-seq data mean that the analysis of the former cannot be performed by recycling bioinformatics pipelines for the latter. Rather, dedicated single-cell methods are required at various steps to exploit the cellular resolution while accounting for technical noise. This article describes a computational workflow for low-level analyses of scRNA-seq data, based primarily on software packages from the open-source Bioconductor project. It covers basic steps including quality control, data exploration and normalization, as well as more complex procedures such as cell cycle phase assignment, identification of highly variable and correlated genes, clustering into subpopulations and marker gene detection. Analyses were demonstrated on gene-level count data from several publicly available datasets involving haematopoietic stem cells, brain-derived cells, T-helper cells and mouse embryonic stem cells. This will provide a range of usage scenarios from which readers can construct their own analysis pipelines.
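The first steps named in the abstract (quality control, normalization, and identification of highly variable genes) can be illustrated with a minimal sketch. This is not the workflow's own code: the workflow is built on Bioconductor packages, whereas the sketch below uses plain Python, simple library-size normalization, and a raw variance ranking as stand-ins. The function names, the toy count matrix, and the thresholds `min_total` and `min_genes` are all hypothetical and chosen only for illustration.

```python
import math

def library_sizes(counts):
    """Total count per cell; `counts` holds one row per gene, one column per cell."""
    return [sum(col) for col in zip(*counts)]

def detected_genes(counts):
    """Number of genes with a non-zero count in each cell."""
    return [sum(1 for c in col if c > 0) for col in zip(*counts)]

def filter_cells(counts, min_total, min_genes):
    """Drop cells with a small library or few detected genes (hypothetical thresholds)."""
    totals = library_sizes(counts)
    genes = detected_genes(counts)
    keep = [j for j in range(len(totals))
            if totals[j] >= min_total and genes[j] >= min_genes]
    return [[row[j] for j in keep] for row in counts], keep

def size_factors(counts):
    """Library-size factors scaled to have mean 1 (a simplification)."""
    totals = library_sizes(counts)
    mean_total = sum(totals) / len(totals)
    return [t / mean_total for t in totals]

def log_normalize(counts):
    """log2(count / size factor + 1) for every gene and cell."""
    sf = size_factors(counts)
    return [[math.log2(c / s + 1) for c, s in zip(row, sf)] for row in counts]

def rank_by_variance(logexpr):
    """Gene indices ordered from most to least variable across cells."""
    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    variances = [variance(row) for row in logexpr]
    return sorted(range(len(variances)), key=lambda i: variances[i], reverse=True)

# Toy gene-by-cell count matrix (4 genes, 4 cells); values are made up.
counts = [
    [10, 0, 12, 1],
    [5, 0, 6, 0],
    [0, 1, 20, 0],
    [8, 0, 2, 0],
]
filtered, kept_cells = filter_cells(counts, min_total=10, min_genes=2)
factors = size_factors(filtered)
ranking = rank_by_variance(log_normalize(filtered))
```

In this toy example the two cells with a total count of 1 are discarded before normalization, so they do not distort the size factors of the remaining cells. A real analysis would replace each step with the dedicated single-cell methods the abstract calls for.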


          Most cited references 32


          RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome

           Bo Li,  Colin Dewey (2011)
          Background RNA-Seq is revolutionizing the way transcript abundances are measured. A key challenge in transcript quantification from RNA-Seq data is the handling of reads that map to multiple genes or isoforms. This issue is particularly important for quantification with de novo transcriptome assemblies in the absence of sequenced genomes, as it is difficult to determine which transcripts are isoforms of the same gene. A second significant issue is the design of RNA-Seq experiments, in terms of the number of reads, read length, and whether reads come from one or both ends of cDNA fragments. Results We present RSEM, a user-friendly software package for quantifying gene and isoform abundances from single-end or paired-end RNA-Seq data. RSEM outputs abundance estimates, 95% credibility intervals, and visualization files and can also simulate RNA-Seq data. In contrast to other existing tools, the software does not require a reference genome. Thus, in combination with a de novo transcriptome assembler, RSEM enables accurate transcript quantification for species without sequenced genomes. On simulated and real data sets, RSEM has superior or comparable performance to quantification methods that rely on a reference genome. Taking advantage of RSEM's ability to effectively use ambiguously-mapping reads, we show that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads. On the other hand, estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene. Conclusions RSEM is an accurate and user-friendly software tool for quantifying transcript abundances from RNA-Seq data. As it does not rely on the existence of a reference genome, it is particularly useful for quantification with de novo transcriptome assemblies. In addition, RSEM has enabled valuable guidance for cost-efficient design of quantification experiments with RNA-Seq, which is currently relatively expensive.

            Near-optimal probabilistic RNA-seq quantification.

            We present kallisto, an RNA-seq quantification program that is two orders of magnitude faster than previous approaches and achieves similar accuracy. Kallisto pseudoaligns reads to a reference, producing a list of transcripts that are compatible with each read while avoiding alignment of individual bases. We use kallisto to analyze 30 million unaligned paired-end RNA-seq reads in <10 min on a standard laptop computer. This removes a major computational bottleneck in RNA-seq analysis.
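The idea of listing the transcripts compatible with a read, without aligning individual bases, can be sketched in a few lines. The example below is a deliberately simplified illustration and not kallisto's implementation: it uses a plain k-mer hash map and set intersection, treats any read containing an unseen k-mer as incompatible with everything, and ignores strand, sequencing errors, and the abundance estimation step. The function names, sequences, and the choice of `k = 4` are hypothetical.

```python
def build_kmer_index(transcripts, k):
    """Map every k-mer to the set of transcript names that contain it."""
    index = {}
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(name)
    return index

def pseudoalign(read, index, k):
    """Return the transcripts compatible with all k-mers of the read."""
    compatible = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k])
        if hits is None:
            # Simplifying assumption: an unseen k-mer makes the read unassignable.
            return set()
        compatible = set(hits) if compatible is None else compatible & hits
        if not compatible:
            return set()
    # Reads shorter than k never enter the loop and are left unassigned.
    return compatible if compatible is not None else set()

# Toy transcriptome; t1 and t2 share a common prefix, as two isoforms might.
transcripts = {
    "t1": "ACGTACGGTCA",
    "t2": "ACGTACGGATT",
    "t3": "TTTTGGGGCCCC",
}
k = 4
index = build_kmer_index(transcripts, k)

shared = pseudoalign("CGTACGG", index, k)    # lies in the shared prefix
unique = pseudoalign("ACGGTC", index, k)     # extends into the t1-specific part
unknown = pseudoalign("AAAAAA", index, k)    # matches no transcript
```

The read from the shared prefix stays compatible with both `t1` and `t2`, while the read that extends into the `t1`-specific sequence narrows to `t1` alone. Resolving such ambiguous compatibility sets into abundance estimates is a separate probabilistic step that this sketch does not attempt.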

              featureCounts: An efficient general-purpose program for assigning sequence reads to genomic features

               (2013)
              Next-generation sequencing technologies generate millions of short sequence reads, which are usually aligned to a reference genome. In many applications, the key information required for downstream analysis is the number of reads mapping to each genomic feature, for example to each exon or each gene. The process of counting reads is called read summarization. Read summarization is required for a great variety of genomic analyses but has so far received relatively little attention in the literature. We present featureCounts, a read summarization program suitable for counting reads generated from either RNA or genomic DNA sequencing experiments. featureCounts implements highly efficient chromosome hashing and feature blocking techniques. It is considerably faster than existing methods (by an order of magnitude for gene-level summarization) and requires far less computer memory. It works with either single or paired-end reads and provides a wide range of options appropriate for different sequencing applications. featureCounts is available under GNU General Public License as part of the Subread (http://subread.sourceforge.net) or Rsubread (http://www.bioconductor.org) software packages.
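Read summarization, as defined in the abstract above, amounts to counting the reads that map to each genomic feature. The sketch below shows the basic operation on toy data. It is not featureCounts code and does not use its chromosome hashing or feature blocking: features are simply grouped by chromosome in a dictionary and scanned linearly. Coordinates are assumed to be 1-based and inclusive, a read is assigned when it overlaps exactly one feature by at least one base, and reads overlapping several features or none are left unassigned; these rules, the function name, and the data are assumptions of the sketch.

```python
from collections import defaultdict

def summarize_reads(features, reads):
    """Count reads per feature.

    features: iterable of (name, chrom, start, end), 1-based inclusive.
    reads:    iterable of (chrom, start, end) for already aligned reads.
    Returns (counts per feature name, number of unassigned reads).
    """
    by_chrom = defaultdict(list)
    for name, chrom, start, end in features:
        by_chrom[chrom].append((start, end, name))

    counts = {name: 0 for name, _, _, _ in features}
    unassigned = 0
    for chrom, r_start, r_end in reads:
        # Two closed intervals overlap when each starts before the other ends.
        hits = {name for start, end, name in by_chrom.get(chrom, [])
                if r_start <= end and start <= r_end}
        if len(hits) == 1:
            counts[hits.pop()] += 1
        else:
            # No overlap, or ambiguous overlap with several features.
            unassigned += 1
    return counts, unassigned

# Toy annotation and alignments; names and coordinates are made up.
features = [
    ("geneA", "chr1", 100, 200),
    ("geneB", "chr1", 180, 300),
    ("geneC", "chr2", 50, 150),
]
reads = [
    ("chr1", 110, 160),   # inside geneA only
    ("chr1", 190, 195),   # overlaps geneA and geneB
    ("chr1", 250, 290),   # inside geneB only
    ("chr2", 140, 170),   # partially overlaps geneC
    ("chr3", 1, 50),      # chromosome without features
    ("chr1", 10, 99),     # ends just before geneA
]
gene_counts, n_unassigned = summarize_reads(features, reads)
```

The linear scan is adequate for a handful of features but scales poorly, which is the problem the indexing techniques mentioned in the abstract are meant to address.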

                Author and article information

                Journal
                F1000Research (F1000Res)
                F1000Research (London, UK)
                ISSN: 2046-1402
                31 October 2016
                Volume 5 (2016)
                Affiliations
                [1] Cancer Research UK Cambridge Institute, Cambridge, UK
                [2] EMBL European Bioinformatics Institute, Cambridge, UK
                [3] St Vincent’s Institute of Medical Research, Fitzroy, Australia
                [4] Wellcome Trust Sanger Institute, Cambridge, UK
                [1] Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY, USA
                [1] Clinical Bioinformatics laboratory, Imagine Institute, Paris Descartes University - Sorbonne Paris Cité, Paris, France
                Cancer Research UK Cambridge Research Institute, UK
                [1] Department of Computational Biology and Medical Sciences, University of Tokyo, Tokyo, Japan
                [1] Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
                [1] Institute of Molecular and Cell Biology, Agency for Science, Technology and Research, Singapore, Singapore
                Author notes

                A.T.L.L. developed and tested the workflow on all datasets. A.T.L.L. and D.J.M. implemented improvements to the software packages required by the workflow. J.C.M. provided direction to the software and workflow development. All authors wrote and approved the final manuscript.

                Competing interests: No competing interests were disclosed.

                Article
                DOI: 10.12688/f1000research.9501.2
                PMCID: PMC5112579
                PMID: 27909575
                Copyright: © 2016 Lun ATL et al.

                This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                Funding
                Funded by: Cancer Research UK
                Award ID: A17197
                Funded by: National Health and Medical Research Council of Australia
                Funded by: EMBL
                A.T.L.L. and J.C.M. were supported by core funding from Cancer Research UK (award no. A17197). D.J.M. was supported by a CJ Martin Fellowship from the National Health and Medical Research Council of Australia. D.J.M. and J.C.M. were also supported by core funding from EMBL.
                The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Software Tool Article
                Articles
                Bioinformatics
                Genomics

                single cell, rna-seq, bioinformatics, bioconductor, workflow
