      A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor

          Abstract

          Single-cell RNA sequencing (scRNA-seq) is widely used to profile the transcriptome of individual cells. This provides biological resolution that cannot be matched by bulk RNA sequencing, at the cost of increased technical noise and data complexity. The differences between scRNA-seq and bulk RNA-seq data mean that the analysis of the former cannot be performed by recycling bioinformatics pipelines for the latter. Rather, dedicated single-cell methods are required at various steps to exploit the cellular resolution while accounting for technical noise. This article describes a computational workflow for low-level analyses of scRNA-seq data, based primarily on software packages from the open-source Bioconductor project. It covers basic steps including quality control, data exploration and normalization, as well as more complex procedures such as cell cycle phase assignment, identification of highly variable and correlated genes, clustering into subpopulations and marker gene detection. Analyses were demonstrated on gene-level count data from several publicly available datasets involving haematopoietic stem cells, brain-derived cells, T-helper cells and mouse embryonic stem cells. This will provide a range of usage scenarios from which readers can construct their own analysis pipelines.
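The first steps named in the abstract (quality control, normalization and identification of highly variable genes) can be illustrated with a minimal sketch. This is not the authors' code and does not use the Bioconductor packages the workflow relies on. It is a plain-Python illustration under stated assumptions: the count matrix is a toy example with made-up values, low-quality cells are flagged with a median-absolute-deviation (MAD) rule on log library size, normalization uses simple library-size factors, and genes are ranked by raw variance of log-expression with no modelling of technical noise. All function names are hypothetical.

```python
import math
import statistics

# Toy gene-by-cell count matrix (hypothetical values): rows are genes, columns are cells.
counts = {
    "GeneA": [10, 12, 0, 11, 9],
    "GeneB": [0, 1, 0, 0, 2],
    "GeneC": [50, 5, 1, 48, 6],
    "GeneD": [20, 22, 0, 19, 21],
}
n_cells = 5


def library_sizes(counts, n_cells):
    """Total count per cell, a common quality-control metric."""
    return [sum(row[j] for row in counts.values()) for j in range(n_cells)]


def low_outliers(values, n_mads=3.0):
    """Flag cells whose log10 value lies more than n_mads (unscaled) MADs below the median."""
    logs = [math.log10(v + 1) for v in values]
    med = statistics.median(logs)
    mad = statistics.median([abs(x - med) for x in logs])
    return [x < med - n_mads * mad for x in logs]


def log_normalize(counts, keep):
    """Divide counts by library-size factors (mean 1 across kept cells), then log2(x + 1)."""
    kept = {g: [c for c, k in zip(row, keep) if k] for g, row in counts.items()}
    sizes = library_sizes(kept, sum(keep))
    mean_size = statistics.mean(sizes)
    factors = [s / mean_size for s in sizes]
    logexpr = {
        g: [math.log2(c / f + 1) for c, f in zip(row, factors)]
        for g, row in kept.items()
    }
    return logexpr, factors


def rank_by_variance(logexpr):
    """Order genes from most to least variable (sample variance of log-expression)."""
    return sorted(logexpr, key=lambda g: statistics.variance(logexpr[g]), reverse=True)


# Quality control: drop cells with an unusually small library size.
flagged = low_outliers(library_sizes(counts, n_cells))
keep = [not f for f in flagged]

# Normalization and a naive ranking of highly variable genes.
logexpr, size_factors = log_normalize(counts, keep)
ranked_genes = rank_by_variance(logexpr)
```

In this toy example the third cell has almost no counts and is the one removed by the MAD rule. A real analysis would also consider other quality metrics and would separate technical from biological variance before calling a gene highly variable, which this sketch does not attempt.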

                Author and article information

                Journal
                F1000Research (F1000Res), London, UK
                ISSN: 2046-1402
                31 October 2016
                2016; 5: 2122
                Affiliations
                [1] Cancer Research UK Cambridge Institute, Cambridge, UK
                [2] EMBL European Bioinformatics Institute, Cambridge, UK
                [3] St Vincent’s Institute of Medical Research, Fitzroy, Australia
                [4] Wellcome Trust Sanger Institute, Cambridge, UK
                [1] Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY, USA
                [1] Clinical Bioinformatics laboratory, Imagine Institute, Paris Descartes University - Sorbonne Paris Cité, Paris, France
                Cancer Research UK Cambridge Research Institute, UK
                [1] Department of Computational Biology and Medical Sciences, University of Tokyo, Tokyo, Japan
                [1] Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
                [1] Institute of Molecular and Cell Biology, Agency for Science, Technology and Research, Singapore, Singapore
                Author notes

                A.T.L.L. developed and tested the workflow on all datasets. A.T.L.L. and D.J.M. implemented improvements to the software packages required by the workflow. J.C.M. provided direction to the software and workflow development. All authors wrote and approved the final manuscript.

                Competing interests: No competing interests were disclosed.

                Article
                DOI: 10.12688/f1000research.9501.2
                PMCID: PMC5112579
                PMID: 27909575
                6e24044f-bf9b-4a17-95c1-53c23008d1b6
                Copyright: © 2016 Lun ATL et al.

                This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History: 24 October 2016
                Funding
                Funded by: Cancer Research UK
                Award ID: A17197
                Funded by: National Health and Medical Research Council of Australia
                Funded by: EMBL
                A.T.L.L. and J.C.M. were supported by core funding from Cancer Research UK (award no. A17197). D.J.M. was supported by a CJ Martin Fellowship from the National Health and Medical Research Council of Australia. D.J.M. and J.C.M. were also supported by core funding from EMBL.
                The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Software Tool Article
                Articles
                Bioinformatics
                Genomics

                single cell, RNA-seq, bioinformatics, Bioconductor, workflow
