Computational Methods for Single-Cell RNA Sequencing

Abstract

Single-cell RNA sequencing (scRNA-seq) has provided a high-dimensional catalog of millions of cells across species and diseases. These data have spurred the development of hundreds of computational tools to derive novel biological insights. Here, we outline the components of scRNA-seq analytical pipelines and the computational methods that underlie these steps. We describe available methods, highlight well-executed benchmarking studies, and identify opportunities for additional benchmarking studies and computational methods. As the biochemical approaches for single-cell omics advance, we propose coupled development of robust analytical pipelines suited for the challenges that new data present and principled selection of analytical methods that are suited for the biological questions to be addressed.

Author and article information

Journal

Title: Annual Review of Biomedical Data Science

Abbreviated Title: Annu. Rev. Biomed. Data Sci.

Publisher: Annual Reviews

ISSN (Print): 2574-3414

ISSN (Electronic): 2574-3414

Publication date Created: July 20 2020

Publication date (Print): July 20 2020

Volume: 3

Issue: 1

Pages: 339-364

Affiliations

[1] Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

[2] Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

[3] Ragon Institute of MGH, MIT, and Harvard, Cambridge, Massachusetts 02139, USA

[4] Program in Computational and Systems Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

[5] Department of Chemistry, Institute for Medical Engineering & Science (IMES), and Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

[6] Department of Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

Article

DOI: 10.1146/annurev-biodatasci-012220-100601
