SCANPY: large-scale single-cell gene expression data analysis

Abstract

Scanpy is a scalable toolkit for analyzing single-cell gene expression data. It includes methods for preprocessing, visualization, clustering, pseudotime and trajectory inference, differential expression testing, and simulation of gene regulatory networks. Its Python-based implementation efficiently deals with data sets of more than one million cells (https://github.com/theislab/Scanpy). Along with Scanpy, we present AnnData, a generic class for handling annotated data matrices (https://github.com/theislab/anndata).
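To make the idea of an "annotated data matrix" concrete, here is a minimal sketch in plain Python. It is not the AnnData API: the class name `TinyAnnotatedMatrix`, its method `subset_cells`, and the toy values are all hypothetical and chosen only for illustration. The sketch shows the core idea of keeping per-cell and per-gene annotations aligned with the rows and columns of an expression matrix, so that subsetting the matrix subsets the annotations too.

```python
from dataclasses import dataclass


@dataclass
class TinyAnnotatedMatrix:
    """Hypothetical toy container; an illustration, NOT the AnnData API."""

    X: list    # expression values: one row per cell, one column per gene
    obs: dict  # per-cell annotations: name -> list with one entry per cell
    var: dict  # per-gene annotations: name -> list with one entry per gene

    def __post_init__(self):
        n_cells = len(self.X)
        n_genes = len(self.X[0]) if self.X else None
        # Every row must have the same number of gene columns.
        if any(len(row) != n_genes for row in self.X):
            raise ValueError("all rows of X must have the same length")
        # Per-cell annotations must line up with the rows of X.
        for name, column in self.obs.items():
            if len(column) != n_cells:
                raise ValueError(f"obs[{name!r}] does not match the number of cells")
        # Per-gene annotations must line up with the columns of X
        # (only checkable when X has at least one row).
        if n_genes is not None:
            for name, column in self.var.items():
                if len(column) != n_genes:
                    raise ValueError(f"var[{name!r}] does not match the number of genes")

    @property
    def shape(self):
        """(number of cells, number of genes)."""
        return (len(self.X), len(self.X[0]) if self.X else 0)

    def subset_cells(self, keep):
        """Return a new container holding only the cells where keep[i] is true."""
        idx = [i for i, flag in enumerate(keep) if flag]
        return TinyAnnotatedMatrix(
            X=[list(self.X[i]) for i in idx],
            obs={name: [col[i] for i in idx] for name, col in self.obs.items()},
            var={name: list(col) for name, col in self.var.items()},
        )


# Toy data: 3 cells x 3 genes, with one annotation per axis.
adata = TinyAnnotatedMatrix(
    X=[[5, 0, 2], [0, 3, 1], [7, 1, 0]],
    obs={"cell_type": ["T", "B", "T"]},
    var={"gene_name": ["CD3E", "CD79A", "GAPDH"]},
)

# Selecting cells by an annotation keeps matrix rows and labels in sync.
t_cells = adata.subset_cells([label == "T" for label in adata.obs["cell_type"]])
```

The design point is that the matrix and its annotations travel together, so a filtering step cannot leave labels and rows out of sync. The actual AnnData class is documented at the repository linked in the abstract.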

Most cited references 16

Fast unfolding of communities in large networks

Renaud Lambiotte, Etienne Lefebvre, Vincent D Blondel … (2008)

We propose a simple method to extract the community structure of large networks. Our method is a heuristic method that is based on modularity optimization. It is shown to outperform all other known community detection methods in terms of computation time. Moreover, the quality of the communities detected is very good, as measured by the so-called modularity. This is shown first by identifying language communities in a Belgian mobile phone network of 2.6 million customers and by analyzing a web graph of 118 million nodes and more than one billion links. The accuracy of our algorithm is also verified on ad-hoc modular networks.
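The abstract above names modularity as the quantity being optimized but does not spell it out. As a minimal sketch, assuming an undirected, unweighted graph given as an edge list without self-loops, the standard modularity of a given partition can be computed as the sum over communities of (fraction of edges inside the community) minus (fraction of edge endpoints in the community) squared. The function name `modularity` and the toy graph are illustrative assumptions. The code only scores a partition; it is not the optimization heuristic the paper proposes.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to its community label.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph with no edges")

    intra_edges = defaultdict(int)  # edges with both endpoints in the community
    degree_sum = defaultdict(int)   # total degree of the community's nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            intra_edges[cu] += 1

    # Q = sum_c [ L_c / m - (d_c / 2m)^2 ]
    return sum(
        intra_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

# Partition that separates the two triangles.
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
# Partition that puts every node in one community.
merged = {node: "all" for node in range(6)}

q_split = modularity(edges, split)    # 5/14, about 0.357
q_merged = modularity(edges, merged)  # 0.0
```

Separating the two triangles scores higher than lumping all nodes together, which matches the intuition that modularity rewards densely connected groups with few links between them.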

A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor

Aaron Lun, Davis J. McCarthy, John Marioni … (2016)

Single-cell RNA sequencing (scRNA-seq) is widely used to profile the transcriptome of individual cells. This provides biological resolution that cannot be matched by bulk RNA sequencing, at the cost of increased technical noise and data complexity. The differences between scRNA-seq and bulk RNA-seq data mean that the analysis of the former cannot be performed by recycling bioinformatics pipelines for the latter. Rather, dedicated single-cell methods are required at various steps to exploit the cellular resolution while accounting for technical noise. This article describes a computational workflow for low-level analyses of scRNA-seq data, based primarily on software packages from the open-source Bioconductor project. It covers basic steps including quality control, data exploration and normalization, as well as more complex procedures such as cell cycle phase assignment, identification of highly variable and correlated genes, clustering into subpopulations and marker gene detection. Analyses were demonstrated on gene-level count data from several publicly available datasets involving haematopoietic stem cells, brain-derived cells, T-helper cells and mouse embryonic stem cells. This will provide a range of usage scenarios from which readers can construct their own analysis pipelines.

The NumPy array: a structure for efficient numerical computation

Gael Varoquaux, S. Chris Colbert, Stefan van der Walt (2011)

In the Python world, NumPy arrays are the standard representation for numerical data. Here, we show how these arrays enable efficient implementation of numerical computations in a high-level language. Overall, three techniques are applied to improve performance: vectorizing calculations, avoiding copying data in memory, and minimizing operation counts. We first present the NumPy array structure, then show how to use it for efficient computation, and finally how to share array data with other libraries.
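Two of the techniques listed in this abstract, vectorizing calculations and avoiding copies, can be illustrated with a short sketch. It assumes NumPy is installed; the variable names and toy values are illustrative only and are not taken from the paper.

```python
import numpy as np

values = np.arange(10, dtype=np.float64)  # 0.0, 1.0, ..., 9.0

# Element-by-element Python loop: one interpreted step per element.
loop_total = 0.0
for x in values:
    loop_total += x * x

# Vectorized equivalent: the whole sum of squares runs in compiled code.
vector_total = float(np.dot(values, values))

# Basic slicing returns a view onto the same memory rather than a copy.
evens = values[::2]
evens_is_view = np.shares_memory(values, evens)

# Writing through the view changes the original array.
evens[0] = 100.0
first_value_after_write = values[0]

# An explicit copy owns its own memory and is independent of the original.
copied = values[::2].copy()
copy_is_view = np.shares_memory(values, copied)
```

Views save memory and time on large arrays, at the cost that writes through a view are visible in the original, so an explicit `.copy()` is the safer choice when independent data is needed.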

Author and article information

Contributors

F. Alexander Wolf: alex.wolf@helmholtz-muenchen.de (ORCID: http://orcid.org/0000-0002-8760-7838)

Fabian J. Theis: fabian.theis@helmholtz-muenchen.de

Journal

Journal ID (nlm-ta): Genome Biol

Journal ID (iso-abbrev): Genome Biol

Title: Genome Biology

Publisher: BioMed Central (London)

ISSN (Print): 1474-7596

ISSN (Electronic): 1474-760X

Publication date (Electronic): 6 February 2018

Publication date PMC-release: 6 February 2018

Publication date Collection: 2018

Volume: 19

Electronic Location Identifier: 15

Affiliations

[1] ISNI 0000 0004 0483 2525, GRID grid.4567.0, Helmholtz Zentrum München – German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Munich, Germany

[2] ISNI 0000000123222966, GRID grid.6936.a, Department of Mathematics, Technische Universität München, Munich, Germany

Article

Publisher ID: 1382

DOI: 10.1186/s13059-017-1382-0

PMC ID: 5802054

PubMed ID: 29409532

SO-VID: d2acbbfd-83ad-4fa7-ab5c-c2dec1b04f9c

License:

Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

History

Date received: 16 August 2017

Date accepted: 20 December 2017

Funding

Funded by: Helmholtz-Gemeinschaft

Award ID: Helmholtz Postdoc Grant

Custom metadata

ScienceOpen disciplines: Genetics

Keywords: single-cell transcriptomics, machine learning, scalability, graph analysis, clustering, pseudotemporal ordering, trajectory inference, differential expression testing, visualization, bioinformatics
