      SCANPY: large-scale single-cell gene expression data analysis

      research-article

          Abstract

          Scanpy is a scalable toolkit for analyzing single-cell gene expression data. It includes methods for preprocessing, visualization, clustering, pseudotime and trajectory inference, differential expression testing, and simulation of gene regulatory networks. Its Python-based implementation efficiently deals with data sets of more than one million cells (https://github.com/theislab/Scanpy). Along with Scanpy, we present AnnData, a generic class for handling annotated data matrices (https://github.com/theislab/anndata).
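
To make the idea of an "annotated data matrix" concrete, here is a minimal pure-Python sketch. It is not AnnData's implementation or API: the class name `AnnotatedMatrix`, the method `subset_obs`, and the toy values are all hypothetical, chosen only for illustration. It assumes rows are cells (observations) and columns are genes (variables), and shows the one property such a container needs: annotations stay aligned with the matrix when cells are filtered.

```python
from dataclasses import dataclass


@dataclass
class AnnotatedMatrix:
    """Hypothetical toy container, NOT the real AnnData class."""

    X: list    # n_obs x n_vars values, one inner list per cell
    obs: dict  # per-cell annotations: name -> list of length n_obs
    var: dict  # per-gene annotations: name -> list of length n_vars

    def __post_init__(self):
        # Reject annotations whose length does not match the matrix.
        n_obs = len(self.X)
        for name, values in self.obs.items():
            if len(values) != n_obs:
                raise ValueError(f"obs[{name!r}] must have length {n_obs}")
        if self.X:  # the column count is only known when a row exists
            n_vars = len(self.X[0])
            for name, values in self.var.items():
                if len(values) != n_vars:
                    raise ValueError(f"var[{name!r}] must have length {n_vars}")

    @property
    def shape(self):
        return (len(self.X), len(self.X[0]) if self.X else 0)

    def subset_obs(self, keep):
        """Return a new container holding only rows where keep[i] is true."""
        idx = [i for i, flag in enumerate(keep) if flag]
        return AnnotatedMatrix(
            X=[list(self.X[i]) for i in idx],
            obs={k: [v[i] for i in idx] for k, v in self.obs.items()},
            var={k: list(v) for k, v in self.var.items()},
        )


# Toy data: 3 cells x 3 genes with one annotation on each axis.
adata = AnnotatedMatrix(
    X=[[1, 0, 3], [0, 2, 0], [4, 0, 1]],
    obs={"cell_type": ["T", "B", "T"]},
    var={"gene": ["g1", "g2", "g3"]},
)
t_cells = adata.subset_obs([ct == "T" for ct in adata.obs["cell_type"]])
```

Filtering returns a new object rather than mutating the original, so the unfiltered data remain available for later steps. A real implementation would use array-backed storage instead of nested lists.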


                Author and article information

                Contributors
                alex.wolf@helmholtz-muenchen.de
                fabian.theis@helmholtz-muenchen.de
                Journal
                Genome Biology (Genome Biol)
                BioMed Central (London)
                ISSN: 1474-7596, 1474-760X
                6 February 2018
                Volume 19, Article 15 (2018)
                Affiliations
                [1] ISNI 0000 0004 0483 2525, GRID grid.4567.0, Helmholtz Zentrum München – German Research Center for Environmental Health, Institute of Computational Biology, Munich, Neuherberg, Germany
                [2] ISNI 0000000123222966, GRID grid.6936.a, Department of Mathematics, Technische Universität München, Munich, Germany
                Author information
                http://orcid.org/0000-0002-8760-7838
                Article
                Publisher article ID: 1382
                DOI: 10.1186/s13059-017-1382-0
                PMC ID: 5802054
                PMID: 29409532
                © The Author(s) 2018

                Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 16 August 2017
                Accepted: 20 December 2017
                Funding
                Funded by: Helmholtz-Gemeinschaft
                Award ID: Helmholtz Postdoc Grant
                Categories
                Software
                Genetics
                single-cell transcriptomics, machine learning, scalability, graph analysis, clustering, pseudotemporal ordering, trajectory inference, differential expression testing, visualization, bioinformatics
