      Is Open Access

      Assessment of computational methods for the analysis of single-cell ATAC-seq data

      research-article


          Abstract

          Background

          Recent innovations in single-cell Assay for Transposase Accessible Chromatin using sequencing (scATAC-seq) enable profiling of the epigenetic landscape of thousands of individual cells. scATAC-seq data analysis presents unique methodological challenges. Because scATAC-seq experiments sample DNA, which is present at low copy number (diploid in humans), the data are inherently sparse (1–10% of peaks detected per cell) compared with transcriptomic (scRNA-seq) data (10–45% of expressed genes detected per cell). Such challenges in data generation emphasize the need for informative features to assess cell heterogeneity at the chromatin level.
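To make the sparsity figures above concrete, the per-cell detection rate can be read off a cells-by-peaks count matrix as the share of peaks with at least one read. The following is a minimal sketch; the function name `fraction_detected` and the toy matrix are illustrative assumptions and do not come from the article.

```python
import numpy as np


def fraction_detected(counts: np.ndarray) -> np.ndarray:
    """Return, for each cell (row), the fraction of features (columns) with a nonzero count."""
    counts = np.asarray(counts)
    return (counts > 0).mean(axis=1)


# Toy cells-by-peaks matrix, made up for illustration only (2 cells, 10 peaks).
toy = np.array([
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 2, 0, 0, 1, 0, 0, 0],
])

# One value per cell; multiply by 100 to compare with the percentages quoted above.
detected = fraction_detected(toy)
print(detected)
```

Applied to a real scATAC-seq matrix, values in the 0.01 to 0.10 range would correspond to the 1–10% detection rate described in the abstract.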

          Results

          We present a benchmarking framework that is applied to 10 computational methods for scATAC-seq on 13 synthetic and real datasets from different assays, profiling cell types from diverse tissues and organisms. Methods for processing and featurizing scATAC-seq data were compared by their ability to discriminate cell types when combined with common unsupervised clustering approaches. We rank evaluated methods and discuss computational challenges associated with scATAC-seq analysis including inherently sparse data, determination of features, peak calling, the effects of sequencing coverage and noise, and clustering performance. Running times and memory requirements are also discussed.
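The evaluation idea described above (featurize the data, cluster without supervision, then compare clusters with known cell types) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name a featurization, clustering algorithm, or score, so truncated SVD, k-means, and the adjusted Rand index are stand-ins chosen here, and `simulate_cells`, `featurize`, and `score_clustering` are hypothetical helper names. The synthetic data are generated on the spot and are not the benchmark datasets.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics import adjusted_rand_score


def simulate_cells(n_per_type=100, n_peaks=500, seed=0):
    """Simulate a sparse binary cells-by-peaks matrix with two cell types (assumption)."""
    rng = np.random.default_rng(seed)
    half = n_peaks // 2
    # Type A mostly opens the first half of the peaks, type B the second half.
    p_a = np.r_[np.full(half, 0.10), np.full(n_peaks - half, 0.01)]
    p_b = p_a[::-1]
    probs = np.vstack([np.tile(p_a, (n_per_type, 1)),
                       np.tile(p_b, (n_per_type, 1))])
    matrix = (rng.random(probs.shape) < probs).astype(np.float64)
    labels = np.repeat([0, 1], n_per_type)
    return matrix, labels


def featurize(matrix, n_components=10, seed=0):
    """Stand-in featurization: project cells onto a few SVD components."""
    svd = TruncatedSVD(n_components=n_components, random_state=seed)
    return svd.fit_transform(matrix)


def score_clustering(features, labels, seed=0):
    """Cluster with k-means and score agreement with the known labels (ARI)."""
    n_clusters = len(np.unique(labels))
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    predicted = model.fit_predict(features)
    return adjusted_rand_score(labels, predicted)


matrix, labels = simulate_cells()
ari = score_clustering(featurize(matrix), labels)
print(f"Adjusted Rand index: {ari:.2f}")
```

Swapping `featurize` for another method while keeping the clustering and scoring steps fixed is one way to compare featurizations on equal footing, which is the spirit of the comparison described above.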

          Conclusions

          This reference summary of scATAC-seq methods offers recommendations for best practices with consideration for both the non-expert user and the methods developer. Despite variation across methods and datasets, SnapATAC, Cusanovich2018, and cisTopic outperform other methods in separating cell populations of different coverages and noise levels in both synthetic and real datasets. Notably, SnapATAC is the only method able to analyze a large dataset (> 80,000 cells).

                Author and article information

                Contributors
                lpinello@mgh.harvard.edu
                Journal
                Genome Biol
                Genome Biology
                BioMed Central (London)
                ISSN (print): 1474-7596
                ISSN (electronic): 1474-760X
                Published: 18 November 2019
                Volume: 20
                Article number: 241
                Affiliations
                [1] Molecular Pathology Unit, Massachusetts General Hospital Research Institute, Charlestown, MA 02129, USA (ISNI 0000 0004 0386 9924; GRID grid.32224.35)
                [2] Center for Cancer Research, Massachusetts General Hospital, Charlestown, MA 02129, USA (ISNI 0000 0004 0386 9924; GRID grid.32224.35)
                [3] Department of Pathology, Harvard Medical School, Boston, MA 02115, USA (ISNI 000000041936754X; GRID grid.38142.3c)
                [4] Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA (GRID grid.66859.34)
                [5] Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA 02138, USA (ISNI 000000041936754X; GRID grid.38142.3c)
                [6] Faculty of Biology, Computational Biology and Data Mining Lab, Johannes Gutenberg University of Mainz, 55128 Mainz, Germany (ISNI 0000 0001 1941 7111; GRID grid.5802.f)
                [7] Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02142, USA (ISNI 000000041936754X; GRID grid.38142.3c)
                Author information
                http://orcid.org/0000-0003-1109-3823
                Article
                1854
                DOI: 10.1186/s13059-019-1854-5
                PMCID: PMC6859644
                PMID: 31739806
                889eaf34-5497-4f70-a602-c1dbc3eef468
                © The Author(s). 2019

                Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 4 June 2019
                Accepted: 3 October 2019
                Funding
                Funded by: Chan Zuckerberg Initiative DAF
                Award ID: 2018-182734
                Funded by: National Human Genome Research Institute (FundRef http://dx.doi.org/10.13039/100000051)
                Award ID: R00HG008399
                Funded by: National Human Genome Research Institute (US)
                Award ID: R35HG010717
                Categories
                Research
                Genetics
                Keywords: scATAC-seq, feature matrix, benchmarking, regulatory genomics, clustering, visualization, featurization, dimensionality reduction
