      SCHEMA: A general framework for integrating heterogeneous single-cell modalities

      Preprint
      bioRxiv

          Abstract

          Advances in single-cell technologies now allow researchers to simultaneously measure diverse biological data modalities, including transcriptomic, proteomic, or spatial features. Sophisticated cross-modality analysis promises unprecedented and more complete insight into complex biology. Unfortunately, current methods for single-cell RNA-seq data integration focus on aligning patterns across multiple experiments and are not designed for the setting in which a single experiment simultaneously measures multiple modalities. We therefore propose Schema, a general, scalable, and flexible framework for integrating diverse modalities profiled within a single experimental setting. Schema has intuitive parameters that facilitate exploratory analysis and complements standard techniques like PCA and CCA (canonical correlation analysis). We first process each modality to compute a similarity (equivalently, distance) measure between cells. Schema is a framework for calibrating the agreement between these modality-specific distance measures. Given a primary dataset (i.e., modality) and an arbitrary number of secondary datasets, we search for an affine transformation of the primary dataset so that distances in the transformed space have the maximum possible agreement with the secondary datasets. Crucially, the user can calibrate this by specifying a limit on how much the primary dataset can be distorted; methods like CCA lack this control. Mathematically, we quantify the concept of agreement between datasets as the correlation between their respective sets of pairwise squared distances and offer some theoretical justification for this design choice. Our optimization problem can be formulated as a quadratic program (QP), allowing for a fast algorithm with intuitive free parameters and an appealing feature-weighting interpretation.
Using Schema, we integrated spatial and transcriptomic modalities from the Slide-Seq experiment [1] and identified a gene set in granule cells whose expression covaries with the spatial density of the cells. We investigated a multi-modal dataset from 10x Genomics [2], analyzing the selection pressure on residues in the CDR3 region of T-cell receptors. We also used Schema to improve UMAP and t-SNE visualizations by infusing metadata into RNA-seq information. As the number and diversity of multi-modal datasets grow, Schema will be an essential component in the single-cell analyst’s arsenal.
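
The agreement measure described in the abstract (the correlation between two modalities' pairwise squared distances) can be illustrated with a small NumPy sketch. This is not the schema_learn implementation or its API. The function names, the synthetic data, and the hand-picked feature weights are all assumptions made for illustration, and no quadratic program is solved here. The transformation is restricted to non-negative per-feature weights, one reading of the feature-weighting interpretation mentioned above. The `retained` value shows one plausible way to quantify how far the primary dataset has been distorted, which is likewise an assumption.

```python
import numpy as np


def pairwise_sq_dists(X, weights=None):
    """Squared Euclidean distances for all unordered cell pairs (i < j).

    If `weights` is given, feature k is scaled by sqrt(weights[k]), so its
    squared difference contributes weights[k] times as much to the distance.
    """
    X = np.asarray(X, dtype=float)
    if weights is not None:
        X = X * np.sqrt(np.asarray(weights, dtype=float))
    sq_norms = (X ** 2).sum(axis=1)
    D = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    upper = np.triu_indices(X.shape[0], k=1)
    # Clip tiny negative values caused by floating-point round-off.
    return np.maximum(D[upper], 0.0)


def agreement(primary, secondary, weights=None):
    """Pearson correlation between the two sets of pairwise squared distances."""
    d_primary = pairwise_sq_dists(primary, weights)
    d_secondary = pairwise_sq_dists(secondary)
    return float(np.corrcoef(d_primary, d_secondary)[0, 1])


# Synthetic example (assumption): 200 cells, 5 primary features, and a
# one-dimensional secondary modality that tracks primary feature 0 only.
rng = np.random.default_rng(0)
primary = rng.normal(size=(200, 5))
secondary = primary[:, [0]] + 0.1 * rng.normal(size=(200, 1))

# Hand-picked weights (assumption) that emphasise feature 0.
w = [4.0, 0.25, 0.25, 0.25, 0.25]

uniform = agreement(primary, secondary)
weighted = agreement(primary, secondary, weights=w)

# Correlation between original and reweighted primary distances: a possible
# measure of how much of the primary dataset's structure is retained.
retained = float(
    np.corrcoef(pairwise_sq_dists(primary), pairwise_sq_dists(primary, w))[0, 1]
)

print(f"agreement with uniform weights: {uniform:.3f}")
print(f"agreement with reweighting:     {weighted:.3f}")
print(f"primary structure retained:     {retained:.3f}")
```

Because the weighted squared distance is linear in the weights, searching for weights that raise the agreement while keeping the retained correlation above a user-chosen floor is the kind of trade-off the abstract describes. The sketch only evaluates one fixed choice of weights rather than optimizing them.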

          Availability

          Our implementation is freely available as the Python package schema_learn and code is available at http://schema.csail.mit.edu.

          Author and article information

          Journal
          bioRxiv
          November 08 2019
          Article
          10.1101/834549
          © 2019
          Quantitative & Systems biology, Biophysics
