      Is Open Access

      Removal of Batch Effects using Distribution-Matching Residual Networks

      Preprint

          Abstract

          Real-world datasets naturally contain variability that originates in the varying conditions of measurement instruments and adds to the variability of the physical phenomena of interest. An important example is biological data, where differences in the calibration of the measuring instrument can be an obstacle to valid and effective statistical analysis, for example when comparing measurements obtained from two identical instruments, or even from the same instrument at different times. In this manuscript we propose a deep learning approach for overcoming such calibration differences by learning a map that matches two multi-dimensional distributions measured in two different batches. We apply our method to real-world CyTOF and single-cell RNA-seq datasets, which are notoriously subject to instrument calibration effects, and demonstrate its effectiveness in calibrating identical replicated samples by removing batch effects that cannot be eliminated satisfactorily using existing methods. Furthermore, our experiments suggest that a network trained on replicates of one sample measured in two different batches can effectively calibrate data from other samples, and hence eliminate batch effects.
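The idea of learning a map that matches two batch distributions can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the abstract: the data are synthetic 2-D Gaussians, the batch effect is a simulated per-dimension gain and offset, the "map" is a simple per-dimension affine fit rather than the residual network the title refers to, and the mismatch between batches is scored with a Gaussian-kernel maximum mean discrepancy (MMD), chosen here only as one common distribution-distance measure. The helper names (`simulate_batch_effect`, `fit_affine_calibration`, and so on) are hypothetical.

```python
import math
import random
import statistics


def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel between two points given as equal-length tuples."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared MMD between two samples of points."""
    def mean_kernel(ps, qs):
        total = sum(gaussian_kernel(p, q, bandwidth) for p in ps for q in qs)
        return total / (len(ps) * len(qs))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def simulate_batch_effect(point):
    """Hypothetical batch effect: a fixed gain and offset per dimension."""
    return (1.3 * point[0] + 0.8, 0.7 * point[1] - 0.5)


def fit_affine_calibration(source, target):
    """Per-dimension (scale, shift) matching source mean/std to target mean/std."""
    params = []
    for dim in range(len(source[0])):
        s = [p[dim] for p in source]
        t = [p[dim] for p in target]
        scale = statistics.pstdev(t) / statistics.pstdev(s)
        shift = statistics.fmean(t) - scale * statistics.fmean(s)
        params.append((scale, shift))
    return params


def apply_calibration(points, params):
    """Apply the fitted per-dimension affine map to every point."""
    return [tuple(scale * v + shift for v, (scale, shift) in zip(p, params))
            for p in points]


def draw(rng, mean, std, n):
    """Draw n independent 2-D Gaussian points with a shared mean and std."""
    return [(rng.gauss(mean, std), rng.gauss(mean, std)) for _ in range(n)]


rng = random.Random(0)

# Replicates of one sample: batch 2 is an independent draw with the batch effect.
replicate_b1 = draw(rng, 0.0, 1.0, 150)
replicate_b2 = [simulate_batch_effect(p) for p in draw(rng, 0.0, 1.0, 150)]

# Fit the calibration map on the replicates only (batch 2 -> batch 1).
params = fit_affine_calibration(replicate_b2, replicate_b1)

mmd_before = mmd_squared(replicate_b1, replicate_b2)
mmd_after = mmd_squared(replicate_b1, apply_calibration(replicate_b2, params))

# A different sample measured in the same two batches, calibrated with the
# map learned from the replicates.
other_b1 = draw(rng, 2.0, 0.5, 150)
other_b2 = [simulate_batch_effect(p) for p in draw(rng, 2.0, 0.5, 150)]

other_before = mmd_squared(other_b1, other_b2)
other_after = mmd_squared(other_b1, apply_calibration(other_b2, params))

print(f"replicates:   MMD^2 before={mmd_before:.4f} after={mmd_after:.4f}")
print(f"other sample: MMD^2 before={other_before:.4f} after={other_after:.4f}")
```

In this toy setting the affine map is sufficient because the simulated batch effect is itself affine; real calibration differences need not be, which is presumably why a more expressive learned map is of interest.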

                Author and article information

                Date: 2016-10-13
                arXiv: 1610.04181

                http://arxiv.org/licenses/nonexclusive-distrib/1.0/

                stat.ML

                Machine learning