
      Removing batch effects from purified plasma cell gene expression microarrays with modified ComBat

      research-article


          Abstract

          Background

          Gene expression profiling (GEP) via microarray analysis is a widely used tool for assessing risk and other patient diagnostics in clinical settings. However, non-biological factors such as systematic changes in sample preparation, differences in scanners, and other potential batch effects are often unavoidable in long-term studies and meta-analyses. To reduce the impact of batch effects on microarray data, Johnson, Rabinovic, and Li developed ComBat for use when combining batches of gene expression microarray data.

          We propose a modification to ComBat that centers data to the location and scale of a pre-determined, ‘gold-standard’ batch. This modified ComBat (M-ComBat) is designed specifically for meta-analysis and batch-effect adjustment with predictive models that are validated and fixed on historical data from a ‘gold-standard’ batch.
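          The idea of centering every batch to a reference batch can be sketched roughly as follows. This is a simplified location-and-scale adjustment for a single probe, not the authors' implementation: it omits ComBat's empirical Bayes shrinkage and covariate model, and the function name `center_to_reference` and its arguments are hypothetical.

```python
from statistics import mean, pstdev


def center_to_reference(values, batches, reference):
    """Rescale one probe so every batch matches the reference batch.

    values: expression values for a single probe, one per sample
    batches: batch label for each sample
    reference: label of the 'gold-standard' batch (hypothetical argument)
    """
    ref_vals = [v for v, b in zip(values, batches) if b == reference]
    ref_mean, ref_sd = mean(ref_vals), pstdev(ref_vals)

    # Per-batch mean and standard deviation for this probe.
    stats = {}
    for label in set(batches):
        batch_vals = [v for v, b in zip(values, batches) if b == label]
        stats[label] = (mean(batch_vals), pstdev(batch_vals))

    adjusted = []
    for v, b in zip(values, batches):
        b_mean, b_sd = stats[b]
        # Standardize within the sample's own batch, then map the result
        # onto the reference batch's location and scale.
        z = (v - b_mean) / b_sd if b_sd > 0 else 0.0
        adjusted.append(ref_mean + ref_sd * z)
    return adjusted
```

          With this kind of adjustment the reference batch is left essentially unchanged, which is the property that lets a model fixed on that batch keep its original cutoffs.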

          Results

          We combined data from MIRT across two batches (‘Old’ and ‘New’ Kit sample preparation) as well as external data sets from the HOVON-65/GMMG-HD4 and MRC-IX trials into a combined set, first without transformation and then with both ComBat and M-ComBat transformations. Fixed and validated gene risk signatures developed at MIRT on the Old Kit standard (GEP5, GEP70, and GEP80 risk scores) were compared across these combined data sets.

          Both ComBat and M-ComBat eliminated all of the differences among probes caused by systematic batch effects: over 98% of untransformed probes differed significantly by ANOVA at a 0.01 q-value threshold, and this fell to zero significant probes with either ComBat or M-ComBat. The mean and distribution of risk scores, as well as the proportion of high-risk subjects identified, agreed with the ‘gold-standard’ batch more closely with M-ComBat than with ComBat. The performance of risk scores improved overall using either ComBat or M-ComBat; however, using M-ComBat with the original, optimal risk cutoffs gave greater ability in our study to identify smaller cohorts of high-risk subjects.
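          To see why the location of the adjusted data matters for a fixed cutoff, consider the toy sketch below. The scores, the shift, and the cutoff are all made-up illustrative values, and `high_risk_fraction` is a hypothetical helper, not part of any published risk model.

```python
def high_risk_fraction(scores, cutoff):
    """Fraction of subjects whose risk score exceeds a fixed cutoff."""
    return sum(s > cutoff for s in scores) / len(scores)


# Toy risk scores on the reference ('gold-standard') scale.
reference_scores = [-0.4, 0.1, 0.3, 0.9, 1.2]

# The same subjects after a purely technical shift of +0.5,
# standing in for an uncorrected batch effect.
shifted_scores = [s + 0.5 for s in reference_scores]

cutoff = 0.75  # arbitrary illustrative cutoff, fixed in advance

reference_fraction = high_risk_fraction(reference_scores, cutoff)
shifted_fraction = high_risk_fraction(shifted_scores, cutoff)
```

          Here the shifted scores classify more subjects as high risk even though nothing biological has changed, so a cutoff validated on the reference batch only behaves as intended if new data are returned to that batch's location and scale.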

          Conclusion

          M-ComBat is a practical modification to an accepted method that offers greater power to control the location and scale of batch-effect adjusted data. M-ComBat allows for historical models to function as intended on future samples despite known, often unavoidable systematic changes to gene expression data.


          Most cited references (19)


          A validated gene expression model of high-risk multiple myeloma is defined by deregulated expression of genes mapping to chromosome 1.

          To molecularly define high-risk disease, we performed microarray analysis on tumor cells from 532 newly diagnosed patients with multiple myeloma (MM) treated on 2 separate protocols. Using log-rank tests of expression quartiles, 70 genes, 30% mapping to chromosome 1 (P < .001), were linked to early disease-related death. Importantly, most up-regulated genes mapped to chromosome 1q, and down-regulated genes mapped to chromosome 1p. The ratio of mean expression levels of up-regulated to down-regulated genes defined a high-risk score present in 13% of patients with shorter durations of complete remission, event-free survival, and overall survival (training set: hazard ratio [HR], 5.16; P < .001; test cohort: HR, 4.75; P < .001). The high-risk score also was an independent predictor of outcome endpoints in multivariate analysis (P < .001) that included the International Staging System and high-risk translocations. In a comparison of paired baseline and relapse samples, the high-risk score frequency rose to 76% at relapse and predicted short postrelapse survival (P < .05). Multivariate discriminant analysis revealed that a 17-gene subset could predict outcome as well as the 70-gene model. Our data suggest that altered transcriptional regulation of genes mapping to chromosome 1 may contribute to disease progression, and that expression profiling can be used to identify high-risk disease and guide therapeutic interventions.

            Removing Batch Effects in Analysis of Expression Microarray Data: An Evaluation of Six Batch Adjustment Methods

            The expression microarray is a frequently used approach to study gene expression on a genome-wide scale. However, the data produced by the thousands of microarray studies published annually are confounded by “batch effects,” the systematic error introduced when samples are processed in multiple batches. Although batch effects can be reduced by careful experimental design, they cannot be eliminated unless the whole study is done in a single batch. A number of programs are now available to adjust microarray data for batch effects prior to analysis. We systematically evaluated six of these programs using multiple measures of precision, accuracy and overall performance. ComBat, an Empirical Bayes method, outperformed the other five programs by most metrics. We also showed that it is essential to standardize expression data at the probe level when testing for correlation of expression profiles, due to a sizeable probe effect in microarray data that can inflate the correlation among replicates and unrelated samples.

              Global gene expression profiling of multiple myeloma, monoclonal gammopathy of undetermined significance, and normal bone marrow plasma cells.

              Bone marrow plasma cells (PCs) from 74 patients with newly diagnosed multiple myeloma (MM), 5 with monoclonal gammopathy of undetermined significance (MGUS), and 31 healthy volunteers (normal PCs) were purified by CD138+ selection. Gene expression of purified PCs and 7 MM cell lines was profiled using high-density oligonucleotide microarrays interrogating about 6800 genes. On hierarchical clustering analysis, normal and MM PCs were differentiated and 4 distinct subgroups of MM (MM1, MM2, MM3, and MM4) were identified. The expression pattern of MM1 was similar to normal PCs and MGUS, whereas MM4 was similar to MM cell lines. Clinical parameters linked to poor prognosis, abnormal karyotype (P = .002) and high serum β2-microglobulin levels (P = .0005), were most prevalent in MM4. Also, genes involved in DNA metabolism and cell cycle control were overexpressed in a comparison of MM1 and MM4. In addition, using χ² and Wilcoxon rank sum tests, 120 novel candidate disease genes were identified that discriminate normal and malignant PCs (P < .0001); many are involved in adhesion, apoptosis, cell cycle, drug resistance, growth arrest, oncogenesis, signaling, and transcription. A total of 156 genes, including FGFR3 and CCND1, exhibited highly elevated ("spiked") expression in at least 4 of the 74 MM cases (range, 4-25 spikes). Elevated expression of these 2 genes was caused by the translocation t(4;14)(p16;q32) or t(11;14)(q13;q32). Thus, novel candidate MM disease genes have been identified using gene expression profiling and this profiling has led to the development of a gene-based classification system for MM.

                Author and article information

                Contributors
                CKStein@uams.edu
                pingping@crab.org
                EpsteinJoshua@uams.edu
                AFBuros@uams.edu
                adamr@crab.org
                johnc@crab.org
                GJMorgan@uams.edu
                BarlogieBart@uams.edu
                Journal
                BMC Bioinformatics
                BioMed Central (London)
                ISSN: 1471-2105
                Published: 25 February 2015
                Volume: 16
                Issue: 1
                Article number: 63
                Affiliations
                Myeloma Institute for Research and Therapy, University of Arkansas for Medical Sciences, Little Rock, AR, USA
                Cancer Research and Biostatistics, Seattle, WA, USA
                Article
                478
                10.1186/s12859-015-0478-3
                4355992
                25591917
                © Stein et al.; licensee BioMed Central. 2015

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                : 8 January 2015
                : 27 January 2015
                Categories
                Methodology Article

                Bioinformatics & Computational biology
                microarray analysis, gene expression profiling (GEP), batch effect, meta-analysis, multiple myeloma (MM), ComBat, M-ComBat
