      Is Open Access

      A comparison of batch effect removal methods for enhancement of prediction performance using MAQC-II microarray gene expression data

      research-article


          Abstract

          Batch effects are systematic non-biological differences between batches (groups) of samples in microarray experiments, arising from causes such as differences in sample preparation and hybridization protocols. Previous work focused mainly on developing methods for effective batch effect removal. However, the impact of these methods on cross-batch prediction performance, one of the most important goals in microarray-based applications, has not been addressed. This paper uses a broad selection of data sets from the Microarray Quality Control Phase II (MAQC-II) effort, generated on three microarray platforms with different causes of batch effects, to assess the efficacy of their removal. Two data sets from cross-tissue and cross-platform experiments are also included. Across the 120 cases studied, using support vector machines (SVM) and K-nearest neighbors (KNN) as classifiers and the Matthews correlation coefficient (MCC) as the performance metric, we find that the Ratio-G, Ratio-A, EJLR, mean-centering and standardization methods perform better than or equivalent to no batch effect removal in 89, 85, 83, 79 and 75% of the cases, respectively, suggesting that applying these methods is generally advisable and that ratio-based methods are preferred.
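To make two of the methods and the metric named in the abstract concrete, here is a minimal Python sketch. It assumes the common textbook definitions: mean-centering subtracts each gene's per-batch mean, standardization additionally scales each gene to unit standard deviation within its batch, and MCC is computed from the binary confusion matrix. The function names, the samples-by-genes layout, and the toy data are illustrative assumptions and are not taken from the article. The ratio-based methods (Ratio-G, Ratio-A) and EJLR are not sketched because the abstract does not define them.

```python
import numpy as np

def mean_center(X, batches):
    """Subtract each gene's per-batch mean (rows = samples, columns = genes)."""
    X = np.asarray(X, dtype=float)
    batches = np.asarray(batches)
    out = np.empty_like(X)
    for b in np.unique(batches):
        idx = batches == b
        out[idx] = X[idx] - X[idx].mean(axis=0)
    return out

def standardize(X, batches):
    """Per batch, center each gene and scale it to unit standard deviation.

    Assumes every batch contains at least two samples.
    """
    X = np.asarray(X, dtype=float)
    batches = np.asarray(batches)
    out = np.empty_like(X)
    for b in np.unique(batches):
        idx = batches == b
        sd = X[idx].std(axis=0, ddof=1)
        sd[sd == 0] = 1.0  # constant genes stay at zero instead of dividing by zero
        out[idx] = (X[idx] - X[idx].mean(axis=0)) / sd
    return out

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels coded 0/1."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = int(np.sum(y_true & y_pred))
    tn = int(np.sum(~y_true & ~y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    # Convention assumed here: return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Toy data: two batches of two samples each, with a constant offset between batches.
X_toy = [[1.0, 10.0], [3.0, 14.0], [11.0, 20.0], [13.0, 24.0]]
batch_toy = [0, 0, 1, 1]
centered = mean_center(X_toy, batch_toy)
scaled = standardize(X_toy, batch_toy)
```

In this toy case the offset between the two batches disappears after mean-centering, so the corresponding rows of the two batches become identical. In a cross-batch prediction setting, a classifier would be trained on the adjusted samples of one batch and scored with `mcc` on another.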


                Author and article information

                Journal
                Pharmacogenomics J
                The Pharmacogenomics Journal
                Nature Publishing Group
                1470-269X
                1473-1150
                August 2010
                30 July 2010
                Volume: 10
                Issue: 4
                Pages: 278-291
                Affiliations
                [1] Systems Analytics Inc., Waltham, MA, USA
                [2] Novartis Pharma AG, NIBR, Biomarker Development Department, Basel, Switzerland
                [3] Spheromics, Kontiolahti, Finland
                [4] Department of Molecular Biology, Biomedical Research Foundation of the Academy of Athens and Department of Pharmacology, National and Kapodistrian University of Athens Medical School, Athens, Greece
                [5] CMINDS Research Center, Department of Electrical and Computer Engineering, University of Massachusetts at Lowell, Lowell, MA, USA
                [6] Almac Diagnostics, Craigavon, UK
                [7] Shanghai Information Center for Life Sciences, Chinese Academy of Sciences, Shanghai, China
                [8] Division of Systems Biology, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
                [9] College of Life Sciences, Northeast Forestry University, Harbin, Heilongjiang, China
                [10] Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC, USA
                [11] GeneGo Inc., St Joseph, MI, USA
                [12] The Hamner Institute of Health Sciences, Research Triangle Park, NC, USA
                [13] Clinical and Translational Sciences Institute, Northwestern University, Chicago, IL, USA
                [14] Department of Clinical Research, Riverside Cancer Care Center, Newport News, VA, USA
                [15] R&D Division, SABiosciences Corporation, Frederick, MD, USA
                [16] Myeloma Institute for Research and Therapy, University of Arkansas for Medical Sciences, Little Rock, AR, USA
                [17] Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, IL, USA
                [18] Department of Biological Sciences, University of Southern Mississippi, Hattiesburg, MS, USA
                [19] Z-Tech, an ICF International Company at National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
                [20] Biostatistics Branch, National Institute of Environmental Health Sciences, Research Triangle Park, NC, USA
                Author notes
                [*] Department of Bioinformatics, Systems Analytics, 55 Moody Street, Suite 21, Waltham, MA 02453, USA. E-mail: johnz@SystemsAnalytics.com
                Article
                tpj201057
                DOI: 10.1038/tpj.2010.57
                PMCID: PMC2920074
                PMID: 20676067
                30f8ac22-b628-44dd-98ef-aa60771af9aa
                Copyright © 2010 Macmillan Publishers Limited

                This work is licensed under the Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/

                History
                Received: 22 November 2009
                Revised: 23 May 2010
                Accepted: 24 May 2010
                Categories
                Original Article

                Pharmacology & Pharmaceutical medicine
                batch effect, MAQC-II, cross-batch prediction, microarray, batch effect removal, gene expression
