
      Fast approximate inference for variable selection in Dirichlet process mixtures, with an application to pan-cancer proteomics

      Preprint


          Abstract

          The Dirichlet Process (DP) mixture model has become a popular choice for model-based clustering, largely because it allows the number of clusters to be inferred. The sequential updating and greedy search (SUGS) algorithm (Wang and Dunson, 2011) was proposed as a fast method for performing approximate Bayesian inference in DP mixture models, by posing clustering as a Bayesian model selection (BMS) problem and avoiding the use of computationally costly Markov chain Monte Carlo methods. Here we consider how this approach may be extended to permit variable selection for clustering, and also demonstrate the benefits of Bayesian model averaging (BMA) in place of BMS. Through an array of simulation examples and well-studied examples from cancer transcriptomics, we show that our method performs competitively with the current state-of-the-art, while also offering computational benefits. We apply our approach to reverse-phase protein array (RPPA) data from The Cancer Genome Atlas (TCGA) in order to perform a pan-cancer proteomic characterisation of 5,157 tumour samples. We have implemented our approach, together with the original SUGS algorithm, in an open-source R package named sugsvarsel, which accelerates analysis by performing intensive computations in C++ and provides automated parallel processing. The R package is freely available from: https://github.com/ococrook/sugsvarsel
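The sequential, greedy allocation idea described in the abstract can be illustrated with a heavily simplified sketch. This is not the sugsvarsel implementation and not the full SUGS algorithm: it assumes one-dimensional data, a Gaussian likelihood with known variance `sigma2`, a conjugate normal prior on cluster means (`mu0`, `tau2`), a fixed concentration parameter `alpha`, and a single pass over one ordering of the data. The function name and all parameter names are hypothetical. The variable selection and Bayesian model averaging extensions discussed in the paper are not shown.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def greedy_sequential_clustering(data, alpha=1.0, sigma2=1.0, mu0=0.0, tau2=10.0):
    """Assign each observation, in order, to its most probable cluster.

    Illustrative assumptions (not taken from the paper): known variance
    sigma2, a N(mu0, tau2) prior on each cluster mean, and a fixed alpha.
    """
    counts = []  # number of observations in each cluster so far
    sums = []    # sum of the observations in each cluster so far
    labels = []
    for i, x in enumerate(data):
        scores = []
        for n_k, s_k in zip(counts, sums):
            # Conjugate posterior of the cluster mean given its members.
            post_var = 1.0 / (1.0 / tau2 + n_k / sigma2)
            post_mean = post_var * (mu0 / tau2 + s_k / sigma2)
            # DP prior weight times posterior predictive density.
            weight = n_k / (i + alpha)
            scores.append(weight * normal_pdf(x, post_mean, post_var + sigma2))
        # Option of opening a new cluster: prior weight times prior predictive.
        scores.append(alpha / (i + alpha) * normal_pdf(x, mu0, tau2 + sigma2))

        # Greedy step: choose the single highest-scoring allocation.
        k = max(range(len(scores)), key=scores.__getitem__)
        if k == len(counts):
            counts.append(0)
            sums.append(0.0)
        # Sequential update of the chosen cluster's sufficient statistics.
        counts[k] += 1
        sums[k] += x
        labels.append(k)
    return labels


if __name__ == "__main__":
    toy = [0.1, -0.2, 0.05, 8.0, 8.3, 7.9]
    print(greedy_sequential_clustering(toy))
```

Because each observation is allocated once and only sufficient statistics are updated, a pass of this kind costs one sweep over the data, which is the source of the speed advantage over Markov chain Monte Carlo sampling that the abstract refers to. The result of such a pass depends on the ordering of the data and on the fixed hyperparameters assumed here.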


                Author and article information

                Journal
                12 October 2018
                Article
                1810.05450

                http://arxiv.org/licenses/nonexclusive-distrib/1.0/

                History
                Custom metadata
                27 pages, 8 figures. For associated R package, see https://github.com/ococrook/sugsvarsel
                stat.ME q-bio.GN stat.AP

                Applications, Methodology, Genetics
