      Learning a Latent Space of Highly Multidimensional Cancer Data

      research-article


          Abstract

          We introduce a Unified Disentanglement Network (UFDN) trained on The Cancer Genome Atlas (TCGA), which we refer to as UFDN-TCGA. We demonstrate that UFDN-TCGA learns a biologically relevant, low-dimensional latent space of high-dimensional gene expression data by applying our network to two classification tasks of cancer status and cancer type. UFDN-TCGA performs comparably to random forest methods. The UFDN allows for continuous, partial interpolation between distinct cancer types. Furthermore, we perform an analysis of differentially expressed genes between skin cutaneous melanoma (SKCM) samples and the same samples interpolated into glioblastoma (GBM). We demonstrate that our interpolations consist of relevant metagenes that recapitulate known glioblastoma mechanisms.
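The "continuous, partial interpolation" mentioned in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not the UFDN-TCGA implementation: it assumes a decoder that takes a domain-invariant latent code together with a cancer-type code, and it swaps in a toy linear decoder with random weights for the trained network. The names `interpolate_domain`, `decode`, `W_z`, and `W_d` are hypothetical.

```python
import numpy as np

def interpolate_domain(d_src, d_dst, alpha):
    """Blend two one-hot domain codes; alpha=0 keeps the source, alpha=1 reaches the target."""
    return (1.0 - alpha) * d_src + alpha * d_dst

def decode(z, d, W_z, W_d):
    """Toy linear decoder: a hypothetical stand-in for a trained decoder network."""
    return z @ W_z + d @ W_d

# Assumed toy sizes; real expression data would have thousands of genes.
rng = np.random.default_rng(0)
n_latent, n_domains, n_genes = 4, 2, 6
W_z = rng.normal(size=(n_latent, n_genes))   # random weights, not learned
W_d = rng.normal(size=(n_domains, n_genes))  # random weights, not learned

z = rng.normal(size=n_latent)    # latent code of one sample
skcm = np.array([1.0, 0.0])      # one-hot code for the source cancer type
gbm = np.array([0.0, 1.0])       # one-hot code for the target cancer type

# Decode the same latent code under progressively more "GBM-like" domain codes.
profiles = [
    decode(z, interpolate_domain(skcm, gbm, alpha), W_z, W_d)
    for alpha in (0.0, 0.5, 1.0)
]
```

Holding the latent code fixed while varying only the domain code is what makes the interpolation partial: intermediate values of `alpha` give profiles between the two cancer types. The interpolated profiles could then be compared against the originals, for example in a differential expression analysis.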


                Author and article information

                Journal
                9711271
                20660
                Pac Symp Biocomput
                Pacific Symposium on Biocomputing
                2335-6928
                2335-6936
                29 November 2019
                2020
                01 January 2020
                Volume 25, pages 379-390
                Affiliations
                Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
                Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA
                Article
                NIHMS1061168
                6934353
                31797612
                9ac7047b-de18-4655-862e-e023a42b2b52

                Open Access chapter published by World Scientific Publishing Company and distributed under the terms of the Creative Commons Attribution Non-Commercial (CC BY-NC) 4.0 License.

                Categories: Article

                Keywords: machine learning, RNA-seq, image translation, disentangled latent spaces, multidimensional data
