
      Genomic region detection via Spatial Convex Clustering

      Research article
      PLoS ONE
      Public Library of Science


          Abstract

          Several modern genomic technologies, such as DNA methylation arrays, measure spatially registered probes that number in the hundreds of thousands across multiple chromosomes. The measured probes are by themselves less interesting scientifically; instead, scientists seek to discover biologically interpretable genomic regions composed of contiguous groups of probes, which may act as biomarkers of disease or serve as a dimension-reducing pre-processing step for downstream analyses. In this paper, we introduce an unsupervised feature learning technique that maps technological units (probes) to biological units (genomic regions) that are common across all subjects. We use ideas from fusion penalties and convex clustering to introduce a method for Spatial Convex Clustering, or SpaCC. Our method is specifically tailored to detecting multi-subject regions of methylation, but we also test our approach on the well-studied problem of detecting segments of copy number variation. We formulate our method as a convex optimization problem, develop a massively parallelizable algorithm to find its solution, and introduce automated approaches for handling missing values and determining tuning parameters. Through simulation studies based on real methylation and copy number variation data, we show that SpaCC exhibits significant performance gains relative to existing methods. Finally, through several cancer epigenetics case studies on subtype discovery, network estimation, and epigenome-wide association, we illustrate SpaCC’s advantages as a pre-processing technique that reduces large-scale genomics data to a smaller number of genomic regions.
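          The abstract describes SpaCC only at a high level. As a rough illustration of how fusion penalties and convex clustering can be combined along the genome, the sketch below assumes an objective in which subjects are the rows of a data matrix X, probes are the columns in genomic order, and a weighted group-lasso penalty fuses only adjacent probe columns of a centroid matrix U. This exact form, the names spacc_objective and probes_to_regions, the uniform default weights, and the tolerance are all hypothetical illustrations, not the paper's formulation; the paper's parallel solver, missing-value handling, and tuning-parameter selection are omitted. The sketch evaluates the assumed objective and shows how a fitted U maps probes to contiguous, multi-subject regions.

```python
import numpy as np


def spacc_objective(X, U, gamma, w=None):
    """Hypothetical SpaCC-style objective (assumed form, not the paper's):
    0.5 * ||X - U||_F^2 + gamma * sum_l w_l * ||U[:, l] - U[:, l + 1]||_2,
    where rows are subjects and columns are probes in genomic order."""
    X = np.asarray(X, dtype=float)
    U = np.asarray(U, dtype=float)
    p = X.shape[1]
    # Assumed default: uniform weights on every adjacent probe pair.
    w = np.ones(p - 1) if w is None else np.asarray(w, dtype=float)
    fit = 0.5 * np.sum((X - U) ** 2)
    # One Euclidean norm (across all subjects) per adjacent pair of probe columns.
    diffs = np.linalg.norm(np.diff(U, axis=1), axis=0)
    return fit + gamma * np.sum(w * diffs)


def probes_to_regions(U, tol=1e-8):
    """Label probes so that adjacent probes whose centroid columns coincide
    (within tol, across all subjects) share one region label."""
    U = np.asarray(U, dtype=float)
    diffs = np.linalg.norm(np.diff(U, axis=1), axis=0)
    breaks = diffs > tol  # True where a new region starts at the next probe
    return np.concatenate([[0], np.cumsum(breaks)])


if __name__ == "__main__":
    # Toy centroid matrix: 2 subjects x 5 probes.
    U = np.array([[1.0, 1.0, 3.0, 3.0, 3.0],
                  [2.0, 2.0, 0.0, 0.0, 5.0]])
    print(probes_to_regions(U))
    print(spacc_objective(U, U, gamma=1.0))
```

          In the toy example, probes 1 and 2 and probes 3 and 4 have identical centroid columns, so the five probes collapse into three regions with labels [0, 0, 1, 1, 2]. Fusing whole columns (rather than each subject separately) is what makes the detected regions common across all subjects in this assumed form.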


                Author and article information

                Contributors
                Roles: Formal analysis, Investigation, Software, Writing – original draft, Writing – review & editing
                Roles: Methodology, Supervision, Writing – review & editing
                Role: Editor
                Journal
                PLoS ONE, Public Library of Science (San Francisco, CA, USA)
                ISSN: 1932-6203
                Published: 11 September 2018
                Volume 13, Issue 9: e0203007
                Affiliations
                [1 ] Department of Statistics, Rice University, Houston, TX, United States of America
                [2 ] Department of Electrical and Computer Engineering, Rice University, Houston, TX, United States of America
                [3 ] Jan and Dan Duncan Neurological Research Institute and Department of Pediatrics-Neurology, Baylor College of Medicine, Houston, TX, United States of America
                Chuo University, Japan
                Author notes

                Competing Interests: The authors have declared that no competing interests exist.

                Author information
                http://orcid.org/0000-0002-3327-7204
                Article
                Manuscript ID: PONE-D-18-20246
                DOI: 10.1371/journal.pone.0203007
                PMC ID: 6133280
                PMID: 30204756
                © 2018 Nagorski, Allen

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 20 July 2018
                Accepted: 13 August 2018
                Page count
                Figures: 8, Tables: 6, Pages: 22
                Funding
                Funded by: Division of Mathematical Sciences (funder ID http://dx.doi.org/10.13039/100000121); Award IDs: 1264058, 1554821
                Funded by: National Cancer Institute (funder ID http://dx.doi.org/10.13039/100000054); Award ID: CA096520
                This work was supported by National Science Foundation - Division of Mathematical Sciences. Grant Number: 1264058. URL: https://www.nsf.gov. Recipient: GA; National Science Foundation - Division of Mathematical Sciences. Grant Number: 1554821. URL: https://www.nsf.gov. Recipient: GA; and National Institutes of Health - National Cancer Institute T32 Training program in Biostatistics for Cancer Research. Grant Number: CA096520. URL: https://www.nih.gov. Recipient: JN. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Biology and life sciences > Cell biology > Chromosome biology > Chromatin > Chromatin modification > DNA methylation
                Biology and life sciences > Genetics > Epigenetics > Chromatin > Chromatin modification > DNA methylation
                Biology and life sciences > Genetics > Gene expression > Chromatin > Chromatin modification > DNA methylation
                Biology and life sciences > Genetics > DNA > DNA modification > DNA methylation
                Biology and life sciences > Biochemistry > Nucleic acids > DNA > DNA modification > DNA methylation
                Biology and life sciences > Genetics > Epigenetics > DNA modification > DNA methylation
                Biology and life sciences > Genetics > Gene expression > DNA modification > DNA methylation
                Physical Sciences > Chemistry > Chemical Reactions > Methylation
                Biology and Life Sciences > Genetics > Epigenetics
                Biology and Life Sciences > Genetics > Genomics > Functional Genomics
                Medicine and Health Sciences > Oncology > Cancers and Neoplasms > Breast Tumors > Breast Cancer
                Biology and Life Sciences > Computational Biology > Genome Complexity > Copy Number Variation
                Biology and Life Sciences > Genetics > Genomics > Genome Complexity > Copy Number Variation
                Medicine and Health Sciences > Oncology > Cancers and Neoplasms > Lung and Intrathoracic Tumors
                Physical Sciences > Mathematics > Applied Mathematics > Algorithms
                Research and Analysis Methods > Simulation and Modeling > Algorithms
                Custom metadata
                Data are available from Dryad doi: 10.5061/dryad.h8m74.

