
      Variational infinite heterogeneous mixture model for semi-supervised clustering of heart enhancers

      research-article


          Abstract

          Motivation

          Mammalian genomes can contain thousands of enhancers, but only a subset actively drive gene expression in a given cellular context. Integrated genomic datasets can be harnessed to predict active enhancers. One challenge in integrating large genomic datasets is their increasing heterogeneity: continuous, binary and discrete features may all be relevant. Because training examples are also typically scarce, semi-supervised approaches for heterogeneous data are needed; however, current enhancer prediction methods are not designed to handle heterogeneous data in the semi-supervised paradigm.

          Results

          We implemented a Dirichlet Process Heterogeneous Mixture model that infers Gaussian, Bernoulli and Poisson distributions over features. We derived a novel variational inference algorithm to handle semi-supervised learning tasks where certain observations are forced to cluster together. We applied this model to enhancer candidates in mouse heart tissues based on heterogeneous features. We constrained a small number of known active enhancers to appear in the same cluster, and 47 additional regions clustered with them. Many of these are located near heart-specific genes. The model also predicted 1176 active promoters, suggesting that it can discover new enhancers and promoters.
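
The core ideas in the Results paragraph can be illustrated with a small sketch. It is not the dphmix API and not the authors' variational algorithm: it uses fixed point estimates for the component parameters, a truncated stick-breaking prior, and a single soft-assignment step. All function names, parameter values and observations below are hypothetical, chosen only to show how Gaussian, Bernoulli and Poisson features combine in one likelihood and how observations that are forced to cluster together can share one assignment.

```python
import math


def stick_breaking_weights(fractions):
    """Convert stick fractions v_k in (0, 1) into mixture weights.

    The last weight takes whatever stick remains, so K - 1 fractions
    give K weights that sum to 1 (a truncated stick-breaking prior).
    """
    weights, remaining = [], 1.0
    for v in fractions:
        weights.append(v * remaining)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights


def component_log_lik(obs, comp):
    """Log-likelihood of one observation under one component.

    obs  = (continuous values, binary values, count values)
    comp = {"gauss": [(mean, sd), ...], "bern": [p, ...], "pois": [rate, ...]}
    Features are treated as independent given the component.
    """
    x_cont, x_bin, x_count = obs
    ll = 0.0
    for x, (mu, sd) in zip(x_cont, comp["gauss"]):
        ll += -0.5 * math.log(2 * math.pi * sd ** 2) - (x - mu) ** 2 / (2 * sd ** 2)
    for x, p in zip(x_bin, comp["bern"]):
        ll += math.log(p) if x else math.log(1.0 - p)
    for x, rate in zip(x_count, comp["pois"]):
        ll += x * math.log(rate) - rate - math.lgamma(x + 1)
    return ll


def _normalize(log_scores):
    """Turn unnormalized log scores into probabilities (log-sum-exp trick)."""
    top = max(log_scores)
    exps = [math.exp(s - top) for s in log_scores]
    total = sum(exps)
    return [e / total for e in exps]


def responsibilities(observations, components, weights, must_link=()):
    """Soft cluster assignments; indices in must_link share one assignment."""
    log_lik = [[component_log_lik(o, c) for c in components] for o in observations]
    log_w = [math.log(w) for w in weights]
    resp = [_normalize([lw + ll for lw, ll in zip(log_w, row)]) for row in log_lik]
    if must_link:
        # One shared latent label: the prior weight enters once, while the
        # likelihoods of all linked observations are multiplied together.
        pooled = [lw + sum(log_lik[i][k] for i in must_link)
                  for k, lw in enumerate(log_w)]
        shared = _normalize(pooled)
        for i in must_link:
            resp[i] = list(shared)
    return resp


# Hypothetical toy data: one continuous, one binary and one count feature.
components = [
    {"gauss": [(0.0, 1.0)], "bern": [0.9], "pois": [5.0]},
    {"gauss": [(5.0, 1.0)], "bern": [0.1], "pois": [1.0]},
]
weights = stick_breaking_weights([0.5])
observations = [([0.1], [1], [6]), ([2.6], [0], [2]), ([5.2], [0], [1])]
resp = responsibilities(observations, components, weights, must_link=(0, 1))
for row in resp:
    print([round(r, 3) for r in row])
```

In this sketch the linked observations pool their evidence before normalization, so an ambiguous region can be pulled into the cluster of a confident one. A full variational treatment would instead maintain posterior distributions over the stick fractions and component parameters and iterate these updates to convergence.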

          Availability and implementation

          We created the ‘dphmix’ Python package: https://pypi.org/project/dphmix/.

          Supplementary information

          Supplementary data are available at Bioinformatics online.


                Author and article information

                Contributors
                Role: Associate Editor
                Journal
                Bioinformatics
                Oxford University Press
                ISSN: 1367-4803, 1367-4811
                15 September 2019
                07 February 2019
                Volume: 35
                Issue: 18
                Pages: 3232-3239
                Affiliations
                [1 ] Department of Computer Science, University of Toronto , Toronto, ON, Canada
                [2 ] Department of Cell & Systems Biology, University of Toronto , Toronto, ON, Canada
                [3 ] Centre for the Analysis of Genome Evolution and Function, University of Toronto , Toronto, ON, Canada
                Author notes
                To whom correspondence should be addressed. E-mail: alan.moses@utoronto.ca
                Article
                Publisher ID: btz064
                DOI: 10.1093/bioinformatics/btz064
                PMCID: PMC6748727
                PMID: 30753279
                © The Author(s) 2019. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License ( http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

                History
                Received: 12 October 2018
                Revised: 03 January 2019
                Accepted: 06 February 2019
                Page count
                Pages: 8
                Funding
                Funded by: Natural Sciences and Engineering Research Council of Canada 10.13039/501100000038
                Funded by: NSERC 10.13039/501100000038
                Funded by: Canada Foundation for Innovation 10.13039/501100000196
                Funded by: Ontario Ministry of Research and Innovation
                Funded by: Connaught International Scholarships
                Categories
                Original Papers
                Genome Analysis

                Bioinformatics & Computational biology
