      MrTADFinder: A network modularity based approach to identify topologically associating domains in multiple resolutions

      research-article


          Abstract

          Genome-wide proximity ligation based assays such as Hi-C have revealed that eukaryotic genomes are organized into structural units called topologically associating domains (TADs). From a visual examination of the chromosomal contact map, however, it is clear that the organization of the domains is not simple or obvious. Instead, TADs exhibit various length scales and, in many cases, a nested arrangement. Here, by exploiting the resemblance between TADs in a chromosomal contact map and densely connected modules in a network, we formulate TAD identification as a network optimization problem and propose an algorithm, MrTADFinder, to identify TADs from intra-chromosomal contact maps. MrTADFinder is based on the network-science concept of modularity. A key component of it is deriving an appropriate background model for contacts in a random chain, by numerically solving a set of matrix equations. The background model preserves the observed coverage of each genomic bin as well as the distance dependence of the contact frequency for any pair of bins exhibited by the empirical map. Also, by introducing a tunable resolution parameter, MrTADFinder provides a self-consistent approach for identifying TADs at different length scales, hence the acronym "Mr" standing for Multiple Resolutions. We then apply MrTADFinder to various Hi-C datasets. The identified domain boundaries are marked by characteristic signatures in chromatin marks and transcription factors (TF) that are consistent with earlier work. Moreover, by calling TADs at different length scales, we observe that boundary signatures change with resolution, with different chromatin features having different characteristic length scales. Furthermore, we report an enrichment of HOT (high-occupancy target) regions near TAD boundaries and investigate the role of different TFs in determining boundaries at various resolutions. 
To further explore the interplay between TADs and epigenetic marks, as tumor mutational burden is known to be coupled to chromatin structure, we examine how somatic mutations are distributed across boundaries and find a clear stepwise pattern. Overall, MrTADFinder provides a novel computational framework to explore the multi-scale structures in Hi-C contact maps.
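          The role of the resolution parameter can be illustrated with a minimal sketch. It is not the MrTADFinder implementation: the background model here is the simple configuration-model expectation (coverage_i * coverage_j / total), which preserves only coverage and ignores the distance dependence the abstract describes, and the optimizer is an exact dynamic program over contiguous partitions chosen for brevity. The names `call_domains`, `toy_map`, and `gamma` are hypothetical.

```python
def call_domains(W, gamma=1.0):
    """Return (domains, Q) maximizing generalized modularity over contiguous partitions.

    W     : symmetric contact matrix (list of lists)
    gamma : resolution parameter; larger values favor smaller domains
    """
    n = len(W)
    coverage = [sum(row) for row in W]   # total contacts per bin
    total = sum(coverage)                # sum of all matrix entries

    # ASSUMPTION: configuration-model null, not the paper's background model.
    B = [[W[i][j] - gamma * coverage[i] * coverage[j] / total
          for j in range(n)] for i in range(n)]

    # 2D prefix sums so that any square block sum costs O(1).
    P = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            P[i + 1][j + 1] = B[i][j] + P[i][j + 1] + P[i + 1][j] - P[i][j]

    def block(i, j):
        """Sum of B over bins i..j-1 on both axes."""
        return P[j][j] - P[i][j] - P[j][i] + P[i][i]

    # best[j] is the best score for bins 0..j-1; start[j] opens the last domain.
    best = [0.0] + [float("-inf")] * n
    start = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            score = best[i] + block(i, j)
            if score > best[j]:
                best[j], start[j] = score, i

    # Backtrack to recover half-open domain intervals (first_bin, end_bin).
    domains, j = [], n
    while j > 0:
        domains.append((start[j], j))
        j = start[j]
    return domains[::-1], best[n] / total


def toy_map():
    """Six bins forming two dense blocks of three with weak contacts between them."""
    return [[0 if i == j else (10 if i // 3 == j // 3 else 1)
             for j in range(6)] for i in range(6)]


if __name__ == "__main__":
    for gamma in (0.1, 1.0, 3.0):
        domains, q = call_domains(toy_map(), gamma)
        print(gamma, domains, round(q, 3))
```

          On this toy map, a small `gamma` merges everything into one domain, `gamma = 1` recovers the two blocks, and a large `gamma` splits every bin into its own domain, which is the sense in which a single parameter sweeps across length scales.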

          Author summary

          The accommodation of the roughly 2 m of DNA in the nuclei of mammalian cells results in an intricate structure, in which the topologically associating domains (TADs) formed by densely interacting genomic regions emerge as a fundamental structural unit. Identification of TADs is essential for understanding the role of 3D genome organization in gene regulation. By viewing the chromosomal contact map as a network, TADs correspond to the densely connected regions in the network. Motivated by this mapping, we propose a novel method, MrTADFinder, to identify TADs based on the concept of modularity in network science. Using MrTADFinder, we identify domains at various resolutions, and further explore the interplay between domains and other chromatin features, such as transcription factor binding and histone modifications, at different resolutions. Overall, MrTADFinder provides a new computational framework to investigate the multiple length scales that are built into the organization of the genome.

          Most cited references (30)


          Modularity and community structure in networks

          M. Newman (2006)
          Many networks of interest in the sciences, including a variety of social and biological networks, are found to divide naturally into communities or modules. The problem of detecting and characterizing this community structure has attracted considerable recent attention. One of the most sensitive detection methods is optimization of the quality function known as "modularity" over the possible divisions of a network, but direct application of this method using, for instance, simulated annealing is computationally costly. Here we show that the modularity can be reformulated in terms of the eigenvectors of a new characteristic matrix for the network, which we call the modularity matrix, and that this reformulation leads to a spectral algorithm for community detection that returns results of better quality than competing methods in noticeably shorter running times. We demonstrate the algorithm with applications to several network data sets.
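          A minimal sketch of the idea in this abstract, for a single bisection only: build the modularity matrix B_ij = A_ij - k_i k_j / 2m, find its leading eigenvector, and split nodes by the sign of its entries. The shifted power iteration below is a stand-in chosen to avoid external dependencies, not the procedure from the paper, and the function names and the toy graph (two triangles joined by one edge) are hypothetical. The sketch also omits the repeated subdivision and refinement a complete method would need, and does not check that the leading eigenvalue is positive before splitting.

```python
def modularity_matrix(adj):
    """Return (B, 2m) with B[i][j] = A[i][j] - k_i * k_j / (2m)."""
    n = len(adj)
    k = [sum(row) for row in adj]
    two_m = sum(k)
    B = [[adj[i][j] - k[i] * k[j] / two_m for j in range(n)] for i in range(n)]
    return B, two_m


def leading_eigenvector(B, iters=500):
    """Power iteration on B + shift*I, so the most positive eigenvalue dominates."""
    n = len(B)
    shift = max(sum(abs(x) for x in row) for row in B)  # Gershgorin bound
    v = [float(i + 1) for i in range(n)]  # fixed asymmetric start, deterministic
    for _ in range(iters):
        w = [sum(B[i][j] * v[j] for j in range(n)) + shift * v[i]
             for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


def spectral_bisection(adj):
    """Split nodes by eigenvector sign; return (signs, modularity of the split)."""
    B, two_m = modularity_matrix(adj)
    v = leading_eigenvector(B)
    s = [1 if x > 0 else -1 for x in v]
    n = len(adj)
    # Q = s^T B s / (4m), where 4m = 2 * two_m
    q = sum(B[i][j] * s[i] * s[j] for i in range(n) for j in range(n)) / (2 * two_m)
    return s, q


def two_triangles():
    """Adjacency matrix of two triangles (0-1-2 and 3-4-5) joined by edge 2-3."""
    edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
    adj = [[0] * 6 for _ in range(6)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1
    return adj


if __name__ == "__main__":
    signs, q = spectral_bisection(two_triangles())
    print(signs, round(q, 4))
```

          The overall sign of an eigenvector is arbitrary, so only the grouping of nodes is meaningful, not which group is labeled +1.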

            Architectural protein subclasses shape 3D organization of genomes during lineage commitment.

            Understanding the topological configurations of chromatin may reveal valuable insights into how the genome and epigenome act in concert to control cell fate during development. Here, we generate high-resolution architecture maps across seven genomic loci in embryonic stem cells and neural progenitor cells. We observe a hierarchy of 3D interactions that undergo marked reorganization at the submegabase scale during differentiation. Distinct combinations of CCCTC-binding factor (CTCF), Mediator, and cohesin show widespread enrichment in chromatin interactions at different length scales. CTCF/cohesin anchor long-range constitutive interactions that might form the topological basis for invariant subdomains. Conversely, Mediator/cohesin bridge short-range enhancer-promoter interactions within and between larger subdomains. Knockdown of Smc1 or Med12 in embryonic stem cells results in disruption of spatial architecture and downregulation of genes found in cohesin-mediated interactions. We conclude that cell-type-specific chromatin organization occurs at the submegabase scale and that architectural proteins shape the genome in hierarchical length scales. Copyright © 2013 Elsevier Inc. All rights reserved.

              Resolution limit in community detection

              Detecting community structure is fundamental to clarify the link between structure and function in complex networks and is used for practical applications in many disciplines. A successful method relies on the optimization of a quantity called modularity [Newman and Girvan, Phys. Rev. E 69, 026113 (2004)], which is a quality index of a partition of a network into communities. We find that modularity optimization may fail to identify modules smaller than a scale which depends on the total number L of links of the network and on the degree of interconnectedness of the modules, even in cases where modules are unambiguously defined. The probability that a module conceals well-defined substructures is the highest if the number of links internal to the module is of the order of \sqrt{2L} or smaller. We discuss the practical consequences of this result by analyzing partitions obtained through modularity optimization in artificial and real networks.
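              The effect described in this abstract can be reproduced on a toy network. The sketch below is an illustration under stated assumptions, not code from the paper: it builds a ring of identical cliques joined by single edges and compares the modularity of two partitions, one module per clique versus one module per pair of adjacent cliques. The function names are hypothetical.

```python
def ring_of_cliques(n_cliques, size):
    """Edge list for n_cliques complete graphs of `size` nodes joined in a ring."""
    edges = []
    for c in range(n_cliques):
        base = c * size
        for a in range(size):
            for b in range(a + 1, size):
                edges.append((base + a, base + b))
        # single bridge from the last node of this clique to the first of the next
        nxt = ((c + 1) % n_cliques) * size
        edges.append((base + size - 1, nxt))
    return edges


def modularity(edges, labels):
    """Q = sum over modules of [l_c / L - (d_c / 2L)^2] for an unweighted graph."""
    L = len(edges)
    internal, degree = {}, {}
    for u, v in edges:
        for node in (u, v):
            degree[labels[node]] = degree.get(labels[node], 0) + 1
        if labels[u] == labels[v]:
            internal[labels[u]] = internal.get(labels[u], 0) + 1
    return sum(internal.get(c, 0) / L - (d / (2 * L)) ** 2
               for c, d in degree.items())


def compare(n_cliques, size=5):
    """Return (Q with one module per clique, Q with adjacent cliques paired)."""
    edges = ring_of_cliques(n_cliques, size)
    n_nodes = n_cliques * size
    single = [node // size for node in range(n_nodes)]
    paired = [(node // size) // 2 for node in range(n_nodes)]
    return modularity(edges, single), modularity(edges, paired)


if __name__ == "__main__":
    for n in (6, 30):
        q_single, q_paired = compare(n)
        print(n, round(q_single, 4), round(q_paired, 4))
```

              With 6 cliques of five nodes the one-clique-per-module partition scores higher, but with 30 such cliques the paired partition wins even though each clique is an unambiguous module. In this particular construction the paired partition scores higher exactly when the total degree of one clique is smaller than \sqrt{2L}.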

                Author and article information

                Contributors
                Roles: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Supervision, Visualization, Writing – original draft, Writing – review & editing
                Roles: Formal analysis, Visualization
                Roles: Funding acquisition, Project administration, Supervision, Writing – review & editing
                Role: Editor
                Journal
                Title: PLoS Computational Biology
                Abbreviated title: PLoS Comput Biol
                Publisher: Public Library of Science (San Francisco, CA, USA)
                ISSN: 1553-734X, 1553-7358
                Published: 24 July 2017
                Volume: 13
                Issue: 7
                Article number: e1005647
                Affiliations
                [1 ] Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, United States of America
                [2 ] Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, United States of America
                [3 ] Department of Computer Science, Yale University, New Haven, CT, United States of America
                University of California, San Diego, UNITED STATES
                Author notes

                The authors have declared that no competing interests exist.

                [¤] Current address: Department of Computational Biology, St Jude Children's Research Hospital, Memphis, TN, United States of America

                Author information
                http://orcid.org/0000-0003-2337-485X
                http://orcid.org/0000-0002-2626-5897
                http://orcid.org/0000-0002-9746-3719
                Article
                Manuscript ID: PCOMPBIOL-D-16-02110
                DOI: 10.1371/journal.pcbi.1005647
                PMCID: PMC5546724
                PMID: 28742097
                0c0e9e96-b912-4fda-bd67-55012e4545ce
                © 2017 Yan et al

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 31 December 2016
                Accepted: 27 June 2017
                Page count
                Figures: 8, Tables: 0, Pages: 22
                Funding
                Funded by: National Human Genome Research Institute (funder ID: http://dx.doi.org/10.13039/100000051)
                Award ID: U41 HG007000
                The work is supported by NHGRI award U41 HG007000. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Biology and Life Sciences > Cell Biology > Chromosome Biology > Chromatin
                Biology and Life Sciences > Genetics > Epigenetics > Chromatin
                Biology and Life Sciences > Genetics > Gene Expression > Chromatin
                Biology and Life Sciences > Molecular Biology > Molecular Biology Techniques > Gene Mapping > Chromosome Mapping
                Research and Analysis Methods > Molecular Biology Techniques > Gene Mapping > Chromosome Mapping
                Biology and Life Sciences > Biochemistry > Proteins > DNA-binding Proteins > Transcription Factors
                Biology and Life Sciences > Genetics > Gene Expression > Gene Regulation > Transcription Factors
                Biology and Life Sciences > Biochemistry > Proteins > Regulatory Proteins > Transcription Factors
                Physical Sciences > Mathematics > Applied Mathematics > Algorithms
                Research and Analysis Methods > Simulation and Modeling > Algorithms
                Biology and Life Sciences > Genetics > Genetic Loci
                Biology and Life Sciences > Cell Biology > Chromosome Biology > Chromatin > Chromatin Modification > Histone Modification
                Biology and Life Sciences > Genetics > Epigenetics > Chromatin > Chromatin Modification > Histone Modification
                Biology and Life Sciences > Genetics > Gene Expression > Chromatin > Chromatin Modification > Histone Modification
                Biology and Life Sciences > Genetics > Gene Expression > Histone Modification
                Physical Sciences > Mathematics > Optimization
                Biology and Life Sciences > Genetics > Mutation > Somatic Mutation
                Custom metadata
                vor-update-to-uncorrected-proof: 2017-08-07
                All relevant data are within the paper and its Supporting Information files.

                Quantitative & Systems biology
