      CLIMP: Clustering Motifs via Maximal Cliques with Parallel Computing Design

      PLoS ONE
      Public Library of Science

          Abstract

          A motif is a set of conserved binding sites recognized by a transcription factor, and motifs can be found by many comparative-genomics applications that identify over-represented segments. When numerous putative motifs are predicted from a collection of genome-wide data, their pairwise similarities can be represented as a large graph in which the motifs are nodes connected to one another. An efficient algorithm is therefore needed to cluster motifs that belong to the same group, separate motifs that belong to different groups, and even discard spurious ones. In this work, we propose CLIMP, a new motif clustering algorithm based on maximal cliques, and speed it up with a parallel implementation. We compared CLIMP with two other high-performance algorithms on three datasets: a synthetic motif dataset from the JASPAR database, a set of putative motifs from a phylogenetic footprinting dataset, and a set of putative motifs from a ChIP dataset. The results demonstrate that CLIMP outperforms both algorithms in most cases on the three datasets, so it can be a useful complement to the clustering procedures in some genome-wide motif prediction pipelines. CLIMP is available at http://sqzhang.cn/climp.html.
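The abstract describes clustering motifs through maximal cliques in a similarity graph but does not spell out the procedure, so what follows is only a minimal Python sketch under stated assumptions: motifs are nodes, a hypothetical pairwise similarity score is already available for each pair, and two motifs are linked when that score reaches a chosen threshold. Maximal cliques are then enumerated with the classic Bron-Kerbosch algorithm with pivoting, a standard technique swapped in here for illustration; it is not necessarily the clique routine CLIMP uses. The function names `build_similarity_graph` and `maximal_cliques`, the motif labels, the scores, and the 0.7 threshold are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: motif label -> set of similar motif labels.
Graph = Dict[str, Set[str]]


def build_similarity_graph(
    scores: Dict[Tuple[str, str], float], threshold: float
) -> Graph:
    """Hypothetical step: link two motifs whose similarity score >= threshold."""
    graph: Graph = {}
    for (a, b), score in scores.items():
        # Register both motifs even if the pair falls below the threshold.
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if score >= threshold and a != b:  # ignore self-pairs
            graph[a].add(b)
            graph[b].add(a)
    return graph


def maximal_cliques(graph: Graph) -> List[Set[str]]:
    """Enumerate every maximal clique with Bron-Kerbosch and pivoting."""
    cliques: List[Set[str]] = []
    if not graph:
        return cliques

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        # r: current clique; p: vertices that can still extend r;
        # x: vertices already explored (used to avoid reporting non-maximal r).
        if not p and not x:
            cliques.append(r)  # nothing can extend r, so it is maximal
            return
        # Pivot on the vertex with the most neighbours in p; those neighbours
        # need not be branched on at this level, which prunes redundant work.
        pivot = max(p | x, key=lambda u: len(graph[u] & p))
        for v in list(p - graph[pivot]):
            expand(r | {v}, p & graph[v], x & graph[v])
            p.remove(v)
            x.add(v)

    expand(set(), set(graph), set())
    return cliques


if __name__ == "__main__":
    # Hypothetical similarity scores between five putative motifs.
    scores = {
        ("M1", "M2"): 0.90,
        ("M1", "M3"): 0.85,
        ("M2", "M3"): 0.80,
        ("M3", "M4"): 0.40,
        ("M4", "M5"): 0.95,
    }
    graph = build_similarity_graph(scores, threshold=0.7)
    print(sorted(sorted(c) for c in maximal_cliques(graph)))
    # prints [['M1', 'M2', 'M3'], ['M4', 'M5']]
```

In this toy run, M1, M2, and M3 form one tightly connected group and M4 with M5 another, while the weak M3-M4 link falls below the threshold. Maximal cliques can overlap, so turning them into final clusters (or flagging spurious motifs) needs further rules that this sketch does not attempt. Bron-Kerbosch is also commonly parallelized by giving each worker one vertex together with its neighbours that come after it (candidates) and before it (excluded) in a fixed vertex order, since those subproblems are independent; this record does not say how CLIMP actually divides its work.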

                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS ONE
                Public Library of Science (San Francisco, CA, USA)
                ISSN: 1932-6203
                Published: 3 August 2016
                Volume 11, Issue 8, e0160435
                Affiliations
                [1] College of Computer and Information Engineering, Tianjin Normal University, Tianjin, China
                [2] National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
                [3] Department of Biological Sciences, Center for Systems Biology, The University of Texas at Dallas, Richardson, Texas, United States of America
                New York University, United States
                Author notes

                Competing Interests: The authors have declared that no competing interests exist.

                • Conceived and designed the experiments: SZ.

                • Performed the experiments: SZ.

                • Analyzed the data: SZ YC.

                • Contributed reagents/materials/analysis tools: SZ YC.

                • Wrote the paper: SZ YC.

                • Designed the web server: SZ.

                Author information
                http://orcid.org/0000-0002-4127-0539
                Article
                Manuscript ID: PONE-D-16-07497
                DOI: 10.1371/journal.pone.0160435
                PMCID: PMC4972426
                PMID: 27487245
                © 2016 Zhang, Chen

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 21 February 2016
                Accepted: 19 July 2016
                Page count
                Figures: 8, Tables: 1, Pages: 17
                Funding
                Funded by: National Natural Science Foundation of China (funder ID http://dx.doi.org/10.13039/501100001809); Award ID: 61572358; Award Recipient: SZ
                Funded by: National Natural Science Foundation of China (funder ID http://dx.doi.org/10.13039/501100001809); Award ID: 61273228; Award Recipient: YC
                Funded by: Natural Science Foundation of Tianjin City (funder ID http://dx.doi.org/10.13039/501100006606); Award ID: 16JCYBJC23600; Award Recipient: SZ
                Funded by: Natural Science Foundation of Tianjin City (funder ID http://dx.doi.org/10.13039/501100006606); Award ID: 15JCYBJC46600; Award Recipient: SZ
                The publication of this article was funded by two grants from the National Natural Science Foundation of China (61572358 to SZ, 61273228 to YC) and two grants from the Natural Science Foundation of Tianjin (15JCYBJC46600 and 16JCYBJC23600 to SZ).
                Categories
                Research Article
                Biology and Life Sciences > Molecular Biology > Molecular Biology Techniques > Sequencing Techniques > Sequence Analysis > Sequence Motif Analysis
                Research and Analysis Methods > Molecular Biology Techniques > Sequencing Techniques > Sequence Analysis > Sequence Motif Analysis
                Computer and Information Sciences > Data Visualization > Infographics > Graphs
                Physical Sciences > Mathematics > Applied Mathematics > Algorithms
                Research and Analysis Methods > Simulation and Modeling > Algorithms
                Research and Analysis Methods > Database and Informatics Methods > Biological Databases > Genomic Databases
                Biology and Life Sciences > Computational Biology > Genome Analysis > Genomic Databases
                Biology and Life Sciences > Genetics > Genomics > Genome Analysis > Genomic Databases
                Computer and Information Sciences > Network Analysis > Network Motifs
                Research and Analysis Methods > Model Organisms > Animal Models > Drosophila Melanogaster
                Biology and Life Sciences > Organisms > Animals > Invertebrates > Arthropoda > Insects > Drosophila > Drosophila Melanogaster
                Biology and Life Sciences > Evolutionary Biology > Evolutionary Systematics > Phylogenetics
                Biology and Life Sciences > Taxonomy > Evolutionary Systematics > Phylogenetics
                Computer and Information Sciences > Data Management > Taxonomy > Evolutionary Systematics > Phylogenetics
                Biology and Life Sciences > Biochemistry > Proteins > DNA-binding proteins
                Custom metadata
                All relevant data are within the paper.
