
      MUSI: an integrated system for identifying multiple specificity from very large peptide or nucleic acid data sets


          Abstract

          Peptide recognition domains and transcription factors play crucial roles in cellular signaling. They bind linear stretches of amino acids or nucleotides, respectively, with high specificity. Experimental techniques that assess the binding specificity of these domains, such as microarrays or phage display, can retrieve thousands of distinct ligands, providing detailed insight into binding specificity. In particular, the advent of next-generation sequencing has recently increased the throughput of such methods by several orders of magnitude. These advances have helped reveal the presence of distinct binding specificity classes that co-exist within a set of ligands interacting with the same target. Here, we introduce a software system called MUSI that can rapidly analyze very large data sets of binding sequences to determine the relevant binding specificity patterns. Our pipeline provides two major advances. First, it can detect previously unrecognized multiple specificity patterns in any data set. Second, it offers integrated processing of very large data sets from next-generation sequencing machines. The results are visualized as multiple sequence logos describing the different binding preferences of the protein under investigation. We demonstrate the performance of MUSI by analyzing recent phage display data for human SH3 domains as well as microarray data for mouse transcription factors.
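The idea of distinct specificity classes co-existing within one ligand set can be made concrete with a small sketch. The code below is not the MUSI algorithm, which the abstract does not spell out. It assumes pre-aligned, equal-length peptides that have already been assigned to classes, and the sequences are hypothetical toy ligands, not data from the article. It computes the per-position residue frequencies that underlie a sequence logo and shows how pooling two classes blurs a position that is sharply defined within each class.

```python
from collections import Counter


def position_frequencies(peptides):
    """Return one {residue: frequency} dict per alignment column."""
    if not peptides:
        raise ValueError("need at least one peptide")
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("peptides must be pre-aligned to equal length")
    columns = []
    for i in range(length):
        counts = Counter(p[i] for p in peptides)
        total = sum(counts.values())
        columns.append({res: n / total for res, n in counts.items()})
    return columns


def consensus(columns):
    """Most frequent residue per position (ties broken alphabetically)."""
    return "".join(min(col, key=lambda r: (-col[r], r)) for col in columns)


# Hypothetical toy ligands (illustrative only, not from the article).
class_one = ["RPLPPLP", "RALPPLP", "RPLPSLP"]
class_two = ["PPLPPKR", "PALPPRR", "PPLPSKR"]

freq_one = position_frequencies(class_one)
freq_two = position_frequencies(class_two)
freq_pooled = position_frequencies(class_one + class_two)

print("class one consensus:", consensus(freq_one))
print("class two consensus:", consensus(freq_two))
print("pooled first column:", freq_pooled[0])
```

Within each toy class the first column is fully conserved, whereas the pooled column splits evenly between two residues. A single logo built from the pooled set would therefore hide the two patterns, which is the situation multiple sequence logos are meant to address.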


                Author and article information

                Journal
                Nucleic Acids Res
                nar
                Nucleic Acids Research
                Oxford University Press
                0305-1048
                1362-4962
                March 2012
                31 December 2011
                Volume: 40
                Issue: 6
                Article number: e47
                Affiliations
                (1) The Donnelly Centre, (2) Banting and Best Department of Medical Research, (3) Department of Computer Science, University of Toronto, Toronto, ON, Canada M5S 3E1, (4) The Edward S. Rogers Sr Department of Electrical and Computer Engineering, University of Toronto, ON, Canada M5S 3G4, (5) Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada M5S 1A4 and (6) Swiss Institute of Bioinformatics, Molecular Modeling, Génopode, CH-1015 Lausanne, Switzerland
                Author notes
                *To whom correspondence should be addressed. Tel: +41 21 692 4081; Fax: +41 21 692 4065; Email: david.gfeller@isb-sib.ch
                Correspondence may also be addressed to Philip M. Kim. Tel: 416-946-3419; Fax: 416-978-8287; Email: pi@kimlab.org
                Article
                gkr1294
                10.1093/nar/gkr1294
                3315295
                22210894
                © The Author(s) 2011. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Received: 17 August 2011
                Revised: 12 December 2011
                Accepted: 15 December 2011
                Page count
                Pages: 8
                Categories
                Methods Online

                Genetics