      Is Open Access

      motifDiverge: a model for assessing the statistical significance of gene regulatory motif divergence between two DNA sequences

      Preprint


          Abstract

          Next-generation sequencing technology enables the identification of thousands of gene regulatory sequences in many cell types and organisms. We consider the problem of testing whether two such sequences differ in their number of binding site motifs for a given transcription factor (TF) protein. Binding site motifs impart regulatory function by giving TFs the opportunity to bind to genomic elements and thereby affect the expression of nearby genes. Evolutionary changes to such functional DNA are hypothesized to be major contributors to phenotypic diversity within and between species, but despite the importance of TF motifs for gene expression, no method exists to test for motif loss or gain. Assuming that motif counts are binomially distributed, and allowing for dependencies between motif instances in evolutionarily related sequences, we derive the probability mass function of the difference in motif counts between two nucleotide sequences. We provide a method to numerically estimate this distribution from genomic data and show through simulations that our estimator is accurate. Finally, we introduce the R package motifDiverge, which implements our methodology, and illustrate its application to gene regulatory enhancers identified by a mouse developmental time course experiment. While this study was motivated by the analysis of regulatory motifs, our results can be applied to any problem involving two correlated Bernoulli trials.
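
          The core idea in the abstract, the distribution of a difference in counts built from two correlated Bernoulli trials, can be illustrated with a small sketch. The code below is not the motifDiverge package or its API. It assumes a simplified setting that the abstract does not spell out: both sequences offer the same number n of paired candidate positions, pairs are independent of one another, and within a pair the two motif indicators have marginal probabilities px and py and correlation rho. The function names pair_probs, diff_pmf, and upper_tail are hypothetical.

```python
from math import comb, sqrt


def pair_probs(px, py, rho):
    """Return (p10, p01) for one pair of correlated Bernoulli trials.

    p10 = P(X=1, Y=0) and p01 = P(X=0, Y=1), derived from the marginals
    px, py and the correlation rho (illustrative parameterization).
    """
    p11 = px * py + rho * sqrt(px * (1 - px) * py * (1 - py))
    # The joint probability must respect the Frechet bounds.
    if p11 < max(0.0, px + py - 1) - 1e-12 or p11 > min(px, py) + 1e-12:
        raise ValueError("rho is not attainable for these marginals")
    return max(px - p11, 0.0), max(py - p11, 0.0)


def diff_pmf(n, p10, p01):
    """PMF of D = sum_i (X_i - Y_i) over n independent pairs.

    Each pair contributes +1 with probability p10, -1 with probability
    p01, and 0 otherwise, so D is a sum over multinomial outcomes.
    """
    if p10 < 0 or p01 < 0 or p10 + p01 > 1:
        raise ValueError("p10 and p01 must be valid probabilities")
    p0 = 1 - p10 - p01
    pmf = {}
    for a in range(n + 1):          # pairs contributing +1
        for b in range(n - a + 1):  # pairs contributing -1
            prob = (comb(n, a) * comb(n - a, b)
                    * p10 ** a * p01 ** b * p0 ** (n - a - b))
            pmf[a - b] = pmf.get(a - b, 0.0) + prob
    return pmf


def upper_tail(pmf, d_obs):
    """P(D >= d_obs): a one-sided p-value for an excess in sequence X."""
    return sum(p for d, p in pmf.items() if d >= d_obs)


if __name__ == "__main__":
    p10, p01 = pair_probs(0.05, 0.05, 0.6)
    pmf = diff_pmf(50, p10, p01)
    print(upper_tail(pmf, 3))
```

          Positive correlation shrinks p10 and p01, which concentrates the difference around zero. This is why ignoring dependence between evolutionarily related sequences would make a given count difference look less surprising than it is.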


            Author and article information

            Date: 31 January 2014
            arXiv ID: 1402.0042
            License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
            Subject category: q-bio.GN
