Fragrep: An Efficient Search Tool for Fragmented Patterns in Genomic Sequences

      Abstract

      Many classes of non-coding RNAs (ncRNAs; including Y RNAs, vault RNAs, RNase P RNAs, and MRP RNAs, as well as a novel class recently discovered in Dictyostelium discoideum) can be characterized by a pattern of short but well-conserved sequence elements that are separated by poorly conserved regions of sometimes highly variable lengths. Local alignment algorithms such as BLAST are therefore ill-suited for the discovery of new homologs of such ncRNAs in genomic sequences. The Fragrep tool instead implements an efficient algorithm for detecting the pattern fragments that occur in a given order. For each pattern fragment, the mismatch tolerance and bounds on the length of the intervening sequences can be specified separately. Furthermore, matches can be ranked by a statistically well-motivated scoring scheme.
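      The search problem described in the abstract can be illustrated with a small sketch. This is a naive enumeration written for illustration only, not the algorithm Fragrep implements; the names `Fragment`, `occurrences`, and `search` are hypothetical, and mismatches are assumed to be simple substitutions (Hamming distance). Each fragment carries its own mismatch tolerance and its own bounds on the length of the spacer that precedes it.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class Fragment:
    seq: str             # conserved sequence element
    max_mismatches: int  # substitutions tolerated for this fragment
    min_gap: int = 0     # shortest allowed spacer before this fragment
    max_gap: int = 0     # longest allowed spacer before this fragment


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurrences(text: str, frag: Fragment) -> List[int]:
    """Start positions where frag matches text within its mismatch tolerance."""
    m = len(frag.seq)
    return [i for i in range(len(text) - m + 1)
            if mismatches(text[i:i + m], frag.seq) <= frag.max_mismatches]


def search(text: str, frags: List[Fragment]) -> Iterator[Tuple[int, ...]]:
    """Yield tuples of start positions, one per fragment, in the given order.

    The gap bounds of the first fragment are ignored, since no spacer
    precedes it.
    """
    hits = [set(occurrences(text, f)) for f in frags]

    def extend(k: int, prev_end: int, chain: List[int]):
        if k == len(frags):
            yield tuple(chain)
            return
        f = frags[k]
        # Try every start position allowed by the spacer-length bounds.
        for start in range(prev_end + f.min_gap, prev_end + f.max_gap + 1):
            if start in hits[k]:
                yield from extend(k + 1, start + len(f.seq), chain + [start])

    for start in sorted(hits[0]):
        yield from extend(1, start + len(frags[0].seq), [start])


if __name__ == "__main__":
    pattern = [Fragment("ACGT", 0), Fragment("GGCC", 1, min_gap=2, max_gap=6)]
    for match in search("TTACGTAAAAGGCCTT", pattern):
        print(match)
```

      The sketch first collects the approximate occurrences of every fragment and then chains them under the spacer constraints. It does no scoring or ranking of matches, and its running time grows with the number of candidate chains, so it is suitable only for short test sequences.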

      Most cited references 12

      tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence.

      We describe a program, tRNAscan-SE, which identifies 99-100% of transfer RNA genes in DNA sequence while giving less than one false positive per 15 gigabases. Two previously described tRNA detection programs are used as fast, first-pass prefilters to identify candidate tRNAs, which are then analyzed by a highly selective tRNA covariance model. This work represents a practical application of RNA covariance models, which are general, probabilistic secondary structure profiles based on stochastic context-free grammars. tRNAscan-SE searches at approximately 30 000 bp/s. Additional extensions to tRNAscan-SE detect unusual tRNA homologues such as selenocysteine tRNAs, tRNA-derived repetitive elements and tRNA pseudogenes.

        RNAMotif, an RNA secondary structure definition and search algorithm.

        RNA molecules fold into characteristic secondary and tertiary structures that account for their diverse functional activities. Many of these RNA structures are assembled from a collection of RNA structural motifs. These basic building blocks are used repeatedly, and in various combinations, to form different RNA types and define their unique structural and functional properties. Identification of recurring RNA structural motifs will therefore enhance our understanding of RNA structure and help associate elements of RNA structure with functional and regulatory elements. Our goal was to develop a computer program that can describe an RNA structural element of any complexity and then search any nucleotide sequence database, including the complete prokaryotic and eukaryotic genomes, for these structural elements. Here we describe in detail a new computational motif search algorithm, RNAMotif, and demonstrate its utility with some motif search examples. RNAMotif differs from other motif search tools in two important aspects: first, the structure definition language is more flexible and can specify any type of base-base interaction; second, RNAMotif provides a user controlled scoring section that can be used to add capabilities that patterns alone cannot provide.

          Direct RNA motif definition and identification from multiple sequence alignments using secondary structure profiles.

          We present here a new approach to the problem of defining RNA signatures and finding their occurrences in sequence databases. The proposed method is based on "secondary structure profiles". An RNA sequence alignment with secondary structure information is used as an input. Two types of weight matrices/profiles are constructed from this alignment: single strands are represented by a classical lod-scores profile while helical regions are represented by an extended "helical profile" comprising 16 lod-scores per position, one for each of the 16 possible base-pairs. Database searches are then conducted using a simultaneous search for helical profiles and dynamic programming alignment of single strand profiles. The algorithm has been implemented into a new software, ERPIN, that performs both profile construction and database search. Applications are presented for several RNA motifs. The automated use of sequence information in both single-stranded and helical regions yields better sensitivity/specificity ratios than descriptor-based programs. Furthermore, since the translation of alignments into profiles is straightforward with ERPIN, iterative searches can easily be conducted to enrich collections of homologous RNAs. Copyright 2001 Academic Press.

            Author and article information

            Affiliations
            [1 ]Department of Combinatorics and Geometry, CAS/MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
            [2 ]Bioinformatics Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig D-04103, Germany
            [3 ]Institute for Theoretical Chemistry, University of Vienna, Vienna A-1090, Austria
            [4 ]The Santa Fe Institute, Santa Fe, NM 87501, USA
            [5 ]Max Planck Institute for Mathematics in the Sciences, Leipzig D-04103, Germany
            Author notes
            [*] Corresponding author. E-mail: mosig@sibs.ac.cn
            Journal: Genomics, Proteomics & Bioinformatics (abbreviated: Genomics Proteomics Bioinformatics)
            Publisher: Elsevier
            ISSN: 1672-0229, 2210-3244
            Published: 18 April 2006
            Volume 4, Issue 1, Pages 56-60
            PMID: 16689703; PMCID: PMC5054030; PII: S1672-0229(06)60017-X; DOI: 10.1016/S1672-0229(06)60017-X
            © 2006 Beijing Institute of Genomics

            This is an open access article under the CC BY-NC-SA license (http://creativecommons.org/licenses/by-nc-sa/3.0/).

            Categories
            Method
