84
views
0
recommends
+1 Recommend
0 collections
    4
    shares
      • Record: found
      • Abstract: found
      • Article: not found

      Query-Dependent Banding (QDB) for Faster RNA Similarity Searches

      research-article
      , *
      PLoS Computational Biology
      Public Library of Science

      Read this article at

      ScienceOpenPublisherPMC
      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          When searching sequence databases for RNAs, it is desirable to score both primary sequence and RNA secondary structure similarity. Covariance models (CMs) are probabilistic models well-suited for RNA similarity search applications. However, the computational complexity of CM dynamic programming alignment algorithms has limited their practical application. Here we describe an acceleration method called query-dependent banding (QDB), which uses the probabilistic query CM to precalculate regions of the dynamic programming lattice that have negligible probability, independently of the target database. We have implemented QDB in the freely available Infernal software package. QDB reduces the average case time complexity of CM alignment from LN 2.4 to LN 1.3 for a query RNA of N residues and a target database of L residues, resulting in a 4-fold speedup for typical RNA queries. Combined with other improvements to Infernal, including informative mixture Dirichlet priors on model parameters, benchmarks also show increased sensitivity and specificity resulting from improved parameterization.

          Author Summary

          Database similarity searching is the sine qua non of computational molecular biology. Well-known and powerful methods exist for primary sequence searches, such as Blast and profile hidden Markov models. However, for RNA analysis, biologists rely not only on primary sequence but also on conserved RNA secondary structure to manually align and compare RNAs, and most computational tools for identifying RNA structural homologs remain too slow for large-scale use. We describe a new algorithm for accelerating one of the most general and powerful classes of methods for RNA sequence and structure analysis, so-called profile SCFG (stochastic context-free grammar) RNA similarity search methods. We describe this approach, called query-dependent banding, in the context of this and other improvements in a practical implementation, the freely available Infernal software package, the basis of the Rfam RNA family database for genome annotation. Infernal is now a faster, more sensitive, and more specific software tool for identifying homologs of structural RNAs.

          Related collections

          Most cited references36

          • Record: found
          • Abstract: found
          • Article: not found

          Pfam: clans, web tools and services

          Pfam is a database of protein families that currently contains 7973 entries (release 18.0). A recent development in Pfam has enabled the grouping of related families into clans. Pfam clans are described in detail, together with the new associated web pages. Improvements to the range of Pfam web tools and the first set of Pfam web services that allow programmatic access to the database and associated tools are also presented. Pfam is available on the web in the UK (), the USA (), France () and Sweden ().
            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA.

            To compare entire genomes from different species, biologists increasingly need alignment methods that are efficient enough to handle long sequences, and accurate enough to correctly align the conserved biological features between distant species. We present LAGAN, a system for rapid global alignment of two homologous genomic sequences, and Multi-LAGAN, a system for multiple global alignment of genomic sequences. We tested our systems on a data set consisting of greater than 12 Mb of high-quality sequence from 12 vertebrate species. All the sequence was derived from the genomic region orthologous to an approximately 1.5-Mb region on human chromosome 7q31.3. We found that both LAGAN and Multi-LAGAN compare favorably with other leading alignment methods in correctly aligning protein-coding exons, especially between distant homologs such as human and chicken, or human and fugu. Multi-LAGAN produced the most accurate alignments, while requiring just 75 minutes on a personal computer to obtain the multiple alignment of all 12 sequences. Multi-LAGAN is a practical method for generating multiple alignments of long genomic sequences at any evolutionary distance. Our systems are publicly available at http://lagan.stanford.edu.
              Bookmark
              • Record: found
              • Abstract: not found
              • Article: not found

              Vertebrate microRNA genes.

                Bookmark

                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS Comput Biol
                pcbi
                PLoS Computational Biology
                Public Library of Science (San Francisco, USA )
                1553-734X
                1553-7358
                March 2007
                30 March 2007
                7 February 2007
                : 3
                : 3
                : e56
                Affiliations
                [1]Howard Hughes Medical Institute, Janelia Farm Research Campus, Ashburn, Virginia, United States of America
                Washington University, United States of America
                Author notes
                * To whom correspondence should be addressed. E-mail: eddys@ 123456janelia.hhmi.org
                Article
                06-PLCB-RA-0503R2 plcb-03-03-24
                10.1371/journal.pcbi.0030056
                1847999
                17397253
                f26205eb-f7fa-4dd8-83b0-2f41106806f4
                Copyright: © 2007 Nawrocki and Eddy. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
                History
                : 1 December 2006
                : 6 February 2007
                Page count
                Pages: 15
                Categories
                Research Article
                Computational Biology
                None
                Custom metadata
                Nawrocki EP, Eddy SR (2007) Query-dependent banding (QDB) for faster RNA similarity searches. PLoS Comput Biol 3(3): e56. doi: 10.1371/journal.pcbi.0030056

                Quantitative & Systems biology
                Quantitative & Systems biology

                Comments

                Comment on this article