Query-Dependent Banding (QDB) for Faster RNA Similarity Searches

Abstract

When searching sequence databases for RNAs, it is desirable to score both primary sequence and RNA secondary structure similarity. Covariance models (CMs) are probabilistic models well-suited for RNA similarity search applications. However, the computational complexity of CM dynamic programming alignment algorithms has limited their practical application. Here we describe an acceleration method called query-dependent banding (QDB), which uses the probabilistic query CM to precalculate regions of the dynamic programming lattice that have negligible probability, independently of the target database. We have implemented QDB in the freely available Infernal software package. QDB reduces the average case time complexity of CM alignment from LN^2.4 to LN^1.3 for a query RNA of N residues and a target database of L residues, resulting in a 4-fold speedup for typical RNA queries. Benchmarks that combine QDB with other improvements to Infernal, including informative mixture Dirichlet priors on model parameters, also show increased sensitivity and specificity resulting from improved parameterization.
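As a rough illustration of the banding idea, the sketch below derives a band of plausible subsequence lengths from a length distribution, using nothing but the model (no target sequence is consulted). It is not Infernal's implementation: a real CM computes one length distribution per state of a tree-structured model, whereas this toy joins two independent segments end to end. The function names `convolve` and `band`, the tail parameter `beta`, the choice of splitting `beta` evenly between the two tails, and the example distributions are all assumptions made for illustration.

```python
from typing import List, Tuple


def convolve(p: List[float], q: List[float]) -> List[float]:
    """Length distribution of two independent segments placed end to end.

    p[i] is the probability that the first segment has length i,
    q[j] the probability that the second has length j.
    """
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def band(length_dist: List[float], beta: float) -> Tuple[int, int]:
    """Return (dmin, dmax) such that each excluded tail holds < beta/2 of the mass.

    Lengths outside [dmin, dmax] are treated as negligible and would be
    skipped by a banded dynamic programming recursion.
    """
    total = sum(length_dist)
    cutoff = (beta / 2.0) * total

    # Trim the left tail while the discarded mass stays below the cutoff.
    dmin, left = 0, 0.0
    while dmin < len(length_dist) - 1 and left + length_dist[dmin] < cutoff:
        left += length_dist[dmin]
        dmin += 1

    # Trim the right tail the same way, never crossing dmin.
    dmax, right = len(length_dist) - 1, 0.0
    while dmax > dmin and right + length_dist[dmax] < cutoff:
        right += length_dist[dmax]
        dmax -= 1

    return dmin, dmax


# Hypothetical toy segments (index = length in residues).
segment_a = [0.0, 0.1, 0.8, 0.1]   # lengths 0..3
segment_b = [0.05, 0.9, 0.05]      # lengths 0..2

joint = convolve(segment_a, segment_b)   # lengths 0..5
print(band(joint, beta=0.02))            # -> (2, 4)
```

In this toy case the band keeps three of the six possible lengths. Because the band depends only on the query model, it can be computed once and reused for every position in the target database.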

Author Summary

Database similarity searching is the sine qua non of computational molecular biology. Well-known and powerful methods exist for primary sequence searches, such as Blast and profile hidden Markov models. However, for RNA analysis, biologists rely not only on primary sequence but also on conserved RNA secondary structure to manually align and compare RNAs, and most computational tools for identifying RNA structural homologs remain too slow for large-scale use. We describe a new algorithm for accelerating one of the most general and powerful classes of methods for RNA sequence and structure analysis, so-called profile SCFG (stochastic context-free grammar) RNA similarity search methods. We describe this approach, called query-dependent banding, in the context of this and other improvements in a practical implementation, the freely available Infernal software package, the basis of the Rfam RNA family database for genome annotation. Infernal is now a faster, more sensitive, and more specific software tool for identifying homologs of structural RNAs.

Author and article information

Contributors

: Role: Editor

Journal

Journal ID (nlm-ta): PLoS Comput Biol

Journal ID (publisher-id): pcbi

Title: PLoS Computational Biology

Publisher: Public Library of Science (San Francisco, USA)

ISSN (Print): 1553-734X

ISSN (Electronic): 1553-7358

Publication date (Print): March 2007

Publication date (Electronic): 30 March 2007

Publication date (Electronic preprint): 7 February 2007

Volume: 3

Issue: 3

Electronic Location Identifier: e56

Affiliations

[1] Howard Hughes Medical Institute, Janelia Farm Research Campus, Ashburn, Virginia, United States of America

Washington University, United States of America

Author notes

* To whom correspondence should be addressed. E-mail: eddys@janelia.hhmi.org

Article

Publisher ID: 06-PLCB-RA-0503R2

Serial Item and Contribution ID: plcb-03-03-24

DOI: 10.1371/journal.pcbi.0030056

PMC ID: 1847999

PubMed ID: 17397253

SO-VID: f26205eb-f7fa-4dd8-83b0-2f41106806f4

Copyright: © 2007 Nawrocki and Eddy. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

History

Date received: 1 December 2006

Date accepted: 6 February 2007

Page count

Pages: 15

Custom metadata

Citation: Nawrocki EP, Eddy SR (2007) Query-dependent banding (QDB) for faster RNA similarity searches. PLoS Comput Biol 3(3): e56. doi: 10.1371/journal.pcbi.0030056
