Computational analysis of bacterial RNA-Seq data


Abstract

Recent advances in high-throughput RNA sequencing (RNA-seq) have enabled tremendous leaps forward in our understanding of bacterial transcriptomes. However, computational methods for analysis of bacterial transcriptome data have not kept pace with the large and growing data sets generated by RNA-seq technology. Here, we present new algorithms, specific to bacterial gene structures and transcriptomes, for analysis of RNA-seq data. The algorithms are implemented in an open source software system called Rockhopper that supports various stages of bacterial RNA-seq data analysis, including aligning sequencing reads to a genome, constructing transcriptome maps, quantifying transcript abundance, testing for differential gene expression, determining operon structures and visualizing results. We demonstrate the performance of Rockhopper using 2.1 billion sequenced reads from 75 RNA-seq experiments conducted with Escherichia coli, Neisseria gonorrhoeae, Salmonella enterica, Streptococcus pyogenes and Xenorhabdus nematophila. We find that the transcriptome maps generated by our algorithms are highly accurate when compared with focused experimental data from E. coli and N. gonorrhoeae, and we validate our system’s ability to identify novel small RNAs, operons and transcription start sites. Our results suggest that Rockhopper can be used for efficient and accurate analysis of bacterial RNA-seq data, and that it can aid with elucidation of bacterial transcriptomes.
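The abstract names transcript-abundance quantification as one stage of the pipeline but does not give a formula. The sketch below is a minimal illustration that assumes the widely used RPKM measure (reads per kilobase of transcript per million mapped reads). It is not necessarily the exact normalization Rockhopper applies. The function name `rpkm` and its dictionary inputs are hypothetical. For simplicity, the library size is taken as the sum of the supplied gene counts rather than all reads mapped to the genome.

```python
from typing import Dict


def rpkm(counts: Dict[str, int], lengths: Dict[str, int]) -> Dict[str, float]:
    """Reads per kilobase of transcript per million mapped reads (RPKM).

    Hypothetical helper, not Rockhopper's API.
    counts  -- per-gene read counts, e.g. from a read-alignment step
    lengths -- gene (or transcript) lengths in base pairs
    """
    # Assumption: library size = sum of the given gene counts; a real
    # pipeline would usually use all reads mapped to the genome.
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no mapped reads")
    # RPKM = count * 10^9 / (total mapped reads * length in bp)
    return {
        gene: counts[gene] * 1e9 / (total * lengths[gene])
        for gene in counts
    }


if __name__ == "__main__":
    example = rpkm({"geneA": 500, "geneB": 1500},
                   {"geneA": 1000, "geneB": 3000})
    print(example)
```

In this example both genes receive the same value. geneB has three times as many reads as geneA, but it is also three times longer, so the length correction cancels the difference in raw counts.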


Author and article information

Journal

Journal ID (nlm-ta): Nucleic Acids Res

Journal ID (iso-abbrev): Nucleic Acids Res

Journal ID (publisher-id): nar

Journal ID (hwp): nar

Title: Nucleic Acids Research

Publisher: Oxford University Press

ISSN (Print): 0305-1048

ISSN (Electronic): 1362-4962

Publication date (Print): August 2013

Publication date (Electronic): 28 May 2013

Publication date PMC-release: 28 May 2013

Volume: 41

Issue: 14

Page: e140

Affiliations

¹Department of Microbiology, Boston University School of Medicine, Boston, MA 02118, USA, ²Department of Medicine, Section of Infectious Diseases, Boston University School of Medicine, Boston, MA 02118, USA, ³Department of Microbiology, University of Illinois, Urbana, IL 61801, USA, ⁴Department of Pathology, Center for Molecular and Translational Human Infectious Diseases Research, The Methodist Hospital Research Institute, Houston, TX 77030, USA and ⁵Computer Science Department, Wellesley College, Wellesley, MA 02481, USA

Author notes

*To whom correspondence should be addressed. Tel: +1 781 283 3354; Fax: +1 781 283 3642; Email: btjaden@wellesley.edu

Present address: Yan Sun, Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA 16802, USA.

Article

Publisher ID: gkt444

DOI: 10.1093/nar/gkt444

PMC ID: 3737546

PubMed ID: 23716638

SO-VID: 41b05986-c129-4c9a-add6-a69f106b0040

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

History

Date received: 14 January 2013

Date revision received: 26 March 2013

Date accepted: 1 May 2013

Page count

Pages: 16


Cited by 264

