
      Computational analysis of bacterial RNA-Seq data

      research-article


          Abstract

          Recent advances in high-throughput RNA sequencing (RNA-seq) have enabled tremendous leaps forward in our understanding of bacterial transcriptomes. However, computational methods for analysis of bacterial transcriptome data have not kept pace with the large and growing data sets generated by RNA-seq technology. Here, we present new algorithms, specific to bacterial gene structures and transcriptomes, for analysis of RNA-seq data. The algorithms are implemented in an open source software system called Rockhopper that supports various stages of bacterial RNA-seq data analysis, including aligning sequencing reads to a genome, constructing transcriptome maps, quantifying transcript abundance, testing for differential gene expression, determining operon structures and visualizing results. We demonstrate the performance of Rockhopper using 2.1 billion sequenced reads from 75 RNA-seq experiments conducted with Escherichia coli, Neisseria gonorrhoeae, Salmonella enterica, Streptococcus pyogenes and Xenorhabdus nematophila. We find that the transcriptome maps generated by our algorithms are highly accurate when compared with focused experimental data from E. coli and N. gonorrhoeae, and we validate our system’s ability to identify novel small RNAs, operons and transcription start sites. Our results suggest that Rockhopper can be used for efficient and accurate analysis of bacterial RNA-seq data, and that it can aid with elucidation of bacterial transcriptomes.
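The abstract lists transcript-abundance quantification as one stage of the pipeline but does not describe how it is computed. As a rough illustration only, the sketch below assigns reads to annotated genes and then applies RPKM (reads per kilobase of gene per million mapped reads), a widely used generic normalization computed as count × 10^9 / (gene length in bp × total mapped reads). It is not presented as Rockhopper's own algorithm. The `Gene` class, the helper names `count_reads` and `rpkm`, the read-assignment rule and the toy coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    """Hypothetical gene annotation; 1-based inclusive coordinates, strand not modeled."""
    name: str
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def count_reads(genes, read_starts):
    """Count, for each gene, the reads whose start position falls inside it.

    Simplifying assumption for illustration: strand and read length are ignored.
    """
    counts = {g.name: 0 for g in genes}
    for pos in read_starts:
        for g in genes:
            if g.start <= pos <= g.end:
                counts[g.name] += 1
    return counts


def rpkm(counts, genes, total_mapped):
    """Reads per kilobase of gene per million mapped reads."""
    lengths = {g.name: g.length for g in genes}
    return {
        name: c * 1e9 / (lengths[name] * total_mapped)
        for name, c in counts.items()
    }


# Hypothetical toy data: two genes and six mapped read start positions
# (the read at 3000 maps outside both genes but still counts as mapped).
genes = [Gene("geneA", 1, 1000), Gene("geneB", 1501, 2000)]
read_starts = [10, 200, 999, 1600, 1700, 3000]

counts = count_reads(genes, read_starts)
values = rpkm(counts, genes, total_mapped=len(read_starts))
print(counts)  # {'geneA': 3, 'geneB': 2}
print(values)
```

Because geneB is half as long as geneA, its two reads give it a higher RPKM (about 666,667) than geneA's three reads (500,000), which is the point of normalizing by gene length.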


                Author and article information

                Journal
                Nucleic Acids Research (Nucleic Acids Res; nar)
                Oxford University Press
                ISSN: 0305-1048 (print), 1362-4962 (electronic)
                Issue date: August 2013
                Published online: 28 May 2013
                Volume: 41, Issue: 14, Article: e140
                Affiliations
                1 Department of Microbiology, Boston University School of Medicine, Boston, MA 02118, USA
                2 Department of Medicine, Section of Infectious Diseases, Boston University School of Medicine, Boston, MA 02118, USA
                3 Department of Microbiology, University of Illinois, Urbana, IL 61801, USA
                4 Department of Pathology, Center for Molecular and Translational Human Infectious Diseases Research, The Methodist Hospital Research Institute, Houston, TX 77030, USA
                5 Computer Science Department, Wellesley College, Wellesley, MA 02481, USA
                Author notes
                *To whom correspondence should be addressed. Tel: +1 781 283 3354; Fax: +1 781 283 3642; Email: btjaden@wellesley.edu

                Present address: Yan Sun, Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA 16802, USA.

                Article
                Publisher ID: gkt444
                DOI: 10.1093/nar/gkt444
                PMC ID: 3737546
                PubMed ID: 23716638
                ScienceOpen ID: 41b05986-c129-4c9a-add6-a69f106b0040
                © The Author(s) 2013. Published by Oxford University Press.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Received: 14 January 2013
                Revised: 26 March 2013
                Accepted: 1 May 2013
                Page count
                Pages: 16
                Categories
                Methods Online

                Genetics
