      Open Access

      Bacterial antisense RNAs are mainly the product of transcriptional noise


          Abstract

          Most of the antisense transcripts in bacteria are the product of transcriptional noise derived from spurious promoters.


          cis-Encoded antisense RNAs (asRNAs) are widespread along bacterial transcriptomes. However, the role of most of these RNAs remains unknown, and there is an ongoing discussion as to what extent these transcripts are the result of transcriptional noise. We show, by comparative transcriptomics of 20 bacterial species and one chloroplast, that the number of asRNAs is exponentially dependent on the genomic AT content and that expression of asRNA at low levels exerts little impact in terms of energy consumption. A transcription model simulating mRNA and asRNA production indicates that the asRNA regulatory effect is only observed above certain expression thresholds, substantially higher than physiological transcript levels. These predictions were verified experimentally by overexpressing nine different asRNAs in Mycoplasma pneumoniae. Our results suggest that most of the antisense transcripts found in bacteria are the consequence of transcriptional noise, arising at spurious promoters throughout the genome.
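The abstract's claim that the number of asRNAs depends exponentially on genomic AT content can be illustrated with a small log-linear fit. This is only a sketch: the function name `fit_exponential`, the variable names, and every number below are hypothetical and synthetic, not taken from the article, and the article does not state which fitting procedure the authors used. If counts follow N = a * exp(b * AT), then log(N) is linear in AT, so ordinary least squares on log(N) recovers a and b.

```python
import math

def fit_exponential(x, y):
    """Fit y = a * exp(b * x) by least squares on log(y). Requires all y > 0."""
    n = len(x)
    log_y = [math.log(v) for v in y]
    mean_x = sum(x) / n
    mean_log_y = sum(log_y) / n
    # Slope and intercept of the regression line log(y) = log(a) + b * x
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (li - mean_log_y) for xi, li in zip(x, log_y))
    b = sxy / sxx
    a = math.exp(mean_log_y - b * mean_x)
    return a, b

# Synthetic, illustrative values only (NOT data from the article):
# AT fractions for five hypothetical genomes and noise-free counts
# generated from an assumed exponential law.
at_content = [0.30, 0.40, 0.50, 0.60, 0.70]
assumed_a, assumed_b = 2.0, 8.0
asrna_counts = [assumed_a * math.exp(assumed_b * at) for at in at_content]

a, b = fit_exponential(at_content, asrna_counts)
print(f"a = {a:.3f}, b = {b:.3f}")
```

With real counts, the spread of the points around the fitted line on a log scale would show how well an exponential describes the trend.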


                Author and article information

                Journal: Science Advances (Sci Adv)
                Publisher: American Association for the Advancement of Science
                ISSN: 2375-2548
                Published: 04 March 2016
                Volume: 2
                Issue: 3
                Affiliations
                [1] EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, 08003 Barcelona, Spain.
                [2] Universitat Pompeu Fabra (UPF), 08002 Barcelona, Spain.
                [3] MSD Animal Health, Bioprocess Technology and Support, 5830 AB Boxmeer, Netherlands.
                [4] Laboratory of Systems and Synthetic Biology, Wageningen University, 6700 EJ Wageningen, Netherlands.
                [5] Institut Cavanilles de Biodiversitat i Biologia Evolutiva, Universitat de València, 46980 Paterna, València, Spain.
                [6] Área de Genómica y Salud, Fundación para el Fomento de la Investigación Sanitaria y Biomédica de la Comunitat Valenciana (FISABIO)-Salud Pública, 46020 Valencia, Spain.
                [7] European Molecular Biology Laboratory, 69117 Heidelberg, Germany.
                [8] Max Delbrück Centre (MDC) for Molecular Medicine, 13125 Berlin, Germany.
                [9] Department of Synthetic Biology and Bioenergy, J. Craig Venter Institute, La Jolla, CA 92037, USA.
                [10] Department of Synthetic Biology and Bioenergy, J. Craig Venter Institute, Rockville, MD 20850, USA.
                [11] Institució Catalana de Recerca i Estudis Avançats (ICREA), Pg. Lluis Companys 23, 08010 Barcelona, Spain.
                Author notes
                [*] Corresponding author. E-mail: luis.serrano@crg.eu (L.S.); maria.lluch@crg.es (M.L.-S.)
                Article
                Article number: 1501363
                DOI: 10.1126/sciadv.1501363
                PMCID: PMC4783119
                PMID: 26973873
                Copyright © 2016, The Authors

                This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited.

                Funding
                Funded by: Seventh Framework Programme (FundRef http://dx.doi.org/10.13039/501100004963); Award ID: FP7/2007–2013
                Funded by: European Research Council (FundRef http://dx.doi.org/10.13039/501100000781); Award ID: 232913
                Funded by: Fundación Botín, the Spanish Ministry of Economy and Competitiveness; Award ID: BIO2007-61762
                Funded by: National Plan of R + D + i
                Funded by: ISCIII – Subdirección General de Evaluación y Fomento de la Investigación; Award ID: PI10/01702
                Funded by: European Regional Development Fund (ERDF)
                Funded by: Spanish Ministry of Economy and Competitiveness, ‘Centro de Excelencia Severo Ochoa 2013–2017’; Award ID: SEV-2012-0208
                Funded by: Spanish Ministry of Economy and Competitiveness; Award ID: BFU2012-39816-C02-01
                Categories
                Research Article
                Biomolecules

                Keywords: bacterial antisense RNAs, RNA
