      Identification of a dinucleotide signature that discriminates coding from non-coding long RNAs


          Abstract

          To date, the main criterion for discriminating long ncRNAs (lncRNAs) from mRNAs is the capacity of a transcript to encode a protein. However, it is becoming important to identify sequence characteristics that do not depend on open reading frames (ORFs) and that can distinguish ncRNAs from mRNAs. In this study, we first established an extremely selective workflow to define a highly refined database of lncRNAs, which was used for comparison with mRNAs. Using this highly selective collection of lncRNAs, we found that CG dinucleotide frequencies were clearly distinct between lncRNAs and mRNAs. In addition, we showed that the bias in CG dinucleotide frequency was conserved in the human and mouse genomes. We propose that this sequence feature will serve as a useful classifier in transcript classification pipelines. We also suggest that our refined database of “bona fide” lncRNAs will be valuable for the discovery of other sequence characteristics distinctive of lncRNAs.
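
          As a rough illustration of the sequence feature discussed in the abstract, the sketch below computes the relative frequency of overlapping dinucleotides in a transcript and reads off the CG value. It is a minimal example under stated assumptions, not the authors' workflow: the function names `dinucleotide_frequencies` and `cg_frequency` are hypothetical, dinucleotides are counted with a sliding window of step 1, and windows containing ambiguous bases (such as N) are skipped.

```python
from collections import Counter


def dinucleotide_frequencies(seq):
    """Return relative frequencies of overlapping dinucleotides in seq.

    Hypothetical helper for illustration; RNA input is accepted by
    mapping U to T, and windows with non-ACGT characters are ignored.
    """
    seq = seq.upper().replace("U", "T")
    # Sliding window of width 2, step 1 (overlapping dinucleotides).
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    # Keep only windows made entirely of unambiguous bases.
    pairs = [p for p in pairs if set(p) <= set("ACGT")]
    total = len(pairs)
    if total == 0:
        return {}
    counts = Counter(pairs)
    return {pair: n / total for pair, n in counts.items()}


def cg_frequency(seq):
    """Return the relative frequency of the CG dinucleotide (0.0 if absent)."""
    return dinucleotide_frequencies(seq).get("CG", 0.0)


if __name__ == "__main__":
    # Toy sequence, not taken from the article's data.
    print(cg_frequency("ACGCGT"))
```

          Comparing the distribution of such values between a set of lncRNAs and a set of mRNAs would be one way to explore the bias; any threshold used for classification would have to be derived from real data and is not given here.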

                Author and article information

                Contributors
                Journal
                Front Genet
                Front. Genet.
                Frontiers in Genetics
                Frontiers Media S.A.
                1664-8021
                07 August 2014
                09 September 2014
                2014
                : 5
                : 316
                Affiliations
                [1] CNRS UMR7216, Epigenetics and Cell Fate, Université Paris Diderot, Sorbonne Paris Cité, Paris, France
                [2] The University of Queensland Diamantina Institute, The University of Queensland, Brisbane, QLD, Australia
                Author notes

                Edited by: Manja Marz, University of Marburg, Germany

                Reviewed by: David Langenberger, ecSeq Bioinformatics, Germany; Pedro Miramontes, Universidad Nacional Autónoma de México, Mexico

                *Correspondence: Florent Hubé, UMR7216 - Epigénétique et Destin Cellulaire, Université Paris 7 Diderot, Bâtiment Lamarck - 4ème étage, Case Courrier 7042, 35, rue Hélène Brion, 75013 Paris, France; e-mail: florent.hube@univ-paris-diderot.fr

                This article was submitted to Non-Coding RNA, a section of the journal Frontiers in Genetics.

                Article
                10.3389/fgene.2014.00316
                4158813
                a5e250b6-fbb0-4330-a0f2-bfb4ab54a9fd
                Copyright © 2014 Ulveling, Dinger, Francastel and Hubé.

                This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

                History
                : 26 June 2014
                : 22 August 2014
                Page count
                Figures: 2, Tables: 0, Equations: 0, References: 40, Pages: 5, Words: 4149
                Categories
                Genetics
                Hypothesis and Theory Article

                Genetics
                ncRNA, mRNA, CG dinucleotide, sequence bias, pseudogene, intron, exon, database
