
      lncScore: alignment-free identification of long noncoding RNA from assembled novel transcripts

      Scientific Reports

      Nature Publishing Group

          Abstract

          RNA-Seq-based transcriptome assembly has been widely used to identify novel lncRNAs. However, the best-performing transcript reconstruction methods identified only 21% of full-length protein-coding transcripts from H. sapiens. Partial-length protein-coding transcripts are more likely to be classified as lncRNAs because their CDS is incomplete, which leads to a higher false positive rate for lncRNA identification. Furthermore, potential sequencing or assembly errors that gain or abolish stop codons also complicate ORF-based prediction of lncRNAs. Therefore, it remains a challenge to identify lncRNAs from assembled transcripts, particularly the partial-length ones. Here, we present a novel alignment-free tool, lncScore, which uses a logistic regression model with 11 carefully selected features. Compared to other state-of-the-art alignment-free tools (e.g. CPAT, CNCI, and PLEK), lncScore distinguishes lncRNAs from mRNAs more accurately, especially for partial-length mRNAs, in the human and mouse datasets. In addition, lncScore also performed well on transcripts from five other species (Zebrafish, Fly, C. elegans, Rat, and Sheep). To speed up prediction, multithreading is implemented within lncScore: with 12 threads, it took only 2 minutes to classify 64,756 transcripts and 54 seconds to train a new model with 21,000 transcripts, which is much faster than other tools. lncScore is available at https://github.com/WGLab/lncScore.
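
          The abstract's point about incomplete CDS and spurious stop codons can be illustrated with a minimal sketch. This is not lncScore's code and not one of its 11 features; the function name `longest_orf_length` and the toy sequences are hypothetical, chosen only to show how a simple ORF-length measure collapses when a transcript is truncated or acquires a premature stop codon.

```python
# Illustrative only: a naive forward-strand ORF scan, NOT lncScore's feature set.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_length(seq: str) -> int:
    """Return the length (nt, stop codon included) of the longest
    ATG-to-stop ORF found in the three forward reading frames."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None  # index of the ATG that opened the current ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


# Toy full-length mRNA: ATG, nine sense codons (one of them TAC), then TAA.
full_length = "ATG" + "GCT" * 4 + "TAC" + "GCT" * 5 + "TAA"

# A single C->A error turns TAC into TAA, creating a premature stop codon.
premature_stop = full_length[:17] + "A" + full_length[18:]

# A partial-length assembly that lost its 5' end, including the start codon.
five_prime_truncated = full_length[6:]

for name, s in [("full_length", full_length),
                ("premature_stop", premature_stop),
                ("five_prime_truncated", five_prime_truncated)]:
    print(name, longest_orf_length(s), "of", len(s), "nt")
```

          In this toy case the complete ORF spans all 36 nt of the full-length transcript, shrinks to 18 nt after the single-base error, and disappears entirely in the 5'-truncated fragment, so a classifier relying only on ORF length would tend to score both damaged mRNAs as noncoding.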

          Most cited references 35

          GENCODE: the reference human genome annotation for The ENCODE Project.

          The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternative splicing transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two exon models indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.

            The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression.

            The human genome contains many thousands of long noncoding RNAs (lncRNAs). While several studies have demonstrated compelling biological and disease roles for individual examples, analytical and experimental approaches to investigate these genes have been hampered by the lack of comprehensive lncRNA annotation. Here, we present and analyze the most complete human lncRNA annotation to date, produced by the GENCODE consortium within the framework of the ENCODE project and comprising 9277 manually annotated genes producing 14,880 transcripts. Our analyses indicate that lncRNAs are generated through pathways similar to that of protein-coding genes, with similar histone-modification profiles, splicing signals, and exon/intron lengths. In contrast to protein-coding genes, however, lncRNAs display a striking bias toward two-exon transcripts, they are predominantly localized in the chromatin and nucleus, and a fraction appear to be preferentially processed into small RNAs. They are under stronger selective pressure than neutrally evolving sequences, particularly in their promoter regions, which display levels of selection comparable to protein-coding genes. Importantly, about one-third seem to have arisen within the primate lineage. Comprehensive analysis of their expression in multiple human organs and brain regions shows that lncRNAs are generally lower expressed than protein-coding genes, and display more tissue-specific expression patterns, with a large fraction of tissue-specific lncRNAs expressed in the brain. Expression correlation analysis indicates that lncRNAs show particularly striking positive correlation with the expression of antisense coding genes. This GENCODE annotation represents a valuable resource for future studies of lncRNAs.

              Evolution and functions of long noncoding RNAs.

              RNA is not only a messenger operating between DNA and protein. Transcription of essentially the entire eukaryotic genome generates a myriad of non-protein-coding RNA species that show complex overlapping patterns of expression and regulation. Although long noncoding RNAs (lncRNAs) are among the least well-understood of these transcript species, they cannot all be dismissed as merely transcriptional "noise." Here, we review the evolution of lncRNAs and their roles in transcriptional regulation, epigenetic gene regulation, and disease.

                Author and article information

                Journal: Scientific Reports (Sci Rep)
                Publisher: Nature Publishing Group
                ISSN: 2045-2322
                Published: 06 October 2016
                Volume: 6
                Affiliations
                [1] Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
                [2] Zilkha Neurogenetic Institute, Keck School of Medicine, University of Southern California, Los Angeles, CA 90089, USA
                [3] Division of Bioinformatics, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90089, USA
                [4] Institute for Genomic Medicine, Columbia University Medical Center, New York, NY 10032, USA
                [5] Department of Biomedical Informatics, Columbia University Medical Center, New York, NY 10032, USA
                Article
                Publisher ID: srep34838
                DOI: 10.1038/srep34838
                PMC ID: 5052565
                PubMed ID: 27708423
                Copyright © 2016, The Author(s)

                This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
