Blog
About

91
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Genome-wide discovery and characterization of maize long non-coding RNAs

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Background

          Long non-coding RNAs (lncRNAs) are transcripts that are 200 bp or longer, do not encode proteins, and potentially play important roles in eukaryotic gene regulation. However, the number, characteristics and expression inheritance pattern of lncRNAs in maize are still largely unknown.

          Results

          By exploiting available public EST databases, maize whole genome sequence annotation and RNA-seq datasets from 30 different experiments, we identified 20,163 putative lncRNAs. Of these lncRNAs, more than 90% are predicted to be the precursors of small RNAs, while 1,704 are considered to be high-confidence lncRNAs. High confidence lncRNAs have an average transcript length of 463 bp and genes encoding them contain fewer exons than annotated genes. By analyzing the expression pattern of these lncRNAs in 13 distinct tissues and 105 maize recombinant inbred lines, we show that more than 50% of the high confidence lncRNAs are expressed in a tissue-specific manner, a result that is supported by epigenetic marks. Intriguingly, the inheritance of lncRNA expression patterns in 105 recombinant inbred lines reveals apparent transgressive segregation, and maize lncRNAs are less affected by cis- than by trans-genetic factors.

          Conclusions

          We integrate all available transcriptomic datasets to identify a comprehensive set of maize lncRNAs, provide a unique annotation resource of the maize genome and a genome-wide characterization of maize lncRNAs, and explore the genetic control of their expression using expression quantitative trait locus mapping.

          Related collections

          Most cited references 69

          • Record: found
          • Abstract: found
          • Article: not found

          TopHat: discovering splice junctions with RNA-Seq

          Motivation: A new protocol for sequencing the messenger RNA in a cell, known as RNA-Seq, generates millions of short sequence fragments in a single run. These fragments, or ‘reads’, can be used to measure levels of gene expression and to identify novel splice variants of genes. However, current software for aligning RNA-Seq data to a genome relies on known splice junctions and cannot identify novel ones. TopHat is an efficient read-mapping algorithm designed to align reads from an RNA-Seq experiment to a reference genome without relying on known splice sites. Results: We mapped the RNA-Seq reads from a recent mammalian RNA-Seq experiment and recovered more than 72% of the splice junctions reported by the annotation-based software from that study, along with nearly 20 000 previously unreported junctions. The TopHat pipeline is much faster than previous systems, mapping nearly 2.2 million reads per CPU hour, which is sufficient to process an entire RNA-Seq experiment in less than a day on a standard desktop computer. We describe several challenges unique to ab initio splice site discovery from RNA-Seq reads that will require further algorithm development. Availability: TopHat is free, open-source software available from http://tophat.cbcb.umd.edu Contact: cole@cs.umd.edu Supplementary information: Supplementary data are available at Bioinformatics online.
            Bookmark
            • Record: found
            • Abstract: found
            • Article: not found

            Mapping and quantifying mammalian transcriptomes by RNA-Seq.

            We have mapped and quantified mouse transcriptomes by deeply sequencing them and recording how frequently each gene is represented in the sequence sample (RNA-Seq). This provides a digital measure of the presence and prevalence of transcripts from known and previously unknown genes. We report reference measurements composed of 41-52 million mapped 25-base-pair reads for poly(A)-selected RNA from adult mouse brain, liver and skeletal muscle tissues. We used RNA standards to quantify transcript prevalence and to test the linear range of transcript detection, which spanned five orders of magnitude. Although >90% of uniquely mapped reads fell within known exons, the remaining data suggest new and revised gene models, including changed or additional promoters, exons and 3' untranscribed regions, as well as new candidate microRNA precursors. RNA splice events, which are not readily measured by standard gene expression microarray or serial analysis of gene expression methods, were detected directly by mapping splice-crossing sequence reads. We observed 1.45 x 10(5) distinct splices, and alternative splices were prominent, with 3,500 different genes expressing one or more alternate internal splices.
              Bookmark
              • Record: found
              • Abstract: found
              • Article: not found

              Transcript assembly and abundance estimation from RNA-Seq reveals thousands of new transcripts and switching among isoforms

              High-throughput mRNA sequencing (RNA-Seq) holds the promise of simultaneous transcript discovery and abundance estimation 1-3 . We introduce an algorithm for transcript assembly coupled with a statistical model for RNA-Seq experiments that produces estimates of abundances. Our algorithms are implemented in an open source software program called Cufflinks. To test Cufflinks, we sequenced and analyzed more than 430 million paired 75bp RNA-Seq reads from a mouse myoblast cell line representing a differentiation time series. We detected 13,692 known transcripts and 3,724 previously unannotated ones, 62% of which are supported by independent expression data or by homologous genes in other species. Analysis of transcript expression over the time series revealed complete switches in the dominant transcription start site (TSS) or splice-isoform in 330 genes, along with more subtle shifts in a further 1,304 genes. These dynamics suggest substantial regulatory flexibility and complexity in this well-studied model of muscle development.
                Bookmark

                Author and article information

                Contributors
                Journal
                Genome Biol
                Genome Biol
                Genome Biology
                BioMed Central
                1465-6906
                1465-6914
                2014
                27 February 2014
                : 15
                : 2
                : R40
                Affiliations
                [1 ]Department of Agronomy and Plant Genetics, University of Minnesota, Saint Paul, MN 55108, USA
                [2 ]Department of Plant Biology, University of Minnesota, Saint Paul, MN 55108, USA
                [3 ]Department of Plant Biology, Cornell University, Ithaca, NY 14853, USA
                [4 ]Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
                [5 ]Department Agronomy, Iowa State University, Ames, IA 50011, USA
                [6 ]Center for Plant Genomics, Iowa State University, Ames, IA 50011-3650, USA
                [7 ]Department of Plant Biology, Carnegie Institution for Science, Stanford, CA 94305, USA
                [8 ]Informatics Research Core Facility, University of Missouri, Columbia, MO 65211, USA
                [9 ]Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA
                [10 ]Current address: Pioneer Hi-Bred, Johnston, IA 50131, USA
                Article
                gb-2014-15-2-r40
                10.1186/gb-2014-15-2-r40
                4053991
                24576388
                Copyright © 2014 Li et al.; licensee BioMed Central Ltd.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                Categories
                Research

                Genetics

                Comments

                Comment on this article