
      Prediction and Quantification of Splice Events from RNA-Seq Data

      research-article


          Abstract

          Analysis of splice variants from short read RNA-seq data remains a challenging problem. Here we present a novel method for the genome-guided prediction and quantification of splice events from RNA-seq data, which enables the analysis of unannotated and complex splice events. Splice junctions and exons are predicted from reads mapped to a reference genome and are assembled into a genome-wide splice graph. Splice events are identified recursively from the graph and are quantified locally based on reads extending across the start or end of each splice variant. We assess prediction accuracy based on simulated and real RNA-seq data, and illustrate how different read aligners (GSNAP, HISAT2, STAR, TopHat2) affect prediction results. We validate our approach for quantification based on simulated data, and compare local estimates of relative splice variant usage with those from other methods (MISO, Cufflinks) based on simulated and real RNA-seq data. In a proof-of-concept study of splice variants in 16 normal human tissues (Illumina Body Map 2.0) we identify 249 internal exons that belong to known genes but are not related to annotated exons. Using independent RNA samples from 14 matched normal human tissues, we validate 9/9 of these exons by RT-PCR and 216/249 by paired-end RNA-seq (2 × 250 bp). These results indicate that de novo prediction of splice variants remains beneficial even in well-studied systems. An implementation of our method is freely available as an R/Bioconductor package.
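The workflow summarized in the abstract (assemble predicted exons and junctions into a splice graph, enumerate variants recursively, quantify them locally) can be illustrated with a minimal Python sketch. This is not the authors' R/Bioconductor implementation. The function names, the toy coordinates, the read counts, and the pooled-count usage estimate are all assumptions made for illustration, and the estimator in the paper may differ.

```python
from collections import defaultdict


def build_splice_graph(exons, junctions):
    """Build a toy splice graph (hypothetical helper, not the package API).

    Nodes are genomic positions; edges are exons ("E") or splice
    junctions ("J"), each running from a lower to a higher position.
    """
    graph = defaultdict(list)
    for start, end in exons:
        graph[start].append((end, "E"))
    for donor, acceptor in junctions:
        graph[donor].append((acceptor, "J"))
    return graph


def enumerate_variants(graph, source, sink, prefix=()):
    """Recursively list every path from source to sink as a tuple of edges."""
    if source == sink:
        return [prefix]
    variants = []
    for target, kind in graph.get(source, []):
        if target <= sink:  # positions only increase along a path
            edge = (source, target, kind)
            variants.extend(
                enumerate_variants(graph, target, sink, prefix + (edge,))
            )
    return variants


def relative_usage(start_counts, end_counts):
    """Assumed local estimate: pool the reads spanning the start and the
    end of each variant and divide by the event-wide total."""
    total = sum(start_counts) + sum(end_counts)
    if total == 0:
        return [float("nan")] * len(start_counts)
    return [(s + e) / total for s, e in zip(start_counts, end_counts)]


# Toy exon-skipping event: the middle exon (20-30) is either included
# or skipped via the junction 10 -> 40.
exons = [(1, 10), (20, 30), (40, 50)]
junctions = [(10, 20), (30, 40), (10, 40)]
graph = build_splice_graph(exons, junctions)

# Two variants share the start node 10 and the end node 40.
variants = enumerate_variants(graph, source=10, sink=40)

# Made-up counts of reads spanning each variant's start and end,
# listed as [inclusion, skipping].
usage = relative_usage(start_counts=[30, 10], end_counts=[30, 10])
```

Here `variants` holds the inclusion path (junction, exon, junction) followed by the skipping path (a single junction), and `usage` gives their estimated relative usage. Restricting the counts to reads at the variant boundaries keeps the estimate local to the event, so no full-length transcript models are needed.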


                Author and article information

                Contributors
                Role: Editor
                Journal
                PLoS ONE
                Public Library of Science (San Francisco, CA, USA)
                ISSN: 1932-6203
                Published: 24 May 2016
                Volume: 11
                Issue: 5
                Article: e0156132
                Affiliations
                [1] Department of Bioinformatics and Computational Biology, Genentech Inc., South San Francisco, CA, United States of America
                [2] Department of Molecular Biology, Genentech Inc., South San Francisco, CA, United States of America
                University of California, Los Angeles, UNITED STATES
                Author notes

                Competing Interests: The authors of this manuscript have read the journal’s policy and have the following competing interests: All authors are or have been employees of Genentech Inc. and some hold shares in Roche. RG is an employee of 23andMe Inc. This does not alter the authors’ adherence to PLOS ONE policies on sharing data and materials.

                Conceived and designed the experiments: LDG RG. Analyzed the data: LDG RG. Wrote the paper: LDG RG. Provided advice on method and software development: GP ML TDW. Performed RT-PCR experiments: YC. Provided advice and oversaw sequencing experiments: SS.

                [¤] Current address: 23andMe Inc., Mountain View, CA, United States of America

                Article
                Manuscript: PONE-D-15-21817
                DOI: 10.1371/journal.pone.0156132
                PMCID: PMC4878813
                PMID: 27218464
                29be6d6e-c655-4cc2-9dcc-c20cc037d0ae
                © 2016 Goldstein et al

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 5 June 2015
                Accepted: 18 April 2016
                Page count
                Figures: 6, Tables: 1, Pages: 18
                Funding
                The study was funded by Genentech Inc. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Biology and Life Sciences > Molecular Biology > Molecular Biology Techniques > Gene Mapping > Exon Mapping
                Research and Analysis Methods > Molecular Biology Techniques > Gene Mapping > Exon Mapping
                Computer and Information Sciences > Data Visualization > Infographics > Graphs
                Biology and Life Sciences > Molecular Biology > Molecular Biology Techniques > Artificial Gene Amplification and Extension > Polymerase Chain Reaction > Reverse Transcriptase-Polymerase Chain Reaction
                Research and Analysis Methods > Molecular Biology Techniques > Artificial Gene Amplification and Extension > Polymerase Chain Reaction > Reverse Transcriptase-Polymerase Chain Reaction
                Research and Analysis Methods > Simulation and Modeling
                Biology and Life Sciences > Molecular Biology > Molecular Biology Techniques > Sequencing Techniques > RNA Sequencing
                Research and Analysis Methods > Molecular Biology Techniques > Sequencing Techniques > RNA Sequencing
                Biology and Life Sciences > Biochemistry > Proteins > Protein Domains
                Biology and Life Sciences > Computational Biology > Genome Complexity > Introns
                Biology and Life Sciences > Genetics > Genomics > Genome Complexity > Introns
                Computer and Information Sciences > Computer Software
                Custom metadata
                RNA-seq data have been deposited at the European Genome-phenome Archive (http://www.ebi.ac.uk/ega/) under accession number EGAS00001001026.
