      RDDpred: a condition-specific RNA-editing prediction model from RNA-seq data

      Research article
      BMC Genomics
      BioMed Central
      The Fourteenth Asia Pacific Bioinformatics Conference (APBC 2016)
      11 - 13 January 2016
      Keywords: RNA-editing, Condition-specific, Machine-learning, Random forest, RNA-seq, Systematic artefact

          Abstract

          Background

          RNA-editing is an important post-transcriptional modification of RNA sequences, carried out by two families of catalytic enzymes: ADAR (A-to-I) and APOBEC (C-to-U). High-throughput sequencing technologies have enabled active investigation of the biological functions of RNA-editing. RNA-editing is currently considered a key regulator of various cellular functions, such as protein activity, alternative splicing patterns of mRNA, and the alteration of miRNA target sites. DARNED, a public database of RNA-DNA differences (RDDs), reports more than 300,000 RNA-editing sites detected in the human genome (hg19). Moreover, multiple studies have suggested that RNA-editing events occur under highly specific conditions. According to DARNED, 97.62 % of registered editing sites were detected in a single tissue or under a specific condition. This further supports the view that RNA-editing is condition-specific. Because RNA-seq captures the whole landscape of the transcriptome, it is widely used for RDD prediction. However, detecting RNA-editing from RNA-seq can generate substantial numbers of false positives, or artefacts. Experimental validation at the whole-transcriptome scale is difficult, so a powerful computational tool is needed to distinguish true RNA-editing events from artefacts.
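The DARNED figure of 97.62 % in a single tissue or condition can be read as the fraction of distinct editing sites whose observations all come from one condition. The following minimal sketch computes that fraction from (site, condition) pairs. It rests on several assumptions. The record format, site identifiers such as `chr1:1000`, and the function name `single_condition_fraction` are hypothetical illustrations. They are not DARNED's actual schema and not the authors' code.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def single_condition_fraction(records: Iterable[Tuple[str, str]]) -> float:
    """Fraction of distinct sites observed in exactly one condition.

    `records` holds (site_id, condition) pairs, one per observation.
    This format is a hypothetical stand-in, not DARNED's real schema.
    """
    conditions_per_site = defaultdict(set)
    for site, condition in records:
        conditions_per_site[site].add(condition)
    if not conditions_per_site:
        return float("nan")  # no sites: the fraction is undefined
    single = sum(1 for conds in conditions_per_site.values() if len(conds) == 1)
    return single / len(conditions_per_site)


# Illustrative records only (not real editing sites).
records = [
    ("chr1:1000", "brain"),
    ("chr1:1000", "brain"),  # repeated observation, same condition
    ("chr1:2000", "brain"),
    ("chr1:2000", "liver"),  # seen in two conditions
    ("chr2:500", "liver"),
    ("chr3:42", "T cell"),
]
print(single_condition_fraction(records))  # 0.75
```

Counting distinct conditions per site, rather than raw observations, keeps repeated detections in the same tissue from making a site look multi-condition.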

          Result

          We developed RDDpred, a Random Forest-based RDD classifier that reports potentially true RNA-editing events from RNA-seq data. RDDpred was tested on two publicly available RNA-editing datasets. It reproduced 90 % and 95 % of the RDDs reported in the two studies, respectively, while rejecting false discoveries with a negative predictive value (NPV) of 75 % and 84 %.
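Both evaluation figures can be read through a confusion matrix. This reading assumes two things. First, the reproduction rate is taken as recall against the sites published in each study. Second, NPV is computed over the candidate sites the classifier rejects. Under these assumptions, both metrics reduce to simple ratios. The `Confusion` class is a hypothetical helper, and the counts in the usage example are illustrative only. They are not data from the RDDpred evaluation.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Hypothetical confusion-matrix helper for an RDD classifier."""
    tp: int  # reported as editing, truly editing
    fp: int  # reported as editing, actually an artefact
    tn: int  # rejected, actually an artefact
    fn: int  # rejected, actually a true editing site

    def recall(self) -> float:
        """Share of true editing sites that the classifier reports."""
        denom = self.tp + self.fn
        return self.tp / denom if denom else float("nan")

    def npv(self) -> float:
        """Negative predictive value: share of rejected sites that are artefacts."""
        denom = self.tn + self.fn
        return self.tn / denom if denom else float("nan")


# Illustrative counts only; these are NOT results from the RDDpred paper.
m = Confusion(tp=80, fp=15, tn=60, fn=20)
print(m.recall())  # 0.8
print(m.npv())     # 0.75
```

A classifier that accepts every candidate scores high on recall alone. Pairing recall with NPV checks that the sites it rejects are mostly artefacts.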

          Conclusion

          RDDpred automatically compiles condition-specific training examples without experimental validation and then constructs an RDD classifier. To the best of our knowledge, RDDpred is the first machine-learning-based automated pipeline for RDD prediction. We believe that RDDpred will be useful and can contribute significantly to the study of condition-specific RNA-editing. RDDpred is available at http://biohealth.snu.ac.kr/software/RDDpred.

                Author and article information

                Contributors
                mdy89@snu.ac.kr
                Jotunnheim@snu.ac.kr
                sunkim.bioinfo@snu.ac.kr
                Journal: BMC Genomics
                Publisher: BioMed Central (London)
                ISSN: 1471-2164
                Published: 11 January 2016
                Volume 17, Issue Suppl 1, Article 5
                Issue sponsor: Publication of this supplement has not been supported by sponsorship. Information about the source of funding for publication charges can be found in the individual articles. Articles have undergone the journal's standard peer review process. The Supplement Editors declare that they have no competing interests.
                Affiliations
                Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
                Department of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea
                Bioinformatics Institute, Seoul National University, Seoul, Republic of Korea
                Article ID: 2301
                DOI: 10.1186/s12864-015-2301-y
                PMCID: PMC4895604
                PMID: 26817607
                © Kim et al. 2015

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                Conference: The Fourteenth Asia Pacific Bioinformatics Conference (APBC 2016), San Francisco, CA, USA, 11 - 13 January 2016
                Category: Proceedings
                © The Author(s) 2016

                Subject: Genetics
