
      QNB: differential RNA methylation analysis for count-based small-sample sequencing data with a quad-negative binomial model


          Abstract

          Background

          RNA epigenetics is a newly emerged research area that has drawn increasing attention because RNA methylation and other modifications participate in a number of crucial biological processes. Thanks to high-throughput sequencing techniques such as MeRIP-Seq, transcriptome-wide RNA methylation profiles are now available as count-based data, and it is often of interest to use them to study dynamics at the epitranscriptomic layer. However, the sample size of an RNA methylation experiment is usually very small because of its cost. In addition, the methylation level of a large number of genes cannot be estimated accurately because of their low expression level. Together, these factors make differential RNA methylation analysis a difficult task.

          Results

          We present QNB, a statistical approach for differential RNA methylation analysis with count-based small-sample sequencing data. Previous approaches such as the DRME model rely on a statistical test that covers only the IP samples, using 2 negative binomial distributions. QNB is instead based on 4 independent negative binomial distributions whose variances and means are linked by local regressions, so that the input control samples are also properly accounted for. In addition, unlike the DRME approach, which relies only on the input control samples to estimate the background, QNB uses a more robust estimator of gene expression that combines information from both input and IP samples, which could largely improve testing performance for very lowly expressed genes.
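The quantities named in this paragraph can be illustrated with a small sketch. It is not the QNB implementation: the function names, the size-factor normalisation, the pairing of one IP library with one input library per replicate, and the IP / (IP + input) definition of the methylation fraction are all assumptions made for illustration, and the dispersion is passed in directly rather than fitted by local regression as the abstract describes. The only standard fact used is the negative binomial mean-variance relationship, variance = mean + dispersion × mean².

```python
def nb_variance(mean, dispersion):
    """Variance of a negative binomial count: mean + dispersion * mean^2."""
    return mean + dispersion * mean ** 2


def normalised_total(counts, size_factors):
    """Sum of counts after dividing each by its (hypothetical) library size factor."""
    return sum(count / factor for count, factor in zip(counts, size_factors))


def methylation_fraction(ip_counts, input_counts, ip_factors, input_factors):
    """Assumed methylation fraction for one gene: IP / (IP + input), pooled over replicates."""
    ip_total = normalised_total(ip_counts, ip_factors)
    input_total = normalised_total(input_counts, input_factors)
    total = ip_total + input_total
    if total == 0:
        return float("nan")  # no reads at all: the fraction is undefined
    return ip_total / total


def combined_abundance(ip_counts, input_counts, ip_factors, input_factors):
    """Assumed abundance estimate using both IP and input reads, averaged per replicate.

    Assumes each replicate contributes exactly one IP and one input library.
    """
    replicates = len(ip_counts)
    pooled = (normalised_total(ip_counts, ip_factors)
              + normalised_total(input_counts, input_factors))
    return pooled / replicates


if __name__ == "__main__":
    # Made-up counts for a single gene, two replicates per condition.
    factors = [1.0, 1.0]
    untreated = methylation_fraction([30, 50], [10, 10], factors, factors)
    treated = methylation_fraction([12, 18], [14, 16], factors, factors)
    print("methylation fraction, untreated:", untreated)
    print("methylation fraction, treated:", treated)
    print("NB variance at mean 10, dispersion 0.1:", nb_variance(10.0, 0.1))
```

A differential test would compare the two fractions while modelling all four count groups (IP and input in each condition) as negative binomial; the sketch stops at the point estimates.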

          Conclusion

          QNB showed improved performance on both simulated and real MeRIP-Seq datasets when compared with competing algorithms. The QNB model is also applicable to other datasets related to RNA modifications, including but not limited to RNA bisulfite sequencing, m1A-Seq, PAR-CLIP, and RIP-Seq.

          Electronic supplementary material

          The online version of this article (10.1186/s12859-017-1808-4) contains supplementary material, which is available to authorized users.


                Author and article information

                Contributors
                liulian19860905@163.com
                zhangsw@nwpu.edu.cn
                yufei.huang@utsa.edu
                jia.meng@xjtlu.edu.cn
                Journal: BMC Bioinformatics
                Publisher: BioMed Central (London)
                ISSN: 1471-2105
                Published: 31 August 2017
                Volume: 18
                Article number: 387
                Affiliations
                [1] ISNI 0000 0001 0307 1240, GRID grid.440588.5, Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi’an, 710072 China
                [2] ISNI 0000 0001 2184 5633, GRID grid.215352.2, Department of Electrical and Computation Engineering, University of Texas at San Antonio, San Antonio, TX 78230 USA
                [3] ISNI 0000 0004 1765 4000, GRID grid.440701.6, Department of Biological Sciences, HRINU, SUERI, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123 China
                [4] ISNI 0000 0004 1936 8470, GRID grid.10025.36, Institute of Integrative Biology, University of Liverpool, L7 8TX, Liverpool, UK
                Article
                Article ID: 1808
                DOI: 10.1186/s12859-017-1808-4
                PMC ID: 5667504
                PMID: 28859631
                © The Author(s). 2017

                Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                Received: 26 May 2017
                Accepted: 22 August 2017
                Funding
                Funded by: National Natural Science Foundation of China
                Award ID: 61473232, 91430111
                Award ID: 31671373, 61401370
                Funded by: Jiangsu University Natural Science Program
                Award ID: 16KJB180027
                Funded by: FundRef http://dx.doi.org/10.13039/100006545, National Institute on Minority Health and Health Disparities
                Award ID: G12MD007591
                Funded by: FundRef http://dx.doi.org/10.13039/100000009, Foundation for the National Institutes of Health
                Award ID: R01GM113245
                Categories
                Methodology Article

                Bioinformatics & Computational biology
                differential methylation analysis, m6A, negative binomial distribution, RNA methylation, small-sample size
