      A New Machine Learning-Based Framework for Mapping Uncertainty Analysis in RNA-Seq Read Alignment and Gene Expression Estimation


          Abstract

          One of the main benefits of modern RNA-Sequencing (RNA-Seq) technology is that it yields more accurate gene expression estimates than previous generations of expression data, such as microarrays. However, numerous issues can cause an RNA-Seq read to map to multiple locations on the reference genome with the same alignment score, which occurs in plant, animal, and metagenome samples. Such a read is called a multiple-mapping read (MMR). MMRs affect gene expression estimation and all downstream analyses, including differential gene expression and functional enrichment. Current analysis pipelines lack tools to effectively test the reliability of gene expression estimates and thus cannot ensure the validity of downstream analyses. Our investigation of 95 RNA-Seq datasets from seven plant and animal species (totaling 1,951 GB) indicates that, on average, roughly 22% of all reads are MMRs. Here we present a machine learning-based tool called GeneQC (Gene expression Quality Control), which can accurately estimate the reliability of each gene's expression level derived from an RNA-Seq dataset. The underlying algorithm is based on extracted genomic and transcriptomic features, which are combined using elastic-net regularization and mixture model fitting to provide a clearer picture of mapping uncertainty for each gene. GeneQC allows researchers to identify reliable expression estimates and to conduct further analysis on gene expression of sufficient quality. The tool also enables researchers to investigate re-alignment methods that yield more accurate expression estimates for genes with low reliability. Application of GeneQC reveals a high level of mapping uncertainty in plant samples and limited, severe mapping uncertainty in animal samples. GeneQC is freely available at http://bmbl.sdstate.edu/GeneQC/home.html.
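The abstract's definition of an MMR (a read whose best alignment score is attained at more than one reference location) can be illustrated with a minimal sketch. This is not GeneQC's implementation: the `(read_id, location, score)` record layout, the function names `find_multi_mapping_reads` and `mmr_fraction`, and the toy data are all assumptions made for illustration.

```python
def find_multi_mapping_reads(alignments):
    """Return the IDs of reads whose best score is tied across >1 location.

    `alignments` is an iterable of (read_id, location, score) tuples.
    This record layout is hypothetical, not a real aligner output format.
    """
    best = {}  # read_id -> (best_score, locations achieving that score)
    for read_id, location, score in alignments:
        if read_id not in best or score > best[read_id][0]:
            # New best score for this read: restart its location set.
            best[read_id] = (score, {location})
        elif score == best[read_id][0]:
            # Same best score at another location: record the tie.
            best[read_id][1].add(location)
    return {rid for rid, (_, locs) in best.items() if len(locs) > 1}


def mmr_fraction(alignments):
    """Fraction of distinct aligned reads that are multiple-mapping reads."""
    alignments = list(alignments)
    reads = {rid for rid, _, _ in alignments}
    if not reads:
        return 0.0
    return len(find_multi_mapping_reads(alignments)) / len(reads)


# Toy alignments (invented values, for illustration only).
toy = [
    ("r1", "chr1:100", 60),
    ("r1", "chr2:500", 60),  # tied best score -> r1 is an MMR
    ("r2", "chr1:200", 60),
    ("r2", "chr3:900", 42),  # lower score -> r2 maps uniquely
    ("r3", "chr1:300", 55),
    ("r4", "chr2:50", 30),
    ("r4", "chr2:750", 30),  # tied best score -> r4 is an MMR
]

print(sorted(find_multi_mapping_reads(toy)))
print(mmr_fraction(toy))
```

A read with a secondary alignment at a strictly lower score (such as `r2` above) is not counted, since the definition requires equal alignment scores.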


                Author and article information

                Journal
                Frontiers in Genetics (Front. Genet.)
                Publisher: Frontiers Media S.A.
                ISSN: 1664-8021
                Published: 14 August 2018
                Volume: 9
                Article: 313
                Affiliations
                [1] Bioinformatics and Mathematical Biosciences Lab, Department of Agronomy, Horticulture, and Plant Science, South Dakota State University, Brookings, SD, United States
                [2] Department of Mathematics and Statistics, South Dakota State University, Brookings, SD, United States
                [3] Center for Applied Mathematics, Tianjin University, Tianjin, China
                [4] Department of Electrical Engineering and Computer Science, South Dakota State University, Brookings, SD, United States
                Author notes

                Edited by: Dariusz Mrozek, Silesian University of Technology, Poland

                Reviewed by: Xiangxiang Zeng, Xiamen University, China; Shihao Shen, University of California, Los Angeles, United States

                *Correspondence: Qin Ma, Qin.Ma@sdstate.edu

                This article was submitted to Bioinformatics and Computational Biology, a section of the journal Frontiers in Genetics

                Article
                DOI: 10.3389/fgene.2018.00313
                PMC ID: 6102479
                PubMed ID: 30154828
                Copyright © 2018 McDermaid, Chen, Zhang, Wang, Gu, Xie and Ma.

                This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

                History
                Received: 25 May 2018
                Accepted: 23 July 2018
                Page count
                Figures: 3, Tables: 3, Equations: 13, References: 69, Pages: 11, Words: 7847
                Categories
                Genetics
                Original Research

                Keywords
                gene expression, RNA-Seq read alignment, mapping uncertainty, machine learning, elastic-net, mixture model fitting, k-means clustering, EM algorithm
