
      An efficient ensemble method for missing value imputation in microarray gene expression data

      research-article


          Abstract

          Background

          Genomic data analysis is widely used to study disease genes and drug targets. However, missing values in genomic datasets pose a significant problem that severely hinders the use of these data. Current imputation methods based on a single learner often exploit only a limited part of the known genomic data, which degrades imputation performance.

          Results

          In this study, multiple single imputation methods are combined into one imputation method by ensemble learning. In the ensemble method, bootstrap sampling is applied to obtain predictions of the missing values from each component method, and these predictions are weighted and summed to produce the final prediction. The optimal weights are learned from known gene data by minimizing a cost function of the imputation error, and an expression for the optimal weights is derived in closed form. Additionally, the performance of the ensemble method is investigated analytically in terms of the sum of squared regression errors. The proposed method is tested on several typical genomic datasets and compared with state-of-the-art imputation methods at different noise levels, sample sizes and missing rates. Experimental results show that the proposed method achieves improved imputation performance in terms of accuracy, robustness and generalization.
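
The weighting idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the component methods, the resampling scheme or the exact cost function, so everything specific here is an assumption. The two component imputers (row mean and column mean), the helper names `learn_weights` and `ensemble_impute`, the choice to bootstrap over known entries that are temporarily hidden, and the ridge-regularized squared-error cost with closed-form solution w = (PᵀP + λI)⁻¹Pᵀy are all hypothetical stand-ins.

```python
import numpy as np

def row_mean_impute(X):
    """Fill missing entries with the mean of the observed values in the same row (gene)."""
    means = np.nanmean(X, axis=1, keepdims=True)
    # Fall back to the global mean if a row has no observed values.
    means = np.where(np.isnan(means), np.nanmean(X), means)
    return np.where(np.isnan(X), means, X)

def col_mean_impute(X):
    """Fill missing entries with the mean of the observed values in the same column (sample)."""
    means = np.nanmean(X, axis=0, keepdims=True)
    means = np.where(np.isnan(means), np.nanmean(X), means)
    return np.where(np.isnan(X), means, X)

def learn_weights(X, imputers, n_boot=20, hide_frac=0.1, ridge=1e-8, seed=0):
    """Hypothetical helper: learn ensemble weights from known entries.

    In each bootstrap round, known entries are drawn with replacement and
    hidden, every component method predicts them, and the predictions are
    collected. The weights then minimize the (ridge-regularized) squared
    error between the stacked predictions P and the true values y.
    """
    rng = np.random.default_rng(seed)
    known = np.argwhere(~np.isnan(X))            # (row, col) pairs of observed entries
    n_hide = max(1, int(hide_frac * len(known)))
    preds, truth = [], []
    for _ in range(n_boot):
        drawn = known[rng.integers(0, len(known), size=n_hide)]
        drawn = np.unique(drawn, axis=0)         # duplicates hide the same entry once
        rows, cols = drawn[:, 0], drawn[:, 1]
        X_masked = X.copy()
        X_masked[rows, cols] = np.nan
        preds.append(np.column_stack([f(X_masked)[rows, cols] for f in imputers]))
        truth.append(X[rows, cols])
    P = np.vstack(preds)                         # shape: (n_hidden_total, n_methods)
    y = np.concatenate(truth)
    # Closed-form minimizer of ||P w - y||^2 + ridge * ||w||^2.
    return np.linalg.solve(P.T @ P + ridge * np.eye(P.shape[1]), P.T @ y)

def ensemble_impute(X, imputers, w):
    """Replace missing entries with the weighted sum of the component predictions."""
    stacked = np.stack([f(X) for f in imputers], axis=-1)  # (genes, samples, n_methods)
    combined = stacked @ w
    return np.where(np.isnan(X), combined, X)
```

Given a genes-by-samples matrix `X` with `np.nan` marking missing values, `w = learn_weights(X, [row_mean_impute, col_mean_impute])` followed by `ensemble_impute(X, [row_mean_impute, col_mean_impute], w)` returns a completed matrix in which observed entries are left unchanged. The small ridge term is only there to keep the linear system solvable when component predictions are nearly collinear.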

          Conclusion

          The ensemble method achieves superior imputation performance because it uses the known data more efficiently for missing data imputation, by integrating diverse imputation methods and learning the integration weights in a data-driven way.


                Author and article information

                Contributors
                sunbiao@tju.edu.cn
                Journal
                BMC Bioinformatics
                BioMed Central (London)
                ISSN 1471-2105
                13 April 2021
                2021; 22: 188
                Affiliations
                [1] School of Electrical and Information Engineering, Tianjin University, Tianjin, 300072 China (GRID grid.33763.32, ISNI 0000 0004 1761 2484)
                [2] State Key Laboratory of Digital Publishing Technology, Beijing, 100871 China
                [3] China Institute of FTZ Supply Chain, Shanghai Maritime University, Shanghai, 201306 China (GRID grid.412518.b, ISNI 0000 0001 0008 0619)
                Article
                4109
                10.1186/s12859-021-04109-4
                8045198
                33388027
                38815a81-6478-48de-ab4a-6637c74a3d8e
                © The Author(s) 2021

                Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

                History
                : 30 March 2020
                : 29 March 2021
                Funding
                Funded by: National Natural Science Foundation of China (FundRef http://dx.doi.org/10.13039/501100001809)
                Award ID: 61972282
                Funded by: Opening Project of State Key Laboratory of Digital Publishing Technology
                Award ID: Cndplab-2019-Z001
                Categories
                Methodology Article

                Bioinformatics & Computational biology
                gene expression data, imputation, ensemble learning, bootstrap sampling, generalization
