      Comparison of five supervised feature selection algorithms leading to top features and gene signatures from multi-omics data in cancer

      research-article
      BMC Bioinformatics
      BioMed Central
      International Conference on Intelligent Biology and Medicine (ICIBM 2021)
      8-10 August 2021
      Feature selection, Multi-omics data, Classifier, Representation entropy, Redundancy rate

          Abstract

          Background

          Over the last two decades, a large volume of complex omics data has been generated, and dimensionality reduction has become a central challenge in mining such data. Omics data typically consist of many features, and many feature selection algorithms have been developed accordingly. However, the performance of these methods often varies with the specific data, which makes the discovery and interpretation of results challenging.

          Methods and results

          In this study, we performed a comprehensive comparative study of five widely used supervised feature selection methods (mRMR, INMIFS, DFS, SVM-RFE-CBR and VWMRmR) for multi-omics datasets. Specifically, we used five representative datasets: gene expression (Exp), exon expression (ExpExon), DNA methylation (hMethyl27), copy number variation (Gistic2), and pathway activity (Paradigm IPLs), all from a multi-omics study of acute myeloid leukemia (LAML) from The Cancer Genome Atlas (TCGA). The feature subsets selected by the five algorithms were assessed using three evaluation criteria: (1) classification accuracy (Acc), (2) representation entropy (RE) and (3) redundancy rate (RR). Four classifiers, viz., C4.5, NaiveBayes, KNN, and AdaBoost, were used to measure the classification accuracy (Acc) for each selected feature subset. The VWMRmR algorithm obtains the best Acc for three datasets (ExpExon, hMethyl27 and Paradigm IPLs). It offers the best RR (obtained using normalized mutual information) for three datasets (Exp, Gistic2 and Paradigm IPLs), and the best RR (obtained using the Pearson correlation coefficient) for two datasets (Gistic2 and Paradigm IPLs). It also obtains the best RE for three datasets (Exp, Gistic2 and Paradigm IPLs). Overall, the VWMRmR algorithm yields the best performance on all three evaluation criteria for the majority of the datasets. In addition, we identified signature genes using supervised learning, collected from the overlapped top feature set among the five feature selection methods. We obtained a 7-gene signature (ZMIZ1, ENG, FGFR1, PAWR, KRT17, MPO and LAT2) for Exp, a 9-gene signature for ExpExon, a 7-gene signature for hMethyl27, a single-gene signature (PIK3CG) for Gistic2 and a 3-gene signature for Paradigm IPLs.
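          The abstract names representation entropy (RE) and redundancy rate (RR) as evaluation criteria but does not give their formulas. The sketch below uses definitions that are common in the feature selection literature and are an assumption here, not the authors' stated method: RE is taken as the entropy of the normalized eigenvalues of the covariance matrix of the selected features, and the Pearson-based RR as the mean absolute pairwise correlation between distinct selected features. The function names are hypothetical, and the NMI-based variant of RR is not shown.

```python
import numpy as np


def representation_entropy(X):
    """Entropy of the normalized covariance eigenvalues (assumed definition).

    X is a (samples x selected_features) array.
    """
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    eig = np.linalg.eigvalsh(cov)
    eig = np.clip(eig, 0.0, None)  # remove tiny negative values from round-off
    total = eig.sum()
    if total == 0:
        return 0.0
    p = eig / total
    p = p[p > 0]  # treat 0 * log(0) as 0
    return float(-(p * np.log(p)).sum())


def redundancy_rate(X):
    """Mean absolute Pearson correlation over distinct feature pairs
    (assumed definition; every feature is assumed to be non-constant)."""
    d = X.shape[1]
    if d < 2:
        return 0.0
    corr = np.abs(np.corrcoef(X, rowvar=False))
    off_diagonal_sum = corr.sum() - np.trace(corr)
    return float(off_diagonal_sum / (d * (d - 1)))


# Two uncorrelated features with equal variance: RE = ln(2), RR = 0.
X_independent = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
# Two identical features: RE is about 0, RR is about 1.
X_duplicate = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
```

          Under these assumed definitions, a subset whose features carry non-overlapping information scores a higher RE and a lower RR, which matches the intuition behind preferring less redundant subsets.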

          Conclusion

          We performed a comprehensive comparison of five well-known feature selection methods for mining features from various high-dimensional datasets. We identified signature genes using supervised learning for each specific omics data type in this disease. The study will help incorporate higher order dependencies among features.
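          The overlap step behind the signature genes (keeping only the top features that all selectors agree on) can be illustrated with a set intersection. This is a minimal sketch under stated assumptions: the function name, the `rankings` structure, the cutoff `k` and the placeholder feature names are hypothetical, and the supervised learning step the authors apply to the overlapped set is not reproduced.

```python
def overlapped_top_features(rankings, k):
    """Return the features found in the top-k list of every method.

    rankings maps a method name to its features ordered best first
    (hypothetical structure). The result keeps the order of the first
    method's ranking.
    """
    top_sets = [set(ranked[:k]) for ranked in rankings.values()]
    if not top_sets:
        return []
    shared = set.intersection(*top_sets)
    first_ranking = next(iter(rankings.values()))
    return [feature for feature in first_ranking[:k] if feature in shared]


# Placeholder rankings; these are not results from the study.
example_rankings = {
    "method_a": ["g1", "g2", "g3"],
    "method_b": ["g2", "g1", "g4"],
}
shared_top3 = overlapped_top_features(example_rankings, k=3)
```

          With the placeholder rankings, only `g1` and `g2` appear in both top-3 lists, so they form the overlapped set.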

                Author and article information

                Contributors
                Zhongming.Zhao@uth.tmc.edu
                Conference
                BMC Bioinformatics
                BioMed Central (London)
                ISSN: 1471-2105
                28 April 2022
                2022
                Volume: 23
                Issue: Suppl 3
                Issue sponsor: Publication of this supplement has not been supported by sponsorship. Information about the source of funding for publication charges can be found in the individual articles. The articles have undergone the journal's standard peer review process for supplements. Supplement Editors were not involved in the peer review of any article that they co-authored. The Supplement Editors declare that they have no other competing interests.
                Article number: 153
                Affiliations
                [1] Department of Computer Science and Engineering, Aliah University, Kolkata, West Bengal 700160, India (GRID grid.440546.7; ISNI 0000 0004 1779 9509)
                [2] Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA (GRID grid.267308.8; ISNI 0000 0000 9206 2401)
                [3] Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA (GRID grid.267308.8; ISNI 0000 0000 9206 2401)
                Author information
                http://orcid.org/0000-0003-4107-6784
                http://orcid.org/0000-0002-3477-0914
                Article
                Publisher ID: 4678
                DOI: 10.1186/s12859-022-04678-y
                PMCID: PMC9052461
                PMID: 35484501
                8e52e4fc-6e4c-4f28-9898-2906ddc38b7f
                © The Author(s) 2022

                Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

                International Conference on Intelligent Biology and Medicine (ICIBM 2021)
                Philadelphia, PA, USA
                8-10 August 2021
                History
                : 11 April 2022
                : 11 April 2022
                Funding
                Funded by: Cancer Prevention and Research Institute of Texas (FundRef http://dx.doi.org/10.13039/100004917)
                Award ID: CPRIT RP170668, RP180734 and RP210045
                Funded by: Foundation for the National Institutes of Health (FundRef http://dx.doi.org/10.13039/100000009)
                Award ID: R01LM012806
                Funded by: Cancer Prevention and Research Institute of Texas (FundRef http://dx.doi.org/10.13039/100004917)
                Award ID: CPRIT 180734
                Categories
                Research
                Bioinformatics & Computational biology
