
      Overcome Support Vector Machine Diagnosis Overfitting

      Cancer Informatics
      Libertas Academica
      SVM, overfitting, biomarker discovery



          Support vector machines (SVMs) are widely employed in molecular diagnosis of disease for their efficiency and robustness. However, no previous research has analyzed their overfitting in disease diagnosis based on high-dimensional omics data, an analysis that is essential to avoid deceptive diagnostic results and to enhance clinical decision making. In this work, we comprehensively investigate this problem from both theoretical and practical standpoints to unveil the special characteristics of SVM overfitting. We found that disease diagnosis with an SVM classifier inevitably encounters overfitting under a Gaussian kernel because of the large data variations generated by high-throughput profiling technologies. Furthermore, we propose a novel sparse-coding kernel approach to overcome SVM overfitting in disease diagnosis. Unlike traditional ad hoc parameter-tuning approaches, it not only robustly overcomes the overfitting problem but also achieves good diagnostic accuracy. To our knowledge, it is the first rigorous method proposed to overcome SVM overfitting. Finally, we propose a novel biomarker discovery algorithm, Gene-Switch-Marker (GSM), that captures meaningful biomarkers by taking advantage of SVM overfitting on single genes.
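
One common way to see why a Gaussian kernel can overfit on high-dimensional data with large variation is to look at the kernel matrix itself. The sketch below is an illustration under stated assumptions, not the authors' derivation: the data are synthetic Gaussian noise standing in for omics profiles, and the sample count, feature count, spread, and `gamma` value are all hypothetical choices. When pairwise distances are large relative to the kernel bandwidth, every off-diagonal kernel entry collapses toward zero, so each sample resembles only itself and a classifier built on this kernel can memorize the training set while carrying little information about unseen samples.

```python
import math
import random


def gaussian_kernel(x, y, gamma):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def kernel_matrix(samples, gamma):
    """Full pairwise kernel matrix for a list of samples."""
    return [[gaussian_kernel(x, y, gamma) for y in samples] for x in samples]


def max_off_diagonal(matrix):
    """Largest similarity between two distinct samples."""
    n = len(matrix)
    return max(matrix[i][j] for i in range(n) for j in range(n) if i != j)


random.seed(0)

# Hypothetical stand-in for an omics data set: few samples, many features,
# and a large spread in the measured values (all values are assumptions).
n_samples, n_features, spread = 20, 2000, 100.0
samples = [
    [random.gauss(0.0, spread) for _ in range(n_features)]
    for _ in range(n_samples)
]

# Assumed bandwidth: gamma = 1 / n_features, a common default heuristic.
gamma = 1.0 / n_features
K = kernel_matrix(samples, gamma)

print("diagonal entry:", K[0][0])
print("largest off-diagonal entry:", max_off_diagonal(K))
```

With these settings the kernel matrix is numerically the identity matrix. Shrinking `spread` or `gamma` makes the off-diagonal entries grow again, which is the kind of parameter tuning the abstract contrasts with its sparse-coding kernel approach.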


                Author and article information

                Cancer Informatics (Cancer Inform)
                Libertas Academica
                09 December 2014
                Volume: 13
                Issue: Suppl 1
                Pages: 145-158
                [1] Department of Computer and Information Science, Fordham University, New York, NY, USA.
                [2] Quantitative Proteomics Center, Columbia University, New York, NY, USA.
                [3] Division of Biomedical Informatics, University of California, San Diego, CA, USA.
                © 2014 the author(s), publisher and licensee Libertas Academica Ltd.

                This is an open-access article distributed under the terms of the Creative Commons CC-BY-NC 3.0 License.

                : 04 May 2014
                : 13 September 2014
                : 16 September 2014

                Oncology & Radiotherapy

