      A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis.

      Bioinformatics
      Algorithms, Artificial Intelligence, Cluster Analysis, Diagnosis, Computer-Assisted, methods, Gene Expression Profiling, Genetic Predisposition to Disease, genetics, Genetic Testing, Humans, Neoplasm Proteins, metabolism, Neoplasms, diagnosis, Oligonucleotide Array Sequence Analysis, Pattern Recognition, Automated, Reproducibility of Results, Sensitivity and Specificity, Software, Tumor Markers, Biological, User-Computer Interface

          Abstract

          Cancer diagnosis is one of the most important emerging clinical applications of gene expression microarray technology. We are seeking to develop a computer system for powerful and reliable cancer diagnostic model creation based on microarray data. To keep a realistic perspective on clinical applications we focus on multicategory diagnosis. To equip the system with the optimum combination of classifier, gene selection and cross-validation methods, we performed a systematic and comprehensive evaluation of several major algorithms for multicategory classification, several gene selection methods, multiple ensemble classifier methods and two cross-validation designs using 11 datasets spanning 74 diagnostic categories and 41 cancer types and 12 normal tissue types. Multicategory support vector machines (MC-SVMs) are the most effective classifiers in performing accurate cancer diagnosis from gene expression data. The MC-SVM techniques by Crammer and Singer, Weston and Watkins and one-versus-rest were found to be the best methods in this domain. MC-SVMs outperform other popular machine learning algorithms, such as k-nearest neighbors, backpropagation and probabilistic neural networks, often to a remarkable degree. Gene selection techniques can significantly improve the classification performance of both MC-SVMs and other non-SVM learning algorithms. Ensemble classifiers do not generally improve performance of the best non-ensemble models. These results guided the construction of a software system GEMS (Gene Expression Model Selector) that automates high-quality model construction and enforces sound optimization and performance estimation procedures. This is the first such system to be informed by a rigorous comparative analysis of the available algorithms and datasets. The software system GEMS is available for download from http://www.gems-system.org for non-commercial use. Contact: alexander.statnikov@vanderbilt.edu.
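
The abstract combines gene selection with cross-validated performance estimation but does not spell out the procedure. The following is a minimal sketch of one common way to keep such an estimate honest: genes are re-selected inside every training fold, so held-out samples never influence the choice. Everything here is an assumption made for illustration and is not taken from GEMS: the function names, the between-class to within-class sum-of-squares score, the toy data generator, and above all the nearest-centroid classifier, which is only a dependency-free stand-in for the MC-SVMs the study evaluates.

```python
import random


def bw_scores(X, y):
    """Score each gene by between-class / within-class sum of squares."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        overall = sum(col) / len(col)
        between = within = 0.0
        for c in classes:
            vals = [v for v, label in zip(col, y) if label == c]
            m = sum(vals) / len(vals)
            between += len(vals) * (m - overall) ** 2
            within += sum((v - m) ** 2 for v in vals)
        scores.append(between / (within + 1e-12))  # epsilon avoids 0/0
    return scores


def top_genes(X, y, k):
    """Return the indices of the k highest-scoring genes."""
    scores = bw_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def fit_centroids(X, y, genes):
    """Stand-in classifier: one mean expression profile per class."""
    centroids = {}
    for c in set(y):
        rows = [row for row, label in zip(X, y) if label == c]
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in genes]
    return centroids


def predict(centroids, row, genes):
    """Assign the class whose centroid is closest in squared distance."""
    x = [row[j] for j in genes]
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


def stratified_folds(y, n_folds, seed=0):
    """Deal the sample indices of each class round-robin into n_folds folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for c in sorted(set(y)):
        idx = [i for i, label in enumerate(y) if label == c]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    return folds


def cross_validated_accuracy(X, y, n_folds=5, k=10, seed=0):
    """Estimate accuracy with gene selection repeated inside each fold."""
    correct = 0
    for test_idx in stratified_folds(y, n_folds, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        genes = top_genes(X_tr, y_tr, k)  # sees training samples only
        model = fit_centroids(X_tr, y_tr, genes)
        correct += sum(predict(model, X[i], genes) == y[i] for i in test_idx)
    return correct / len(y)


def make_toy_data(n_per_class=10, n_genes=50, n_classes=3, seed=1):
    """Synthetic data: gene c is strongly over-expressed in class c."""
    rng = random.Random(seed)
    X, y = [], []
    for c in range(n_classes):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            row[c] += 8.0
            X.append(row)
            y.append(c)
    return X, y


X, y = make_toy_data()
accuracy = cross_validated_accuracy(X, y, n_folds=5, k=3)
```

Selecting genes once on the full dataset and cross-validating afterwards would let the held-out samples shape the feature set, which tends to inflate the accuracy estimate. Swapping the stand-in classifier for a multicategory SVM leaves the fold structure unchanged.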
