Compensation of feature selection biases accompanied with improved predictive performance for binary classification by using a novel ensemble feature selection approach

Abstract

Motivation

Biomarker discovery methods are essential for identifying a minimal subset of features (e.g., serum markers in predictive medicine) that is relevant for developing prediction models with high accuracy. A diverse range of feature selection methods now exists; these methods are embedded in, combined with, or independent of predictive learning algorithms. Many previous studies have shown the shortcomings of results obtained from a single feature selection method, which make it difficult for professionals in a variety of fields (e.g., medical practitioners) to analyze and interpret the obtained feature subsets. Whereas each of these methods is highly biased, an ensemble feature selection has the advantage of alleviating and compensating for such biases. With regard to the reliability, validity, and reproducibility of these methods, we examined eight different feature selection methods for binary classification datasets and developed an ensemble feature selection system.

Results

Using an ensemble of feature selection methods, we obtained a quantification of the importance of the features. Prediction models trained on the selected features showed improved prediction performance.
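
To illustrate the general idea of combining several feature selection methods into one importance score, here is a minimal sketch in Python. It is not the authors' system: the two scoring functions (`abs_correlation`, `mean_difference`), the normalisation scheme, and the toy data are assumptions chosen for brevity, and they stand in for the eight methods examined in the article.

```python
from statistics import mean, pstdev


def abs_correlation(x, y):
    """Absolute Pearson correlation between one feature column and binary labels."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no information
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return abs(cov / (sx * sy))


def mean_difference(x, y):
    """Absolute difference between the class means of one feature column."""
    pos = [a for a, b in zip(x, y) if b == 1]
    neg = [a for a, b in zip(x, y) if b == 0]
    return abs(mean(pos) - mean(neg))


def normalise(scores):
    """Rescale one method's scores so that they sum to 1 (assumed scheme)."""
    total = sum(scores)
    return [s / total if total > 0 else 0.0 for s in scores]


def ensemble_importance(columns, labels, scorers):
    """Average the normalised scores of all methods for each feature."""
    per_method = [
        normalise([score(col, labels) for col in columns]) for score in scorers
    ]
    n_methods = len(scorers)
    return [
        sum(method[j] for method in per_method) / n_methods
        for j in range(len(columns))
    ]


# Hypothetical toy data: three feature columns, four samples, binary labels.
columns = [[1, 2, 8, 9], [5, 5, 5, 6], [3, 1, 4, 2]]
labels = [0, 0, 1, 1]

importance = ensemble_importance(
    columns, labels, [abs_correlation, mean_difference]
)
for j, value in enumerate(importance):
    print(f"feature {j}: {value:.3f}")
```

Because each method's scores are rescaled to a common range before averaging, no single method dominates the combined ranking, which is the sense in which an ensemble can dampen the bias of any one method.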

Electronic supplementary material

The online version of this article (doi:10.1186/s13040-016-0114-4) contains supplementary material, which is available to authorized users.

Author and article information

Contributors

Ursula Neumann: u.neumann@wz-straubing.de

Mona Riemenschneider: m.riemenschneider@wz-straubing.de

Jan-Peter Sowa: jan.sowa@uni-due.de

Theodor Baars: theodor.baars@uk-essen.de

Julia Kälsch: julia.kaelsch@uk-essen.de

Ali Canbay: ali.canbac@uni-due.de

Dominik Heider: d.heider@wz-straubing.de

Journal

Journal ID (nlm-ta): BioData Min

Journal ID (iso-abbrev): BioData Min

Title: BioData Mining

Publisher: BioMed Central (London)

ISSN (Electronic): 1756-0381

Publication date (Electronic): 18 November 2016

Publication date PMC-release: 18 November 2016

Publication date Collection: 2016

Volume: 9

Electronic Location Identifier: 36

Affiliations

[1] Department of Bioinformatics, Straubing, 94315 Germany

[2] University of Applied Science, Weihenstephan-Triesdorf, Freising, 85354 Germany

[3] Wissenschaftszentrum Weihenstephan, Technische Universität München, Freising, 85354 Germany

[4] Department of Gastroenterology and Hepatology, University Hospital, University Duisburg-Essen, Essen, 45122 Germany

[5] Clinic for Cardiology, West German Heart and Vascular Centre Essen, University Hospital, University Duisburg-Essen, Essen, 45122 Germany

Article

Publisher ID: 114

DOI: 10.1186/s13040-016-0114-4

PMC ID: 5116216

PubMed ID: 27891179

SO-VID: 271ff378-27f1-451f-9a48-ad378261f5dc

License:

Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

History

Date received: 23 June 2016

Date accepted: 27 October 2016

Funding

Funded by: Deichmann Foundation

Custom metadata

ScienceOpen disciplines: Bioinformatics & Computational biology

Keywords: machine learning, feature selection, ensemble learning, biomarker discovery, random forest

