Comparison and evaluation of methods for generating differentially expressed gene lists from microarray data

Abstract

Background

Numerous feature selection methods have been applied to the identification of differentially expressed genes in microarray data. These include simple fold change, the classical t-statistic and moderated t-statistics. Even though these methods return gene lists that are often dissimilar, few direct comparisons of them exist. We present an empirical study in which we compare some of the most commonly used feature selection methods. We apply these to 9 publicly available datasets and compare both the gene lists produced and how these perform in class prediction of test datasets.
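To illustrate why such methods can return dissimilar gene lists, the following minimal sketch ranks genes by simple fold change and by the Welch t-statistic. It is not the authors' code: the toy expression values, the gene names and the helper functions (`log2_fold_change`, `welch_t`, `top_genes`) are hypothetical, and the data are assumed to be already log2-transformed.

```python
import math
from statistics import mean, variance


def log2_fold_change(group_a, group_b):
    """Difference of mean expression between two classes (values assumed log2-scaled)."""
    return mean(group_a) - mean(group_b)


def welch_t(group_a, group_b):
    """Welch t-statistic: mean difference divided by the unpooled standard error."""
    se = math.sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


def top_genes(expression, score, n):
    """Return the n gene names with the largest absolute score."""
    scored = {gene: abs(score(a, b)) for gene, (a, b) in expression.items()}
    return sorted(scored, key=scored.get, reverse=True)[:n]


# Hypothetical toy data: gene -> (class A values, class B values).
toy = {
    "gene1": ([8.0, 8.1, 7.9], [7.0, 7.1, 6.9]),   # small but consistent shift
    "gene2": ([10.0, 6.0, 8.0], [5.0, 9.0, 4.0]),  # larger but noisy shift
    "gene3": ([5.0, 5.2, 4.8], [5.1, 4.9, 5.0]),   # no real change
}

by_fc = top_genes(toy, log2_fold_change, 1)
by_t = top_genes(toy, welch_t, 1)
print(by_fc, by_t)
```

In this toy example fold change favours the gene with the largest mean difference regardless of its variability, whereas the t-statistic favours the gene whose smaller difference is consistent across samples, so the two top-ranked genes differ.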

Results

In this study, we compared the efficiency of the following feature selection methods: significance analysis of microarrays (SAM), analysis of variance (ANOVA), empirical Bayes t-statistic, template matching, maxT, between group analysis (BGA), area under the receiver operating characteristic (ROC) curve, the Welch t-statistic, fold change, rank products, and sets of randomly selected genes. In each case these methods were applied to 9 different binary (two-class) microarray datasets. First, we found little agreement in the gene lists produced by the different methods: only 8 to 21% of genes were in common across all 10 feature selection methods. Second, we evaluated the class prediction efficiency of each gene list in training and test cross-validation using four supervised classifiers.
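One way to quantify agreement between gene lists is the fraction of genes shared by every list. The sketch below is an assumption about how such an overlap could be computed, not the article's exact definition; the function name `common_fraction`, the method labels and the gene identifiers are hypothetical.

```python
def common_fraction(gene_lists):
    """Fraction of genes present in every list, relative to the shortest list.

    gene_lists maps a method name to its list of selected gene identifiers.
    """
    sets = [set(genes) for genes in gene_lists.values()]
    shared = set.intersection(*sets)
    shortest = min(len(s) for s in sets)
    return len(shared) / shortest


# Hypothetical gene lists of equal length from three methods.
lists = {
    "fold_change": ["g1", "g2", "g3", "g4"],
    "welch_t": ["g1", "g3", "g5", "g6"],
    "rank_products": ["g1", "g2", "g3", "g7"],
}

overlap = common_fraction(lists)
print(f"{overlap:.0%} of genes are common to all lists")
```

Normalising by the shortest list keeps the value between 0 and 1 even when the lists differ in length; with equal-length lists, as here, it is simply the shared count divided by the list size.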

Conclusion

We report that the choice of feature selection method, the number of genes in the gene list, the number of cases (samples) and the noise in the dataset substantially influence classification success. Recommendations are made for choice of feature selection. Area under a ROC curve performed well with datasets that had low levels of noise and large sample size. Rank products performed well when datasets had low numbers of samples or high levels of noise. The empirical Bayes t-statistic performed well across a range of sample sizes.
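As a rough illustration of the rank products idea, the sketch below ranks genes by fold change within each replicate comparison and takes the geometric mean of those ranks. It is a simplified sketch under stated assumptions: the function `rank_products` and the toy fold changes are hypothetical, ties are not handled specially, and the permutation-based significance estimate of the full method is omitted.

```python
import math


def rank_products(fold_changes):
    """Geometric mean of per-replicate ranks (rank 1 = most up-regulated).

    fold_changes maps a gene to a list of log fold changes, one per replicate
    comparison; every gene must have the same number of entries.
    """
    genes = list(fold_changes)
    k = len(next(iter(fold_changes.values())))
    log_rank_sum = {gene: 0.0 for gene in genes}
    for i in range(k):
        # Sort genes from largest to smallest fold change in replicate i.
        ordered = sorted(genes, key=lambda g: fold_changes[g][i], reverse=True)
        for rank, gene in enumerate(ordered, start=1):
            log_rank_sum[gene] += math.log(rank)
    # exp(mean of log ranks) is the geometric mean of the ranks.
    return {gene: math.exp(total / k) for gene, total in log_rank_sum.items()}


# Hypothetical log fold changes across three replicate comparisons.
toy_fc = {
    "gene1": [2.0, 1.8, 2.2],   # consistently near the top
    "gene2": [0.1, 3.0, -0.2],  # top in one replicate only
    "gene3": [0.5, 0.4, 0.6],
}

rp = rank_products(toy_fc)
best = min(rp, key=rp.get)
print(best, rp)
```

Because the score depends only on ranks, a single extreme value in one replicate (as for `gene2`) has limited influence, which is consistent with the method's reported robustness to noise and small sample sizes.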

Author and article information

Journal

Journal ID (nlm-ta): BMC Bioinformatics

Title: BMC Bioinformatics

Publisher: BioMed Central (London)

ISSN (Electronic): 1471-2105

Publication date Collection: 2006

Publication date (Electronic): 26 July 2006

Volume: 7

Page: 359

Affiliations

[1] Bioinformatics, Conway Institute, University College Dublin, Belfield, Dublin 4, Ireland

[2] Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Mayer 232, 44 Binney Street, Boston, MA 02115, USA

Article

Publisher ID: 1471-2105-7-359

DOI: 10.1186/1471-2105-7-359

PMC ID: 1544358

PubMed ID: 16872483

SO-VID: 0d4cfed0-019a-4801-9ed3-da3fb9276ec2

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
