Post hoc pattern matching: assigning significance to statistically defined expression patterns in single channel microarray data

Abstract

Background

Researchers using RNA expression microarrays in experimental designs with more than two treatment groups often identify statistically significant genes with ANOVA approaches. However, the ANOVA test does not indicate which of the treatment groups differ from one another. Thus, post hoc tests, such as linear contrasts, template correlations, and pairwise comparisons, are used. Linear contrasts and template correlations work extremely well, especially when the researcher has a priori information pointing to a particular pattern/template among the different treatment groups. Further, all pairwise comparisons can be used to identify particular, treatment group-dependent patterns of gene expression. However, these approaches are biased by the researcher's assumptions, and some treatment-based patterns may go undetected. Finally, different patterns may have different probabilities of occurring by chance, which can strongly influence researchers' conclusions about a pattern and its constituent genes.

Results

We developed a four-step post hoc pattern matching (PPM) algorithm to automate the identification of single channel gene expression patterns and the assessment of their significance. First, a 1-Way Analysis of Variance (ANOVA), coupled with post hoc 'all pairwise' comparisons, is calculated for every gene. Second, for each ANOVA-significant gene, all pairwise contrast results are encoded to create unique pattern ID numbers. The number of genes found in each pattern in the data is that pattern's 'actual' frequency. Third, using Monte Carlo simulations, the frequencies of those patterns are estimated in random data (the 'random' gene pattern frequency). Fourth, a Z-score for overrepresentation of each pattern is calculated by comparing its 'actual' frequency against its 'random' frequency. We wrote a Visual Basic program (StatiGen) that automates the PPM procedure and constructs an Excel workbook with standardized graphs of overrepresented patterns and lists of the genes comprising each pattern. The Visual Basic code, installation files for StatiGen, and sample data are available as supplementary material.
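The encoding and scoring steps (the second and fourth) can be illustrated with a minimal Python sketch. It is not the StatiGen code. The abstract does not specify the encoding scheme or the exact Z-score formula, so the base-3 encoding, the use of a sample standard deviation, the function names, and the example gene calls below are all assumptions made for illustration. Each pairwise call is taken to be -1, 0, or +1 (second group significantly lower than, not different from, or higher than the first).

```python
from collections import Counter
from statistics import mean, stdev


def pattern_id(calls):
    """Encode pairwise outcomes (-1, 0, +1) as one base-3 integer (assumed scheme)."""
    pid = 0
    for call in calls:
        if call not in (-1, 0, 1):
            raise ValueError("each pairwise call must be -1, 0, or +1")
        pid = pid * 3 + (call + 1)  # map -1/0/+1 to ternary digits 0/1/2
    return pid


def pattern_frequencies(calls_per_gene):
    """Count how many genes fall into each pattern ID ('actual' frequency)."""
    return Counter(pattern_id(calls) for calls in calls_per_gene.values())


def overrepresentation_z(actual_count, random_counts):
    """Z-score of the observed count against simulated counts (assumed formula)."""
    mu = mean(random_counts)
    sd = stdev(random_counts)  # sample standard deviation across simulations
    if sd == 0:
        raise ValueError("simulated counts have zero variance; Z is undefined")
    return (actual_count - mu) / sd


# Hypothetical three-group example; pairs are ordered (A,B), (A,C), (B,C).
calls_per_gene = {
    "gene1": (1, 1, 0),
    "gene2": (1, 1, 0),
    "gene3": (-1, 0, 1),
}
actual = pattern_frequencies(calls_per_gene)

# Made-up counts of the (1, 1, 0) pattern in three simulated random data sets.
z = overrepresentation_z(actual[pattern_id((1, 1, 0))], [0, 1, 2])
```

Because every gene maps to exactly one pattern ID, the resulting groups cannot overlap in membership. In practice the simulated counts would come from running the same ANOVA and pairwise tests on many random data sets, which this sketch leaves out.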

Conclusion

The PPM procedure is designed to augment current microarray analysis procedures by allowing researchers to incorporate all of the information from post hoc tests to establish unique, overarching gene expression patterns in which there is no overlap in gene membership. In our hands, PPM works well for studies using three to six treatment groups in which the researcher is interested in treatment-related patterns of gene expression. Hardware/software limitations and the extreme number of theoretical expression patterns limit its utility for larger numbers of treatment groups. Applied to a published microarray experiment, the StatiGen program successfully flagged patterns that had been manually assigned in prior work, and further identified other gene expression patterns that may be of interest. Thus, over a moderate range of treatment groups, PPM appears to work well. It allows researchers to assign statistical probabilities to patterns of gene expression that fit a priori expectations/hypotheses, preserves the data's ability to show the researcher interesting yet unanticipated gene expression patterns, and assigns the majority of ANOVA-significant genes to non-overlapping patterns.
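A rough Python sketch shows why the number of theoretical patterns grows so quickly with the number of treatment groups. It assumes each pairwise comparison has three possible outcomes (lower, not different, higher), which the abstract does not state. The result is an upper bound on the number of pattern codes, not the number of patterns PPM actually evaluates, since some combinations of outcomes are logically inconsistent. The function names are hypothetical.

```python
from math import comb


def n_pairwise_comparisons(n_groups):
    """Number of distinct group pairs among n_groups treatment groups."""
    return comb(n_groups, 2)


def max_pattern_codes(n_groups):
    """Upper bound on pattern codes, assuming three outcomes per comparison."""
    return 3 ** n_pairwise_comparisons(n_groups)


# Tabulate the growth over a range of treatment group counts.
for k in range(3, 8):
    print(k, n_pairwise_comparisons(k), max_pattern_codes(k))
```

Under this assumption, three groups give 3 comparisons and at most 27 codes, while six groups give 15 comparisons and at most 14,348,907 codes.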

Author and article information

Journal

Journal ID (nlm-ta): BMC Bioinformatics

Title: BMC Bioinformatics

Publisher: BioMed Central (London)

ISSN (Electronic): 1471-2105

Publication date (Collection): 2007

Publication date (Electronic): 5 July 2007

Volume: 8

Page: 240

Affiliations

[1] Department of Molecular and Biomedical Pharmacology, University of Kentucky College of Medicine, Lexington, Kentucky, USA

Article

Publisher ID: 1471-2105-8-240

DOI: 10.1186/1471-2105-8-240

PMC ID: 1934919

PubMed ID: 17615071

SO-VID: e158ce99-a070-4e49-9c65-11b5aca05969

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.