AUREA: an open-source software system for accurate and user-friendly identification of relative expression molecular signatures

Abstract

Background

Public databases such as the NCBI Gene Expression Omnibus contain extensive and exponentially increasing amounts of high-throughput data that can be applied to molecular phenotype characterization. Collectively, these data can be analyzed for such purposes as disease diagnosis or phenotype classification. One family of algorithms that has proven useful for disease classification is based on relative expression analysis and includes the Top-Scoring Pair (TSP), k-Top-Scoring Pairs (k-TSP), Top-Scoring Triplet (TST) and Differential Rank Conservation (DIRAC) algorithms. These relative expression analysis algorithms hold significant advantages for identifying interpretable molecular signatures for disease classification, and have been implemented previously on a variety of computational platforms with varying degrees of usability. To increase the user-base and maximize the utility of these methods, we developed the program AUREA (Adaptive Unified Relative Expression Analyzer)—a cross-platform tool that has a consistent application programming interface (API), an easy-to-use graphical user interface (GUI), fast running times and automated parameter discovery.

Results

Herein, we describe AUREA, an efficient, cohesive, and user-friendly open-source software system that comprises a suite of methods for relative expression analysis. AUREA incorporates existing methods while extending their capabilities and bringing uniformity to their interfaces. We show that combining these algorithms and adaptively tuning their parameters on the training sets makes their performance more consistent, and we demonstrate the effectiveness of our adaptive parameter tuner by comparing accuracy across diverse datasets.

Conclusions

We have integrated several relative expression analysis algorithms and provided a unified interface for their implementation while making data acquisition, parameter fixing, data merging, and results analysis ‘point-and-click’ simple. The unified interface and adaptive parameter tuning of AUREA provide an effective framework in which both ‘in silico’ and ‘bench’ scientists can investigate the massive amounts of publicly available data. AUREA can be found at http://price.systemsbiology.net/AUREA/.

Most cited references

Simple decision rules for classifying human cancers from gene expression profiles.

Aik Choon Tan, Raimond L. Winslow, Lei Xu … (2005)

Various studies have shown that cancer tissue samples can be successfully detected and classified by their gene expression patterns using machine learning approaches. One of the challenges in applying these techniques for classifying gene expression data is to extract accurate, readily interpretable rules providing biological insight as to how classification is performed. Current methods generate classifiers that are accurate but difficult to interpret. This is the trade-off between credibility and comprehensibility of the classifiers. Here, we introduce a new classifier in order to address these problems. It is referred to as k-TSP (k-Top Scoring Pairs) and is based on the concept of 'relative expression reversals'. This method generates simple and accurate decision rules that only involve a small number of gene-to-gene expression comparisons, thereby facilitating follow-up studies. In this study, we have compared our approach to other machine learning techniques for class prediction in 19 binary and multi-class gene expression datasets involving human cancers. The k-TSP classifier performs as efficiently as Prediction Analysis of Microarray and support vector machine, and outperforms other learning methods (decision trees, k-nearest neighbour and naïve Bayes). Our approach is easy to interpret as the classifier involves only a small number of informative genes. For these reasons, we consider the k-TSP method to be a useful tool for cancer classification from microarray gene expression data. The software and datasets are available at http://www.ccbm.jhu.edu (contact: actan@jhu.edu).
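The k-TSP idea described in this abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation and not AUREA's API: the function names `fit_ktsp` and `predict_ktsp`, the data layout, the greedy selection of disjoint pairs, and the fixed default `k=3` are all assumptions made for illustration. Each gene pair is scored by how strongly the ordering "gene i below gene j" differs between two classes, the k best gene-disjoint pairs are kept, and a new sample is classified by majority vote.

```python
from itertools import combinations


def fit_ktsp(samples, labels, k=3):
    """Select up to k gene-disjoint pairs (hypothetical helper, toy scale).

    samples: list of equal-length expression lists; labels: list of 0/1.
    Both classes must be present. Use an odd k to avoid tied votes.
    """
    n_genes = len(samples[0])
    rows0 = [r for r, c in zip(samples, labels) if c == 0]
    rows1 = [r for r, c in zip(samples, labels) if c == 1]

    scored = []
    for i, j in combinations(range(n_genes), 2):
        # Fraction of samples in each class where gene i is below gene j
        p0 = sum(r[i] < r[j] for r in rows0) / len(rows0)
        p1 = sum(r[i] < r[j] for r in rows1) / len(rows1)
        # Last field: the class supported when gene i is below gene j
        scored.append((abs(p0 - p1), i, j, 0 if p0 > p1 else 1))

    # Stable sort, highest score first
    scored.sort(key=lambda t: -t[0])

    pairs, used = [], set()
    for _score, i, j, label_if_less in scored:
        if i in used or j in used:
            continue  # keep the selected pairs gene-disjoint
        pairs.append((i, j, label_if_less))
        used.update((i, j))
        if len(pairs) == k:
            break
    return pairs


def predict_ktsp(pairs, sample):
    """Unweighted majority vote over the selected pairs."""
    votes_for_1 = sum(
        lab if sample[i] < sample[j] else 1 - lab for i, j, lab in pairs
    )
    return 1 if 2 * votes_for_1 > len(pairs) else 0
```

The sketch enumerates every gene pair, so it is only practical for small toy inputs, and it omits details such as tie-breaking between equally scored pairs and choosing k by cross-validation.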

Classifying gene expression profiles from pairwise mRNA comparisons.

Donald Geman, Christian d'Avignon, Daniel Q. Naiman … (2004)

We present a new approach to molecular classification based on mRNA comparisons. Our method, referred to as the top-scoring pair(s) (TSP) classifier, is motivated by current technical and practical limitations in using gene expression microarray data for class prediction, for example to detect disease, identify tumors or predict treatment response. Accurate statistical inference from such data is difficult due to the small number of observations, typically tens, relative to the large number of genes, typically thousands. Moreover, conventional methods from machine learning lead to decisions which are usually very difficult to interpret in simple or biologically meaningful terms. In contrast, the TSP classifier provides decision rules which i) involve very few genes and only relative expression values (e.g., comparing the mRNA counts within a single pair of genes); ii) are both accurate and transparent; and iii) provide specific hypotheses for follow-up studies. In particular, the TSP classifier achieves prediction rates with standard cancer data that are as high as those of previous studies which use considerably more genes and complex procedures. Finally, the TSP classifier is parameter-free, thus avoiding the type of over-fitting and inflated estimates of performance that result when all aspects of learning a predictor are not properly cross-validated.
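The decision rule described in this abstract can be sketched in a few lines of pure Python. This is an illustrative sketch under stated assumptions, not the published code or AUREA's interface: the names `fit_tsp` and `predict_tsp` and the dictionary layout of the model are hypothetical, and the sketch handles two classes labelled 0 and 1. For every gene pair it estimates how often gene i is expressed below gene j in each class, keeps the pair with the largest difference between the two class frequencies, and classifies a new sample by checking that single ordering.

```python
from itertools import combinations


def fit_tsp(samples, labels):
    """Find the top-scoring pair (hypothetical helper, toy scale).

    samples: list of equal-length expression lists; labels: list of 0/1.
    Both classes must be present.
    """
    n_genes = len(samples[0])
    by_class = {0: [], 1: []}
    for row, label in zip(samples, labels):
        by_class[label].append(row)

    def frac_less(rows, i, j):
        # Fraction of samples in which gene i is expressed below gene j
        return sum(r[i] < r[j] for r in rows) / len(rows)

    best = None
    for i, j in combinations(range(n_genes), 2):
        p0 = frac_less(by_class[0], i, j)
        p1 = frac_less(by_class[1], i, j)
        score = abs(p0 - p1)
        # Strict '>' keeps the first pair found among equal scores
        if best is None or score > best["score"]:
            best = {
                "i": i,
                "j": j,
                "score": score,
                # Class supported when gene i is below gene j
                "label_if_less": 0 if p0 > p1 else 1,
            }
    return best


def predict_tsp(model, sample):
    """Classify one sample from the ordering of the selected gene pair."""
    if sample[model["i"]] < sample[model["j"]]:
        return model["label_if_less"]
    return 1 - model["label_if_less"]
```

Because the rule depends only on the ordering of two values within a sample, it is unchanged by any monotonic rescaling of that sample's expression values. The sketch breaks ties between equally scored pairs by enumeration order, which is a simplification.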

Activity-based protein profiling for biochemical pathway discovery in cancer.

Daniel Nomura, Melissa M. Dix, Benjamin Cravatt (2010)

Large-scale profiling methods have uncovered numerous gene and protein expression changes that correlate with tumorigenesis. However, determining the relevance of these expression changes and which biochemical pathways they affect has been hindered by our incomplete understanding of the proteome and its myriad functions and modes of regulation. Activity-based profiling platforms enable both the discovery of cancer-relevant enzymes and selective pharmacological probes to perturb and characterize these proteins in tumour cells. When integrated with other large-scale profiling methods, activity-based proteomics can provide insight into the metabolic and signalling pathways that support cancer pathogenesis and illuminate new strategies for disease diagnosis and treatment.

Author and article information

Journal

Journal ID (nlm-ta): BMC Bioinformatics

Journal ID (iso-abbrev): BMC Bioinformatics

Title: BMC Bioinformatics

Publisher: BioMed Central

ISSN (Electronic): 1471-2105

Publication date Collection: 2013

Publication date (Electronic): 5 March 2013

Volume: 14

Page: 78

Affiliations

[1] Institute for Systems Biology, Seattle, WA, USA

[2] Department of Computer Science and Engineering, University of Washington, Seattle, WA, USA

[3] Department of Bioengineering, University of Illinois, Urbana, IL, USA

[4] Center for Biophysics and Computational Biology, University of Illinois, Urbana, IL, USA

[5] Department of Computer Science, University of Illinois, Urbana, IL, USA

Article

Publisher ID: 1471-2105-14-78

DOI: 10.1186/1471-2105-14-78

PMC ID: 3599560

PubMed ID: 23496976

SO-VID: b528d155-7d8c-4c90-9c86-3782cde25e0e

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
