AutoSOME: a clustering method for identifying gene expression modules without prior knowledge of cluster number


Abstract

Background

Clustering the information content of large high-dimensional gene expression datasets has widespread application in "omics" biology. Unfortunately, the underlying structure of these natural datasets is often fuzzy, and the computational identification of data clusters generally requires knowledge about cluster number and geometry.

Results

We integrated strategies from machine learning, cartography, and graph theory into a new informatics method for automatically clustering self-organizing map ensembles of high-dimensional data. Our new method, called AutoSOME, readily identifies discrete and fuzzy data clusters without prior knowledge of cluster number or structure in diverse datasets including whole genome microarray data. Visualization of AutoSOME output using network diagrams and differential heat maps reveals unexpected variation among well-characterized cancer cell lines. Co-expression analysis of data from human embryonic and induced pluripotent stem cells using AutoSOME identifies >3400 up-regulated genes associated with pluripotency, and indicates that a recently identified protein-protein interaction network characterizing pluripotency was underestimated by a factor of four.
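The following is a minimal sketch of what "clustering an ensemble of self-organizing maps" can look like. It trains several small SOMs with different random seeds and merges their results through a consensus graph. It is a hypothetical, simplified stand-in written in pure Python, not AutoSOME's implementation. It omits the cartographic step mentioned above, and every function name (`train_som`, `best_matching_unit`, `ensemble_clusters`) and parameter (`link_dist`, `min_frac`, grid size, decay schedules) is an illustrative assumption. Within each run, two points count as co-clustered when the weight vectors of their best-matching units lie within `link_dist`. Points that co-cluster in at least `min_frac` of the runs are linked, and clusters are read off as connected components. The number of clusters is therefore an output, not an input.

```python
import math
import random
from itertools import combinations

# Hypothetical, simplified sketch of SOM-ensemble consensus clustering.
# NOT AutoSOME's code: no cartogram step; all parameters are illustrative.


def best_matching_unit(weights, point):
    """Return the grid node whose weight vector is closest to `point`."""
    return min(weights, key=lambda n: math.dist(weights[n], point))


def train_som(data, width, height, epochs=40, seed=0):
    """Train a rectangular SOM with online updates; return {node: weights}."""
    rng = random.Random(seed)
    nodes = [(x, y) for x in range(width) for y in range(height)]
    # Initialise each node's weight vector from a randomly chosen data point.
    weights = {n: list(rng.choice(data)) for n in nodes}
    sigma0, sigma_end = max(width, height) / 2.0, 0.3
    lr0, lr_end = 0.5, 0.02
    order = list(range(len(data)))
    for t in range(epochs):
        frac = t / max(epochs - 1, 1)
        # Exponential decay of learning rate and neighbourhood radius.
        lr = lr0 * (lr_end / lr0) ** frac
        sigma = sigma0 * (sigma_end / sigma0) ** frac
        rng.shuffle(order)
        for i in order:
            point = data[i]
            bmu = best_matching_unit(weights, point)
            for n in nodes:
                # Gaussian neighbourhood measured on the map grid, not in data space.
                grid_d2 = (n[0] - bmu[0]) ** 2 + (n[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2 * sigma ** 2))
                w = weights[n]
                for k in range(len(w)):
                    w[k] += lr * h * (point[k] - w[k])
    return weights


def ensemble_clusters(data, runs=5, width=2, height=2, link_dist=3.0, min_frac=0.5):
    """Consensus clusters over `runs` SOMs; cluster count is not fixed in advance."""
    n = len(data)
    counts = {}
    for r in range(runs):
        weights = train_som(data, width, height, seed=r)
        bmu_w = [weights[best_matching_unit(weights, p)] for p in data]
        for i, j in combinations(range(n), 2):
            # Co-clustered in this run if their BMU weight vectors are close.
            if math.dist(bmu_w[i], bmu_w[j]) <= link_dist:
                counts[(i, j)] = counts.get((i, j), 0) + 1

    # Union-find over the consensus graph; components become clusters.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for (i, j), c in counts.items():
        if c >= min_frac * runs:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy usage: two well-separated 2-D blobs; the call is never told "2".
rng = random.Random(42)
blob_a = [(rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)) for _ in range(15)]
blob_b = [(10 + rng.uniform(-0.5, 0.5), 10 + rng.uniform(-0.5, 0.5)) for _ in range(15)]
clusters = ensemble_clusters(blob_a + blob_b)
print(len(clusters), [len(c) for c in clusters])
```

On real expression data, `link_dist` and `min_frac` would themselves be tuning choices, which this toy does not address.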

Conclusions

By effectively extracting important information from high-dimensional microarray data without prior knowledge or the need for data filtration, AutoSOME can yield systems-level insights from whole genome microarray expression studies. Due to its generality, this new method should also have practical utility for a variety of data-intensive applications, including the results of deep sequencing experiments. AutoSOME is available for download at http://jimcooperlab.mcdb.ucsb.edu/autosome.

Author and article information

Journal

Journal ID (nlm-ta): BMC Bioinformatics

Title: BMC Bioinformatics

Publisher: BioMed Central

ISSN (Electronic): 1471-2105

Publication date (Collection): 2010

Publication date (Electronic): 4 March 2010

Volume: 11

Page: 117

Affiliations

[1] Biomolecular Science and Engineering Program, University of California, Santa Barbara, CA 93106, USA

[2] Molecular, Cellular, and Developmental Biology, University of California, Santa Barbara, CA 93106, USA

Article

Publisher ID: 1471-2105-11-117

DOI: 10.1186/1471-2105-11-117

PMC ID: 2846907

PubMed ID: 20202218

SO-VID: d00af888-0dab-4ff5-9426-1803fa9e3728

License:

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
