Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data


Abstract

Genome-wide expression profiling is a powerful tool for implicating novel gene ensembles in cellular mechanisms of health and disease. The most popular platform for genome-wide expression profiling is the Affymetrix GeneChip. However, its selection of probes relied on earlier genome and transcriptome annotation, which differs significantly from current knowledge. The resultant informatics problems have a profound impact on the analysis and interpretation of the data. Here, we address these critical issues and offer a solution. We identified several classes of problems at the individual probe level in the existing annotation, under the assumption that current genome and transcriptome databases are more accurate than those used for GeneChip design. We then reorganized probes on more than a dozen popular GeneChips into gene-, transcript- and exon-specific probe sets in light of up-to-date genome, cDNA/EST clustering and single nucleotide polymorphism information. Comparing analysis results between the original and the redefined probe sets reveals a ∼30–50% discrepancy in the genes previously identified as differentially expressed, regardless of analysis method. Our results demonstrate that the original Affymetrix probe set definitions are inaccurate, and that many conclusions derived from past GeneChip analyses may be significantly flawed. It will be beneficial to re-analyze existing GeneChip data with updated probe set definitions.
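The probe reorganization described in the abstract can be illustrated with a minimal sketch. It assumes that each probe has already been aligned to current transcript databases, yielding a list of (gene, transcript) hits, and that the set of probes overlapping known SNPs is known. The function names, the input layout and the `min_probes` threshold are hypothetical illustrations, not the authors' pipeline. The sketch builds gene-specific probe sets only (transcript- and exon-specific sets would group on finer keys), and the `discrepancy` measure (non-shared fraction of the union of two differentially expressed gene lists) is an assumed definition, since the abstract does not state how the ∼30–50% figure was computed.

```python
from collections import defaultdict


def redefine_probe_sets(probe_hits, snp_probes, min_probes=3):
    """Group probes into gene-specific probe sets (illustrative sketch).

    probe_hits: hypothetical mapping probe_id -> list of (gene_id, transcript_id)
                hits from aligning each probe to current transcript databases.
    snp_probes: probe_ids whose sequence overlaps a known SNP.
    min_probes: hypothetical minimum number of probes required per set.
    """
    sets = defaultdict(list)
    for probe, hits in probe_hits.items():
        genes = {gene for gene, _transcript in hits}
        # Drop probes that no longer match any gene, match several genes
        # (possible cross-hybridization), or overlap a known SNP.
        if len(genes) != 1 or probe in snp_probes:
            continue
        sets[genes.pop()].append(probe)
    # Keep only sets with enough probes to summarize expression.
    return {gene: sorted(probes) for gene, probes in sets.items()
            if len(probes) >= min_probes}


def discrepancy(old_genes, new_genes):
    """Assumed metric: fraction of the union of two gene lists not shared by both."""
    old_genes, new_genes = set(old_genes), set(new_genes)
    union = old_genes | new_genes
    if not union:
        return 0.0
    return 1 - len(old_genes & new_genes) / len(union)


if __name__ == "__main__":
    # Toy alignment results (hypothetical identifiers).
    probe_hits = {
        "p1": [("GENE_A", "tx1"), ("GENE_A", "tx2")],
        "p2": [("GENE_A", "tx1")],
        "p3": [("GENE_A", "tx2")],
        "p4": [("GENE_A", "tx1"), ("GENE_B", "tx9")],  # two genes: dropped
        "p5": [],                                       # no current match: dropped
        "p6": [("GENE_A", "tx1")],                      # overlaps a SNP: dropped
        "p7": [("GENE_B", "tx9")],                      # too few probes for GENE_B
    }
    print(redefine_probe_sets(probe_hits, snp_probes={"p6"}))
    print(discrepancy({"A", "B", "C", "D"}, {"A", "B", "E"}))
```

On the toy input, only GENE_A keeps a probe set (p1, p2, p3): p4 matches two genes, p5 matches nothing, p6 overlaps a SNP, and GENE_B falls below the assumed three-probe minimum. Filtering on unique mapping before grouping mirrors the abstract's premise that current databases are more accurate than those used for the original design.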


Author and article information

Journal

Journal ID (nlm-ta): Nucleic Acids Res

Journal ID (publisher-id): Nucleic Acids Research

Title: Nucleic Acids Research

Publisher: Oxford University Press

ISSN (Print): 0305-1048

ISSN (Electronic): 1362-4962

Publication date Collection: 2005

Publication date (Print): 2005

Publication date (Electronic): 10 November 2005

Volume: 33

Issue: 20

Page: e175

Affiliations

Molecular and Behavioural Neuroscience Institute and Department of Psychiatry, University of Michigan, Ann Arbor, MI 48109, USA

¹Michigan Center for Biological Information, University of Michigan, Ann Arbor, MI 48105, USA

²Department of Psychiatry and Center for Neuroscience, University of California, Davis, CA 95616, USA

³Department of Psychiatry and Human Behavior, University of California, Irvine, CA 92697, USA

⁴Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA

⁵Department of Statistics, University of California, Berkeley, CA 94720, USA

Author notes

*To whom correspondence should be addressed. Tel: +1 734 615 7099; Fax: +1 734 647 4130; Email: mengf@umich.edu

Article

DOI: 10.1093/nar/gni179

PMC ID: 1283542

PubMed ID: 16284200


License:

The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact journals.permissions@oxfordjournals.org

History

Date received : 26 September 2005

Date revision received : 17 October 2005

Date accepted : 25 October 2005
