      Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data

      research-article

          Abstract

          Genome-wide expression profiling is a powerful tool for implicating novel gene ensembles in cellular mechanisms of health and disease. The most popular platform for genome-wide expression profiling is the Affymetrix GeneChip. However, its selection of probes relied on earlier genome and transcriptome annotation, which is significantly different from current knowledge. The resultant informatics problems have a profound impact on the analysis and interpretation of the data. Here, we address these critical issues and offer a solution. We identified several classes of problems at the individual probe level in the existing annotation, under the assumption that current genome and transcriptome databases are more accurate than those used for GeneChip design. We then reorganized probes on more than a dozen popular GeneChips into gene-, transcript- and exon-specific probe sets in light of up-to-date genome, cDNA/EST clustering and single nucleotide polymorphism information. Comparing analysis results between the original and the redefined probe sets reveals ∼30–50% discrepancy in the genes previously identified as differentially expressed, regardless of analysis method. Our results demonstrate that the original Affymetrix probe set definitions are inaccurate, and many conclusions derived from past GeneChip analyses may be significantly flawed. It will be beneficial to re-analyze existing GeneChip data with updated probe set definitions.
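
The regrouping and comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy probe and gene identifiers, and the definition of discrepancy used here (the fraction of originally identified genes that are absent from the redefined list) are all assumptions made for illustration.

```python
from collections import defaultdict


def regroup_probes(probe_to_gene):
    """Group probe IDs into gene-specific probe sets.

    probe_to_gene maps a probe ID to the gene it matches under an
    updated annotation, or to None when no gene matches (hypothetical
    input format). Unmatched probes are dropped.
    """
    probe_sets = defaultdict(list)
    for probe_id, gene in probe_to_gene.items():
        if gene is not None:
            probe_sets[gene].append(probe_id)
    return dict(probe_sets)


def discrepancy(original_hits, redefined_hits):
    """Fraction of originally identified genes missing from the redefined list.

    This is one possible definition of discrepancy, assumed here;
    the abstract does not specify how the figure was computed.
    """
    original_hits = set(original_hits)
    if not original_hits:
        return 0.0
    missing = original_hits - set(redefined_hits)
    return len(missing) / len(original_hits)


# Toy data: identifiers are placeholders, not real probes or genes.
probe_to_gene = {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B", "p4": None}
probe_sets = regroup_probes(probe_to_gene)

original = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
redefined = ["GENE_A", "GENE_B", "GENE_E"]
rate = discrepancy(original, redefined)
```

In this toy case two of the four originally reported genes do not reappear after redefinition, so `rate` is 0.5. A real comparison would start from probe-level intensities and rerun the differential expression analysis under each probe set definition.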

          Most cited references (11)

          Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection.

          Recent advances in cDNA and oligonucleotide DNA arrays have made it possible to measure the abundance of mRNA transcripts for many genes simultaneously. The analysis of such experiments is nontrivial because of large data size and many levels of variation introduced at different stages of the experiments. The analysis is further complicated by the large differences that may exist among different probes used to interrogate the same gene. However, an attractive feature of high-density oligonucleotide arrays such as those produced by photolithography and inkjet technology is the standardization of chip manufacturing and hybridization process. As a result, probe-specific biases, although significant, are highly reproducible and predictable, and their adverse effect can be reduced by proper modeling and analysis methods. Here, we propose a statistical model for the probe-level data, and develop model-based estimates for gene expression indexes. We also present model-based methods for identifying and handling cross-hybridizing probes and contaminating array regions. Applications of these results will be presented elsewhere.

            Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs.

            Only a small proportion of the mouse genome is transcribed into mature messenger RNA transcripts. There is an international collaborative effort to identify all full-length mRNA transcripts from the mouse, and to ensure that each is represented in a physical collection of clones. Here we report the manual annotation of 60,770 full-length mouse complementary DNA sequences. These are clustered into 33,409 'transcriptional units', contributing 90.1% of a newly established mouse transcriptome database. Of these transcriptional units, 4,258 are new protein-coding and 11,665 are new non-coding messages, indicating that non-coding RNA is a major component of the transcriptome. 41% of all transcriptional units showed evidence of alternative splicing. In protein-coding transcripts, 79% of splice variations altered the protein product. Whole-transcriptome analyses resulted in the identification of 2,431 sense-antisense pairs. The present work, completely supported by physical clones, provides the most comprehensive survey of a mammalian transcriptome so far, and is a valuable resource for functional genomics.

              FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes.

              We present a simple but powerful procedure to extract Gene Ontology (GO) terms that are significantly over- or under-represented in sets of genes within the context of a genome-scale experiment (DNA microarray, proteomics, etc.). Said procedure has been implemented as a web application, FatiGO, allowing for easy and interactive querying. FatiGO, which takes the multiple-testing nature of statistical contrast into account, currently includes GO associations for diverse organisms (human, mouse, fly, worm and yeast) and the TrEMBL/Swissprot GOAnnotations@EBI correspondences from the European Bioinformatics Institute.

                Author and article information

                Journal
                Nucleic Acids Res
                Nucleic Acids Research
                Oxford University Press
                ISSN: 0305-1048 (print), 1362-4962 (electronic)
                2005
                10 November 2005
                Volume 33, Issue 20, e175
                Affiliations
                Molecular and Behavioural Neuroscience Institute and Department of Psychiatry, University of Michigan, Ann Arbor, MI 48109, USA
                1 Michigan Center for Biological Information, University of Michigan, Ann Arbor, MI 48105, USA
                2 Department of Psychiatry and Center for Neuroscience, University of California, Davis, CA 95616, USA
                3 Department of Psychiatry and Human Behavior, University of California, Irvine, CA 92697, USA
                4 Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
                5 Department of Statistics, University of California, Berkeley, CA 94720, USA
                Author notes
                *To whom correspondence should be addressed. Tel: +1 734 615 7099; Fax: +1 734 647 4130; Email: mengf@umich.edu
                Article
                DOI: 10.1093/nar/gni179
                PMCID: PMC1283542
                PMID: 16284200
                © The Author 2005. Published by Oxford University Press. All rights reserved

                The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact journals.permissions@oxfordjournals.org

                History
                26 September 2005
                17 October 2005
                25 October 2005
                Categories
                Methods Online

                Genetics