      Strategies for aggregating gene expression data: The collapseRows R function

      research-article


          Abstract

          Background

          Genomic and other high-dimensional analyses often require one to summarize multiple related variables by a single representative. This task is variously referred to as collapsing, combining, reducing, or aggregating variables. Examples include summarizing several probe measurements corresponding to a single gene, representing the expression profiles of a co-expression module by a single expression profile, and aggregating cell-type marker information to deconvolute expression data. Several standard statistical summary techniques can be used, but network methods also provide useful alternatives for finding representatives. Currently, few collapsing functions have been developed and widely applied.

          Results

          We introduce the R function collapseRows, which implements several collapsing methods, and evaluate its performance in three applications. First, we study a crucial step in the meta-analysis of microarray data: the merging of independent gene expression data sets, which may have been measured on different platforms. To this end, we collapse multiple microarray probes for a single gene and then merge the data by gene identifier. We find that choosing the probe with the highest average expression leads to the best between-study consistency. Second, we study methods for summarizing the gene expression profiles of a co-expression module. Several gene co-expression network analysis applications show that the optimal collapsing strategy depends on the analysis goal. Third, we study aggregating the information of cell-type marker genes when the aim is to predict the abundance of cell types in a tissue sample based on gene expression data ("expression deconvolution"). We apply different collapsing methods to predict cell-type abundances in peripheral human blood and in mixtures of blood cell lines. Interestingly, the most accurate prediction method involves choosing the most highly connected "hub" marker gene. Finally, to facilitate biological interpretation of collapsed gene lists, we introduce the function userListEnrichment, which assesses the enrichment of gene lists for known brain and blood cell-type markers, and for other published biological pathways.
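
To make the two selection strategies mentioned above concrete, here is a minimal sketch in Python. It is not the collapseRows implementation, which is an R function. The names `collapse_max_mean` and `collapse_max_connectivity` and the toy data are hypothetical. The connectivity measure used here (sum of absolute Pearson correlations with the other rows in the same group) is a simplifying assumption and may differ from the definition used in the paper.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def group_rows(row_to_group):
    """Invert a row -> group mapping into group -> list of rows."""
    groups = {}
    for row, group in row_to_group.items():
        groups.setdefault(group, []).append(row)
    return groups


def collapse_max_mean(expr, row_to_group):
    """For each group, keep the row with the highest mean expression."""
    return {
        group: max(rows, key=lambda r: mean(expr[r]))
        for group, rows in group_rows(row_to_group).items()
    }


def collapse_max_connectivity(expr, row_to_group):
    """For each group, keep the 'hub' row: the one with the largest
    summed absolute correlation to the other rows in the group
    (an assumed, simplified notion of connectivity)."""
    chosen = {}
    for group, rows in group_rows(row_to_group).items():
        def connectivity(r):
            return sum(abs(pearson(expr[r], expr[q])) for q in rows if q != r)
        chosen[group] = max(rows, key=connectivity)
    return chosen


# Hypothetical toy data: three probes for gene "A", one probe for gene "B".
expr = {
    "q1": [1.0, 2.0, 3.0, 4.0],
    "q2": [1.0, 2.0, 4.0, 3.0],
    "q3": [2.0, 1.0, 3.0, 4.0],
    "p1": [7.0, 8.0, 9.0, 8.0],
}
row_to_group = {"q1": "A", "q2": "A", "q3": "A", "p1": "B"}

print(collapse_max_mean(expr, row_to_group))
print(collapse_max_connectivity(expr, row_to_group))
```

Each function returns a mapping from group (for example a gene or a set of cell-type markers) to the identifier of the selected row, so the collapsed data set can then be merged across studies by group identifier.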

          Conclusions

          The R function collapseRows implements several standard and network-based collapsing methods. In various genomic applications we provide evidence that both types of methods are robust and biologically relevant tools.


                Author and article information

                Journal
                BMC Bioinformatics
                BioMed Central
                1471-2105
                4 August 2011
                Volume: 12
                Article number: 322
                Affiliations
                [1] Interdepartmental Program for Neuroscience, UCLA, Los Angeles, California, USA
                [2] Human Genetics Department, UCLA, Los Angeles, California, USA
                [3] Biostatistics Department, UCLA, Los Angeles, California, USA
                [4] Neurology Department, UCLA, Los Angeles, California, USA
                [5] Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, California, USA
                Article
                Publisher ID: 1471-2105-12-322
                DOI: 10.1186/1471-2105-12-322
                PMCID: PMC3166942
                PMID: 21816037
                2d77a840-6be4-4f83-9c3d-f4abb3a026cc
                Copyright ©2011 Miller et al; licensee BioMed Central Ltd.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Received: 23 May 2011
                Accepted: 4 August 2011
                Categories
                Methodology Article

                Bioinformatics & Computational biology
