
      Statistical hypothesis testing of factor loading in principal component analysis and its application to metabolite set enrichment analysis



          Abstract

          Background

          Principal component analysis (PCA) has been widely used to visualize high-dimensional metabolomic data in a two- or three-dimensional subspace. In metabolomics, a fixed number of metabolites (e.g., the top 10) is often selected subjectively on the basis of factor loading in PCA, and biological inferences are then drawn from these metabolites. However, this approach may lead to biased biological inferences because the metabolites are not selected objectively with statistical criteria.

          Results

          We propose a statistical procedure that selects metabolites by statistical hypothesis testing of the factor loading in PCA and draws biological inferences about these significant metabolites with a metabolite set enrichment analysis (MSEA). The procedure relies on the fact that, for autoscaled data, the eigenvector in PCA is proportional to the correlation coefficient between the PC score and each metabolite level. We applied this approach to two sets of metabolomic data from mouse liver samples: 136 of 282 metabolites in the first case study and 66 of 275 metabolites in the second case study were statistically significant. This result suggests that fixing the number of metabolites before the analysis is inappropriate, because the number of significant metabolites differs between studies when factor loading in PCA is used. Moreover, an MSEA of these significant metabolites detected significant metabolic pathways that were consistent with previous biological knowledge.
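The relationship between eigenvector and correlation coefficient stated above can be checked numerically. The following is a minimal sketch on synthetic data, not the implementation in "mseapca": the function names are hypothetical, power iteration stands in for a full eigendecomposition so that only the standard library is needed, and the t statistic is the usual test for a correlation coefficient, which is assumed here to be the test applied to the factor loading.

```python
import math
import random


def autoscale(X):
    """Center every column and scale it to unit sample variance."""
    n, p = len(X), len(X[0])
    Z = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [row[j] for row in X]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        for i in range(n):
            Z[i][j] = (col[i] - mean) / sd
    return Z


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def first_pc(Z, iters=2000):
    """Leading eigenvector and eigenvalue of the correlation matrix of Z.

    Power iteration is used as a stand-in for a full eigendecomposition.
    """
    n, p = len(Z), len(Z[0])
    R = [[sum(Z[i][a] * Z[i][b] for i in range(n)) / (n - 1)
          for b in range(p)] for a in range(p)]
    w = [1.0 / (j + 1) for j in range(p)]  # arbitrary non-zero start vector
    for _ in range(iters):
        v = [sum(R[a][b] * w[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        w = [x / norm for x in v]
    lam = sum(w[a] * R[a][b] * w[b] for a in range(p) for b in range(p))
    return w, lam


def loading_test(X):
    """Per variable: (factor loading, sqrt(eigenvalue) * eigenvector entry, t)."""
    Z = autoscale(X)
    n = len(Z)
    w, lam = first_pc(Z)
    scores = [sum(z * wj for z, wj in zip(row, w)) for row in Z]
    results = []
    for j, wj in enumerate(w):
        r = pearson(scores, [row[j] for row in Z])  # factor loading on PC1
        t = r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)  # n - 2 degrees of freedom
        results.append((r, math.sqrt(lam) * wj, t))
    return results


# Synthetic data (an assumption for illustration): 30 samples, 5 variables,
# the first three driven by one shared latent factor.
random.seed(0)
X = []
for _ in range(30):
    f = random.gauss(0, 1)
    X.append([f + random.gauss(0, 0.5), f + random.gauss(0, 0.5),
              -f + random.gauss(0, 0.5), random.gauss(0, 1), random.gauss(0, 1)])

RESULTS = loading_test(X)
for j, (r, scaled, t) in enumerate(RESULTS):
    print(f"variable {j}: loading={r:+.4f}  sqrt(lambda)*w={scaled:+.4f}  t={t:+.2f}")
```

The first two printed columns agree for every variable, which is the proportionality the procedure depends on. A p-value for each loading would then come from a t distribution with n - 2 degrees of freedom (for example `scipy.stats.t.sf`), and with many metabolites a multiple-testing correction would normally follow.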

          Conclusions

          It is essential to select metabolites statistically to make unbiased biological inferences from metabolomic data when using factor loading in PCA. We propose a statistical procedure to select metabolites with statistical hypothesis testing of the factor loading in PCA, and to draw biological inferences about these significant metabolites with MSEA. We have developed an R package “mseapca” to facilitate this approach. The “mseapca” package is publicly available at the CRAN website.
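The enrichment step can be illustrated with over-representation analysis, one common formulation of MSEA that uses a one-sided hypergeometric (Fisher's exact) test. This is a sketch under that assumption; whether "mseapca" uses exactly this test is not stated in the abstract, and the function name, the pathway size, and the pathway hit count below are hypothetical.

```python
from math import comb


def ora_pvalue(n_total, n_significant, set_size, set_significant):
    """One-sided hypergeometric p-value for over-representation.

    Probability of finding at least `set_significant` significant metabolites
    in a metabolite set of `set_size`, when `n_significant` of the `n_total`
    measured metabolites are significant.
    """
    upper = min(set_size, n_significant)
    hits = sum(
        comb(n_significant, k) * comb(n_total - n_significant, set_size - k)
        for k in range(set_significant, upper + 1)
    )
    return hits / comb(n_total, set_size)


# Toy check: 10 metabolites, 5 significant, a set of 4 that are all significant.
P_TOY = ora_pvalue(10, 5, 4, 4)

# Totals from the first case study (136 of 282 significant); the pathway
# size (12) and its number of significant members (10) are made up.
P_EXAMPLE = ora_pvalue(282, 136, 12, 10)

print(f"toy p-value: {P_TOY:.6f}")
print(f"hypothetical pathway p-value: {P_EXAMPLE:.6f}")
```

The test is one-sided because only an excess of significant metabolites in a pathway is of interest. When many pathways are tested, the resulting p-values would usually be adjusted for multiple comparisons.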


                Author and article information

                Journal: BMC Bioinformatics
                Publisher: BioMed Central
                ISSN: 1471-2105
                Published: 21 February 2014
                Volume 15, article 51
                Affiliations
                [1] Human Metabolome Technologies, Inc, 246-2 Mizukami, Kakuganji, Tsuruoka, Yamagata 997-0052, Japan
                Article identifiers
                Publisher ID: 1471-2105-15-51
                DOI: 10.1186/1471-2105-15-51
                PMCID: PMC4015128
                PMID: 24555693
                Record ID: eae7ed1a-cef9-43b4-857a-4634115bdee2
                Copyright © 2014 Yamamoto et al.; licensee BioMed Central Ltd.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.

                History
                Received: 25 August 2013
                Accepted: 13 February 2014
                Categories
                Methodology Article

                Bioinformatics & Computational biology
                Keywords: principal component analysis, statistical hypothesis testing of factor loading, metabolite set enrichment analysis
