
      Regularized estimation of large-scale gene association networks using graphical Gaussian models

      research-article
      1 , , 2 , 3 , 4 , 5
      BMC Bioinformatics
      BioMed Central


          Abstract

          Background

          Graphical Gaussian models are popular tools for the estimation of (undirected) gene association networks from microarray data. A key issue when the number of variables greatly exceeds the number of samples is the estimation of the matrix of partial correlations. Since the (Moore-Penrose) inverse of the sample covariance matrix leads to poor estimates in this scenario, standard methods are inappropriate and adequate regularization techniques are needed. Popular approaches include biased estimates of the covariance matrix and high-dimensional regression schemes, such as the Lasso and Partial Least Squares.
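To make the p >> n problem concrete, here is a minimal Python/NumPy sketch. It is an illustration only, not code from the article or from parcor. It maps a precision matrix Omega = Sigma^-1 to partial correlations via rho_ij = -omega_ij / sqrt(omega_ii * omega_jj). It also shows that the sample covariance of n samples has rank at most n - 1, so it cannot be inverted when p > n. As one example of a biased covariance estimate, it shrinks the sample covariance toward its diagonal with a fixed intensity `lam`. The helper names and the value 0.5 are assumptions. In practice the intensity would be estimated from the data.

```python
import numpy as np


def partial_correlations(precision: np.ndarray) -> np.ndarray:
    """Partial correlations rho_ij = -omega_ij / sqrt(omega_ii * omega_jj)."""
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def shrink_to_diagonal(s: np.ndarray, lam: float) -> np.ndarray:
    """(1 - lam) * S + lam * diag(S): for 0 < lam <= 1 and positive variances
    this is positive definite, hence invertible even when p > n."""
    return (1.0 - lam) * s + lam * np.diag(np.diag(s))


rng = np.random.default_rng(0)
n, p = 10, 50                       # far fewer samples than variables
x = rng.standard_normal((n, p))
s = np.cov(x, rowvar=False)         # p x p sample covariance
print(np.linalg.matrix_rank(s))     # at most n - 1 = 9, so S is singular

# Hypothetical fixed shrinkage intensity, chosen only for illustration
pcor = partial_correlations(np.linalg.inv(shrink_to_diagonal(s, lam=0.5)))
print(pcor.shape)                   # (50, 50)
```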

          Results

          In this article, we investigate a general framework for combining regularized regression methods with the estimation of Graphical Gaussian models. This framework includes various existing methods as well as two new approaches based on ridge regression and the adaptive Lasso, respectively. These methods are compared extensively, both qualitatively and quantitatively, in a simulation study and through an application to six diverse real data sets. In addition, all proposed algorithms are implemented in the R package "parcor", available from the R repository CRAN.
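The regression side of this framework rests on a standard identity. Let beta_ij denote the coefficient of variable j when variable i is regressed on all other variables. Then beta_ij = -omega_ij / omega_ii, so rho_ij^2 = beta_ij * beta_ji, and rho_ij has the sign of beta_ij. The Python sketch below plugs ridge regression into that identity. It is a minimal illustration under stated assumptions, not the parcor implementation. The penalty `lam` is fixed by hand rather than selected by a data-driven criterion. Pairs whose two coefficients disagree in sign are simply set to zero.

```python
import numpy as np


def ridge_coefficients(x: np.ndarray, lam: float) -> np.ndarray:
    """beta[i, j] = ridge coefficient of variable j when variable i is
    regressed on all other (standardized) variables; the diagonal stays 0."""
    p = x.shape[1]
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    beta = np.zeros((p, p))
    for i in range(p):
        others = np.delete(np.arange(p), i)
        a = z[:, others]
        # Closed-form ridge solution (A^T A + lam * I)^{-1} A^T y
        beta[i, others] = np.linalg.solve(a.T @ a + lam * np.eye(p - 1), a.T @ z[:, i])
    return beta


def pcor_from_coefficients(beta: np.ndarray) -> np.ndarray:
    """rho_ij = sign(beta_ij) * sqrt(beta_ij * beta_ji); pairs whose two
    coefficients disagree in sign are set to 0 (an assumed convention)."""
    prod = beta * beta.T
    root = np.sqrt(np.clip(prod, 0.0, None))   # clip avoids sqrt of negatives
    pcor = np.where(prod > 0, np.sign(beta) * root, 0.0)
    np.fill_diagonal(pcor, 1.0)
    return pcor


rng = np.random.default_rng(0)
x = rng.standard_normal((20, 40))            # p > n: plain OLS is not defined
pcor = pcor_from_coefficients(ridge_coefficients(x, lam=5.0))  # hypothetical lam
print(pcor.shape)                             # (40, 40)
```

With `lam = 0` and n > p, this construction reproduces the sample partial correlations exactly. A positive `lam` keeps each regression well posed when p > n.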

          Conclusion

          In our simulation studies, the investigated non-sparse regression methods, i.e. Ridge Regression and Partial Least Squares, exhibit rather conservative behavior when combined with (local) false discovery rate multiple testing to decide whether or not an edge is present in the network. For networks with higher densities, the differences in performance between the methods decrease. For sparse networks, we confirm the Lasso's well-known tendency to select too many edges, whereas the two-stage adaptive Lasso is an interesting alternative that provides sparser solutions. In our simulations, both sparse and non-sparse methods are able to reconstruct networks with cluster structures. On six real data sets, we also clearly distinguish between the results obtained with the non-sparse methods and those obtained with the sparse methods, for which specifying the regularization parameter automatically amounts to model selection. In five out of six data sets, Partial Least Squares selects very dense networks. Furthermore, for data that violate the assumption of uncorrelated observations (due to replications), the Lasso and the adaptive Lasso yield very complex structures, indicating that they might not be suited to these conditions. The shrinkage approach is more stable than the regression-based approaches when using subsampling.
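The two-stage adaptive Lasso can be sketched as follows. A first Lasso fit b0 supplies weights 1/|b0_j|. The weighted second-stage Lasso is then solved by rescaling each retained column by |b0_j|. Because of this, the second stage can only drop variables, never add them, which is consistent with its sparser solutions. This Python sketch rests on several assumptions and is not the authors' code. The coordinate-descent solver, the penalty values, and the rule that an edge needs both regressions (i on j and j on i) to select it are all choices made here for illustration.

```python
import numpy as np


def lasso(a, y, lam, n_iter=1000, tol=1e-10):
    """Cyclic coordinate descent for (1/(2n)) * ||y - a @ b||^2 + lam * ||b||_1."""
    n, p = a.shape
    b = np.zeros(p)
    col_sq = (a ** 2).sum(axis=0) / n
    r = np.asarray(y, dtype=float).copy()  # residual y - a @ b
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual excluding j
            rho = a[:, j] @ r / n + col_sq[j] * b[j]
            # Soft-thresholding update
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            change = new - b[j]
            if change != 0.0:
                r -= a[:, j] * change
                b[j] = new
                max_change = max(max_change, abs(change))
        if max_change < tol:
            break
    return b


def adaptive_lasso(a, y, lam_init, lam):
    """Stage 1: Lasso fit b0. Stage 2: weighted Lasso with weights 1/|b0_j|,
    solved by rescaling column j by |b0_j|; zeros in b0 stay zero."""
    b0 = lasso(a, y, lam_init)
    keep = np.flatnonzero(b0)
    b = np.zeros(a.shape[1])
    if keep.size:
        scale = np.abs(b0[keep])
        b[keep] = lasso(a[:, keep] * scale, y, lam) * scale
    return b


def adaptive_lasso_network(x, lam_init, lam):
    """Regress each standardized variable on all others; keep edge i-j only
    if both regressions select it (an assumed 'AND' rule)."""
    p = x.shape[1]
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    beta = np.zeros((p, p))
    for i in range(p):
        others = np.delete(np.arange(p), i)
        beta[i, others] = adaptive_lasso(z[:, others], z[:, i], lam_init, lam)
    return (beta != 0) & (beta.T != 0)


rng = np.random.default_rng(0)
x = rng.standard_normal((50, 8))
x[:, 1] += 0.8 * x[:, 0]          # plant a dependence between variables 0 and 1
edges = adaptive_lasso_network(x, lam_init=0.1, lam=0.1)  # hypothetical penalties
print(np.argwhere(np.triu(edges)))
```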


                Author and article information

                Journal
                BMC Bioinformatics
                BioMed Central
                ISSN: 1471-2105
                Published: 24 November 2009
                Volume 10, Article 384
                Affiliations
                [1 ]Machine Learning/Intelligent Data Analysis Group, Berlin Institute of Technology, Franklinstr 28/29, D-10587 Berlin, Germany
                [2 ]Seminar für Statistik, ETH Zurich, CH-8092 Zurich, Switzerland
                [3 ]Basel Institute for Clinical Epidemiology and Biostatistics, University Hospital Basel, CH-4031 Basel, Switzerland
                [4 ]Department of Statistics, University of Munich, Ludwigstr 33, D-80539 Munich, Germany
                [5 ]Computational Molecular Medicine Research Group, Department of Medical Informatics, Biometry and Epidemiology, University of Munich, Marchioninistr 15, 81377 Munich, Germany
                Article ID: 1471-2105-10-384
                DOI: 10.1186/1471-2105-10-384
                PMCID: PMC2808166
                PMID: 19930695
                Copyright ©2009 Krämer et al; licensee BioMed Central Ltd.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                Received: 5 May 2009
                Accepted: 24 November 2009
                Categories
                Research article

                Bioinformatics & Computational biology
