
      Optimal detection of sparse principal components in high dimension

      Preprint


          Abstract

          We perform a finite sample analysis of the detection levels for sparse principal components of a high-dimensional covariance matrix. Our minimax optimal test is based on a sparse eigenvalue statistic. Alas, computing this test is known to be NP-complete in general, and we describe a computationally efficient alternative test using convex relaxations. Our relaxation is also proved to detect sparse principal components at near optimal detection levels, and it performs well on simulated datasets. Moreover, using polynomial time reductions from theoretical computer science, we bring significant evidence that our results cannot be improved, thus revealing an inherent trade-off between statistical and computational performance.
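As a small illustration (not the paper's code), the sparse eigenvalue statistic can be computed exactly by brute force: take the largest eigenvalue over all principal submatrices indexed by a size-k support. The combinatorial enumeration below makes plain why the exact statistic is intractable for large p; the spiked-data setup and all names are this sketch's own assumptions.

```python
import itertools
import numpy as np

def k_sparse_max_eigenvalue(sigma_hat, k):
    """Brute-force k-sparse largest eigenvalue of a covariance matrix.

    Enumerates all C(p, k) supports, so the cost is exponential in p;
    this is the statistic the convex relaxation is meant to approximate.
    """
    p = sigma_hat.shape[0]
    best = -np.inf
    for support in itertools.combinations(range(p), k):
        sub = sigma_hat[np.ix_(support, support)]
        best = max(best, np.linalg.eigvalsh(sub)[-1])
    return best

# Toy spiked model: isotropic noise plus a rank-one spike on a
# 2-sparse direction v (illustrative parameters, not from the paper).
rng = np.random.default_rng(0)
p, n, k = 8, 200, 2
v = np.zeros(p)
v[:2] = 1 / np.sqrt(2)
X = rng.standard_normal((n, p)) + 1.5 * rng.standard_normal((n, 1)) * v
sigma_hat = X.T @ X / n
stat = k_sparse_max_eigenvalue(sigma_hat, k)
```

Under the spike the statistic should be well above 1 (the null level for identity covariance), while by eigenvalue interlacing it never exceeds the largest eigenvalue of the full sample covariance.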

          Most cited references (35)


          Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming


            On the distribution of the largest eigenvalue in principal components analysis


              On Consistency and Sparsity for Principal Components Analysis in High Dimensions.

              Principal components analysis (PCA) is a classic method for the reduction of dimensionality of data in the form of n observations (or cases) of a vector with p variables. Contemporary datasets often have p comparable with or even much larger than n. Our main assertions, in such settings, are (a) that some initial reduction in dimensionality is desirable before applying any PCA-type search for principal modes, and (b) the initial reduction in dimensionality is best achieved by working in a basis in which the signals have a sparse representation. We describe a simple asymptotic model in which the estimate of the leading principal component vector via standard PCA is consistent if and only if p(n)/n → 0. We provide a simple algorithm for selecting a subset of coordinates with largest sample variances, and show that if PCA is done on the selected subset, then consistency is recovered, even if p(n) ≫ n.
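The two-step scheme this abstract describes (keep the coordinates with the largest sample variances, then run PCA on that subset) can be sketched in a few lines. The function name, the subset size d, and the synthetic spiked data are all assumptions of this illustration, not the authors' implementation.

```python
import numpy as np

def variance_screened_pca(X, d):
    """Two-step PCA sketch: screen by sample variance, then do PCA.

    Keeps the d coordinates with the largest sample variances and
    returns the leading eigenvector of the covariance restricted to
    that subset, embedded back into R^p.
    """
    n, p = X.shape
    variances = X.var(axis=0)
    keep = np.argsort(variances)[-d:]          # top-d variance coordinates
    sub_cov = np.cov(X[:, keep], rowvar=False)
    w, V = np.linalg.eigh(sub_cov)             # ascending eigenvalues
    v_hat = np.zeros(p)
    v_hat[keep] = V[:, -1]                     # leading eigenvector
    return v_hat

# Synthetic spiked model: 3-sparse leading direction in p = 50 dimensions.
rng = np.random.default_rng(1)
n, p = 500, 50
v = np.zeros(p)
v[:3] = 1 / np.sqrt(3)
X = rng.standard_normal((n, p)) + 2.0 * rng.standard_normal((n, 1)) * v
v_hat = variance_screened_pca(X, d=5)
```

Because the spike inflates the sample variances of its three support coordinates, the screening step retains them with high probability, and the restricted PCA recovers a unit vector closely aligned with v (up to sign).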

                Author and article information

                Journal article
                22 February 2012; 2013-09-19
                DOI: 10.1214/13-AOS1127
                arXiv: 1202.5070
                License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
                IMS-AOS-AOS1127
                Annals of Statistics 2013, Vol. 41, No. 4, 1780-1815
                Published at http://dx.doi.org/10.1214/13-AOS1127 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
                Subjects: math.ST, stat.ML, stat.TH
