
Too many secants: a hierarchical approach to secant-based dimensionality reduction on large data sets

Preprint


      Abstract

      A fundamental problem in many data analysis settings is discerning the "natural" dimension of a data set. When a data set is drawn from a manifold (possibly with noise), a meaningful aspect of the data is the dimension of that manifold. Various approaches exist for estimating this dimension, such as the method of Secant-Avoidance Projection (SAP). Intuitively, the SAP algorithm seeks a projection that best preserves the lengths of all secants between points in a data set; by applying the algorithm to find the best projections to vector spaces of various dimensions, one may infer the dimension of the manifold from which the data originate. In other words, one may learn the dimension at which it is possible to construct a diffeomorphic copy of the data in a lower-dimensional Euclidean space. Using Whitney's embedding theorem, we can relate this information to the natural dimension of the data. A drawback of the SAP algorithm is that a data set with \(T\) points has \(O(T^2)\) secants, making the computation and storage of all secants infeasible for very large data sets. In this paper, we propose a novel algorithm that generalizes the SAP algorithm with an emphasis on addressing this issue: a hierarchical secant-based dimensionality-reduction method that can be employed for data sets where explicitly calculating all secants is not feasible.
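To make the quadratic growth in secants concrete, the following is a minimal Python sketch, not the SAP algorithm or the hierarchical method proposed in the paper. It builds every unit secant of a toy data set (a circle placed in a 10-dimensional space) and measures how short the shortest secant becomes after projection. The function names `unit_secants` and `min_projected_norm` are hypothetical, and an SVD-based projection is used as a simple stand-in for the projection that SAP would optimize.

```python
import numpy as np

def unit_secants(X):
    """Return all normalized pairwise differences (secants) of the rows of X."""
    T = X.shape[0]
    i, j = np.triu_indices(T, k=1)            # every index pair with i < j
    S = X[i] - X[j]                           # T*(T-1)/2 difference vectors
    norms = np.linalg.norm(S, axis=1, keepdims=True)
    return S / norms                          # assumes no duplicate points

def min_projected_norm(S, P):
    """Length of the shortest unit secant after projecting onto the columns of P."""
    return np.linalg.norm(S @ P, axis=1).min()

rng = np.random.default_rng(0)

# Toy data (an assumption for illustration): 200 points on a circle in R^10.
T = 200
theta = rng.uniform(0.0, 2.0 * np.pi, size=T)
X = np.zeros((T, 10))
X[:, 0] = np.cos(theta)
X[:, 1] = np.sin(theta)

S = unit_secants(X)
num_secants = S.shape[0]                      # 200 * 199 / 2 = 19900

# Stand-in projection: leading right singular vectors of the secant matrix.
# SAP itself would instead search for the projection that keeps the
# shortest projected secant as long as possible.
_, _, Vt = np.linalg.svd(S, full_matrices=False)
worst_case = {k: min_projected_norm(S, Vt[:k].T) for k in (1, 2, 3)}

for k, value in worst_case.items():
    print(f"k={k}: shortest projected unit secant = {value:.4f}")
```

In this toy case the shortest projected secant stays at full length once the target dimension reaches 2, while a 1-dimensional projection shrinks some secants. The secant matrix already has 19,900 rows for only 200 points, which illustrates why storing all secants becomes impractical as \(T\) grows.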


            Author and article information

            Journal
            05 August 2018
            1808.01686

            http://arxiv.org/licenses/nonexclusive-distrib/1.0/

            Custom metadata
            To appear in the Proceedings of the 2018 IEEE High Performance Extreme Computing Conference, Waltham, MA USA
            cs.CV cs.LG eess.IV eess.SP
