Blog
About

18
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Sample-Optimal Density Estimation in Nearly-Linear Time

      Preprint

      Read this article at

      Bookmark
          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          We design a new, fast algorithm for agnostically learning univariate probability distributions whose densities are well approximated by piecewise polynomial functions. Let \(f\) be the density function of an arbitrary univariate distribution, and suppose that \(f\) is \(\mathrm{OPT}\)-close in \(L_1\)-distance to an unknown piecewise polynomial function with \(t\) interval pieces and degree \(d\). Our algorithm draws \(n = O(t(d+1)/\epsilon^2)\) samples from \(f\), runs in time \(\tilde{O}(n \cdot \mathrm{poly}(d))\), and with probability at least \(9/10\) outputs an \(O(t)\)-piecewise degree-\(d\) hypothesis \(h\) that is \(4 \cdot \mathrm{OPT} +\epsilon\) close to \(f\). Our general algorithm yields (nearly) sample-optimal and nearly-linear time estimators for a wide range of structured distribution families over both continuous and discrete domains in a unified way. For most of our applications, these are the first sample-optimal and nearly-linear time estimators in the literature. As a consequence, our work resolves the sample and computational complexities of a broad class of inference tasks via a single "meta-algorithm". Moreover, we experimentally demonstrate that our algorithm performs very well in practice. Our algorithm consists of three "levels": (i) At the top level, we employ an iterative greedy algorithm for finding a good partition of the real line into the pieces of a piecewise polynomial. (ii) For each piece, we show that the sub-problem of finding a good polynomial fit on the current interval can be solved efficiently with a separation oracle method. (iii) We reduce the task of finding a separating hyperplane to a combinatorial problem and give an efficient algorithm for this problem. Combining these three procedures gives a density estimation algorithm with the claimed guarantees.

          Related collections

          Most cited references 7

          • Record: found
          • Abstract: not found
          • Article: not found

          Mixture Densities, Maximum Likelihood and the EM Algorithm

            Bookmark
            • Record: found
            • Abstract: not found
            • Article: not found

            Log-concave probability and its applications

              Bookmark
              • Record: found
              • Abstract: not found
              • Article: not found

              Polynomial splines and their tensor products in extended linear modeling: 1994 Wald memorial lecture

                Bookmark

                Author and article information

                Journal
                1506.00671

                Comments

                Comment on this article