      Open Access

      Identifiability and Unmixing of Latent Parse Trees

      Preprint

          Abstract

          This paper explores unsupervised learning of parsing models along two directions. First, which models are identifiable from infinite data? We use a general technique for numerically checking identifiability based on the rank of a Jacobian matrix, and apply it to several standard constituency and dependency parsing models. Second, for identifiable models, how do we estimate the parameters efficiently? EM suffers from local optima, while recent work using spectral methods cannot be directly applied since the topology of the parse tree varies across sentences. We develop a strategy, unmixing, which deals with this additional complexity for restricted classes of parsing models.
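The Jacobian rank check mentioned in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy model (a two-component mixture over `m` conditionally independent binary observations with parameters `p`, `a`, `b`), the central finite-difference Jacobian, the step size, and the rank tolerance. The idea is that if the Jacobian of the map from parameters to the observed distribution has full column rank at a generic parameter point, the model is locally identifiable there; a rank deficit signals non-identifiability.

```python
import itertools
import numpy as np


def observed_distribution(theta, m):
    """Joint distribution over m binary observations for a toy mixture.

    theta = (p, a, b): p is the weight of component 1, and a, b are the
    success probabilities under components 0 and 1 (hypothetical model).
    """
    p, a, b = theta
    probs = []
    for x in itertools.product([0, 1], repeat=m):
        ones = sum(x)
        like0 = a**ones * (1 - a) ** (m - ones)
        like1 = b**ones * (1 - b) ** (m - ones)
        probs.append((1 - p) * like0 + p * like1)
    return np.array(probs)


def numerical_jacobian(f, theta, step=1e-6):
    """Central finite-difference Jacobian of f at theta (one column per parameter)."""
    theta = np.asarray(theta, dtype=float)
    cols = []
    for i in range(theta.size):
        delta = np.zeros_like(theta)
        delta[i] = step
        cols.append((f(theta + delta) - f(theta - delta)) / (2 * step))
    return np.column_stack(cols)


def jacobian_rank(theta, m, tol=1e-6):
    """Numerical rank of the parameters-to-distribution Jacobian.

    The explicit tolerance keeps finite-difference noise from being
    counted as a nonzero singular value.
    """
    jac = numerical_jacobian(lambda t: observed_distribution(t, m), theta)
    return int(np.linalg.matrix_rank(jac, tol=tol))


if __name__ == "__main__":
    theta = [0.3, 0.2, 0.7]  # an arbitrary generic point, chosen for illustration
    for m in (1, 2, 3):
        rank = jacobian_rank(theta, m)
        status = "full rank" if rank == len(theta) else "rank deficient"
        print(f"m={m}: rank {rank} of {len(theta)} parameters ({status})")
```

In this toy model the rank reaches the number of parameters only once three observations are available, which matches the intuition that one or two exchangeable binary observations do not carry enough moments to pin down three parameters. A check at a single point is only suggestive; evaluating several randomly drawn points would make the conclusion more robust.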


          Article information

          Date: 14 June 2012
          Type: Article
          arXiv ID: 1206.3137
          License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
          Subject categories: stat.ML cs.LG
