      Is Open Access

      Graph-Sparse LDA: A Topic Model with Structured Sparsity

      Preprint

          Abstract

          Originally designed to model text, topic modeling has become a powerful tool for uncovering latent structure in domains including medicine, finance, and vision. The goals for the model vary depending on the application: in some cases, the discovered topics may be used for prediction or some other downstream task. In other cases, the content of the topic itself may be of intrinsic scientific interest. Unfortunately, even using modern sparse techniques, the discovered topics are often difficult to interpret due to the high dimensionality of the underlying space. To improve topic interpretability, we introduce Graph-Sparse LDA, a hierarchical topic model that leverages knowledge of relationships between words (e.g., as encoded by an ontology). In our model, topics are summarized by a few latent concept-words from the underlying graph that explain the observed words. Graph-Sparse LDA recovers sparse, interpretable summaries on two real-world biomedical datasets while matching state-of-the-art prediction performance.


          Most cited references


          Finding scientific topics.

          A first step in identifying the content of a document is determining which topics that document addresses. We describe a generative model for documents, introduced by Blei, Ng, and Jordan [Blei, D. M., Ng, A. Y. & Jordan, M. I. (2003) J. Machine Learn. Res. 3, 993-1022], in which each document is generated by choosing a distribution over topics and then choosing each word in the document from a topic selected according to this distribution. We then present a Markov chain Monte Carlo algorithm for inference in this model. We use this algorithm to analyze abstracts from PNAS by using Bayesian model selection to establish the number of topics. We show that the extracted topics capture meaningful structure in the data, consistent with the class designations provided by the authors of the articles, and outline further applications of this analysis, including identifying "hot topics" by examining temporal dynamics and tagging abstracts to illustrate semantic content.
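          The generative process described in this abstract (choose a distribution over topics for the document, then draw each word from a topic selected according to that distribution) can be sketched as follows. This is a minimal illustration, not the cited authors' code: the function names, the toy topics, and the Dirichlet prior with parameter `alpha` are assumptions made for the example, and the sketch covers only generation, not the Markov chain Monte Carlo inference step.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topics, alpha, length, rng):
    """Generate one document as a list of word indices.

    topics: list of per-topic word distributions (each sums to 1).
    alpha:  assumed Dirichlet parameters for the document's topic proportions.
    """
    vocab_size = len(topics[0])
    # Step 1: choose this document's distribution over topics.
    theta = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(length):
        # Step 2: select a topic for this token according to theta.
        z = rng.choices(range(len(topics)), weights=theta)[0]
        # Step 3: choose the word from the selected topic's distribution.
        w = rng.choices(range(vocab_size), weights=topics[z])[0]
        words.append(w)
    return theta, words


rng = random.Random(0)
# Hypothetical toy setup: 2 topics over a 4-word vocabulary.
topics = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]
theta, doc = generate_document(topics, alpha=[0.5, 0.5], length=10, rng=rng)
print(theta, doc)
```

          Inference runs in the opposite direction: given only the observed words, it estimates the topic assignments and topic distributions that this sketch takes as given.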

            The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies


              Human action recognition by semilatent topic models.

              We propose two new models for human action recognition from video sequences using topic models. Video sequences are represented by a novel "bag-of-words" representation, where each frame corresponds to a "word." Our models differ from previous latent topic models for visual recognition in two major aspects: first of all, the latent topics in our models directly correspond to class labels; second, some of the latent variables in previous topic models become observed in our case. Our models have several advantages over other latent topic models used in visual recognition. First of all, the training is much easier due to the decoupling of the model parameters. Second, it alleviates the issue of how to choose the appropriate number of latent topics. Third, it achieves much better performance by utilizing the information provided by the class labels in the training set. We present action classification results on five different data sets. Our results are either comparable to, or significantly better than previously published results on these data sets.

                Author and article information

                Journal
                16 October 2014
                2014-11-21
                Article
                1410.4510
                a93655da-fa45-4bd6-a892-44cb6c03497e

                http://arxiv.org/licenses/nonexclusive-distrib/1.0/

                History
                Custom metadata
                stat.ML cs.CL cs.LG
