
      Finding scientific topics.


          Abstract

          A first step in identifying the content of a document is determining which topics that document addresses. We describe a generative model for documents, introduced by Blei, Ng, and Jordan [Blei, D. M., Ng, A. Y. & Jordan, M. I. (2003) J. Machine Learn. Res. 3, 993-1022], in which each document is generated by choosing a distribution over topics and then choosing each word in the document from a topic selected according to this distribution. We then present a Markov chain Monte Carlo algorithm for inference in this model. We use this algorithm to analyze abstracts from PNAS by using Bayesian model selection to establish the number of topics. We show that the extracted topics capture meaningful structure in the data, consistent with the class designations provided by the authors of the articles, and outline further applications of this analysis, including identifying "hot topics" by examining temporal dynamics and tagging abstracts to illustrate semantic content.
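          The generative process described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the symmetric Dirichlet prior on the per-document topic distribution follows the Blei, Ng, and Jordan model cited above, while the function names, the `alpha` value, and the toy two-topic vocabulary are assumptions made for the example.

```python
import random


def sample_dirichlet(alpha, size, rng):
    """Draw from a symmetric Dirichlet by normalising Gamma(alpha, 1) samples."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topics, n_words, alpha, rng):
    """Generate one document from a fixed set of topics.

    topics:  list of per-topic word distributions, each a list of
             probabilities over the same vocabulary (hypothetical input).
    Returns the document's topic distribution, the topic assigned to
    each word position, and the sampled word ids.
    """
    n_topics = len(topics)
    vocab_size = len(topics[0])

    # Step 1: choose a distribution over topics for this document.
    theta = sample_dirichlet(alpha, n_topics, rng)

    assignments, words = [], []
    for _ in range(n_words):
        # Step 2: select a topic according to that distribution.
        z = rng.choices(range(n_topics), weights=theta)[0]
        # Step 3: choose the word from the selected topic.
        w = rng.choices(range(vocab_size), weights=topics[z])[0]
        assignments.append(z)
        words.append(w)
    return theta, assignments, words


# Toy example (assumed values): two topics over a four-word vocabulary.
# Topic 0 only emits words 0 and 1; topic 1 only emits words 2 and 3.
toy_topics = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]
rng = random.Random(0)
theta, assignments, words = generate_document(toy_topics, n_words=20, alpha=1.0, rng=rng)
```

          Inference runs this process in reverse: given only the observed words, the Markov chain Monte Carlo algorithm mentioned in the abstract estimates which topic generated each word.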

          Author and article information

          Journal
          Proceedings of the National Academy of Sciences of the United States of America (Proc Natl Acad Sci U S A)
          ISSN: 0027-8424
          Apr 06 2004; 101 Suppl 1
          Affiliations
          [1] Department of Psychology, Stanford University, Stanford, CA 94305, USA. gruffydd@psych.stanford.edu
          Article
          Publisher ID: 0307752101
          DOI: 10.1073/pnas.0307752101
          PMC ID: 387300
          PubMed ID: 14872004