
      “Hybrid Topics” -- Facilitating the Interpretation of Topics Through the Addition of MeSH Descriptors to Bags of Words

      research-article


          Abstract

          Extracting and understanding information, themes, and relationships from large collections of documents is an important task for biomedical researchers. Latent Dirichlet Allocation (LDA) is an unsupervised topic modeling technique that relies on the bag-of-words assumption and has been applied extensively to uncover hidden thematic information in large sets of documents. In this paper, we added Medical Subject Headings (MeSH) descriptors to the bags of words to generate ‘hybrid topics’, which are mixed vectors of words and descriptors. We evaluated the quality and interpretability of the resulting topics on both a general corpus and a specialized corpus. Our results showed that the coherence of ‘hybrid topics’ is higher than that of regular bag-of-words topics in the specialized corpus. We also found that the proportion of topics not associated with any MeSH descriptor is higher in the specialized corpus than in the general corpus.
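          The abstract describes hybrid topics only at a high level. The Python sketch below shows one way the input side could look, under stated assumptions: each MeSH descriptor indexed for a document is appended to that document's bag of words as one extra token (the `mesh:` prefix is a hypothetical convention, not taken from the paper), and topic quality is scored with the UMass coherence measure, which is also an assumption since the abstract does not name the coherence metric used. Training the LDA model itself is left to any standard implementation; the function names and toy documents are hypothetical.

```python
import math
from collections import Counter

# Hypothetical prefix that keeps MeSH descriptor tokens distinct from
# ordinary words that happen to share the same spelling.
MESH_PREFIX = "mesh:"


def hybrid_bag(text_tokens, mesh_descriptors):
    """Return a bag of words extended with MeSH descriptor tokens.

    Each descriptor becomes one extra token, so a topic model trained on
    these bags can place words and descriptors in the same topic.
    """
    bag = Counter(token.lower() for token in text_tokens)
    for descriptor in mesh_descriptors:
        bag[MESH_PREFIX + descriptor] += 1
    return bag


def umass_coherence(top_terms, bags):
    """UMass coherence of a topic's top terms (ordered by probability).

    Sums log((D(w_i, w_j) + 1) / D(w_j)) over pairs with j < i, where D
    counts the documents containing the given term(s). Higher (less
    negative) scores indicate terms that co-occur more consistently.
    """
    doc_sets = [set(bag) for bag in bags]

    def doc_freq(*terms):
        return sum(1 for doc in doc_sets if all(t in doc for t in terms))

    score = 0.0
    for i in range(1, len(top_terms)):
        for j in range(i):
            w_i, w_j = top_terms[i], top_terms[j]
            d_j = doc_freq(w_j)
            if d_j == 0:
                continue  # skip terms absent from the corpus instead of dividing by zero
            score += math.log((doc_freq(w_i, w_j) + 1) / d_j)
    return score


# Usage on a toy corpus: a hybrid topic mixing a word and a descriptor.
docs = [
    hybrid_bag(["tumor", "growth", "cells"], ["Neoplasms"]),
    hybrid_bag(["tumor", "therapy"], ["Neoplasms", "Drug Therapy"]),
    hybrid_bag(["insulin", "glucose"], ["Diabetes Mellitus"]),
]
score = umass_coherence(["tumor", "mesh:Neoplasms"], docs)
print(score)  # log((2 + 1) / 2) = log(1.5)
```

          Prefixing descriptors keeps them in the same vocabulary as the words while still letting a reader tell them apart when inspecting a topic's top terms.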


                Author and article information

                Journal
                9214582
                21248
                Studies in Health Technology and Informatics (Stud Health Technol Inform)
                ISSN: 0926-9630
                23 March 2018
                2017
                29 March 2018
                Volume: 245
                Pages: 662-666
                Affiliations
                [a] The University of Texas School of Biomedical Informatics at Houston, Houston, Texas, USA
                [b ]Department of Computer Science, University of Maryland, College Park, Maryland, USA
                [c] U.S. National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
                Author notes
                Address for correspondence: Olivier Bodenreider, MD, PhD, 8600 Rockville Pike, 38A/09S904, Bethesda, MD 20894, Phone: (301) 827-4982, olivier@nlm.nih.gov
                Article
                NIHMS953313
                5875427
                29295179
                1660c4b7-c45a-4257-998b-d33d7be68795

                This article is published online with Open Access by IOS Press and distributed under the terms of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).

                Categories
                Article

                medical subject headings,models,statistical data,data interpretation,statistical
