      Is Open Access

      The Unified Medical Language System SPECIALIST Lexicon and Lexical Tools: Development and applications


          Natural language processing (NLP) plays a vital role in modern medical informatics. It converts narrative text or unstructured data into knowledge by analyzing and extracting concepts. A comprehensive lexical system is the foundation for successful NLP applications and an essential component at the beginning of the NLP pipeline. The SPECIALIST Lexicon and Lexical Tools, distributed by the National Library of Medicine as one of the Unified Medical Language System Knowledge Sources, provide an underlying resource for many NLP applications. This article reports recent developments in 3 key components of the Lexicon. The core NLP operation of Unified Medical Language System concept mapping is used to illustrate the importance of these developments. Our objective is to provide a generic, broad-coverage, and robust lexical system for NLP applications. A novel multiword approach and other planned developments are proposed.

          Most cited references (34)

          A simple algorithm for identifying negated findings and diseases in discharge summaries.

          Narrative reports in medical records contain a wealth of information that may augment structured data for managing patient information and predicting trends in diseases. Pertinent negatives are evident in text but are not usually indexed in structured databases. The objective of the study reported here was to test a simple algorithm for determining whether a finding or disease mentioned within narrative medical reports is present or absent. We developed a simple regular expression algorithm called NegEx that implements several phrases indicating negation, filters out sentences containing phrases that falsely appear to be negation phrases, and limits the scope of the negation phrases. We compared NegEx against a baseline algorithm that has a limited set of negation phrases and a simpler notion of scope. In a test of 1235 findings and diseases in 1000 sentences taken from discharge summaries indexed by physicians, NegEx had a specificity of 94.5% (versus 85.3% for the baseline), a positive predictive value of 84.5% (versus 68.4% for the baseline) while maintaining a reasonable sensitivity of 77.8% (versus 88.3% for the baseline). We conclude that with little implementation effort a simple regular expression algorithm for determining whether a finding or disease is absent can identify a large portion of the pertinent negatives from discharge summaries.
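
The three steps named in this abstract (match negation phrases, filter out phrases that only look like negation, limit the scope of each negation phrase) can be illustrated with a minimal sketch. This is not the published NegEx implementation: the phrase lists are short illustrative samples, and the five-token scope window, the terminator words, and the function names are assumptions made for the example.

```python
import re

# Illustrative samples only; the published NegEx phrase lists are longer.
PRE_NEGATION = ["no", "denies", "without", "no evidence of", "negative for"]
PSEUDO_NEGATION = ["no increase", "not only", "no change"]
# Assumed scope rules: stop after 5 tokens or at a contrast word.
SCOPE_TOKENS = 5
TERMINATORS = {"but", "however", "although"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep only alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def is_negated(sentence: str, finding: str) -> bool:
    """Return True if `finding` falls inside the scope of a negation phrase."""
    text = " ".join(tokenize(sentence))
    target = " ".join(tokenize(finding))
    if not target:
        return False

    # Step 1: mask phrases that falsely appear to be negation phrases.
    # The uppercase placeholder cannot match the lowercase triggers below.
    for phrase in PSEUDO_NEGATION:
        text = re.sub(rf"\b{re.escape(phrase)}\b", "PSEUDO", text)

    # Step 2: find negation triggers, longest first so that
    # "no evidence of" is preferred over the bare "no".
    triggers = sorted(PRE_NEGATION, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, triggers)) + r")\b")

    for match in pattern.finditer(text):
        # Step 3: limit the scope to a short window after the trigger.
        window = []
        for token in text[match.end():].split():
            if token in TERMINATORS or len(window) == SCOPE_TOKENS:
                break
            window.append(token)
        if re.search(rf"\b{re.escape(target)}\b", " ".join(window)):
            return True
    return False
```

For example, under these assumptions `is_negated("The patient denies chest pain.", "chest pain")` returns `True`, while `is_negated("No increase in chest pain.", "chest pain")` returns `False` because "no increase" is masked as a pseudo-negation before triggers are searched.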

            Exploring and developing consumer health vocabularies.

             Qing Zeng,  Tony Tse (2015)
            Laypersons ("consumers") often have difficulty finding, understanding, and acting on health information due to gaps in their domain knowledge. Ideally, consumer health vocabularies (CHVs) would reflect the different ways consumers express and think about health topics, helping to bridge this vocabulary gap. However, despite the recent research on mismatches between consumer and professional language (e.g., lexical, semantic, and explanatory), there have been few systematic efforts to develop and evaluate CHVs. This paper presents the point of view that CHV development is practical and necessary for extending research on informatics-based tools to facilitate consumer health information seeking, retrieval, and understanding. In support of the view, we briefly describe a distributed, bottom-up approach for (1) exploring the relationship between common consumer health expressions and professional concepts and (2) developing an open-access, preliminary (draft) "first-generation" CHV. While recognizing the limitations of the approach (e.g., not addressing psychosocial and cultural factors), we suggest that such exploratory research and development will yield insights into the nature of consumer health expressions and assist developers in creating tools and applications to support consumer health information seeking.

              The Unified Medical Language System: an informatics research collaboration.

              In 1986, the National Library of Medicine (NLM) assembled a large multidisciplinary, multisite team to work on the Unified Medical Language System (UMLS), a collaborative research project aimed at reducing fundamental barriers to the application of computers to medicine. Beyond its tangible products, the UMLS Knowledge Sources, and its influence on the field of informatics, the UMLS project is an interesting case study in collaborative research and development. It illustrates the strengths and challenges of substantive collaboration among widely distributed research groups. Over the past decade, advances in computing and communications have minimized the technical difficulties associated with UMLS collaboration and also facilitated the development, dissemination, and use of the UMLS Knowledge Sources. The spread of the World Wide Web has increased the visibility of the information access problems caused by multiple vocabularies and many information sources which are the focus of UMLS work. The time is propitious for building on UMLS accomplishments and making more progress on the informatics research issues first highlighted by the UMLS project more than 10 years ago.

                Author and article information

                J Am Med Inform Assoc
                Journal of the American Medical Informatics Association : JAMIA
                Oxford University Press
                October 2020
                29 May 2020
                Volume: 27
                Issue: 10
                Pages: 1600-1605
                Lister Hill National Center for Biomedical Communications, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
                Author notes
                Corresponding Author: Chris J. Lu, PhD, Lister Hill National Center for Biomedical Communications, National Library of Medicine, National Institutes of Health, Bldg. 38A, Room 9S-911, 8600 Rockville Pike, Bethesda, MD 20894, USA; chlu@
                © The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email:

                This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact

                Page count
                Pages: 6
                Funded by: Intramural Research Program of the National Library of Medicine, National Institutes of Health
                Case Report

