
      Corpus-based Approach to Creating a Semantic Lexicon for Clinical Research Eligibility Criteria from UMLS

      research-article


          Abstract

          We describe a corpus-based approach to creating a semantic lexicon using UMLS knowledge sources. We extracted 10,000 sentences from the eligibility criteria sections of clinical trial summaries contained in ClinicalTrials.gov. The UMLS Metathesaurus and SPECIALIST Lexical Tools were used to extract and normalize UMLS-recognizable terms. When annotated with Semantic Network types, the corpus had a lexical ambiguity of 1.57 (= total types for unique lexemes / total unique lexemes) and a word occurrence ambiguity of 1.96 (= total type occurrences / total word occurrences). A set of semantic preference rules was developed and applied to completely eliminate ambiguity in semantic type assignment. The lexicon covered 95.95% of the UMLS-recognizable terms in our corpus. A total of 20 UMLS semantic types, representing about 17% of all the distinct semantic types assigned to corpus lexemes, covered about 80% of the vocabulary of our corpus.
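          The two ambiguity ratios defined in the abstract can be illustrated with a small Python sketch. The toy lexicon, the token list, and the function names are hypothetical and are not taken from the paper; the semantic type assignments are only illustrative.

```python
# Hypothetical toy lexicon: lexeme -> set of candidate UMLS semantic types.
# The assignments are illustrative, not the paper's data.
LEXICON = {
    "cold": {"Disease or Syndrome", "Natural Phenomenon or Process"},
    "aspirin": {"Pharmacologic Substance", "Organic Chemical"},
    "pregnancy": {"Organism Function"},
}

# Hypothetical sequence of lexeme occurrences in a corpus.
TOKENS = ["cold", "aspirin", "cold", "pregnancy", "cold"]


def lexical_ambiguity(lexicon):
    """Total types over unique lexemes divided by the number of unique lexemes."""
    total_types = sum(len(types) for types in lexicon.values())
    return total_types / len(lexicon)


def occurrence_ambiguity(tokens, lexicon):
    """Total type occurrences divided by total word occurrences.

    Each occurrence of a lexeme contributes as many type occurrences
    as the lexeme has candidate types. Tokens missing from the lexicon
    are ignored (an assumption of this sketch).
    """
    known = [tok for tok in tokens if tok in lexicon]
    total_type_occurrences = sum(len(lexicon[tok]) for tok in known)
    return total_type_occurrences / len(known)


print(round(lexical_ambiguity(LEXICON), 2))
print(round(occurrence_ambiguity(TOKENS, LEXICON), 2))
```

          In this toy data the occurrence-level ratio is higher than the lexeme-level ratio because the ambiguous lexeme "cold" is also the most frequent one, which is the same direction as the 1.57 versus 1.96 figures reported above.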


          Most cited references (3)


          The SAGE Guideline Model: achievements and overview.

          The SAGE (Standards-Based Active Guideline Environment) project was formed to create a methodology and infrastructure required to demonstrate integration of decision-support technology for guideline-based care in commercial clinical information systems. This paper describes the development and innovative features of the SAGE Guideline Model and reports our experience encoding four guidelines. Innovations include methods for integrating guideline-based decision support with clinical workflow and employment of enterprise order sets. Using SAGE, a clinician informatician can encode computable guideline content as recommendation sets using only standard terminologies and standards-based patient information models. The SAGE Model supports encoding large portions of guideline knowledge as re-usable declarative evidence statements and supports querying external knowledge sources.

            Better access to information about clinical trials.

            Access to information about clinical trials is important to researchers, health care professionals, and patients. Many have argued for the establishment of clinical trials registries, citing their substantial benefits. Although some registries do exist, it has been difficult to create comprehensive, easily accessible systems. This paper briefly reviews existing registries, discusses the challenges in building registries, and reviews some of their benefits. The paper concludes with a description of a new, extensive Web-based registry called ClinicalTrials.gov (http://clinicaltrials.gov/), which was developed at the National Institutes of Health (NIH) by the National Library of Medicine as a result of recent legislation calling for a comprehensive, publicly accessible registry of clinical trials. The first version of the system became available in late February 2000 and contains information about approximately 5000 trials. The first release contains primarily NIH-sponsored trials, and new trials are regularly added to the system. Subsequent versions will contain information about trials sponsored by other federal agencies and by the private sector. The system was developed in accordance with basic informatics principles, including adherence to standards, usability considerations, and iterative testing and evaluation.

              A semantic lexicon for medical language processing.

              Construction of a resource that provides semantic information about words and phrases to facilitate the computer processing of medical narrative. Lexemes (words and word phrases) in the Specialist Lexicon were matched against strings in the 1997 Metathesaurus of the Unified Medical Language System (UMLS) developed by the National Library of Medicine. This yielded a "semantic lexicon," in which each lexeme is associated with one or more syntactic types, each of which can have one or more semantic types. The semantic lexicon was then used to assign semantic types to lexemes occurring in a corpus of discharge summaries (603,306 sentences). Lexical items with multiple semantic types were examined to determine whether some of the types could be eliminated, on the basis of usage in discharge summaries. A concordance program was used to find contrasting contexts for each lexeme that would reflect different semantic senses. Based on this evidence, semantic preference rules were developed to reduce the number of lexemes with multiple semantic types. Matching the Specialist Lexicon against the Metathesaurus produced a semantic lexicon with 75,711 lexical forms, 22,805 (30.1 percent) of which had two or more semantic types. Matching the Specialist Lexicon against one year's worth of discharge summaries identified 27,633 distinct lexical forms, 13,322 of which had at least one semantic type. This suggests that the Specialist Lexicon has about 79 percent coverage for syntactic information and 38 percent coverage for semantic information for discharge summaries. Of those lexemes in the corpus that had semantic types, 3,474 (12.6 percent) had two or more types. When semantic preference rules were applied to the semantic lexicon, the number of entries with multiple semantic types was reduced to 423 (1.5 percent). In the discharge summaries, occurrences of lexemes with multiple semantic types were reduced from 9.41 to 1.46 percent. 
Automatic methods can be used to construct a semantic lexicon from existing UMLS sources. This semantic information can aid natural language processing programs that analyze medical narrative, provided that lexemes with multiple semantic types are kept to a minimum. Semantic preference rules can be used to select semantic types that are appropriate to clinical reports. Further work is needed to increase the coverage of the semantic lexicon and to exploit contextual information when selecting semantic senses.
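              A semantic preference rule of the kind described above can be sketched as a pair of types in which the first is kept and the second is dropped whenever both are candidates for the same lexeme. This is a minimal illustration only: the rule list, the example types, and the function name are hypothetical, and the rules in the paper were derived from concordance evidence and may take a different form.

```python
# Hypothetical preference rules: (preferred type, dispreferred type).
# When a lexeme carries both, the dispreferred type is eliminated.
PREFERENCE_RULES = [
    ("Disease or Syndrome", "Natural Phenomenon or Process"),
    ("Pharmacologic Substance", "Organic Chemical"),
]


def apply_preference_rules(types, rules):
    """Return the candidate types that survive after applying each rule in order."""
    remaining = set(types)
    for preferred, dispreferred in rules:
        if preferred in remaining and dispreferred in remaining:
            remaining.discard(dispreferred)
    return remaining


candidates = {"Disease or Syndrome", "Natural Phenomenon or Process"}
print(apply_preference_rules(candidates, PREFERENCE_RULES))
```

              Lexemes whose candidate types match no rule keep all of their types, which is consistent with the abstract's report that a small share of entries remained ambiguous after the rules were applied.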

                Author and article information

                Journal
                AMIA Summits Transl Sci Proc
                AMIA Summits on Translational Science Proceedings
                American Medical Informatics Association
                2153-4063
                2010
                1 March 2010
                Volume: 2010
                Pages: 26-30
                Affiliations
                Department of Biomedical Informatics, Columbia University
                Article
                amia-s2010_cri_026
                3041551
                21347142
                660669b5-d49f-4f4f-a297-a5f0fc4d58b1
                ©2010 AMIA - All rights reserved.

                This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose

                Categories
                Articles

                Medicine