      Automated concept and relationship extraction for the semi-automated ontology management (SEAM) system

      research-article

          Abstract

          Background

          We develop medical-specialty-specific ontologies that contain the settled science and common term usage. We leverage current practices in information and relationship extraction to streamline the ontology development process. Our system combines different text types with information and relationship extraction techniques in a low-overhead, modifiable system. Our SEmi-Automated ontology Maintenance (SEAM) system features a natural language processing pipeline for information extraction. Synonym and hierarchical groups are identified using corpus-based semantics and lexico-syntactic patterns. The semantic vectors we use are term frequency by inverse document frequency (tf-idf) vectors and context vectors.
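          The term frequency by inverse document frequency weighting mentioned above can be illustrated with a minimal sketch. It is not SEAM's implementation: the function names `tf_idf` and `cosine`, the length-normalized term frequency, the unsmoothed logarithmic inverse document frequency, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. The weighting used here is an assumed,
    common variant: (count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy, made-up documents (already tokenized).
docs = [
    ["delirium", "confusion", "delirium"],
    ["confusion", "echocardiogram"],
]
vectors = tf_idf(docs)
```

          In this sketch a term that occurs in every document, such as "confusion", receives a weight of zero, while a term concentrated in one document is weighted up. Comparing vectors with `cosine` is one common way to score similarity; whether SEAM uses cosine similarity is not stated in the abstract.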

          Clinical documents contain the terms we want in an ontology. They also contain idiosyncratic usage and are unlikely to contain the linguistic constructs associated with synonym and hierarchy identification. By including both clinical and biomedical texts, SEAM can recommend terms from those appearing in both document types. The set of recommended terms is then used to filter the synonyms and hierarchical relationships extracted from the biomedical corpus.
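          The filtering step described above can be sketched as a set intersection followed by a filter over extracted pairs. This is a hedged illustration, not SEAM's code: `recommend_terms` and `filter_relations` are hypothetical names, the example terms are invented, and requiring both members of a pair to be recommended terms is an assumption (the abstract does not say whether one or both must match).

```python
def recommend_terms(clinical_terms, biomedical_terms):
    """Recommend only the terms that appear in both document types."""
    return set(clinical_terms) & set(biomedical_terms)


def filter_relations(relations, recommended):
    """Keep (term_a, term_b) pairs whose members are both recommended.

    Assumption: both members must be recommended terms; a looser rule
    (either member) would be an equally plausible reading of the abstract.
    """
    return [
        (a, b) for a, b in relations
        if a in recommended and b in recommended
    ]


# Invented example data.
clinical = ["delirium", "confusion", "pt c/o"]
biomedical = ["delirium", "confusion", "encephalopathy"]
# Candidate synonym pairs extracted from the biomedical corpus.
synonym_candidates = [
    ("delirium", "confusion"),
    ("delirium", "encephalopathy"),
]

recommended = recommend_terms(clinical, biomedical)
kept = filter_relations(synonym_candidates, recommended)
```

          Here the idiosyncratic clinical string "pt c/o" is dropped because it never appears in the biomedical text, and the pair involving "encephalopathy" is dropped because that term never appears in the clinical text.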

          We demonstrate the generality of the system across three use cases: ontologies for acute changes in mental status, Medically Unexplained Syndromes, and echocardiogram summary statements.

          Results

          Across the three use cases, we held the number of recommended terms relatively constant by changing SEAM's parameters. Experts seem to find more than 300 recommended terms to be overwhelming. The approval rate of recommended terms increased as the number and specificity of clinical documents in the corpus increased. The approval rate was 60% when there were 199 clinical documents that were not specific to the ontology domain and 90% when there were 2879 documents very specific to the target domain.

          We found that experts also preferred fewer than 100 recommended synonym groups. Approval rates for synonym recommendations remained low, varying from 43% to 25% as the number of journal articles increased from 19 to 47. Overall, the number of recommended hierarchical relationships was very low, although approval was good, varying between 67% and 31%.

          Conclusion

          SEAM produced a concise list of recommended clinical terms, synonyms and hierarchical relationships regardless of medical domain.

                Author and article information

                Contributors
                kristina.doing-harris@utah.edu
                yarden@sci.utah.edu
                stephane.meystre@hsc.utah.edu
                Journal
                J Biomed Semantics
                Journal of Biomedical Semantics
                BioMed Central (London)
                2041-1480
                2 April 2015
                2015; 6: 15
                Affiliations
                University of Utah, Department of Biomedical Informatics, 421 Wakara Way, Suite 140, Salt Lake City, UT 84112 USA
                Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, UT USA
                Article
                11
                10.1186/s13326-015-0011-7
                4396714
                25874077
                © Doing-Harris et al.; licensee BioMed Central. 2015

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

                History
                : 1 September 2014
                : 4 March 2015
                Categories
                Software
                Custom metadata
                © The Author(s) 2015

                Bioinformatics & Computational biology
                ontology, natural language processing, terminology extraction