      Acquisition of a Lexicon for Family History Information: Bidirectional Encoder Representations From Transformers–Assisted Sublanguage Analysis

Research article


          Abstract

          Background

          A patient’s family history (FH) information significantly influences downstream clinical care. Despite this importance, there is no standardized method for capturing FH information in electronic health records, and a substantial portion of FH information remains embedded in clinical notes. This makes FH information difficult to use in downstream data analytics or clinical decision support applications. A natural language processing system capable of extracting and normalizing FH information can address this issue.

          Objective

          In this study, we aimed to construct an FH lexical resource for information extraction and normalization.

          Methods

          We exploited a transformer-based method to construct an FH lexical resource leveraging a corpus consisting of clinical notes generated as part of primary care. The usability of the lexicon was demonstrated through the development of a rule-based FH system that extracts FH entities and relations as specified in previous FH challenges. We also experimented with a deep learning–based FH system for FH information extraction. Previous FH challenge data sets were used for evaluation.
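          As a rough illustration of the transformer-assisted idea (a sketch under stated assumptions, not the authors' actual pipeline), a masked language model can be probed with family-history-style contexts so that its high-probability fillers suggest candidate lexicon variants for human review. The model name and prompt templates below are placeholders; a domain-adapted clinical model and corpus-derived contexts would normally be used.

from transformers import pipeline

# Sketch only: probe a generic BERT model for candidate family-history terms.
# "bert-base-uncased" and the templates are illustrative assumptions.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

templates = [
    "The patient's [MASK] has a history of diabetes.",
    "Family history is significant for [MASK] cancer.",
]

for template in templates:
    print(template)
    for candidate in fill_mask(template, top_k=5):
        # Each prediction carries the filled token and its probability;
        # frequent, high-scoring fillers become candidate lexicon entries
        # that are then reviewed manually.
        print(f"  {candidate['token_str']:<15} {candidate['score']:.3f}")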

          Results

          The resulting lexicon contains 33,603 lexicon entries normalized to 6408 concept unique identifiers of the Unified Medical Language System and 15,126 codes of the Systematized Nomenclature of Medicine Clinical Terms, with an average of 5.4 variants per concept. The evaluation demonstrated that the rule-based FH system achieved reasonable performance. Combining the rule-based FH system with a state-of-the-art deep learning–based FH system can improve the recall of FH information, as evaluated on the BioCreative/N2C2 FH challenge data set, with F1 scores that varied but remained comparable.
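          To make the reported normalization concrete, the following is a minimal sketch of how lexicon entries mapping surface variants to UMLS concept unique identifiers and SNOMED CT codes might be stored and looked up. The schema and the example codes are illustrative assumptions, not the released lexicon's actual format or content.

from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class LexiconEntry:
    surface: str   # variant as it appears in clinical text
    cui: str       # UMLS concept unique identifier
    snomed: str    # SNOMED CT code

# Illustrative entries only; the codes shown are examples, not authoritative.
LEXICON = {
    entry.surface: entry
    for entry in [
        LexiconEntry("mother", "C0026591", "72705000"),
        LexiconEntry("mom", "C0026591", "72705000"),
        LexiconEntry("maternal grandmother", "C0337474", "394859001"),
    ]
}

def normalize(mention: str) -> Optional[LexiconEntry]:
    """Return the normalized entry for a family member mention, if known."""
    return LEXICON.get(mention.lower().strip())

print(normalize("Mom"))  # maps the variant "mom" to the same concept as "mother"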

          Conclusions

          The resulting lexicon and rule-based FH system are freely available through the Open Health Natural Language Processing GitHub.


                Author and article information

                Contributors
                Journal
                JMIR Med Inform
                JMI
                JMIR Medical Informatics
                JMIR Publications (Toronto, Canada)
                2291-9694
                2023
                27 June 2023
                Volume: 11
                Article: e48072
                Affiliations
                [1] Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
                [2] Center for Digital Health, Mayo Clinic, Rochester, MN, United States
                [3] Department of Computer Science, University of Kentucky, Lexington, KY, United States
                [4] Division of Biomedical Informatics, Department of Internal Medicine, University of Kentucky, Lexington, KY, United States
                Author notes
                Corresponding Author: Hongfang Liu liu.hongfang@mayo.edu
                Author information
                https://orcid.org/0000-0001-9970-8604
                https://orcid.org/0000-0003-1312-4195
                https://orcid.org/0000-0001-9090-8028
                https://orcid.org/0000-0002-9191-3897
                https://orcid.org/0000-0003-1691-5179
                https://orcid.org/0000-0002-9758-4609
                https://orcid.org/0009-0000-2432-8316
                https://orcid.org/0000-0001-9763-1164
                https://orcid.org/0000-0003-1238-9378
                https://orcid.org/0000-0003-2570-3741
                Article
                Article ID: v11i1e48072
                DOI: 10.2196/48072
                PMCID: PMC10337517
                PMID: 37368483
                ©Liwei Wang, Huan He, Andrew Wen, Sungrim Moon, Sunyang Fu, Kevin J Peterson, Xuguang Ai, Sijia Liu, Ramakanth Kavuluru, Hongfang Liu. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 27.06.2023.

                This is an open-access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.

                History
                13 April 2023
                12 May 2023
                25 May 2023
                1 June 2023
                Categories
                Original Paper

                Keywords
                electronic health record, natural language processing, family history, sublanguage analysis, rule-based system, deep learning
