      Medical Specialty Classification Based on Semiadversarial Data Augmentation

          Abstract

          Rapidly increasing adoption of electronic health record (EHR) systems has made automated medical specialty classification an important research field. Medical specialty classification not only improves the retrieval efficiency of EHR systems and helps general practitioners identify urgent patient issues but is also useful for studying the practice and validity of clinical referral patterns. However, currently available medical note data are imbalanced and insufficient. In addition, medical specialty classification is a multicategory problem, and it is difficult to remove sensitive information from large numbers of medical notes and label them. To address these problems, we propose a data augmentation method based on adversarial attacks. The semiadversarial examples generated during the dynamic process of an adversarial attack are added to the training set as augmented examples, which effectively expands the coverage of the training data over the decision space. Moreover, because nouns in medical notes carry critical information, we design a classification framework that incorporates probabilistic information about nouns, with confidence recalculation after the softmax layer. We validate the proposed method on an 18-class dataset with extremely imbalanced data; comparison experiments with four benchmarks show that our method achieves the best accuracy and F1 score, improving them by an average of 14.9%.
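
The augmentation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the attack, the model, or the text representation. The sketch assumes a toy softmax-regression classifier over continuous feature vectors, an iterative gradient-sign attack, and the reading that "semiadversarial" examples are the intermediate perturbed inputs that the classifier still labels correctly. The names `semiadversarial_examples`, `step`, and `max_steps` are hypothetical.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def semiadversarial_examples(x, y, W, b, step=0.05, max_steps=20):
    """Collect intermediate points of an iterative gradient-sign attack.

    x: (d,) feature vector, y: true class index,
    W: (d, k) weights and b: (k,) bias of a softmax-regression model.
    Returns the perturbed vectors that are still classified as y
    (an assumed interpretation of "semiadversarial").
    """
    collected = []
    x_adv = x.copy()
    onehot = np.eye(W.shape[1])[y]
    for _ in range(max_steps):
        p = softmax(x_adv @ W + b)
        # Gradient of the cross-entropy loss with respect to the input.
        grad = W @ (p - onehot)
        # Move a small step in the direction that increases the loss.
        x_adv = x_adv + step * np.sign(grad)
        if np.argmax(x_adv @ W + b) != y:
            break  # the attack succeeded; stop before keeping this point
        collected.append(x_adv.copy())
    return collected


# Toy usage with random weights (illustrative data only).
rng = np.random.default_rng(0)
W = rng.normal(size=(5, 3))
b = np.zeros(3)
x = rng.normal(size=5)
y = int(np.argmax(x @ W + b))  # use the model's own prediction as the label
augmented = semiadversarial_examples(x, y, W, b)
```

Each vector in `augmented` keeps the original label `y` and could be appended to the training set, which moves training points toward the decision boundary without crossing it. Real medical notes are discrete text, so a practical attack would perturb tokens or embeddings rather than raw feature vectors.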

                Author and article information

                Contributors
                Journal
                Computational Intelligence and Neuroscience (Comput Intell Neurosci)
                Publisher: Hindawi
                ISSN: 1687-5265, 1687-5273
                Published: 17 October 2023
                Volume: 2023
                Article ID: 4919371
                Affiliations
                1 Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, China
                2 Department of New Networks, Peng Cheng Laboratory, Shenzhen, China
                3 School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, China
                Author notes

                Academic Editor: Carmen De Maio

                Author information
                https://orcid.org/0000-0003-1909-9373
                https://orcid.org/0000-0001-7546-852X
                Article
                DOI: 10.1155/2023/4919371
                PMCID: PMC10597728
                PMID: 37881209
                Copyright © 2023 Huan Zhang et al.

                This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

                History
                14 September 2022
                7 November 2022
                17 November 2022
                Funding
                Funded by: National Natural Science Foundation of China
                Award ID: 62250410365
                Award ID: 61902082
                Funded by: Major Key Project of PCL
                Award ID: PCL2022A03
                Funded by: Guangzhou Science and Technology Program key projects
                Award ID: 202102010507
                Funded by: Guangdong Higher Education Innovation Group
                Award ID: 2020KCXTD007
                Funded by: Guangzhou Higher Education Innovation Group
                Award ID: 202032854
                Categories
                Research Article

                Neurosciences
