
      Natural language processing and machine learning algorithm to identify brain MRI reports with acute ischemic stroke

      research-article


          Abstract

          Background and purpose

          This project assessed the performance of natural language processing (NLP) and machine learning (ML) algorithms in classifying brain MRI radiology reports into acute ischemic stroke (AIS) and non-AIS phenotypes.

          Materials and methods

          All brain MRI reports from a single academic institution over a two-year period were randomly divided into two groups for ML: training (70%) and testing (30%). Using the R package “quanteda”, all report text was parsed into tokens to create the data frequency matrix. Ten-fold cross-validation was applied to the training set for bias correction. AIS labels were assigned manually by reviewing clinical notes. We applied four binary classifiers (binary logistic regression, naïve Bayesian classification, a single decision tree, and a support vector machine) and assessed their performance by the F1-measure. We also assessed how adding n-grams or applying term frequency-inverse document frequency weighting affected the performance of the algorithms.
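          The feature-construction and resampling steps above can be sketched in Python. This is a minimal, standard-library sketch under stated assumptions, not the authors' code: the study used the R package quanteda, and every helper name here (tokenize, ngrams, build_dfm, tfidf, train_test_split, k_folds) is hypothetical. The tokenizer is a simplified lowercase split on non-alphanumeric characters, bigrams are joined with an underscore, TF-IDF uses one common scheme (raw count × log10(N/df)), and the report snippets are invented for illustration.

```python
import math
import random
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split on runs of non-alphanumeric characters (a simplified tokenizer)."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def ngrams(tokens, n_max=1):
    """Return unigrams, plus adjacent-token bigrams joined by '_' when n_max >= 2."""
    features = list(tokens)
    if n_max >= 2:
        features += [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return features


def build_dfm(texts, n_max=1):
    """Build a document-feature count matrix: one row per document, one column per feature."""
    counts = [Counter(ngrams(tokenize(t), n_max)) for t in texts]
    vocab = sorted(set().union(*counts))
    matrix = [[c[f] for f in vocab] for c in counts]
    return vocab, matrix


def tfidf(matrix):
    """Weight each raw count by log10(N / df), where df is the number of documents containing the feature."""
    n_docs = len(matrix)
    n_feats = len(matrix[0]) if matrix else 0
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(n_feats)]
    return [
        [row[j] * math.log10(n_docs / df[j]) if df[j] else 0.0 for j in range(n_feats)]
        for row in matrix
    ]


def train_test_split(items, train_frac=0.7, seed=42):
    """Shuffle a copy of items and split it into training (default 70%) and test portions."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def k_folds(n, k=10):
    """Yield (train_idx, val_idx) index lists for k-fold cross-validation over n items.

    Folds are assigned by index modulo k, which assumes the items were already shuffled
    (for example by train_test_split above).
    """
    for i in range(k):
        val_idx = [j for j in range(n) if j % k == i]
        train_idx = [j for j in range(n) if j % k != i]
        yield train_idx, val_idx


# Toy usage on invented report snippets (not study data).
reports = [
    "Acute infarct in the left MCA territory with restricted diffusion.",
    "No acute intracranial abnormality.",
    "Restricted diffusion in the right pons, consistent with acute infarct.",
]
vocab, dfm = build_dfm(reports, n_max=2)  # unigrams + bigrams, e.g. "acute_infarct"
weighted = tfidf(dfm)                     # TF-IDF-weighted version of the same matrix
train_docs, test_docs = train_test_split(reports)
```

          Keeping the raw count matrix and the TF-IDF matrix as separate objects makes it straightforward to fit the same classifier on both and compare, which is the kind of comparison the study reports.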

          Results

          Of all 3,204 brain MRI reports, 432 (14.3%) were labeled as AIS. AIS reports were longer than non-AIS reports (median [interquartile range] character length, 551 [377–681] vs. 309 [164–396]). Among all ML algorithms, the single decision tree had the highest F1-measure (93.2) and accuracy (98.0%). Adding bigrams to the model improved the F1-measure of naïve Bayesian classification but not of the other classifiers, and applying term frequency-inverse document frequency weighting to the data frequency matrix did not yield any additional performance improvement.
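          Because AIS reports are a minority of the corpus, accuracy alone can look high even for a classifier that misses many positive reports, which is one reason to report the F1-measure (the harmonic mean of precision and recall). The hypothetical helper below is a standard-library sketch, not the authors' evaluation code; it computes both metrics on a 0 to 100 scale, matching how the results above are expressed, and the toy labels are made up.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, and true negatives."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1  # predicted AIS, actually non-AIS
        elif t == positive:
            fn += 1  # missed an AIS report
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_and_accuracy(y_true, y_pred, positive=1):
    """Return (F1, accuracy), both scaled to 0-100; undefined ratios default to 0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true) if y_true else 0.0
    return 100 * f1, 100 * accuracy


# Toy example: a classifier that always predicts "non-AIS" still reaches
# 75% accuracy on this imbalanced sample, but its F1 is 0.
print(f1_and_accuracy([1, 0, 0, 0], [0, 0, 0, 0]))  # (0.0, 75.0)
```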

          Conclusions

          Supervised ML-based NLP algorithms are useful for automatically classifying brain MRI reports to identify patients with AIS. The single decision tree was the best classifier for identifying brain MRI reports with AIS.


                Author and article information

                Contributors
                Roles: Formal analysis, Investigation, Methodology, Validation, Writing – original draft
                Roles: Data curation, Methodology, Project administration, Validation, Writing – review & editing
                Roles: Conceptualization, Data curation, Methodology, Project administration, Supervision, Writing – review & editing
                Roles: Conceptualization, Funding acquisition, Methodology, Project administration, Supervision, Writing – review & editing
                Role: Editor
                Journal
                PLoS ONE
                Public Library of Science (San Francisco, CA USA)
                ISSN: 1932-6203
                Published: 28 February 2019
                Volume 14, Issue 2: e0212778
                Affiliations
                [1 ] Department of Neurology, Hallym University College of Medicine, Chuncheon, Korea
                [2 ] Medical University of South Carolina, Charleston, South Carolina, United States of America
                [3 ] Biomedical Informatics Center, Medical University of South Carolina, Charleston, South Carolina, United States of America
                [4 ] Department of Internal Medicine, Medical University of South Carolina, Charleston, South Carolina, United States of America
                University College London, UNITED STATES
                Author notes

                Competing Interests: Drs. Kim, Zhu and Obeid have no competing interests. Dr. Lenert is a member of the Board of Directors of the ATCC. This does not alter the authors' adherence to PLOS ONE policies on sharing data and materials.

                Author information
                ORCID: http://orcid.org/0000-0001-8762-8340
                Article
                Manuscript number: PONE-D-18-24904
                DOI: 10.1371/journal.pone.0212778
                PMCID: PMC6394972
                PMID: 30818342
                © 2019 Kim et al.

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 23 August 2018
                Accepted: 8 February 2019
                Page count
                Figures: 4, Tables: 2, Pages: 13
                Funding
                Funded by: National Center for Advancing Translational Sciences (funder ID: http://dx.doi.org/10.13039/100006108); Award ID: UL1 TR001450
                Funded by: SmartState Program in South Carolina
                This study was supported by the NIH National Center for Advancing Translational Sciences (NCATS) through Grant Number UL1 TR001450 and the SmartState Program in South Carolina.
                Categories
                Research Article
                Medicine and Health Sciences > Diagnostic Medicine > Diagnostic Radiology > Magnetic Resonance Imaging
                Research and Analysis Methods > Imaging Techniques > Diagnostic Radiology > Magnetic Resonance Imaging
                Medicine and Health Sciences > Radiology and Imaging > Diagnostic Radiology > Magnetic Resonance Imaging
                Computer and Information Sciences > Information Technology > Natural Language Processing
                Physical Sciences > Mathematics > Applied Mathematics > Algorithms
                Research and Analysis Methods > Simulation and Modeling > Algorithms
                Physical Sciences > Mathematics > Applied Mathematics > Algorithms > Machine Learning Algorithms
                Research and Analysis Methods > Simulation and Modeling > Algorithms > Machine Learning Algorithms
                Computer and Information Sciences > Artificial Intelligence > Machine Learning > Machine Learning Algorithms
                Medicine and Health Sciences > Neurology > Cerebrovascular Diseases > Stroke > Ischemic Stroke
                Medicine and Health Sciences > Vascular Medicine > Stroke > Ischemic Stroke
                Engineering and Technology > Management Engineering > Decision Analysis > Decision Trees
                Research and Analysis Methods > Decision Analysis > Decision Trees
                Medicine and Health Sciences > Neurology > Cerebrovascular Diseases > Stroke
                Medicine and Health Sciences > Vascular Medicine > Stroke
                Computer and Information Sciences > Artificial Intelligence > Machine Learning > Support Vector Machines
                Data availability
                All relevant data are within the manuscript and its Supporting Information files.
