
      Using text mining techniques to extract prostate cancer predictive information (Gleason score) from semi-structured narrative laboratory reports in the Gauteng province, South Africa

      research-article


          Abstract

          Background

          Prostate cancer (PCa) is the leading male neoplasm in South Africa with an age-standardised incidence rate of 68.0 per 100,000 population in 2018. The Gleason score (GS) is the strongest predictive factor for PCa treatment and is embedded within semi-structured prostate biopsy narrative reports. The manual extraction of the GS is labour-intensive. The objective of our study was to explore the use of text mining techniques to automate the extraction of the GS from irregularly reported text-intensive patient reports.

          Methods

          We used the associated Systematized Nomenclature of Medicine clinical terms morphology and topography codes to identify prostate biopsies with a PCa diagnosis for men aged > 30 years between 2006 and 2016 in the Gauteng Province, South Africa. We developed a text mining algorithm to extract the GS from 1000 biopsy reports with a PCa diagnosis from the National Health Laboratory Service database and validated the algorithm using 1000 biopsies from the private sector. The logical steps for the algorithm were data acquisition, pre-processing, feature extraction, feature value representation, feature selection, information extraction, classification, and discovered knowledge. We evaluated the algorithm using precision, recall and F-score. The GS was manually coded by two experts for both datasets. The top five GS were reported, with the remaining scores categorised as “Other” for both datasets. The percentage of biopsies with a high-risk GS (≥ 8) was also reported.
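The information-extraction and evaluation steps described above can be illustrated with a minimal sketch. It is not the authors' algorithm, which the abstract does not specify in detail: the regular expression, the function names `extract_gleason` and `precision_recall_f`, and the sample report strings are all assumptions made for illustration. The sketch looks for a "primary + secondary" pattern near the word "Gleason" and scores the extractions against manually coded values.

```python
import re
from typing import List, Optional, Tuple

Score = Tuple[int, int, int]  # (primary, secondary, total)

# Hypothetical pattern: "Gleason", up to 40 non-digit characters, then
# "a + b" with an optional "= total". Real reports may need more variants.
GLEASON_RE = re.compile(
    r"gleason[^0-9]{0,40}?([1-5])\s*\+\s*([1-5])(?:\s*=\s*\d{1,2})?",
    re.IGNORECASE,
)


def extract_gleason(report: str) -> Optional[Score]:
    """Return the first Gleason pattern found in the report, or None."""
    match = GLEASON_RE.search(report)
    if match is None:
        return None
    primary, secondary = int(match.group(1)), int(match.group(2))
    # The total is recomputed rather than trusted from the report text.
    return primary, secondary, primary + secondary


def precision_recall_f(
    predicted: List[Optional[Score]], gold: List[Optional[Score]]
) -> Tuple[float, float, float]:
    """Precision, recall and F-score of extractions against manual coding."""
    true_pos = sum(1 for p, g in zip(predicted, gold) if p is not None and p == g)
    n_predicted = sum(1 for p in predicted if p is not None)
    n_gold = sum(1 for g in gold if g is not None)
    precision = true_pos / n_predicted if n_predicted else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reports = [
        "Prostatic adenocarcinoma. Gleason score 3 + 4 = 7.",
        "GLEASON GRADE: 4+4=8",
        "Benign prostatic tissue.",
    ]
    manual = [(3, 4, 7), (4, 4, 8), None]
    extracted = [extract_gleason(r) for r in reports]
    print(precision_recall_f(extracted, manual))
```

Here precision is the share of extracted scores that match the manual coding, and recall is the share of manually coded scores that were recovered; other definitions are possible and the source does not state which one was used.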

          Results

          For the NHLS dataset, the first output of the algorithm had an F-score of 0.99, which improved to 1.00 after the algorithm was amended to ignore any GS reported in the clinical history. For the validation dataset, the F-score was 0.99. In the NHLS dataset, the most commonly reported GS were: (i) 5 + 4 = 9 (17.6%), (ii) 3 + 3 = 6 (17.5%), (iii) 4 + 3 = 7 (16.4%), (iv) 3 + 4 = 7 (14.7%) and (v) 4 + 4 = 8 (14.2%). In the validation dataset, the most commonly reported GS were: (i) 3 + 3 = 6 (37.7%), (ii) 3 + 4 = 7 (19.4%), (iii) 4 + 3 = 7 (14.9%), (iv) 4 + 4 = 8 (10.0%) and (v) 4 + 5 = 9 (7.4%). A high-risk GS was reported for 31.8% of biopsies in the NHLS dataset compared to 17.4% in the validation dataset.
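The amendment and the summary figures above can be sketched as follows. This is an illustrative sketch only: the section-header layout (upper-case labels ending in a colon), the header name "CLINICAL HISTORY", and the function names `drop_section`, `summarise_top` and `high_risk_share` are assumptions, not details taken from the study.

```python
import re
from collections import Counter
from typing import Dict, List

# Assumed layout: a section starts with an upper-case label and a colon.
SECTION_RE = re.compile(r"^(?P<name>[A-Z][A-Z ]+):\s*", re.MULTILINE)


def drop_section(report: str, name: str = "CLINICAL HISTORY") -> str:
    """Remove the named section, up to the next header or the end of text."""
    headers = list(SECTION_RE.finditer(report))
    kept, pos = [], 0
    for i, header in enumerate(headers):
        if header.group("name").strip() == name:
            end = headers[i + 1].start() if i + 1 < len(headers) else len(report)
            kept.append(report[pos:header.start()])
            pos = end
    kept.append(report[pos:])
    return "".join(kept)


def summarise_top(labels: List[str], n: int = 5) -> Dict[str, float]:
    """Percentage for the n most common labels; the rest pooled as 'Other'."""
    if not labels:
        return {}
    counts = Counter(labels)
    top = counts.most_common(n)
    summary = {label: 100.0 * c / len(labels) for label, c in top}
    other = len(labels) - sum(c for _, c in top)
    if other:
        summary["Other"] = 100.0 * other / len(labels)
    return summary


def high_risk_share(totals: List[int], threshold: int = 8) -> float:
    """Percentage of biopsies whose total Gleason score is >= threshold."""
    if not totals:
        return 0.0
    return 100.0 * sum(1 for t in totals if t >= threshold) / len(totals)


if __name__ == "__main__":
    text = (
        "CLINICAL HISTORY: Previous biopsy Gleason 3+3=6.\n"
        "DIAGNOSIS: Adenocarcinoma, Gleason 4+5=9.\n"
    )
    print(drop_section(text))
    print(high_risk_share([6, 7, 8, 9]))
```

Dropping the history section before extraction means a previously reported score cannot be mistaken for the current diagnosis, which is one plausible reading of the amendment described above.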

          Conclusions

          We demonstrated reliable extraction of information about GS from narrative text-based patient reports using an in-house developed text mining algorithm. A secondary outcome was that late presentation could be assessed.


                Author and article information

                Contributors
                naseem.cassim@wits.ac.za
                michael.mapundu@wits.ac.za
                victoro@nicd.ac.za
                turgay.celik@wits.ac.za
                jaya.george@wits.ac.za
                debbie.glencross@nhls.ac.za
                Journal
                BMC Med Inform Decis Mak
                BMC Medical Informatics and Decision Making
                BioMed Central (London)
                1472-6947
                25 November 2021
                2021
                Volume: 21
                Article number: 330
                Affiliations
                [1] GRID grid.416657.7, ISNI 0000 0004 0630 4574, Department of Molecular Medicine and Haematology, Faculty of Health Sciences, University of Witwatersrand and National Health Laboratory Service (NHLS), 7 York Road, Parktown, Johannesburg, South Africa
                [2] GRID grid.11951.3d, ISNI 0000 0004 1937 1135, School of Public Health, Faculty of Health Sciences, University of Witwatersrand, 7 York Road, Parktown, Johannesburg, South Africa
                [3] GRID grid.416657.7, ISNI 0000 0004 0630 4574, National Health Laboratory Service (NHLS), National Cancer Registry (NCR), 1 Modderfontein Road, Sandringham, Johannesburg, South Africa
                [4] GRID grid.11951.3d, ISNI 0000 0004 1937 1135, School of Electrical & Information Engineering and Wits Institute of Data Science, University of Witwatersrand, 1 Jan Smuts Avenue, Braamfontein, Johannesburg, South Africa
                [5] GRID grid.416657.7, ISNI 0000 0004 0630 4574, Department of Chemical Pathology, Faculty of Health Sciences, University of Witwatersrand and National Health Laboratory Service (NHLS), 7 York Road, Parktown, Johannesburg, South Africa
                Author information
                http://orcid.org/0000-0003-4389-2849
                http://orcid.org/0000-0002-2830-0692
                http://orcid.org/0000-0002-0154-0688
                http://orcid.org/0000-0001-6925-6010
                http://orcid.org/0000-0002-8741-8746
                http://orcid.org/0000-0001-7106-769X
                Article
                1697
                10.1186/s12911-021-01697-2
                8614040
                34823522
                3370e287-e153-4d4f-856d-a2f8c838c890
                © The Author(s) 2021

                Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

                History
                Received: 4 August 2021
                Accepted: 18 November 2021
                Categories
                Research

                Bioinformatics & Computational biology
                prostate cancer, Gleason score, late presentation, text mining, algorithm, public health
