      Machine Learning for Risk Group Identification and User Data Collection in a Herpes Simplex Virus Patient Registry: Algorithm Development and Validation Study

      research-article


          Abstract

          Background

          Researching people with herpes simplex virus (HSV) is challenging because of poor data quality, low user engagement, and concerns around stigma and anonymity.

          Objective

          This project aimed to improve data collection for a real-world HSV registry by identifying predictors of HSV infection and selecting a limited number of relevant questions to ask new registry users to determine their level of HSV infection risk.

          Methods

          The US National Health and Nutrition Examination Survey (NHANES, 2015-2016) database includes the confirmed HSV type 1 and type 2 (HSV-1 and HSV-2, respectively) status of American participants aged 14-49 years, along with a wealth of demographic and health-related data. The questionnaires and data from this survey were used to form two data sets: one for HSV-1 and one for HSV-2. These data sets were used to train and test a random forest model (implemented in Python) that minimizes the number of anonymous lifestyle-based questions needed to identify risk groups for HSV.
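As a rough illustration of the general idea in the Methods, the sketch below trains a random forest on all questions, ranks the questions by importance, and refits on a reduced set. It is not the authors' code. The data are synthetic stand-ins rather than NHANES records, scikit-learn is assumed as the library, and the impurity-based ranking and the question budget `k` are assumptions, since the abstract does not state how questions were selected.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for questionnaire answers (NOT NHANES data):
# 600 respondents, 30 yes/no questions; only the first 3 drive the label.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(600, 30))
y = ((X[:, 0] + X[:, 1] + X[:, 2]) >= 2).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Step 1: fit on every question and rank questions by impurity-based importance.
full_model = RandomForestClassifier(n_estimators=200, random_state=0)
full_model.fit(X_train, y_train)
ranked = np.argsort(full_model.feature_importances_)[::-1]

# Step 2: keep only the top-k questions (k is a hypothetical budget) and refit.
k = 5
selected = ranked[:k]
reduced_model = RandomForestClassifier(n_estimators=200, random_state=0)
reduced_model.fit(X_train[:, selected], y_train)

# Step 3: check that the shorter questionnaire still predicts the label well.
y_pred = reduced_model.predict(X_test[:, selected])
print("selected question indices:", sorted(selected.tolist()))
print("accuracy:", accuracy_score(y_test, y_pred))
print("recall:", recall_score(y_test, y_pred))
```

In practice the budget `k` would be tuned by watching how accuracy and recall change as questions are dropped, rather than fixed in advance.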

          Results

          The model selected a reduced number of questions from the NHANES questionnaire that predicted HSV infection risk with high accuracy scores of 0.91 and 0.96 and high recall scores of 0.88 and 0.98 for the HSV-1 and HSV-2 data sets, respectively. The number of questions was reduced from 150 to an average of 40, depending on age and gender. The model, therefore, provided high predictability of risk of infection with minimal required input.
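For readers less familiar with the two metrics reported above, the following minimal sketch shows how accuracy and recall are computed from confusion-matrix counts. The counts are made up for illustration and are not taken from the study.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all predictions that are correct."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no predictions to score")
    return (tp + tn) / total


def recall(tp: int, fn: int) -> float:
    """Share of truly positive cases that the model identifies."""
    positives = tp + fn
    if positives == 0:
        raise ValueError("no positive cases to score")
    return tp / positives


# Illustrative counts only (hypothetical, not study results).
tp, fn, tn, fp = 40, 10, 45, 5
print("accuracy:", accuracy(tp, tn, fp, fn))
print("recall:", recall(tp, fn))
```

Recall matters in a risk-screening setting because a missed positive (a false negative) means an at-risk person is not flagged, which accuracy alone can hide when one class dominates.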

          Conclusions

          This machine learning algorithm can be used in a real-world evidence registry to collect relevant lifestyle data and identify individuals’ levels of risk of HSV infection. A limitation is the absence of real user data and integration with electronic medical records, which would enable model learning and improvement. Future work will explore model adjustments, anonymization options, explicit permissions, and a standardized data schema that meet the General Data Protection Regulation, Health Insurance Portability and Accountability Act, and third-party interface connectivity requirements.

                Author and article information

                Contributors
                Journal
                JMIRx Med
                JMIR Publications (Toronto, Canada)
                2563-6316
                Apr-Jun 2021
                11 June 2021
                Volume 2, Issue 2, e25560
                Affiliations
                [1] Skein Ltd, London, United Kingdom
                [2] Department of Informatics, King’s College London, London, United Kingdom
                [3] Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, United Kingdom
                [4] Centre for Health Technology, University of Plymouth, Plymouth, United Kingdom
                [5] Nuffield Department of Primary Health Sciences, Medical Sciences Division, University of Oxford, Oxford, United Kingdom
                [6] Department of Primary Care and Public Health, School of Public Health, Imperial College London, London, United Kingdom
                Author notes
                Corresponding Author: Edward Meinert, edward.meinert@plymouth.ac.uk
                Author information
                https://orcid.org/0000-0002-9444-0917
                https://orcid.org/0000-0002-9137-749X
                https://orcid.org/0000-0002-1635-7691
                https://orcid.org/0000-0001-7628-882X
                https://orcid.org/0000-0003-1245-8759
                https://orcid.org/0000-0003-2484-3347
                Article
                v2i2e25560
                10.2196/25560
                10414389
                37725536
                592e5744-5cb4-45f6-bc32-cdc5a0e7d0ca
                ©Svitlana Surodina, Ching Lam, Svetislav Grbich, Madison Milne-Ives, Michelle van Velthoven, Edward Meinert. Originally published in JMIRx Med (https://med.jmirx.org), 11.06.2021.

                This is an open-access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIRx Med, is properly cited. The complete bibliographic information, a link to the original publication on https://med.jmirx.org/, as well as this copyright and license information must be included.

                History
                : 6 November 2020
                : 26 November 2020
                : 4 February 2021
                : 12 March 2021
                Categories
                Original Paper

                data collection, herpes simplex virus, registries, machine learning, risk assessment, artificial intelligence, medical information system, user-centered design, predictor, risk
