      Assessing ChatGPT’s capacity for clinical decision support in pediatrics: A comparative study with pediatricians using KIDMAP of Rasch analysis

      research-article

          Background:

          The application of large language models (LLMs) in clinical decision support (CDS) is an area that warrants further investigation. ChatGPT, a prominent LLM developed by OpenAI, has shown promising performance across various domains. However, there is limited research evaluating its use specifically in pediatric clinical decision-making. This study aimed to assess ChatGPT’s potential as a CDS tool in pediatrics by evaluating its performance on 8 common clinical symptom prompts. The study objectives were to answer 2 research questions: (1) ChatGPT’s overall grade, on a scale from A (high) to E (low), compared to a normative sample; and (2) the difference in assessments of ChatGPT between 2 pediatricians.

          Methods:

          We compared ChatGPT’s responses to 8 items related to clinical symptoms commonly encountered by pediatricians. Two pediatricians independently assessed the answers provided by ChatGPT in an open-ended format. The scoring system ranged from 0 to 100 and was then transformed into 5 ordinal categories. We simulated 300 virtual students with normally distributed abilities to provide scores on the items under the Rasch rating scale model, with item difficulties ranging from −2 to 2.5 logits. Two visual presentations (a Wright map and a KIDMAP) were generated to answer the 2 research questions outlined in the study objectives.
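          The simulation step described above can be sketched as follows (a minimal illustration, not the authors' code; the 4 category thresholds defining the 5 ordinal categories are hypothetical assumptions, as is the seed):

```python
import numpy as np

rng = np.random.default_rng(42)

n_students = 300
item_difficulty = np.linspace(-2.0, 2.5, 8)    # 8 items, difficulties spanning -2 to 2.5 logits
thresholds = np.array([-2.0, -1.0, 1.0, 2.0])  # hypothetical thresholds for 5 ordinal categories

# Person abilities drawn from a normal distribution (in logits)
theta = rng.normal(loc=0.0, scale=1.0, size=n_students)

def rsm_probs(theta_n, delta_i, tau):
    """Category probabilities under the Rasch rating scale model:
    P(X=k) proportional to exp(k*(theta - delta) - sum of the first k thresholds)."""
    k = np.arange(len(tau) + 1)                    # categories 0..4
    cum_tau = np.concatenate(([0.0], np.cumsum(tau)))
    logits = k * (theta_n - delta_i) - cum_tau
    expx = np.exp(logits - logits.max())           # subtract max for numerical stability
    return expx / expx.sum()

# Simulate one ordinal response (0-4) for every student-item pair
responses = np.array([
    [rng.choice(5, p=rsm_probs(t, d, thresholds)) for d in item_difficulty]
    for t in theta
])
print(responses.shape)  # (300, 8)
```

          The resulting 300 × 8 response matrix plays the role of the normative sample against which ChatGPT's graded responses are positioned on the Wright map.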

          Results:

          The 2 pediatricians’ assessments indicated that ChatGPT’s overall performance corresponded to a grade of C on the A-to-E scale, with average scores of −0.89 logits (SE = 0.37) and 0.90 logits (SE = 0.41), where a logit (log-odds unit) is the measurement unit of Rasch analysis. The difference in assessment between the 2 pediatricians was significant ( P < .05).
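          Because a logit is a log-odds unit, the two reported person measures can be mapped back to probabilities of success on an item of average (0-logit) difficulty; a quick sketch of the conversion:

```python
import math

def logit_to_prob(logit):
    # logit = log(p / (1 - p)); invert via the logistic function to recover p
    return 1.0 / (1.0 + math.exp(-logit))

print(round(logit_to_prob(-0.89), 3))  # 0.291
print(round(logit_to_prob(0.90), 3))   # 0.711
```

          This makes the gap between the two raters concrete: one measure implies roughly a 29% chance of success on an average item, the other roughly 71%.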

          Conclusion:

          This study demonstrates the feasibility of utilizing ChatGPT as a CDS tool for patients presenting with common pediatric symptoms. The findings suggest that ChatGPT has the potential to enhance clinical workflow and aid in responsible clinical decision-making. Further exploration and refinement of ChatGPT’s capabilities in pediatric care can potentially contribute to improved healthcare outcomes and patient management.

          Related collections

          Most cited references: 40


          A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research.

          Intraclass correlation coefficient (ICC) is a widely used reliability index in test-retest, intrarater, and interrater reliability analyses. This article introduces the basic concept of ICC in the content of reliability analysis.

            Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models

            We evaluated the performance of a large language model called ChatGPT on the United States Medical Licensing Exam (USMLE), which consists of three exams: Step 1, Step 2CK, and Step 3. ChatGPT performed at or near the passing threshold for all three exams without any specialized training or reinforcement. Additionally, ChatGPT demonstrated a high level of concordance and insight in its explanations. These results suggest that large language models may have the potential to assist with medical education, and potentially, clinical decision-making.

              How Does ChatGPT Perform on the United States Medical Licensing Examination? The Implications of Large Language Models for Medical Education and Knowledge Assessment

              Background: Chat Generative Pre-trained Transformer (ChatGPT) is a 175-billion-parameter natural language processing model that can generate conversation-style responses to user input. Objective: This study aimed to evaluate the performance of ChatGPT on questions within the scope of the United States Medical Licensing Examination (USMLE) Step 1 and Step 2 exams, as well as to analyze responses for user interpretability. Methods: We used 2 sets of multiple-choice questions to evaluate ChatGPT’s performance, each with questions pertaining to Step 1 and Step 2. The first set was derived from AMBOSS, a commonly used question bank for medical students, which also provides statistics on question difficulty and on performance relative to its user base. The second set was the National Board of Medical Examiners (NBME) free 120 questions. ChatGPT’s performance was compared to 2 other large language models, GPT-3 and InstructGPT. The text output of each ChatGPT response was evaluated across 3 qualitative metrics: logical justification of the answer selected, presence of information internal to the question, and presence of information external to the question. Results: Across the 4 data sets (AMBOSS-Step1, AMBOSS-Step2, NBME-Free-Step1, and NBME-Free-Step2), ChatGPT achieved accuracies of 44% (44/100), 42% (42/100), 64.4% (56/87), and 57.8% (59/102), respectively. ChatGPT outperformed InstructGPT by 8.15% on average across all data sets, and GPT-3 performed similarly to random chance. The model demonstrated a significant decrease in performance as question difficulty increased (P = .01) within the AMBOSS-Step1 data set. Logical justification for ChatGPT’s answer selection was present in 100% of outputs on the NBME data sets. Information internal to the question was present in 96.8% (183/189) of all questions. The presence of information external to the question was 44.5% and 27% lower for incorrect answers relative to correct answers on the NBME-Free-Step1 (P < .001) and NBME-Free-Step2 (P = .001) data sets, respectively. Conclusions: ChatGPT marks a significant improvement in natural language processing models on the task of medical question answering. By performing above the 60% threshold on the NBME-Free-Step1 data set, the model achieves the equivalent of a passing score for a third-year medical student. Additionally, the authors highlight ChatGPT’s capacity to provide logic and informational context across the majority of answers. Taken together, these findings make a compelling case for the potential application of ChatGPT as an interactive medical education tool to support learning.

                Author and article information

                Contributors
                Journal
                Medicine (Baltimore)
                MD
                Medicine
                Lippincott Williams & Wilkins (Hagerstown, MD )
                0025-7974
                1536-5964
                23 June 2023
                Volume 102, Issue 25: e34068
                Affiliations
                [a ] Department of Internal Medicine, Chi Mei Medical Center, Chiali, Taiwan
                [b ] Department of Medical Research, Chi-Mei Medical Center, Tainan, Taiwan
                [c ] The Education University of Hong Kong, Hong Kong, China
                [d ] Department of Physical Medicine and Rehabilitation, Chi Mei Medical Center, Tainan, Taiwan
                [e ] Department of Physical Medicine and Rehabilitation, Chung San Medical University Hospital, Taichung, Taiwan
                [f ] Department of Pediatrics, Chi Mei Medical Center, Tainan, Taiwan
                [g ] Department of Pediatrics, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan.
                Author notes
                * Correspondence: Julie Chi Chow, Chi-Mei Medical Center, 901 Chung Hwa Road, Yung Kung Dist., Tainan 710, Taiwan (e-mail: jcchow2@yahoo.com.tw).
                Author information
                https://orcid.org/0000-0002-1132-9341
                https://orcid.org/0000-0003-3150-4917
                Article
                00035
                10.1097/MD.0000000000034068
                10289633
                37352054
                f27067b5-165c-4ef7-a322-603a867bb38f
                Copyright © 2023 the Author(s). Published by Wolters Kluwer Health, Inc.

                This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial License 4.0 (CC BY-NC), which permits downloading, sharing, remixing, transforming, and building upon the work, provided it is properly cited. The work cannot be used commercially without permission from the journal.

                History
                : 23 February 2023
                : 18 May 2023
                : 1 June 2023
                Categories
                6200
                Research Article
                Systematic Review and Meta-Analysis
                Custom metadata
                TRUE
                T

                Keywords: artificial intelligence, ChatGPT, KIDMAP, logit, pediatrics, Rasch analysis, Wright map
