      Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models

      research-article

          Abstract

          We evaluated the performance of a large language model called ChatGPT on the United States Medical Licensing Exam (USMLE), which consists of three exams: Step 1, Step 2CK, and Step 3. ChatGPT performed at or near the passing threshold for all three exams without any specialized training or reinforcement. Additionally, ChatGPT demonstrated a high level of concordance and insight in its explanations. These results suggest that large language models may have the potential to assist with medical education, and potentially, clinical decision-making.

          Author summary

          Artificial intelligence (AI) systems hold great promise to improve medical care and health outcomes. As such, it is crucial to ensure that the development of clinical AI is guided by the principles of trust and explainability. Measuring AI medical knowledge in comparison to that of expert human clinicians is a critical first step in evaluating these qualities. To accomplish this, we evaluated the performance of ChatGPT, a language-based AI, on the United States Medical Licensing Exam (USMLE). The USMLE is a set of three standardized tests of expert-level knowledge, which are required for medical licensure in the United States. We found that ChatGPT performed at or near the passing threshold of 60% accuracy. As the first AI to achieve this benchmark, ChatGPT marks a notable milestone in AI maturation. Impressively, ChatGPT was able to achieve this result without specialized input from human trainers. Furthermore, ChatGPT displayed comprehensible reasoning and valid clinical insights, which lends increased confidence in its trustworthiness and explainability. Our study suggests that large language models such as ChatGPT may assist human learners in a medical education setting, as a prelude to future integration into clinical decision-making.

                Author and article information

                Contributors
                Roles: Conceptualization, Methodology, Supervision, Validation, Writing – original draft, Writing – review & editing
                Roles: Conceptualization, Supervision, Writing – original draft, Writing – review & editing
                Roles: Data curation, Methodology, Validation
                Roles: Data curation, Methodology, Project administration
                Role: Data curation
                Role: Data curation
                Role: Data curation
                Role: Investigation
                Role: Data curation
                Roles: Data curation, Formal analysis, Methodology, Software, Validation, Visualization
                Roles: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing
                Role: Editor
                Journal
                PLOS Digital Health (PLOS Digit Health)
                Public Library of Science (San Francisco, CA, USA)
                ISSN: 2767-3170
                Published: 9 February 2023
                Volume 2, Issue 2, e0000198
                Affiliations
                [1] AnsibleHealth, Inc, Mountain View, California, United States of America
                [2] Department of Anesthesiology, Massachusetts General Hospital, Harvard School of Medicine, Boston, Massachusetts, United States of America
                [3] Warren Alpert Medical School, Brown University, Providence, Rhode Island, United States of America
                [4] Department of Medical Education, UWorld, LLC, Dallas, Texas, United States of America
                Beth Israel Deaconess Medical Center, United States
                Author notes

                The authors have declared that no competing interests exist.

                Author information
                https://orcid.org/0000-0003-0211-512X
                Article
                Manuscript number: PDIG-D-22-00371
                DOI: 10.1371/journal.pdig.0000198
                PMCID: PMC9931230
                PMID: 36812645
                959168ca-d8bf-4c9e-984d-0c3e804791ee
                © 2023 Kung et al

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 19 December 2022
                Accepted: 23 January 2023
                Page count
                Figures: 3, Tables: 0, Pages: 12
                Funding
                The authors received no specific funding for this work.
                Categories
                Research Article
                Computer and Information Sciences > Artificial Intelligence
                Biology and Life Sciences > Neuroscience > Cognitive Science > Cognitive Psychology > Learning > Human Learning
                Biology and Life Sciences > Psychology > Cognitive Psychology > Learning > Human Learning
                Social Sciences > Psychology > Cognitive Psychology > Learning > Human Learning
                Biology and Life Sciences > Neuroscience > Learning and Memory > Learning > Human Learning
                Biology and Life Sciences > Neuroscience > Cognitive Science > Cognitive Psychology > Language
                Biology and Life Sciences > Psychology > Cognitive Psychology > Language
                Social Sciences > Psychology > Cognitive Psychology > Language
                Medicine and Health Sciences > Health Care > Health Care Providers > Physicians
                People and Places > Population Groupings > Professions > Medical Personnel > Physicians
                Social Sciences > Sociology > Education > Medical Education
                Medicine and Health Sciences > Medical Humanities > Medical Education
                Social Sciences > Linguistics > Language Acquisition
                Biology and Life Sciences > Neuroscience > Cognitive Science > Cognitive Psychology > Reasoning
                Biology and Life Sciences > Psychology > Cognitive Psychology > Reasoning
                Social Sciences > Psychology > Cognitive Psychology > Reasoning
                Computer and Information Sciences > Software Engineering > Programming Languages
                Engineering and Technology > Software Engineering > Programming Languages
                Custom metadata
                The data analyzed in this study were obtained from USMLE sample question sets, which are publicly available. We have made the question indices, raw inputs, raw AI outputs, and special annotations available in S1 Data.
