
      Harnessing ChatGPT and GPT-4 for evaluating the rheumatology questions of the Spanish access exam to specialized medical training

      research-article


          Abstract

          The emergence of large language models (LLMs) with remarkable performance, such as ChatGPT and GPT-4, has led to unprecedented uptake in the population. One of their most promising and most studied applications is education: their ability to understand and generate human-like text creates a multitude of opportunities for enhancing educational practices and outcomes. The objective of this study is twofold: to assess the accuracy of ChatGPT/GPT-4 in answering rheumatology questions from the access exam to specialized medical training in Spain (MIR), and to evaluate the medical reasoning these LLMs follow to answer those questions. A dataset, RheumaMIR, of 145 rheumatology-related questions extracted from the exams held between 2010 and 2023 was created for that purpose, used as a prompt for the LLMs, and publicly distributed. Six rheumatologists with clinical and teaching experience evaluated the clinical reasoning of the chatbots using a 5-point Likert scale, and their degree of agreement was analyzed. The association between variables that could influence the models’ accuracy (i.e., year of the exam question, disease addressed, type of question and genre) was studied. ChatGPT demonstrated a high level of performance in both accuracy (66.43%) and clinical reasoning (median 4.5, Q1–Q3 2.33–4.67). However, GPT-4 performed better, with an accuracy of 93.71% and a median clinical reasoning score of 4.67 (Q1–Q3 4.5–4.83). These findings suggest that LLMs may serve as valuable tools in rheumatology education, aiding in exam preparation and supplementing traditional teaching methods.
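The two summary statistics reported in the abstract, accuracy as a percentage and a median with first and third quartiles, can be sketched as follows. This is a minimal illustration and not the authors' analysis code: the function names `accuracy` and `median_iqr`, the example data, and the choice of the "inclusive" quartile method are all assumptions, since the abstract does not say how answers were scored or which quartile convention was used.

```python
from statistics import median, quantiles


def accuracy(predicted, correct):
    """Percentage of answers that match the answer key (hypothetical helper)."""
    if len(predicted) != len(correct) or not correct:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(p == c for p, c in zip(predicted, correct))
    return 100.0 * hits / len(correct)


def median_iqr(scores):
    """Return (median, Q1, Q3) of a list of Likert-style scores.

    Assumption: quartiles use the 'inclusive' method of statistics.quantiles;
    other conventions give slightly different Q1 and Q3 on small samples.
    """
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return median(scores), q1, q3


if __name__ == "__main__":
    # Made-up data for illustration only; these are not values from RheumaMIR.
    model_answers = [1, 2, 3, 4]
    answer_key = [1, 2, 4, 4]
    print(accuracy(model_answers, answer_key))

    reasoning_scores = [1, 2, 3, 4, 5]
    print(median_iqr(reasoning_scores))
```

Reporting the median with Q1–Q3 instead of a mean with standard deviation is a common choice for ordinal Likert data, because it does not assume the distance between adjacent scale points is equal.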

          Most cited references (49)

          Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models

          We evaluated the performance of a large language model called ChatGPT on the United States Medical Licensing Exam (USMLE), which consists of three exams: Step 1, Step 2CK, and Step 3. ChatGPT performed at or near the passing threshold for all three exams without any specialized training or reinforcement. Additionally, ChatGPT demonstrated a high level of concordance and insight in its explanations. These results suggest that large language models may have the potential to assist with medical education, and potentially, clinical decision-making.

            Computing inter-rater reliability and its variance in the presence of high agreement.

            Pi (π) and kappa (κ) statistics are widely used in the areas of psychiatry and psychological testing to compute the extent of agreement between raters on nominally scaled data. It is a fact that these coefficients occasionally yield unexpected results in situations known as the paradoxes of kappa. This paper explores the origin of these limitations, and introduces an alternative and more stable agreement coefficient referred to as the AC1 coefficient. Also proposed are new variance estimators for the multiple-rater generalized π and AC1 statistics, whose validity does not depend upon the hypothesis of independence between raters. This is an improvement over existing alternative variances, which depend on the independence assumption. A Monte-Carlo simulation study demonstrates the validity of these variance estimators for confidence interval construction, and confirms the value of AC1 as an improved alternative to existing inter-rater reliability statistics.

              ChatGPT: A comprehensive review on background, applications, key challenges, bias, ethics, limitations and future scope
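The AC1 coefficient mentioned in the inter-rater reliability reference above can be illustrated for the simplest case of two raters and nominal categories. The sketch below uses the commonly stated two-rater form, AC1 = (p_a - p_e) / (1 - p_e), where p_a is the observed agreement and p_e = (1 / (Q - 1)) * Σ π_q (1 - π_q), with π_q the average of the two raters' marginal proportions for category q. The function name `gwet_ac1` and the example ratings are hypothetical, and the multiple-rater generalization and variance estimators described in that reference are not covered here.

```python
from collections import Counter


def gwet_ac1(rater_a, rater_b, categories):
    """Two-rater AC1 agreement coefficient for nominal ratings (sketch).

    `categories` lists every category the raters could have used; at least
    two are required because the chance term divides by (Q - 1).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    q = len(categories)
    if q < 2:
        raise ValueError("at least two categories are required")

    n = len(rater_a)
    # Observed agreement: share of items both raters put in the same category.
    p_a = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement based on the averaged marginal proportion per category.
    p_e = 0.0
    for c in categories:
        pi_c = (counts_a[c] + counts_b[c]) / (2 * n)
        p_e += pi_c * (1 - pi_c)
    p_e /= q - 1

    return (p_a - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Made-up binary ratings for illustration only.
    a = [1, 1, 1, 1, 0]
    b = [1, 1, 1, 0, 0]
    print(gwet_ac1(a, b, categories=[0, 1]))
```

When both raters put every item in the same single category, the chance term is zero and this form returns 1, whereas Cohen's kappa is undefined in that situation; this is one of the high-agreement cases the reference's abstract alludes to.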

                Author and article information

                Contributors
                alfredo.madrid@salud.madrid.org
                Journal
                Scientific Reports (Sci Rep)
                Publisher: Nature Publishing Group UK (London)
                ISSN: 2045-2322
                Published: 13 December 2023
                Volume: 13
                Article number: 22129
                Affiliations
                [1] Grupo de Patología Musculoesquelética, Hospital Clínico San Carlos, Instituto de Investigación Sanitaria del Hospital Clínico San Carlos (IdISSC), Prof. Martin Lagos S/N, 28040 Madrid, Spain (GRID grid.414780.e)
                [2] Reumatología, Hospital Universitario La Paz-IdiPaz, Paseo de La Castellana, 261, 28046 Madrid, Spain (GRID grid.81821.32, ISNI 0000 0000 8970 9163)
                [3] Medicina Interna, Hospital Universitario del Henares, Avenida de Marie Curie, 0, 28822 Madrid, Spain (https://ror.org/047ev4v84)
                [4] Facultad de Medicina, Universidad Francisco de Vitoria, Carretera Pozuelo, Km 1800, 28223 Madrid, Spain (https://ror.org/03ha64j07)
                [5] Facultad de Medicina, Universidad Complutense de Madrid, Madrid, Spain (https://ror.org/02p0gd045)
                Author information
                http://orcid.org/0000-0002-1591-0467
                http://orcid.org/0000-0002-4244-3139
                http://orcid.org/0000-0002-0966-2778
                http://orcid.org/0000-0002-2098-4313
                http://orcid.org/0000-0002-4145-7395
                http://orcid.org/0000-0003-3503-9047
                http://orcid.org/0000-0001-7142-0545
                http://orcid.org/0000-0002-6126-8786
                http://orcid.org/0000-0002-2869-7861
                Article
                Publisher ID: 49483
                DOI: 10.1038/s41598-023-49483-6
                PMC ID: 10719375
                PubMed ID: 38092821
                32343764-8d41-4685-8bd3-bf0fdac37ac8
                © The Author(s) 2023

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

                History
                : 16 October 2023
                : 8 December 2023
                Funding
                Funded by: Instituto de Salud Carlos III, Ministry of Health, Madrid, Spain
                Award ID: RD21/002/0001
                Categories
                Article
                Custom metadata
                © Springer Nature Limited 2023

                Uncategorized
                rheumatology, engineering
