      Evaluating the Diagnostic and Treatment Recommendation Capabilities of GPT-4 Vision in Dermatology

      Preprint
      medRxiv

          Abstract

          Background

          The integration of artificial intelligence (AI) in dermatology presents a promising frontier for enhancing diagnostic accuracy and treatment planning. However, general-purpose AI models require rigorous evaluation before being applied to real-world medical cases.

          Objective

          This project evaluates GPT-4V’s performance in diagnosing and generating treatment plans for common dermatological conditions, comparing its assessment of textual versus image data and its performance with multimodal inputs. Beyond the immediate scope, this study contributes to the broader trajectory of integrating AI in healthcare, highlighting the limitations of these technologies as well as their potential to enhance efficiency and education within medical training and practice.

          Methods

          A dataset of 102 images representing nine common dermatological conditions was compiled from open-access websites. Fifty-four images were ultimately selected by two board-certified dermatologists as being representative and typical of the common conditions. Additionally, nine clinical scenarios corresponding to these conditions were developed. GPT-4V’s diagnostic capabilities were assessed in three setups: Image Prompt (image-based), Scenario Prompt (text-based), and Image and Scenario Prompt (combining both modalities). The model’s performance was evaluated based on diagnostic accuracy, differential diagnosis, and treatment recommendations.

          Results

          In the Image Prompt setup, GPT-4V correctly identified the primary diagnosis for 29 of 54 images (approximately 54%). The Scenario Prompt setup showed a higher accuracy rate of 89% in identifying the primary diagnosis. The multimodal Image and Scenario Prompt setup also achieved an 89% accuracy rate. However, a notable bias towards textual data over visual data was observed. Treatment recommendations were evaluated by the same two dermatologists using a modified Entrustment Scale and showed competent but not expert-level performance.
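
The image-only figure above is a simple proportion, and with only 54 images the estimate carries considerable uncertainty. The following is a minimal sketch, not part of the study's reported analysis, of how the proportion and an approximate 95% Wilson score interval could be computed. The function names `accuracy` and `wilson_interval` are hypothetical, and the only input taken from the abstract is the count 29 of 54.

```python
import math


def accuracy(correct: int, total: int) -> float:
    """Return the fraction of cases with a correct primary diagnosis."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    return correct / total


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Return an approximate Wilson score interval for a proportion.

    z = 1.96 corresponds to a two-sided 95% interval (an assumption of
    this sketch; the abstract does not report confidence intervals).
    """
    p = accuracy(correct, total)
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Image Prompt setup: 29 of 54 images, as stated in the abstract.
    acc = accuracy(29, 54)
    low, high = wilson_interval(29, 54)
    print(f"Image Prompt accuracy: {acc:.1%}")
    print(f"Approximate 95% interval: {low:.1%} to {high:.1%}")
```

The abstract gives the 89% figures for the two scenario-based setups only as percentages, so their underlying counts are not reproduced here.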

          Conclusion

          GPT-4V demonstrates promising capabilities in dermatological diagnosis and treatment recommendations, particularly in text-based scenarios. However, its performance in image-based diagnosis and in integrating multimodal data highlights areas for improvement. The study underscores the potential of AI to augment dermatological practice and emphasizes the need for further development and fine-tuning of such models to ensure their efficacy and reliability in clinical settings.

          Author and article information

          Journal
          medRxiv
          January 26 2024
          Article
          10.1101/2024.01.24.24301743
          © 2024