ChatGPT™ is an artificial intelligence chatbot launched by San Francisco–based OpenAI on November 30, 2022, with the ability to hold human-like conversations.1
Although literature commenting on ChatGPT™'s abilities has grown over the past months, individual studies assessing its utility in clinical care, research, and teaching in the field of Gastroenterology (GI) have been scarce, with only 2 reported studies.2,3
Our study assesses ChatGPT™’s ability to answer queries regarding appropriate colonoscopy
intervals for colon cancer screening compared to currently applicable guidelines.
Utilizing the American Gastroenterological Association's (AGA) recommendations for follow-up after colonoscopy and polypectomy,4,5 we developed 12 questions to query ChatGPT™ (Table). The queries were entered into ChatGPT™ by the author (SM), and the responses were documented separately (Appendix 1). Each of the 12 query-response pairs underwent adjudication by 4 senior GI fellows (CD, AP, NF, IU), who graded the responses on a semi-qualitative scale of 5 options ranging from "addresses the query and is factually entirely correct" to "does not address the query and is factually incorrect". A field to comment on the potential usefulness to patients was provided. Adjudicators were provided a copy of the AGA guideline as the ground truth to aid assessment of responses. All 4 adjudicators were blinded to the source of the responses to reduce potential bias and were informed only after the conclusion of the study that the responses had been generated by ChatGPT™. The study did not meet criteria for institutional review board submission given the absence of human subjects.
Three of 4 (75%) adjudicators felt that ChatGPT™'s response to Q1 (What is the risk of developing a colon cancer leading to death after a clear colonoscopy?) addressed the query and was factually correct. One of 4 stated it was inaccurate in reporting colon cancer incidence as a percentage (as opposed to a hazard ratio). Three of 4 felt the answer would be usable by patients.
Only 50% (2/4) of the adjudicators felt that ChatGPT™'s response to Q2 (When should colon screening be repeated in a patient with a quality colonoscopy?) addressed the query and was factually correct, although 100% agreed that the answer would be usable by patients. ChatGPT™ had suggested starting colon cancer screening at age 50, with repeat colonoscopies every 10 years. While it was accurate regarding the time interval for repeat colonoscopy, it was inaccurate regarding the age to initiate screening (45 years for average-risk individuals).
Similarly, when assessing ChatGPT™'s response to Q3 (Repeat colonoscopy for patients who had 1–2 small tubular adenomas <10 mm in size that have been completely resected at a high-quality examination?), 75% (3/4) felt that the response would be usable by patients, and 75% (3/4) agreed that while it did address the query, it contained both correct and incorrect information. ChatGPT™ stated that the interval was to be "5–10 years" (instead of the recommended 7–10 years).
Kappa for interrater reliability was 0.189 for all 12 questions, 0.248 for the first 3 questions, and 0.704 when assessing patient usability. Analysis was performed using RStudio.6
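For illustration, a minimal sketch of how such an interrater agreement statistic could be computed in R is shown below (assuming Fleiss' kappa via the irr package; the letter does not specify the exact statistic or package used, and the ratings matrix here is an illustrative placeholder, not the study's actual adjudication data):

    # Fleiss' kappa for 4 raters across 12 query-response pairs (illustrative sketch).
    library(irr)  # provides kappam.fleiss()

    # Rows = the 12 query-response pairs; columns = the 4 adjudicators.
    # Values encode the grade chosen on the 5-option semi-qualitative scale (1-5).
    # Placeholder ratings only, not the study's real data.
    set.seed(1)
    ratings <- matrix(sample(1:5, 12 * 4, replace = TRUE),
                      nrow = 12, ncol = 4,
                      dimnames = list(paste0("Q", 1:12), c("CD", "AP", "NF", "IU")))

    kappam.fleiss(ratings)      # prints the kappa estimate, z statistic, and p-value
    kappam.fleiss(ratings[1:3, ])  # kappa restricted to the first 3 questions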
A summary of all queries is presented in the Table. Critical observations on all responses are presented in Appendix 1. None of the responses were completely inaccurate, as none were found by all adjudicators to be completely wrong. ChatGPT™ was also able to identify rare genetic syndromes in Q11–12.
ChatGPT™'s introduction has generated widespread interest in the academic community. Its ability to draft entire essays and even pass the United States Medical Licensing Exam has led to debates about the ethics of its use.1,7
One area that continues to generate discussion is the question of its authorship on publications.8,9
Scholarly societies, such as the World Association of Medical Editors, state that chatbots cannot be authors as they do not create new knowledge.10
Its capabilities in GI education and research remain relatively unexplored, with only 2 studies describing early experience.2,3
Lahat et al2 assessed its ability to identify questions related to GI research and concluded that while it was able to frame questions, they were not considered novel. Yeo et al3 assessed its ability to answer questions on the management of liver cirrhosis and hepatocellular carcinoma, where it performed favorably.
The purpose of our study was 2-fold: First, can ChatGPT™ accurately answer queries regarding colonoscopy intervals when held to the standard of currently active guidelines? Second, could it be a tool for patient self-education? Regarding the former, ChatGPT™ responded more accurately to simple, direct questions (Questions 1–3) than to more nuanced queries. Regarding the latter, while no patient data were used in this project, adjudicator assessments suggest it may be a useful tool for giving patients background information to inform discussion with their treating physicians. It is not felt to be useful for self-directed care due to potential imprecision.
The study has several strengths: we assessed the accuracy of ChatGPT™'s responses against a standard-of-care guideline and found that ChatGPT™'s ability to provide accurate responses diminishes with more complex medical queries. Additionally, our findings highlight a potential role for ChatGPT™ as an adjunct tool for patient education on the utility and timing of follow-up colonoscopy, although it should not replace information received from a licensed medical provider.
Regarding its limitations: First, human adjudication is prone to error, and the small number of adjudicators and the verbosity of ChatGPT™'s responses resulted in variability in adjudication, as reflected in the weak kappa statistic. Second, the suitability of ChatGPT™'s responses for patient education was determined by the adjudicators rather than by patients. Third, ChatGPT™'s training data are current only through September 2021, which may have contributed to its inaccuracies.
In conclusion, we assessed ChatGPT™'s ability to answer queries regarding appropriate colonoscopy intervals for colon cancer screening and surveillance. Although it under-delivers in its current iteration, it does appear to be a potential source of background information for patient self-education. As global interest in ChatGPT™ continues to increase and the technology iterates, we expect that future renditions will be able to address nuanced queries with increased precision, serving as a readily available resource for GI education.
Supplementary Material