      Assessing ChatGPT’s Ability to Reply to Queries Regarding Colon Cancer Screening Based on Multisociety Guidelines

      research-article


          Abstract

          ChatGPT™ is a chatbot, an artificial intelligence program launched by San Francisco–based OpenAI on November 30, 2022, with the ability to hold human-like conversations.1 Although literature commenting on ChatGPT™'s abilities has grown over the past months, individual studies assessing its utility in clinical care, research, and teaching in the field of gastroenterology (GI) have been scarce, with only 2 reported studies.2,3 Our study assesses ChatGPT™'s ability to answer queries regarding appropriate colonoscopy intervals for colon cancer screening compared to currently applicable guidelines. Utilizing the American Gastroenterological Association (AGA) recommendations for follow-up after colonoscopy and polypectomy,4,5 12 questions were developed to query ChatGPT™ (Table). The queries were entered into ChatGPT™ by the author (SM), with the responses documented separately (Appendix 1). Each of the 12 query-response pairs underwent adjudication by 4 senior GI fellows (CD, AP, NF, IU), who graded the responses on a semi-qualitative scale of 5 options ranging from "addresses the query and is factually entirely correct" to "does not address the query and is factually incorrect". A field to comment on the potential usefulness to patients was provided. Adjudicators were given a copy of the AGA guideline as ground truth to aid assessment of the responses. All 4 adjudicators were blinded to the source of the responses to reduce potential bias and were informed that the responses had been generated by ChatGPT™ only after conclusion of the study. The study did not meet criteria for institutional review board submission given the absence of human subjects.

          Three of 4 (75%) adjudicators felt that ChatGPT™'s response to Q1 (What is the risk of developing a colon cancer leading to death after a clear colonoscopy?) addressed the query and was factually correct. One of 4 stated it was inaccurate in reporting colon cancer incidence as a percentage (as opposed to a hazard ratio). Three of 4 felt the answer would be usable by patients. Only 50% (2/4) of the adjudicators felt that ChatGPT™'s response to Q2 (When should colon screening be repeated in a patient with a quality colonoscopy?) addressed the query and was factually correct; 100% agreed that the answer would be usable by patients. ChatGPT™ had suggested starting colon cancer screening at age 50, with repeat colonoscopies every 10 years. While it was accurate regarding the time interval for repeat colonoscopy, it was inaccurate regarding the age to initiate screening (45 for average risk). Similarly, when assessing ChatGPT™'s response to Q3 (Repeat colonoscopy for patients who had 1–2 small tubular adenomas <10 mm in size that have been completely resected at a high-quality examination?), 75% (3/4) felt that the response would be usable by patients and 75% (3/4) agreed that while it did address the query, it contained both correct and incorrect information. ChatGPT™'s response was that the interval should be "5–10 years" (instead of 7–10 years). Kappa for interrater reliability was 0.189 for all 12 questions, 0.248 for the first 3 questions, and 0.704 when assessing patient usability. Analysis was performed using RStudio.6 A summary of all queries is presented in the Table. Critical observations of all responses are presented in Appendix 1. None of the responses was completely inaccurate, as none was found by all adjudicators to be completely wrong.
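          As an illustration of the interrater-reliability calculation described above (12 query-response pairs, 4 adjudicators, a 5-option scale), the sketch below computes Fleiss' kappa for a ratings matrix of that shape. It is a minimal Python example using randomly generated, hypothetical ratings; the study's actual analysis was performed in RStudio, and the exact procedure and data are not reproduced here.

```python
import numpy as np

def fleiss_kappa(ratings: np.ndarray, n_categories: int) -> float:
    """Fleiss' kappa for ratings of shape (n_items, n_raters), integer codes 0..k-1."""
    n_items, n_raters = ratings.shape
    # Count how many raters placed each item in each category.
    counts = np.zeros((n_items, n_categories))
    for i, row in enumerate(ratings):
        for c in row:
            counts[i, c] += 1
    # Per-item observed agreement and chance-expected agreement.
    p_i = (np.sum(counts ** 2, axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_j = counts.sum(axis=0) / (n_items * n_raters)
    p_bar, p_e = p_i.mean(), np.sum(p_j ** 2)
    return float((p_bar - p_e) / (1 - p_e))

# Hypothetical ratings: 12 query-response pairs x 4 adjudicators, graded on the
# 5-option scale (0 = "addresses the query and is factually entirely correct"
# ... 4 = "does not address the query and is factually incorrect").
rng = np.random.default_rng(seed=42)
example_ratings = rng.integers(low=0, high=5, size=(12, 4))
print(round(fleiss_kappa(example_ratings, n_categories=5), 3))
```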
          ChatGPT™ was also able to identify rare genetic syndromes in Q11–12. ChatGPT™'s introduction has generated widespread interest in the academic community. Its ability to draft entire essays and even pass the United States Medical Licensing Examination has led to debates about the ethics of its use.1,7 One area which continues to generate discussion is the question of its authorship on publications.8,9 Scholarly societies, such as the World Association of Medical Editors, state that chatbots cannot be authors as they do not create new knowledge.10 Its capabilities in GI education and research remain relatively unexplored, with only 2 studies describing early experience.2,3 Lahat et al2 assessed its ability to identify questions related to GI research and concluded that while it was able to frame questions, they were not considered novel. Yeo et al3 assessed its ability to answer questions on the management of liver cirrhosis and hepatocellular carcinoma, where it performed favorably.

          The purpose of our study was 2-fold: first, can ChatGPT™ accurately answer queries regarding colonoscopy intervals when held to the standard of currently active guidelines? Second, could it be a tool for patient self-education? Regarding the former, its ability to respond accurately to the simple and direct questions (Questions 1–3) was greater than for the more nuanced questions. Regarding the latter, while no patient data were used in this project, adjudicator assessments suggest it may be a useful tool for giving patients background information to inform discussion with treating physicians. It is not felt to be useful for self-directed care due to potential imprecision. The study has several strengths: we assessed the accuracy of ChatGPT™'s responses against a standard-of-care guideline and found that ChatGPT™'s ability to provide accurate responses diminishes with more complex medical queries. Additionally, our findings highlight a potential role for ChatGPT™ as an adjunct tool for patient education on the utility and timing of follow-up colonoscopy, though it should not replace information received from a licensed medical provider. Regarding its limitations: first, human adjudication is prone to error, and the small number of adjudicators and the verbosity of ChatGPT™'s responses may have resulted in variability in adjudication, as reflected in the weak kappa statistic. Second, the suitability of ChatGPT™'s responses for patient education was determined by the adjudicators as opposed to patients. Third, ChatGPT™'s training data are current only through September 2021, which may have contributed to its inaccuracy.

          In conclusion, we assessed ChatGPT™'s ability to answer queries regarding appropriate colonoscopy intervals for colon cancer screening and surveillance. Although in its current iteration it under-delivers, it does appear to be a potential source of background information for patient self-education. As global interest in ChatGPT™ continues to increase and the technology iterates, we expect that future renditions will be able to address nuanced queries with increased precision, serving as a readily available resource for GI education.

          Supplementary Material 1

          Most cited references (3)

          RStudio: integrated development for R (2020)

            ChatGPT: friend or foe?

              Chatbots, ChatGPT, and Scholarly Manuscripts: WAME Recommendations on ChatGPT and Chatbots in relation to scholarly publications

                Author and article information

                Journal
                Gastro Hep Advances (Gastro Hep Adv)
                ISSN: 2772-5723
                2023; 2(8): 1040-1043
                Affiliations
                [1] Gastroenterology Division, Department of Medicine, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania
                [2] Department of Medicine, Center for Endoscopic Innovation Research and Training, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania
                Author notes
                Correspondence: Samiran Mukherjee, MD, Division of Gastroenterology, Perelman Center for Advanced Medicine South Pavilion, 4th Floor, 3400 Civic Center Boulevard, Philadelphia, Pennsylvania 19104. samiranmukherjee93@gmail.com.
                Article
                NIHMS1941610
                DOI: 10.1016/j.gastha.2023.07.008
                PMCID: PMC10653253
                PMID: 37974564

                This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

