65
views
0
recommends
+1 Recommend
0 collections
    0
    shares
      • Record: found
      • Abstract: found
      • Article: found
      Is Open Access

      Design Characteristics of Studies Reporting the Performance of Artificial Intelligence Algorithms for Diagnostic Analysis of Medical Images: Results from Recently Published Papers

      Read this article at

          There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

          Abstract

          Objective To evaluate the design characteristics of studies that evaluated the performance of artificial intelligence (AI) algorithms for the diagnostic analysis of medical images. Materials and Methods PubMed MEDLINE and Embase databases were searched to identify original research articles published between January 1, 2018 and August 17, 2018 that investigated the performance of AI algorithms that analyze medical images to provide diagnostic decisions. Eligible articles were evaluated to determine 1) whether the study used external validation rather than internal validation, and in case of external validation, whether the data for validation were collected, 2) with diagnostic cohort design instead of diagnostic case-control design, 3) from multiple institutions, and 4) in a prospective manner. These are fundamental methodologic features recommended for clinical validation of AI performance in real-world practice. The studies that fulfilled the above criteria were identified. We classified the publishing journals into medical vs. non-medical journal groups. Then, the results were compared between medical and non-medical journals. Results Of 516 eligible published studies, only 6% (31 studies) performed external validation. None of the 31 studies adopted all three design features: diagnostic cohort design, the inclusion of multiple institutions, and prospective data collection for external validation. No significant difference was found between medical and non-medical journals. Conclusion Nearly all of the studies published in the study period that evaluated the performance of AI algorithms for diagnostic analysis of medical images were designed as proof-of-concept technical feasibility studies and did not have the design features that are recommended for robust validation of the real-world clinical performance of AI algorithms.

          Related collections

          Most cited references33

          • Record: found
          • Abstract: found
          • Article: not found

          The Lancet Commission on global mental health and sustainable development

          The Lancet, 392(10157), 1553-1598
            Bookmark
            • Record: found
            • Abstract: found
            • Article: found
            Is Open Access

            Convolutional neural networks: an overview and application in radiology

            Abstract Convolutional neural network (CNN), a class of artificial neural networks that has become dominant in various computer vision tasks, is attracting interest across a variety of domains, including radiology. CNN is designed to automatically and adaptively learn spatial hierarchies of features through backpropagation by using multiple building blocks, such as convolution layers, pooling layers, and fully connected layers. This review article offers a perspective on the basic concepts of CNN and its application to various radiological tasks, and discusses its challenges and future directions in the field of radiology. Two challenges in applying CNN to radiological tasks, small dataset and overfitting, will also be covered in this article, as well as techniques to minimize them. Being familiar with the concepts and advantages, as well as limitations, of CNN is essential to leverage its potential in diagnostic radiology, with the goal of augmenting the performance of radiologists and improving patient care. Key Points • Convolutional neural network is a class of deep learning methods which has become dominant in various computer vision tasks and is attracting interest across a variety of domains, including radiology. • Convolutional neural network is composed of multiple building blocks, such as convolution layers, pooling layers, and fully connected layers, and is designed to automatically and adaptively learn spatial hierarchies of features through a backpropagation algorithm. • Familiarity with the concepts and advantages, as well as limitations, of convolutional neural network is essential to leverage its potential to improve radiologist performance and, eventually, patient care.
              Bookmark
              • Record: found
              • Abstract: found
              • Article: not found

              Development and Validation of a Deep Learning System for Diabetic Retinopathy and Related Eye Diseases Using Retinal Images From Multiethnic Populations With Diabetes

              Question How does a deep learning system (DLS) using artificial intelligence compare with professional human graders in identifying diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes? Findings In the primary validation dataset (71 896 images; 14 880 patients), the DLS had a sensitivity of 90.5% and specificity of 91.6% for detecting referable diabetic retinopathy; 100% sensitivity and 91.1% specificity for vision-threatening diabetic retinopathy; 96.4% sensitivity and 87.2% specificity for possible glaucoma; and 93.2% sensitivity and 88.7% specificity for age-related macular degeneration, compared with professional graders. Meaning The DLS had high sensitivity and specificity for identifying diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes. Importance A deep learning system (DLS) is a machine learning technology with potential for screening diabetic retinopathy and related eye diseases. Objective To evaluate the performance of a DLS in detecting referable diabetic retinopathy, vision-threatening diabetic retinopathy, possible glaucoma, and age-related macular degeneration (AMD) in community and clinic-based multiethnic populations with diabetes. Design, Setting, and Participants Diagnostic performance of a DLS for diabetic retinopathy and related eye diseases was evaluated using 494 661 retinal images. A DLS was trained for detecting diabetic retinopathy (using 76 370 images), possible glaucoma (125 189 images), and AMD (72 610 images), and performance of DLS was evaluated for detecting diabetic retinopathy (using 112 648 images), possible glaucoma (71 896 images), and AMD (35 948 images). Training of the DLS was completed in May 2016, and validation of the DLS was completed in May 2017 for detection of referable diabetic retinopathy (moderate nonproliferative diabetic retinopathy or worse) and vision-threatening diabetic retinopathy (severe nonproliferative diabetic retinopathy or worse) using a primary validation data set in the Singapore National Diabetic Retinopathy Screening Program and 10 multiethnic cohorts with diabetes. Exposures Use of a deep learning system. Main Outcomes and Measures Area under the receiver operating characteristic curve (AUC) and sensitivity and specificity of the DLS with professional graders (retinal specialists, general ophthalmologists, trained graders, or optometrists) as the reference standard. Results In the primary validation dataset (n = 14 880 patients; 71 896 images; mean [SD] age, 60.2 [2.2] years; 54.6% men), the prevalence of referable diabetic retinopathy was 3.0%; vision-threatening diabetic retinopathy, 0.6%; possible glaucoma, 0.1%; and AMD, 2.5%. The AUC of the DLS for referable diabetic retinopathy was 0.936 (95% CI, 0.925-0.943), sensitivity was 90.5% (95% CI, 87.3%-93.0%), and specificity was 91.6% (95% CI, 91.0%-92.2%). For vision-threatening diabetic retinopathy, AUC was 0.958 (95% CI, 0.956-0.961), sensitivity was 100% (95% CI, 94.1%-100.0%), and specificity was 91.1% (95% CI, 90.7%-91.4%). For possible glaucoma, AUC was 0.942 (95% CI, 0.929-0.954), sensitivity was 96.4% (95% CI, 81.7%-99.9%), and specificity was 87.2% (95% CI, 86.8%-87.5%). For AMD, AUC was 0.931 (95% CI, 0.928-0.935), sensitivity was 93.2% (95% CI, 91.1%-99.8%), and specificity was 88.7% (95% CI, 88.3%-89.0%). For referable diabetic retinopathy in the 10 additional datasets, AUC range was 0.889 to 0.983 (n = 40 752 images). Conclusions and Relevance In this evaluation of retinal images from multiethnic cohorts of patients with diabetes, the DLS had high sensitivity and specificity for identifying diabetic retinopathy and related eye diseases. Further research is necessary to evaluate the applicability of the DLS in health care settings and the utility of the DLS to improve vision outcomes. This diagnostic accuracy study compares the performance of deep learning systems vs eye professionals for detecting referable and vision-threatening diabetic retinopathy, glaucoma, and other eye diseases in retinal images from Chinese, Indian, and Malaysian patients.
                Bookmark

                Author and article information

                Contributors
                Journal
                Korean Journal of Radiology
                Korean J Radiol
                The Korean Society of Radiology
                1229-6929
                2005-8330
                2019
                2019
                : 20
                : 3
                : 405
                Affiliations
                [1 ]Department of Radiology, Taean-gun Health Center and County Hospital, Taean-gun, Korea.
                [2 ]Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Korea.
                Article
                10.3348/kjr.2019.0025
                bef0ca95-c935-4638-81f9-c9c0b2496834
                © 2019

                http://creativecommons.org/licenses/by-nc/4.0/

                History

                Comments

                Comment on this article