      Is Open Access

      The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions

      data-paper


          Abstract

          Training of neural networks for automated diagnosis of pigmented skin lesions is hampered by the small size and lack of diversity of available datasets of dermatoscopic images. We tackle this problem by releasing the HAM10000 (“Human Against Machine with 10000 training images”) dataset. We collected dermatoscopic images from different populations acquired and stored by different modalities. Given this diversity we had to apply different acquisition and cleaning methods and developed semi-automatic workflows utilizing specifically trained neural networks. The final dataset consists of 10015 dermatoscopic images which are released as a training set for academic machine learning purposes and are publicly available through the ISIC archive. This benchmark dataset can be used for machine learning and for comparisons with human experts. Cases include a representative collection of all important diagnostic categories in the realm of pigmented lesions. More than 50% of lesions have been confirmed by pathology, while the ground truth for the rest of the cases was either follow-up, expert consensus, or confirmation by in-vivo confocal microscopy.
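The abstract reports ground-truth confirmation per lesion (pathology, follow-up, expert consensus, or confocal microscopy). A minimal sketch of how such shares could be tallied from a metadata table follows. The column names `lesion_id`, `image_id`, and `dx_type`, the type labels, and the possibility of several images per lesion are all assumptions made for illustration, and the rows are toy values, not real HAM10000 records.

```python
import csv
import io
from collections import Counter

# Toy rows for illustration only; these are NOT real HAM10000 records.
# Column names and ground-truth labels are assumptions about the metadata layout.
SAMPLE_CSV = """lesion_id,image_id,dx_type
L1,img_001,histo
L1,img_002,histo
L2,img_003,follow_up
L3,img_004,consensus
L4,img_005,histo
L5,img_006,confocal
"""


def ground_truth_shares(csv_text):
    """Return the share of lesions per ground-truth type, counting each lesion once."""
    lesion_to_type = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Several images may show the same lesion; keep the first type seen.
        lesion_to_type.setdefault(row["lesion_id"], row["dx_type"])
    counts = Counter(lesion_to_type.values())
    total = sum(counts.values())
    return {kind: n / total for kind, n in counts.items()}


shares = ground_truth_shares(SAMPLE_CSV)
for kind, share in sorted(shares.items()):
    print(f"{kind}: {share:.0%}")
```

Counting by lesion rather than by image matters whenever one lesion contributes several images, because image-level counts would overweight frequently photographed lesions.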

          Most cited references (25)


          Classification of the Clinical Images for Benign and Malignant Cutaneous Tumors Using a Deep Learning Algorithm

          We tested the use of a deep learning algorithm to classify the clinical images of 12 skin diseases: basal cell carcinoma, squamous cell carcinoma, intraepithelial carcinoma, actinic keratosis, seborrheic keratosis, malignant melanoma, melanocytic nevus, lentigo, pyogenic granuloma, hemangioma, dermatofibroma, and wart. The convolutional neural network (Microsoft ResNet-152 model; Microsoft Research Asia, Beijing, China) was fine-tuned with images from the training portion of the Asan dataset, MED-NODE dataset, and atlas site images (19,398 images in total). The trained model was validated with the testing portion of the Asan, Hallym and Edinburgh datasets. With the Asan dataset, the area under the curve for the diagnosis of basal cell carcinoma, squamous cell carcinoma, intraepithelial carcinoma, and melanoma was 0.96 ± 0.01, 0.83 ± 0.01, 0.82 ± 0.02, and 0.96 ± 0.00, respectively. With the Edinburgh dataset, the area under the curve for the corresponding diseases was 0.90 ± 0.01, 0.91 ± 0.01, 0.83 ± 0.01, and 0.88 ± 0.01, respectively. With the Hallym dataset, the sensitivity for basal cell carcinoma diagnosis was 87.1% ± 6.0%. The tested algorithm performance with 480 Asan and Edinburgh images was comparable to that of 16 dermatologists. To improve the performance of convolutional neural network, additional images with a broader range of ages and ethnicities should be collected.

            Edge-Based Color Constancy


              Diagnostic accuracy of dermatoscopy for melanocytic and nonmelanocytic pigmented lesions.

              It is unknown whether dermatoscopy improves the diagnostic accuracy for all types of pigmented skin lesions or only for those that are melanocytic. We sought to assess if the addition of dermatoscopy to clinical examination with the unaided eye improves diagnostic accuracy for all types of pigmented lesions. We analyzed 463 consecutively excised pigmented skin lesions collected during a period of 30 months in a primary care skin cancer practice in Queensland, Australia. Of 463 lesions, 217 (46.9%) were nonmelanocytic. Overall 30% (n = 138) were malignant including 29 melanomas, 72 basal cell carcinomas, and 37 squamous cell carcinomas. The diagnostic accuracy for malignant neoplasms measured as area under receiver operating characteristic curves was 0.89 with dermatoscopy and 0.83 without it (P < .001). Given a fixed specificity of 80%, the corresponding sensitivity was 82.6% with dermatoscopy and 70.5% without it. The improvement achieved by dermatoscopy was higher for nonmelanocytic lesions than for melanocytic lesions. A short algorithm based on pattern analysis reached a sensitivity of 98.6% for basal cell carcinoma, 86.5% for pigmented squamous cell carcinoma, and 79.3% for melanoma. Among benign conditions, the highest false-positive rate (90.5%) was observed for lichen planus-like keratosis. Estimates of diagnostic accuracy are influenced by verification bias. Dermatoscopy improves the diagnostic accuracy for nonmelanocytic lesions. A simple algorithm based on pattern analysis is suitable for the detection of melanoma and nonmelanoma skin cancer. Copyright © 2010 American Academy of Dermatology, Inc. Published by Mosby, Inc. All rights reserved.

                Author and article information

                Journal
                Scientific Data (Sci Data)
                Nature Publishing Group
                ISSN: 2052-4463
                Published: 14 August 2018
                Volume 5, Article 180161
                Affiliations
                [1] ViDIR Group, Department of Dermatology, Medical University of Vienna, Vienna 1090, Austria
                [2] Faculty of Medicine, University of Queensland, Herston 4006, Australia
                Author notes

                P.T. wrote the Data Descriptor, and performed data handling, image processing and analysis, expert annotation and quality review. C.R. wrote the Data Descriptor, and collected all cases from the Rosendahl-Series. H.K. wrote the Data Descriptor, and performed image annotation, extraction of MoleMax Series, and quality review of all images.

                Author information
                http://orcid.org/0000-0002-0051-8016
                Article
                Publisher ID: sdata2018161
                DOI: 10.1038/sdata.2018.161
                PMC ID: 6091241
                PubMed ID: 30106392
                2b866a29-ad58-427a-93da-147411a9b195
                Copyright © 2018, The Author(s)

                Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the metadata files made available in this article.

                History
                Received: 25 April 2018
                Accepted: 26 June 2018
                Categories
                Data Descriptor

                Keywords: cancer screening, squamous cell carcinoma, melanoma, basal cell carcinoma, cancer imaging
