Open access image repositories: high-quality data to enable machine learning research

Abstract

Originally motivated by the need for research reproducibility and data reuse, large-scale, open access information repositories have become key resources for training and testing of advanced machine learning applications in biomedical and clinical research. To be of value, such repositories must provide large, high-quality data sets, where quality is defined as minimising variance due to data collection protocols and data misrepresentations. Curation is the key to quality. We have constructed a large public access image repository, The Cancer Imaging Archive, dedicated to the promotion of open science to advance the global effort to diagnose and treat cancer. Drawing on this experience and our experience in applying machine learning techniques to the analysis of radiology and pathology image data, we will review the requirements placed on such information repositories by state-of-the-art machine learning applications and how these requirements can be met.

Author and article information

Journal

Title: Clinical Radiology

Abbreviated Title: Clinical Radiology

Publisher: Elsevier BV

ISSN (Print): 0009-9260

Publication date Created: April 2019

Publication date (Print): April 2019

Article

DOI: 10.1016/j.crad.2019.04.002

SO-VID: 58a17d23-340c-436f-a02a-14bd6701a6ae

License:

https://www.elsevier.com/tdm/userlicense/1.0/
