
      On the objectivity, reliability, and validity of deep learning enabled bioimage analyses

      research-article


          Abstract

          Bioimage analysis of fluorescent labels is widely used in the life sciences. Recent advances in deep learning (DL) allow automating time-consuming manual image analysis processes based on annotated training data. However, manual annotation of fluorescent features with a low signal-to-noise ratio is somewhat subjective. Training DL models on subjective annotations may be unstable or yield biased models. In turn, these models may be unable to reliably detect biological effects. An analysis pipeline integrating data annotation, ground truth estimation, and model training can mitigate this risk. To evaluate this integrated process, we compared different DL-based analysis approaches. With data from two model organisms (mice, zebrafish) and five laboratories, we show that ground truth estimation from multiple human annotators helps to establish objectivity in fluorescent feature annotations. Furthermore, ensembles of multiple models trained on the estimated ground truth establish reliability and validity. Our research provides guidelines for reproducible DL-based bioimage analyses.
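To make the idea of "ground truth estimation from multiple human annotators" concrete, here is a minimal sketch of one simple estimation strategy: a pixel-wise majority vote over binary annotation masks. This is an illustration only, not the authors' actual estimation procedure; the function name and the agreement threshold are assumptions for the example.

```python
import numpy as np

def estimate_ground_truth(masks: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Pixel-wise majority vote over binary annotation masks.

    masks: array of shape (n_annotators, H, W) with values in {0, 1}.
    Returns a binary mask marking pixels where at least `threshold`
    of the annotators agreed on the fluorescent feature.
    """
    agreement = masks.mean(axis=0)  # fraction of annotators marking each pixel
    return (agreement >= threshold).astype(np.uint8)

# Toy example: three annotators labeling the same 4x4 image.
# Annotator 2 misses the feature column that the other two marked.
annotators = np.array([
    [[0, 1, 1, 0]] * 4,
    [[0, 1, 0, 0]] * 4,
    [[0, 1, 1, 0]] * 4,
])
gt = estimate_ground_truth(annotators)
```

In practice, more sophisticated estimators (e.g. ones that weight annotators by estimated reliability) are often preferred over a plain majority vote, but the principle is the same: disagreements between individual annotators are averaged out before any model sees the labels.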

          eLife digest

          Research in biology generates many image datasets, mostly from microscopy. These images have to be analyzed, and much of this analysis relies on a human expert looking at the images and manually annotating features. Image datasets are often large, and human annotation can be subjective, so automating image analysis is highly desirable. This is where machine learning algorithms, such as deep learning, have proven useful. For deep learning algorithms to work, they first have to be 'trained': they are given a training dataset that has been annotated by human experts, extract the relevant features from it, and can then look for these features in other image data.

          However, because these models try to mimic the annotation behavior presented to them during training as closely as possible, they can also mimic an expert's subjectivity. Segebarth, Griebel et al. asked whether this is the case, whether it affects the outcome of the image analysis, and whether the problem can be avoided when deep learning is used to analyze imaging datasets.

          For this research, Segebarth, Griebel et al. used microscopy images of mouse brain sections in which a protein called cFOS had been labeled with a fluorescent tag. This protein typically controls the rate at which information in DNA is copied into RNA, leading to the production of other proteins. Its activity can be influenced experimentally through behavioral testing of mice, and this experimental manipulation can therefore be used to evaluate the results of deep learning-based image analyses.

          First, the fluorescent images were annotated manually by a group of human experts. Their results were then used to train a large variety of deep learning models. Models were trained either on the results of an individual expert or on the pooled results of all experts, yielding a consensus model: a deep learning model that learned from the annotation preferences of all experts. This made it possible to test whether training a model on multiple experts reduces the risk of subjectivity. Since the training of deep learning models involves randomness, Segebarth, Griebel et al. also tested whether combining the predictions of multiple models in a so-called model ensemble improves the consistency of the analyses. For evaluation, the annotations of the deep learning models were compared to those of the human experts, to ensure that the results were not influenced by the subjective behavior of a single person. Finally, the bioimage annotations were compared to the results of the mouse behavioral experiments, to check whether the models were able to detect the behavioral effect on cFOS.

          Segebarth, Griebel et al. conclude that pooling the annotations of multiple experts reduces the subjectivity of bioimage annotation by deep learning algorithms, and that feeding this consensus information into an ensemble of deep learning models improves the quality of bioimage analysis, making the results reliable, transparent, and less subjective.
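The model-ensemble step described above can be sketched in a few lines: each trained model produces a per-pixel probability map for the same image, the maps are averaged, and the consensus is binarized. This is a simplified illustration under assumed inputs, not the authors' published implementation; the function name and threshold are placeholders.

```python
import numpy as np

def ensemble_predict(prob_maps, threshold: float = 0.5) -> np.ndarray:
    """Average per-model probability maps and binarize the consensus.

    prob_maps: list of arrays of shape (H, W), one per trained model,
    each holding per-pixel probabilities in [0, 1].
    """
    mean_prob = np.mean(prob_maps, axis=0)  # pixel-wise mean across models
    return (mean_prob >= threshold).astype(np.uint8)

# Three hypothetical model outputs for the same 2x2 image.
preds = [
    np.array([[0.9, 0.2], [0.6, 0.1]]),
    np.array([[0.8, 0.4], [0.4, 0.2]]),
    np.array([[0.7, 0.3], [0.7, 0.1]]),
]
seg = ensemble_predict(preds)
```

Averaging before thresholding smooths out the run-to-run randomness of individual trainings: a pixel is only kept if the models, on average, agree on it.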


                Author and article information

                Contributors
                Role: Reviewing Editor
                Role: Senior Editor
                Journal
                eLife
                eLife Sciences Publications, Ltd
                ISSN: 2050-084X
                Published: 19 October 2020
                eLife 2020;9:e59780
                Affiliations
                [1 ]Institute of Clinical Neurobiology, University Hospital Würzburg, Würzburg, Germany
                [2 ]Department of Business and Economics, University of Würzburg, Würzburg, Germany
                [3 ]Institute of Physiology I, Westfälische Wilhelms-Universität, Münster, Germany
                [4 ]Department of Pharmacology, Medical University of Innsbruck, Innsbruck, Austria
                [5 ]Department of Pharmacology and Toxicology, Institute of Pharmacy and Center for Molecular Biosciences Innsbruck, University of Innsbruck, Innsbruck, Austria
                [6 ]Department of Child and Adolescent Psychiatry, Center of Mental Health, University Hospital Würzburg, Würzburg, Germany
                [7 ]Comprehensive Anxiety Center, Würzburg, Germany
                University of Southern California, United States
                California Institute of Technology, United States
                University of Southern California, United States
                USC, United States
                University of Texas Southwestern Medical Center, United States
                Author notes
                [†]

                These authors contributed equally to this work.

                [‡]

                These authors also contributed equally to this work.

                Author information
                https://orcid.org/0000-0002-3806-9324
                https://orcid.org/0000-0003-1959-0242
                https://orcid.org/0000-0003-2445-3605
                https://orcid.org/0000-0001-8298-6501
                http://orcid.org/0000-0002-5166-2851
                http://orcid.org/0000-0002-0166-3370
                http://orcid.org/0000-0001-6874-8224
                https://orcid.org/0000-0002-1761-9833
                https://orcid.org/0000-0002-5270-3854
                Article
                Article number: 59780
                DOI: 10.7554/eLife.59780
                PMCID: 7710359
                PMID: 33074102
                0bb4fb80-e0a2-47b1-885e-17d50f261529
                © 2020, Segebarth et al.

                This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

                History
                Received: 15 June 2020
                Accepted: 16 October 2020
                Funding
                Funded by: FundRef http://dx.doi.org/10.13039/501100001659, Deutsche Forschungsgemeinschaft;
                Award ID: ID 44541416 - TRR58 A10
                Award Recipient :
                Funded by: FundRef http://dx.doi.org/10.13039/501100001659, Deutsche Forschungsgemeinschaft;
                Award ID: ID 44541416 - TRR58 A03
                Award Recipient :
                Funded by: FundRef http://dx.doi.org/10.13039/501100001659, Deutsche Forschungsgemeinschaft;
                Award ID: ID 44541416 - TRR58 B08
                Award Recipient :
                Funded by: Graduate School of Life Sciences Wuerzburg;
                Award ID: fellowship
                Award Recipient :
                Funded by: FundRef http://dx.doi.org/10.13039/501100002428, Austrian Science Fund;
                Award ID: P29952 & P25851
                Award Recipient :
                Funded by: FundRef http://dx.doi.org/10.13039/501100002428, Austrian Science Fund;
                Award ID: I 3875
                Award Recipient :
                Funded by: FundRef http://dx.doi.org/10.13039/501100002428, Austrian Science Fund;
                Award ID: DKW-1206
                Award Recipient :
                Funded by: FundRef http://dx.doi.org/10.13039/501100002428, Austrian Science Fund;
                Award ID: SFB F4410
                Award Recipient :
                Funded by: Interdisziplinaeres Zentrum fuer Klinische Zusammenarbeit Wuerzburg;
                Award ID: N-320
                Award Recipient :
                Funded by: FundRef http://dx.doi.org/10.13039/501100001659, Deutsche Forschungsgemeinschaft;
                Award ID: ID 424778381 A02
                Award Recipient :
                The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
                Categories
                Research Article
                Computational and Systems Biology
                Neuroscience
                Custom metadata
                A comparison of different bioimage analysis pipelines reveals how deep learning can be used for automatized and reliable analysis of fluorescent features in biological datasets.

                Life sciences
                bioimage informatics, deep learning, reproducibility, objectivity, validity, fluorescence microscopy, mouse, zebrafish
