
      On the objectivity, reliability, and validity of deep learning enabled bioimage analyses

      research-article


          Abstract

          Bioimage analysis of fluorescent labels is widely used in the life sciences. Recent advances in deep learning (DL) allow automating time-consuming manual image analysis processes based on annotated training data. However, manual annotation of fluorescent features with a low signal-to-noise ratio is somewhat subjective. Training DL models on subjective annotations may be unstable or yield biased models. In turn, these models may be unable to reliably detect biological effects. An analysis pipeline integrating data annotation, ground truth estimation, and model training can mitigate this risk. To evaluate this integrated process, we compared different DL-based analysis approaches. With data from two model organisms (mice, zebrafish) and five laboratories, we show that ground truth estimation from multiple human annotators helps to establish objectivity in fluorescent feature annotations. Furthermore, ensembles of multiple models trained on the estimated ground truth establish reliability and validity. Our research provides guidelines for reproducible DL-based bioimage analyses.
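To make the idea of "ground truth estimation from multiple human annotators" concrete, here is a minimal sketch of one simple estimation strategy: a pixel-wise majority vote over binary annotation masks. This is an illustration only, not the authors' actual estimation procedure; the function name and the agreement threshold are assumptions for the example.

```python
import numpy as np

def estimate_ground_truth(masks: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Pixel-wise majority vote over binary annotation masks.

    masks: array of shape (n_annotators, H, W) with values in {0, 1}.
    Returns a binary mask marking pixels where at least `threshold`
    of the annotators agreed on the fluorescent feature.
    """
    agreement = masks.mean(axis=0)  # fraction of annotators marking each pixel
    return (agreement >= threshold).astype(np.uint8)

# Toy example: three annotators labeling the same 4x4 image.
# Annotator 2 misses the feature column that the other two marked.
annotators = np.array([
    [[0, 1, 1, 0]] * 4,
    [[0, 1, 0, 0]] * 4,
    [[0, 1, 1, 0]] * 4,
])
gt = estimate_ground_truth(annotators)
```

In practice, more sophisticated estimators (e.g. ones that weight annotators by estimated reliability) are often preferred over a plain majority vote, but the principle is the same: disagreements between individual annotators are averaged out before any model sees the labels.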

          eLife digest

          Research in biology generates many image datasets, mostly from microscopy. These images have to be analyzed, and much of this analysis relies on a human expert looking at the images and manually annotating features. Image datasets are often large, and human annotation can be subjective, so automating image analysis is highly desirable. This is where machine learning algorithms, such as deep learning, have proven useful. For deep learning algorithms to work, they first have to be 'trained': they are given a training dataset that has been annotated by human experts, extract the relevant features from it, and can then look for these features in other image data.

          However, because these models try to mimic the annotation behavior presented to them during training as closely as possible, they can also mimic an expert's subjectivity. Segebarth, Griebel et al. asked whether this is the case, whether it affects the outcome of the image analysis, and whether the problem can be avoided when deep learning is used to analyze imaging datasets.

          For this research, Segebarth, Griebel et al. used microscopy images of mouse brain sections in which a protein called cFOS had been labeled with a fluorescent tag. This protein typically controls the rate at which information in DNA is copied into RNA, leading to the production of other proteins. Its activity can be influenced experimentally through behavioral testing of mice, and this experimental manipulation can therefore be used to evaluate the results of deep learning-based image analyses.

          First, the fluorescent images were annotated manually by a group of human experts. Their results were then used to train a large variety of deep learning models. Models were trained either on the results of an individual expert or on the pooled results of all experts, yielding a consensus model: a deep learning model that learned from the annotation preferences of all experts. This made it possible to test whether training a model on multiple experts reduces the risk of subjectivity. Since the training of deep learning models involves randomness, Segebarth, Griebel et al. also tested whether combining the predictions of multiple models in a so-called model ensemble improves the consistency of the analyses. For evaluation, the annotations of the deep learning models were compared to those of the human experts, to ensure that the results were not influenced by the subjective behavior of a single person. Finally, the bioimage annotations were compared to the results of the mouse behavioral experiments, to check whether the models were able to detect the behavioral effect on cFOS.

          Segebarth, Griebel et al. conclude that pooling the annotations of multiple experts reduces the subjectivity of bioimage annotation by deep learning algorithms, and that feeding this consensus information into an ensemble of deep learning models improves the quality of bioimage analysis, making the results reliable, transparent, and less subjective.
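The model-ensemble step described above can be sketched in a few lines: each trained model produces a per-pixel probability map for the same image, the maps are averaged, and the consensus is binarized. This is a simplified illustration under assumed inputs, not the authors' published implementation; the function name and threshold are placeholders.

```python
import numpy as np

def ensemble_predict(prob_maps, threshold: float = 0.5) -> np.ndarray:
    """Average per-model probability maps and binarize the consensus.

    prob_maps: list of arrays of shape (H, W), one per trained model,
    each holding per-pixel probabilities in [0, 1].
    """
    mean_prob = np.mean(prob_maps, axis=0)  # pixel-wise mean across models
    return (mean_prob >= threshold).astype(np.uint8)

# Three hypothetical model outputs for the same 2x2 image.
preds = [
    np.array([[0.9, 0.2], [0.6, 0.1]]),
    np.array([[0.8, 0.4], [0.4, 0.2]]),
    np.array([[0.7, 0.3], [0.7, 0.1]]),
]
seg = ensemble_predict(preds)
```

Averaging before thresholding smooths out the run-to-run randomness of individual trainings: a pixel is only kept if the models, on average, agree on it.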


                Author and article information

                Contributors
                Role: Reviewing Editor
                Role: Senior Editor
                Journal
                eLife
                eLife Sciences Publications, Ltd
                ISSN: 2050-084X
                Published: 19 October 2020
                eLife 2020;9:e59780
                Affiliations
                [1 ]Institute of Clinical Neurobiology, University Hospital Würzburg, Würzburg, Germany
                [2 ]Department of Business and Economics, University of Würzburg, Würzburg, Germany
                [3 ]Institute of Physiology I, Westfälische Wilhelms-Universität, Münster, Germany
                [4 ]Department of Pharmacology, Medical University of Innsbruck, Innsbruck, Austria
                [5 ]Department of Pharmacology and Toxicology, Institute of Pharmacy and Center for Molecular Biosciences Innsbruck, University of Innsbruck, Innsbruck, Austria
                [6 ]Department of Child and Adolescent Psychiatry, Center of Mental Health, University Hospital Würzburg, Würzburg, Germany
                [7 ]Comprehensive Anxiety Center, Würzburg, Germany
                University of Southern California, United States
                California Institute of Technology, United States
                University of Southern California, United States
                USC, United States
                University of Texas Southwestern Medical Center, United States
                Author notes
                [†]

                These authors contributed equally to this work.

                [‡]

                These authors also contributed equally to this work.

                Author information
                https://orcid.org/0000-0002-3806-9324
                https://orcid.org/0000-0003-1959-0242
                https://orcid.org/0000-0003-2445-3605
                https://orcid.org/0000-0001-8298-6501
                http://orcid.org/0000-0002-5166-2851
                http://orcid.org/0000-0002-0166-3370
                http://orcid.org/0000-0001-6874-8224
                https://orcid.org/0000-0002-1761-9833
                https://orcid.org/0000-0002-5270-3854
                Article
                Article number: 59780
                DOI: 10.7554/eLife.59780
                PMCID: 7710359
                PMID: 33074102
                0bb4fb80-e0a2-47b1-885e-17d50f261529
                © 2020, Segebarth et al.

                This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

                History
                Received: 15 June 2020
                Accepted: 16 October 2020
                Funding
                Funded by: FundRef http://dx.doi.org/10.13039/501100001659, Deutsche Forschungsgemeinschaft;
                Award ID: ID 44541416 - TRR58 A10
                Award Recipient :
                Funded by: FundRef http://dx.doi.org/10.13039/501100001659, Deutsche Forschungsgemeinschaft;
                Award ID: ID 44541416 - TRR58 A03
                Award Recipient :
                Funded by: FundRef http://dx.doi.org/10.13039/501100001659, Deutsche Forschungsgemeinschaft;
                Award ID: ID 44541416 - TRR58 B08
                Award Recipient :
                Funded by: Graduate School of Life Sciences Wuerzburg;
                Award ID: fellowship
                Award Recipient :
                Funded by: FundRef http://dx.doi.org/10.13039/501100002428, Austrian Science Fund;
                Award ID: P29952 & P25851
                Award Recipient :
                Funded by: FundRef http://dx.doi.org/10.13039/501100002428, Austrian Science Fund;
                Award ID: I 3875
                Award Recipient :
                Funded by: FundRef http://dx.doi.org/10.13039/501100002428, Austrian Science Fund;
                Award ID: DKW-1206
                Award Recipient :
                Funded by: FundRef http://dx.doi.org/10.13039/501100002428, Austrian Science Fund;
                Award ID: SFB F4410
                Award Recipient :
                Funded by: Interdisziplinaeres Zentrum fuer Klinische Zusammenarbeit Wuerzburg;
                Award ID: N-320
                Award Recipient :
                Funded by: FundRef http://dx.doi.org/10.13039/501100001659, Deutsche Forschungsgemeinschaft;
                Award ID: ID 424778381 A02
                Award Recipient :
                The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
                Categories
                Research Article
                Computational and Systems Biology
                Neuroscience
                Custom metadata
                A comparison of different bioimage analysis pipelines reveals how deep learning can be used for automatized and reliable analysis of fluorescent features in biological datasets.

                Life sciences
                bioimage informatics, deep learning, reproducibility, objectivity, validity, fluorescence microscopy, mouse, zebrafish
