
      Crowdsourcing image analysis for plant phenomics to generate ground truth data for machine learning


          Abstract

          The accuracy of machine learning tasks depends critically on high-quality ground truth data. Producing good ground truth data therefore typically involves trained professionals, which can be costly in time, effort, and money. Here we explore the use of crowdsourcing to generate a large volume of good-quality training data. We explore an image analysis task involving the segmentation of corn tassels from images taken in a field setting. We investigate the accuracy, speed, and other quality metrics when this task is performed by students for academic credit, Amazon MTurk workers, and Master Amazon MTurk workers. We conclude that the Amazon MTurk and Master MTurk workers perform significantly better than the for-credit students, with no significant difference between the two MTurk worker types. Furthermore, the quality of the segmentation produced by Amazon MTurk workers rivals that of an expert worker. We provide best practices and several metrics for assessing the quality of ground truth data and for comparing data quality produced by different sources. We conclude that properly managed crowdsourcing can be used to establish large volumes of viable ground truth data at low cost and high quality, especially in the context of high-throughput plant phenotyping.
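One common way to score a crowdsourced annotation against an expert's is by area overlap. The sketch below is illustrative only and is not taken from the article: it assumes each annotation is a single axis-aligned rectangle, and the `Box` type and `overlap_scores` function are hypothetical names introduced here. The article's own metrics may differ.

```python
from typing import Dict, NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle in pixel coordinates (assumed representation)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def area(box: Box) -> float:
    """Area of a box; degenerate boxes count as zero."""
    return max(0.0, box.x_max - box.x_min) * max(0.0, box.y_max - box.y_min)


def overlap_scores(crowd: Box, expert: Box) -> Dict[str, float]:
    """Compare a crowd annotation with an expert annotation by area overlap."""
    # Width and height of the intersection rectangle (negative if disjoint).
    inter_w = min(crowd.x_max, expert.x_max) - max(crowd.x_min, expert.x_min)
    inter_h = min(crowd.y_max, expert.y_max) - max(crowd.y_min, expert.y_min)
    inter = max(0.0, inter_w) * max(0.0, inter_h)

    crowd_area, expert_area = area(crowd), area(expert)
    # Precision: share of the crowd box that lies inside the expert box.
    precision = inter / crowd_area if crowd_area else 0.0
    # Recall: share of the expert box that the crowd box covers.
    recall = inter / expert_area if expert_area else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    union = crowd_area + expert_area - inter
    iou = inter / union if union else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}


if __name__ == "__main__":
    scores = overlap_scores(Box(0, 0, 10, 10), Box(5, 0, 15, 10))
    print(scores)
```

Averaging such scores per worker group would give one simple basis for comparing students, MTurk workers, and Master MTurk workers, under the assumptions stated above.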

          Author summary

          Food security is a growing global concern. Farmers, plant breeders, and geneticists are hastening to address the challenges presented to agriculture by climate change, dwindling arable land, and population growth. Scientists in the field of plant phenomics are using satellite and drone images to understand how crops respond to a changing environment and to combine genetics and environmental measures to maximize crop growth efficiency. However, the terabytes of image data require new computational methods to extract useful information. Machine learning algorithms are effective in recognizing select parts of images, but they require high quality data curated by people to train them, a process that can be laborious and costly. We examined how well crowdsourcing works in providing training data for plant phenomics, specifically, segmenting a corn tassel—the male flower of the corn plant—from the often-cluttered images of a cornfield. We provided images to students, and to Amazon MTurkers, the latter being an on-demand workforce brokered by Amazon.com and paid on a task-by-task basis. We report on best practices in crowdsourcing image labeling for phenomics, and compare the different groups on measures such as fatigue and accuracy over time. We find that crowdsourcing is a good way of generating quality labeled data, rivaling that of experts.
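As a rough illustration of how accuracy over time might be summarized, the sketch below fits a least-squares line to one worker's per-task scores in the order the tasks were completed. A negative slope would be consistent with fatigue. This is an assumed approach for illustration, not the analysis used in the article, and `accuracy_trend` is a hypothetical helper name.

```python
from typing import Sequence


def accuracy_trend(scores: Sequence[float]) -> float:
    """Least-squares slope of score against task position (0, 1, 2, ...).

    `scores` holds one quality score per task, in completion order.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores to estimate a trend")
    mean_x = (n - 1) / 2          # mean of positions 0..n-1
    mean_y = sum(scores) / n
    # Covariance and variance terms of the ordinary least-squares slope.
    cov = sum((i - mean_x) * (s - mean_y) for i, s in enumerate(scores))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var


if __name__ == "__main__":
    # Hypothetical scores for four consecutive tasks by one worker.
    print(accuracy_trend([0.9, 0.8, 0.7, 0.6]))
```

Comparing the distribution of such slopes between worker groups is one simple way to look for differences in fatigue; a mixed-effects model would be a more careful alternative.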


                Author and article information

                Contributors
                Roles: Conceptualization, Data curation, Formal analysis, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review & editing
                Roles: Data curation, Formal analysis, Visualization, Writing – original draft
                Roles: Data curation, Formal analysis, Methodology, Software, Visualization, Writing – review & editing
                Roles: Data curation, Investigation, Methodology, Resources, Software
                Roles: Data curation, Formal analysis, Investigation, Project administration, Software, Validation, Writing – review & editing
                Roles: Conceptualization, Data curation, Formal analysis, Investigation, Software, Writing – original draft, Writing – review & editing
                Roles: Formal analysis, Methodology, Supervision, Visualization, Writing – review & editing
                Roles: Conceptualization, Funding acquisition, Investigation, Methodology, Project administration, Supervision, Validation, Writing – original draft, Writing – review & editing
                Roles: Conceptualization, Formal analysis, Funding acquisition, Methodology, Supervision, Visualization, Writing – original draft, Writing – review & editing
                Roles: Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing
                Roles: Conceptualization, Formal analysis, Funding acquisition, Methodology, Project administration, Resources, Supervision, Writing – original draft, Writing – review & editing
                Role: Editor
                Journal
                PLoS Computational Biology (PLoS Comput Biol)
                Public Library of Science (San Francisco, CA, USA)
                ISSN: 1553-734X, 1553-7358
                Published: 30 July 2018
                Volume: 14
                Issue: 7
                Article number: e1006337
                Affiliations
                [1] Program in Bioinformatics and Computational Biology, Iowa State University, Ames, Iowa, United States of America
                [2] Department of Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, Iowa, United States of America
                [3] Department of Psychology, Iowa State University, Ames, Iowa, United States of America
                [4] Department of Genetics, Development and Cell Biology, Iowa State University, Ames, Iowa, United States of America
                [5] Department of Mechanical Engineering, Iowa State University, Ames, Iowa, United States of America
                [6] Agricultural Research Services, United States Department of Agriculture, Ames, Iowa, United States of America
                [7] Department of Computer Science, Iowa State University
                [8] Department of Statistics, Iowa State University, Ames, Iowa, United States of America
                [9] Department of Agronomy, Iowa State University, Ames, Iowa, United States of America
                University of California, Riverside, UNITED STATES
                Author notes

                The authors have declared that no competing interests exist.

                Author information
                http://orcid.org/0000-0003-4657-3050
                http://orcid.org/0000-0002-6045-1036
                http://orcid.org/0000-0002-1789-8000
                Article
                Manuscript ID: PCOMPBIOL-D-18-00362
                DOI: 10.1371/journal.pcbi.1006337
                PMCID: 6085066
                PMID: 30059508
                b59b55a6-d032-4800-9368-51486f9b0af1

                This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

                History
                Received: 7 March 2018
                Accepted: 29 June 2018
                Page count
                Figures: 6, Tables: 2, Pages: 16
                Funding
                Funded by: Division of Biological Infrastructure (funder ID: http://dx.doi.org/10.13039/100000153); Award ID: 1458359
                Funded by: Iowa State University (funder ID: http://dx.doi.org/10.13039/100009227); Award ID: D3AI
                This work was supported primarily by an award from the Iowa State University Presidential Interdisciplinary Research Initiative to support the D3AI (Data-Driven Discovery for Agricultural Innovation) project. For more information, see http://www.d3ai.iastate.edu/. Additional support came from the Iowa State University Plant Sciences Institute Faculty Scholars Program and the USDA Agricultural Research Service. IF was funded, in part, by National Science Foundation award ABI 1458359. DN, BG and CJLD gratefully acknowledge Iowa State University’s Plant Sciences Institute Scholars program funding. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
                Categories
                Research Article
                Research and Analysis Methods > Experimental Organism Systems > Model Organisms > Maize
                Research and Analysis Methods > Model Organisms > Maize
                Biology and Life Sciences > Organisms > Eukaryota > Plants > Grasses > Maize
                Research and Analysis Methods > Experimental Organism Systems > Plant and Algal Models > Maize
                Physical Sciences > Mathematics > Applied Mathematics > Algorithms > Machine Learning Algorithms
                Research and Analysis Methods > Simulation and Modeling > Algorithms > Machine Learning Algorithms
                Computer and Information Sciences > Artificial Intelligence > Machine Learning > Machine Learning Algorithms
                Biology and Life Sciences > Genetics > Phenotypes
                Biology and Life Sciences > Genetics > Plant Genetics > Crop Genetics
                Biology and Life Sciences > Plant Science > Plant Genetics > Crop Genetics
                Computer and Information Sciences > Artificial Intelligence > Machine Learning
                Research and Analysis Methods > Research Design > Pilot Studies
                Biology and Life Sciences > Agriculture > Crop Science > Crops
                Engineering and Technology > Signal Processing > Image Processing
                Custom metadata
                Version-of-record update to uncorrected proof: 2018-08-09
                The software for this project is available from: https://github.com/ashleyzhou972/Crowdsource-Corn-Tassels. The data for this project are available from: https://doi.org/10.6084/m9.figshare.6360236.v2.

                Quantitative & Systems biology
