      Is Open Access

      Human-supervised clustering of multidimensional data using crowdsourcing

      research-article
      Royal Society Open Science
      The Royal Society
      data clustering, human-computing, crowdsourcing, games


          Abstract

          Clustering is a central task in many data analysis applications. However, there is no universally accepted metric to decide the occurrence of clusters. Ultimately, we have to resort to a consensus between experts. The problem is amplified with high-dimensional datasets, where classical distances become uninformative and the ability of humans to fully apprehend the distribution of the data is challenged. In this paper, we design a mobile human-computing game as a tool to query human perception for the multidimensional data clustering problem. We propose two clustering algorithms that partially or entirely rely on aggregated human answers and report the results of two experiments conducted on synthetic and real-world datasets. We show that our methods perform on par with or better than the most popular automated clustering algorithms. Our results suggest that hybrid systems leveraging annotations of partial datasets collected through crowdsourcing platforms can be an efficient strategy to capture the collective wisdom for solving abstract computational problems.
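
          The abstract does not spell out how human answers are aggregated, so the following is only a minimal sketch of one generic way to do it, and not the authors' algorithms. It assumes each answer is a grouping of a partial subset of points, counts how often each pair of points was placed in the same group among the answers that showed both, and merges pairs whose agreement exceeds a threshold. The names `aggregate_answers`, `consensus_clusters` and `threshold` are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


def aggregate_answers(answers):
    """For each pair of points, count how often it was seen and how often it was grouped together.

    Each answer is a list of groups (lists of point ids) over a partial subset of the data.
    """
    seen = defaultdict(int)
    together = defaultdict(int)
    for groups in answers:
        # Map each point shown in this answer to the index of its group.
        label = {p: g for g, members in enumerate(groups) for p in members}
        for a, b in combinations(sorted(label), 2):
            seen[(a, b)] += 1
            if label[a] == label[b]:
                together[(a, b)] += 1
    return seen, together


def consensus_clusters(points, answers, threshold=0.5):
    """Merge points whose 'same group' vote fraction is strictly above the threshold.

    Illustrative consensus rule (an assumption, not the paper's method):
    clusters are the connected components of the agreement graph.
    """
    seen, together = aggregate_answers(answers)
    parent = {p: p for p in points}

    def find(p):
        # Union-find lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for (a, b), n in seen.items():
        if together[(a, b)] / n > threshold:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for p in points:
        clusters[find(p)].add(p)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


# Three hypothetical annotators, each shown only part of a six-point dataset.
answers = [
    [[0, 1, 2], [3, 4]],
    [[0, 1], [2], [4, 5]],
    [[1, 2], [3, 4, 5]],
]
print(consensus_clusters(range(6), answers))
```

          Raising `threshold` demands stronger agreement before two points are merged, so a pair that annotators disagree on is more likely to be split into separate clusters.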


                Author and article information

                Contributors
                Roles: Conceptualization; Data curation; Formal analysis; Methodology; Software; Validation; Visualization; Writing – original draft
                Roles: Software; Visualization
                Roles: Data curation; Formal analysis; Methodology; Writing – review & editing
                Roles: Conceptualization; Formal analysis; Funding acquisition; Methodology; Supervision; Validation; Writing – review & editing
                Journal
                Title: Royal Society Open Science (R Soc Open Sci)
                Publisher: The Royal Society
                ISSN: 2054-5703
                Published: May 24, 2022
                Volume: 9
                Issue: 5
                Article number: 211189
                Affiliations
                [1] School of Computer Science, McGill University, Montréal, Canada
                [2] Department of Computer Science, University of Manitoba, Winnipeg, Canada
                Author notes

                Electronic supplementary material is available online at https://doi.org/10.6084/m9.figshare.c.5994902.

                Author information
                http://orcid.org/0000-0002-2561-7117
                Article
                Publisher ID: rsos211189
                DOI: 10.1098/rsos.211189
                PMCID: PMC9128850
                PMID: 35620007
                3c325651-d6d7-460d-8ffc-f5290795158f
                © 2022 The Authors.

                Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.

                History
                Received: July 15, 2021
                Accepted: April 29, 2022
                Funding
                Funded by: Genome Quebec;
                Funded by: Canadian Institutes of Health Research, http://dx.doi.org/10.13039/501100000024;
                Award ID: Bioinformatics & Computational Biology
                Funded by: Genome Canada, http://dx.doi.org/10.13039/100008762;
                Award ID: Bioinformatics & Computational Biology
                Categories
                Computer Science and Artificial Intelligence
                Research Articles

