      Is Open Access

      Clustering algorithms: A comparative approach

      research-article


          Abstract

          Many real-world systems can be studied in terms of pattern recognition tasks, so the proper use (and understanding) of machine learning methods in practical applications becomes essential. While many classification methods have been proposed, there is no consensus on which methods are more suitable for a given dataset. As a consequence, it is important to compare methods comprehensively across many possible scenarios. In this context, we performed a systematic comparison of 9 well-known clustering methods available in the R language, assuming normally distributed data. To account for the many possible variations of data, we considered artificial datasets with several tunable properties (number of classes, separation between classes, etc.). We also evaluated the sensitivity of the clustering methods to their parameter configuration. The results revealed that, with the default configurations of the adopted methods, the spectral approach tended to perform particularly well. We also found that the default configuration of the adopted implementations was not always accurate. In these cases, a simple approach based on random selection of parameter values proved to be a good alternative for improving performance. All in all, the reported approach provides guidance for choosing clustering algorithms.
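The workflow summarized in the abstract (artificial normally distributed data with tunable properties, a default configuration, and random selection of parameter values) can be illustrated with a minimal sketch. This is not the authors' code: the paper compares nine R implementations, whereas the sketch below uses a small hand-written k-means in standard-library Python as a stand-in. The function names, the parameter ranges, the "default" configuration, and the use of the Rand index as the score are all assumptions made for illustration.

```python
import random


def make_gaussian_clusters(n_classes, n_per_class, separation, n_features=2, seed=0):
    """Draw unit-variance normal points; class centers are `separation` apart per axis."""
    rng = random.Random(seed)
    points, labels = [], []
    for c in range(n_classes):
        center = [c * separation] * n_features  # centers lie along the diagonal
        for _ in range(n_per_class):
            points.append([rng.gauss(m, 1.0) for m in center])
            labels.append(c)
    return points, labels


def kmeans(points, k, n_iter, seed):
    """Plain Lloyd-style k-means (illustrative stand-in for the R methods)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        assign = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        # Move each non-empty cluster's center to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (same/different cluster)."""
    n = len(a)
    agree = sum(
        (a[i] == a[j]) == (b[i] == b[j]) for i in range(n) for j in range(i + 1, n)
    )
    return agree / (n * (n - 1) / 2)


def random_search(points, labels, default, n_trials, seed=0):
    """Start from the default configuration and keep the best randomly drawn one."""
    rng = random.Random(seed)
    best_params = dict(default)
    best_score = rand_index(labels, kmeans(points, **default))
    for _ in range(n_trials):
        # Hypothetical parameter ranges, chosen only for this example.
        params = {
            "k": rng.randint(2, 6),
            "n_iter": rng.randint(1, 20),
            "seed": rng.randrange(10**6),
        }
        score = rand_index(labels, kmeans(points, **params))
        if score > best_score:
            best_score, best_params = score, params
    return best_score, best_params


# Tunable dataset properties: number of classes and separation between classes.
points, labels = make_gaussian_clusters(n_classes=3, n_per_class=40, separation=4.0)
default = {"k": 2, "n_iter": 10, "seed": 0}  # assumed "default", not taken from any R package
default_score = rand_index(labels, kmeans(points, **default))
best_score, best_params = random_search(points, labels, default, n_trials=20)
print("default:", round(default_score, 3), "random search:", round(best_score, 3), best_params)
```

Because the search starts from the default configuration, the reported best score can never be lower than the default score. Scoring against the known class labels is possible here only because the data are artificial.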


                Author and article information

                Contributors
                Roles: Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review & editing
                Roles: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Validation, Writing – original draft, Writing – review & editing
                Roles: Validation, Writing – original draft, Writing – review & editing
                Roles: Funding acquisition, Project administration, Supervision, Validation, Writing – review & editing
                Roles: Funding acquisition, Project administration, Supervision, Validation, Writing – original draft, Writing – review & editing
                Roles: Conceptualization, Formal analysis, Funding acquisition, Methodology, Project administration, Resources, Supervision, Validation, Writing – original draft, Writing – review & editing
                Roles: Funding acquisition, Project administration, Supervision, Validation, Writing – review & editing
                Role: Editor
                Journal
                PLoS ONE
                Public Library of Science (San Francisco, CA, USA)
                ISSN: 1932-6203
                Published: 15 January 2019
                Volume: 14
                Issue: 1
                Article number: e0210236
                Affiliations
                [1] Institute of Mathematics and Computer Science, University of São Paulo, São Carlos, São Paulo, Brazil
                [2] Department of Computer Science, Federal University of São Carlos, São Carlos, São Paulo, Brazil
                [3] Federal University of Technology, Paraná, Paraná, Brazil
                [4] São Carlos Institute of Physics, University of São Paulo, São Carlos, São Paulo, Brazil
                University of Ulm, GERMANY
                Author notes

                Competing Interests: The authors have declared that no competing interests exist.

                Author information
                http://orcid.org/0000-0003-1207-4982
                Article
                Manuscript: PONE-D-16-51060
                DOI: 10.1371/journal.pone.0210236
                PMCID: PMC6333366
                PMID: 30645617
                © 2019 Rodriguez et al.

                This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

                History
                Received: 26 December 2016
                Accepted: 19 December 2018
                Page count
                Figures: 8, Tables: 14, Pages: 34
                Funding
                Funded by: Fundação de Amparo à Pesquisa do Estado de São Paulo (BR), Award ID: 15/18942-8
                Funded by: Fundação de Amparo à Pesquisa do Estado de São Paulo (BR), Award ID: 16/19069-9
                Funded by: Fundação de Amparo à Pesquisa do Estado de São Paulo (BR), Award ID: 14/08026-1
                Funded by: Fundação de Amparo à Pesquisa do Estado de São Paulo (BR), Award ID: 11/50761-2
                Funded by: Conselho Nacional de Desenvolvimento Científico e Tecnológico (BR), Award ID: 307797/2014-7
                Funded by: Conselho Nacional de Desenvolvimento Científico e Tecnológico (BR), Award ID: 307333/2013-2
                Funded by: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (BR)
                Funded by: Fundação de Amparo à Pesquisa do Estado de São Paulo (BR), Award ID: 18/09125-4
                Funded by: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (BR), Award ID: 001
                Funded by: Fundação de Amparo à Pesquisa do Estado de São Paulo (BR), Award ID: 15/22308-2
                This work has been supported by FAPESP - Fundação de Amparo à Pesquisa do Estado de São Paulo (grant nos. 15/18942-8 and 18/09125-4 for CHC, 14/20830-0 and 16/19069-9 for DRA, 14/08026-1 for OMB and 11/50761-2 and 15/22308-2 for LdFC), CNPq - Conselho Nacional de Desenvolvimento Científico e Tecnológico (grant nos. 307797/2014-7 for OMB and 307333/2013-2 for LdFC), Núcleo de Apoio à Pesquisa (LdFC) and CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (Finance Code 001).
                Categories
                Research Article
                Physical Sciences > Mathematics > Applied Mathematics > Algorithms
                Research and Analysis Methods > Simulation and Modeling > Algorithms
                Physical Sciences > Mathematics > Applied Mathematics > Algorithms > Clustering Algorithms
                Research and Analysis Methods > Simulation and Modeling > Algorithms > Clustering Algorithms
                Physical Sciences > Mathematics > Probability Theory > Probability Distribution > Normal Distribution
                Physical Sciences > Mathematics > Applied Mathematics > Algorithms > Machine Learning Algorithms
                Research and Analysis Methods > Simulation and Modeling > Algorithms > Machine Learning Algorithms
                Computer and Information Sciences > Artificial Intelligence > Machine Learning > Machine Learning Algorithms
                Research and Analysis Methods > Mathematical and Statistical Techniques > Cluster Analysis > Hierarchical Clustering
                Physical Sciences > Mathematics > Probability Theory > Random Variables > Covariance
                Biology and Life Sciences > Neuroscience > Cognitive Science > Cognitive Psychology > Language
                Biology and Life Sciences > Psychology > Cognitive Psychology > Language
                Social Sciences > Psychology > Cognitive Psychology > Language
                Physical Sciences > Mathematics > Optimization
                Custom metadata
                All datasets used for evaluating the algorithms can be obtained from Figshare: https://figshare.com/s/29005b491a418a667b22.
