
      Performances of Clustering Methods Considering Data Transformation and Sample Size: An Evaluation with Fisheries Survey Data


          Abstract

          Clustering is a group of unsupervised statistical techniques commonly used in many disciplines. When these techniques are applied to fish abundance data, many technical details need to be considered to ensure reasonable interpretation. However, the reliability and stability of clustering methods have rarely been studied in the context of fisheries. This study presents an intensive evaluation of three common clustering methods, hierarchical clustering (HC), K-means (KM), and expectation-maximization (EM), based on fish community surveys in the coastal waters of Shandong, China. We evaluated the performance of these three methods under different numbers of clusters, data sizes, and data transformation approaches, focusing on consistency validation using the average proportion of non-overlap (APN) index. The results indicate that the three methods tend to disagree on the optimal number of clusters. EM performed relatively better at avoiding unbalanced classification, whereas HC and KM provided more stable clustering results. Data transformations, including scaling, square-root, and log-transformation, had substantial influences on the clustering results, especially for KM. Transformation also influenced clustering stability: scaling tended to provide a stable solution at the same number of clusters. The APN values indicated improved stability with increasing data size; the effect generally leveled off above 70 samples, and most quickly for EM. We conclude that the best clustering method depends on the aim of the study and the number of clusters. In general, KM was relatively robust in our tests. We also provide recommendations for future applications of clustering analyses. This study helps ensure the credibility of the application and interpretation of clustering methods.
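
The abstract names three data transformations and the APN stability index without defining them. The following is a minimal Python sketch, not the authors' code. It assumes the APN definition used by the clValid R package (re-cluster with one column removed at a time and measure how much each sample's cluster changes), assumes log(x + 1) for the log-transformation, and uses hypothetical names `transform`, `apn`, and `cluster_fn`. Any clustering routine that maps a table of rows to a list of labels could be passed as `cluster_fn`.

```python
import math


def transform(rows, method):
    """Transform a samples-by-species abundance table (list of lists)."""
    if method == "sqrt":
        return [[math.sqrt(x) for x in row] for row in rows]
    if method == "log":
        # Assumption: log(x + 1), so that zero catches stay defined.
        return [[math.log1p(x) for x in row] for row in rows]
    if method == "scale":
        # Centre each column and divide by its sample standard deviation.
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        sds = [
            math.sqrt(sum((x - m) ** 2 for x in c) / (len(c) - 1)) or 1.0
            for c, m in zip(cols, means)
        ]
        return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]
    raise ValueError(f"unknown transformation: {method}")


def apn(rows, cluster_fn):
    """Average proportion of non-overlap (clValid-style definition, assumed).

    For every removed column and every sample, compare the cluster that
    contains the sample in the full-data clustering with the cluster that
    contains it after the column is removed. 0 means perfectly stable.
    """
    n_rows, n_cols = len(rows), len(rows[0])
    full = cluster_fn(rows)
    total = 0.0
    for drop in range(n_cols):
        reduced_rows = [list(r[:drop]) + list(r[drop + 1:]) for r in rows]
        reduced = cluster_fn(reduced_rows)
        for i in range(n_rows):
            in_full = {j for j in range(n_rows) if full[j] == full[i]}
            in_reduced = {j for j in range(n_rows) if reduced[j] == reduced[i]}
            total += 1.0 - len(in_full & in_reduced) / len(in_full)
    return total / (n_rows * n_cols)
```

With a toy labelling rule that depends only on the first column, removing that column reshuffles the clusters while removing the other column changes nothing, so the index falls between 0 and 1. Lower values indicate more stable clustering, which is how the APN results in the abstract should be read.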

          Author and article information

          Journal
          JOUC
          Journal of Ocean University of China
          Science Press and Springer (China)
          1672-5182
          02 May 2020
          01 June 2020
          Volume: 19
          Issue: 3
          Pages: 659-668
          Affiliations
          1College of Fisheries, Ocean University of China, Qingdao 266003, China
          2Laboratory for Marine Science and Food Production Processes, National Laboratory for Marine Science and Technology, Qingdao 266237, China
          Author notes
          *Corresponding author: REN Yiping, Tel: 0086-532-82032960, E-mail: renyip@ouc.edu.cn
          Article
          s11802-020-4200-3
          10.1007/s11802-020-4200-3
          Copyright © Ocean University of China, Science Press and Springer-Verlag GmbH Germany 2020.

          The copyright to this article, including any graphic elements therein (e.g. illustrations, charts, moving images), is hereby assigned for good and valuable consideration to the editorial office of Journal of Ocean University of China, Science Press and Springer, effective if and when the article is accepted for publication, and to the extent assignable if assignability is restricted by applicable law or regulations (e.g. for U.S. government or crown employees).

          Product
          Self URI (journal-page): https://www.springer.com/journal/11802
