
      A data-driven approach to estimating the number of clusters in hierarchical clustering

      methods-article
      F1000Research
      Clustering, Hierarchy, Dendrogram, Gene Expression, Empirical


          Abstract

          DNA microarray and gene expression problems often require researchers to cluster their data in order to better understand its structure. When the number of clusters is not known, one can resort to hierarchical clustering methods. However, very few automated algorithms currently exist for determining the true number of clusters in the data. We propose two new methods (mode and maximum difference) for estimating the number of clusters in a hierarchical clustering framework, creating a fully automated process with no human intervention. These methods are compared to the established elbow and gap statistic algorithms using simulated datasets and the Biobase Gene ExpressionSet. We also explore a data mixing procedure inspired by cross-validation techniques. We find that the overall performance of the maximum difference method is comparable to or better than that of the gap statistic in multi-cluster scenarios, and that it achieves this performance at a fraction of the computational cost. This method also responds well to our mixing procedure, which opens the door to future research. We conclude that both the mode and maximum difference methods warrant further study of their mixing and cross-validation potential. We particularly recommend the maximum difference method in multi-cluster scenarios, given its accuracy and execution times, and present it as an alternative to existing algorithms.
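
The abstract names the maximum difference method without defining it. As an illustration only, the sketch below assumes one plausible reading: build the dendrogram, then cut it inside the largest gap between consecutive merge heights. The function names `merge_heights` and `estimate_k_max_gap`, the choice of complete linkage, and the toy data are all assumptions made for this example, not the article's implementation.

```python
import math

def merge_heights(points):
    """Naive agglomerative clustering with complete linkage.

    Returns the dendrogram merge heights in the order the merges occur.
    Complete linkage is an assumption; the abstract does not name a linkage.
    """
    clusters = [[p] for p in points]
    heights = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: distance between the farthest members.
                d = max(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
        heights.append(d)
    return heights

def estimate_k_max_gap(heights):
    """Assumed rule: cut inside the largest gap between consecutive heights."""
    if len(heights) < 2:
        return 1  # too few merges to compare gaps
    n = len(heights) + 1  # number of original observations
    gaps = [heights[m + 1] - heights[m] for m in range(len(heights) - 1)]
    m = max(range(len(gaps)), key=gaps.__getitem__)
    # Merges 0..m have been applied below the cut, so n - (m + 1) clusters remain.
    return n - (m + 1)

# Toy data: three well-separated 2x2 grids of points (hypothetical example).
offsets = [(0, 0), (1, 0), (0, 1), (1, 1)]
centres = [(0, 0), (10, 0), (0, 10)]
points = [(cx + dx, cy + dy) for cx, cy in centres for dx, dy in offsets]

heights = merge_heights(points)
k = estimate_k_max_gap(heights)
print(k)  # 3
```

The naive clustering loop is only suitable for small inputs. Once the merge heights are available, though, the gap rule itself is a single pass over them, which is consistent with the abstract's remark that the method is much cheaper than the gap statistic.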


            Author and article information

            Journal
            F1000Research (F1000Res), London, UK
            ISSN 2046-1402
            1 December 2016
            2016; 5: ISCB Comm J-2809
            Affiliations
            [1] Quantech Solutions LLC, San Rafael, CA, USA
            [1] Department of Physics, Randall Division of Cell & Molecular Biophysics, Faculty of Life Sciences & Medicine, King's College London, London, UK
            [1] RIKEN Center for Integrative Medical Sciences, Laboratory for Medical Science Mathematics, Yokohama, Japan
            [1] Shanghai Center for Systems Biomedicine, Shanghai JiaoTong University, Shanghai, China
            Author notes

            Competing interests: No competing interests were disclosed.


            Article
            10.12688/f1000research.10103.1
            5373427
            Copyright: © 2016 Zambelli AE

            This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

            History
            25 November 2016
            Funding
            The author(s) declared that no grants were involved in supporting this work.
            Categories
            Method Article
            Articles
            Bioinformatics

