Enhanced Ensemble Clustering via Fast Propagation of Cluster-wise Similarities

Abstract

Ensemble clustering has been a popular research topic in data mining and machine learning. Despite its significant progress in recent years, there are still two challenging issues in the current ensemble clustering research. First, most of the existing algorithms tend to investigate the ensemble information at the object-level, yet often lack the ability to explore the rich information at higher levels of granularity. Second, they mostly focus on the direct connections (e.g., direct intersection or pair-wise co-occurrence) in the multiple base clusterings, but generally neglect the multi-scale indirect relationship hidden in them. To address these two issues, this paper presents a novel ensemble clustering approach based on fast propagation of cluster-wise similarities via random walks. We first construct a cluster similarity graph with the base clusters treated as graph nodes and the cluster-wise Jaccard coefficient exploited to compute the initial edge weights. Upon the constructed graph, a transition probability matrix is defined, based on which the random walk process is conducted to propagate the graph structural information. Specifically, by investigating the propagating trajectories starting from different nodes, a new cluster-wise similarity matrix can be derived by considering the trajectory relationship. Then, the newly obtained cluster-wise similarity matrix is mapped from the cluster-level to the object-level to achieve an enhanced co-association (ECA) matrix, which is able to simultaneously capture the object-wise co-occurrence relationship as well as the multi-scale cluster-wise relationship in ensembles. Finally, two novel consensus functions are proposed to obtain the consensus clustering result. Extensive experiments on a variety of real-world datasets have demonstrated the effectiveness and efficiency of our approach.
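The pipeline outlined in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation (their MATLAB code is linked in the metadata) and it does not attempt the fast propagation scheme. The abstract does not give the exact formulas, so several choices here are assumptions: the trajectory of a node is taken to be its concatenated 1-step to t-step transition probabilities, trajectories are compared by cosine similarity, objects in the same base cluster get similarity 1, and the enhanced co-association (ECA) matrix is the average over base clusterings. All function names and the `steps` parameter are hypothetical. The two consensus functions are not covered.

```python
import numpy as np

def cluster_membership(base_labels):
    """Binary (total_clusters x n_objects) matrix over all base clusters,
    plus the index of the base clustering each cluster came from."""
    rows, owner = [], []
    for m, labels in enumerate(base_labels):
        labels = np.asarray(labels)
        for c in np.unique(labels):
            rows.append(labels == c)
            owner.append(m)
    return np.array(rows, dtype=float), np.array(owner)

def jaccard_graph(H):
    """Initial edge weights: cluster-wise Jaccard coefficient."""
    inter = H @ H.T
    sizes = H.sum(axis=1)
    union = sizes[:, None] + sizes[None, :] - inter
    W = inter / union
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def propagate(W, steps=3):
    """Random walk on the cluster graph, then trajectory similarity.
    Assumption: trajectory = [P, P^2, ..., P^steps] rows, compared by cosine."""
    row_sums = W.sum(axis=1, keepdims=True)
    P = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)
    blocks, Pt = [], np.eye(len(W))
    for _ in range(steps):
        Pt = Pt @ P
        blocks.append(Pt)
    T = np.hstack(blocks)
    norms = np.linalg.norm(T, axis=1, keepdims=True)
    T = np.divide(T, norms, out=np.zeros_like(T), where=norms > 0)
    return T @ T.T  # new cluster-wise similarity matrix

def enhanced_co_association(base_labels, steps=3):
    """Map cluster-level similarities back to the object level."""
    H, owner = cluster_membership(base_labels)
    Z = propagate(jaccard_graph(H), steps)
    n = H.shape[1]
    eca = np.zeros((n, n))
    clusterings = np.unique(owner)
    for m in clusterings:
        idx = np.flatnonzero(owner == m)
        Hm = H[idx]                          # clusters of base clustering m
        Zm = Z[np.ix_(idx, idx)].copy()
        np.fill_diagonal(Zm, 1.0)            # assumption: same cluster -> 1
        eca += Hm.T @ Zm @ Hm                # entry (i, j) = Zm[cls(i), cls(j)]
    return eca / len(clusterings)

if __name__ == "__main__":
    # Three toy base clusterings of four objects.
    base = [[0, 0, 1, 1], [0, 0, 0, 1], [0, 1, 1, 1]]
    print(np.round(enhanced_co_association(base), 3))
```

Unlike a plain co-association matrix, which only counts how often two objects share a cluster, this matrix also gives partial credit to object pairs whose clusters are similar after propagation.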

Author and article information

Publication date (created): 30 October 2018

Article

DOI: 10.1109/TSMC.2018.2876202

arXiv ID: 1810.12544

SO-VID: e1ab0f60-ce22-4ee5-9315-761a50fea397

License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/

Custom metadata

Comments: To appear in IEEE Transactions on Systems, Man, and Cybernetics: Systems. The MATLAB source code of this work is available at: http://www.researchgate.net/publication/328581758

Categories: cs.LG, stat.ML

ScienceOpen disciplines: Machine learning, Artificial intelligence
