WoCE: a framework for clustering ensemble by exploiting the wisdom of Crowds theory

Abstract

The Wisdom of Crowds (WOC), a theory from the social sciences, has found a new paradigm in computer science. The WOC theory holds that the aggregate decision made by a group is often better than the decisions of its individual members, provided that specific conditions are satisfied. This paper presents a novel framework for unsupervised and semi-supervised cluster ensembles that exploits the WOC theory. We employ four conditions of the WOC theory, i.e., diversity, independency, decentralization, and aggregation, to guide both the construction of the individual clustering results and their final combination into a clustering ensemble. First, the independency criterion, realized as a novel mapping system on the raw data set, removes the correlation between features in the proposed method. Then, decentralization, as a novel mechanism, generates high-quality individual clustering results. Next, uniformity, a new diversity metric, evaluates the generated clustering results. Finally, a weighted evidence accumulation clustering method is proposed for the final aggregation, without a thresholding procedure. An experimental study on varied data sets demonstrates that the proposed approach achieves superior performance to state-of-the-art methods.

Most cited references 13

Combining multiple clusterings using evidence accumulation.

Ana L. N. Fred, Anil K. Jain (2005)

We explore the idea of evidence accumulation (EAC) for combining the results of multiple clusterings. First, a clustering ensemble, i.e., a set of object partitions, is produced. Given a data set (n objects or patterns in d dimensions), different ways of producing data partitions are: 1) applying different clustering algorithms and 2) applying the same clustering algorithm with different values of parameters or initializations. Further, combinations of different data representations (feature spaces) and clustering algorithms can also provide a multitude of significantly different data partitionings. We propose a simple framework for extracting a consistent clustering, given the various partitions in a clustering ensemble. According to the EAC concept, each partition is viewed as independent evidence of data organization, and the individual data partitions are combined, based on a voting mechanism, to generate a new n x n similarity matrix between the n patterns. The final data partition of the n patterns is obtained by applying a hierarchical agglomerative clustering algorithm on this matrix. We have developed a theoretical framework for the analysis of the proposed clustering combination strategy and its evaluation, based on the concept of mutual information between data partitions. Stability of the results is evaluated using bootstrapping techniques. A detailed discussion of an evidence accumulation-based clustering algorithm, using a split and merge strategy based on the K-means clustering algorithm, is presented. Experimental results of the proposed method on several synthetic and real data sets are compared with other combination strategies, and with individual clustering results produced by well-known clustering algorithms.
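
The voting step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `co_association` and `agglomerate`, the toy partitions, the choice of average linkage, and the fixed number of clusters `k` are all assumptions made for illustration. Each partition casts one vote for every pair of objects it places in the same cluster, the votes are normalized into an n x n similarity matrix, and a simple agglomerative procedure merges the most similar clusters.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of partitions in which each pair of objects shares a cluster."""
    n = len(partitions[0])
    votes = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    votes[i][j] += 1.0
    m = len(partitions)
    return [[v / m for v in row] for row in votes]


def agglomerate(similarity, k):
    """Average-link agglomerative clustering on a similarity matrix (assumed linkage)."""
    clusters = [[i] for i in range(len(similarity))]

    def linkage(pair):
        # Mean similarity between all cross-cluster pairs of objects.
        a, b = pair
        total = sum(similarity[i][j] for i in clusters[a] for j in clusters[b])
        return total / (len(clusters[a]) * len(clusters[b]))

    while len(clusters) > k:
        # combinations yields a < b, so deleting index b leaves index a valid.
        a, b = max(combinations(range(len(clusters)), 2), key=linkage)
        clusters[a].extend(clusters[b])
        del clusters[b]

    labels = [0] * len(similarity)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical ensemble: four partitions of six objects (label values are arbitrary).
partitions = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
]
matrix = co_association(partitions)
consensus = agglomerate(matrix, k=2)
print(consensus)
```

Because the matrix only records whether two objects share a cluster, the label values used by the individual partitions do not need to agree with one another, which is why no label alignment step appears in the sketch.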

Locally Consistent Concept Factorization for Document Clustering

Jiawei Han, Xiaofei He, Deng Cai (2011)

Semi-Supervised Kernel Mean Shift Clustering.

Saket Anand, Sushil Mittal, Oncel Tuzel … (2014)

Mean shift clustering is a powerful nonparametric technique that does not require prior knowledge of the number of clusters and does not constrain the shape of the clusters. However, being completely unsupervised, its performance suffers when the original distance metric fails to capture the underlying cluster structure. Despite recent advances in semi-supervised clustering methods, there has been little effort towards incorporating supervision into mean shift. We propose a semi-supervised framework for kernel mean shift clustering (SKMS) that uses only pairwise constraints to guide the clustering procedure. The points are first mapped to a high-dimensional kernel space where the constraints are imposed by a linear transformation of the mapped points. This is achieved by modifying the initial kernel matrix by minimizing a log det divergence-based objective function. We show the advantages of SKMS by evaluating its performance on various synthetic and real datasets while comparing with state-of-the-art semi-supervised clustering algorithms.
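
For orientation, the following sketch shows plain, unsupervised mean shift with a Gaussian kernel, the baseline this abstract builds on. It does not implement SKMS: the kernel-space mapping, the pairwise constraints, and the log det divergence step are omitted. The names `mean_shift` and `group_modes`, the bandwidth, the merge radius, and the toy points are assumptions chosen for illustration.

```python
import math


def mean_shift(points, bandwidth, iterations=50, tol=1e-6):
    """Shift a copy of every point toward a mode of the Gaussian kernel density."""
    modes = [list(p) for p in points]
    for _ in range(iterations):
        largest_move = 0.0
        for idx, x in enumerate(modes):
            # Gaussian weight of every data point relative to the current position.
            weights = [
                math.exp(-sum((a - b) ** 2 for a, b in zip(x, p)) / (2 * bandwidth ** 2))
                for p in points
            ]
            total = sum(weights)
            # The new position is the weighted mean of the data points.
            new_x = [
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(len(x))
            ]
            largest_move = max(largest_move, math.dist(x, new_x))
            modes[idx] = new_x
        if largest_move < tol:
            break
    return modes


def group_modes(modes, radius):
    """Give the same label to points whose modes lie within `radius` of each other."""
    centers, labels = [], []
    for m in modes:
        for label, c in enumerate(centers):
            if math.dist(m, c) < radius:
                labels.append(label)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels


# Hypothetical data: two well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.1)]
labels = group_modes(mean_shift(points, bandwidth=1.0), radius=0.5)
print(labels)
```

Note that the number of clusters is never passed in; it emerges from how many distinct modes the points converge to, which is the nonparametric property the abstract refers to.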

Author and article information

Publication date (created): 2016-12-20
arXiv ID: 1612.06598
SO-VID: 1002c402-d868-4793-b389-2d7674784119
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Comments: Accepted in IEEE Transactions on Cybernetics
Categories: stat.ML, cs.LG
ScienceOpen disciplines: Artificial intelligence
