A Novel Model on Reinforce K-Means Using Location Division Model and Outlier of Initial Value for Lowering Data Cost


Abstract

Today, semi-structured and unstructured data are collected and analyzed for use in a wide range of systems. Such data are densely distributed in space and usually contain outliers and noise, and clustering algorithms for classifying such data, including the outliers and noise, have been studied extensively. The K-means algorithm is one of the most investigated of these algorithms. Researchers have pointed out several of its problems: the number of clusters, K, is chosen arbitrarily by the analyst; connecting nodes in dense data can bias the classification results; and, depending on how the initial centroids are selected, implementation costs can increase and accuracy can decrease. Many K-means studies have also noted that, when K is too large or too small, outliers may be assigned to external or unrelated clusters rather than to the clusters they actually belong to. The present study therefore analyzed the problems with initial centroid selection in the existing K-means algorithm and investigated a new K-means algorithm for selecting initial centroids. It proposed a method that reduces clustering computation costs by choosing initial center points on the basis of space division and outliers, so that objects do not become dependent on the initial cluster centers and the result depends less on them. Because data containing outliers can lead to inappropriate results when the outliers influence the choice of a cluster's center point, the study also proposed an algorithm that minimizes outlier error rates, based on improved space division and distance measurement. In performance experiments, the proposed algorithm lowered execution costs by about 13–14% compared with previous studies as the volume of clustering data or the number of clusters increased. It also recorded a lower frequency of outliers, a lower effectiveness index (a measure of performance deterioration caused by outliers), and a reduction of outliers by about 60%.
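The abstract describes the approach only at a high level: outliers are kept out of the choice of initial centroids, and the initial centroids are derived from a division of the data space. The following Python sketch illustrates that general idea under stated assumptions. The outlier rule (distance to the global mean above the mean distance plus z standard deviations), the equal-count slicing along the widest axis, and the function names `split_outliers`, `space_division_init`, `assign`, and `kmeans` are hypothetical choices made for illustration; they are not the authors' algorithm, whose details are not given here. The clustering step itself is standard Lloyd's K-means.

```python
import math
from statistics import mean, pstdev


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(pts) for coord in zip(*pts))


def split_outliers(points, z=2.0):
    """Hypothetical rule: a point is an outlier if its distance to the global
    mean exceeds mean(d) + z * std(d), where d are all such distances."""
    center = centroid(points)
    dists = [math.sqrt(sq_dist(p, center)) for p in points]
    cutoff = mean(dists) + z * pstdev(dists)
    inliers = [p for p, d in zip(points, dists) if d <= cutoff]
    outliers = [p for p, d in zip(points, dists) if d > cutoff]
    return inliers, outliers


def space_division_init(points, k):
    """Hypothetical space division: sort points along the axis with the widest
    range, cut them into k equal-count slices, use each slice mean as a seed."""
    if len(points) < k:
        raise ValueError("need at least k points to seed k clusters")
    dims = range(len(points[0]))
    axis = max(dims, key=lambda d: max(p[d] for p in points) - min(p[d] for p in points))
    ordered = sorted(points, key=lambda p: p[axis])
    n = len(ordered)
    bounds = [i * n // k for i in range(k + 1)]  # every slice is non-empty when n >= k
    return [centroid(ordered[bounds[i]:bounds[i + 1]]) for i in range(k)]


def assign(points, centers):
    """Index of the nearest center for each point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j])) for p in points]


def kmeans(points, k, iters=100, z=2.0):
    """Lloyd's K-means on the inliers only, seeded by space_division_init.
    Returns (centers, [(point, label), ...], outliers)."""
    inliers, outliers = split_outliers(points, z)
    centers = space_division_init(inliers, k)
    for _ in range(iters):
        labels = assign(inliers, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lbl in zip(inliers, labels) if lbl == j]
            # Keep the old center if a cluster becomes empty.
            new_centers.append(centroid(members) if members else centers[j])
        if new_centers == centers:
            break  # converged: assignments no longer move the centers
        centers = new_centers
    labels = assign(inliers, centers)  # labels consistent with the final centers
    return centers, list(zip(inliers, labels)), outliers


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11), (100, 100)]
    centers, labeled, outliers = kmeans(data, k=2)
    print(centers)   # [(0.5, 0.5), (10.5, 10.5)]
    print(outliers)  # [(100, 100)]
```

In this sketch the point (100, 100) is flagged before seeding, so it cannot drag either initial centroid away from the two dense groups. Outliers are returned separately rather than forced into a cluster; a caller could instead attach them to the nearest final center if every point must be labeled.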


Author and article information

Journal

Journal ID (nlm-ta): Entropy (Basel)

Journal ID (iso-abbrev): Entropy (Basel)

Journal ID (publisher-id): entropy

Title: Entropy

Publisher: MDPI

ISSN (Electronic): 1099-4300

Publication date (Electronic): 17 August 2020

Publication date Collection: August 2020

Volume: 22

Issue: 8

Electronic Location Identifier: 902

Affiliations

[1] School of Creative Convergence, Andong National University, Andong 36729, Korea; jungsh@anu.ac.kr

[2] School of Computer Engineering, Youngsan University, 288 Junam-Ro, Yangsan, Gyeongnam 50510, Korea

[3] Department of Data Informatics, (National) Korea Maritime and Ocean University, Busan 49112, Korea

Author notes

[*] Correspondence: mohan@ysu.ac.kr (H.L.); 72networks@pukyong.ac.kr or 72networks@kmou.ac.kr (J.-H.H.)

Author information

Se-Hoon Jung https://orcid.org/0000-0002-1776-9823

Hansung Lee https://orcid.org/0000-0002-6519-4120

Article

Publisher ID: entropy-22-00902

DOI: 10.3390/e22080902

PMC ID: 7517527

SO-VID: 3d0c9988-26f0-488f-8aef-7997884d514b

License:

Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

History

Date received : 18 June 2020

Date accepted : 11 August 2020


Cited by 1

Psychosocial Factors and Psychological Characteristics of Personality of Patients with Chronic Diseases Using Artificial Intelligence Data Mining Technology and Wireless Network Cloud Service Platform
Authors: Kangqi An
