Is Open Access

A Validation and Quality Assessment Method with Metamorphic Relations for Unsupervised Machine Learning Software

Preprint



Abstract

Unsupervised machine learning is the task of modeling the underlying structure of "unlabeled data". Since learning algorithms have been incorporated into many real-world applications, validating the implementations of those algorithms has become much more important for software quality assurance. However, validating unsupervised machine learning programs is challenging because prior knowledge is lacking. In this paper, we present a metamorphic-testing-based method for validating and characterizing unsupervised machine learning programs, and conduct an empirical study on a real-world machine learning tool. The results demonstrate to what extent a program fits the profile of a specific scenario, which helps end users and software practitioners comprehend its performance in a vivid and lightweight way. The experimental findings also reveal a gap between theory and implementation in a software artifact, one that could easily be overlooked by people without much practical experience. In our method, metamorphic relations serve both as a type of quality measure and as a guideline for selecting suitable programs.
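To make the idea of a metamorphic relation concrete, the following is a minimal sketch in Python. It is an illustration only, not the method or tool evaluated in the paper: the `cluster` function is a hypothetical toy program under test (distance-threshold clustering), and the two relations checked (invariance to input order and to translation) are common textbook examples chosen as assumptions here, not necessarily the relations the authors use. The point is that no labeled ground truth is needed; each check only compares the output on a source input with the output on a transformed follow-up input.

```python
import random
from itertools import combinations


def cluster(points, eps=1.5):
    """Hypothetical program under test: points within eps are linked (union-find)."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        (x1, y1), (x2, y2) = points[i], points[j]
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 <= eps ** 2:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(points))]


def partition(points, labels):
    """Label-independent view of a clustering (assumes distinct points)."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, set()).add(p)
    return {frozenset(g) for g in groups.values()}


def mr_permutation_holds(points, seed=0):
    """MR: shuffling the input order must not change the partition."""
    source = partition(points, cluster(points))
    shuffled = points[:]
    random.Random(seed).shuffle(shuffled)
    follow_up = partition(shuffled, cluster(shuffled))
    return source == follow_up


def mr_translation_holds(points, dx=7, dy=-3):
    """MR: shifting every point by the same offset must not change the grouping."""
    source = partition(points, cluster(points))
    moved = [(x + dx, y + dy) for x, y in points]
    follow_up = partition(moved, cluster(moved))
    # Shift the follow-up clusters back so they can be compared with the source.
    restored = {frozenset((x - dx, y - dy) for x, y in g) for g in follow_up}
    return source == restored


# Integer coordinates keep the translation check free of rounding effects.
data = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
results = {
    "permutation": mr_permutation_holds(data),
    "translation": mr_translation_holds(data),
}
print(results)
```

A relation that fails for a given program does not necessarily indicate a bug; in the spirit of the abstract, the set of relations a program satisfies can be read as a profile of which usage scenarios it suits.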


Author and article information

Journal
27 July 2018
Article
1807.10453