
      A Validation and Quality Assessment Method with Metamorphic Relations for Unsupervised Machine Learning Software

      Preprint

          Abstract

          Unsupervised machine learning is the task of modeling the underlying structure of "unlabeled data". Since learning algorithms have been incorporated into many real-world applications, validating the implementations of those algorithms has become increasingly important for software quality assurance. However, validating unsupervised machine learning programs is challenging because of the lack of a priori knowledge. To address this, we present a metamorphic-testing-based method for validating and characterizing unsupervised machine learning programs, and conduct an empirical study on a real-world machine learning tool. The results show to what extent a program fits the profile of a specific scenario, which helps end-users and software practitioners understand its performance in a vivid and lightweight way. The experimental findings also reveal a gap between theory and implementation in a software artifact, one that could easily be overlooked by people without much practical experience. In our method, metamorphic relations can serve both as a type of quality measure and as a guideline for selecting suitable programs.
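          The abstract does not list the specific metamorphic relations the paper uses. As a minimal sketch of the general idea only, assuming a clustering program as the unsupervised learner and two generic relations chosen here for illustration (input-order permutation and translation invariance, not taken from the paper), a check runs the program on a source input and on a transformed follow-up input, then compares the two resulting partitions up to label renaming. The names `single_linkage`, `order_dependent`, `mr_permutation`, and `mr_translation` are hypothetical stand-ins, not the paper's code or any library's API.

```python
import math
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
# Hypothetical interface: a clusterer maps points and a cluster count to one label per point.
Clusterer = Callable[[Sequence[Point], int], List[int]]


def single_linkage(points: Sequence[Point], k: int) -> List[int]:
    """Toy program under test: single-linkage clustering via Kruskal's algorithm."""
    n = len(points)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Consider point pairs from closest to farthest, merging until k clusters remain.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    components = n
    for _, i, j in edges:
        if components <= k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1
    return [find(i) for i in range(n)]


def order_dependent(points: Sequence[Point], k: int) -> List[int]:
    """Deliberately faulty clusterer: labels depend only on input position."""
    n = len(points)
    return [min(i * k // n, k - 1) for i in range(n)]


def as_partition(labels: Sequence[int]) -> set:
    """Forget label names; keep only which point indices share a cluster."""
    groups: dict = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, set()).add(idx)
    return {frozenset(g) for g in groups.values()}


def mr_permutation(cluster: Clusterer, points, k: int, perm: Sequence[int]) -> bool:
    """Illustrative MR: reordering the input should not change the partition."""
    source = as_partition(cluster(points, k))
    follow_labels = cluster([points[p] for p in perm], k)
    # Map each follow-up label back to the original index of that point.
    back = [0] * len(points)
    for pos, orig_idx in enumerate(perm):
        back[orig_idx] = follow_labels[pos]
    return as_partition(back) == source


def mr_translation(cluster: Clusterer, points, k: int, offset: Point) -> bool:
    """Illustrative MR: shifting every point by the same vector should not change the partition."""
    source = as_partition(cluster(points, k))
    shifted = [tuple(c + d for c, d in zip(p, offset)) for p in points]
    return as_partition(cluster(shifted, k)) == source


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    perm = [0, 3, 1, 4, 2, 5]
    print(mr_permutation(single_linkage, data, 2, perm))        # True
    print(mr_translation(single_linkage, data, 2, (100, -50)))  # True
    print(mr_permutation(order_dependent, data, 2, perm))       # False
```

          Comparing partitions rather than raw labels matters because cluster labels are arbitrary: a program that merely renames clusters violates neither relation. Translation invariance is only a reasonable expectation for distance-based algorithms, so in this sketch each relation describes an expected property of a particular kind of program rather than a universal rule.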


Author and article information

Date: 27 July 2018
arXiv: 1807.10453
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Note: Zhiyi Zhang and Xiaoyuan Xie are the co-first authors.
Subject: cs.SE (Software engineering)
