      Is Open Access

      Systematic Construction of Anomaly Detection Benchmarks from Real Data

      Preprint


          Abstract

          Research in anomaly detection suffers from a lack of realistic, publicly available datasets. As a result, most published experiments validate their algorithms with application-specific case studies or with benchmark datasets of the researchers' own construction. This makes it difficult to compare methods or to measure progress in the field, and it limits our ability to understand the factors that determine the performance of anomaly detection algorithms. This article proposes a new methodology for the empirical analysis and evaluation of anomaly detection algorithms. It is based on generating thousands of benchmark datasets by transforming existing supervised learning benchmark datasets and manipulating properties relevant to anomaly detection. The paper identifies and validates four important dimensions: (a) point difficulty, (b) relative frequency of anomalies, (c) clusteredness of anomalies, and (d) relevance of features. We apply our generated datasets to analyze several leading anomaly detection algorithms. The evaluation verifies the importance of these dimensions and shows that, while some algorithms are clearly superior to others, anomaly detection accuracy is determined more by variation in the four dimensions than by the choice of algorithm.
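As an illustration of the general idea only, the sketch below turns a labeled dataset into an anomaly detection benchmark by treating chosen classes as anomalies and subsampling them to a target relative frequency, which is dimension (b) above. The abstract does not describe the paper's actual procedure, so the function name `make_anomaly_benchmark`, its parameters, and the toy data are all hypothetical. Point difficulty, clusteredness, and feature relevance are not modeled here.

```python
import random


def make_anomaly_benchmark(rows, labels, anomaly_classes, anomaly_rate, seed=0):
    """Hypothetical helper: build an anomaly benchmark from labeled data.

    Points whose label is in `anomaly_classes` are candidate anomalies and
    all other points are treated as normal. Candidates are subsampled so
    that anomalies make up roughly `anomaly_rate` of the returned data.
    """
    if not 0.0 < anomaly_rate < 1.0:
        raise ValueError("anomaly_rate must be strictly between 0 and 1")
    rng = random.Random(seed)

    normal = [i for i, lab in enumerate(labels) if lab not in anomaly_classes]
    candidates = [i for i, lab in enumerate(labels) if lab in anomaly_classes]

    # Choose k so that k / (k + len(normal)) is close to anomaly_rate,
    # capped by the number of candidate anomalies actually available.
    k = round(anomaly_rate * len(normal) / (1.0 - anomaly_rate))
    k = min(k, len(candidates))
    chosen = set(rng.sample(candidates, k))

    # Keep every normal point plus the sampled anomalies, in random order.
    keep = normal + sorted(chosen)
    rng.shuffle(keep)
    bench_rows = [rows[i] for i in keep]
    is_anomaly = [1 if i in chosen else 0 for i in keep]
    return bench_rows, is_anomaly


# Toy three-class dataset (made-up values, for illustration only).
data_rng = random.Random(42)
rows = [[data_rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(1000)]
labels = [i % 3 for i in range(1000)]

bench_rows, is_anomaly = make_anomaly_benchmark(
    rows, labels, anomaly_classes={2}, anomaly_rate=0.05
)
print(len(bench_rows), sum(is_anomaly) / len(is_anomaly))
```

Repeating such a call over a grid of anomaly rates, source datasets, and class choices is one way to obtain many benchmark variants from a single supervised dataset.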


          Author and article information

          Journal
          1503.01158

          Machine learning, Artificial intelligence
