      Is Open Access

      Preprocessing Methods and Pipelines of Data Mining: An Overview

      Preprint


          Abstract

          Data mining is about obtaining new knowledge from existing datasets. However, the data in existing datasets can be scattered, noisy, and even incomplete. Although much effort is spent on developing or fine-tuning data mining models to make them more robust to noise in the input data, their quality still depends strongly on the quality of that data. The article starts with an overview of the data mining pipeline, in which the procedures in a data mining task are briefly introduced. It then gives an overview of data preprocessing techniques, which are categorized as data cleaning, data transformation, and data preprocessing. Detailed preprocessing methods, as well as their influence on data mining models, are covered in this article.
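          As a rough illustration of two of the categories named in the abstract, the sketch below chains a data cleaning step and a data transformation step. The abstract does not say which concrete methods the article uses; mean imputation and min-max scaling are assumptions chosen here for brevity, and the function names `clean`, `min_max_scale`, and `preprocess` are hypothetical.

```python
from statistics import mean


def clean(values):
    """Data cleaning (assumed method): fill missing entries (None) with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Data transformation (assumed method): rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map every entry to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def preprocess(values):
    """Run cleaning first, then transformation, as a two-stage pipeline."""
    return min_max_scale(clean(values))


raw = [2.0, None, 4.0, 6.0]  # toy column with one missing value
print(preprocess(raw))
```

          Cleaning runs before scaling so that the imputed value is rescaled together with the observed ones; a real pipeline would pick the methods to suit the data and the downstream model.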


                Author and article information

                Journal
                20 June 2019
                Article
                1906.08510
                89beef1a-13d8-4744-9fa6-545bba2eae78

                http://arxiv.org/licenses/nonexclusive-distrib/1.0/

                History
                Custom metadata
                7 pages, 3 figures, IEEE conference format
                cs.LG cs.DB stat.ML

                Databases, Machine learning, Artificial intelligence
