
      Taming Wild High Dimensional Text Data with a Fuzzy Lash

      Preprint


          Abstract

          The bag of words (BOW) represents a corpus as a matrix whose elements are word frequencies. However, each row of this matrix is a very high-dimensional sparse vector. Dimension reduction (DR) is a popular way to address sparsity and high dimensionality. Among the strategies for developing DR methods, Unsupervised Feature Transformation (UFT) is a popular one: it maps all words onto a new basis on which the BOW is represented. The recent growth of text data and the challenges it brings suggest that the DR area still needs new perspectives. Although a wide range of methods based on the UFT strategy have been developed, the fuzzy approach has not yet been considered for DR under this strategy. This research investigates fuzzy clustering as a UFT-based DR method that collapses the BOW matrix into a lower-dimensional representation of the documents, in place of their representation by the individual words of the corpus. The quantitative evaluation shows that fuzzy clustering yields better performance and features than Principal Component Analysis (PCA) and Singular Value Decomposition (SVD), two popular UFT-based DR methods.
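The abstract does not specify the clustering algorithm or how cluster memberships become document features, so the following is only a minimal sketch under stated assumptions. It assumes standard fuzzy c-means (fuzzifier m = 2) applied to the word vectors, that is, the columns of the BOW matrix, and assumes each document is mapped onto the word clusters by multiplying its BOW row by the word-membership matrix U. The helper names `bag_of_words`, `fuzzy_c_means` and `fuzzy_reduce`, the toy corpus, and the choice of two clusters are hypothetical illustrations of the UFT idea, not the authors' implementation.

```python
import math
import random
from collections import Counter


def bag_of_words(docs):
    """Build a BOW matrix: one row per document, one column per vocabulary
    word, entries are raw term frequencies (whitespace tokenization)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: j for j, w in enumerate(vocab)}
    bow = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for w, count in Counter(toks).items():
            row[index[w]] = count
        bow.append(row)
    return vocab, bow


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means. Returns an n x c membership matrix U whose
    rows sum to 1 (U[i][k] = degree to which point i belongs to cluster k)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalized so each row sums to 1.
    U = []
    for _ in range(n):
        r = [rng.random() for _ in range(c)]
        total = sum(r)
        U.append([v / total for v in r])
    power = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        # Centers are membership-weighted means, with weights u^m.
        centers = []
        for k in range(c):
            w = [U[i][k] ** m for i in range(n)]
            denom = max(sum(w), 1e-12)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / denom
                            for d in range(dim)])
        # Memberships: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1)).
        new_U = []
        for i in range(n):
            dists = [math.dist(points[i], ctr) for ctr in centers]
            if 0.0 in dists:
                # Point sits exactly on a center: split membership among those centers.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                new_U.append([h / sum(hits) for h in hits])
            else:
                new_U.append([1.0 / sum((dists[k] / dists[j]) ** power for j in range(c))
                              for k in range(c)])
        shift = max(abs(new_U[i][k] - U[i][k]) for i in range(n) for k in range(c))
        U = new_U
        if shift < tol:
            break
    return U


def fuzzy_reduce(bow, U):
    """Collapse an n_docs x n_words BOW matrix to n_docs x c by computing
    bow @ U, i.e. spreading each document's word counts over the word clusters."""
    c = len(U[0])
    return [[sum(row[i] * U[i][k] for i in range(len(row))) for k in range(c)]
            for row in bow]


docs = [
    "fuzzy clustering groups words",
    "fuzzy sets model vague membership",
    "pca and svd reduce dimension",
    "svd factorizes the matrix",
]
vocab, bow = bag_of_words(docs)
word_vectors = [list(col) for col in zip(*bow)]  # one vector per word (BOW columns)
U = fuzzy_c_means(word_vectors, c=2)
reduced = fuzzy_reduce(bow, U)
print(len(vocab), "->", len(reduced[0]))  # prints: 16 -> 2
```

Because every row of U sums to 1, each reduced document row still sums to that document's word count; the sparse 4 x 16 BOW becomes a dense 4 x 2 matrix. A PCA or SVD baseline would instead project the rows onto orthogonal directions, whereas the fuzzy basis here is a soft grouping of words.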


                Author and article information

                Date: 16 December 2017
                arXiv: 1712.05997
                License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/

                Subjects: stat.ML cs.CL cs.IR cs.LG stat.AP
