      Distributed Representations of Sentences and Documents

      Preprint

          Abstract

          Many machine learning algorithms require the input to be represented as a fixed-length feature vector. When it comes to texts, one of the most common fixed-length features is bag-of-words. Despite their popularity, bag-of-words features have two major weaknesses: they lose the ordering of the words and they also ignore semantics of the words. For example, "powerful," "strong" and "Paris" are equally distant. In this paper, we propose Paragraph Vector, an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of texts, such as sentences, paragraphs, and documents. Our algorithm represents each document by a dense vector which is trained to predict words in the document. Its construction gives our algorithm the potential to overcome the weaknesses of bag-of-words models. Empirical results show that Paragraph Vectors outperform bag-of-words models as well as other techniques for text representations. Finally, we achieve new state-of-the-art results on several text classification and sentiment analysis tasks.
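The abstract's two claims can be illustrated concretely. First, one-hot (bag-of-words style) word vectors put every pair of distinct words at the same distance. Second, a dense per-document vector can be trained to predict the words occurring in that document. The sketch below is a minimal, simplified illustration in the spirit of the paper's distributed bag-of-words (PV-DBOW) variant. It assumes a full softmax over a tiny vocabulary and plain per-word gradient descent. The function names (`one_hot_distances`, `softmax`, `train_pv_dbow`), the toy corpus, the vector size, learning rate and epoch count are all hypothetical choices for illustration. They are not the authors' implementation or hyperparameters.

```python
import numpy as np


def one_hot_distances(vocab, words):
    """Pairwise Euclidean distances between one-hot word vectors.

    Illustrates the bag-of-words weakness: every distinct pair is sqrt(2) apart.
    """
    index = {w: i for i, w in enumerate(vocab)}
    eye = np.eye(len(vocab))
    vecs = {w: eye[index[w]] for w in words}
    return {
        (a, b): float(np.linalg.norm(vecs[a] - vecs[b]))
        for i, a in enumerate(words)
        for b in words[i + 1:]
    }


def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def train_pv_dbow(docs, dim=8, epochs=200, lr=0.1, seed=0):
    """Hypothetical toy PV-DBOW-style trainer (not the paper's code).

    Each document k gets a dense vector D[k]. Together with a shared output
    matrix W, D[k] is trained so that softmax(W @ D[k]) puts high probability
    on the words that occur in document k (cross-entropy loss, full softmax,
    no hierarchical softmax or negative sampling).
    """
    rng = np.random.default_rng(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    D = rng.normal(scale=0.1, size=(len(docs), dim))   # document vectors
    W = rng.normal(scale=0.1, size=(len(vocab), dim))  # output word weights
    for _ in range(epochs):
        for k, doc in enumerate(docs):
            for w in doc:
                p = softmax(W @ D[k])
                grad_logits = p.copy()
                grad_logits[index[w]] -= 1.0          # dLoss/dlogits = p - onehot
                grad_d = W.T @ grad_logits            # computed before W changes
                W -= lr * np.outer(grad_logits, D[k])
                D[k] -= lr * grad_d
    return vocab, D, W


if __name__ == "__main__":
    # Hypothetical toy corpus, already lowercased and tokenized.
    docs = [
        ["the", "team", "was", "strong", "and", "powerful"],
        ["a", "powerful", "strong", "engine"],
        ["paris", "is", "in", "france"],
    ]
    vocab, D, W = train_pv_dbow(docs)
    print(one_hot_distances(vocab, ["powerful", "strong", "paris"]))
    for k in range(len(docs)):
        p = softmax(W @ D[k])
        top = [vocab[i] for i in np.argsort(p)[::-1][:3]]
        print(k, top)  # three words each document vector predicts most strongly
```

Each row of `D` has the same length regardless of how many words its document contains, which is the fixed-length property the abstract emphasizes. The paper's other components (the PV-DM variant, its training objective details, and inference of vectors for unseen documents) are not reproduced in this sketch.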

                Author and article information

                arXiv: 1405.4053
                Dates: 2014-05-16, 2014-05-22
                License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
                arXiv categories: cs.CL cs.AI cs.LG
                Subjects: Theoretical computer science, Artificial intelligence