Learning Document Embeddings by Predicting N-grams for Sentiment
  Classification of Long Movie Reviews

There is no author summary for this article yet. Authors can add summaries to their articles on ScienceOpen to make them more accessible to a non-specialist audience.

Abstract

Despite the loss of semantic information, bag-of-ngram based methods still achieve state-of-the-art results for tasks such as sentiment classification of long movie reviews. Many document embeddings methods have been proposed to capture semantics, but they still can't outperform bag-of-ngram based methods on this task. In this paper, we modify the architecture of the recently proposed Paragraph Vector, allowing it to learn document vectors by predicting not only words, but n-gram features as well. Our model is able to capture both semantics and word order in documents while keeping the expressive power of learned vectors. Experimental results on IMDB movie review dataset shows that our model outperforms previous deep learning models and bag-of-ngram based models due to the above advantages. More robust results are also obtained when our model is combined with other models. The source code of our model will be also published together with this paper.

Related collections

Author and article information

Journal

Publication date Created: 2015-12-27

Publication date Updated: 2016-04-23

Article

ArXiV ID: 1512.08183

SO-VID: 29eee304-15df-4574-abb2-38386c0ce111

License:

http://arxiv.org/licenses/nonexclusive-distrib/1.0/

History

Custom metadata

Categories cs.CL

ScienceOpen disciplines: Theoretical computer science

Data availability:

ScienceOpen disciplines: Theoretical computer science

Learning Document Embeddings by Predicting N-grams for Sentiment Classification of Long Movie Reviews

Read this article at

Abstract

Related collections

Blockchain in Healthcare Today

Author and article information

Journal

Article

History

Custom metadata

Comments

Comment on this article

Similar content 153