Rehabilitation of Count-based Models for Word Vector Representations

Abstract

Recent work on word representations mostly relies on predictive models. Distributed word representations (aka word embeddings) are trained to optimally predict the contexts in which the corresponding words tend to appear. Such models have succeeded in capturing word similarities as well as semantic and syntactic regularities. Instead, we aim at reviving interest in a model based on counts. We present a systematic study of the use of the Hellinger distance to extract semantic representations from the word co-occurrence statistics of large text corpora. We show that this distance gives good performance on word similarity and analogy tasks, given a proper type and size of context and a dimensionality reduction based on a stochastic low-rank approximation. Besides being both simple and intuitive, this method also provides an encoding function which can be used to infer representations for unseen words or phrases. This is a clear advantage over predictive models, which must be trained on these new words.
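The pipeline described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the window size, the toy corpus, and the use of a randomized range finder followed by an SVD (without mean-centering) as the "stochastic low-rank approximation" are all assumptions made for illustration. The only fixed ingredient is the Hellinger distance between two discrete distributions P and Q, which is the Euclidean distance between their element-wise square roots divided by sqrt(2).

```python
import numpy as np

def cooccurrence_probs(tokens, vocab, context_words, window=1):
    """Count context words within `window` positions of each vocabulary word,
    then normalise each row into a distribution P(context | word)."""
    w_index = {w: i for i, w in enumerate(vocab)}
    c_index = {c: j for j, c in enumerate(context_words)}
    counts = np.zeros((len(vocab), len(context_words)))
    for pos, tok in enumerate(tokens):
        if tok not in w_index:
            continue
        lo, hi = max(0, pos - window), min(len(tokens), pos + window + 1)
        for ctx_pos in range(lo, hi):
            if ctx_pos != pos and tokens[ctx_pos] in c_index:
                counts[w_index[tok], c_index[tokens[ctx_pos]]] += 1
    totals = counts.sum(axis=1, keepdims=True)
    # Rows with no observed context stay all-zero instead of dividing by zero.
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    return np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2)

def fit_projection(probs, dim, oversample=5, seed=0):
    """Assumed reduction step: randomized range finder + SVD on sqrt(probs).
    Returns a (n_contexts, dim) matrix of approximate top right singular vectors."""
    rng = np.random.default_rng(seed)
    a = np.sqrt(probs)
    k = min(dim + oversample, min(a.shape))
    omega = rng.standard_normal((a.shape[1], k))   # random test matrix
    q, _ = np.linalg.qr(a @ omega)                 # orthonormal basis for the range
    _, _, vt = np.linalg.svd(q.T @ a, full_matrices=False)
    return vt[:dim].T

def encode(prob_rows, projection):
    """Encoding function: square-root transform, then linear projection.
    It applies unchanged to distributions of words or phrases not seen at fit time."""
    return np.sqrt(prob_rows) @ projection

# Toy usage (hypothetical corpus and vocabulary).
tokens = "the cat sat on the mat the dog sat on the rug".split()
vocab = ["cat", "dog", "mat"]
contexts = ["the", "sat", "on"]
probs = cooccurrence_probs(tokens, vocab, contexts, window=1)
projection = fit_projection(probs, dim=2)
vectors = encode(probs, projection)
```

Because the square-root transform turns Hellinger distance into a scaled Euclidean distance, comparing rows of `vectors` with the ordinary Euclidean norm approximates the Hellinger distance between the original distributions. A new word or phrase only needs its own co-occurrence distribution over the same context words to be passed through `encode`; no retraining is involved in this sketch.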

Most cited references

Contextual correlates of synonymy

Herbert Rubenstein, John B Goodenough (1965)

A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge.

Thomas Landauer, Susan Dumais (1997)

Extracting semantic representations from word co-occurrence statistics: a computational study.

P. Levy, John Bullinaria (2007)

The idea that at least some aspects of word meaning can be induced from patterns of word co-occurrence is becoming increasingly popular. However, there is less agreement about the precise computations involved, and the appropriate tests to distinguish between the various possibilities. It is important that the effects of the relevant design choices and parameter values are understood if psychological models using these methods are to be reliably evaluated and compared. In this article, we present a systematic exploration of the principal computational possibilities for formulating and validating representations of word meanings from word co-occurrence statistics. We find that, once we have identified the best procedures, a very simple approach is surprisingly successful and robust over a range of psychologically relevant evaluation measures.