Question

I am working on a recurrent language model. To learn word embeddings that can be used to initialize my language model, I am using gensim's word2vec model. After training, the word2vec model holds two vectors for each word in the vocabulary: the word embedding (a row of the input-to-hidden weight matrix) and the context embedding (a column of the hidden-to-output weight matrix).
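For reference, here is a minimal sketch of how both sets of vectors can be read out of a trained model, assuming gensim 4.x and skip-gram with negative sampling; the toy corpus and parameter values are placeholders:

    # Access both the word (input) vectors and the context (output) vectors.
    # Assumes gensim 4.x; syn1neg only exists when negative sampling is used.
    from gensim.models import Word2Vec

    sentences = [["the", "quick", "brown", "fox"], ["the", "lazy", "dog"]]  # toy corpus
    model = Word2Vec(sentences, vector_size=50, window=2, min_count=1,
                     sg=1, negative=5, epochs=10)

    word = "fox"
    idx = model.wv.key_to_index[word]

    word_vec = model.wv[word]         # row of the input->hidden matrix
    context_vec = model.syn1neg[idx]  # context vector (gensim stores the output weights word-by-row)
    print(word_vec.shape, context_vec.shape)  # (50,), (50,)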

As outlined in this post, there are at least three common ways to combine these two embedding vectors (a short code sketch of all three follows the list below):

  1. summing the context and word vector for each word
  2. summing and then averaging (i.e., taking the mean of the two vectors)
  3. concatenating the context and word vector
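For concreteness, a NumPy sketch of the three options; word_vec and context_vec are stand-ins for the two vectors of a single word (e.g. the ones retrieved in the snippet above):

    import numpy as np

    # Stand-ins for the two vectors of one word (e.g. model.wv[w] and model.syn1neg[idx]).
    word_vec = np.random.rand(50)
    context_vec = np.random.rand(50)

    summed = word_vec + context_vec                         # 1. sum
    averaged = (word_vec + context_vec) / 2.0               # 2. sum and average (mean)
    concatenated = np.concatenate([word_vec, context_vec])  # 3. concatenate (doubles the dimension)

    print(summed.shape, averaged.shape, concatenated.shape)  # (50,) (50,) (100,)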

However, I couldn't find proper papers or reports on the best strategy. So my questions are:

  1. Is there a commonly accepted way to decide whether to sum, average, or concatenate the vectors?
  2. Or does the best way depend entirely on the task in question? If so, what strategy is best for a word-level language model?
  3. Why combine the vectors at all? Why not use the "original" word embeddings for each word, i.e., those contained in the weight matrix between the input and hidden layers?

Related (but unanswered) questions:

  • word2vec: Summing/concatenate inside and outside vector
  • why we use input-hidden weight matrix to be the word vectors instead of hidden-output weight matrix?

Answers

I found an answer in the Stanford lecture "Deep Learning for Natural Language Processing" (Lecture 2, March 2016). It's available here. At around minute 46, Richard Socher states that the common way is to average the two word vectors.
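Applied to the original use case (initializing a language model), the averaging can be done for the whole vocabulary at once. A sketch assuming the gensim 4.x layout, in which model.wv.vectors and model.syn1neg share the same shape and row order:

    # Build an averaged embedding matrix for initializing the language model's
    # embedding layer. Assumes gensim 4.x and negative sampling (syn1neg).
    from gensim.models import Word2Vec

    sentences = [["the", "quick", "brown", "fox"], ["the", "lazy", "dog"]]  # toy corpus
    model = Word2Vec(sentences, vector_size=50, window=2, min_count=1,
                     sg=1, negative=5, epochs=10)

    # model.wv.vectors and model.syn1neg have the same shape and row order.
    avg_embeddings = (model.wv.vectors + model.syn1neg) / 2.0  # (vocab_size, vector_size)

    # avg_embeddings[i] belongs to model.wv.index_to_key[i]; the matrix can be
    # copied into the embedding layer of the recurrent language model.
    print(avg_embeddings.shape)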
