tf.nn.embedding_lookup(params, ids, partition_strategy='mod', name=None)
I can't understand what this function does. Is it like a lookup table, i.e. does it return the parameters corresponding to each id (in ids)?
For instance, in the skip-gram model, if we use tf.nn.embedding_lookup(embeddings, train_inputs), does it find the corresponding embedding for each train_input?
Yes, this function is hard to understand, until you get the point.
In its simplest form, it is similar to tf.gather. It returns the elements of params according to the indexes specified by ids.

For example (assuming you are inside tf.InteractiveSession()), looking up ids [0, 1, 2, 3] in a params tensor [10, 20, 30, 40] would return [10 20 30 40], because the first element (index 0) of params is 10, the second element of params (index 1) is 20, and so on. Similarly, looking up ids [1, 1, 3] would return [20 20 40].
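One way to write those two lookups (a sketch using the TF 1.x API):

    import tensorflow as tf  # TF 1.x; in TF 2.x use tf.compat.v1

    params = tf.constant([10, 20, 30, 40])

    sess = tf.InteractiveSession()
    print(tf.nn.embedding_lookup(params, tf.constant([0, 1, 2, 3])).eval())  # [10 20 30 40]
    print(tf.nn.embedding_lookup(params, tf.constant([1, 1, 3])).eval())     # [20 20 40]
    sess.close()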
But embedding_lookup is more than that. The params argument can be a list of tensors, rather than a single tensor.

In such a case, the indexes specified in ids correspond to elements of those tensors according to a partition strategy, where the default partition strategy is 'mod'.

In the 'mod' strategy, index 0 corresponds to the first element of the first tensor in the list. Index 1 corresponds to the first element of the second tensor. Index 2 corresponds to the first element of the third tensor, and so on. Simply put, index i corresponds to the first element of the (i+1)-th tensor, for all the indexes 0..(n-1), assuming params is a list of n tensors.

Now, index n cannot correspond to tensor n+1, because the list params contains only n tensors. So index n corresponds to the second element of the first tensor. Similarly, index n+1 corresponds to the second element of the second tensor, and so on.

So, consider the code sketched below.
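(This sketch is a reconstruction: params1 = [1, 2] and params2 = [10, 20] follow from the correspondences listed after it, and ids = [0, 1, 2, 3] is chosen here for illustration.)

    import tensorflow as tf  # TF 1.x; in TF 2.x use tf.compat.v1

    params1 = tf.constant([1, 2])
    params2 = tf.constant([10, 20])
    ids = tf.constant([0, 1, 2, 3])

    result = tf.nn.embedding_lookup([params1, params2], ids)  # default partition_strategy='mod'

    with tf.Session() as sess:
        print(sess.run(result))   # [ 1 10  2 20]

In this code: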
index 0 corresponds to the first element of the first tensor: 1
index 1 corresponds to the first element of the second tensor: 10
index 2 corresponds to the second element of the first tensor: 2
index 3 corresponds to the second element of the second tensor: 20
Thus, for ids = [0, 1, 2, 3] as in the sketch above, the result would be [ 1 10  2 20].
The embedding_lookup function retrieves rows of the params tensor. The behavior is similar to using indexing with arrays in numpy, e.g.:
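(A numpy-only sketch; the array shape and ids here are just illustrative values.)

    import numpy as np

    matrix = np.random.random([1024, 64])   # 1024 rows of 64-dimensional embeddings
    ids = np.array([0, 5, 17, 33])

    print(matrix[ids])                       # shape [4, 64]: the four selected rows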
The params argument can also be a list of tensors, in which case the ids will be distributed among the tensors. For example, given a list of 3 tensors of shape [2, 64], the default behavior is that they will represent the ids [0, 3], [1, 4], and [2, 5], respectively.

partition_strategy controls how the ids are distributed among the list. The partitioning is useful for larger-scale problems, when the matrix might be too large to keep in one piece.

Another way to look at it: assume that you flatten out the tensors into a one-dimensional array, and then you perform a lookup.
For example, take Tensor0 = [1, 2, 3], Tensor1 = [4, 5, 6], Tensor2 = [7, 8, 9].
The flattened-out tensor will be [1, 4, 7, 2, 5, 8, 3, 6, 9].
Now when you do a lookup of [0, 3, 4, 1, 7] it will yield [1, 2, 5, 4, 6].
That is, if the lookup value is 7, for example, and we have 3 tensors (or a tensor with 3 rows), then
7 / 3 gives remainder 1 and quotient 2, so the element at index 2 of Tensor1 is returned, which is 6.
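A quick sketch that matches this worked example (TF 1.x API, list-of-tensors form with the default 'mod' strategy):

    import tensorflow as tf  # TF 1.x; in TF 2.x use tf.compat.v1

    t0 = tf.constant([1, 2, 3])
    t1 = tf.constant([4, 5, 6])
    t2 = tf.constant([7, 8, 9])

    lookup = tf.nn.embedding_lookup([t0, t1, t2], [0, 3, 4, 1, 7])  # default partition_strategy='mod'

    with tf.Session() as sess:
        print(sess.run(lookup))   # [1 2 5 4 6]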
Adding to Asher Stern's answer, params is interpreted as a partitioning of a large embedding tensor. It can be a single tensor representing the complete embedding tensor, or a list of X tensors, all of the same shape except for the first dimension, representing sharded embedding tensors.

The function tf.nn.embedding_lookup is written considering the fact that the embedding (params) will be large. Therefore we need partition_strategy.
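As a sketch of that idea, the same lookup can be done against a single complete tensor or against shards of it (the shapes and values below are made up for illustration):

    import numpy as np
    import tensorflow as tf  # TF 1.x; in TF 2.x use tf.compat.v1

    full = np.arange(12, dtype=np.float32).reshape(4, 3)   # complete embedding tensor: 4 rows of size 3

    # With the default 'mod' strategy, shard k holds the rows whose id % num_shards == k.
    shard0 = tf.constant(full[[0, 2]])   # rows for ids 0 and 2
    shard1 = tf.constant(full[[1, 3]])   # rows for ids 1 and 3

    ids = tf.constant([3, 0, 1])
    single  = tf.nn.embedding_lookup(tf.constant(full), ids)
    sharded = tf.nn.embedding_lookup([shard0, shard1], ids)

    with tf.Session() as sess:
        a, b = sess.run([single, sharded])
        print(np.allclose(a, b))   # True: both forms return the same rows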
Yes, the purpose of the tf.nn.embedding_lookup() function is to perform a lookup in the embedding matrix and return the embeddings (or, in simple terms, the vector representations) of words.

A simple embedding matrix (of shape vocabulary_size x embedding_dimension) would look like the one below (i.e. each word is represented by a vector of numbers; hence the name word2vec).

Embedding Matrix
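For illustration, such a matrix might look like the following (the numbers are made up):

    the        0.41   0.25  -0.41   0.12
    like       0.37   0.21   0.54  -0.08
    between    0.76   0.13  -0.29   0.33
    did        0.98  -0.11   0.42   0.05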
I split the above embedding matrix and loaded only the words in vocab, which will be our vocabulary, and the corresponding vectors in the emb array.

Embedding Lookup in TensorFlow

Now we will see how we can perform an embedding lookup for some arbitrary input sentence.
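One way this can look in code (a sketch with a toy vocabulary, random vectors, and an arbitrary sentence; all of these are placeholders):

    import numpy as np
    import tensorflow as tf  # TF 1.x; in TF 2.x use tf.compat.v1

    vocab = ['the', 'like', 'between', 'did', 'just', 'national', 'day']
    emb = np.random.rand(len(vocab), 4).astype(np.float32)   # toy embedding matrix, shape [7, 4]

    tf_embedding = tf.constant(emb)                           # the embedding matrix as a tensor

    sentence = ['the', 'day', 'like']                         # arbitrary input sentence
    word_ids = [vocab.index(w) for w in sentence]             # indices of the words in our vocabulary

    lookup = tf.nn.embedding_lookup(tf_embedding, word_ids)

    with tf.Session() as sess:
        print(sess.run(lookup))                               # shape [3, 4]: one vector per input word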
Observe how we got the embeddings from our original embedding matrix (with words) using the indices of words in our vocabulary.
Usually, such an embedding lookup is performed by the first layer (called Embedding layer) which then passes these embeddings to RNN/LSTM/GRU layers for further processing.
Side note: usually the vocabulary will also have a special unk token. So, if a token from our input sentence is not present in our vocabulary, then the index corresponding to unk will be looked up in the embedding matrix.

P.S. Note that embedding_dimension is a hyperparameter that one has to tune for their application, but popular models like Word2Vec and GloVe use 300-dimensional vectors to represent each word.

Bonus reading: word2vec skip-gram model
Since I was also intrigued by this function, I'll give my two cents.
The way I see it in the 2D case is just as a matrix multiplication (it's easy to generalize to other dimensions).
Consider a vocabulary with N symbols. Then, you can represent a symbol x as a vector of dimensions Nx1, one-hot-encoded.
But you want a representation of this symbol not as a vector of Nx1, but as one with dimensions Mx1, called y.
So, to transform x into y, you can use an embedding matrix E, with dimensions MxN:
y = E x.
This is essentially what tf.nn.embedding_lookup(params, ids, ...) does, with the nuance that each id is just a number that represents the position of the 1 in the one-hot-encoded vector x.
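As a sketch of that equivalence (the names and sizes below are arbitrary; note that TensorFlow stores params with one row per symbol, so the product is written as x E rather than E x):

    import tensorflow as tf  # TF 1.x; in TF 2.x use tf.compat.v1

    N, M = 5, 3                                    # vocabulary size, embedding size
    E = tf.random_normal([N, M])                   # embedding matrix, one row per symbol

    ids = tf.constant([2])                         # a single symbol id
    x = tf.one_hot(ids, depth=N)                   # one-hot row vector, shape [1, N]

    y_matmul = tf.matmul(x, E)                     # explicit one-hot multiplication
    y_lookup = tf.nn.embedding_lookup(E, ids)      # same row, without building the one-hot vector

    with tf.Session() as sess:
        a, b = sess.run([y_matmul, y_lookup])      # E is evaluated once per run, so both use the same values
        print(a)
        print(b)                                   # identical to a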