I have trained a word2vec model in TensorFlow, but when I save the session, it only outputs model.ckpt.data, .index and .meta files.
I was thinking of implementing a KNN method to retrieve the nearest words. I have seen answers that use gensim, but how can I save my TensorFlow word2vec model to a .txt file first?
Simply evaluate the embeddings matrix into a numpy array and write it to the file along with the corresponding words. Sample code:
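```python
import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.random_uniform)

vocabulary_size = 50000
embedding_size = 128
# Assume your word to index map
word_to_idx = { ... }
# Assume your embeddings variable
embeddings = tf.Variable(tf.random_uniform([vocabulary_size, embedding_size], 0, 1))
with tf.Session() as sess:
    # Initialize the variable so it can be evaluated; to get your trained values
    # instead, restore them with tf.train.Saver().restore(sess, 'model.ckpt')
    sess.run(tf.global_variables_initializer())
    # numpy array of shape [vocabulary_size, embedding_size]
    embeddings_val = sess.run(embeddings)

with open('embeddings.txt', 'w') as file_:
    for i in range(vocabulary_size):
        embed = embeddings_val[i, :]
        word = word_to_idx[i]
        # One line per word: the word followed by its space-separated vector
        file_.write('%s %s\n' % (word, ' '.join(map(str, embed))))
```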
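Since the end goal in the question is a KNN lookup of nearest words, here is a minimal stdlib-only sketch of that step, assuming the "word v1 v2 ... vd" line format written above. The helper names load_embeddings, cosine_similarity and nearest_words are hypothetical, and cosine similarity is an assumed (though common) choice of distance for word2vec vectors. The search is brute force: it compares the query against every word.

```python
import heapq
import math


def load_embeddings(lines):
    """Parse 'word v1 v2 ... vd' lines into a {word: [float, ...]} dict.

    `lines` can be an open file or any iterable of strings. An optional
    word2vec-style '<count> <dim>' header on the first line is skipped.
    """
    vectors = {}
    for n, line in enumerate(lines):
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        if n == 0 and len(parts) == 2 and parts[0].isdigit() and parts[1].isdigit():
            continue  # header line, not a word vector
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_words(vectors, query, k=5):
    """Brute-force KNN: the k words whose vectors are most similar to query's."""
    q = vectors[query]
    candidates = ((cosine_similarity(q, vec), word)
                  for word, vec in vectors.items() if word != query)
    # nlargest returns the k best (score, word) pairs, highest score first
    return [word for _, word in heapq.nlargest(k, candidates)]
```

Usage: `with open('embeddings.txt') as f: vectors = load_embeddings(f)`, then `nearest_words(vectors, word, k=8)` for any word in the vocabulary. Each query costs O(V·d); for many queries it is usually faster to normalize all vectors once and use a single numpy matrix product instead.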
I just had the same problem and tried Maxim's solution.
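You need to replace the line

```python
word = word_to_idx[i]
```

with

```python
word = idx_to_word[i]
```

because the loop runs over row indices, while word_to_idx is keyed by words. You can build idx_to_word by simply reversing the word_to_idx dictionary:

```python
idx_to_word = dict(zip(word_to_idx.values(), word_to_idx.keys()))
```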
Except for that, his solution works fine.
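Putting the fix together, here is a sketch of the export with the reversed dictionary that also writes the header line of the word2vec text format ("<number of words> <vector size>"), which gensim's loader expects by default. The generator name word2vec_text_lines is hypothetical, and it assumes row i of embeddings_val is the vector of the word whose index is i, as in the sample code above.

```python
def word2vec_text_lines(embeddings_val, word_to_idx):
    """Yield the lines of a word2vec text-format file.

    The first line is '<number of words> <vector size>'; each following line
    is a word and its vector components, separated by single spaces.
    """
    # Reverse the mapping so each row index can be turned back into its word
    idx_to_word = {idx: word for word, idx in word_to_idx.items()}
    num_words = len(embeddings_val)
    dim = len(embeddings_val[0])
    yield '%d %d\n' % (num_words, dim)
    for i in range(num_words):
        yield '%s %s\n' % (idx_to_word[i], ' '.join(map(str, embeddings_val[i])))
```

Usage: `with open('embeddings.txt', 'w') as f: f.writelines(word2vec_text_lines(embeddings_val, word_to_idx))`. The resulting file should then load with gensim's `KeyedVectors.load_word2vec_format('embeddings.txt', binary=False)`; gensim 4.x can also read headerless files via `no_header=True`, but writing the header avoids depending on that. The dict comprehension is equivalent to the `dict(zip(...))` version above.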