I have just started using Word2vec and I was wondering how to find the closest word to a given vector. Suppose I have this vector, which is the average of a set of vectors:
array([-0.00449447, -0.00310097, 0.02421786, ...], dtype=float32)
Is there a straightforward way to find the word in my training data that is most similar to this vector?
Or is the only solution to calculate the cosine similarity between this vector and the vector of each word in my training data, then select the closest one?
Thanks.
Don't forget to pass an empty list for the negative words when calling the most_similar function.
For the gensim implementation of word2vec there is a most_similar() function that lets you find words semantically close to a given word, or to its vector representation. Its topn argument defines the desired number of returned results.
However, my gut feeling is that the function does exactly what you proposed, i.e. it calculates the cosine similarity between the given vector and every other vector in the dictionary (which is quite inefficient...)
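To make the brute-force approach from the question concrete, here is a minimal NumPy sketch. It is not gensim's code; closest_words, the toy vocabulary and the query vector are all made up for illustration.

```python
import numpy as np

def closest_words(query, words, vectors, topn=1):
    """Return the topn (word, cosine similarity) pairs closest to query."""
    vectors = np.asarray(vectors, dtype=np.float32)
    query = np.asarray(query, dtype=np.float32)
    # Normalise the rows and the query so a dot product equals cosine similarity.
    unit_vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    unit_query = query / np.linalg.norm(query)
    sims = unit_vectors @ unit_query
    # Indices of the topn largest similarities, best first.
    best = np.argsort(-sims)[:topn]
    return [(words[i], float(sims[i])) for i in best]

# Hypothetical toy vocabulary and query vector (not from any trained model).
words = ["king", "queen", "apple"]
vectors = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]])
query = np.array([0.9, 0.2])

result = closest_words(query, words, vectors, topn=2)
print(result)
```

This is one matrix-vector product over the whole vocabulary, so the cost grows linearly with the number of words.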
Alternatively, model.wv.similar_by_vector(vector, topn=10, restrict_vocab=None) is also available in the gensim package.