How to find the closest word to a vector using wor

I have just started using Word2vec and was wondering how to find the closest word to a given vector. Suppose I have this vector, which is the average of a set of vectors:

array([-0.00449447, -0.00310097, 0.02421786, ...], dtype=float32)

Is there a straightforward way to find the word in my training data that is most similar to this vector?

Or is the only solution to calculate the cosine similarity between this vector and the vector of each word in my training data, then select the closest one?

Thanks.

Tags: python text-mining data-analysis word2vec

3 Answers

闹够了就滚

#2 · 2020-05-22 00:52

Don't forget to pass an empty list for the negative words to the most_similar function:

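import numpy as np

# my_vector is the query vector, e.g. the averaged vector from the question
model_word_vector = np.array(my_vector, dtype='f')
topn = 20
# positive=[vector], negative=[] (empty); in gensim 4.0+ call model.wv.most_similar
most_similar_words = model.most_similar([model_word_vector], [], topn)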

疯言疯语

#3 · 2020-05-22 00:56

The gensim implementation of word2vec has a most_similar() function that lets you find words semantically close to a given word (in gensim 4.0 and later, call it on model.wv rather than on the model itself):

>>> model.most_similar(positive=['woman', 'king'], negative=['man'])
[('queen', 0.50882536), ...]

or to its vector representation:

>>> from numpy import array, float32
>>> your_word_vector = array([-0.00449447, -0.00310097, 0.02421786, ...], dtype=float32)
>>> model.most_similar(positive=[your_word_vector], topn=1)

where topn defines the desired number of returned results.

However, my gut feeling is that this function does exactly what you proposed, i.e. it calculates the cosine similarity between the given vector and every other vector in the vocabulary (which is quite inefficient...)
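
To make that brute-force approach concrete, here is a minimal NumPy sketch of it. It is not gensim's code: the function name closest_word and the toy words and vectors are made up for illustration, and it assumes the word vectors are stacked as rows of a single array with no all-zero rows.

```python
import numpy as np

def closest_word(query, words, vectors):
    """Return (word, cosine similarity) for the row of `vectors` closest to `query`."""
    vectors = np.asarray(vectors, dtype=np.float32)
    query = np.asarray(query, dtype=np.float32)
    # Normalise the rows and the query so that a dot product equals cosine similarity.
    unit_rows = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    unit_query = query / np.linalg.norm(query)
    sims = unit_rows @ unit_query  # one similarity score per vocabulary word
    best = int(np.argmax(sims))
    return words[best], float(sims[best])

# Toy vocabulary, made up for illustration only.
words = ["king", "queen", "apple"]
vectors = [[1.0, 0.0, 0.0],
           [0.9, 0.1, 0.0],
           [0.0, 0.0, 1.0]]

# Average of the "king" and "queen" rows, as in the question.
average = np.mean([vectors[0], vectors[1]], axis=0)
print(closest_word(average, words, vectors))
```

The cost is one dot product per vocabulary word, so it grows linearly with the vocabulary size.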

孤傲高冷的网名

#4 · 2020-05-22 00:58

Alternatively, model.wv.similar_by_vector(vector, topn=10, restrict_vocab=None) is also available in the gensim package.

Find the top-N most similar words by vector.

Parameters:

vector (numpy.array) – Vector from which similarities are to be computed.

topn ({int, False}, optional) – Number of top-N similar words to return. If topn is False, similar_by_vector returns the vector of similarity scores.

restrict_vocab (int, optional) – Optional integer which limits the range of vectors which are searched for most-similar values. For example, restrict_vocab=10000 would only check the first 10000 word vectors in the vocabulary order. (This may be meaningful if you’ve sorted the vocabulary by descending frequency.)

Returns: Sequence of (word, similarity).

Return type: list of (str, float)
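
To illustrate what the topn and restrict_vocab parameters described above mean, here is a small NumPy stand-in that follows the documented behaviour. It is only a sketch, not gensim's implementation: similar_by_vector_sketch, the toy words, and the toy vectors are hypothetical, and the rows are assumed to be in vocabulary order.

```python
import numpy as np

def similar_by_vector_sketch(vector, words, vectors, topn=10, restrict_vocab=None):
    """Hypothetical stand-in mirroring the documented parameters; not gensim code."""
    # Slicing with None keeps every row; an integer keeps only the first N rows.
    vectors = np.asarray(vectors, dtype=np.float32)[:restrict_vocab]
    vector = np.asarray(vector, dtype=np.float32)
    # Cosine similarity between the query and each remaining row.
    sims = (vectors @ vector) / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(vector))
    if topn is False:
        return sims  # raw similarity scores, in vocabulary order
    order = np.argsort(-sims)[:topn]  # indices of the highest scores first
    return [(words[i], float(sims[i])) for i in order]

# Toy vocabulary, made up for illustration only.
words = ["the", "king", "queen", "apple"]
vectors = [[0.5, 0.5, 0.5],
           [1.0, 0.0, 0.0],
           [0.9, 0.1, 0.0],
           [0.0, 0.0, 1.0]]
query = [0.0, 0.0, 1.0]

print(similar_by_vector_sketch(query, words, vectors, topn=2))
print(similar_by_vector_sketch(query, words, vectors, topn=1, restrict_vocab=3))
print(similar_by_vector_sketch(query, words, vectors, topn=False))
```

With restrict_vocab=3 the last row ("apple" in this toy data) is never compared, which is why restricting the search only makes sense when the vocabulary order is meaningful, for example sorted by descending frequency.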
