Beginner NLP Question here:
How does the .similarity method work?
Wow, spaCy is great! Its tf-idf workflow could be easier to preprocess, but word2vec-style vectors in a single line of code (token.vector)?! Awesome!
In his 10-line tutorial on spaCy, andrazhribernik shows us the .similarity method, which can be run on tokens, sents, noun chunks, and docs.
After nlp = spacy.load('en') and doc = nlp(raw_text), we can run .similarity queries between tokens and chunks.
However, what is being calculated behind the scenes by this .similarity method?
spaCy already has the incredibly simple .vector, which returns the word vector as trained with the GloVe model (how cool would a .tfidf or .fasttext method be?).
Is the similarity model simply computing the cosine similarity between these two GloVe vectors, or is it doing something else? The specifics aren't clear in the documentation; any help is appreciated!
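For reference, here is a minimal NumPy sketch of what I suspect is happening: plain cosine similarity over the two word vectors. The vectors below are made-up stand-ins for token.vector output (real GloVe vectors are 300-dimensional), so this is just my hypothesis, not spaCy's actual implementation.

```python
import numpy as np

def cosine_similarity(u, v):
    # Cosine of the angle between two vectors: u . v / (|u| * |v|)
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Hypothetical stand-ins for token.vector output
dog = np.array([0.2, 0.8, 0.1])
cat = np.array([0.25, 0.75, 0.05])

print(cosine_similarity(dog, cat))
```

If .similarity works this way, the result for two near-parallel vectors like these should be close to 1.0, and an identical pair should give exactly 1.0.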