I have two sets of WordNet synsets, stored in two separate lists, s1 and s2. For each synset in s1, I want to find its maximum path similarity score against the synsets in s2, so that the output has the same length as s1. For example, if s1 contains 4 synsets, the output should have 4 entries.
I have experimented with the following code so far:
import numpy as np
import nltk
from nltk.corpus import wordnet as wn
import pandas as pd
# two lists of WordNet synsets (s1, s2)
s1 = [wn.synset('be.v.01'),
      wn.synset('angstrom.n.01'),
      wn.synset('trial.n.02'),
      wn.synset('function.n.01')]
s2 = [wn.synset('use.n.01'),
      wn.synset('function.n.01'),
      wn.synset('check.n.01'),
      wn.synset('code.n.01'),
      wn.synset('inch.n.01'),
      wn.synset('be.v.01'),
      wn.synset('correct.v.01')]
# find the highest path similarity score for each synset in s1 against s2;
# the output should have the same length as s1
ps_list = []
def similarity_score(s1, s2):
    for word1 in s1:
        best = max(wn.path_similarity(word1, word2) for word2 in s2)
        ps_list.append(best)
    return ps_list
similarity_score(s1, s2)
But it raises the following error:
'>' not supported between instances of 'NoneType' and 'float'
I can't figure out what is going wrong with the code. Would anyone care to take a look and share their insights on the for loop? It would be much appreciated.
Thank you.
The full error traceback is here:
TypeError Traceback (most recent call last)
<ipython-input-73-4506121e17dc> in <module>()
38 return word_list
39
---> 40 s = similarity_score(s1, s2)
41
42
<ipython-input-73-4506121e17dc> in similarity_score(s1, s2)
33 def similarity_score(s1, s2):
34 for word1 in s1:
---> 35 best = max(wn.path_similarity(word1, word2) for word2 in s2)
36 word_list.append(best)
37
TypeError: '>' not supported between instances of 'NoneType' and 'float'
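As far as I can tell, the same message can be reproduced without WordNet at all, which makes me suspect that path_similarity returns None for some pairs (for example, a verb synset compared with a noun synset) and that max() then tries to compare that None with a float. A minimal sketch of that suspicion, where the scores list is made up for illustration and is not real WordNet output:

```python
# Hypothetical scores: one real similarity and one missing value (None),
# standing in for what path_similarity might return for an unrelated pair.
scores = [0.5, None]

try:
    max(scores)
except TypeError as exc:
    # max() compares each new item against the current maximum with '>',
    # and None cannot be ordered against a float.
    message = str(exc)
    print(message)
```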
[edit] I came up with this temporary solution:
s_list = []
for word1 in s1:
    best = [word1.path_similarity(word2) for word2 in s2]
    b = pd.Series(best).max()
    s_list.append(b)
It's not elegant, but it works (pandas appears to skip the missing values when taking the maximum). I wonder if anyone has a better solution or a handy trick to handle this?
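One pandas-free alternative I am considering is to drop the None values before calling max(). The sketch below is only an illustration under assumptions: best_scores and fake_similarity are names I made up, and fake_similarity is a stand-in that mimics a similarity function returning None for unrelated pairs, so the example runs without the WordNet corpus.

```python
def best_scores(items1, items2, sim):
    """For each item in items1, return its highest non-None score against items2."""
    results = []
    for a in items1:
        # Keep only the pairs for which a similarity could be computed.
        scores = [s for s in (sim(a, b) for b in items2) if s is not None]
        # default=None keeps the output the same length as items1
        # even when every comparison for this item returned None.
        results.append(max(scores, default=None))
    return results


def fake_similarity(a, b):
    """Hypothetical stand-in for path_similarity (not the NLTK function)."""
    if a[-1] != b[-1]:  # different "part of speech" suffix: no score
        return None
    return 1.0 if a == b else 0.25


demo = best_scores(["be.v", "trial.n", "odd.x"], ["use.n", "be.v"], fake_similarity)
print(demo)  # [1.0, 0.25, None]
```

With the real synsets, I would expect the call to look like best_scores(s1, s2, lambda a, b: a.path_similarity(b)), though I have not verified which pairs come back as None.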