Iterate one list of synsets over another

Posted 2019-08-24 11:11

Question:

I have two lists of WordNet synsets, s1 and s2. For each synset in s1, I want to find the maximum path similarity score against the synsets in s2, so that the output has the same length as s1. For example, if s1 contains 4 synsets, the output should contain 4 scores.

I have experimented with the following code (so far):

import numpy as np
import nltk
from nltk.corpus import wordnet as wn
import pandas as pd

# two lists of WordNet synsets (s1, s2)

s1 = [wn.synset('be.v.01'),
 wn.synset('angstrom.n.01'),
 wn.synset('trial.n.02'),
 wn.synset('function.n.01')]

s2 = [wn.synset('use.n.01'),
 wn.synset('function.n.01'),
 wn.synset('check.n.01'),
 wn.synset('code.n.01'),
 wn.synset('inch.n.01'),
 wn.synset('be.v.01'),
 wn.synset('correct.v.01')]
 
# find the highest path similarity score against s2 for each synset in s1; the output has the same length as s1

ps_list = []
def similarity_score(s1, s2):
    for word1 in s1:
        best = max(wn.path_similarity(word1, word2) for word2 in s2)
        ps_list.append(best)
    return ps_list

similarity_score(s1, s2)

But it raises the following error:

'>' not supported between instances of 'NoneType' and 'float'

I can't figure out what is going wrong with the code. Would anyone care to take a look and share their insights on the for loop? It would be much appreciated.

Thank you.

The full error traceback is below:

TypeError                                 Traceback (most recent call last)
<ipython-input-73-4506121e17dc> in <module>()
     38     return word_list
     39 
---> 40 s = similarity_score(s1, s2)
     41 
     42 

<ipython-input-73-4506121e17dc> in similarity_score(s1, s2)
     33 def similarity_score(s1, s2):
     34     for word1 in s1:
---> 35         best = max(wn.path_similarity(word1, word2) for word2 in s2)
     36         word_list.append(best)
     37 

TypeError: '>' not supported between instances of 'NoneType' and 'float'

[edit] I came up with this temporary solution:

s_list = []
for word1 in s1:
    best = [word1.path_similarity(word2) for word2 in s2]
    b = pd.Series(best).max()
    s_list.append(b)

It's not elegant, but it works. I wonder if anyone has a better solution or a handy trick for this?

Answer 1:

I have no experience with the nltk module, but from reading the docs I can see that path_similarity is a method of whatever object wn.synset(args) returns. You are instead treating it as a function.

What you should be doing is something like this:

ps_list = []
for word1 in s1:
    best = max(word1.path_similarity(word2) for word2 in s2) #path_similarity is a method of each synset
    ps_list.append(best)
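
One caveat about the loop above: the traceback in the question shows the error being raised inside max(), not by the path_similarity call itself, so switching to the method form may still fail on the same data. If path_similarity returns None for any pair (the error message suggests it did for at least one), max() has to compare None with a float, which Python 3 does not allow. A minimal sketch of that behaviour, using made-up scores rather than real WordNet output:

```python
# Made-up scores standing in for path_similarity results; None marks
# a pair for which no similarity could be computed.
scores = [0.25, None, 0.5]

try:
    max(scores)
except TypeError as exc:
    # Same kind of failure as in the question's traceback.
    error_message = str(exc)
    print(error_message)

# Dropping the None entries first makes max() well defined.
best = max(s for s in scores if s is not None)
print(best)
```

The names scores, error_message and best are only for illustration.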


Answer 2:

I think the error comes from the following:

best = max(wn.path_similarity(word1, word2) for word2 in s2)

path_similarity returns None for some pairs of synsets, and max() cannot compare None with a float. You should filter out the None values before calling max(), for instance like this:

best = max([word1.path_similarity(word2) for word2 in s2 if word1.path_similarity(word2) is not None])
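
The filter above still leaves one edge case: if every score for a given synset is None, max() receives an empty sequence and raises ValueError. The sketch below is one possible way to handle that while keeping the output the same length as s1. The helper name best_scores, its sim and default parameters, and the toy similarity table are all assumptions made for illustration; they are not part of NLTK.

```python
def best_scores(s1, s2, sim, default=None):
    """For each item in s1, return its highest non-None score against s2.

    `sim` is any two-argument callable returning a float or None.
    `default` is used when every score for an item is None.
    """
    results = []
    for a in s1:
        # Compute each score once, then skip the None entries.
        scores = (sim(a, b) for b in s2)
        valid = (s for s in scores if s is not None)
        results.append(max(valid, default=default))
    return results


# Toy stand-in for a similarity function (hypothetical values).
toy_table = {
    ("a", "x"): 0.2, ("a", "y"): 0.5,
    ("b", "x"): None, ("b", "y"): None,
}

def toy_sim(a, b):
    return toy_table[(a, b)]

result = best_scores(["a", "b"], ["x", "y"], toy_sim)
print(result)
```

With the lists from the question, the call would presumably look like best_scores(s1, s2, lambda a, b: a.path_similarity(b)). Unlike the one-liner, this version also calls the similarity function only once per pair.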