I am trying to calculate the semantic similarity between two words. I am using WordNet-based similarity measures, i.e. the Resnik measure (RES), the Lin measure (LIN), the Jiang and Conrath measure (JNC), and the Banerjee and Pedersen measure (BNP).
To do that, I am using nltk and WordNet 3.0. Next, I want to combine the similarity values obtained from the different measures. To do that, I need to normalize the similarity values, because some measures give values between 0 and 1, while others give values greater than 1.
So, my question is: how do I normalize the similarity values obtained from the different measures?
Extra detail on what I am actually trying to do: I have a set of words. I calculate the pairwise similarity between the words and remove the words that are not strongly correlated with the other words in the set.
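For reference, here is a minimal sketch of how these scores can be obtained with nltk. It assumes the wordnet and wordnet_ic corpora have been downloaded, compares only the first noun synset of each word for brevity, and leaves out the Banerjee and Pedersen measure, which nltk does not provide out of the box:

```python
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

# Information content counts from the Brown corpus (requires the wordnet_ic corpus).
brown_ic = wordnet_ic.ic('ic-brown.dat')

def similarities(word1, word2):
    """Return RES, LIN and JNC scores for the first noun synsets of two words."""
    s1 = wn.synsets(word1, pos=wn.NOUN)[0]
    s2 = wn.synsets(word2, pos=wn.NOUN)[0]
    return {
        'RES': s1.res_similarity(s2, brown_ic),  # unbounded, >= 0
        'LIN': s1.lin_similarity(s2, brown_ic),  # already in [0, 1]
        'JNC': s1.jcn_similarity(s2, brown_ic),  # unbounded, >= 0
    }

print(similarities('car', 'truck'))
```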
How to normalize a single measure
Let's consider a single arbitrary similarity measure M and take an arbitrary word w. Define m = M(w, w). Then m takes the maximum possible value of M. Let's define MN as the normalized version of M. For any two words w, u you can compute MN(w, u) = M(w, u) / m. It's easy to see that if M takes non-negative values, then MN takes values in [0, 1].
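As a sketch, this is one way to write that normalization in Python; measure stands for any non-negative similarity function over two words, and the helper name normalized is my own:

```python
def normalized(measure, w, u):
    """Normalize measure by its self-similarity m = measure(w, w),
    so the result lies in [0, 1] for non-negative measures."""
    m = measure(w, w)            # maximum attainable value, per the argument above
    return measure(w, u) / m

# Example with a toy measure; in practice wrap e.g. res_similarity here.
toy = lambda a, b: 10.0 if a == b else 4.0
print(normalized(toy, 'car', 'truck'))   # 0.4
```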
How to normalize a measure combined from many measures
In order to compute your own measure F, combined from k different measures m_1, m_2, ..., m_k, first normalize each m_i independently using the method above to obtain MN_1, MN_2, ..., MN_k, and then define:

F = alpha_1 * MN_1 + alpha_2 * MN_2 + ... + alpha_k * MN_k

such that alpha_i denotes the weight of the i-th measure. All alphas must sum up to 1, i.e.:

alpha_1 + alpha_2 + ... + alpha_k = 1
Then to compute your own measure for w, u you do:

F(w, u) = alpha_1 * MN_1(w, u) + alpha_2 * MN_2(w, u) + ... + alpha_k * MN_k(w, u)

It's clear that F takes values in [0, 1].
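A small sketch of the combined measure under these definitions; the function and variable names (combined, mn1, mn2) and the equal weights are just placeholders:

```python
def combined(measures, alphas, w, u):
    """Weighted combination F of already-normalized measures.
    measures: list of functions of (w, u); alphas: weights summing to 1."""
    assert abs(sum(alphas) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(a * mn(w, u) for a, mn in zip(alphas, measures))

# Example: combine two already-normalized toy measures with equal weights.
mn1 = lambda a, b: 0.8
mn2 = lambda a, b: 0.6
print(combined([mn1, mn2], [0.5, 0.5], 'car', 'truck'))  # 0.7
```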