Why don't we include 0 matches while calculating Jaccard distance between binary numbers?

Posted 2019-09-27 08:15

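I am working on a program based on Jaccard distance, and I need to calculate the Jaccard distance between two binary bit vectors. I came across the following on the net: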

 If p1 = 10111 and p2 = 10011,

 The total number of each combination attributes for p1 and p2:

 M11 = total number of attributes where p1 & p2 have a value 1,
 M01 = total number of attributes where p1 has a value 0 & p2 has a value 1,
 M10 = total number of attributes where p1 has a value 1 & p2 has a value 0,
 M00 = total number of attributes where p1 & p2 have a value 0.
 Jaccard similarity coefficient = J = 
 intersection/union = M11/(M01 + M10 + M11) 
 = 3 / (0 + 1 + 3) = 3/4,

 Jaccard distance = J' = 1 - J = 1 - 3/4 = 1/4, 
 Or J' = 1 - (M11/(M01 + M10 + M11)) = (M01 + M10)/(M01 + M10 + M11)
 = (0 + 1)/(0 + 1 + 3) = 1/4
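
To make the arithmetic above easy to check, here is a minimal Python sketch that counts the four combinations and computes the coefficient and the distance. The function names (`match_counts`, `jaccard_similarity`, `jaccard_distance`) are hypothetical, and it assumes the bit vectors are given as equal-length strings of `0` and `1`.

```python
from fractions import Fraction


def match_counts(p1: str, p2: str) -> dict:
    """Count M11, M10, M01 and M00 for two equal-length bit strings."""
    if len(p1) != len(p2):
        raise ValueError("bit vectors must have the same length")
    counts = {"M11": 0, "M10": 0, "M01": 0, "M00": 0}
    for a, b in zip(p1, p2):
        if a not in "01" or b not in "01":
            raise ValueError("bit vectors may only contain 0 and 1")
        # First digit of the key is the p1 bit, second digit is the p2 bit.
        counts["M" + a + b] += 1
    return counts


def jaccard_similarity(p1: str, p2: str) -> Fraction:
    """J = M11 / (M01 + M10 + M11); M00 is counted but never used."""
    c = match_counts(p1, p2)
    denominator = c["M01"] + c["M10"] + c["M11"]
    if denominator == 0:
        # Assumption: treat the all-zero case as undefined.
        raise ValueError("Jaccard coefficient is undefined when neither vector has a 1")
    return Fraction(c["M11"], denominator)


def jaccard_distance(p1: str, p2: str) -> Fraction:
    """J' = 1 - J."""
    return 1 - jaccard_similarity(p1, p2)


print(match_counts("10111", "10011"))
print(jaccard_similarity("10111", "10011"), jaccard_distance("10111", "10011"))
```

For p1 = 10111 and p2 = 10011 this gives M11 = 3, M10 = 1, M01 = 0, M00 = 1, matching the numbers quoted above.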

Now, while calculating the coefficient, why is "M00" not included in the denominator? Can anyone please explain?

Answer 1:

The Jaccard index of A and B is |A∩B| / |A∪B| = |A∩B| / (|A| + |B| - |A∩B|).

We have: |A∩B| = M11, |A| = M11 + M10, |B| = M11 + M01.

So |A∩B| / (|A| + |B| - |A∩B|) = M11 / (M11 + M10 + M11 + M01 - M11) = M11 / (M10 + M01 + M11).
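
The same derivation can be checked with plain Python sets. This is only a sketch under one assumption: A and B are taken to be the sets of positions where p1 and p2 have a 1, and the helper names (`ones`, `jaccard_index`) are hypothetical. A position where both vectors are 0 belongs to neither set, so it never reaches the union, which is why M00 drops out.

```python
from fractions import Fraction


def ones(bits: str) -> set:
    """Positions (0-based) at which the bit string has a 1."""
    return {i for i, bit in enumerate(bits) if bit == "1"}


def jaccard_index(p1: str, p2: str) -> Fraction:
    """|A ∩ B| / |A ∪ B| over the sets of 1-positions."""
    a, b = ones(p1), ones(p2)
    union = a | b
    if not union:
        # Assumption: treat the all-zero case as undefined.
        raise ValueError("Jaccard index is undefined when both sets are empty")
    # Inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|
    assert len(union) == len(a) + len(b) - len(a & b)
    return Fraction(len(a & b), len(union))


print(jaccard_index("10111", "10011"))
# Appending positions where both vectors are 0 only increases M00.
print(jaccard_index("10111" + "000", "10011" + "000"))
```

Both calls print the same value, since padding the vectors with shared zeros changes neither the intersection nor the union.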

This Venn diagram may help:

[Image: Venn diagram]



Source: Why don't we include 0 matches while calculating jaccard distance between binary numbers?