I am working on a program based on Jaccard Distance, and I need to calculate the Jaccard Distance between two binary bit vectors. I came across the following on the net:
If p1 = 10111 and p2 = 10011,
The total number of each combination attributes for p1 and p2:
M11 = total number of attributes where p1 & p2 have a value 1,
M01 = total number of attributes where p1 has a value 0 & p2 has a value 1,
M10 = total number of attributes where p1 has a value 1 & p2 has a value 0,
M00 = total number of attributes where p1 & p2 have a value 0.
Jaccard similarity coefficient = J =
intersection/union = M11/(M01 + M10 + M11)
= 3 / (0 + 1 + 3) = 3/4,
Jaccard distance = J' = 1 - J = 1 - 3/4 = 1/4,
Or J' = 1 - (M11/(M01 + M10 + M11)) = (M01 + M10)/(M01 + M10 + M11)
= (0 + 1)/(0 + 1 + 3) = 1/4
Now, while calculating the coefficient, why was "M00" not included in the denominator? Can anyone please explain?
The Jacquard index of A and B is |A∩B|/|A∪B| = |A∩B|/(|A| + |B| - |A∩B|).
We have: |A∩B| = M11, |A| = M11 + M10, |B| = M11 + M01.
So |A∩B|/(|A| + |B| - |A∩B|) = M11 / (M11 + M10 + M11 + M01 - M11) = M11 / (M10 + M01 + M11).
This Venn diagram may help: