I'm trying to understand what's going on with my calculation of canberra distance. I write my own simple canberra.distance
function, however the results are not consistent with dist
function. I added option na.rm = T
to my function, to be able calculate the sum when there is zero denominator. From ?dist
I understand that they use similar approach: Terms with zero numerator and denominator are omitted from the sum and treated as if the values were missing.
canberra.distance <- function(a, b){
sum( (abs(a - b)) / (abs(a) + abs(b)), na.rm = T )
}
a <- c(0, 1, 0, 0, 1)
b <- c(1, 0, 1, 0, 1)
canberra.distance(a, b)
> 3
# the result that I expected
dist(rbind(a, b), method = "canberra")
> 3.75
a <- c(0, 1, 0, 0)
b <- c(1, 0, 1, 0)
canberra.distance(a, b)
> 3
# the result that I expected
dist(rbind(a, b), method = "canberra")
> 4
a <- c(0, 1, 0)
b <- c(1, 0, 1)
canberra.distance(a, b)
> 3
dist(rbind(a, b), method = "canberra")
> 3
# now the results are the same
Pairs 0-0 and 1-1 seem to be problematic. In the first case (0-0) both numerator and denominator are equal to zero and this pair should be omitted. In the second case (1-1) numerator is 0 but denominator is not and the term is then also 0 and the sum should not change.
What am I missing here?
EDIT:
To be in line with R definition, function canberra.distance
can be modified as follows:
canberra.distance <- function(a, b){
sum( abs(a - b) / abs(a + b), na.rm = T )
}
However, the results are the same as before.