I want to implement an error-weighted Euclidean distance function in R (similar to, but not quite the same as, Kumar & Patel, 2005, Rutcor Research Report RRR 12-2005). I already know that I want to use pr_DB from the proxy package for this, and this is where I run into problems. The first thing I always do when looking at a new package is cut and paste the examples into R to see whether they do what I expect them to do. That is not what happened in this case. I started with the simple included example, to wit:
mydist <- function(x,y) x * y
pr_DB$set_entry(FUN = mydist, names = c("test", "mydist"))
Okay, so far, so good. I figure, if it's going to work just like a normal "dist" type function, I'll toss in a very simple piece of data.
> toydat
            x           y
a -0.12817993 -1.03238513
b  1.56200731  0.93826937
c -1.24051847 -1.31005852
d -1.12892553 -1.57133401
e -1.10098308  0.06577006
Just a little toy set, of course. So I try out the following:
toydist <- dist(toydat,method="mydist")
I get the following message:
Error in do.call(".External", c(list(CFUN, x, y, pairwise, if (!is.function(method)) get(method) else method), :
not a scalar return value
On a hunch, I tried out:
toydist <- dist(toydat,method="Euclidean")
toydist <- dist(toydat,method="Manhattan")
and others. They all work as expected. I presume that something special must be done with the basic formula for it to work properly when computing distance matrices. What I want to compute, for every pairwise combination (i, j) of rows in my data set, is

$$\frac{\sqrt{(x_i - x_j)^2 + (y_i - y_j)^2 + \dots + (n_i - n_j)^2}}{\sqrt{(\sigma_{x_i}^2 + \sigma_{x_j}^2) + (\sigma_{y_i}^2 + \sigma_{y_j}^2) + \dots + (\sigma_{n_i}^2 + \sigma_{n_j}^2)}}$$

where x, y, ..., n are the measured variables and each σ is the error on the corresponding measurement.
I realize I could do this by spreadsheet for a single data set, but I want to do some clustering bootstrapping, which I know how to do in R. Not practical with a spreadsheet.
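For reference, here is how I read the target quantity as plain base R, without going through proxy at all. This is only a minimal sketch under assumptions: ewdist and toyerr are hypothetical names, the error values in toyerr are made up purely for illustration, and the errors are passed in as explicit arguments rather than through pr_DB. The thing to notice is that it returns a single number per pair of rows, which is what the "not a scalar return value" message is asking for, whereas x * y on two rows returns one value per column.

```r
# Toy data from above, as a matrix with named rows and columns.
toydat <- matrix(c(-0.12817993, -1.03238513,
                    1.56200731,  0.93826937,
                   -1.24051847, -1.31005852,
                   -1.12892553, -1.57133401,
                   -1.10098308,  0.06577006),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(letters[1:5], c("x", "y")))

# Made-up standard errors (assumption: one error per measurement, all 0.1).
toyerr <- matrix(0.1, nrow = 5, ncol = 2, dimnames = dimnames(toydat))

# Hypothetical helper: error-weighted Euclidean distance between two rows.
# x, y   : measurements for the two observations
# sx, sy : matching errors for the two observations
ewdist <- function(x, y, sx, sy) {
  sqrt(sum((x - y)^2)) / sqrt(sum(sx^2 + sy^2))
}

# Fill a full square matrix pair by pair, then convert it to a "dist" object.
n <- nrow(toydat)
d <- matrix(0, n, n, dimnames = list(rownames(toydat), rownames(toydat)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- ewdist(toydat[i, ], toydat[j, ], toyerr[i, ], toyerr[j, ])
  }
}
toydist <- as.dist(d)
print(toydist)
```

The result is an ordinary "dist" object, so it can be handed to hclust and friends. Getting the errors into a function registered with pr_DB$set_entry is a separate matter that this sketch does not attempt.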
Now that the requisite time has passed, I can close this out formally. The function I came up with is thus: