We are given a huge set of points in 2D plane. We need to find, for each point the closest point within the set. For instance suppose the initial set is as follows:
foo <- data.frame(x=c(1,2,4,4,10),y=c(1,2,4,4,10))
The output should be like this:
ClosesPair(foo)
2
1
4
3
3 # (could be 4 also)
Any idea?
The traditional approach is to preprocess the data
and put it in a data structure, often a K-d tree,
for which the "nearest point" query is very fast.
There is an implementation in the nnclust
package.
library(nnclust)
foo <- cbind(x=c(1,2,4,4,10),y=c(1,2,4,4,10))
i <- nnfind(foo)$neighbour
plot(foo)
arrows( foo[,1], foo[,2], foo[i,1], foo[i,2] )
Here is an example; all wrapped into a single function. You might want to split it a bit for optimization.
ClosesPair <- function(foo) {
dist <- function(i, j) {
sqrt((foo[i,1]-foo[j,1])**2 + (foo[i,2]-foo[j,2])**2)
}
foo <- as.matrix(foo)
ClosestPoint <- function(i) {
indices <- 1:nrow(foo)
indices <- indices[-i]
distances <- sapply(indices, dist, i=i, USE.NAMES=TRUE)
closest <- indices[which.min(distances)]
}
sapply(1:nrow(foo), ClosestPoint)
}
ClosesPair(foo)
# [1] 2 1 4 3 3
Of cause, it does not handle ties very well.
Use the package spatstat
. It's got builtin functions to do this sort of stuff.