I have two vectors with a few thousand points, but generalized here:
A <- c(10, 20, 30, 40, 50)
b <- c(13, 17, 20)
How can I get the indices of A that are nearest to b? The expected outcome would be c(1, 2, 2).
I know that findInterval can only find the first occurrence, and not the nearest, and I'm aware that which.min(abs(b[2] - A)) is getting warmer, but I can't figure out how to vectorize it to work with long vectors of both A and b.
You can just wrap your code in sapply. I think this runs at about the same speed as a for loop, so it isn't technically vectorized:
sapply(b, function(x) which.min(abs(x - A)))
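A small variation on the same idea, in case it is useful: vapply does the same loop as sapply but declares the return type up front. It is not expected to be any faster; the only difference is that the result is always an integer vector, even when b is empty. The name idx is just for illustration.

```r
A <- c(10, 20, 30, 40, 50)
b <- c(13, 17, 20)

# vapply checks that every call returns exactly one integer,
# so an empty b gives integer(0) instead of an empty list.
idx <- vapply(b, function(x) which.min(abs(x - A)), integer(1))
idx
# [1] 1 2 2
```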
findInterval gets you very close. You just have to pick between the index it returns and the next one:
# returns, for each x, the index of the nearest element of vec
# (vec must be sorted in increasing order, as findInterval requires)
nearest.vec <- function(x, vec)
{
    smallCandidate <- findInterval(x, vec, all.inside = TRUE)
    largeCandidate <- smallCandidate + 1
    # nudge is TRUE if the large candidate is nearer, FALSE otherwise
    nudge <- 2 * x > vec[smallCandidate] + vec[largeCandidate]
    return(smallCandidate + nudge)
}
nearest.vec(b, A)
returns c(1, 2, 2), and should be comparable to findInterval in performance.
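One caveat: findInterval needs its second argument sorted in non-decreasing order. Here is a rough sketch of how the same approach could be adapted when A is not sorted, by sorting first and mapping the positions back through order(). The function name nearest.unsorted and the vector A2 are made up for this example, and it assumes vec has at least two elements and no NAs.

```r
# Hypothetical wrapper: same midpoint test as above, but for unsorted vec.
nearest.unsorted <- function(x, vec) {
  ord <- order(vec)        # positions of vec in increasing order
  sorted <- vec[ord]
  lo <- findInterval(x, sorted, all.inside = TRUE)
  hi <- lo + 1
  # TRUE where the upper neighbour is strictly nearer than the lower one
  nudge <- 2 * x > sorted[lo] + sorted[hi]
  ord[lo + nudge]          # map back to positions in the original vec
}

A2 <- c(30, 10, 50, 20, 40)   # same values as A, shuffled
x  <- c(5, 13, 17, 20, 100)   # includes values outside the range of A2

fast  <- nearest.unsorted(x, A2)
brute <- sapply(x, function(v) which.min(abs(v - A2)))
identical(fast, brute)
```

On an exact tie between two neighbours this picks the smaller value, which in an unsorted vector is not necessarily the first position that which.min would report.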
Here's a solution that uses R's often overlooked outer function. Not sure if it'll perform better, but it does avoid sapply.
A <- c(10, 20, 30, 40, 50)
b <- c(13, 17, 20)
distances <- abs(outer(A, b, '-'))  # named to avoid masking stats::dist
result <- apply(distances, 2, which.min)
result
# [1] 1 2 2
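Two things to keep in mind with the outer approach: it builds a full length(A) by length(b) matrix, so memory grows quickly with a few thousand points on each side, and apply is still a loop over columns. If avoiding the loop matters, one possible sketch is to use base R's max.col on the negated, transposed matrix. The names d and nearest are just for illustration, and note that max.col decides ties with a relative tolerance, so near-equal distances may be treated as ties.

```r
A <- c(10, 20, 30, 40, 50)
b <- c(13, 17, 20)

# Rows correspond to elements of b, columns to elements of A.
d <- abs(outer(b, A, '-'))

# The column with the largest negated distance is the nearest element of A.
# ties.method = "first" keeps the lowest index when distances tie.
nearest <- max.col(-d, ties.method = "first")
nearest
```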