R: find nearest index

Posted 2019-01-19 03:04

Question:

I have two vectors with a few thousand points, but generalized here:

A <- c(10, 20, 30, 40, 50)
b <- c(13, 17, 20)

How can I get the indices of A that are nearest to each element of b? The expected outcome would be c(1, 2, 2).

I know that findInterval can only find the first occurrence, and not the nearest, and I'm aware that which.min(abs(b[2] - A)) is getting warmer, but I can't figure out how to vectorize it to work with long vectors of both A and b.

Answer 1:

You can just wrap your code in sapply. I think this runs at about the same speed as a for loop, though, so it isn't technically vectorized:

sapply(b, function(x) which.min(abs(x - A)))
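
As a small variation on the line above, vapply can be used in place of sapply so that the return type is declared up front. This is only a sketch; the name nearest.idx is made up for illustration and is not from the original answer.

```r
A <- c(10, 20, 30, 40, 50)
b <- c(13, 17, 20)

# vapply works like sapply, but checks that each call returns exactly
# one integer (which.min does, for non-empty input).
# nearest.idx is a hypothetical name used only in this example.
nearest.idx <- vapply(b, function(x) which.min(abs(x - A)), integer(1))
nearest.idx
# [1] 1 2 2
```

Like sapply, this still loops over b in R, so it should not be expected to be faster.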


Answer 2:

findInterval gets you very close. You just have to pick between the index it returns and the next one:

# returns, for each element of x, the index of the nearest value in vec
nearest.vec <- function(x, vec)
{
    smallCandidate <- findInterval(x, vec, all.inside=TRUE)
    largeCandidate <- smallCandidate + 1
    #nudge is TRUE if large candidate is nearer, FALSE otherwise
    nudge <- 2 * x > vec[smallCandidate] + vec[largeCandidate]
    return(smallCandidate + nudge)
}

nearest.vec(b,A)

This returns (1, 2, 2) and should be comparable to findInterval in performance. Note that findInterval requires vec to be sorted in non-decreasing order.
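
If A is not already sorted, one possible adaptation is to sort it first and map the result back to the original positions. The sketch below assumes vec has at least two elements and no NA values; nearest.unsorted is a hypothetical name, not part of the original answer.

```r
# Sketch: same idea as nearest.vec, but for a vec in arbitrary order.
# nearest.unsorted is a made-up name for illustration.
nearest.unsorted <- function(x, vec) {
  ord <- order(vec)            # positions of vec in increasing order
  sorted <- vec[ord]
  lo <- findInterval(x, sorted, all.inside = TRUE)
  hi <- lo + 1
  # nudge is TRUE when the upper neighbour is strictly nearer
  nudge <- 2 * x > sorted[lo] + sorted[hi]
  ord[lo + nudge]              # translate back to indices into vec
}

nearest.unsorted(c(13, 17, 20), c(30, 10, 50, 20, 40))
# [1] 2 4 4
```

As in nearest.vec, an exact tie between two neighbours resolves to the smaller value.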



Answer 3:

Here's a solution that uses R's often overlooked outer function. Not sure if it'll perform better, but it does avoid sapply.

A <- c(10, 20, 30, 40, 50)
b <- c(13, 17, 20)

dist <- abs(outer(A, b, '-'))
result <- apply(dist, 2, which.min)

# [1] 1 2 2
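
To see whether the three approaches in this thread agree on longer vectors, a quick check along these lines may help. It is a sketch only: the vector sizes, the seed, and the names A.big, b.big, via.sapply, via.outer, and via.interval are arbitrary choices for illustration. Keep in mind that outer builds a length(A) by length(b) matrix, so memory use grows with the product of the two lengths.

```r
set.seed(42)                               # arbitrary seed, for reproducibility
A.big <- sort(runif(500, 0, 1000))         # sorted, as findInterval requires
b.big <- runif(800, 0, 1000)

# Answer 1: loop over b.big with sapply
via.sapply <- sapply(b.big, function(x) which.min(abs(x - A.big)))

# Answer 3: full distance matrix, one column per element of b.big
via.outer <- apply(abs(outer(A.big, b.big, "-")), 2, which.min)

# Answer 2: findInterval plus a nudge toward the nearer neighbour
lo <- findInterval(b.big, A.big, all.inside = TRUE)
via.interval <- lo + (2 * b.big > A.big[lo] + A.big[lo + 1])

# TRUE if all three methods picked the same indices
agree <- all(via.sapply == via.outer) && all(via.sapply == via.interval)
agree
```

The methods could in principle disagree when an element of b lies exactly halfway between two elements of A, since each one breaks ties in its own way.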