Compare two vectors of numbers based on threshold

Posted 2019-08-13 22:12

Question:

I have two vectors, g and h. I want to compare the numbers in these two vectors and find out whether they have any elements in common. The common elements do not have to be exactly equal; they can differ by anything in the range (-0.5, +0.5). In other words, g±0.5 is compared with h±0.5.

g <- c(0.5, 5956.3, 38, 22.666, 590.3, 21.992, 9.3)
h <- c(0.7, 99.2, 39, 30, 21.68, 9.4, 22.333, 0.001, 0.000222, 9.999)

As an example, in the two vectors above, 0.5 from g and 0.7 from h match because they are within ±0.5 of each other. 9.4 and 9.3 also match, as do 22.666 and 22.333, because their difference is also in the range (-0.5, +0.5).

It is important to note that EVERY element of g should be compared to EVERY element of h.

Is there a function to do this in R?

Unfortunately, the all.equal function only compares each element of one vector with the element at the same index in the other vector, and so expects the vectors to have equal length. What I want is to compare each element of vector g with each element of vector h.
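
The length restriction described above is easy to confirm. A minimal sketch using the g and h from the question; res is only an illustrative variable name:

```r
g <- c(0.5, 5956.3, 38, 22.666, 590.3, 21.992, 9.3)
h <- c(0.7, 99.2, 39, 30, 21.68, 9.4, 22.333, 0.001, 0.000222, 9.999)

# all.equal() compares position by position, so vectors of different
# lengths are reported as different rather than matched pairwise
res <- all.equal(g, h)
isTRUE(res)  # FALSE: res is a character description of the mismatch
```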

Answer 1:

You can use outer to subtract every element of h from every element of g, then keep the pairs whose absolute difference is less than or equal to 0.5, i.e.

m1 <- which(abs(outer(g, h, `-`)) <= 0.5, arr.ind = TRUE)

which gives,

     row col   #where row = g and col = h
[1,]   1   1
[2,]   6   5
[3,]   7   6
[4,]   4   7
[5,]   6   7
[6,]   1   8
[7,]   1   9

You can rearrange this to get the desired output (you did not specify the format you want). Here is one way,

cbind(g = g[m1[,1]], h = h[m1[,2]])

#            g        h
#    [1,]  0.500  0.700000
#    [2,] 21.992 21.680000
#    [3,]  9.300  9.400000
#    [4,] 22.666 22.333000
#    [5,] 21.992 22.333000
#    [6,]  0.500  0.001000
#    [7,]  0.500  0.000222
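
If only a yes/no answer is needed, the same outer() matrix can be collapsed by row or by column. A minimal sketch, assuming the 0.5 tolerance from the question; the names is_close, g_has_match and h_has_match are illustrative and not part of the answer above:

```r
g <- c(0.5, 5956.3, 38, 22.666, 590.3, 21.992, 9.3)
h <- c(0.7, 99.2, 39, 30, 21.68, 9.4, 22.333, 0.001, 0.000222, 9.999)

# is_close[i, j] is TRUE when g[i] and h[j] differ by at most 0.5
is_close <- abs(outer(g, h, `-`)) <= 0.5

# Is there any match at all between the two vectors?
any(is_close)

# Which elements of g (rows) and h (columns) have at least one partner?
g_has_match <- rowSums(is_close) > 0
h_has_match <- colSums(is_close) > 0

g[g_has_match]  # 0.500 22.666 21.992 9.300
h[h_has_match]  # 0.7, 21.68, 9.4, 22.333, 0.001 and 0.000222
```

Working on the logical matrix directly avoids the index bookkeeping of which(..., arr.ind = TRUE) when the pairs themselves are not needed.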


Answer 2:

lapply(g, function(x) abs(x - h) < 1.0)

This returns a list of logical vectors, one per element of g, each comparing that element with every element of h at a tolerance of 1.0, i.e. the distance at which the ±0.5 windows around two values stop overlapping.
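
The logical vectors can also be used to pull out the matching values of h for each element of g. A sketch under assumptions: tol is a hypothetical parameter name, set here to the 0.5 from the question rather than the 1.0 used above, and matches and kept are illustrative names:

```r
g <- c(0.5, 5956.3, 38, 22.666, 590.3, 21.992, 9.3)
h <- c(0.7, 99.2, 39, 30, 21.68, 9.4, 22.333, 0.001, 0.000222, 9.999)

tol <- 0.5  # assumed tolerance; set to 1.0 to reproduce the threshold above

# For each element of g, keep the elements of h that lie within tol of it
matches <- lapply(g, function(x) h[abs(x - h) < tol])
names(matches) <- g

# Drop the elements of g that have no partner in h
kept <- matches[lengths(matches) > 0]
kept
```

With tol set to 0.5, kept has entries for 0.5, 22.666, 21.992 and 9.3.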



Answer 3:

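Try this code, which builds every (g, h) pair with expand.grid and keeps the pairs that differ by less than 1:

# one row per combination of an element of g and an element of h
comb <- expand.grid(g = g, h = h)

# keep the rows whose absolute difference is below 1
comb[abs(comb$g - comb$h) < 1, ]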
        g         h
1   0.500  0.700000
32 22.666 21.680000
34 21.992 21.680000
42  9.300  9.400000
46 22.666 22.333000
48 21.992 22.333000
50  0.500  0.001000
57  0.500  0.000222
70  9.300  9.999000
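
The choice of threshold changes which pairs are returned. A minimal sketch that wraps the expand.grid approach in a helper so both thresholds can be compared; near_pairs and its tol argument are hypothetical names introduced here, not an existing R function:

```r
g <- c(0.5, 5956.3, 38, 22.666, 590.3, 21.992, 9.3)
h <- c(0.7, 99.2, 39, 30, 21.68, 9.4, 22.333, 0.001, 0.000222, 9.999)

# Hypothetical helper: all (g, h) pairs whose absolute difference is below tol
near_pairs <- function(g, h, tol) {
  comb <- expand.grid(g = g, h = h)
  comb$diff <- abs(comb$g - comb$h)
  comb[comb$diff < tol, ]
}

nrow(near_pairs(g, h, 0.5))  # 7 pairs
nrow(near_pairs(g, h, 1))    # 9 pairs
```

The two extra pairs at a threshold of 1 are 22.666 with 21.68 and 9.3 with 9.999, which differ by more than 0.5 but less than 1.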