I spent a while the other day looking for a way to check if a row vector is contained in some set of row vectors in R. Basically, I want to generalize the %in%
operator to match a tuple instead of each entry in a vector. For example, I want:
row.vec = c("A", 3)
row.vec
# [1] "A" "3"
data.set = rbind(c("A",1),c("B",3),c("C",2))
data.set
# [,1] [,2]
# [1,] "A" "1"
# [2,] "B" "3"
# [3,] "C" "2"
row.vec %tuple.in% data.set
# [1] FALSE
for my made-up operator %tuple.in%
because the row vector c("A",3)
is not a row vector in data.set. Using the %in%
operator gives:
row.vec %in% data.set
# [1] TRUE TRUE
because "A" and 3 are in data.set
, which is not what I want.
I have two questions. First, are there any good existing solutions to this?
Second, since I couldn't find them (even if they exist), I tried to write my own function to do it. It works for an input matrix of row vectors, but I'm wondering if any experts have proposed improvements:
is.tuple.in <- function(matrix1, matrix2){
# Apply rbind() so that matrix1 has columns even if it is a row vector.
matrix1 = rbind(matrix1)
if(ncol(matrix1) != ncol(matrix2)){
stop("Matrices must have the same number of columns.") }
# Now check for the first row and handle other rows recursively
row.vec = matrix1[1,]
tuple.found = FALSE
for(i in 1:nrow(matrix2)){
# If we find a match, then this row exists in matrix 2 and we can break the loop
if(all(row.vec == matrix2[i,])){
tuple.found = TRUE
break
}
}
# If there are more rows to be checked, use a recursive call
if(nrow(matrix1) > 1){
return(c(tuple.found, is.tuple.in(matrix1[2:nrow(matrix1),],matrix2)))
} else {
return(tuple.found)
}
}
I see a couple problems with that that I'm not sure how to fix. First, I'd like the base case to be clear at the start of the function. I didn't manage to do this because I pass matrix1[2:nrow(matrix1),]
in the recursive call, which produces an error if matrix1
has one row. So instead of getting to a case where matrix1
is empty, I have an if condition at the end deciding if more iterations are necessary.
Second, I think the use of rbind()
at the start is sloppy, but I needed it for when matrix1
had been reduced to a single row. Without using rbind()
, ncol(matrix1)
produced an error in the 1-row case. I figure my trouble here has to do with a lack of knowledge about R data types.
Any help would be appreciated.