I have this toy character vector:
a = c("a","b","c","d","e","d,e","f")
in which some elements are concatenated with a comma (e.g. "d,e"),
and a list that contains the unique elements of that vector. For comma-concatenated elements, the list keeps only the concatenated form, not the individual components.
So this is the list:
l = list("a","b","c","d,e","f")
I am looking for an efficient way to obtain the indices of the elements of the vector a in the list l. For elements of a that are represented by comma-concatenated elements in l, it should return the indices of those comma-concatenated elements in l.
So the output of this function would be:
c(1,2,3,4,4,4,5)
As you can see, it returns index 4 for the elements "d", "e", and "d,e" of a.
You could use a strategy with factors. First, build the set of values that should map to each element of your list with
l <- list("a","b","c","d,e","f")
idxtr <- Map(function(x) unique(c(x, strsplit(x, ",")[[1]])), unlist(l))
This builds, for each item in l, a vector holding the item itself along with all the values that should match it. Then we turn the vector a into a factor with those values as its levels, and reassign the levels based on the list we just built:
a <- c("a","b","c","d","e","d,e","f")
a <- factor(a, levels=unlist(idxtr))
levels(a) <- idxtr
as.numeric(a)
# [1] 1 2 3 4 4 4 5
Finally, to get the index, we use as.numeric on the factor.
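The level-merging step can be looked at in isolation. The sketch below rebuilds idxtr as above and applies it to a smaller vector; the name f is only illustrative. It relies on the behaviour that assigning a named list to levels() renames every listed value to the name of its list entry, so several old levels collapse into one. Map() names the entries after the character vector it iterates over, which is where the names come from here.

```r
l <- list("a", "b", "c", "d,e", "f")
idxtr <- Map(function(x) unique(c(x, strsplit(x, ",")[[1]])), unlist(l))

# A factor whose levels are every whole element plus every component
f <- factor(c("d", "e", "d,e"), levels = unlist(idxtr))

# Assigning the named list merges "d", "e" and "d,e" into the single level "d,e"
levels(f) <- idxtr
levels(f)
# [1] "a"   "b"   "c"   "d,e" "f"

# The integer codes are now positions in l
as.integer(f)
# [1] 4 4 4
```

Values that do not appear in any entry of idxtr become NA when the factor is created, so unmatched elements would show up as NA in the result.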
I would turn your search vector into a set of regular expressions by substituting the comma with a pipe. Also add names to the search vector, according to each element's position in the list.
L <- setNames(lapply(l, gsub, pattern = ",", replacement = "|"), seq_along(l))
Then you can do:
lapply(L, function(x) grep(x, a, value = TRUE))
# $`1`
# [1] "a"
#
# $`2`
# [1] "b"
#
# $`3`
# [1] "c"
#
# $`4`
# [1] "d" "e" "d,e"
#
# $`5`
# [1] "f"
The names are important, because you can now use stack to get what you are looking for.
stack(lapply(L, function(x) grep(x, a, value = TRUE)))
#   values ind
# 1      a   1
# 2      b   2
# 3      c   3
# 4      d   4
# 5      e   4
# 6    d,e   4
# 7      f   5
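The data frame lists the matches grouped by list position, not in the order of a. Here is a minimal sketch of turning it into the index vector from the question, assuming every element of a matches exactly one pattern. The names res and idx are illustrative. The patterns are also anchored with ^( )$, which is an addition to the approach above: an unanchored pattern such as "d|e" would match any string that merely contains a "d" or an "e". The anchoring assumes the elements contain no regex metacharacters.

```r
a <- c("a", "b", "c", "d", "e", "d,e", "f")
l <- list("a", "b", "c", "d,e", "f")

# Anchored alternation: "d,e" becomes "^(d,e|d|e)$", so only whole elements match
L <- setNames(
  lapply(l, function(x) paste0("^(", x, "|", gsub(",", "|", x, fixed = TRUE), ")$")),
  seq_along(l)
)

res <- stack(lapply(L, function(p) grep(p, a, value = TRUE)))

# `ind` is a factor holding the list names; go through character to recover
# the integers, then reorder the rows to follow the order of `a`
idx <- as.integer(as.character(res$ind))[match(a, res$values)]
idx
# [1] 1 2 3 4 4 4 5
```

Because match() returns NA for values it cannot find, elements of a that match no pattern would come back as NA rather than being dropped silently.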