Find indices of vector elements in a list

Published 2019-09-21 04:19

Question:

I have this toy character vector:

a = c("a","b","c","d","e","d,e","f")

in which some elements are concatenated with a comma (e.g. "d,e").

I also have a list containing the unique elements of that vector; for comma-concatenated elements, the list keeps only the concatenated form, not its individual components.

So this is the list:

l = list("a","b","c","d,e","f")

I am looking for an efficient way to obtain, for each element of a, its index in the list l. If an element of a is represented by a comma-concatenated element of l (either as one of its components or as the whole string), the function should return the index of that comma-concatenated element in l.

So the output of this function would be:

c(1,2,3,4,4,4,5)

As you can see, it returns index 4 for the elements "d", "e", and "d,e" of a.

Answer 1:

You could use a strategy based on factors. First, for each element in your list, collect all the values that should map to it:

l <- list("a","b","c","d,e","f")
idxtr <- Map(function(x) unique(c(x, strsplit(x, ",")[[1]])), unlist(l))

This builds a list with one entry per item in l, containing all possible matches for that item. Then we take the vector a, create a factor with those values as levels, and reassign the levels based on the list we just built:

a <- c("a","b","c","d","e","d,e","f")
a <- factor(a, levels=unlist(idxtr))
levels(a) <- idxtr
as.numeric(a)
# [1] 1 2 3 4 4 4 5

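Finally, to get the index, we call as.numeric on the factor. Because the merged levels come out in the same order as the items of l, the factor's integer codes are exactly the positions in l.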
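
The factor trick works because idxtr is effectively a mapping from every acceptable value to a list position. A minimal sketch of the same idea written as a plain named lookup vector (this is a variation, not part of the original answer; tokens, lookup and idx are illustrative names) might look like this:

```r
l <- list("a", "b", "c", "d,e", "f")
a <- c("a", "b", "c", "d", "e", "d,e", "f")

# For each list item, keep the item itself plus its comma-separated parts
# (the same content as idxtr above).
tokens <- lapply(unlist(l), function(x) unique(c(x, strsplit(x, ",")[[1]])))

# Name each position in l by every token that should map to it.
lookup <- setNames(rep(seq_along(tokens), lengths(tokens)), unlist(tokens))

# Index the lookup by the values of a; values not found in l would give NA.
idx <- unname(lookup[a])
idx
# [1] 1 2 3 4 4 4 5
```

Like the factor version, this assumes no token appears under more than one item of l; with duplicates, indexing by name returns the first match.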



Answer 2:

I would turn your search list into a set of regular expressions by replacing the comma with a pipe (regex alternation, so "d,e" becomes "d|e"). Also name each element by its position in the list.

L <- setNames(lapply(l, gsub, pattern = ",", replacement = "|"), seq_along(l))

Then you can do:

lapply(L, function(x) grep(x, a, value = TRUE))
# $`1`
# [1] "a"
# 
# $`2`
# [1] "b"
# 
# $`3`
# [1] "c"
# 
# $`4`
# [1] "d"   "e"   "d,e"
# 
# $`5`
# [1] "f"
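
One caveat: these patterns are unanchored, so grep also matches substrings. If a contained, say, "ab", the pattern "a" would match it. A sketch under the assumption that the elements contain no regex metacharacters (otherwise they would need escaping) builds one anchored alternation per list item instead, accepting the whole item or any of its parts; anchored, L2 and matches are illustrative names:

```r
l <- list("a", "b", "c", "d,e", "f")
a <- c("a", "b", "c", "d", "e", "d,e", "f")

# One anchored pattern per item, e.g. "^(d,e|d|e)$" for "d,e".
# Assumes no regex metacharacters inside the elements.
anchored <- lapply(l, function(x) {
  parts <- unique(c(x, strsplit(x, ",")[[1]]))
  paste0("^(", paste(parts, collapse = "|"), ")$")
})
L2 <- setNames(anchored, seq_along(l))

# Same grep call as above, but only whole-string matches are returned.
matches <- lapply(L2, function(p) grep(p, a, value = TRUE))
matches[["4"]]
# [1] "d"   "e"   "d,e"
```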

The names are important because stack turns them into the ind column, which holds the index in l for each matched value:

stack(lapply(L, function(x) grep(x, a, value = TRUE)))
#   values ind
# 1      a   1
# 2      b   2
# 3      c   3
# 4      d   4
# 5      e   4
# 6    d,e   4
# 7      f   5
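
Note that ind is a factor whose labels are the list positions, and the rows follow the order of l rather than the order of a (here the two happen to coincide). A possible sketch for turning the result into a plain integer vector aligned with a, assuming each value of a matches exactly one pattern (match keeps the first hit otherwise):

```r
l <- list("a", "b", "c", "d,e", "f")
a <- c("a", "b", "c", "d", "e", "d,e", "f")
L <- setNames(lapply(l, gsub, pattern = ",", replacement = "|"), seq_along(l))

s <- stack(lapply(L, function(x) grep(x, a, value = TRUE)))

# Convert the factor labels ("1", "2", ...) to numbers via character,
# then reorder so the result follows the order of a.
s$ind <- as.integer(as.character(s$ind))
idx <- s$ind[match(a, s$values)]
idx
# [1] 1 2 3 4 4 4 5
```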


Tags: r list vector