How can I remove shared values from a list of vect

Posted 2019-06-06 01:16

Question:

I have a list:

x <- list("a" = c(1:6,32,24) , "b" = c(1:4,8,10,12,13,17,24), 
          "F" = c(1:5,9:15,17,18,19,20,32))
x

$a
[1]  1  2  3  4  5  6 32 24

$b
[1]  1  2  3  4  8 10 12 13 17 24

$F
[1]  1  2  3  4  5  9 10 11 12 13 14 15 17 18 19 20 32

Each vector in the list shares a number of elements with the others. How can I remove the shared values to get the following result?

$a
[1]  1  2  3  4  5  6 32 24

$b
[1]  8 10 12 13 17

$F
[1]  9 11 14 15 18 19 20

As you can see, the first vector does not change. The elements shared between the first and second vectors are removed from the second vector; the third vector is then compared with the first and second vectors, and any shared elements are removed from it. The goal of this task is to cluster a data set (the original data set contains 590 objects).

Answer 1:

x <- list("a" = c(1:6,32,24) , 
          "b" = c(1:4,8,10,12,13,17,24), 
          "F" = c(1:5,9:15,17,18,19,20,32))

This is inefficient, since it rebuilds the union of all previous vectors at each step (rather than keeping a running total), but it was the first way I thought of.

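## seq_along(x)[-1] is empty for a one-element list, whereas 2:length(x) would count down (2, 1)
for (i in seq_along(x)[-1]) {
   ## construct union of all previous vectors
   prev <- Reduce(union, x[1:(i-1)])
   ## remove shared elements from the current vector
   x[[i]] <- setdiff(x[[i]], prev)
}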

You could probably improve this by initializing prev as numeric(0) and extending it with c(prev, x[[i-1]]) at each step (note the double brackets: x[i-1] would append a one-element list rather than the vector's values). This still grows a vector at each step, which is a slow operation. If you don't have a gigantic data set and don't have to do this operation millions of times, the loop above is probably good enough.
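A minimal sketch of the running-total idea, assuming the same example list as in the question. The helper name dedup_running and the variable seen are made up for illustration and are not part of the original answer. Note one small difference from the loop above: setdiff is also applied to the first vector, so duplicates inside any single vector would be dropped as well.

```r
x <- list("a" = c(1:6, 32, 24),
          "b" = c(1:4, 8, 10, 12, 13, 17, 24),
          "F" = c(1:5, 9:15, 17, 18, 19, 20, 32))

## dedup_running is a hypothetical helper name used only for this sketch
dedup_running <- function(lst) {
  ## values seen in all earlier vectors (the running total)
  seen <- numeric(0)
  for (i in seq_along(lst)) {
    ## keep only values not seen before (setdiff also drops duplicates)
    lst[[i]] <- setdiff(lst[[i]], seen)
    ## the kept values are new by construction, so c() acts as a union here
    seen <- c(seen, lst[[i]])
  }
  lst
}

result <- dedup_running(x)
result
```

For very large inputs, growing seen with c() may still be slow, as noted above; this only avoids rebuilding the union from scratch at every step.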



Answer 2:

You can use Reduce and setdiff on the list in reverse order to find all elements of the last vector that do not appear in the others. Wrap this in an lapply over the partial sub-lists to get your desired output:

lapply(seq_along(x), function(y) Reduce(setdiff,rev(x[seq(y)])))
[[1]]
[1]  1  2  3  4  5  6 32 24

[[2]]
[1]  8 10 12 13 17

[[3]]
[1]  9 11 14 15 18 19 20

When scaling up, the number of rev calls may become an issue, so you might want to reverse the list once, outside the lapply as a new variable, and subset that within it.
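A sketch of that suggestion, assuming the same example list as in the question. The names rx, n and res are made up for illustration. The list is reversed once, and each step takes the tail of the reversed list that corresponds to the first y vectors; the last line restores the names, which lapply over an index drops.

```r
x <- list("a" = c(1:6, 32, 24),
          "b" = c(1:4, 8, 10, 12, 13, 17, 24),
          "F" = c(1:5, 9:15, 17, 18, 19, 20, 32))

## reverse the list once, outside the lapply
rx <- rev(x)
n <- length(rx)

## the last y elements of rx are x[[y]], x[[y-1]], ..., x[[1]],
## so Reduce(setdiff, ...) removes every earlier vector from x[[y]]
res <- lapply(seq_len(n), function(y) Reduce(setdiff, rx[(n - y + 1):n]))

## carry the original names over to the result
names(res) <- names(x)
res
```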



Tags: r list vector