I have a list:
x <- list("a" = c(1:6,32,24) , "b" = c(1:4,8,10,12,13,17,24),
"F" = c(1:5,9:15,17,18,19,20,32))
x
$a
[1] 1 2 3 4 5 6 32 24
$b
[1] 1 2 3 4 8 10 12 13 17 24
$F
[1] 1 2 3 4 5 9 10 11 12 13 14 15 17 18 19 20 32
Each vector in the list shares a number of elements with the others. How can I remove the shared values to get the following result?
$a
[1] 1 2 3 4 5 6 32 24
$b
[1] 8 10 12 13 17
$F
[1] 9 11 14 15 18 19 20
As you can see, the first vector does not change. The elements shared between the first and second vectors are removed from the second vector, and then the elements that the third vector shares with the first and second vectors are removed from the third. The goal of this task is to cluster a dataset (the original dataset contains 590 objects).
x <- list("a" = c(1:6,32,24) ,
"b" = c(1:4,8,10,12,13,17,24),
"F" = c(1:5,9:15,17,18,19,20,32))
This is inefficient since it rebuilds the union of all the previous vectors at each step (rather than keeping a running total), but it was the first way I thought of.
for (i in 2:length(x)) {
## construct union of all previous lists
prev <- Reduce(union,x[1:(i-1)])
## remove shared elements from the current list
x[[i]] <- setdiff(x[[i]],prev)
}
You could probably improve this by initializing prev as numeric(0) and updating prev to c(prev,x[[i-1]]) at each step (although this grows a vector at each step, which is a slow operation). If you don't have a gigantic data set and don't have to do this operation millions of times, it's probably good enough.
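A minimal sketch of that running-total idea, assuming the same x as above. The name seen is a hypothetical stand-in for prev, and the loop starts at the first element so that every vector is folded into the running total as it is processed:

```r
x <- list("a" = c(1:6, 32, 24),
          "b" = c(1:4, 8, 10, 12, 13, 17, 24),
          "F" = c(1:5, 9:15, 17, 18, 19, 20, 32))

seen <- numeric(0)                  # running total of values already claimed
for (i in seq_along(x)) {
  ## keep only the values not claimed by an earlier vector
  x[[i]] <- setdiff(x[[i]], seen)
  ## x[[i]] is now disjoint from seen, so c() is enough (no union needed)
  seen <- c(seen, x[[i]])
}
x
```

Note that setdiff also drops duplicates within a vector, which makes no difference for this data but may matter if a vector can contain repeated values.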
You can use Reduce and setdiff on the list in the reverse order to find all elements of the last vector that do not appear in the others. Bung this into an lapply to run over partial sub-lists to get your desired output:
lapply(seq_along(x), function(y) Reduce(setdiff,rev(x[seq(y)])))
[[1]]
[1] 1 2 3 4 5 6 32 24
[[2]]
[1] 8 10 12 13 17
[[3]]
[1] 9 11 14 15 18 19 20
When scaling up, the number of rev calls may become an issue, so you might want to reverse the list once, outside the lapply as a new variable, and subset that within it.
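One way this might look, as a sketch rather than a tuned solution. The names n, rx and res are hypothetical, and x is the list from the question:

```r
x <- list("a" = c(1:6, 32, 24),
          "b" = c(1:4, 8, 10, 12, 13, 17, 24),
          "F" = c(1:5, 9:15, 17, 18, 19, 20, 32))

n  <- length(x)
rx <- rev(x)                        # reverse the list once, up front

res <- lapply(seq_len(n), function(y) {
  ## rx[(n - y + 1):n] is x[[y]], x[[y - 1]], ..., x[[1]]
  Reduce(setdiff, rx[(n - y + 1):n])
})
names(res) <- names(x)              # lapply over indices drops the names
res
```

This still calls setdiff once per pair of vectors, so it only saves the repeated rev calls, not the overall quadratic number of comparisons.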