Question:
Say I have 5 vectors:
a <- c(1,2,3)
b <- c(2,3,4)
c <- c(1,2,5,8)
d <- c(2,3,4,6)
e <- c(2,7,8,9)
I know I can calculate the intersection between all of them by using Reduce() together with intersect(), like this:
Reduce(intersect, list(a, b, c, d, e))
[1] 2
But how can I find elements that are common in, say, at least 2 vectors? i.e.:
[1] 1 2 3 4 8
Answer 1:
It is much simpler than a lot of people are making it look. This should be very efficient.
Put everything into a vector:
x <- unlist(list(a, b, c, d, e))
Look for duplicates
unique(x[duplicated(x)])
# [1] 2 3 1 4 8
and sort if needed.
Note: If there can be duplicates within a list element (which your example does not seem to imply), then replace x with:
x <- unlist(lapply(list(a, b, c, d, e), unique))
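To illustrate why the per-vector unique() matters, here is a minimal sketch. The vectors p and q are hypothetical and not part of the question; p repeats the value 1, which occurs in no other vector.

```r
# Hypothetical vectors: p contains 1 twice, but 1 occurs in no other vector
p <- c(1, 1, 2)
q <- c(2, 3)

# Naive version: 1 is reported although it appears in only one vector
x_naive <- unlist(list(p, q))
naive <- unique(x_naive[duplicated(x_naive)])

# De-duplicating each vector first means a value is counted
# at most once per vector
x_dedup <- unlist(lapply(list(p, q), unique))
dedup <- unique(x_dedup[duplicated(x_dedup)])
```

Under these assumptions, naive contains both 1 and 2, while dedup contains only 2, the one value that is really shared between the two vectors.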
Edit: as the OP has expressed interest in a more general solution where n >= 2, I would do:
which(tabulate(x) >= n)
if the data is only made of natural integers (1, 2, etc.) as in the example. If not:
f <- table(x)
names(f)[f >= n]
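The table() variant can be wrapped into a small helper. This is only a sketch: the name common_in_at_least is made up for illustration, and the conversion back to numeric is an addition, needed because names(f) is always a character vector. That conversion should be safe for integer-like values such as those in the question, but may lose precision for arbitrary doubles.

```r
a <- c(1, 2, 3)
b <- c(2, 3, 4)
c <- c(1, 2, 5, 8)
d <- c(2, 3, 4, 6)
e <- c(2, 7, 8, 9)

# Hypothetical helper: values that occur in at least n of the given vectors
common_in_at_least <- function(vectors, n = 2) {
  # Count each vector at most once per value
  x <- unlist(lapply(vectors, unique))
  f <- table(x)
  # names(f) is character; convert back if the input was numeric
  vals <- names(f)[f >= n]
  if (is.numeric(x)) as.numeric(vals) else vals
}

res2 <- common_in_at_least(list(a, b, c, d, e), 2)  # 1 2 3 4 8
res3 <- common_in_at_least(list(a, b, c, d, e), 3)  # 2 3
```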
This is now not too far from James's solution, but it avoids the costly-ish sort. And it is miles faster than computing all possible combinations.
Answer 2:
You could try all possible combinations, for example:
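## create a list of all five vectors
l <- list(a, b, c, d, e)
## get all pairwise combinations of list indices
cbn <- combn(seq_along(l), 2)
## intersect each pair and collect the distinct shared values
unique(unlist(apply(cbn, 2, function(x) intersect(l[[x[1]]], l[[x[2]]]))))
## [1] 2 3 1 4 8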
Answer 3:
Here's another option:
# For each vector, get a vector of values without duplicates
deduplicated_vectors <- lapply(list(a,b,c,d,e), unique)
# Flatten the lists, then sort and use rle to determine how many
# lists each value appears in
rl <- rle(sort(unlist(deduplicated_vectors)))
# Get the values that appear in two or more lists
rl$values[rl$lengths >= 2]
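As a worked sketch on the question's data, the intermediate rle() result looks as follows. The variable name common is introduced here only for illustration.

```r
a <- c(1, 2, 3)
b <- c(2, 3, 4)
c <- c(1, 2, 5, 8)
d <- c(2, 3, 4, 6)
e <- c(2, 7, 8, 9)

# Run-length encode the sorted, per-vector de-duplicated values
rl <- rle(sort(unlist(lapply(list(a, b, c, d, e), unique))))

rl$values   # 1 2 3 4 5 6 7 8 9
rl$lengths  # 2 5 3 2 1 1 1 2 1  (number of vectors containing each value)

# Values present in two or more vectors
common <- rl$values[rl$lengths >= 2]  # 1 2 3 4 8
```

Because the values are sorted before rle() is applied, each distinct value forms exactly one run, so the run length equals the number of vectors that contain it.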
Answer 4:
This is an approach that counts the number of vectors each unique value occurs in.
unique_vals <- unique(c(a, b, c, d, e))
setNames(rowSums(!!(sapply(list(a, b, c, d, e), match, x = unique_vals)),
na.rm = TRUE), unique_vals)
# 1 2 3 4 5 8 6 7 9
# 2 5 3 2 1 2 1 1 1
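The counts can then be filtered to answer the original question. A possible sketch, where the names counts and common are assumptions added for illustration:

```r
a <- c(1, 2, 3)
b <- c(2, 3, 4)
c <- c(1, 2, 5, 8)
d <- c(2, 3, 4, 6)
e <- c(2, 7, 8, 9)

unique_vals <- unique(c(a, b, c, d, e))

# For each unique value, count the vectors in which match() finds it
counts <- setNames(
  rowSums(!!(sapply(list(a, b, c, d, e), match, x = unique_vals)),
          na.rm = TRUE),
  unique_vals
)

# Keep the values that occur in at least two vectors
common <- sort(unique_vals[counts >= 2])  # 1 2 3 4 8
```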
Answer 5:
A variation of @rengis's method would be:
unique(unlist(Map(`intersect`, cbn[1,], cbn[2,])))
#[1] 2 3 1 4 8
where,
l <- mget(letters[1:5])
cbn <- combn(l,2)
Answer 6:
Yet another approach, applying a vectorised function with outer:
L <- list(a, b, c, d, e)
f <- function(x, y) intersect(x, y)
fv <- Vectorize(f, list("x","y"))
o <- outer(L, L, fv)
table(unlist(o[upper.tri(o)]))
#  1  2  3  4  8
#  1 10  3  1  1
The output above gives the number of pairs of vectors that share each of the duplicated elements 1, 2, 3, 4, and 8.
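These pair counts follow directly from the per-value vector counts: a value contained in k vectors is shared by choose(k, 2) pairs, so 2 (in all five vectors) gives 10 pairs and 3 (in three vectors) gives 3 pairs. A short sketch of that relationship, with variable names chosen here for illustration:

```r
a <- c(1, 2, 3)
b <- c(2, 3, 4)
c <- c(1, 2, 5, 8)
d <- c(2, 3, 4, 6)
e <- c(2, 7, 8, 9)

vectors <- list(a, b, c, d, e)

# Number of vectors containing each value
tab <- table(unlist(lapply(vectors, unique)))
k <- as.vector(tab)

# A value present in k vectors is shared by choose(k, 2) pairs of vectors
pair_counts <- setNames(choose(k, 2), names(tab))

# Drop values that occur in a single vector (zero pairs)
shared <- pair_counts[pair_counts > 0]
```

Here shared carries the names 1, 2, 3, 4, 8 with the values 1, 10, 3, 1, 1, matching the table produced with outer().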