How to efficiently find common elements of two vectors with duplicate elements?
Example:
v1 <- c(1, 1, 2, 3, 3, 4)
v2 <- c(1, 1, 1, 3, 4, 5)
commonElements <- c(1, 1, 3, 4)
intersect doesn't handle duplicate elements well: it treats its inputs as sets, so each common value is returned only once.
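To make the problem concrete, here is a minimal sketch of what base R's intersect returns for the example vectors. Nothing here goes beyond the vectors already defined in the question.

```r
v1 <- c(1, 1, 2, 3, 3, 4)
v2 <- c(1, 1, 1, 3, 4, 5)

# intersect() works on sets, so the repeated 1 is collapsed to a single value
res <- intersect(v1, v2)
print(res)
```

The desired result keeps the value 1 twice, because it occurs at least twice in both vectors.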
I like intersect and tables, so...
tv1 <- table(v1)
tv2 <- table(v2)
comvals <- intersect(names(tv1),names(tv2))
comtab <- apply(rbind(tv1[comvals],tv2[comvals]),2,min)
The information is still there, but in (what I view as) a nicer format:
> comtab
1 3 4
2 1 1
EDIT: If you really want that vector, though, it's as.numeric(rep(names(comtab), comtab)).
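The steps above could be wrapped into a single function along these lines. This is only a sketch: the name common_table is hypothetical, and pmin on plain count vectors is swapped in for the apply(rbind(...), 2, min) call so that the case with no shared values also returns an empty vector.

```r
# common_table is a hypothetical name, not part of the original answer
common_table <- function(v1, v2) {
  tv1 <- table(v1)
  tv2 <- table(v2)
  # values that occur in both vectors (table names are character strings)
  comvals <- intersect(names(tv1), names(tv2))
  # as.vector() drops the table class so pmin() compares plain integer counts
  counts <- pmin(as.vector(tv1[comvals]), as.vector(tv2[comvals]))
  # repeat each shared value by its smaller count; assumes numeric input
  as.numeric(rep(comvals, counts))
}

common_table(c(1, 1, 2, 3, 3, 4), c(1, 1, 1, 3, 4, 5))
```

Like the one-liner in the edit, this converts names back with as.numeric, so it assumes the input vectors are numeric.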
Here is another option:
common <- function(v1, v2) {
  lvls <- unique(c(v1, v2))
  v1a <- factor(v1, levels=lvls)
  v2a <- factor(v2, levels=lvls)
  v <- pmin(table(v1a), table(v2a))
  as.numeric(rep(names(v), v))
}
common(rep(1:3, 1:3), rep(1:2, 1:2))
[1] 1 2 2
common(rep(c(1,3,5), 1:3), rep(c(5,2), 2))
[1] 5 5
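The reason for the shared levels may be easier to see by looking at the intermediate tables. The sketch below reuses the inputs of the second example; the variable names t1 and t2 are only for illustration.

```r
v1 <- rep(c(1, 3, 5), 1:3)   # 1 3 3 5 5 5
v2 <- rep(c(5, 2), 2)        # 5 2 5 2

lvls <- unique(c(v1, v2))    # 1 3 5 2

# both tables have one cell per level, with 0 for values that are absent
t1 <- table(factor(v1, levels = lvls))
t2 <- table(factor(v2, levels = lvls))
print(t1)
print(t2)

# because the cells line up, pmin() compares counts of the same value
print(pmin(t1, t2))
```

Without the common levels the two tables would have different lengths and different names, and an element-wise minimum would compare counts of unrelated values.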
EDIT: wrapped this in a function, added examples for different cases, and sped it up per @Dason's comment.
I'm sure there are many ways to do this, but I opted to sort each vector and use rle to get the values and counts. table could probably accomplish the same task as well.
common <- function(v1, v2){
  r1 <- rle(sort(v1))
  r2 <- rle(sort(v2))
  vals <- intersect(r1$values, r2$values)
  l1 <- r1$lengths[r1$values %in% vals]
  l2 <- r2$lengths[r2$values %in% vals]
  rep(vals, pmin(l1, l2))
}
common(v1, v2)
Some examples:
> common(v1, v2)
[1] 1 1 3 4
> common(c(1,1), c(3,2,1,3,1))
[1] 1 1
> common(c(1,2,3,2), c(1,2,3))
[1] 1 2 3
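Two further cases that may be worth checking are non-numeric input and vectors with nothing in common. The sketch below copies the rle-based function so that it runs on its own; the test inputs are made up for illustration.

```r
# rle-based function, copied from the answer above
common <- function(v1, v2) {
  r1 <- rle(sort(v1))
  r2 <- rle(sort(v2))
  vals <- intersect(r1$values, r2$values)
  l1 <- r1$lengths[r1$values %in% vals]
  l2 <- r2$lengths[r2$values %in% vals]
  rep(vals, pmin(l1, l2))
}

# character input: "a" occurs twice in the first vector and three times in the second
chr <- common(c("a", "a", "b"), c("a", "c", "a", "a"))
print(chr)

# no shared values: the result is a zero-length vector
none <- common(c(1, 2), c(3, 4))
print(none)
```

Since this version repeats the values themselves rather than converting table names with as.numeric, it should keep the type of its input.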