I couldn't find an answer to this anywhere, so here's my solution.
The question is: how can you calculate a power set in R?
It is possible to do this with the "sets" library, using the command 2^as.set(c(1,2,3,4)), which yields the output {{}, {1}, {2}, {3}, {4}, {1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, {3, 4}, {1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {2, 3, 4}, {1, 2, 3, 4}}. However, this uses a recursive algorithm, which is rather slow.
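For completeness, here is that approach in runnable form (assuming the sets package is installed):

library(sets)
2^as.set(c(1, 2, 3, 4))   # power-set operator from the sets package, as shown above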
Here's the algorithm I came up with.
It's non-recursive, so it's much faster than some of the other solutions out there (~100x faster on my machine than the algorithm in the "sets" package), although the time complexity is of course still O(2^n), since the output itself has 2^n elements.
The conceptual basis for this algorithm is the following:
for each element in the set:
    for each subset constructed so far:
        new subset = (subset + element)
Here's the R code:
EDIT: here's a somewhat faster version of the same concept; my original algorithm is in the third comment to this post. This one is 30% faster on my machine for a set of length 19.
powerset = function(s){
    len = length(s)
    # pre-allocate the full list of 2^len subsets; the first entry is the empty set
    l = vector(mode="list", length=2^len)
    l[[1]] = numeric()
    counter = 1L
    for(x in 1L:len){
        # entries 1..counter currently hold all subsets of s[1:(x-1)];
        # append s[x] to each of them to form the new subsets
        for(subset in 1L:counter){
            counter = counter + 1L
            l[[counter]] = c(l[[subset]], s[x])
        }
    }
    return(l)
}
This version saves time by initializing the list with its final length right at the start and using the "counter" variable to keep track of the position at which to save each new subset. It's also possible to calculate that position analytically, but that was slightly slower.
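For reference, here's a rough sketch of how that analytic-position variant could look (the new subset built from element x and existing subset j would land at index 2^(x-1) + j, so no counter is needed):

powerset_analytic = function(s){
    len = length(s)
    l = vector(mode="list", length=2^len)
    l[[1]] = numeric()
    for(x in 1L:len){
        offset = 2^(x-1)                  # number of subsets built so far
        for(subset in 1L:offset){
            l[[offset + subset]] = c(l[[subset]], s[x])
        }
    }
    return(l)
}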
A subset can be seen as a boolean vector, indicating whether each element is in the subset or not. Those boolean vectors can be read as numbers written in binary. Enumerating all the subsets of 1:n is therefore equivalent to enumerating the numbers from 0 to 2^n-1.
f <- function(set) {
    n <- length(set)
    masks <- 2^(1:n - 1)    # bit masks 1, 2, 4, ..., 2^(n-1), one per element
    # for each number u in 0 .. 2^n - 1, keep the elements whose bit is set in u
    lapply(1:2^n - 1, function(u) set[bitwAnd(u, masks) != 0])
}
f(LETTERS[1:4])
There is a function powerset in the package HapEstXXR which seems to be faster than your function and the function in the other answer. See below (your function is called your.powerset).
require('microbenchmark')
require('HapEstXXR')
microbenchmark(powerset(LETTERS[1:10]), f(LETTERS[1:10]), your.powerset(LETTERS[1:10]), times=100)
Unit: microseconds
                         expr      min        lq    median        uq       max neval
      powerset(LETTERS[1:10])  314.845  388.4225  594.2445  686.6455   857.513   100
             f(LETTERS[1:10]) 7144.132 7937.6040 8222.1330 8568.5120 17805.335   100
 your.powerset(LETTERS[1:10]) 5016.981 5564.2880 5841.9810 6025.0690 29138.763   100
Since powerset seems to be very fast, you might want to look at the code of the function in the HapEstXXR package.
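One simple way to do that from within R is to print the function after loading the package; for a plain R function this shows its body:

require('HapEstXXR')
powerset    # typing the bare name prints the function's source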
Here's another simple approach that seems to perform reasonably well for small sets. There are obvious memory issues with this method as the data cardinality increases.
getPowSet <- function(set) {
    n <- length(set)
    # keepBool is a 2^n x n logical matrix: row j marks which elements belong to
    # the j-th subset (the column for element k alternates FALSE/TRUE in blocks of 2^(k-1))
    keepBool <- sapply(2^(1:n - 1), function(k)
        rep(c(FALSE, TRUE), each = k, times = (2^n / (2 * k))))
    lapply(1:2^n, function(j) set[keepBool[j, ]])
}
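For example, for a three-element set this should enumerate all 2^3 = 8 subsets, starting with the empty one:

getPowSet(LETTERS[1:3])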
Speed comparison for n=10:
microbenchmark(powerset(LETTERS[1:10]), f(LETTERS[1:10]), getPowSet(LETTERS[1:10]))
Unit: milliseconds
                     expr      min       lq     mean   median       uq      max neval
  powerset(LETTERS[1:10]) 2.466167 2.551928 2.656964 2.581211 2.637358 3.676877   100
         f(LETTERS[1:10]) 2.923339 3.029928 3.115222 3.104413 3.175931 4.033136   100
 getPowSet(LETTERS[1:10]) 2.415290 2.490522 2.574131 2.547115 2.617198 3.664040   100
But then for n=15 the original function seems to perform better:
microbenchmark(powerset(LETTERS[1:15]), f(LETTERS[1:15]), getPowSet(LETTERS[1:15]))
Unit: milliseconds
                     expr       min        lq      mean    median       uq      max neval
  powerset(LETTERS[1:15])  81.48276  88.50272  94.88927  91.61366  94.8262 174.0636   100
         f(LETTERS[1:15]) 110.86208 118.08736 124.38501 122.35872 126.7830 189.3131   100
 getPowSet(LETTERS[1:15])  86.16286  93.32314  98.14936  96.85443 100.6075 159.1030   100
The following should create the power set, minus the empty set:
powerset <- function(x) {
    # combn(x, i) generates all subsets of size i; do this for i = 1..length(x) and flatten
    sets <- lapply(1:length(x), function(i) combn(x, i, simplify = FALSE))
    unlist(sets, recursive = FALSE)
}
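If you also want the empty set included, one possible tweak (a quick sketch) is to prepend a zero-length subset of the same type as x:

powerset_with_empty <- function(x) {
    sets <- lapply(seq_along(x), function(i) combn(x, i, simplify = FALSE))
    c(list(x[0]), unlist(sets, recursive = FALSE))   # x[0] is an empty vector of x's type
}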