In this thread, I am trying to include all the commonly asked questions and their answers here. I hope this will be useful for someone.
General question: How to generate sequences of r
objects from n
objects?
- combination vs permutation.
- with replacement vs without replacement.
- distinct items vs non-distinct items (multisets).
There are in total 2^3=8
questions of this type.
[Update]
Josh O'Brien suggests that the 8 questions are related to twelvefold way. Indeed, the "distinct" questions are included in twelvefold way, while the "non-distinct" questions are not included. Anyway, it is interesting to compare the 8 questions here with twelvefold way. See the comments for further readings.
EDIT: I have updated the answer to use a more efficient package
arrangements
Getting start of using
arrangement
arrangements contains some efficient generators and iterators for permutations and combinations. It has been demonstrated that
arrangements
outperforms most of the existing packages of similar kind. Some benchmarks could be found here.Here are the answers to the above questions
Compare to other packages
There are few advantages of using
arrangements
over the existing packages.Integral framework: you don't have to use different packages for different methods.
It is very efficient. See https://randy3k.github.io/arrangements/articles/benchmark.html for some benchmarks.
It is memory efficient, it is able to generate all 13! permutation of 1 to 13, existing packages will fail to do so because of the limitation of matrix size. The
getnext()
method of the iterators allow users to get the arrangements one by one.The generated arrangements are in dictionary order which may be desired for some users.
A Walk Through a Slice of Combinatorics in R*
Below, we examine packages equipped with the capabilities of generating combinations & permutations. If I have left out any package, please forgive me and please leave a comment or better yet, edit this post.
Outline of analysis:
Before we begin, we note that combinations/permutations with replacement of distinct vs. non-distint items chosen m at a time are equivalent. This is so, because when we have replacement, it is not specific. Thus, no matter how many times a particular element originally occurs, the output will have an instance(s) of that element repeated 1 to m times.
1. Introduction
gtools
v 3.8.1combinat
v 0.0-8multicool
v 0.1-10partitions
v 1.9-19RcppAlgos
v 2.0.1 (I am the author)arrangements
v 1.1.0gRbase
v 1.8-3I did not include
permute
,permutations
, orgRbase::aperm/ar_perm
as they are not really meant to attack these types of problems.|--------------------------------------- OVERVIEW ----------------------------------------|
The tasks,
m at a time
andgeneral vector
, refer to the capability of generating results "m at a time" (when m is less than the length of the vector) and rearranging a "general vector" as opposed to1:n
. In practice, we are generally concerned with finding rearrangements of a general vector, therefore all examinations below will reflect this (when possible).All benchmarks were ran on 3 different set-ups.
The listed results were obtained from setup #1 (i.e. MBPro). The results for the other two systems were similar. Also,
gc()
is periodically called to ensure all memory is available (See?gc
).2. Combinations
First, we examine combinations without replacement chosen m at a time.
RcppAlgos
combinat
(orutils
)gtools
arrangements
gRbase
How to:
Benchmark:
Now, we examine combinations with replacement chosen m at a time.
RcppAlgos
gtools
arrangements
How to:
Benchmark:
3. Permutations
First, we examine permutations without replacement chosen m at a time.
RcppAlgos
gtools
arrangements
How to:
Benchmark:
Next, we examine permutations without replacement with a general vector (returning all permutations).
RcppAlgos
gtools
combinat
multicool
arrangements
How to:
Benchmark:
Now, we examine permutations without replacement for
1:n
(returning all permutations).RcppAlgos
gtools
combinat
multicool
partitions
arrangements
How to:
Benchmark:
Lastly, we examine permutations with replacement.
RcppAlgos
iterpc
gtools
arrangements
How to:
This next benchmark is a little surprising given the results up until now.
That is not a typo...
gtools::permutations
is almost as fast as the other compiled functions. I encourage the reader to go check out the source code ofgtools::permutations
as it is one of the most elegant displays of programming out there (R
or otherwise).4. Multisets
First, we examine combinations of multisets.
RcppAlgos
arrangements
To find combinations/permutations of multisets, with
RcppAlgos
use thefreqs
arguments to specify how many times each element of the source vector,v
, is repeated.Benchmark:
For permutations of multisets chosen m at a time, we have:
RcppAlgos
arrangements
How to:
Benchmark:
For permutations of multisets returning all permutations, we have:
RcppAlgos
multicool
arrangements
How to:
Benchmark:
5. Summary
Both
gtools
andcombinat
are well established packages for rearranging elements of a vector. Withgtools
there are a few more options (see the overview above) and withcombinat
, you can rearrangefactors
. Withmulticool
, one is able to rearrange multisets. Althoughpartitions
andgRbase
are limited for the purposes of this question, they are powerhouses packed with highly efficient functions for dealing with partitions and array objects respectively.arrangements
layout
argument (r = row-major
,c = column-major
, andl = list
).collect
&getnext
when working with iterators.2^31 - 1
combinations/permutations viagetnext
. N.B.RcppAlgos
(vialower/upper
see below) andmulticool
(vianextPerm
) are also capable of doing this.getnext
, this function, allows for a specific number of results by utilizing thed
argument.Observe:
This feature is really nice when you only want a few combinations/permutations. With traditional methods, you would have to generate all combinations/permutations and then subset. This would render the previous example impossible as there are more than
10^17
results (i.e.ncombinations(1000, 7, bigz = TRUE)
= 194280608456793000).This feature along with the improvements to the generators in
arrangements
, allow it to be very efficient with respect to memory.RcppAlgos
upper
(formallyrowCap
) that is analogous to thed
argument ofgetnext
.Observe:
2.0.0
, there is an argument calledlower
that allows one to start generation at a specific combination/permutation. This sets up nicely for parallelization and allows for fast generation beyond2^31 - 1
as chunks are generated independently.Parallel example with more than 6 billion combinations:
In case you were wondering how each package scales, I will leave you with this final example that measures how fast each package can generate over 100 million results (N.B.
gtools::combinations
is left out here as it will throw the error:evaluation nested too deeply...
). Also, we are explicitly callingcombn
from theutils
package because I was unable to get a successful run fromcombinat::combn
. The differences in memory usage between these two is quite bizarre given that they are only marginally different (see?utils::combn
under the "Authors" section).Observe:
6. Memory
When executing
comboGeneral
as well asarrangements::combinations
, the memory will jump almost 2 Gbs before callinggc
. This seems about right as#rows * #nols * bytesPerCell / 2^30 bytes = choose(1000,3) * 3 * 4 / 2^30 bytes = (166167000 * 3 * 4)/2^30 = 1.857 Gbs
). However, when executingcombn
, the memory behavior was eratic (e.g. sometimes it would use all 16 Gb of memory and other times it would only spike a couple of Gbs). When I tested this on the Windows set-up, it would often crash.We can confirm this using
Rprof
along withsummaryRporf
. Observe:With
RcppAlgos
&arrangements
,mem.total
registers just over1900 Mb
.And here is the memory profile on a smaller vector comparing
gtools
,utils
, andcombinat
.Interestingly,
utils::combn
andcombinat::combn
use different amounts of memory and take different amounts of time to execute. This does not hold up with smaller vectors:And with
gtools
the total memory used is a little over 3x as much asutils
. It should be noted that for these 3 packages, I obtained different results every-time I ran them (e.g. forcombinat::combn
sometimes I would get 9000 Mb and then I would get 13000 Mb).Still, none can match
RcppAlgos
ORarrangements
. Both only use 51 Mb when ran on the example above.benchmark script: https://gist.github.com/randy3k/bd5730a6d70101c7471f4ae6f453862e (rendered by https://github.com/tidyverse/reprex)
*: An homage to A Walk through Combinatorics by Miklós Bóna