First, define two integers N
and K
, where N >= K
, both known at compile time. For example: N = 8
and K = 3
.
Next, define a set of integers [0, N)
(or [1, N]
if that makes the answer simpler) and call it S
. For example: {0, 1, 2, 3, 4, 5, 6, 7}
The number of subsets of S
with K
elements is given by the formula C(N, K)
. Example
My problem is this: Create a perfect minimal hash for those subsets. The size of the example hash table will be C(8, 3)
or 56
.
I don't care about ordering, only that there be 56 entries in the hash table, and that I can determine the hash quickly from a set of K
integers. I also don't care about reversibility.
Example hash: hash({5, 2, 3}) = 42
. (The number 42 isn't important, at least not here)
Is there a generic algorithm for this that will work with any values of N
and K
? I wasn't able to find one by searching Google, or my own naive efforts.
There is an algorithm to code and decode a combination into its number in the lexicographical order of all combinations with a given fixed
K
. The algorithm is linear toN
for both code and decode of the combination. What language are you interested in?EDIT: here is example code in c++(it founds the lexicographical number of a combination in the sequence of all combinations of n elements as opposed to the ones with
k
elements but is really good starting point):I am sorry I have an algorithm for the problem you are asking for right now, but I believe it will be a good exercise to try to understand what I do above. Truth is this is one of the algorithms I teach in the course "Design and analysis of algorithms" and that is why I had it pre-written.
This is what you (and I) need:
hash()
mapsk-tuples
from[1..n]
onto the set1..C(n,k)\subset N
. The effort isk
subtractions (andO(k)
is a lower bound anyway, see Strandjev's remark above):