Rank and unrank Combination with constraints

2020-07-20 03:20发布

问题:

I want to rank and unrank combinations with an Element distance constraint. Selected elements cannot repeated.

For example:

n := 10 elements choose from

k := 5 elements being choosen

d := 3 max distance between 2 choosen elements

1,3,5,8,9 matches the constraint

1,5,6,7,8 dont matches the constraint

How can ranking the combination with given distance constraint, where 1,2,3,4,5 is smaller than 1,2,3,4,6 ? Is there a way do the ranking without compute the combinations with smaller Rank?

回答1:

You can do this by first creating and populating a two-dimensional array, which I will call NVT for "number of valid tails", to record the number of valid "tails" that start at a certain position with a given value. For example, NVT[4][6] = 3, because a combination that has 6 in position #4 can end in 3 distinct ways: (…, 6, 7), (…, 6, 8), and (…, 6, 9).

  • To populate NVT, start with NVT[k][…], which is just a row of all 1s. Then work your way back to earlier positions; for example, NVT[2][5] = NVT[3][6] + NVT[3][7] + NVT[3][8], because a "tail" starting at position #3 with value 5 will consist of that 5 plus a "tail" starting at position #4 with value 6, 7, or 8.
  • Note that we don't care whether there's actually a valid way to reach a given tail. For example, NVT[4][1] = 3 because of the valid tails (1, 2), (1, 3), and (1, 4), even though there are no complete combinations of the form (…, 1, _).

Once you've done that, you can compute the rank of a combination C as follows:

  • For position #1, count up all the valid tails starting at position #1 with a value less than C[1]. For example, if C starts with 3, then this will be NVT[1][1] + NVT[1][2], representing all combinations that start with 1 or 2.
  • Then do the same for all subsequent positions. These will represent combinations that start off the same way as C up until a given position, but then have a lesser value at that position.
  • For example, if C is (1, 3, 5, 8, 9), this comes out to
    0 +
    (NVT[2][1] + NVT[2][2]) +
    (NVT[3][1] + NVT[3][2] + NVT[3][3] + NVT[3][4]) +
    (NVT[4][1] + NVT[4][2] + NVT[4][3] + NVT[4][4] + NVT[4][5] + NVT[4][6] + NVT[4][7]) + (NVT[5][1] + NVT[5][2] + NVT[5][3] + NVT[5][4] + NVT[5][5] + NVT[5][6] + NVT[5][7] + NVT[5][8]).

Conversely, you can find the combination C with a given rank r as follows:

  • Create a temporary variable rr, for "remaining rank", initially equal to r.
  • To find C[1] — the value in position #1 — count up valid tails starting at position #1, starting with the least possible value (namely 1), stopping once this would exceed rr. For example, since NVT[1][1] = 66 and NVT[1][2] = 27, the combination with rank 75 must start with 2 (because 75 ≥ 66 and 75 < 66 + 27). Then subtract this sum from rr (in this case leaving 75 − 66 = 9).
  • Then do the same for all subsequent positions, making sure to keep in mind the least possible value given what was in the previous position. Continuing our example with r = 75, C[1] = 2, and rr = 9, we know that C[2] ≥ 3; and since NVT[2][3] = 23 > rr, we immediately find that C[2] = 3.

Complexity analysis:

  • Space:
    • NVT requires O(nk) space.
    • Returning a combination as a length-k array inherently requires O(k) space; but if we return the combination one value at a time (by invoking a callback or printing to a console or something), then the computation itself doesn't actually depend on this array, and only requires O(1) extra space.
    • Aside from that, everything else can be managed in O(1) space; we don't need any recursion or temporary arrays or anything.
  • Time:
    • Populating NVT takes O(nkd) time. (Note: if d is greater than n, then we can just set d equal to n.)
    • Given NVT, computing the rank of a given combination takes worst-case O(nk) time.
    • Given NVT, computing the combination with a given rank takes worst-case O(nk) time.

Implementation note: The details of the above are a bit fiddly; it would be easy to get an off-by-one error, or mix up two variables, or whatnot, if you don't have concrete data to look at. Since there are only 168 valid combinations for your example, I recommend generating all of them, so that you can reference them during debugging.


Possible additional optimization: If you expect n to be quite large, and you expect to do a lot of queries to "rank" and "unrank" combinations, then you might find it useful to create a second array, which I will call NVTLT for "number of valid tails less than", to record the number of valid "tails" that start at a certain position with a value less than a given value. For example, NVTLT[3][5] = NVT[3][1] + NVT[3][2] + NVT[3][3] + NVT[3][4], or if you prefer, NVTLT[3][5] = NVTLT[3][4] + NVT[3][4]. (You can do this as an in-place transformation, completely overwriting NVT, so it's an O(nk) pass with no additional space.) Using NVTLT instead of NVT for your queries will let you do binary search for values, rather than linear search, giving worst-case O(k log n) time instead of O(nk) time. Note that this optimization is even trickier than the above, so even if you intend to perform this optimization, I recommend starting with the above, getting it working perfectly, and only then adding this optimization.