Random Permutations

2019-04-08 15:35发布

I am having trouble figuring out a decent way of randomly shuffling the elements in an std::vector and, after some operations, restoring the original order. I know that this should be a rather trivial algorithm, but I guess I'm too tired...

Since I am constrained to use a custom random number generator class, I guess I can't use std::random_shuffle, which doesn't help anyway, because I also need to preserve the original order. So, my approach was to create an std::map which serves as a mapping between the original positions and the random ones, like this:

std::map<unsigned int, unsigned int> getRandomPermutation (const unsigned int &numberOfElements)
{
    std::map<unsigned int, unsigned int> permutation;

    //populate the map
    for (unsigned int i = 0; i < numberOfElements; i++)
    {
        permutation[i] = i;
    }

    //randomize it
    for (unsigned int i = 0; i < numberOfElements; i++)
    {
        //generate a random number in the interval [0, numberOfElements)
        unsigned long randomValue = GetRandomInteger(numberOfElements - 1U);

        //broken swap implementation
        //permutation[i] = randomValue;
        //permutation[randomValue] = i;

        //use this instead:
        std::swap(permutation[i], permutation[randomValue]);
    }

    return permutation;
}

I am not sure that the above algorithm is a proper implementation for a random permutation, so any improvements are welcome.

Now, here is how I've managed to make use of this permutation map:

std::vector<BigInteger> doStuff (const std::vector<BigInteger> &input)
{
    /// Permute the values in a random order
    std::map<unsigned int, unsigned int> permutation = getRandomPermutation(static_cast<unsigned int>(input.size()));

    std::vector<BigInteger> temp;

    //permute values
    for (unsigned int i = 0; i < static_cast<unsigned int>(input.size()); ++i)
    {
        temp.push_back(input[permutation[i]]);
    }

    //do all sorts of stuff with temp

    /// Reverse the permutation
    std::vector<BigInteger> output;
    for (unsigned int i = 0; i < static_cast<unsigned int>(input.size()); ++i)
    {
        output.push_back(temp[permutation[i]]);
    }

    return output;
}

Something tells me that I should be able to use only one std::vector<BigInteger> for this algorithm, but, right now, I just can't figure out the optimal solution. Honestly, I don't really care about the data in input, so I could even make it non-const, overwrite it, and skip creating a copy of it, but the question is how to implement the algorithm?

If I do something like this, I end up shooting myself in the foot, right? :)

for (unsigned int i = 0; i < static_cast<unsigned int>(input.size()); ++i)
{
    BigInteger aux = input[i];
    input[i] = input[permutation[i]];
    input[permutation[i]] = aux;
}

EDIT: Following Steve's remark about using "Fisher-Yates" shuffle, I changed my getRandomPermutation function accordingly:

std::map<unsigned int, unsigned int> getRandomPermutation (const unsigned int &numberOfElements)
{
    std::map<unsigned int, unsigned int> permutation;

    //populate the map
    for (unsigned int i = 0; i < numberOfElements; i++)
    {
        permutation[i] = i;
    }

    //randomize it
    for (unsigned int i = numberOfElements - 1; i > 0; --i)
    {
        //generate a random number in the interval [0, numberOfElements)
        unsigned long randomValue = GetRandomInteger(i);

        std::swap(permutation[i], permutation[randomValue]);
    }

    return permutation;
}

4条回答
女痞
2楼-- · 2019-04-08 15:45

If you're looking for specific errors in your code:

permutation[i] = randomValue;
permutation[randomValue] = i;

is wrong. Observe that once you're finished, each value does not necessarily appear exactly once among the values of the map. So it's not a permutation, let alone a uniformly-distributed random one.

The proper means to generate a random permutation is what Tony says, use std::random_shuffle on a vector that initially represents the identity permutation. Or if you want to know how a shuffle is properly performed, look up "Fisher-Yates". In general, any approach that makes N random selections uniformly from 0 .. N-1 is doomed to failure, because that means it has N^N possible ways it can run. But there are N! possible permutations of N items, and N^N is generally not divisible by N!. Hence it's impossible for each permutation to be the result of an equal number of random selections, i.e. the distribution is not uniform.

the question is how to implement the algorithm?

So, you have your permutation, and you want to re-order the elements of input in-place, according to that permutation.

The key thing to know is that every permutation is a composition of "cycles". That is to say, if you repeatedly follow the permutation from a given starting point, you come back to where you started (and this path is the cycle to which that starting point belongs). There may be more than one such cycle in a given permutation, and if permutation[i] == i for some i, then the cycle of i has length 1.

The cycles are all disjoint, that is to say each element appears in exactly one cycle. Because cycles don't "interfere" with each other, we can apply a permutation by applying each cycle, and we can do the cycles in any order. So, for each index i we need to:

  • check whether we've already done i. If so, move on to the next index.
  • set current = i
  • swap index[current] with index[permutation[current]]. So index[current] is set to its correct value (the next element in the cycle), and its old value is "pushed" forward along the cycle.
  • mark current as "done"
  • if permutuation[current] is i, we've finished the cycle. So the first value of the cycle ends up in the spot formerly occupied by the last element of the cycle, which is right. Move on to the next index.
  • set current = permutation[current] and go back to the swap step.

Depending on the types involved, you can optimize around the swaps - it may be better to copy/move to a temporary variable and the start of each cycle, then do a copy/move instead of a swap at each step of the cycle, and finally copy/move the temporary to the end of the cycle.

Reversing the process is the same, but using the "inverse" of the permutation. The inverse inv of a permutation perm, is the permutation such that inv[perm[i]] == i for each i. You can either compute the inverse and use the exact code above, or you can use code similar to the above, except move the elements in the opposite direction along each cycle.

An alternative to all that, since you implemented Fisher-Yates yourself -- as you're running Fisher-Yates, for each swap you perform record the two indices swapped in a vector<pair<size_t,size_t>>. Then you don't have to worry about cycles. You can apply the permutation to the vector by applying the same sequence of swaps. You can reverse the permutation by applying the reversed sequence of swaps.

查看更多
等我变得足够好
3楼-- · 2019-04-08 15:46

If you're "randomising" a vector of n elements, you can create another std::vector<size_t> index(n), set index[x] = x for 0 <= x < n, then shuffle index. Then your lookups take the form: original_vector[index[i]]. The order of the original vector's never changed so no need to restore ordering.

...constrained to use a custom random number generator class, I guess I can't use std::random_shuffle...

Have you noticed this overload?

template <class RandomAccessIterator, class RandomNumberGenerator>
void random_shuffle ( RandomAccessIterator first, RandomAccessIterator last,
                    RandomNumberGenerator& rand );

For details of how to wrap your random number generator with a compatible object, see http://www.sgi.com/tech/stl/RandomNumberGenerator.html

查看更多
劳资没心,怎么记你
4楼-- · 2019-04-08 15:49

Given a ordered sequence of elements a,b,c,d,e you first create a new indexed sequence: X=(0,a),(1,b),(2,c),(3,d),(4,e). Then, you randomly shuffle that sequence and obtain the second element of each pair to get the random sequence. To restore the original sequence you sort the X set incrementally using the first element of each pair.

查看更多
姐就是有狂的资本
5楼-- · 2019-04-08 15:53

Note that, depending on your application, if it is important that you have a truly uniformly distributed permutation, you cannot use any algorithm that calls a typical pseudo-random number generator more that once.

The reason is that most pseudo-random number generators, such as the one in clib, are Linear congruential. Those have a weakness where they'll generate numbers that cluster in planes - so your permutations will not be perfectly uniformly distributed. Using a higher-quality generator should get around that.

See http://en.wikipedia.org/wiki/Linear_congruential_generator

Alternatively, you could just generate a single random number in the range 0..(n!-1) and pass it to the unrank function for permutations. For small enough n, you can store those and get a constant time algorithm, but if n is too large for that, the best unrank function is O(n). Applying the resulting permutation is going to be O(n) anyway.

查看更多
登录 后发表回答