Is there anything wrong with this shuffling algori

2019-04-10 01:14发布

问题:

I have been doing a little recreational holiday computing. My mini-project was a simulation of the Italian game of "tomboli". A key building block was a simulation of the following process;

The game is controlled by a man with a bag of 90 marbles, numbered 1 to 90. He draws marbles one by one randomly from the bag, each time calling out the marble number to the players.

After a little thought I wrote the following code for this building block;

// NBR marbles, numbered 1...NBR are in a bag. Simulate randomly
//  pulling them from the bag, one by one, until the bag is empty
void bag( int random_sequence[NBR] )
{
    int i;

    // Store each marble as it is pulled out
    int *store = random_sequence;

    // Array of marbles still in the bag
    int not_yet_pulled[NBR];
    for( i=0; i<NBR; i++ )
        not_yet_pulled[i] = i+1;    // eg NBR=90; 1,2,3 ... 90

    // Loop pulling marbles from the bag, one each time through
    for( i=NBR; i>=1; i-- )
    {
        int x = rand();
        int idx = x%i;  // eg i=90 idx is random in range 0..89
                        // eg i=89 idx is random in range 0..88
                        //            ...
                        // eg i=1  idx is random in range 0..0
                        //    (so we could optimize when i=1 but not worth the bother)
        *store++  = not_yet_pulled[idx];

        // Replace the marble just drawn (so it cannot be pulled again)
        //     with the last marble in the bag. So;
        //     1) there is now one less marble in the bag
        //     2) only marbles not yet pulled are still in the bag
        // If we happened to pull the last marble in the *current subarray*, this is
        //    not required but does no harm.
        not_yet_pulled[idx] = not_yet_pulled[i-1];
    }
}

I know there are subtleties and traps all over the place in game simulation with random numbers, so although I am pretty happy with my code my confidence is a little less than 100%. So my questions are;

1) Is there anything wrong with my code ?

2) [if the answer to 1) is no] Am I unknowingly using a standard shuffling algorithm ?

3) [if the answer to 2) is no] How does my algorithm compare to standard alternatives ?

EDIT Thanks to all who answered. I am going to accept Aidan Cully's answer because it turns out I was rediscovering the Fisher-Yates algorithm, and revealing that gets to the heart of the matter. Of course it is no surprise I could have saved myself time and effort by doing some research up front. But on the other hand it was a fun hobby project. The rest of the simulation was routine, this was the most interesting part, and I would have deprived myself of enjoyment by not having a go myself. Additionally, I was trying to simulate a man taking marbles from a bag, and it was fairly late in the piece that I realized that the situation was exactly analagous to shuffling cards.

Another point of interest is that there is a small flaw, identified by Ken who points out that the oft repeated pattern rand()%N is not a really good way of picking a random number from the range 0..N-1.

Finally, my version of Fisher-Yates lacks the elegant trick that allows the nice property of shuffling in place. As a result, my algorithm would end up with an equally random but reversed shuffle.

回答1:

You're using the Fisher-Yates shuffling algorithm.



回答2:

Use the Fisher-Yates-Knuth shuffle:

public static void shuffle(int[] array) 
{
    Random rng = new Random();       // java.util.Random.
    // n is the number of items left to shuffle
    for (int n = array.length; n > 1; n--) 
    {
        // Pick a random element to move to the end
        int k = rng.nextInt(n);  // 0 <= k <= n - 1.
        // Simple swap of variables
        int tmp = array[k];
        array[k] = array[n - 1];
        array[n - 1] = tmp;
    }
}

It looks like your code might work, but I'm not sure. It's more obfuscated than the standard algorithm.



回答3:

int idx = x%i;  // eg i=90 idx is random in range 0..89

It's in that range, but it's not evenly distributed, unless 90 (or NBR) divides max(rand()). If you're on a 2-bit computer, that's probably not true. Here, idx is slightly more likely to be 0 than to be 89, for example.



回答4:

Analysing algorithms to check if they are truly random is exceedingly hard.
Apart from people with college level maths (or as Americans put it, a major in Math), this is way beyond most people's skill to even validate.

Thus you should try and use algorithms that are already built.
Have you looked at std::random_shuffle()?

 void bag( int random_sequence[NBR] )
 {
     for(int i=0; i<NBR; ++i) 
     {    random_sequence[i] = i+1;
     }
     std::random_shuffle(random_sequence,random_sequence + NBR);
 }

Quote from std::random_shuffle() page:

This algorithm is described in section 3.4.2 of Knuth (D. E. Knuth, The Art of Computer Programming. Volume 2: Seminumerical Algorithms, second edition. Addison-Wesley, 1981). Knuth credits Moses and Oakford (1963) and Durstenfeld (1964). Note that there are N! ways of arranging a sequence of N elements. Random_shuffle yields uniformly distributed results; that is, the probability of any particular ordering is 1/N!. The reason this comment is important is that there are a number of algorithms that seem at first sight to implement random shuffling of a sequence, but that do not in fact produce a uniform distribution over the N! possible orderings. That is, it's easy to get random shuffle wrong.



回答5:

An alternative to rand() % i which will have a better near-uniform distribution (at the expense of performance) is (int) ((rand() / (double) (RAND_MAX+1)) * i).

Or, use a pseudo-random number generation algorithm known to work well such as the Mersenne twister.



回答6:

Just a couple stylistic points:

  1. Your signature of taking an array with a given length might give the illusion that the parameter is guaranteed by the compiler to contain at least IDX elements. It's not.
  2. I would probably give the loop index in the second for loop a more desciptive name like marblesRemaining, so it's clearer what it is and does not need to be explained by comments. It would also separate it from the entirely different use it has in the first loop.


回答7:

Quibbles over random number generation aside, your shuffle algorithm looks correct.

You can improve it, though: with a little thought, you can see that you can shuffle the numbers in place. So, instead of allocating a temporary array, you can just use the output buffer.



回答8:

As others have already commented, use a proven shuffling algorithm.

It is worth noting that your C/C++ library is only providing a pseudo-random number.

Systems that require high reliability of the randomization algorithm use dedicated hardware to generate the random number. High-end poker sites are a good example. See for example the Pokerstars writeup on their random number generation technique.

Early versions of Netscape encryption were broken because the hackers were able to predict the "random" number used because the pseudo-random number generator was seeded with the current time. See this writeup on Wikipedia.