Does qsort demand consistent comparisons, or can I use it for shuffling?

Posted 2020-02-29 19:14

Update: Please file this under bad ideas. You don't get anything for free in life, and here is certainly proof. A simple idea gone bad. It is definitely something to learn from, however.

Lazy programming challenge: if I pass qsort a comparison function that returns true or false with 50-50 probability, I think I can effectively unsort an array of structures by writing three lines of code.

#include <stdlib.h>   /* srand, rand, RAND_MAX, qsort */
#include <time.h>     /* time */

int main(int argc, char **argv)
{
    srand( time(NULL) );  /* 1 */
    ...
    /* qsort(....) */     /* 2 */
}

...

#define GREATER_THAN  1   /* assumed values: qsort treats positive as "greater" */
#define LESS_THAN    -1   /* and negative as "less" */

int comp_nums(const int *num1, const int *num2)
{
    float frand = 
          (float) (rand()) / ((float) (RAND_MAX+1.0));  /* 3 */

    if (frand >= 0.5f)
         return GREATER_THAN;
    return LESS_THAN;
}

Any pitfalls I need to look out for? Is it possible in fewer lines through swapping, or is this the cleanest I can get for three non-trivial lines?

Tags: c sorting random
6 answers
等我变得足够好
#2 · 2020-02-29 19:33

No, this won't properly shuffle the array; it will barely move elements from their original locations, with an exponential distribution.

等我变得足够好
#3 · 2020-02-29 19:42

In addition to the other answers, this is worse than a simple Fisher-Yates shuffle because it is too slow: qsort is O(n*log(n)), while Fisher-Yates is O(n).
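
For reference, a minimal Fisher-Yates sketch in C (a hypothetical helper; rand() % (i + 1) has a slight modulo bias that a careful implementation would avoid) makes the single O(n) pass visible:

#include <stdlib.h>

/* Fisher-Yates sketch: one pass, one swap per element. */
static void fisher_yates(int *a, size_t n)
{
    if (n < 2)
        return;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % (i + 1);  /* index in [0, i] */
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}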

Wikipedia has some more detail on why this kind of "shuffle" does not generally work as well as the Fisher-Yates method:

Comparison with other shuffling algorithms

The Fisher-Yates shuffle is quite efficient; indeed, its asymptotic time and space complexity are optimal. Combined with a high-quality unbiased random number source, it is also guaranteed to produce unbiased results. Compared to some other solutions, it also has the advantage that, if only part of the resulting permutation is needed, it can be stopped halfway through, or even stopped and restarted repeatedly, generating the permutation incrementally as needed. In high-level programming languages with a fast built-in sorting algorithm, an alternative method, where each element of the set to be shuffled is assigned a random number and the set is then sorted according to these numbers, may be faster in practice[citation needed], despite having worse asymptotic time complexity (O(n log n) vs. O(n)). Like the Fisher-Yates shuffle, this method will also produce unbiased results if correctly implemented, and may be more tolerant of certain kinds of bias in the random numbers. However, care must be taken to ensure that the assigned random numbers are never duplicated, since sorting algorithms in general won't order elements randomly in case of a tie. A variant of the above method that has seen some use in languages that support sorting with user-specified comparison functions is to shuffle a list by sorting it with a comparison function that returns random values. However, this does not always work: with a number of commonly used sorting algorithms, the results end up biased due to internal asymmetries in the sorting implementation.[7]
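
To make the "assign each element a random number and sort by it" alternative concrete, here is a hedged C sketch (helper names are hypothetical, and rand() can hand two elements the same key, which is exactly the tie problem the quote warns about):

#include <stdlib.h>

struct keyed { int value; unsigned key; };

/* Order elements by their random keys. */
static int by_key(const void *p1, const void *p2)
{
    const struct keyed *a = p1, *b = p2;
    return (a->key > b->key) - (a->key < b->key);
}

/* Shuffle by pairing each element with a random key and sorting the pairs. */
static void shuffle_by_random_keys(int *v, size_t n)
{
    struct keyed *tmp = malloc(n * sizeof *tmp);
    if (!tmp)
        return;  /* allocation failed; leave v untouched */
    for (size_t i = 0; i < n; i++) {
        tmp[i].value = v[i];
        tmp[i].key = (unsigned)rand();  /* duplicate keys are possible */
    }
    qsort(tmp, n, sizeof *tmp, by_key);
    for (size_t i = 0; i < n; i++)
        v[i] = tmp[i].value;
    free(tmp);
}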

The citation in that passage links to this post:

Just one more thing: while writing this article I experimented with various versions of the methods and discovered one more flaw in the original version (renamed by me to shuffle_sort). I was wrong when I said “it returns a nicely shuffled array every time it is called.”

The results are not nicely shuffled at all. They are biased. Badly. That means that some permutations (i.e. orderings) of elements are more likely than others. Here’s another snippet of code to prove it, again borrowed from the newsgroup discussion:

N = 100000
A = %w(a b c)
Score = Hash.new { |h, k| h[k] = 0 }
N.times do
  sorted = A.shuffle  
  Score[sorted.join("")] += 1
end

Score.keys.sort.each do |key|
  puts "#{key}: #{Score[key]}"
end

This code shuffles an array of three elements (a, b, c) 100,000 times and records how many times each possible result was achieved. In this case there are only six possible orderings, and we should get each one about 16,666.66 times. If we try an unbiased version of shuffle (shuffle or shuffle_sort_by), the results are as expected:

 
 abc: 16517
 acb: 16893
 bac: 16584
 bca: 16568
 cab: 16476
 cba: 16962

Of course, there are some deviations, but they shouldn’t exceed a few percent of the expected value, and they should be different each time we run this code. We can say that the distribution is even.

OK, what happens if we use the shuffle_sort method?

 abc: 44278 
 acb: 7462
 bac: 7538
 bca: 3710
 cab: 3698
 cba: 33314

This is not an even distribution at all. Again?

It shows how the sort method is biased and goes into detail about why this is so. Finally, he links to Coding Horror:

Let's take a look at the correct Knuth-Fisher-Yates shuffle algorithm.

for (int i = cards.Length - 1; i > 0; i--)
{
  int n = rand.Next(i + 1);
  Swap(ref cards[i], ref cards[n]);
}

Do you see the difference? I missed it the first time. Compare the swaps for a 3 card deck:

 
Naïve shuffle   Knuth-Fisher-Yates shuffle
rand.Next(3);    rand.Next(3);
rand.Next(3);    rand.Next(2);
rand.Next(3); 

The naive shuffle results in 3^3 (27) possible deck combinations. That's odd, because the mathematics tell us that there are really only 3! or 6 possible combinations of a 3 card deck. In the KFY shuffle, we start with an initial order, swap from the third position with any of the three cards, then swap again from the second position with the remaining two cards.
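
For anyone who wants to reproduce the bias test in C rather than Ruby, a rough sketch follows (hypothetical code; handing qsort an inconsistent comparator is formally undefined behavior, so the exact counts depend on the C library's sort, but they are typically far from uniform):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Coin-flip "comparator": never consistent, never returns zero. */
static int coin_flip(const void *a, const void *b)
{
    (void)a; (void)b;
    return (rand() & 1) ? 1 : -1;
}

int main(void)
{
    const char *orders[6] = { "abc", "acb", "bac", "bca", "cab", "cba" };
    long counts[6] = { 0 };

    srand((unsigned)time(NULL));
    for (int t = 0; t < 100000; t++) {
        char s[] = "abc";
        qsort(s, 3, 1, coin_flip);  /* "shuffle" three chars */
        for (int k = 0; k < 6; k++)
            if (strcmp(s, orders[k]) == 0) { counts[k]++; break; }
    }
    for (int k = 0; k < 6; k++)
        printf("%s: %ld\n", orders[k], counts[k]);
    return 0;
}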

萌系小妹纸
#4 · 2020-02-29 19:43

Bad idea. I mean really bad.

Your solution gives an unpredictable result, not a random result, and there is a big difference. You have no real idea what a qsort with a random comparison will do, or whether all combinations are equally likely. This is the most important criterion for a shuffle: all combinations must be equally likely. Biased results mean big trouble. There's no way to prove that in your example.

You should implement the Fisher-Yates shuffle (otherwise known as the Knuth shuffle).

Animai°情兽
#5 · 2020-02-29 19:48

The Old New Thing takes on this one.

I think the basic idea of randomly partitioning the set recursively on the way down and concatenating the results on the way up will work (it will average O(n*log n) binary decisions, and that is darn close to log2(n!)), but qsort will not be sure to do that with a random predicate.

BTW, I think the same argument and issues apply to any O(n*log n) sort strategy; see the sketch below.
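
A hedged C sketch of that "partition randomly on the way down, concatenate on the way up" idea (a hypothetical helper, and explicitly not something qsort will do for you): each element is sent left or right by an independent coin flip, which amounts to sorting by infinite random bit strings, so the result is unbiased if the flips are fair and independent.

#include <stdlib.h>

/* Flip a coin for each element, pack "heads" to the left of a scratch buffer
 * and "tails" to the right, copy back, then recurse on both halves.
 * Recursion stops only when a range has fewer than two elements; if every
 * flip lands on the same side, the call simply retries that range. */
static void partition_shuffle(int *a, size_t n, int *scratch)
{
    if (n < 2)
        return;

    size_t lo = 0, hi = n;
    for (size_t i = 0; i < n; i++) {
        if (rand() & 1)
            scratch[lo++] = a[i];
        else
            scratch[--hi] = a[i];
    }
    for (size_t i = 0; i < n; i++)
        a[i] = scratch[i];

    partition_shuffle(a, lo, scratch);
    partition_shuffle(a + lo, n - lo, scratch);
}

A caller would seed the generator and supply a scratch buffer of n ints alongside the array.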

Evening l夕情丶
#6 · 2020-02-29 19:51

rand() isn't the most random thing out there... If you want to shuffle cards or something like that, this isn't the best approach. Also, a Knuth shuffle would be quicker, but your solution is OK if it doesn't loop forever.

Emotional °昔
#7 · 2020-02-29 19:55

The comparison function isn't supposed to return a boolean; it's supposed to return a negative number, a positive number, or zero, which qsort() uses to determine which argument is greater than the other.
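
A conventional integer comparator, for contrast, looks something like this (the two comparisons avoid the overflow risk of returning num1 - num2):

static int comp_ints(const void *p1, const void *p2)
{
    int a = *(const int *)p1;
    int b = *(const int *)p2;
    return (a > b) - (a < b);  /* negative, zero, or positive */
}

Called as qsort(array, n, sizeof array[0], comp_ints), it sorts the array in ascending order.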
