Update: Please file this under bad ideas. You don't get anything for free in life and here is certainly proof. A simple idea gone bad. It is definitely something to learn from however.
Lazy programming challenge. If I pass a function that 50-50 returns true or false for the qsort's comparision function I think that I can effectively unsort an array of structures writing 3 lines of code.
int main ( int argc, char **argv)
{
srand( time(NULL) ); /* 1 */
...
/* qsort(....) */ /* 2 */
}
...
int comp_nums(const int *num1, const int *num2)
{
float frand =
(float) (rand()) / ((float) (RAND_MAX+1.0)); /* 3 */
if (frand >= 0.5f)
return GREATER_THAN;
return LESS_THAN;
}
Any pitfalls I need to look for? Is it possible in fewer lines through swapping or is this the cleanest I get for 3 non trivial lines?
Bad idea. I mean really bad.
Your solution gives an unpredictable result, not a random result and there is a big difference. You have no real idea of what a qsort with a random comparison will do and whether all combinations are equally likely. This is the most important criterion for a shuffle: all combinations must be equally likely. Biased results equal big trouble. There's no way to prove that in your example.
You should implement the Fisher-Yates shuffle (otherwise known as the Knuth shuffle).
In addition to the other answers, this is worse than a simple Fisher-Yates shuffle because it is too slow. The qsort algorithm is O(n*log(n)), the Fisher-Yates is O(n).
Some more detail is available in Wikipedia on why this kind of "shuffle" does not generally work as well as the Fisher-Yates method:
Comparison with other shuffling
algorithms
The Fisher-Yates shuffle is quite
efficient; indeed, its asymptotic time
and space complexity are optimal.
Combined with a high-quality unbiased
random number source, it is also
guaranteed to produce unbiased
results. Compared to some other
solutions, it also has the advantage
that, if only part of the resulting
permutation is needed, it can be
stopped halfway through, or even
stopped and restarted repeatedly,
generating the permutation
incrementally as needed. In high-level
programming languages with a fast
built-in sorting algorithm, an
alternative method, where each element
of the set to be shuffled is assigned
a random number and the set is then
sorted according to these numbers, may
be faster in practice[citation
needed], despite having worse
asymptotic time complexity (O(n log n)
vs. O(n)). Like the Fisher-Yates
shuffle, this method will also produce
unbiased results if correctly
implemented, and may be more tolerant
of certain kinds of bias in the random
numbers. However, care must be taken
to ensure that the assigned random
numbers are never duplicated, since
sorting algorithms in general won't
order elements randomly in case of a
tie. A variant of the above method
that has seen some use in languages
that support sorting with
user-specified comparison functions is
to shuffle a list by sorting it with a
comparison function that returns
random values. However, this does not
always work: with a number of commonly
used sorting algorithms, the results
end up biased due to internal
asymmetries in the sorting
implementation.[7]
This links to here:
just one more thing While writing this
article I experimented with various
versions of the methods and discovered
one more flaw in the original version
(renamed by me to shuffle_sort
). I was
wrong when I said “it returns a nicely
shuffled array every time it is
called.”
The results are not nicely shuffled at
all. They are biased. Badly. That
means that some permutations (i.e.
orderings) of elements are more likely
than others. Here’s another snippet of
code to prove it, again borrowed from
the newsgroup discussion:
N = 100000
A = %w(a b c)
Score = Hash.new { |h, k| h[k] = 0 }
N.times do
sorted = A.shuffle
Score[sorted.join("")] += 1
end
Score.keys.sort.each do |key|
puts "#{key}: #{Score[key]}"
end
This code
shuffles 100,000 times array of three
elements: a, b, c and records how many
times each possible result was
achieved. In this case, there are only
six possible orderings and we should
got each one about 16666.66 times. If
we try an unbiased version of shuffle
(shuffle
or shuffle_sort_by
), the
result are as expected:
abc: 16517
acb: 16893
bac: 16584
bca: 16568
cab: 16476
cba: 16962
Of course,
there are some deviations, but they
shouldn’t exceed a few percent of
expected value and they should be
different each time we run this code.
We can say that the distribution is
even.
OK, what happens if we use the
shuffle_sort method?
abc: 44278
acb: 7462
bac: 7538
bca: 3710
cab: 3698
cba: 33314
This is not
an even distribution at all. Again?
It shows how the sort method is biased and goes into detail why this is so. FInally he links to Coding Horror:
Let's take a look at the correct
Knuth-Fisher-Yates shuffle algorithm.
for (int i = cards.Length - 1; i > 0; i--)
{
int n = rand.Next(i + 1);
Swap(ref cards[i], ref cards[n]);
}
Do you see the difference? I missed
it the first time. Compare the swaps
for a 3 card deck:
Naïve shuffle Knuth-Fisher-Yates shuffle
rand.Next(3); rand.Next(3);
rand.Next(3); rand.Next(2);
rand.Next(3);
The naive shuffle
results in 3^3 (27) possible deck
combinations. That's odd, because the
mathematics tell us that there are
really only 3! or 6 possible
combinations of a 3 card deck. In the
KFY shuffle, we start with an initial
order, swap from the third position
with any of the three cards, then swap
again from the second position with
the remaining two cards.
No, this won't properly shuffle the array, it will barely move elements around their original locations, with exponential distribution.
The comparison function isn't supposed to return a boolean type, it's supposed to return a negative number, a positive number, or zero which qsort()
uses to determine which argument is greater than the other.
The Old New Thing takes on this one
I think the basic idea of randomly partition the set recursively on the way down and concatenate the results on the way up will work (It will average O(n*log n) binary decisions and that is darn close to log2(fact(n)) but q-sort will not be sure to do that with a random predicate.
BTW I think the same argument and issues can be said for any O(n*log n) sort strategy.
Rand isn't the most random thing out there... If you want to shuffle cards or something this isn't the best. Also a Knuth shuffle would be quicker, but your solution is ok if it doesn't loop forever