According to Scott Meyers, in his Effective STL book (Item 46), std::sort is about 670% faster than std::qsort, thanks to inlining. I tested this myself, and I saw that qsort is faster! Could anyone help me explain this strange behavior?
#include <iostream>
#include <vector>
#include <algorithm>
#include <cstdlib>
#include <ctime>
#include <cstdio>

const size_t LARGE_SIZE = 100000;

struct rnd {
    int operator()() {
        return rand() % LARGE_SIZE;
    }
};

int comp( const void* a, const void* b ) {
    // fine here since all values are small; in general (a - b) can overflow
    return ( *( int* )a - *( int* )b );
}

int main() {
    int ary[LARGE_SIZE];
    int ary_copy[LARGE_SIZE];
    // generate random data
    std::generate( ary, ary + LARGE_SIZE, rnd() );
    std::copy( ary, ary + LARGE_SIZE, ary_copy );
    // get time
    std::clock_t start = std::clock();
    // perform C quick sort using a function pointer
    std::qsort( ary, LARGE_SIZE, sizeof( int ), comp );
    std::cout << "C quick-sort time elapsed: " << static_cast<double>( clock() - start ) / CLOCKS_PER_SEC << "\n";
    // get time again
    start = std::clock();
    // perform C++ sort using the default comparison (operator<)
    std::sort( ary_copy, ary_copy + LARGE_SIZE );
    std::cout << "C++ quick-sort time elapsed: " << static_cast<double>( clock() - start ) / CLOCKS_PER_SEC << "\n";
}
This is my result:
C quick-sort time elapsed: 0.061
C++ quick-sort time elapsed: 0.086
Press any key to continue . . .
Update
Effective STL (2001)
Chapter 7: Programming with the STL
Item 46: Consider function objects instead of functions as algorithm parameters.
Another reason that qsort may perform better than expected is that newer compilers can inline and optimize through the function pointer.
If the C header defines an inline implementation of qsort instead of implementing it inside a library, and the compiler supports indirect-function inlining, then qsort can be just as fast as std::sort.
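To make the inlining point concrete, here is a minimal sketch contrasting the two comparator styles; the names IntLess and int_cmp are illustrative, not from the original post:

#include <algorithm>
#include <cstdlib>

// Function object: the comparison is visible to the compiler at the
// point where std::sort is instantiated, so it can typically be inlined.
struct IntLess {
    bool operator()( int a, int b ) const { return a < b; }
};

// qsort-style callback behind a function pointer: historically an
// indirect call that the optimizer could not see through.
int int_cmp( const void* a, const void* b ) {
    int lhs = *static_cast<const int*>( a );
    int rhs = *static_cast<const int*>( b );
    return ( lhs > rhs ) - ( lhs < rhs );  // avoids the overflow risk of (a - b)
}

int main() {
    int a[] = { 5, 3, 7, 1 };
    int b[] = { 5, 3, 7, 1 };
    std::sort( a, a + 4, IntLess() );            // comparison can be inlined
    std::qsort( b, 4, sizeof( int ), int_cmp );  // comparison via indirect call
}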
The two sorting algorithms, without optimizations enabled, should have comparable performance. The reason that the C++ sort tends to appreciably beat qsort is that the compiler can inline the comparisons being made, since it has type information about the function used to perform the comparison. Did you run these tests with optimization enabled? If not, try turning it on and running the test again.

std::clock() is not a viable timing clock. You should use a platform-specific higher-resolution timer, like the Windows High Performance Timer. More than that, the way you call clock() means the label text can be written to the console before the closing clock() call is evaluated, so console output is included in the measured time. That definitely invalidates the test. In addition, make sure that you compiled with all optimizations.
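As one portable alternative, here is a minimal sketch (assuming C++11; illustrative, not the answerer's code) using std::chrono::steady_clock, which has much better resolution than std::clock(). Note that the elapsed time is computed before anything touches the console:

#include <algorithm>
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v( 1000000 );
    for ( std::size_t i = 0; i < v.size(); ++i )
        v[i] = std::rand();
    // take both timestamps before any console output
    std::chrono::steady_clock::time_point t0 = std::chrono::steady_clock::now();
    std::sort( v.begin(), v.end() );
    std::chrono::steady_clock::time_point t1 = std::chrono::steady_clock::now();
    std::chrono::duration<double> elapsed = t1 - t0;
    std::cout << "std::sort time elapsed: " << elapsed.count() << " s\n";
}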
Finally, I copied and pasted your code, and got 0.016 for qsort and 0.008 for std::sort.
Don't know how std::sort was implemented years ago, but std::sort can be much faster because it is typically introsort: quicksort with a heapsort fallback. Heapsort is a linearithmic (O(n log n)) algorithm, meaning that if you have twice as much data, the sorting time roughly doubles. Actually it a bit more than doubles, because it is not exactly linear. qsort, however, can degrade to quadratic behavior, so in the worst case it needs about four times as long for twice the input: going from 1 million to 2 million elements roughly doubles an O(n log n) sort's time but quadruples an O(n²) one's.
On my machine, adding some meat (making the array 10 million elements and moving it to the data section) and compiling with
I get as a result
Also be careful about modern "green" CPUs that may be configured to run at a variable speed depending on the load of the system. When benchmarking, this kind of behavior can drive you crazy (on my machine I've set up two small scripts, normal and fast, that I can use when making speed tests).

I am surprised that no one mentions caches.
In your code, you start by touching ary and ary_copy, so they are resident in the cache at the time of qsort. During qsort, ary_copy might get evicted. At the time of std::sort, the elements would have to be fetched from memory or a larger (read: slower) cache level. This will of course depend on your cache sizes.
Try to reverse the test, i.e., start by running std::sort.
As some people have pointed out, making the array larger will make the test fairer. The reason is that a large array is less likely to fit in cache.
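Here is a minimal sketch of a fairer version of the original test along those lines (illustrative, with an overflow-safe comparator): the input is re-copied immediately before each timed run, so both sorts start from the same data and the same cache state, and std::sort runs first this time:

#include <algorithm>
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <vector>

const size_t LARGE_SIZE = 100000;

int comp( const void* a, const void* b ) {
    const int lhs = *static_cast<const int*>( a );
    const int rhs = *static_cast<const int*>( b );
    return ( lhs > rhs ) - ( lhs < rhs );
}

int main() {
    std::vector<int> master( LARGE_SIZE );
    for ( size_t i = 0; i < master.size(); ++i )
        master[i] = std::rand() % LARGE_SIZE;

    // std::sort first this time, on a freshly copied (cache-warm) buffer
    std::vector<int> work( master );
    std::clock_t start = std::clock();
    std::sort( work.begin(), work.end() );
    std::cout << "C++ sort: " << static_cast<double>( std::clock() - start ) / CLOCKS_PER_SEC << "\n";

    // re-copy so qsort starts from the same data and cache state
    work = master;
    start = std::clock();
    std::qsort( &work[0], work.size(), sizeof( int ), comp );
    std::cout << "C qsort:  " << static_cast<double>( std::clock() - start ) / CLOCKS_PER_SEC << "\n";
}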