I have a question that may seem very basic, but it is asked in a context where "every CPU tick counts" (it is part of a larger algorithm that will run on supercomputers).
The problem is quite simple: what is the fastest way to sort a list of unsigned long long int numbers and their original indexes? (At the beginning, the unsigned long long int numbers are in a completely random order.)
Example:
Before
Numbers: 32 91 11 72
Indexes: 0 1 2 3
After
Numbers: 11 32 72 91
Indexes: 2 0 3 1
By "fastest way", I mean : what algorithm to use : std::sort, C qsort, or another sorting algorithm available on the web ? What container to use (C array, std::vector, std::map...) ? How to sort the indexes at the same time (use structures, std::pair, std::map...) ?
How many element to sort ? -> typically 4Go of numbers
The obvious starting point would be a structure with operator< defined for it, an std::vector to hold the data, and std::sort to do the sorting.
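A minimal sketch of those pieces, filling in code the answer only describes; the names item and by_number are my own choices (picked to match the "sort ... by_number" reference further down), not the answer's original identifiers:

#include <algorithm>
#include <cstddef>
#include <vector>

// One element: the number plus its original position.
struct item {
    unsigned long long number;
    std::size_t        index;

    bool operator<(item const &other) const {
        return number < other.number;   // order by the number only
    }
};

// A named comparator, if you prefer passing one explicitly.
struct by_number {
    bool operator()(item const &a, item const &b) const {
        return a.number < b.number;
    }
};

int main() {
    std::vector<item> data{{32, 0}, {91, 1}, {11, 2}, {72, 3}};

    std::sort(data.begin(), data.end(), by_number());
    // ...or simply std::sort(data.begin(), data.end()); to use operator<
}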
The simple fact is that the normal containers (and such) are sufficiently efficient that using them doesn't make your code substantially less efficient. You might be able to do better by writing some part in a different way, but you could about as easily do worse. Start from something solid and readable, and test -- don't (attempt to) optimize prematurely.
Edit: of course in C++11, you can use a lambda expression instead:
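A hedged sketch of that, reusing the data vector and item struct from the sketch above:

std::sort(data.begin(), data.end(),
          [](item const &a, item const &b) { return a.number < b.number; });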
This is generally a little more convenient to write. Readability depends -- for something simple like this, I'd say sort ... by_number is pretty readable, but that depends (heavily) on the name you give to the comparison operator. The lambda makes the actual sorting criterion easier to find, so you don't need to choose a name as carefully for the code to be readable.

This will be used on supercomputers?
In that case you may want to look into parallel sorting algorithms. That will only make sense for sorting large data sets, but the win if you need it is substantial.
It might be worth separating the numbers and indexes and then sorting just the indexes, like this:
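(A reconstructed sketch of that approach, since the original code isn't shown here; the variable names are assumptions.)

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    std::vector<unsigned long long> numbers{32, 91, 11, 72};
    std::vector<std::size_t> indexes{0, 1, 2, 3};

    // Sort only the small indexes; the comparator looks up the numbers.
    std::sort(indexes.begin(), indexes.end(),
              [&numbers](std::size_t a, std::size_t b) {
                  return numbers[a] < numbers[b];
              });

    std::cout << "Indexes:";
    for (std::size_t i : indexes)
        std::cout << ' ' << i;
    std::cout << '\n';
}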
For the example input from the question, this prints:

Indexes: 2 0 3 1
The idea is that the elements being sorted are small and thus fast to move around during the sort. On modern CPUs, however, the effect of the indirect access to numbers on caching could spoil those gains, so I recommend benchmarking on realistic amounts of data before making a final decision to use it.

Use std::vector and std::sort. That should provide the fastest sort method. To find the original index, create a struct, then write your own comparison predicate for sort that compares the num in the struct:
std::sort(vec.begin(), vec.end(), Predicate())
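A short sketch of what that struct and predicate could look like; Element is an assumed name, while Predicate and the num field follow the wording above:

#include <algorithm>
#include <cstddef>
#include <vector>

struct Element {
    unsigned long long num;    // the value to sort by
    std::size_t        index;  // its original position
};

// Orders elements by their num field.
struct Predicate {
    bool operator()(Element const &a, Element const &b) const {
        return a.num < b.num;
    }
};

// Usage: std::vector<Element> vec = ...;
//        std::sort(vec.begin(), vec.end(), Predicate());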
std::sort has proven to be faster than the old qsort because of the lack of indirection and the possibility of inlining critical operations.

The implementations of std::sort are likely to be highly optimized and hard to beat, but not impossible. If your data is fixed-length and short, you might find radix sort to be faster (a rough sketch follows below). Timsort is relatively new and has delivered good results for Python.

You might keep the index array separate from the value array, but I think the extra level of indirection will prove to be a speed killer. Better to keep them together in a struct or std::pair. As always with any speed-critical application, you must try some actual implementations and compare them to know for sure which is fastest.
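If you want to experiment with the radix-sort suggestion, here is a rough, untuned sketch of a byte-wise LSD radix sort over (number, index) pairs; it assumes the same item layout as the earlier sketches and makes eight counting-sort passes over the 64-bit keys:

#include <array>
#include <cstddef>
#include <vector>

struct item {
    unsigned long long number;
    std::size_t        index;
};

void radix_sort(std::vector<item> &v) {
    std::vector<item> buffer(v.size());
    for (int shift = 0; shift < 64; shift += 8) {
        std::array<std::size_t, 257> count{};               // histogram of one byte
        for (item const &x : v)
            ++count[((x.number >> shift) & 0xFF) + 1];
        for (std::size_t i = 1; i < count.size(); ++i)
            count[i] += count[i - 1];                        // prefix sums -> start offsets
        for (item const &x : v)
            buffer[count[(x.number >> shift) & 0xFF]++] = x; // stable scatter by this byte
        v.swap(buffer);                                      // this pass's result is now in v
    }
}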
You might find this to be an interesting read. I would start with the STL's sort and only then try to improve on it if I could. I'm not sure whether you have access to a C++11 compiler (like gcc 4.7) on this supercomputer, but I would suggest that std::sort combined with std::future and std::thread would get you quite a bit of the way there with regard to parallelizing the problem in a maintainable way.
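As a rough illustration of that direction (a two-way simplification of my own, not code from the linked material), you can sort the two halves concurrently with std::async and then merge them:

#include <algorithm>
#include <future>
#include <vector>

template <typename T, typename Compare>
void parallel_sort(std::vector<T> &v, Compare cmp) {
    auto mid = v.begin() + v.size() / 2;
    // Sort the first half on another thread...
    auto fut = std::async(std::launch::async,
                          [&v, mid, &cmp] { std::sort(v.begin(), mid, cmp); });
    std::sort(mid, v.end(), cmp);            // ...and the second half here.
    fut.wait();                              // wait for the first half to finish
    std::inplace_merge(v.begin(), mid, v.end(), cmp);
}

A real version would recurse and pick a cutoff below which plain std::sort is used.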
Here is another question that compares std::sort with qsort.
Finally, there is this article in Dr. Dobb's that compares the performance of parallel algorithms.