Question:
I have a question that could seem very basic, but it is in a context where "every CPU tick counts" (this is a part of a larger algorithm that will be used on supercomputers).
The problem is quite simple: what is the fastest way to sort a list of unsigned long long int numbers together with their original indexes? (At the beginning, the numbers are in completely random order.)
Example:
Before
Numbers: 32 91 11 72
Indexes: 0 1 2 3
After
Numbers: 11 32 72 91
Indexes: 2 0 3 1
By "fastest way", I mean: which algorithm should I use (std::sort, C qsort, or another sorting algorithm available on the web)? Which container should I use (C array, std::vector, std::map...)? How should I sort the indexes at the same time (structures, std::pair, std::map...)?
How many elements are there to sort? Typically 4 GB of numbers.
Answer 1:
The obvious starting point would be a structure, plus a comparison functor that orders it by number:
struct data {
    unsigned long long int number;
    size_t index;
};
struct by_number {
    // const so the functor can be called through a const reference
    bool operator()(data const &left, data const &right) const {
        return left.number < right.number;
    }
};
...and a std::vector to hold the data:
std::vector<data> items;
and std::sort to do the sorting:
std::sort(items.begin(), items.end(), by_number());
The simple fact is that the standard containers (and algorithms) are efficient enough that using them doesn't make your code substantially slower. You might be able to do better by writing some part differently, but you could just as easily do worse. Start from solid and readable code, and measure -- don't (attempt to) optimize prematurely.
Edit: of course in C++11, you can use a lambda expression instead:
std::sort(items.begin(), items.end(),
[](data const &a, data const &b) { return a.number < b.number; });
This is generally a little more convenient to write. Readability depends: for something simple like this, I'd say sort ... by_number is pretty readable, but that depends (heavily) on the name you give to the comparison functor. The lambda makes the actual sorting criterion easier to find, so you don't need to choose a name carefully for the code to be readable.
Answer 2:
std::pair and std::sort fit your requirements ideally: if you put the value into pair.first and the index into pair.second, you can simply call sort on a vector of pairs, like this:
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <utility>
#include <vector>
using namespace std;

int main() {
    // This is your original data. It does not need to be in a vector
    vector<long> orig;
    orig.push_back(10);
    orig.push_back(3);
    orig.push_back(6);
    orig.push_back(11);
    orig.push_back(2);
    orig.push_back(19);
    orig.push_back(7);
    // This is a vector of {value,index} pairs
    vector<pair<long,size_t> > vp;
    vp.reserve(orig.size());
    for (size_t i = 0 ; i != orig.size() ; i++) {
        vp.push_back(make_pair(orig[i], i));
    }
    // Sorting will put lower values ahead of larger ones,
    // resolving ties using the original index
    sort(vp.begin(), vp.end());
    for (size_t i = 0 ; i != vp.size() ; i++) {
        cout << vp[i].first << " " << vp[i].second << endl;
    }
    return 0;
}
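As a small illustration of the tie-breaking comment above, here is a sketch that uses the question's unsigned long long type and an input containing a duplicate value. The helper name sort_with_index is made up for this example and is not part of the answer.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not from the answer): pairs each value with its
// original position and sorts the pairs.
std::vector<std::pair<unsigned long long, std::size_t>>
sort_with_index(const std::vector<unsigned long long>& values) {
    std::vector<std::pair<unsigned long long, std::size_t>> vp;
    vp.reserve(values.size());
    for (std::size_t i = 0; i != values.size(); ++i) {
        vp.emplace_back(values[i], i);
    }
    // std::pair's operator< compares .first and then .second, so equal
    // values end up ordered by their original index.
    std::sort(vp.begin(), vp.end());
    return vp;
}
```

Calling sort_with_index({32, 91, 11, 32}) should give the pairs (11,2), (32,0), (32,3), (91,1). With a comparator that looks only at .first, std::sort leaves the relative order of the two 32s unspecified; std::stable_sort would be needed to keep them in index order.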
Answer 3:
std::sort has proven to be faster than the old qsort because of the lack of indirection and the possibility of inlining critical operations.
The implementations of std::sort are likely to be highly optimized and hard to beat, but not impossible. If your data is fixed length and short you might find Radix sort to be faster. Timsort is relatively new and has delivered good results for Python.
You might keep the index array separate from the value array, but I think the extra level of indirection will prove to be a speed killer. Better to keep them together in a struct or std::pair.
As always with any speed critical application, you must try some actual implementations and compare them to know for sure which is fastest.
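To make the radix sort suggestion concrete, here is a minimal sketch of a least-significant-digit radix sort over 64-bit keys that carries the original index along. The Item type, the function name, and the choice of one byte per pass are assumptions for illustration. Whether it beats std::sort on your data is something only a measurement can tell.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record type for this sketch: a 64-bit key plus its original index.
struct Item {
    std::uint64_t number;
    std::size_t index;
};

// Least-significant-digit radix sort on Item::number, one byte per pass.
// Needs a scratch buffer as large as the input, so memory use doubles.
void radix_sort(std::vector<Item>& items) {
    std::vector<Item> scratch(items.size());
    for (unsigned shift = 0; shift < 64; shift += 8) {
        // Count how many keys have each byte value at this position.
        std::array<std::size_t, 256> count{};
        for (const Item& it : items) {
            ++count[(it.number >> shift) & 0xFF];
        }
        // Turn the counts into starting offsets (exclusive prefix sum).
        std::size_t offset = 0;
        for (std::size_t& c : count) {
            const std::size_t n = c;
            c = offset;
            offset += n;
        }
        // Scatter into the scratch buffer. Each pass is stable, which is
        // what makes sorting digit by digit correct.
        for (const Item& it : items) {
            scratch[count[(it.number >> shift) & 0xFF]++] = it;
        }
        items.swap(scratch);
    }
}
```

The sort always makes eight linear passes regardless of the input order, and equal keys keep their original relative order. The extra buffer is a real cost at the sizes mentioned in the question.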
Answer 4:
It might be worth separating numbers and indexes and then just sorting indexes, like this:
#include <vector>
#include <algorithm>
#include <cstdlib>   // EXIT_SUCCESS
#include <iostream>
void PrintElements(const std::vector<unsigned long long>& numbers, const std::vector<size_t>& indexes) {
std::cout << "\tNumbers:";
for (auto i = indexes.begin(); i != indexes.end(); ++i)
std::cout << '\t' << numbers[*i];
std::cout << std::endl;
std::cout << "\tIndexes:";
for (auto i = indexes.begin(); i != indexes.end(); ++i)
std::cout << '\t' << *i;
std::cout << std::endl;
}
int main() {
std::vector<unsigned long long> numbers;
std::vector<size_t> indexes;
numbers.reserve(4); // An overkill for this few elements, but important for billions.
numbers.push_back(32);
numbers.push_back(91);
numbers.push_back(11);
numbers.push_back(72);
indexes.reserve(numbers.size());
indexes.push_back(0);
indexes.push_back(1);
indexes.push_back(2);
indexes.push_back(3);
std::cout << "BEFORE:" << std::endl;
PrintElements(numbers, indexes);
std::sort(
indexes.begin(),
indexes.end(),
[&numbers](size_t i1, size_t i2) {
return numbers[i1] < numbers[i2];
}
);
std::cout << "AFTER:" << std::endl;
PrintElements(numbers, indexes);
return EXIT_SUCCESS;
}
This prints:
BEFORE:
Numbers: 32 91 11 72
Indexes: 0 1 2 3
AFTER:
Numbers: 11 32 72 91
Indexes: 2 0 3 1
The idea is that the elements being sorted are small and thus fast to move around during the sort. On modern CPUs, however, the effects of indirect access to numbers on caching could spoil these gains, so I recommend benchmarking on realistic amounts of data before making a final decision to use it.
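One way to run such a benchmark is sketched below. It compares sorting indexes through the numbers array against sorting records that keep number and index together. All helper names (make_random, sort_indexes, sort_records, time_ms) are invented for this sketch, and the fixed seed is only there to make runs repeatable.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Random 64-bit test data with a fixed seed so runs are repeatable.
std::vector<unsigned long long> make_random(std::size_t n) {
    std::mt19937_64 gen(12345);
    std::vector<unsigned long long> v(n);
    for (auto& x : v) x = gen();
    return v;
}

// Approach A: sort only the indexes, comparing through the numbers array.
std::vector<std::size_t> sort_indexes(const std::vector<unsigned long long>& numbers) {
    std::vector<std::size_t> idx(numbers.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::sort(idx.begin(), idx.end(), [&numbers](std::size_t a, std::size_t b) {
        return numbers[a] < numbers[b];
    });
    return idx;
}

// Approach B: sort {number, index} records kept together, then extract the indexes.
std::vector<std::size_t> sort_records(const std::vector<unsigned long long>& numbers) {
    struct Rec { unsigned long long number; std::size_t index; };
    std::vector<Rec> recs(numbers.size());
    for (std::size_t i = 0; i != numbers.size(); ++i) recs[i] = {numbers[i], i};
    std::sort(recs.begin(), recs.end(), [](const Rec& a, const Rec& b) {
        return a.number < b.number;
    });
    std::vector<std::size_t> idx(recs.size());
    for (std::size_t i = 0; i != recs.size(); ++i) idx[i] = recs[i].index;
    return idx;
}

// Wall-clock time of one call to f(), in milliseconds.
template <class F>
double time_ms(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A comparison could then look like time_ms([&] { sort_indexes(numbers); }) versus time_ms([&] { sort_records(numbers); }) with numbers = make_random(n) for a realistic n. Note that the second timing includes building the records and copying the indexes back out, which may or may not reflect how the data is laid out in the real program.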
Answer 5:
#include <algorithm>
#include <cstddef>
#include <vector>

struct SomeValue
{
    unsigned long long val;
    std::size_t index;
    // Order by value only; the index just travels along with it.
    bool operator<(const SomeValue& rhs) const
    {
        return val < rhs.val;
    }
};

std::vector<SomeValue> somevec;
// fill it...
std::sort(somevec.begin(), somevec.end());
Answer 6:
Use std::vector and std::sort. That should provide the fastest sort method. To find the original index, create a struct:
struct A {
    unsigned long long num;  // the value to sort by (int would truncate it)
    size_t index;            // position in the original array
};
Then write your own comparison predicate for sort that compares the num member of the struct:
struct Predicate {
    bool operator()(const A& first, const A& second) const {
        return first.num < second.num;
    }
};
std::sort(vec.begin(), vec.end(), Predicate());
Answer 7:
This will be used on supercomputers?
In that case you may want to look into parallel sorting algorithms. That only makes sense for large data sets, but when you do need it, the gain can be substantial.
Answer 8:
You might find this to be an interesting read. I would start with the STL's sort and only then try to improve on it if I could. I'm not sure if you have access to a C++11 compiler (like GCC 4.7) on this supercomputer, but I would suggest that std::sort with std::future and std::thread would get you quite a bit of the way there with regard to parallelizing the problem in a maintainable way.
Here is another question that compares std::sort with qsort.
Finally, there is this article in Dr. Dobb's that compares the performance of parallel algorithms.
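As a rough sketch of the std::sort plus std::future idea, the function below splits the range in two, sorts one half on another thread via std::async, and merges the halves with std::inplace_merge. The Record type, the function names, and the depth parameter are assumptions made for this example; this is one simple way to do it, not a tuned parallel sort.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical record type for this sketch.
struct Record {
    unsigned long long number;
    std::size_t index;
};

inline bool by_number(const Record& a, const Record& b) {
    return a.number < b.number;
}

// Sorts [first, last) by splitting it in two, sorting the left half on
// another thread, and merging. `depth` bounds the recursion, so at most
// 2^depth chunks are sorted concurrently.
void parallel_sort(std::vector<Record>::iterator first,
                   std::vector<Record>::iterator last,
                   unsigned depth) {
    const auto n = last - first;
    if (depth == 0 || n < 2) {
        std::sort(first, last, by_number);
        return;
    }
    const auto middle = first + n / 2;
    // Left half runs asynchronously; right half runs on the current thread.
    auto left = std::async(std::launch::async, parallel_sort, first, middle, depth - 1);
    parallel_sort(middle, last, depth - 1);
    left.get();  // wait for the left half (also rethrows any exception)
    std::inplace_merge(first, middle, last, by_number);
}
```

A call such as parallel_sort(v.begin(), v.end(), 2) sorts four chunks concurrently. On some toolchains the program has to be built with -pthread for std::async to start threads. The final merges are sequential, so the speedup flattens out as depth grows.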