Question:
Solution 1:
If I have a class like this:
class car { public: int a; std::string b; bool c; };
I can build a vector of 200 cars:
std::vector<car> allcars;
allcars.resize(200);
At runtime, I just do:
this_car = allcars[102];
then ....
Solution 2:
I have
std::vector<int> a; a.resize(200);
std::vector<std::string> b; b.resize(200);
std::vector<bool> c; c.resize(200);
this_car_a = a[102];
this_car_b = b[102];
this_car_c = c[102];
Question:
Which one is faster?
Does anyone have an idea? Thanks a lot in advance!
Answer 1:
A "struct of vectors" has a couple of advantages over a "vector of structs":
- If your inner loop doesn't use every element of the struct, then struct-of-vectors can save on memory bandwidth, as unused element vectors will not be loaded into cache.
- It is easier to vectorize. A struct-of-vectors may enable you to use the vector processing instructions of your processor (through assembly, intrinsics, or clever compilers) to speed up your inner loops.
On the other hand, premature optimization is the root of all evil:
- Using a struct-of-vectors is more difficult, awkward, and obscure.
- You generally don't know where your performance bottlenecks are until you've got your code up and running. Is it worth making your code more verbose, fragile, and difficult? You won't know until you actually profile it.
- The benefits of struct-of-vectors programming vary on a case by case basis. It doesn't always yield a speedup; you could actually end up with worse performance.
- In particular, if your access pattern is random (as opposed to sequential or otherwise localized), a struct-of-vectors organization could end up loading much more useless data from memory, because each cache line it touches holds the same field of several nearby objects that you don't need.
So, my recommendation is to use vector-of-structs by default, but keep struct-of-vectors in mind as an alternative (i.e., make sure you can switch later, if you expect sequential/local access patterns and it doesn't cost much effort up front). Once your program is running, you can profile it to see where the performance-critical sections are, and try out struct-of-vector and vectorized operations where they'll do the most good.
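To make the two layouts concrete, here is a minimal sketch, assuming the `car` class from the question. The names `car_columns`, `sum_a_aos`, and `sum_a_soa` are hypothetical, introduced only for illustration. Both functions compute the same sum of `a`; the difference is which bytes the loop has to pull through the cache.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Vector-of-structs (Solution 1): every element carries all three fields.
struct car {
    int a;
    std::string b;
    bool c;
};

// Struct-of-vectors (Solution 2): one contiguous array per field.
// car_columns is a hypothetical name chosen for this sketch.
struct car_columns {
    std::vector<int> a;
    std::vector<std::string> b;
    std::vector<bool> c;  // note: std::vector<bool> is a bit-packed specialization

    explicit car_columns(std::size_t n) : a(n), b(n), c(n) {}

    std::size_t size() const {
        // The three columns must always stay the same length.
        assert(a.size() == b.size() && b.size() == c.size());
        return a.size();
    }
};

// Sums field a while striding over whole car objects;
// b and c share the same cache lines and get loaded too.
long long sum_a_aos(const std::vector<car>& cars) {
    long long total = 0;
    for (const car& item : cars) total += item.a;
    return total;
}

// Sums field a from a dense int array; b and c are never touched.
long long sum_a_soa(const car_columns& cars) {
    long long total = 0;
    for (int value : cars.a) total += value;
    return total;
}
```

Whether `sum_a_soa` is actually faster depends on the element size, the compiler, and the hardware, so it should be measured rather than assumed.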
Answer 2:
If a, b and c belong together and form an object together, why the hell would you split them? Go for clarity and readability first. Anything else comes after that. Also, I think v2 would be slower, since it needs more accesses on the vectors. I didn't time it though. As always for questions about speed, time it.
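A rough way to "time it" is sketched below, assuming the `car` class from the question and using `std::chrono::steady_clock` from the standard library. The helper `time_us`, the struct `result`, and the function `compare_layouts` are hypothetical names for this sketch. It times a single pass over field `a` in each layout; a serious benchmark would repeat the runs, warm the cache, and use realistic data sizes.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

struct car {
    int a;
    std::string b;
    bool c;
};

// Runs work once and returns the elapsed wall-clock time in microseconds.
template <typename Work>
long long time_us(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

struct result {
    long long sum_v1, sum_v2;  // checksums; using them keeps the loops from being optimized away
    long long us_v1, us_v2;    // elapsed microseconds per layout
};

// Fills both layouts with the same values, then times one pass over field a.
result compare_layouts(std::size_t n) {
    std::vector<car> cars(n);  // Solution 1
    std::vector<int> a(n);     // Solution 2 (only the a column is needed here)
    for (std::size_t i = 0; i < n; ++i) {
        cars[i].a = static_cast<int>(i % 100);
        a[i] = static_cast<int>(i % 100);
    }

    result r{};
    r.us_v1 = time_us([&] { for (const car& item : cars) r.sum_v1 += item.a; });
    r.us_v2 = time_us([&] { for (int value : a) r.sum_v2 += value; });

    // Both layouts must compute the same answer, or the comparison is meaningless.
    assert(r.sum_v1 == r.sum_v2);
    return r;
}
```

With only 200 elements both passes will likely finish below the clock's useful resolution, so calling `compare_layouts` with a much larger `n` (and an optimized build) gives more meaningful numbers.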
Answer 3:
CPUs love prefetching.
If you are going to linearly traverse your data in the following pattern...
abcabcabc...
...then you are better off (performance-wise) with solution #1. If you are going to access them as:
aaa...bbb...ccc...
...then go for solution #2.
However, if you are not going to do a linear traversal, or if you have not actually benchmarked your code and concluded that you really need to squeeze every last drop of performance out of this piece of code, do your maintainability a favor and stick with Solution #1.
--- EDIT ---
In a multi-threaded environment, the physical layout of data may lead to false sharing. Essentially, keeping pieces of data that are concurrently accessed by different threads too close together may cause cache contention and destroy scalability.
So, if you concurrently access a from one thread and b from another, it may be worth splitting them physically apart and implementing solution #2. If, on the other hand, you access two "sibling" a values, stick with solution #1.
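The false-sharing point can be illustrated with a layout-only sketch. It assumes a 64-byte cache line (common on x86-64 but not universal), and the names `packed_counters`, `padded_counters`, `gap_bytes`, and `bump` are hypothetical. Spawning the threads is deliberately left out; the sketch only shows how far apart the two fields end up in memory.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Assumed cache-line size in bytes; check your target hardware.
constexpr std::size_t kCacheLine = 64;

// Packed: a and b will usually share one cache line, so a thread writing a
// and another thread writing b keep invalidating each other's cached copy.
struct packed_counters {
    std::atomic<long> a{0};
    std::atomic<long> b{0};
};

// Padded: each counter starts on its own cache-line boundary.
struct padded_counters {
    alignas(kCacheLine) std::atomic<long> a{0};
    alignas(kCacheLine) std::atomic<long> b{0};
};

static_assert(alignof(padded_counters) == kCacheLine, "counters are cache-line aligned");
static_assert(sizeof(padded_counters) == 2 * kCacheLine, "one cache line per counter");

// Distance in bytes between the two counters of one object.
template <typename Counters>
std::ptrdiff_t gap_bytes(const Counters& counters) {
    return reinterpret_cast<const char*>(&counters.b) -
           reinterpret_cast<const char*>(&counters.a);
}

// The work each thread would do on its own counter.
void bump(std::atomic<long>& counter, long times) {
    assert(times >= 0);
    for (long i = 0; i < times; ++i) {
        counter.fetch_add(1, std::memory_order_relaxed);
    }
}
```

In a real test, one thread would call `bump` on `a` while another calls it on `b`, once with each struct, and the two runs would be timed. C++17 also defines `std::hardware_destructive_interference_size` in `<new>` as a portable stand-in for the hard-coded 64, though compiler support for it varies.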
--- EDIT 2 ---
For an excellent treatment of this subject, I warmly recommend Herb Sutter's talk "Things Your Programming Language Never Told You", still available at:
http://video.google.com/videoplay?docid=-4714369049736584770
http://www.nwcpp.org/Downloads/2007/Machine_Architecture_-_NWCPP.pdf
Answer 4:
First of all, splitting them is a horrible idea for maintainability reasons, which should be your foremost concern.
Second of all, you just tripled your allocation time (three allocations instead of one), deallocation time (same), and destroyed cache locality of reference (probably a slowdown).
Third, the only benefit would be if you only read one member for all the cars over and over, and rarely alter the cars.
Answer 5:
It really depends on how you want to use your data. For example, if you only want to access one field:
car this_car = allcars[12];
cout << this_car.a;
This creates a copy of the car in this_car, so you needlessly copy fields b and c. Of course, you can fix this by taking a reference instead:
car & this_car = allcars[12];
This is potentially still slower than just doing
int this_car_a = a[12];
However, if you want to access multiple properties of your class, then it is almost certainly better to store them together. In that case you'll probably get better performance because of locality of reference, although it all really depends on the compiler, memory manager, etc.
In the end, the answer to which gives the best performance is: it depends. This will definitely not be a bottleneck decision, and it is definitely better to keep them in a single struct for code readability / your own sanity.
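The three ways of reading a single field discussed in this answer can be put side by side. This is a small sketch assuming the `car` class from the question; the function names are hypothetical. All three return the same value and differ only in how much data is copied or touched along the way.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct car {
    int a;
    std::string b;
    bool c;
};

// Copies the whole car (including its std::string) just to read one int.
int read_a_by_copy(const std::vector<car>& allcars, std::size_t i) {
    assert(i < allcars.size());
    car this_car = allcars[i];
    return this_car.a;
}

// Binds a reference instead: no car object is copied.
int read_a_by_reference(const std::vector<car>& allcars, std::size_t i) {
    assert(i < allcars.size());
    const car& this_car = allcars[i];
    return this_car.a;
}

// Solution 2: the value comes straight out of the dedicated int vector.
int read_a_from_column(const std::vector<int>& a, std::size_t i) {
    assert(i < a.size());
    return a[i];
}
```

An optimizing compiler may remove the copy in `read_a_by_copy` when it can prove it is unused, so any difference between these should be confirmed by profiling.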
Answer 6:
It depends on the size of the struct members and on your access pattern. A single access is irrelevant, but suppose you iterate over a vector and are only interested in member a. The wider the struct is, the fewer struct entries will fit in a cache line and the more cache misses you'll incur. Moving all the a members into a separate vector increases the cache line density and thus increases performance. The difference can be quite significant (1.5x, 2x, even more).
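The cache-line density argument can be turned into simple arithmetic. The sketch below assumes a 64-byte cache line and an array that starts on a cache-line boundary; `elements_per_line` and `lines_touched` are hypothetical helper names, and `sizeof(car)` depends on the compiler and standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct car {
    int a;
    std::string b;
    bool c;
};

// Assumed cache-line size in bytes; 64 is common but hardware-dependent.
const std::size_t kCacheLine = 64;

// How many whole elements of the given size fit in one cache line.
std::size_t elements_per_line(std::size_t element_size) {
    assert(element_size > 0);
    return kCacheLine / element_size;
}

// Cache lines covered by n contiguous elements, assuming the array
// starts on a cache-line boundary (rounded up to whole lines).
std::size_t lines_touched(std::size_t n, std::size_t element_size) {
    assert(element_size > 0);
    return (n * element_size + kCacheLine - 1) / kCacheLine;
}
```

With 4-byte ints, 16 values of a fit in one line and 200 of them span 13 lines. If `sizeof(car)` were 48 bytes (a plausible but implementation-specific figure), iterating over all 200 cars would span 150 lines to read the same 200 values of a.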
However, it is far more important to focus on code maintainability: make it readable, debuggable and easy to refactor. The code should clearly express the intent. Micro-optimizations like the one you're asking about should only be considered for measured bottlenecks. Get yourself a copy of the Software Optimization Cookbook.