Can I have a great list of common C++ optimization practices?
What I mean by optimization is that you have to modify the source code to be able to run a program faster, not changing the compiler settings.
Two ways to write better programs:
1. Make the best use of the language.
2. Profile your application.
There is not much language-specific optimization one can do; it is limited to choosing the right language constructs (#1). The main benefit comes from #2 above.
I will echo what others have said: a better algorithm is going to win in terms of performance gains.
That said, I work in image processing, which as a problem domain can be stickier. For example, many years ago I had a chunk of code that looked like this:
void FlipBuffer(unsigned char *start, unsigned char *end)
{
    unsigned char temp;
    while (start <= end) {
        temp = _bitRev[*start];
        *start++ = _bitRev[*end];
        *end-- = temp;
    }
}
which rotates a 1-bit frame buffer 180 degrees. _bitRev is a 256-byte table of reversed bits. This code is about as tight as you can get it. It ran on an 8 MHz 68K laser printer controller and took roughly 2.5 seconds for a legal-sized piece of paper. To spare you the details, the customer couldn't bear 2.5 seconds. The solution was an identical algorithm to this. The difference was that
So 5x: no algorithm change.
The point is that you also need to understand your problem domain and what bottlenecks means. In image processing, algorithm is still king, but if your loops are doing extra work, multiply that work by several million and that's the price you pay.
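For completeness, here is a sketch of how a table like _bitRev might be built. The original answer does not show its construction, so the function name and the use of std::array are my own assumptions; only the bit-mirroring idea comes from the text.

```cpp
#include <array>

// Hypothetical construction of a _bitRev-style table: entry i holds
// the byte value i with its 8 bits mirrored (bit 0 <-> bit 7, etc.).
std::array<unsigned char, 256> makeBitRev() {
    std::array<unsigned char, 256> table{};
    for (int i = 0; i < 256; ++i) {
        unsigned char rev = 0;
        for (int bit = 0; bit < 8; ++bit)
            if (i & (1 << bit))
                rev |= static_cast<unsigned char>(1 << (7 - bit));
        table[i] = rev;
    }
    return table;
}
```

Built once at startup, such a table turns per-pixel bit reversal into a single indexed load inside the inner loop.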
Don't forget a few things:
- "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." (Donald Knuth)
- You gain more by optimizing algorithms than by optimizing code.
- Optimize only the slow parts of existing code, as identified by a profiler or another specialized tool.
Agner Fog has done a great job analyzing the output of several compilers regarding C++ constructs. You will find his work here: http://www.agner.org/optimize/.
Intel offers a great document too - the "Intel® 64 and IA-32 Architectures Optimization Reference Manual", which you will find at http://www.intel.com/products/processor/manuals/index.htm. Although it mainly targets IA-32 architectures, it contains general advice that can be applied on most platforms. Obviously, it and Agner Fog's guide do intersect a bit.
As mentioned in other answers, micro-optimization is obviously the last step you want to take to make your program faster, after profiling and algorithm selection.
You might be interested in this: Optimizing C++ Wikibook
I don't have a site off the top of my head, but the book "Exceptional C++" by Herb Sutter is superb for C++ development. I highly recommend every C++ programmer read it, as it gives great insight not only into optimization but into smart usage of the language, so that you will write truly exceptional programs.
It is common in other engineering disciplines to assign budgets to the components of the system. For example, the engines of a VTOL aircraft are designed to provide a certain amount of lift, so the weight must be within a limit. At a high level, each part of the aircraft is given a portion of the weight budget which it should meet.
The reason this is done top down, rather than waiting until it's too bloated to get off the deck and then weighing each part and filing a bit off of the heaviest bit, is partly due to the cost of changing fabricated components. But a large part of it is that if you create a system where everything is a bit over budget everywhere, you can't just fix it in one place.
The classic software example is the SGI Indy running Irix 5.1, which is partly why graphics-intensive users have Macs and Windows machines now rather than SGI boxes.
"What's most frightening about the 5.1 performance is that nobody knows exactly where it went. If you start asking around, you get plenty of finger-pointing and theories, but few facts. In the May report, I proposed a "5% theory", which states that each little thing we add (Motif, internationalization, drag-and-drop, DSOs, multiple fonts, and so on) costs roughly 5% of the machine. After 15 or 20 of these, most of the performance is gone."
Frequently in discussions of performance, 5% is said to be insignificant, and the advice is to wait until there is a problem and then look for a single bottleneck. For a large system, waiting until you have a problem may just lose you your main business.
You asked for sites/sources containing optimization wisdom.
Some good ones have been suggested.
I might add that they will nearly all say that profiling is the best if not the only way to locate performance problems.
I'm not sure where this folk-wisdom originated or how it was justified, but there is a better way.
ADDED:
It is true that the "wrong algorithm" can kill performance, but that's certainly not the only way.
I do a lot of performance tuning. On large software, what usually kills performance is too much data structure and too many layers of abstraction.
What seem like innocent one-liner method calls to abstract objects tempt you to forget what that call could cost you. Multiply this tendency over several layers of abstraction, and you find things like, for example, spending all your time allocating and collecting things like iterators and collection classes when simple arrays with indexing would have been sufficient (and no less maintainable) but less "proper".
That's the problem with "common wisdom". It often is exactly the opposite of wisdom.
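As a hypothetical illustration of the point above (none of these names come from the answer itself), compare a node-per-element, pointer-chasing traversal with plain array indexing. Both compute the same sum, but the flat version needs no per-element allocation and walks contiguous memory:

```cpp
#include <cstddef>

// Layered style: one heap-allocated node per element, pointer chasing.
struct Node {
    int value;
    Node* next;
};

long sumViaNodes(const Node* head) {
    long sum = 0;
    for (const Node* p = head; p != nullptr; p = p->next)
        sum += p->value;          // each step may be a cache miss
    return sum;
}

// Flat style: simple array with indexing, contiguous memory.
long sumViaArray(const int* a, std::size_t n) {
    long sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum += a[i];              // sequential, cache-friendly access
    return sum;
}
```

The two are equally maintainable here; only the flat one keeps the memory traffic visible.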
Most techniques are compiler-specific, as different compilers optimize differently.
If you want some optimization tips that are compiler agnostic, here are two for you:
1. Don't do it.
2. (For experts only) Don't do it yet.
(apologies to Michael A. Jackson)
++p is usually faster than p++, and --p faster than p--, especially for objects whose types overload the prefix and postfix increment and decrement operators. The prefix form simply modifies the object and returns the new value, whereas the postfix form must keep a copy of the old value somewhere in order to return it. That is, instead of (replace int with your favorite class here)
for (int i(0); i < x; i++)
always write
for (int i(0); i < x; ++i)
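A minimal sketch of why this matters for class types (the Counter class is invented for illustration): the postfix operator must copy the old value in order to return it, while the prefix operator does not:

```cpp
// Hypothetical counter type with both increment forms overloaded.
struct Counter {
    int v;
    explicit Counter(int start) : v(start) {}

    Counter& operator++() {        // prefix: modify and return *this, no copy
        ++v;
        return *this;
    }

    Counter operator++(int) {      // postfix: copy the old value, modify,
        Counter old(*this);        // then return the copy
        ++v;
        return old;
    }
};
```

For a plain int the compiler optimizes the difference away; for a nontrivial class, the extra copy in the postfix form can be real work.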
Most optimizations are language-agnostic. Understand your code, and understand the hardware you're running on, and you can do most low-level optimizations.
Understand your problem domain, and suitable algorithms, and you can do any high-level optimizations.
The only C++-specific optimization advice I can think of is "understand what your code means". Understand when C++ copies temporaries around, understand which constructors and destructors are called when.
And prefer functors to function pointers, as the former can be inlined by the compiler. In general, move as much as possible to compile-time rather than runtime. Use templates to do the heavy lifting.
And of course, don't try optimizing until you've profiled and found out 1) that optimization is necessary, and 2) what needs optimizing.
Edit: A comment asked about functors vs function pointers being inlined. Here's the explanation:
Functions are usually compiled in isolation. So what does the compiler know about a function F that takes a function pointer FP as an argument? Nothing; it has to look at where F is called from, and maybe there it can find a definite clue as to which function FP points to. If it can determine that, when called from a particular site, FP will ALWAYS point to function G, then yes, it can make an inlined version of F, with G inlined inside it, for that call site. But it can't simply inline G without inlining F, because F might be called from other places as well, where it is passed a different function pointer. And even then, it takes some expensive global optimization to determine that anything can be inlined.
Imagine you pass a functor such as this instead:
struct Ftor {
    void operator()() { ... }
};
so the function F looks like this:
void F(const Ftor& ft) {
    ...
    ft();
    ...
}
now the compiler knows exactly which function gets called: the ft() call inside F calls Ftor::operator()(), and so the call can be inlined easily.
Of course, in practice, you'd typically template it, so the function can be called with any functor type:
template <typename Functor>
void F(const Functor& ft) {
    ...
    ft();
    ...
}
Sorry I don't have any references for you, but I do have another anecdote to add to the pile.
I had a rather large std::map that I was generating using Microsoft's CString object as the key. Performance was unacceptable. Since all of my strings were identical in length, I created a class wrapper around an old-fashioned fixed-size array of chars, to emulate the interface of CString. Unfortunately I can't remember the exact speedup, but it was significant, and the resulting performance was more than adequate.
Sometimes you need to know a little about the library constructs you rely upon.
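A hedged sketch of what such a wrapper might look like; the FixedKey name and every detail below are my reconstruction, not the original code. With every key exactly KeyLen characters, comparison is a single memcmp and constructing a key performs no heap allocation, unlike a general-purpose string class:

```cpp
#include <cstddef>
#include <cstring>
#include <map>

// Hypothetical fixed-length map key: all keys have exactly KeyLen
// characters, so ordering is one memcmp and there is no per-key
// heap allocation.
template <std::size_t KeyLen>
struct FixedKey {
    char data[KeyLen];

    explicit FixedKey(const char* s) { std::memcpy(data, s, KeyLen); }

    bool operator<(const FixedKey& other) const {
        return std::memcmp(data, other.data, KeyLen) < 0;
    }
};
```

It then drops in as a map key, e.g. std::map<FixedKey<8>, int>, with std::map using only operator< for ordering.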
Template metaprogramming can be used to move from dynamic polymorphism to compile-time polymorphism, producing insanely optimal code in the process. Alexandrescu's Modern C++ Design is a good book that covers TMP in depth. Not every page is about optimization, but it is a recurring consideration in the designs it presents.
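One common form of compile-time polymorphism is the curiously recurring template pattern (CRTP). This is a generic sketch, not an example from the book: the call is resolved at compile time, so there is no vtable lookup and the compiler is free to inline it.

```cpp
// CRTP sketch: the base class knows its derived type statically,
// so area() dispatches at compile time with no virtual call.
template <typename Derived>
struct Shape {
    double area() const {
        return static_cast<const Derived*>(this)->areaImpl();
    }
};

struct Square : Shape<Square> {
    double side;
    explicit Square(double s) : side(s) {}
    double areaImpl() const { return side * side; }
};
```

The trade-off is that, unlike a virtual base class, each Shape<Derived> is a distinct type, so heterogeneous containers need extra machinery.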
Contrary to what many people have said, there are many language specific optimizations you can do. This is an excellent resource on Wikibooks. Keep this in mind when designing your code, then profile, profile, profile.
The best optimization one can get is by revisiting the design, and after profiling the performance relevant parts/algorithms of the application. This is usually not language specific.
What I mean is (just as an idea): if you could get a 30% performance improvement by selecting a slightly better algorithm (or collection/container class), the improvement you can expect from C++-specific refactoring would be at most 2%. A design improvement could give you anything above 30%.
If you have a concrete application, the best strategy is to measure and profile the application. Profiling gives usually the most instant idea of which parts are performance relevant.
Here are a couple of catch-all approaches to optimization.
There is no single recipe for optimization problems... solutions are always hand-tuned to the hardware, the software, and system considerations.
Assuming you have the best algorithm:
Example seen here: What is the fastest way to swap values in C?
General tips (optimization flowcharts):
http://www.ceet.niu.edu/faculty/kuo/exp/exp6/figuree6-1.jpg
http://www.asc.edu/seminars/optimization/fig1.gif
There are many things you can do to optimize C++ code. Some of the broader suggestions are listed above.
A couple specific examples are:
In general, following a fundamental programming paradigm such as Data-Oriented Programming (DOP) will yield higher performance, as DOP is specifically formulated to focus on performance in all its forms: memory layout, cache coherency, runtime costs, and so on.
More info here: https://people.cs.clemson.edu/~dhouse/courses/405/papers/optimize.pdf
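As one illustration of the data-oriented idea (my own sketch, not taken from the linked paper): a structure-of-arrays layout keeps the frequently accessed field contiguous in memory, while an array-of-structs drags unused fields through the cache on every access.

```cpp
#include <vector>

// Array-of-structs: summing mass strides over the unused x, y, z fields.
struct ParticleAoS { float x, y, z, mass; };

float totalMassAoS(const std::vector<ParticleAoS>& ps) {
    float m = 0;
    for (const ParticleAoS& p : ps) m += p.mass;
    return m;
}

// Structure-of-arrays: each field is packed separately, so the loop
// over mass touches only mass values.
struct ParticlesSoA {
    std::vector<float> x, y, z, mass;
};

float totalMassSoA(const ParticlesSoA& ps) {
    float m = 0;
    for (float v : ps.mass) m += v;
    return m;
}
```

Which layout wins depends on access patterns: SoA helps when loops touch one field at a time; AoS helps when they need all fields of one element together.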