List of common C++ Optimization Techniques [closed]

Posted 2020-01-27 10:31

Question:

Can I have a great list of common C++ optimization practices?

By optimization I mean modifying the source code so that the program runs faster, not changing compiler settings.

Answer 1:

Two ways to write better programs:

Make the best use of the language:

  1. Code Complete by Steve McConnell
  2. Effective C++
  3. Exceptional C++

Profile your application:

  1. Identify what areas of code are taking how much time
  2. See if you can use better data structures/ algorithms to make things faster

There is not much language-specific optimization one can do; it is limited to using language constructs well (learn from #1). The main benefit comes from #2 above.



Answer 2:

I will echo what others have said: a better algorithm is going to win in terms of performance gains.

That said, I work in image processing, which as a problem domain can be stickier. For example, many years ago I had a chunk of code that looked like this:

// Rotate a 1-bit frame buffer 180 degrees: walk inward from both ends,
// bit-reversing each byte via the _bitRev lookup table and swapping.
void FlipBuffer(unsigned char *start, unsigned char *end)
{
    unsigned char temp;

    while (start <= end) {
        temp = _bitRev[*start];      // reversed copy of the front byte
        *start++ = _bitRev[*end];    // front takes the reversed back byte
        *end-- = temp;               // back takes the reversed front byte
    }
}

which rotates a 1-bit frame buffer 180 degrees. _bitRev is a 256-byte table of reversed bits. This code is about as tight as you can get it. It ran on an 8MHz 68K laser printer controller and took roughly 2.5 seconds for a legal-sized piece of paper. To spare you the details, the customer couldn't bear 2.5 seconds. The solution used an identical algorithm. The differences were that

  1. I used a 128K table and operated on words instead of bytes (the 68K is much happier on words)
  2. I used Duff's device to unroll the loop as much as would fit within a short branch (a sketch of the idea follows below)
  3. I put in an optimization to skip blank words
  4. I finally rewrote it in assembly to take advantage of the sobgtr instruction (subtract one and branch on greater) and have "free" post increment and pre-decrements in the right places.

The result: a 5x speedup, with no algorithm change.
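
For illustration, here is a hedged sketch of what points 1 and 2 above might look like in portable C++. This is not the original 68K code: the wordRev table name, the 4x unroll factor, and the function signature are assumptions, and the blank-word skip from point 3 is omitted.

#include <cstddef>

// Assumed 65536-entry table mapping each 16-bit word to its bit-reversed
// value (the 128K table from point 1); defined elsewhere.
extern const unsigned short wordRev[65536];

void FlipBufferWords(unsigned short *start, unsigned short *end)
{
    std::ptrdiff_t count = end - start + 1;   // words in the buffer
    std::ptrdiff_t pairs = count / 2;         // swaps needed

    if (pairs > 0) {
        std::ptrdiff_t n = (pairs + 3) / 4;   // iterations of the 4x-unrolled loop
        unsigned short temp;
        switch (pairs % 4) {                  // Duff's device: jump into the unrolled body
        case 0: do { temp = wordRev[*start]; *start++ = wordRev[*end]; *end-- = temp;
        case 3:      temp = wordRev[*start]; *start++ = wordRev[*end]; *end-- = temp;
        case 2:      temp = wordRev[*start]; *start++ = wordRev[*end]; *end-- = temp;
        case 1:      temp = wordRev[*start]; *start++ = wordRev[*end]; *end-- = temp;
                } while (--n > 0);
        }
    }
    if (count % 2)                            // odd word count: bit-reverse the middle word
        *start = wordRev[*start];
}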

The point is that you also need to understand your problem domain and what your bottlenecks mean. In image processing, the algorithm is still king, but if your loops are doing extra work, multiply that work by several million and that's the price you pay.



Answer 3:

Don't forget a few things:
- "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." (Donald Knuth)
- You gain more by optimizing algorithms than by optimizing code.
- Only optimize the slow parts of existing code, as identified by a profiler or other measurement tool.



Answer 4:

Agner Fog has done a great job analyzing the output of several compilers regarding C++ constructs. You will find his work here: http://www.agner.org/optimize/.

Intel offers a great document too - the "Intel® 64 and IA-32 Architectures Optimization Reference Manual", which you will find at http://www.intel.com/products/processor/manuals/index.htm. Although it mainly targets IA-32 architectures, it contains general advice that can be applied on most platforms. Obviously, it and Agner Fog's guide do intersect a bit.

As mentioned in other answers, micro-optimization is obviously the last step you want to take to make your program faster, after profiling and algorithm selection.



Answer 5:

You might be interested in this: Optimizing C++ Wikibook



Answer 6:

I don't have a site off the top of my head, but the book "Exceptional C++" by Herb Sutter is superb for C/C++ development. I highly recommend every C++ programmer read it, as it gives great insight into not only optimization but also smart usage of the language, so that you can write truly exceptional programs.



Answer 7:

It is common in other engineering disciplines to assign budgets to the components of the system. For example, the engines of a VTOL aircraft are designed to provide a certain amount of lift, so the weight must be within a limit. At a high level, each part of the aircraft is given a portion of the weight budget which it should meet.

The reason this is done top-down, rather than waiting until the aircraft is too bloated to get off the deck and then weighing each part and filing a bit off the heaviest one, is partly the cost of changing fabricated components. But a large part of it is that if you create a system where everything is a bit over budget everywhere, you can't just fix it in one place.

The classic software example is Irix 5.1 on the SGI Indy, which is partly why graphics-intensive users now have Macs and Windows machines rather than SGI boxes.

"What's most frightening about the 5.1 performance is that nobody knows exactly where it went. If you start asking around, you get plenty of finger-pointing and theories, but few facts. In the May report, I proposed a "5% theory", which states that each little thing we add (Motif, internationalization, drag-and-drop, DSOs, multiple fonts, and so on) costs roughly 5% of the machine. After 15 or 20 of these, most of the performance is gone."

Frequently in discussions of performance, 5% is said to be insignificant, and the advice is to wait until there is a problem and then look for a single bottleneck. For a large system, waiting until you have a problem may simply lose you your main business.



Answer 8:

You asked for sites/sources containing optimization wisdom.

Some good ones have been suggested.

I might add that they will nearly all say that profiling is the best if not the only way to locate performance problems.

I'm not sure where this folk-wisdom originated or how it was justified, but there is a better way.

ADDED:

It is true that the "wrong algorithm" can kill performance, but that's certainly not the only way.

I do a lot of performance tuning. On large software, what usually kills performance is too much data structure and too many layers of abstraction.

What seem like innocent one-liner method calls on abstract objects tempt you to forget what those calls could cost you. Multiply this tendency over several layers of abstraction, and you find things like, for example, spending all your time allocating and collecting iterators and collection classes when simple arrays with indexing would have been sufficient (and no less maintainable) but less "proper".
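
As a hedged illustration of that pattern (the names below are invented, not from any real codebase), compare a layered helper that allocates a fresh collection on every call with a flat version that simply indexes an array:

#include <cstddef>
#include <vector>

// Layered version: every call builds and returns a new container.
std::vector<double> GetNeighborValues(const std::vector<double>& grid, std::size_t i)
{
    return { grid[i - 1], grid[i + 1] };   // heap allocation on each call
}

// Flat version: index the array directly where the values are needed.
double NeighborSum(const double* grid, std::size_t i)
{
    return grid[i - 1] + grid[i + 1];      // no allocation, trivially inlined
}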

That's the problem with "common wisdom". It often is exactly the opposite of wisdom.



Answer 9:

Most techniques are compiler-specific, as different compilers optimize differently.

If you want some optimization tips that are compiler agnostic, here are two for you:

  1. Don't do it.
  2. (for experts only!): Don't do it yet.

(apologies to Michael A. Jackson)



Answer 10:

++p is usually faster than p++, and --p faster than p--, especially for types with overloaded prefix and postfix increment and decrement operators: the prefix form just increments or decrements and returns the new value, whereas the postfix form also has to keep the old value somewhere in order to return it. That is, instead of (replace int with your favorite class here)

for (int i(0); i < x; i++)

always write

for (int i(0); i < x; ++i)
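
A minimal sketch of why the postfix form can cost more for a user-defined type (the Counter class below is purely illustrative):

struct Counter {
    int value = 0;

    Counter& operator++() {          // prefix: modify and return *this
        ++value;
        return *this;
    }

    Counter operator++(int) {        // postfix: copy, modify, return the copy
        Counter old = *this;         // this extra copy is what the prefix form avoids
        ++value;
        return old;
    }
};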


Answer 11:

  1. Profile your code to find what's actually taking the most time
  2. Make sure you're using the right algorithm
  3. If you're doing a lot of number crunching, be cache-friendly: try to do all the work on a chunk of data at once so it doesn't have to be reloaded into cache multiple times (see the sketch below).
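
As a rough sketch of point 3 (the scale-and-sum workload and the names are illustrative), fusing two passes over a large array into one keeps each element in cache while all of the work on it is done:

#include <vector>

void scale_and_accumulate(std::vector<float>& data, float factor, float& sum)
{
    sum = 0.0f;
    for (float& x : data) {   // one pass: scale and sum while the element is hot in cache
        x *= factor;
        sum += x;
    }
}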


Answer 12:

Most optimizations are language-agnostic. Understand your code, and understand the hardware you're running on, and you can do most low-level optimizations.

Understand your problem domain, and suitable algorithms, and you can do any high-level optimizations.

The only C++-specific optimization advice I can think of is "understand what your code means". Understand when C++ copies temporaries around, understand which constructors and destructors are called when.
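
For example (a small illustrative sketch, not tied to any particular codebase), it pays to know exactly where copies and temporaries appear:

#include <string>
#include <vector>

void byValue(std::string s);        // copies its argument on every call
void byRef(const std::string& s);   // passes a reference, no copy

void fill(std::vector<std::string>& v, const std::string& s)
{
    v.push_back(s);           // copy-constructs a string inside the vector
    v.push_back(s + "!");     // builds a temporary, then (since C++11) moves it in
}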

And prefer functors to function pointers, as the former can be inlined by the compiler. In general, move as much as possible to compile-time rather than runtime. Use templates to do the heavy lifting.

And of course, don't try optimizing until you've profiled and found out 1) that optimization is necessary, and 2) what needs optimizing.

Edit: A comment asked about functors vs function pointers being inlined. Here's the explanation:

Functions are usually compiled in isolation. So what does the compiler know about a function F that takes a function pointer FP as an argument? Nothing; it has to go look at where F is called from, and maybe there it can find a definite clue as to which function FP points to. If it can determine that, when called from here, FP will ALWAYS point to function G, then yes, it can make an inlined version of F, with G inlined inside it, for this particular call site. But it can't simply inline G without inlining F, because F might be called from other places as well, where it is passed a different function pointer. And even then, it requires some expensive global optimization to determine that anything can be inlined.

Imagine you pass a functor such as this instead:

struct Ftor {
  void operator()() { ... }
};

so the function F looks like this:

void F(const Ftor& ft) {
  ...
  ft();
  ...
}

Now the compiler knows exactly which function gets called: the ft() call in the body invokes Ftor::operator(), so the call can be inlined easily.

Of course, in practice, you'd typically template it, so the function can be called with any functor type:

template <typename Functor>
void F(const Functor& ft) {
  ...
  ft();
  ...
}


Answer 13:

Sorry I don't have any references for you, but I do have another anecdote to add to the pile.

I had a rather large std::map that I was generating using Microsoft's CString object as the key. Performance was unacceptable. Since all of my strings were identical in length, I created a class wrapper around an old-fashioned fixed-size array of chars, to emulate the interface of CString. Unfortunately I can't remember the exact speedup, but it was significant, and the resulting performance was more than adequate.
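
The original wrapper isn't shown here, but a minimal sketch of the idea might look like this (KEY_LEN and the class name are assumptions):

#include <cstddef>
#include <cstring>
#include <map>

constexpr std::size_t KEY_LEN = 16;

struct FixedKey {
    char data[KEY_LEN];

    explicit FixedKey(const char* s) {
        std::memcpy(data, s, KEY_LEN);   // assumes s holds exactly KEY_LEN characters
    }

    bool operator<(const FixedKey& other) const {
        return std::memcmp(data, other.data, KEY_LEN) < 0;   // cheap, no heap traffic
    }
};

// Usage: std::map<FixedKey, int> table;   // avoids a heap allocation per key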

Sometimes you need to know a little about the library constructs you rely upon.



Answer 14:

Template metaprogramming (TMP) can be used to move from dynamic polymorphism to compile-time polymorphism, producing insanely optimal code in the process. Alexandrescu's Modern C++ Design is a good book that covers TMP in depth. Not every page is about optimization, but it is an oft-recurring consideration in the program designs.
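
One common way to trade dynamic polymorphism for compile-time polymorphism is the CRTP (curiously recurring template pattern); here is a minimal sketch, with class names that are illustrative rather than taken from the book:

template <typename Derived>
struct Shape {
    double area() const {
        // Resolved at compile time, so it can be inlined; no vtable lookup.
        return static_cast<const Derived*>(this)->area_impl();
    }
};

struct Circle : Shape<Circle> {
    double r;
    explicit Circle(double radius) : r(radius) {}
    double area_impl() const { return 3.141592653589793 * r * r; }
};

template <typename S>
double total_area(const Shape<S>& s) { return s.area(); }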



Answer 15:

Contrary to what many people have said, there are many language-specific optimizations you can do; the Optimizing C++ Wikibook is an excellent resource. Keep this in mind when designing your code, then profile, profile, profile.



Answer 16:

The best optimizations come from revisiting the design, after profiling the performance-relevant parts and algorithms of the application. This is usually not language-specific.

What I mean is (just as an idea): if you could get a 30% performance improvement by selecting a slightly better algorithm (or collection/container class), the improvement you can expect from a C++-level refactoring would be at most around 2%. A design improvement could give you anything above 30%.

If you have a concrete application, the best strategy is to measure and profile it. Profiling usually gives the quickest indication of which parts are performance-relevant.



Answer 17:

Here are a couple of catch-all approaches to optimization.

There is no single recipe for optimization problems; solutions are always hand-tuned to the hardware, software, and system considerations.


Assuming you have the best algorithm:

  1. compile with "show assembly output" and "highest optimization"
  2. look at the assembly output
  3. identify inefficiencies that ruin compiler optimization or bad caching
  4. Re-code the snippet
  5. if it is still bad loop back to 2.
  6. done

Example seen here: What is the fastest way to swap values in C?


General tips:

http://www.ceet.niu.edu/faculty/kuo/exp/exp6/figuree6-1.jpg:

  • Try it in floating point first
  • Try it in fixed point second
  • If you are really desperate and have lots of time and money, try it in assembly

http://www.asc.edu/seminars/optimization/fig1.gif:

  • Check if it is memory OR I/O OR CPU bound
  • Attack whichever is the limiting factor



Answer 18:

There are many things you can do to optimize C++ code. Some of the broader suggestions are listed above.

A couple specific examples are:

  1. Using Structs of Arrays as opposed to Array of Structs (Classic example of DOP vs OOP)
  2. Using restrict (not standard C++; most compilers provide __restrict) when you know two pointers will not point to the same memory location (sketched below)

In general, following a fundamental programming paradigm such as Data Oriented Programming will yield higher performance as DOP is specifically formulated to focus on performance (in all forms: memory layouts, cache coherency, runtime costs, etc.)
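
Hedged sketches of both points (all names are illustrative; __restrict is a widely supported compiler extension, since restrict itself is not standard C++):

#include <cstddef>
#include <vector>

// 1. AoS vs SoA: with a Struct of Arrays, a loop that only touches the x
//    coordinates streams through contiguous memory instead of dragging the
//    unused fields of every element into cache.
struct ParticleAoS { float x, y, z, mass; };   // Array of Structs element

struct ParticlesSoA {                          // Struct of Arrays
    std::vector<float> x, y, z, mass;
};

void advance_x(ParticlesSoA& p, float dx)
{
    for (std::size_t i = 0; i < p.x.size(); ++i)
        p.x[i] += dx;                          // contiguous, vectorizer-friendly
}

// 2. Promising the compiler that two pointers never alias lets it optimize
//    the loop more aggressively.
void add_arrays(float* __restrict dst, const float* __restrict src, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        dst[i] += src[i];
}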

More info here: https://people.cs.clemson.edu/~dhouse/courses/405/papers/optimize.pdf