What are your favorite low-level code optimizations?

Posted 2020-05-19 02:44

I know that you should only optimize when it is deemed necessary. But if it is deemed necessary, what are your favorite low-level (as opposed to algorithmic-level) optimization tricks?

For example: loop unrolling.
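
To make the example concrete, here is a minimal sketch of manual loop unrolling. The function name `sum_unrolled4` and the unroll factor of four are assumptions chosen for illustration; an optimizing compiler may already do this on its own, so any gain needs to be measured.

```c
#include <stddef.h>

/* Hypothetical helper: sums n ints with the loop body unrolled by four. */
long sum_unrolled4(const int *v, size_t n)
{
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    /* Main loop: four elements per iteration, four independent accumulators. */
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }

    /* Remainder loop: handles the last n % 4 elements. */
    for (; i < n; i++)
        s0 += v[i];

    return s0 + s1 + s2 + s3;
}
```

The separate accumulators shorten the dependency chain between additions; the remainder loop keeps the result correct when n is not a multiple of four.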

24 answers
SAY GOODBYE
#2 · 2020-05-19 03:25

++i can be faster than i++, because it avoids creating a temporary.

Whether this still holds for modern C/C++/Java/C# compilers, I don't know. It might well be different for user-defined types with overloaded operators, whereas in the case of simple integers it probably doesn't matter.

But I've come to like the syntax... it reads like "increment i" which is a sensible order.

冷血范
#3 · 2020-05-19 03:25

Optimizing cache locality - for example when multiplying two matrices that don't fit into cache.
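
One common way to do this is loop tiling (blocking), sketched below for square row-major matrices. The function name `matmul_tiled` and the tile edge `TILE` of 32 are assumptions for illustration, not values from the answer; a suitable tile size depends on the cache sizes of the target machine.

```c
#include <stddef.h>

#define TILE 32 /* assumed tile edge; tune per machine */

static size_t min_sz(size_t x, size_t y) { return x < y ? x : y; }

/* Computes c = a * b for n x n row-major matrices; c must not alias a or b. */
void matmul_tiled(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n * n; i++)
        c[i] = 0.0;

    /* The outer three loops pick a tile; the inner three work inside it. */
    for (size_t ii = 0; ii < n; ii += TILE) {
        size_t i_end = min_sz(ii + TILE, n);
        for (size_t kk = 0; kk < n; kk += TILE) {
            size_t k_end = min_sz(kk + TILE, n);
            for (size_t jj = 0; jj < n; jj += TILE) {
                size_t j_end = min_sz(jj + TILE, n);
                for (size_t i = ii; i < i_end; i++) {
                    for (size_t k = kk; k < k_end; k++) {
                        double aik = a[i * n + k];
                        /* Innermost loop walks rows of b and c contiguously. */
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
                }
            }
        }
    }
}
```

Each tile of a, b, and c is reused many times while it is still in cache, instead of streaming whole rows and columns through it on every pass.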

Explosion°爆炸
#4 · 2020-05-19 03:25

Rolling up loops.

Seriously, the last time I needed to do anything like this was in a function that took 80% of the runtime, so it was worth trying to micro-optimize if I could get a noticeable performance increase.

The first thing I did was to roll up the loop. This gave me a very significant speed increase. I believe this was a matter of cache locality.

The next thing I did was add a layer of indirection, and put some more logic into the loop, which allowed me to only loop through the things I needed. This wasn't as much of a speed increase, but it was worth doing.

If you're going to micro-optimize, you need to have a reasonable idea of two things: the architecture you're actually using (which is vastly different from the systems I grew up with, at least for micro-optimization purposes), and what the compiler will do for you.

A lot of the traditional micro-optimizations trade space for time. Nowadays, using more space increases the chances of a cache miss, and there goes your performance. Moreover, a lot of them are now done by modern compilers, and typically better than you're likely to do them.

Currently, you should (a) profile to see if you need to micro-optimize, and then (b) try to trade computation for space, in the hope of keeping as much as possible in cache. Finally, run some tests, so you know if you've improved things or screwed them up. Modern compilers and chips are far too complex for you to keep a good mental model, and the only way you'll know if some optimization works or not is to test.
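
As one possible illustration of trading computation for space, the sketch below packs boolean flags eight to a byte instead of storing one flag per byte. The helper names `bit_set`, `bit_get`, and `bits_bytes` are hypothetical. Each access costs a shift and a mask, but the array is eight times smaller, so more of it may stay in cache. Whether that wins is something only a measurement can tell.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of bytes needed to hold n one-bit flags. */
static inline size_t bits_bytes(size_t n)
{
    return (n + 7) / 8;
}

/* Sets flag i: byte index is i / 8, bit position is i % 8. */
static inline void bit_set(uint8_t *bits, size_t i)
{
    bits[i >> 3] |= (uint8_t)(1u << (i & 7));
}

/* Returns flag i as 0 or 1. */
static inline int bit_get(const uint8_t *bits, size_t i)
{
    return (bits[i >> 3] >> (i & 7)) & 1;
}
```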

走好不送
#5 · 2020-05-19 03:27

I was amazed at the speedup I got by replacing a for loop that adds numbers together in structs with a guarded do-while loop:

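#include <stdlib.h> // malloc, free, abort

const unsigned long SIZE = 100000000;

typedef struct {
    int a;
    int b;
    int result;
} addition;

addition *sum;

void start() {
    // size_t avoids overflow if SIZE or the struct grows
    size_t byte_count = SIZE * sizeof(addition);

    sum = malloc(byte_count);
    if (sum == NULL) {
        abort(); // allocation failed
    }
    unsigned int i = 0;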

    if (i < SIZE) {
        do {
            sum[i].a = i;
            sum[i].b = i;
            i++;
        } while (i < SIZE);
    }    
}

void test_func() {
    unsigned int i = 0;

    if (i < SIZE) { // this is about 30% faster than the more obvious for loop, even with O3
        do {
            addition *s1 = &sum[i];
            s1->result = s1->b + s1->a;
            i++;
        } while ( i<SIZE );
    }
}

void finish() {
    free(sum);
}

Why doesn't gcc optimise for loops into this? Or is there something I missed? Some cache effect?
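
For comparison, here are the two loop shapes side by side as a sketch. The answer does not show its "more obvious for loop", so `add_for` is an assumption about what it looked like, and both function names are hypothetical. The two are semantically equivalent, and compilers commonly rewrite a for loop into the guarded do-while shape themselves (loop inversion), so a difference like the one reported is best confirmed by comparing the generated assembly and rerunning the benchmark.

```c
#include <stddef.h>

typedef struct {
    int a;
    int b;
    int result;
} addition;

/* Assumed "obvious" form: a plain for loop with array indexing. */
void add_for(addition *sum, size_t n)
{
    for (size_t i = 0; i < n; i++)
        sum[i].result = sum[i].a + sum[i].b;
}

/* Form from the answer: guarded do-while with a cached element pointer. */
void add_do_while(addition *sum, size_t n)
{
    size_t i = 0;

    if (i < n) {
        do {
            addition *s1 = &sum[i];
            s1->result = s1->b + s1->a;
            i++;
        } while (i < n);
    }
}
```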

看我几分像从前
#6 · 2020-05-19 03:29

Liberal use of __restrict to eliminate load-hit-store stalls.
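
A minimal sketch of the idea, with hypothetical function names: without the qualifier, the compiler has to assume that `total` might point into `src`, so it may store `*total` and reload it on every iteration. With `__restrict` (the extension spelling accepted by GCC, Clang, and MSVC; C99 spells it `restrict`), the caller promises the pointers do not overlap, which lets the compiler keep the running total in a register.

```c
#include <stddef.h>

/* total may alias src: each iteration may store and then reload *total. */
void sum_into(int *total, const int *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        *total += src[i];
}

/* Caller guarantees no overlap, so *total can live in a register. */
void sum_into_restrict(int *__restrict total, const int *__restrict src,
                       size_t n)
{
    for (size_t i = 0; i < n; i++)
        *total += src[i];
}
```

Calling the second version with overlapping pointers is undefined behavior, so the qualifier belongs only where the no-alias promise really holds.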

放荡不羁爱自由
#7 · 2020-05-19 03:31

Jon Bentley's Writing Efficient Programs is a great source of low- and high-level techniques -- if you can find a copy.
