What are your favorite low-level code optimizations?

Posted 2020-05-19 02:44

I know that you should only optimize when it is deemed necessary. But when it is, what are your favorite low-level (as opposed to algorithmic-level) optimization tricks?

For example: loop unrolling.

24 answers
手持菜刀,她持情操
#2 · 2020-05-19 03:12

In SQL, if you only need to know whether any data exists or not, don't bother with COUNT(*):

SELECT 1 FROM table WHERE some_primary_key = some_value

If your WHERE clause is likely to return multiple rows, add a LIMIT 1 too.

(Remember that databases can't see what your code's doing with their results, so they can't optimise these things away on their own!)

一夜七次
#3 · 2020-05-19 03:13

One of the most useful in scientific code is to replace pow(x,4) with x*x*x*x. pow is almost always more expensive than multiplication. Next comes replacing a division inside a loop with a multiplication by the reciprocal, i.e. going from

  for(int i = 0; i < N; i++)
  {
    z += x/y;
  }

to

  double denom = 1.0/y;  // 1.0 keeps this a floating-point division even if y is an integer
  for(int i = 0; i < N; i++)
  {
    z += x*denom;
  }

But my favorite low-level optimization is to figure out which calculations can be removed from a loop. It's almost always faster to do the calculation once rather than N times. Depending on your compiler, some of these may be done for you automatically.
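As a rough sketch of the two rewrites above combined, the functions below compute the same sum with and without the per-iteration division. The names pow4, sum_naive and sum_hoisted are made up for this illustration. Note that multiplying by a reciprocal can differ from dividing in the last bits of a floating-point result, so this is only a safe swap when that is acceptable.

```c
#include <stddef.h>

/* Hypothetical helper: x^4 by plain multiplication instead of pow(x, 4). */
static double pow4(double x)
{
    return x * x * x * x;
}

/* Straightforward version: one division per iteration. */
double sum_naive(const double *a, size_t n, double scale)
{
    double z = 0.0;
    for (size_t i = 0; i < n; i++) {
        z += pow4(a[i] / scale);
    }
    return z;
}

/* Hoisted version: the loop-invariant reciprocal is computed once,
   and the loop body only multiplies. */
double sum_hoisted(const double *a, size_t n, double scale)
{
    const double inv = 1.0 / scale;
    double z = 0.0;
    for (size_t i = 0; i < n; i++) {
        z += pow4(a[i] * inv);
    }
    return z;
}
```

Whether the hoisted version is actually faster depends on the compiler and flags, so it is worth measuring before keeping it.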

Melony?
#4 · 2020-05-19 03:13

Counting down a loop. It's cheaper to compare against 0 than N:

for (i = N; --i >= 0; ) ...
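One caveat with the loop above: `--i >= 0` only terminates when i is a signed type, because for an unsigned counter the test is always true. A minimal sketch of a countdown that also works with an unsigned index follows. The function name sum_counting_down is invented for this example.

```c
#include <stddef.h>

/* Sums a[0..n-1], visiting elements from the last to the first. */
long sum_counting_down(const int *a, size_t n)
{
    long total = 0;
    /* i-- > 0 tests the old value and then decrements, so the body
       sees n-1, n-2, ..., 0 and the loop stops cleanly at zero even
       though size_t is unsigned. */
    for (size_t i = n; i-- > 0; ) {
        total += a[i];
    }
    return total;
}
```

Modern compilers often perform this transformation themselves, so any gain should be confirmed by measurement.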

Shifting and masking by powers of two is cheaper than division and remainder (/ and %):

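#include <stdint.h>

#define WORD_LOG 5
#define SIZE (1 << WORD_LOG)
#define MASK (SIZE - 1)

uint32_t bits[K];  /* K words of 32 bits each; K is defined elsewhere */

void set_bit(unsigned i)
{
    /* word index is i / 32, bit index is i % 32 */
    bits[i >> WORD_LOG] |= (UINT32_C(1) << (i & MASK));
}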

Edit

For unsigned i, (i >> WORD_LOG) == (i / SIZE) and
(i & MASK) == (i % SIZE)

because SIZE is 32, i.e. 2^5.
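To make the shift-and-mask indexing concrete, here is a small self-contained bitset sketch. The capacity NBITS, the array name words, and the function names are assumptions made for this example, not part of the answer above.

```c
#include <stdint.h>

#define WORD_LOG 5
#define SIZE (1u << WORD_LOG)   /* 32 bits per word */
#define MASK (SIZE - 1u)        /* 0x1f */
#define NBITS 256u              /* assumed capacity for this sketch */

static uint32_t words[NBITS / SIZE];

/* Set bit i: i >> WORD_LOG picks the word, i & MASK picks the bit. */
void bitset_set(unsigned i)
{
    words[i >> WORD_LOG] |= UINT32_C(1) << (i & MASK);
}

/* Clear bit i. */
void bitset_clear(unsigned i)
{
    words[i >> WORD_LOG] &= ~(UINT32_C(1) << (i & MASK));
}

/* Return 1 if bit i is set, 0 otherwise. */
int bitset_test(unsigned i)
{
    return (int)((words[i >> WORD_LOG] >> (i & MASK)) & 1u);
}
```

For unsigned operands and a power-of-two divisor, most optimizing compilers already turn / and % into these shifts and masks, so the manual form matters mainly for signed values or unoptimized builds.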

一夜七次
#5 · 2020-05-19 03:15
gcc -O2

Compilers usually do a much better job of it than you can by hand.

Juvenile、少年°
#6 · 2020-05-19 03:15
  • Recycling the frame-pointer all of a sudden
  • Pascal calling-convention
  • Rewrite-stack-frame tail-call optimization (although it sometimes messes with the above)
  • Using vfork() instead of fork() before exec()
  • And one I am still looking for an excuse to use: data-driven code generation at runtime
Evening l夕情丶
#7 · 2020-05-19 03:16

I've found that changing from pointer to indexed access may make a difference; the compiler has different instruction forms and register usages to choose from. The same goes for the reverse change. This is extremely low-level and compiler-dependent, though, and only worthwhile when you need those last few percent.

E.g.

for (i = 0;  i < n;  ++i)
    *p++ = ...; // some complicated expression

vs.

for (i = 0;  i < n;  ++i)
    p[i] = ...; // some complicated expression
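A minimal sketch of the two forms with a concrete loop body, so they can be compiled and compared side by side. The "complicated expression" is stood in for by i * i, and the function names are invented for this example.

```c
#include <stddef.h>

/* Pointer-increment form: the store address advances with the pointer. */
void fill_squares_ptr(int *p, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        *p++ = (int)(i * i);
    }
}

/* Indexed form: the store address is computed from base plus index. */
void fill_squares_idx(int *p, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        p[i] = (int)(i * i);
    }
}
```

Both functions produce identical results; any speed difference shows up only in the generated code, so comparing the assembly output or timing both on the target compiler is the way to pick one.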