How exactly does gcc do optimizations?

2020-07-14 05:57发布

In order to know how exactly the gcc do the optimization, I have written two program compiling with -O2, but there is some difference of the assembly code. In my programs, I want to output "hello" in the loop, and add some delay between each output. These two programs are only for illustrating my question, and I know I can using volatile or asm in program 1 to achieve my purpose.

Program 1

#include <stdio.h>

int main(int argc, char **argv)
{
    unsigned long i = 0;
    while (1) {
        if (++i > 0x1fffffffUL) {
            printf("hello\n");
            i = 0;
        }
    }
}

Compile with -O2, the assembly code is:

Disassembly of section .text.startup:

00000000 <_main>:
#include <stdio.h>

int main(int argc, char **argv)
{
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 e4 f0                and    $0xfffffff0,%esp
   6:   83 ec 10                sub    $0x10,%esp
   9:   e8 00 00 00 00          call   e <_main+0xe>
   e:   66 90                   xchg   %ax,%ax
  10:   c7 04 24 00 00 00 00    movl   $0x0,(%esp)
  17:   e8 00 00 00 00          call   1c <_main+0x1c>
  1c:   eb f2                   jmp    10 <_main+0x10>
  1e:   90                      nop
  1f:   90                      nop

Program 2

int main(int argc, char **argv)
{
    unsigned long i = 0;
    while (1) {
        if (i > 0x1fffffffUL) {
            printf("hello\n");
            i = 0;
        }
        i++;
    }
}

Compile with -O2, the assembly code is:

Disassembly of section .text.startup:

00000000 <_main>:
#include <stdio.h>

int main(int argc, char **argv)
{
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 e4 f0                and    $0xfffffff0,%esp
   6:   83 ec 10                sub    $0x10,%esp
   9:   e8 00 00 00 00          call   e <_main+0xe>
   e:   31 c0                   xor    %eax,%eax
  10:   83 c0 01                add    $0x1,%eax
  13:   3d ff ff ff 1f          cmp    $0x1fffffff,%eax
  18:   76 f6                   jbe    10 <_main+0x10>
  1a:   c7 04 24 00 00 00 00    movl   $0x0,(%esp)
    while (1) {
        if (i > 0x1fffffffUL) {
            printf("hello\n");
            i = 0;
        }
        i++;
  21:   e8 00 00 00 00          call   26 <_main+0x26>

int main(int argc, char **argv)
{
    unsigned long i = 0;
    while (1) {
        if (i > 0x1fffffffUL) {
  26:   31 c0                   xor    %eax,%eax
  28:   eb e6                   jmp    10 <_main+0x10>
            printf("hello\n");
  2a:   90                      nop
  2b:   90                      nop
  2c:   90                      nop
  2d:   90                      nop
  2e:   90                      nop
  2f:   90                      nop

In program 1, the increase of i is optimized out, but it's not in program 2. Why this happens? What rules is gcc using when optimizing with -O2 for these two programs?

2条回答
【Aperson】
2楼-- · 2020-07-14 06:23

Asking "why" about optimizers is usually a waste of time, because there are no "rules" by which optimizers operate -- other than "as if": The optimizer may not change the observable behaviour of conforming code.

The "observable behaviour" of both your programs is to print "hello" repeatedly.

In your first program, the counting is optimized away, making the observable behaviour happen faster. That is the job of an optimizer. Be happy your code is more efficient now!

In your second program, the counting is not optimized away, because somehow the optimizer -- in this version of this compiler with this setting -- did not see that it could do without it. Why? Who knows (other than the maintainer of the compiler's optimizer module)?

If your desired behaviour is to have a delay between outputs, use something like thrd_sleep(). Empty count loops were a way to delay BASIC 2.0 programs on the C64, but they should not be used in C, for the exact reason you just observed: You never know what the optimizer does.

查看更多
ら.Afraid
3楼-- · 2020-07-14 06:40

The branching in the if statement now depends on something that happened in the previous iteration of the loop. In particular, the compiler can easily determine in program 1 that i is incremented in every iteration of the while loop (as it is right at the top), while this is not the case in program 2.

Anyway, compiler optimizations are very complicated. See below:

gcc -O2 is a shortcut for these flags: (from the documentation)

      -fauto-inc-dec 
      -fbranch-count-reg 
      -fcombine-stack-adjustments 
      -fcompare-elim 
      -fcprop-registers 
      -fdce 
      -fdefer-pop 
      -fdelayed-branch 
      -fdse 
      -fforward-propagate 
      -fguess-branch-probability 
      -fif-conversion2 
      -fif-conversion 
      -finline-functions-called-once 
      -fipa-pure-const 
      -fipa-profile 
      -fipa-reference 
      -fmerge-constants 
      -fmove-loop-invariants 
      -freorder-blocks 
      -fshrink-wrap 
      -fsplit-wide-types 
      -fssa-backprop 
      -fssa-phiopt 
      -ftree-bit-ccp 
      -ftree-ccp 
      -ftree-ch 
      -ftree-coalesce-vars 
      -ftree-copy-prop 
      -ftree-dce 
      -ftree-dominator-opts 
      -ftree-dse 
      -ftree-forwprop 
      -ftree-fre 
      -ftree-phiprop 
      -ftree-sink 
      -ftree-slsr 
      -ftree-sra 
      -ftree-pta 
      -ftree-ter 
      -funit-at-a-time
      -fthread-jumps 
      -falign-functions  -falign-jumps 
      -falign-loops  -falign-labels 
      -fcaller-saves 
      -fcrossjumping 
      -fcse-follow-jumps  -fcse-skip-blocks 
      -fdelete-null-pointer-checks 
      -fdevirtualize -fdevirtualize-speculatively 
      -fexpensive-optimizations 
      -fgcse  -fgcse-lm  
      -fhoist-adjacent-loads 
      -finline-small-functions 
      -findirect-inlining 
      -fipa-cp 
      -fipa-cp-alignment 
      -fipa-sra 
      -fipa-icf 
      -fisolate-erroneous-paths-dereference 
      -flra-remat 
      -foptimize-sibling-calls 
      -foptimize-strlen 
      -fpartial-inlining 
      -fpeephole2 
      -freorder-blocks-algorithm=stc 
      -freorder-blocks-and-partition -freorder-functions 
      -frerun-cse-after-loop  
      -fsched-interblock  -fsched-spec 
      -fschedule-insns  -fschedule-insns2 
      -fstrict-aliasing -fstrict-overflow 
      -ftree-builtin-call-dce 
      -ftree-switch-conversion -ftree-tail-merge 
      -ftree-pre 
      -ftree-vrp 
      -fipa-ra

Each of these flags corresponds to a different possible optimization the compiler is allowed to make.

查看更多
登录 后发表回答