C++ constant folding a loop for prime numbers

2019-03-26 03:17发布

Having had a look at previous questions 1, 2 , I was wondering if I can force the compiler to perform a constant folding for the following code which prints prime numbers.

#include <iostream>

using namespace std;

inline bool is_prime(int n)
{
    if(n<2)
        return false;
    for(int i=2;i*i<=n;i++)
        if(n%i==0)
            return false;
    return true;
}

int main()
{
    for(int i=0;i<20;i++)
        if(is_prime(i))
            cout<<i<<endl;
    return 0;
}

And I build it via:

g++ -O3 -S main.cpp -o main.asm

As the result are a few:

2,3,5,7,11,13,17,19

I would like to force compiler look at the code similar to

for(int x:{2,3,5,7,11,13,17,19})
    cout<<x<<endl;

or

cout<<  2 <<endl;
cout<<  3 <<endl;
cout<<  5 <<endl;
cout<<  7 <<endl;
cout<< 11 <<endl;
cout<< 13 <<endl;
cout<< 17 <<endl;
cout<< 19 <<endl;

But reading the assembly shows that none happens.

I even used __builtin_expect but it didn't work.

Is there any way to force the compiler optimizer to read the for loop and use the advantage that the output data is known?

I would like to do it without using template meta-programming.

PS. My real aim is just testing the compiler rather than an efficient method to calculate prime numbers. I just want to show off to my friends how much C++ compiler is powerful.


If separation of is_prime is matter of concern, I put everything inside the main and no difference was observed:

#include <iostream>

using namespace std;

int main()
{
    for(int n=2;n<20;n++)
    {
        bool is_prime=true;
        for(int i=2;i*i<=n;i++)
            if(n%i==0)
            {
                is_prime=false;
                break;
            }
        if(is_prime)
            cout<<n<<endl;
    }
    return 0;
}

There is even a further example that remains no excuse for the compiler:

#include <iostream>
#include <vector>

using namespace std;

int prime_after6000()
{
    int n=6000;
    do
    {
        bool is_prime=true;
        for(int i=2;i*i<=n;i++)
            if(n%i==0)
            {
                is_prime=false;
                break;
            }
        if(is_prime)
            return n;
        n++;
    }while(true);
}

int main()
{
    cout<<prime_after6000()<<endl;
    return 0;
}

assembly:

...
main:
.LFB1907:
    .cfi_startproc
    subq    $8, %rsp
    .cfi_def_cfa_offset 16
    movl    $6000, %esi   ;;;;;;;;;;;;;;;;;;;; bad
.L18:
    testb   $1, %sil
    je  .L15
    movl    $2, %ecx
    jmp .L16
    .p2align 4,,10
    .p2align 3
.L17:
    movl    %esi, %eax
    cltd
    idivl   %ecx
    testl   %edx, %edx
    je  .L15
.L16:
    addl    $1, %ecx
    movl    %ecx, %eax
    imull   %ecx, %eax
    cmpl    %esi, %eax
    jle .L17
    movl    $_ZSt4cout, %edi
    call    _ZNSolsEi
    movq    %rax, %rdi
    call    _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_
    xorl    %eax, %eax
    addq    $8, %rsp
    .cfi_remember_state
    .cfi_def_cfa_offset 8
    ret
    .p2align 4,,10
    .p2align 3
.L15:
    .cfi_restore_state
    addl    $1, %esi
    jmp .L18
    .cfi_endproc
.LFE1907:
    .size   main, .-main
    .p2align 4,,15
    .type   _GLOBAL__sub_I__Z15prime_after6000v, @function
_GLOBAL__sub_I__Z15prime_after6000v:
...

1条回答
三岁会撩人
2楼-- · 2019-03-26 03:45

There's a fundamental misunderstanding of compilers here. Let's examine the program you've written very carefully and think about what you're expecting from the compiler to do for you.

The main characteristic of the program is that it doesn't take any input, but it emits output by writing to cout. Keep in mind that the is_prime function is not a compiler intrinsic; the compiler treats it as just another function. This is important and I'll come back to it later.

Now how would the compiler transform the program the way you described? How can it do something like that? That is, how can the compiler transform those two nested loops to a simple sequence of statements that write integers to cout? The only way it can possibly do that is by executing the program to figure out all the values that needs to be written to cout.

That doesn't make any sense, does it? Let's look at the big picture here and consider all programs (or sequences of statements) that have the same characteristic; those that do not take any input but emit output. The question would become: Why doesn't the compiler execute the source code and just emit code that writes the output values? Because of the following reasons:

  • What if the program takes too much time to execute? What if there was a bug in it that makes it run for an unexpected amount of time? There is even no guarantee that the program will ever halt. What's the compiler supposed to do exactly?
  • Ultimately, the purpose of the compiler is not to execute the source code, but to emit a functionally equivalent native code that is potentially optimized. After all, if the program does not take any input (like your code), you could just, as easily, compile the code and run it once to see the output. The code has to be executed anyway, by the compiler or by running the executable binary, and it will take the same amount of time either way. In both cases, the code has to be compiled and executed. So such optimization adds no real value. However, this is in contrast to templates, which are intended to be reduced to regular code by the compiler at compile-time. In addition, interpretation would be a better execution model for such programs. You don't want to bother with compiling code? Go ahead and use a Python interpreter or whatever interpreter.
  • Implementing such optimization can be difficult or impossible in general. What if the mechanism used to emit output had side-effects that can change future output values? The compiler does not exactly know what happens when you write to cout. The standard output stream can be redirected to something weird. So programs that don't take any input aren't necessarily easier for the compiler to optimize.

That said, simple pieces of code that can be evaluated in a very limited amount of time are indeed evaluated by the compiler. This optimization is called constant folding. Pieces of code that don't have any impact on the state of the program can be eliminated without executing them. For example, if you removed cout<<i<<endl;, the compiler will just optimize away the rest of the code. This is called dead code elimination. Compilers do these optimizations because they can be done by the compiler in a systematic way and because they are very effective on real codebases.

But what happens if the is_prime function was a compiler intrinsic? In this case, the compiler would probably have a built-in table of commonly used prime numbers and a very fast implementation of primality testing. You can then expect from the compiler to unroll the loop in the main function several times, maybe even fully, containing only output statements, essentially performing the transformation you're looking for.

查看更多
登录 后发表回答