Having had a look at previous questions 1, 2 , I was wondering if I can force the compiler to perform a constant folding for the following code which prints prime numbers.
#include <iostream>
using namespace std;
inline bool is_prime(int n)
{
if(n<2)
return false;
for(int i=2;i*i<=n;i++)
if(n%i==0)
return false;
return true;
}
int main()
{
for(int i=0;i<20;i++)
if(is_prime(i))
cout<<i<<endl;
return 0;
}
And I build it via:
g++ -O3 -S main.cpp -o main.asm
As the result are a few:
2,3,5,7,11,13,17,19
I would like to force compiler look at the code similar to
for(int x:{2,3,5,7,11,13,17,19})
cout<<x<<endl;
or
cout<< 2 <<endl;
cout<< 3 <<endl;
cout<< 5 <<endl;
cout<< 7 <<endl;
cout<< 11 <<endl;
cout<< 13 <<endl;
cout<< 17 <<endl;
cout<< 19 <<endl;
But reading the assembly shows that none happens.
I even used __builtin_expect
but it didn't work.
Is there any way to force the compiler optimizer to read the for loop and use the advantage that the output data is known?
I would like to do it without using template meta-programming.
PS. My real aim is just testing the compiler rather than an efficient method to calculate prime numbers. I just want to show off to my friends how much C++ compiler is powerful.
If separation of is_prime
is matter of concern, I put everything inside the main and no difference was observed:
#include <iostream>
using namespace std;
int main()
{
for(int n=2;n<20;n++)
{
bool is_prime=true;
for(int i=2;i*i<=n;i++)
if(n%i==0)
{
is_prime=false;
break;
}
if(is_prime)
cout<<n<<endl;
}
return 0;
}
There is even a further example that remains no excuse for the compiler:
#include <iostream>
#include <vector>
using namespace std;
int prime_after6000()
{
int n=6000;
do
{
bool is_prime=true;
for(int i=2;i*i<=n;i++)
if(n%i==0)
{
is_prime=false;
break;
}
if(is_prime)
return n;
n++;
}while(true);
}
int main()
{
cout<<prime_after6000()<<endl;
return 0;
}
assembly:
...
main:
.LFB1907:
.cfi_startproc
subq $8, %rsp
.cfi_def_cfa_offset 16
movl $6000, %esi ;;;;;;;;;;;;;;;;;;;; bad
.L18:
testb $1, %sil
je .L15
movl $2, %ecx
jmp .L16
.p2align 4,,10
.p2align 3
.L17:
movl %esi, %eax
cltd
idivl %ecx
testl %edx, %edx
je .L15
.L16:
addl $1, %ecx
movl %ecx, %eax
imull %ecx, %eax
cmpl %esi, %eax
jle .L17
movl $_ZSt4cout, %edi
call _ZNSolsEi
movq %rax, %rdi
call _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_
xorl %eax, %eax
addq $8, %rsp
.cfi_remember_state
.cfi_def_cfa_offset 8
ret
.p2align 4,,10
.p2align 3
.L15:
.cfi_restore_state
addl $1, %esi
jmp .L18
.cfi_endproc
.LFE1907:
.size main, .-main
.p2align 4,,15
.type _GLOBAL__sub_I__Z15prime_after6000v, @function
_GLOBAL__sub_I__Z15prime_after6000v:
...
There's a fundamental misunderstanding of compilers here. Let's examine the program you've written very carefully and think about what you're expecting from the compiler to do for you.
The main characteristic of the program is that it doesn't take any input, but it emits output by writing to
cout
. Keep in mind that theis_prime
function is not a compiler intrinsic; the compiler treats it as just another function. This is important and I'll come back to it later.Now how would the compiler transform the program the way you described? How can it do something like that? That is, how can the compiler transform those two nested loops to a simple sequence of statements that write integers to
cout
? The only way it can possibly do that is by executing the program to figure out all the values that needs to be written tocout
.That doesn't make any sense, does it? Let's look at the big picture here and consider all programs (or sequences of statements) that have the same characteristic; those that do not take any input but emit output. The question would become: Why doesn't the compiler execute the source code and just emit code that writes the output values? Because of the following reasons:
cout
. The standard output stream can be redirected to something weird. So programs that don't take any input aren't necessarily easier for the compiler to optimize.That said, simple pieces of code that can be evaluated in a very limited amount of time are indeed evaluated by the compiler. This optimization is called constant folding. Pieces of code that don't have any impact on the state of the program can be eliminated without executing them. For example, if you removed
cout<<i<<endl;
, the compiler will just optimize away the rest of the code. This is called dead code elimination. Compilers do these optimizations because they can be done by the compiler in a systematic way and because they are very effective on real codebases.But what happens if the
is_prime
function was a compiler intrinsic? In this case, the compiler would probably have a built-in table of commonly used prime numbers and a very fast implementation of primality testing. You can then expect from the compiler to unroll theloop
in the main function several times, maybe even fully, containing only output statements, essentially performing the transformation you're looking for.