In the Linux kernel (version 2.6.2) there is a macro used to test a bit:
#define test_bit(nr, addr) \
(__builtin_constant_p((nr)) \
? constant_test_bit((nr), (addr)) \
: variable_test_bit((nr), (addr)))
where constant_test_bit
and variable_test_bit
are defined as:
static inline int constant_test_bit(int nr, const volatile unsigned long *addr)
{
return ((1UL << (nr & 31)) & (addr[nr >> 5])) != 0;
}
static __inline__ int variable_test_bit(int nr, const volatile unsigned long *addr)
{
    int oldbit;
    __asm__ __volatile__(
        "btl %2,%1\n\tsbbl %0,%0"
        :"=r" (oldbit)
        :"m" (ADDR),"Ir" (nr));
    return oldbit;
}
(ADDR is another macro from the same header, expanding to *(volatile long *) addr.)
I understand that __builtin_constant_p
is used to detect whether an expression is a compile-time constant. My question is: is there any performance difference between these two functions depending on whether the argument is a compile-time constant or not? Why use the C version when it is, and the assembly version when it isn't?
UPDATE: The following main function is used to test the performance:
constant, call constant_test_bit:
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    unsigned long i, j = 21;
    unsigned long cnt = 0;
    srand(111);
    //j = rand() % 31;
    for (i = 1; i < (1 << 30); i++) {
        j = (j + 1) % 28;
        if (constant_test_bit(j, &i))
            cnt++;
    }
    if (__builtin_constant_p(j))
        printf("j is a compile time constant\n");
    return 0;
}
This correctly prints the sentence "j is a compile time constant".
For the other situations, just uncomment the line that assigns a "random" number to j
and change the function name accordingly. When that line is uncommented the output is empty, which is expected.
I compile with gcc test.c -O1
and here are the results:
constant, constant_test_bit:
$ time ./a.out
j is a compile time constant
real 0m0.454s
user 0m0.450s
sys 0m0.000s
constant, variable_test_bit (omitting the time ./a.out
line, likewise below):
j is a compile time constant
real 0m0.885s
user 0m0.883s
sys 0m0.000s
variable, constant_test_bit:
real 0m0.485s
user 0m0.477s
sys 0m0.007s
variable, variable_test_bit:
real 0m3.471s
user 0m3.467s
sys 0m0.000s
I ran each version several times, and the above results are typical values. It seems the constant_test_bit
function is always faster than the variable_test_bit
function, no matter whether the parameter is a compile-time constant or not... For the last two results (when j
is not constant) the variable version is even dramatically slower than the constant one.
Am I missing something here?