This is related to "How to force const propagation through an inline function?". Clang has an integrated assembler and does not use the system's assembler (which is often GNU AS (GAS)). Compilers other than Clang performed the math early, and everything "just worked".
I say "early" because @n.m. objected to describing it as "math performed by the preprocessor." But the idea is that the value is known at compile time and should be evaluated early, like when the preprocessor evaluates a #if (X % 32 == 0).
Below, Clang 3.6 is complaining about violating a constraint. It appears the constant is not being propagated throughout:
$ export CXX=/usr/local/bin/clang++
$ $CXX --version
clang version 3.6.0 (tags/RELEASE_360/final)
Target: x86_64-apple-darwin12.6.0
...
$ make
/usr/local/bin/clang++ -DNDEBUG -g2 -O3 -Wall -fPIC -arch i386 -arch x86_64 -pipe -Wno-tautological-compare -c integer.cpp
In file included from integer.cpp:8:
In file included from ./integer.h:7:
In file included from ./secblock.h:7:
./misc.h:941:44: error: constraint 'I' expects an integer constant expression
__asm__ ("rolb %1, %0" : "+mq" (x) : "I" ((unsigned char)(y%8)));
^~~~~~~~~~~~~~~~~~~~
./misc.h:951:44: error: constraint 'I' expects an integer constant expression
...
The functions above are inlined template specializations:
template<> inline byte rotrFixed<byte>(byte x, unsigned int y)
{
// The I constraint ensures we use the immediate-8 variant of the
// shift amount y. However, y must be in [0, 31] inclusive. We
// rely on the preprocessor to propagate the constant and perform
// the modular reduction so the assembler generates the instruction.
__asm__ ("rorb %1, %0" : "+mq" (x) : "I" ((unsigned char)(y%8)));
return x;
}
They are being invoked with a const value, so the rotate amount is known at compile time. A typical caller might look like:
unsigned int x1 = rotrFixed<byte>(1, 4);
unsigned int x2 = rotrFixed<byte>(1, 32);
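One possible direction, offered as a sketch rather than a confirmed fix: if callers can supply the rotate amount as a non-type template argument, the operand handed to the "I" constraint is an integer constant expression in the front end and does not depend on the optimizer propagating a function argument. The names rotrConstantSketch, rotrConstantSketchDemo and byte_t below are hypothetical and are not part of the library. The portable fallback branch is only there so the sketch compiles on non-x86 targets.

```cpp
#include <cassert>

typedef unsigned char byte_t;  // hypothetical stand-in for the library's byte type

// Hypothetical variant of rotrFixed<byte>: the rotate amount R is a template
// argument, so (R % 8) is a constant expression before inlining happens.
template <unsigned int R>
inline byte_t rotrConstantSketch(byte_t x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    // Same instruction and constraints as the original, but the operand no
    // longer depends on a run-time function parameter.
    __asm__ ("rorb %1, %0" : "+mq" (x) : "I" ((unsigned char)(R % 8)));
    return x;
#else
    // Portable fallback for other targets; both shift counts stay in [0, 7].
    return static_cast<byte_t>((x >> (R % 8)) | (x << ((8 - R % 8) % 8)));
#endif
}

// Mirrors the two calls shown above.
inline void rotrConstantSketchDemo()
{
    assert(rotrConstantSketch<4>(1) == 16);   // 0x01 rotated right by 4 is 0x10
    assert(rotrConstantSketch<32>(1) == 1);   // 32 % 8 == 0, so x is unchanged
}
```

The trade-off is that every call site changes from rotrFixed<byte>(x, 4) to something like rotrConstantSketch<4>(x), which only works when the amount really is a compile-time constant.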
None of these [questionable] tricks would be required if GCC or Clang provided an intrinsic to perform the rotate in near constant time. I'd even settle for "perform the rotate" since they don't even have that.
What is the trick needed to get Clang to resume performing the preprocessing of the const value?
Astute readers will recognize that rotrFixed<byte>(1, 32) could be undefined behavior if using a traditional C/C++ rotate. So we drop into assembly to avoid the C/C++ limitations and enjoy the 1 instruction speedup.
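For comparison, a traditional C/C++ rotate can be kept free of undefined behavior by masking both shift counts. The following is a minimal sketch, assuming an 8-bit operand; rotr8_portable and rotr8_portable_demo are hypothetical names, not functions from the library. Whether a given compiler collapses this into a single rotate instruction is not guaranteed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: rotate an 8-bit value right by y bits.
// Masking keeps both shift counts in [0, 7], so y == 0 or y >= 8 is well defined.
constexpr std::uint8_t rotr8_portable(std::uint8_t x, unsigned int y)
{
    return static_cast<std::uint8_t>(
        (x >> (y & 7u)) | (x << ((8u - (y & 7u)) & 7u)));
}

// Mirrors the calls in the question.
inline void rotr8_portable_demo()
{
    assert(rotr8_portable(1, 4) == 16);   // 0x01 rotated right by 4 is 0x10
    assert(rotr8_portable(1, 32) == 1);   // 32 is reduced to 0, x is unchanged
}
```

This avoids the assembler entirely, at the cost of relying on the optimizer to recognize the rotate idiom.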
Curious readers may wonder why we would do this. The cryptographers call out the specs, and sometimes those specs are not sympathetic to the underlying hardware or standards bodies. Rather than changing the cryptographer's specification, we attempt to provide it verbatim to make audits easier.
A bug has been opened for this issue: LLVM Bug 24226 - Constant not propagated into inline assembly, results in "constraint 'I' expects an integer constant expression".
I don't know what guarantees Clang makes, but I know the compiler and integrated assembler claim to be compatible with GCC and GNU's assembler. And GCC and GAS provide the propagation of the constant value.