This question is motivated by me implementing cryptographic algorithms (e.g. SHA-1) in C/C++, writing portable platform-agnostic code, and thoroughly avoiding undefined behavior.
Suppose that a standardized crypto algorithm asks you to implement this:
b = (a << 31) & 0xFFFFFFFF
where a
and b
are unsigned 32-bit integers. Notice that in the result, we discard any bits above the least significant 32 bits.
As a first naive approximation, we might assume that int
is 32 bits wide on most platforms, so we would write:
unsigned int a = (...);
unsigned int b = a << 31;
We know this code won't work everywhere because int
is 16 bits wide on some systems, 64 bits on others, and possibly even 36 bits. But using stdint.h
, we can improve this code with the uint32_t
type:
uint32_t a = (...);
uint32_t b = a << 31;
So we are done, right? That's what I thought for years. ... Not quite. Suppose that on a certain platform, we have:
// stdint.h
typedef unsigned short uint32_t;
The rule for performing arithmetic operations in C/C++ is that if the type (such as short
) is narrower than int
, then it gets widened to int
if all values can fit, or unsigned int
otherwise.
Let's say that the compiler defines short
as 32 bits (signed) and int
as 48 bits (signed). Then these lines of code:
uint32_t a = (...);
uint32_t b = a << 31;
will effectively mean:
unsigned short a = (...);
unsigned short b = (unsigned short)((int)a << 31);
Note that a
is promoted to int
because all of ushort
(i.e. uint32
) fits into int
(i.e. int48
).
But now we have a problem: shifting non-zero bits left into the sign bit of a signed integer type is undefined behavior. This problem happened because our uint32
was promoted to int48
- instead of being promoted to uint48
(where left-shifting would be okay).
Here are my questions:
Is my reasoning correct, and is this a legitimate problem in theory?
Is this problem safe to ignore because on every platform the next integer type is double the width?
Is a good idea to correctly defend against this pathological situation by pre-masking the input like this?:
b = (a & 1) << 31;
. (This will necessarily be correct on every platform. But this could make a speed-critical crypto algorithm slower than necessary.)
Clarifications/edits:
I'll accept answers for C or C++ or both. I want to know the answer for at least one of the languages.
The pre-masking logic may hurt bit rotation. For example, GCC will compile
b = (a << 31) | (a >> 1);
to a 32-bit bit-rotation instruction in assembly language. But if we pre-mask the left shift, it is possible that the new logic is not translated into bit rotation, which means now 4 operations are performed instead of 1.
Q1: Masking before the shift does prevent undefined behavior that OP has concern.
Q2: "... because on every platform the next integer type is double the width?" --> no. The "next" integer type could be less than 2x or even the same size.
The following is well defined for all compliant C compilers that have
uint32_t
.Q3:
uint32_t a; uint32_t b = (a & 1) << 31;
is not expected to incur code that performs a mask - it is not needed in the executable - just in the source. If a mask does occur, get a better compiler should speed be an issue.As suggested, better to emphasize the unsigned-ness with these shifts.
@John Bollinger good answer well details how to handle OP's specific problem.
The general problem is how to form a number that is of at least
n
bits, a certain sign-ness and not subject to surprising integer promotions - the core of OP's dilemma. The below fulfills this by invoking anunsigned
operation that does not change the value - effective a no-op other than type concerns. The product will be at least the width ofunsigned
oruint32_t
. Casting, in general, may narrow the type. Casting needs to be avoided unless narrowing is certain to not occur. An optimization compiler will not create unnecessary code.Speaking to the C side of the problem,
It is a problem that I had not considered before, but I agree with your analysis. C defines the behavior of the
<<
operator in terms of the type of the promoted left operand, and it it conceivable that the integer promotions result in that being (signed)int
when the original type of that operand isuint32_t
. I don't expect to see that in practice on any modern machine, but I'm all for programming to the actual standard as opposed to my personal expectations.C does not require such a relationship between integer types, though it is ubiquitous in practice. If you are determined to rely only on the standard, however -- that is, if you are taking pains to write strictly conforming code -- then you cannot rely on such a relationship.
The type
unsigned long
is guaranteed to have at least 32 value bits, and it is not subject to promotion to any other type under the integer promotions. On many common platforms it has exactly the same representation asuint32_t
, and may even be the same type. Thus, I would be inclined to write the expression like this:Or if you need
a
only as an intermediate value in the computation ofb
, then declare it as anunsigned long
to begin with.Taking a clue from this question about possible UB in
uint32 * uint32
arithmetic, the following simple approach should work in C and C++:The integer constant
0u
has typeunsigned int
. This promotes the additiona + 0u
touint32_t
orunsigned int
, whichever is wider. Because the type has rankint
or higher, no more promotion occurs, and the shift can be applied with the left operand beinguint32_t
orunsigned int
.The final cast back to
uint32_t
will just suppress potential warnings about a narrowing conversion (say ifint
is 64 bits).A decent C compiler should be able to see that adding zero is a no-op, which is less onerous than seeing that a pre-mask has no effect after an unsigned shift.
To avoid unwanted promotion, you may use the greater type with some typedef, as
For this segment of code:
To promote
a
to a unsigned type instead of signed type, use:When both sides of
<<
operator is an unsigned type, then this line in 6.3.1.8 (C standard draft n1570) applies:The problem you are describing is caused you use
31
which issigned int type
so another line in 6.3.1.8forces
a
to promoted to a signed typeUpdate:
This answer is not correct because 6.3.1.1(2) (emphasis mine):
and footnote 58 (emphasis mine):
Since only integer promotion is happening and not common arithmetic conversion, using
31u
does not guaranteea
to be converted tounsigned int
as stated above.