shifting a negative signed value is undefined

2019-09-14 22:06发布

问题:

I am starring at the original JPEG standard (ITU 81), in particular the figure Figure F.12: extending the sign bit of a decoded value in V:

For reference the SLL terms means: shift left logical operation (see page 15 of the PDF)

Now the famous libjpeg implementation decided to implement it this way (quite direct transcription):

/*
 * Figure F.12: extend sign bit.
 * On some machines, a shift and add will be faster than a table lookup.
 */

#ifdef AVOID_TABLES

#define HUFF_EXTEND(x,s)  ((x) < (1<<((s)-1)) ? (x) + (((-1)<<(s)) + 1) : (x))

#else

#define HUFF_EXTEND(x,s)  ((x) < extend_test[s] ? (x) + extend_offset[s] : (x))

static const int extend_test[16] =   /* entry n is 2**(n-1) */
  { 0, 0x0001, 0x0002, 0x0004, 0x0008, 0x0010, 0x0020, 0x0040, 0x0080,
    0x0100, 0x0200, 0x0400, 0x0800, 0x1000, 0x2000, 0x4000 };

static const int extend_offset[16] = /* entry n is (-1 << n) + 1 */
  { 0, ((-1)<<1) + 1, ((-1)<<2) + 1, ((-1)<<3) + 1, ((-1)<<4) + 1,
    ((-1)<<5) + 1, ((-1)<<6) + 1, ((-1)<<7) + 1, ((-1)<<8) + 1,
    ((-1)<<9) + 1, ((-1)<<10) + 1, ((-1)<<11) + 1, ((-1)<<12) + 1,
    ((-1)<<13) + 1, ((-1)<<14) + 1, ((-1)<<15) + 1 };

#endif /* AVOID_TABLES */

Quite obviously shifting a negative signed value is UB, so I am wondering what the original author of the JPEG standard actually meant here. Is the JPEG standard limited to Two's complement number representation ?

回答1:

Is the JPEG standard limited to Two's complement number representation ?

With the chart using SLL rather than *2, the flow chart is relying of 2's complement implementation.

C, OTOH, is not limited to 2's complement nor using shifts in a certain "2's complement" way. Better to code without that assumption. Using unsigned types is a good first step.

// ((-1)<<15) + 1
((-1u)<<15) + 1

Need to see applications of HUFF_EXTEND() for a deeper answer.



回答2:

Shifting negative values is only undefined behavior in the C language. On assembler level, such shifts are perfectly fine. A logical/arithmetic shift left instruction will move the MSB into the carry bit and that's it. The CPU will not halt and catch fire in undefined ways.

That being said, you can dodge UB in C by converting your signed number to unsigned. Then you will merely have implementation-defined behavior. For example, (int)(-1u<<1) does actually not invoke UB, even though such code looks quite questionable.

The safe approach is to perform all calculations on unsigned numbers (uint32_t) and convert to signed only when it is actually needed. Never perform any form of bitwise operations on signed types.

I would completely ignore compatibility with wildly exotic/fictional systems that do not use two's complement. Focus on compatibility with real-world mainstream computers.



回答3:

Quite obviously, shifting a negative quantity has the same effect as multiply/divide by two raised to the power n (for a << n/a >> n where a is signed) This is never undefined behaviour for a negative left operand, and is well and preciselly defined for C/C++.

Well, despite of the fact that the accepted answer has already been selected and the amount of confussion about the << and >> operators this question has generated, I'll try to clarify what's undefined and what is not.

  • << operator behaves equally on signed and unsigned left operand quantities, as the net effect on an integer is to numerically double it, and the process is the same for signed and unsigned quantities, just insert a 0 on the right part of the number, left shifting all the number bits the number indicated as the right operator.

  • >> operator behaves differently on signed and unsigned left operands, as the net effect is, again, like integer division by two raised to the right operand, this means, for a two's complement representation to extend the most signifiant bit (0 on the left becomes 00 and 1 on the left becomes 11) to achieve the net effect of a division by two raised to the power of the right operand (-24 >> 3 becomes -3, as dividing by 2^3, and 24 >> 3 becomes 3) When using unsigned integers, this behaviour changes (0 becomes 00 and 1 becomes 01) making again the shifting like a division by two raised to the power of the right operator.

As the standards says, behaviour is undefined if you try to left/right shift a negative amount of bits (right operator must be > 0) or an amount greater or equal than the left operator's size in bits. But this only affects the right operand, and not the left one.

On trying to get an explanation about JPEG decoding, just try to think that JPEG decoding uses COSIN transformation (which is defined for real vectors) on discrete real approximations encoded as signed quantities and or that, it uses signed integers to evaluate low precision samples, so they never deal with unsigned quantities (the library designers assume that operating on integers is faster than operating on floating point quantities).

NOTE

  • -24 is binary 1111111...1111101000, and right shifting by 3 makes it 11111...11111101, or -3. If we shift it again, it becomes 111...1110 or -2 (-1.5 rounded to minus infinity), and again 1111...1111 or -1.
  • 24 is 00000000....00011000 and right shifting it becomes 000000...0000011 which is 3. If we shift it again, it becomes 000...0001 or 1 (1.5 rounded to minus infinity), and again we get 000...0000 or 0 (0.5 rounded to minus infinity).

(Beware that negative shifts made with this approach truncate up to minus infinity, making -1 >> 1 to become -1, as -0.5 truncates to -1 and not to 0.)



标签: c jpeg libjpeg