可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I have 3 unsigned bytes that are coming over the wire separately.

[byte1, byte2, byte3]

I need to convert these to a signed 32-bit value but I am not quite sure how to handle the sign of the negative values.

I thought of copying the bytes to the upper 3 bytes in the int32 and then shifting everything to the right but I read this may have unexpected behavior.

Is there an easier way to handle this?

The representation is using two's complement.

回答1:

You could use:

uint32_t sign_extend_24_32(uint32_t x) {
    const int bits = 24;
    uint32_t m = 1u << (bits - 1);
    return (x ^ m) - m;
}

This works because:

if the old sign was 1, then the XOR makes it zero and the subtraction will set it and borrow through all higher bits, setting them as well.
if the old sign was 0, the XOR will set it, the subtract resets it again and doesn't borrow so the upper bits stay 0.

Templated version

template<class T>
T sign_extend(T x, const int bits) {
    T m = 1;
    m <<= bits - 1;
    return (x ^ m) - m;
}

回答2:

Assuming both representations are two's complement, simply

upper_byte = (Signed_byte(incoming_msb) >= 0? 0 : Byte(-1));

where

using Signed_byte = signed char;
using Byte = unsigned char;

and upper_byte is a variable representing the missing fourth byte.

The conversion to Signed_byte is formally implementation-dependent, but a two's complement implementation doesn't have a choice, really.

回答3:

This is a pretty old question, but I recently had to do the same (while dealing with 24-bit audio samples), and wrote my own solution for it. It's using a similar principle as this answer, but more generic, and potentially generates better code after compiling.

template <size_t Bits, typename T>
inline constexpr T sign_extend(const T& v) noexcept {
    static_assert(std::is_integral<T>::value, "T is not integral");
    static_assert((sizeof(T) * 8u) >= Bits, "T is smaller than the specified width");
    if constexpr ((sizeof(T) * 8u) == Bits) return v;
    else {
        using S = struct { signed Val : Bits; };
        return reinterpret_cast<const S*>(&v)->Val;
    }
}

This has no hard-coded math, it simply lets the compiler do the work and figure out the best way to sign-extend the number. With certain widths, this can even generate a native sign-extension instruction in the assembly, such as MOVSX on x86.

This function assumes you copied your N-bit number into the lower N bits of the type you want to extend it to. So for example:

int16_t a = -42;
int32_t b{};
memcpy(&b, &a, sizeof(a));
b = sign_extend<16>(b);

Of course it works for any number of bits, extending it to the full width of the type that contained the data.

回答4:

You could let the compiler process itself the sign extension. Assuming that the lowest significant byte is byte1 and the high significant byte is byte3;

int val = (signed char) byte3;                // C guarantees the sign extension
val << 16;                                    // shift the byte at its definitive place
val |= ((int) (unsigned char) byte2) << 8;    // place the second byte
val |= ((int) (unsigned char) byte1;          // and the least significant one

I have used C style cast here when static_cast would have been more C++ish, but as an old dinosaur (and Java programmer) I find C style cast more readable for integer conversions.

回答5:

Here's a method that works for any bit count, even if it's not a multiple of 8. This assumes you've already assembled the 3 bytes into an integer value.

const int bits = 24;
int mask = (1 << bits) - 1;
bool is_negative = (value & ~(mask >> 1)) != 0;
value |= -is_negative & ~mask;

回答6:

You can use a bitfield

template<size_t L>
inline int32_t sign_extend_to_32(const char *x)
{
  struct {int32_t i: L;} s;
  memcpy(&s, x, 3);
  return s.i;
  // or
  return s.i = (x[2] << 16) | (x[1] << 8) | x[0]; // assume little endian
}

Easy and no undefined behavior invoked

int32_t r = sign_extend_to_32<24>(your_3byte_array);

Of course copying the bytes to the upper 3 bytes in the int32 and then shifting everything to the right as you thought is also a good idea. There's no undefined behavior if you use memcpy like above. An alternative is reinterpret_cast in C++ and union in C, which can avoid the use of memcpy. However there's an implementation defined behavior because right shift is not always a sign-extension shift (although almost all modern compilers do that)