Platform independent storage of signed integers

Posted 2020-07-25 09:04

Question:

I want to write signed integer values into a file in a platform independent way.

If they were unsigned, I would just convert them from host byte order to LE (or BE) with the endian(3) family of functions.

I'm not sure how to deal with signed integers though. If I cast them to unsigned values, I lose the sign, since the C standard does not guarantee that

(int) ((unsigned) -1) == -1

The other option would be to cast a pointer to the value (i.e., reinterpret the byte sequence as unsigned), but I'm not convinced that converting endianness after that is going to give anything sensible.

What is the proper way for platform independent signed integer storage?

Update:

  • I know that in practice, almost all architectures use two's complement representation, so that I can losslessly convert between signed and unsigned integers. However, this question is meant to be more theoretical.

  • Just rolling my own integer representation (be that storing the decimal digits as ASCII characters, or separately storing the sign bit) is of course a solution. However, I'm interested in whether there is a way that works without completely abandoning the native binary representation.

Answer 1:

A platform-independent way? If you truly want this, you should consider writing it as text rather than binary (bearing in mind that even that is not fully platform-independent, since you may want to move it from an ASCII to an EBCDIC platform).

It all depends on how platform-independent you need it to be. C (at least before C23, which mandates two's complement) allows for three different signed encodings: two's complement, one's complement and sign/magnitude. But, by far, most machines will use the first one.

Work out first what you actually mean by that term. If you mean you only want to handle two's complement, then casting it to an unsigned is fine.
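
A minimal sketch of that cast-based approach, assuming the fixed-width types from <stdint.h>; the name put_i32_le is hypothetical. Converting a signed value to an unsigned type is defined by the C standard as reduction modulo 2^N, so the cast yields the two's complement bit pattern whatever the host uses natively, and extracting the bytes with shifts avoids depending on host byte order:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: serialize a 32-bit signed value as 4 little-endian bytes. */
void put_i32_le(int32_t value, unsigned char out[4])
{
    assert(out != NULL);
    /* Signed-to-unsigned conversion is modular, so -1 becomes 0xFFFFFFFF. */
    uint32_t u = (uint32_t)value;
    out[0] = (unsigned char)(u & 0xFF);          /* least significant byte first */
    out[1] = (unsigned char)((u >> 8) & 0xFF);
    out[2] = (unsigned char)((u >> 16) & 0xFF);
    out[3] = (unsigned char)((u >> 24) & 0xFF);  /* most significant byte last */
}
```

The four bytes can then be written with fwrite() or write(). Only the way back, from unsigned to signed, needs extra care.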



Answer 2:

The simplest solution:

For writing, just convert to unsigned and use your unsigned endian conversion functions.

For reading the values back, first read them into an unsigned variable, and check if the high bit is set, and do some arithmetic to make the conversion well-defined:

uint32_t temp;  /* value read from the file, already converted to host byte order */
int32_t dest;
if (temp > INT32_MAX) dest = -(int32_t)(-temp-1)-1;
else dest = temp;

As an added bonus, a good compiler on a sane system (i.e. a two's complement system where the implementation-defined conversion from unsigned to signed is "correct") will first optimize -(int32_t)(-temp-1)-1 to (int32_t)temp, then optimize the two branches of the conditional, which now both contain identical code, to a single code path with no branch.
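
For illustration, here is the round trip wrapped into two functions. The names encode_i32 and decode_i32 are hypothetical. As a variation of my own, the negative branch is written as UINT32_MAX - temp, which has the same value as -temp-1 in 32-bit unsigned arithmetic but does not depend on how wide int is:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: signed -> unsigned is always well-defined (modulo 2^32). */
uint32_t encode_i32(int32_t value)
{
    return (uint32_t)value;
}

/* Hypothetical helper: unsigned -> signed without relying on the
 * implementation-defined out-of-range conversion. */
int32_t decode_i32(uint32_t temp)
{
    if (temp > INT32_MAX) {
        /* temp is in [2^31, 2^32 - 1], so UINT32_MAX - temp is in
         * [0, 2^31 - 1] and fits in int32_t; negating it and subtracting 1
         * gives a result in [-2^31, -1]. */
        return -(int32_t)(UINT32_MAX - temp) - 1;
    }
    return (int32_t)temp;
}
```

The result of encode_i32() would then go through the usual unsigned byte-order conversion before being written, and the value read back would be converted to host order before being passed to decode_i32().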



Answer 3:

Use the same approach as when sending data over the network. Convert your unsigned or signed values to big-endian with htonl() before saving them. When reading, convert the data back to your machine's endianness with ntohl().

But as always you need to know if the data originally was signed or unsigned. With just a bit sequence, you can't know for sure.
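
A rough sketch of this, assuming a POSIX system (htonl() and ntohl() come from <arpa/inet.h>) and a two's complement host; the names write_i32_be and read_i32_be are hypothetical:

```c
#include <arpa/inet.h>  /* htonl, ntohl (POSIX) */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write one signed 32-bit value in network (big-endian)
 * byte order. Returns 0 on success, -1 on failure. */
int write_i32_be(FILE *fp, int32_t value)
{
    assert(fp != NULL);
    uint32_t wire = htonl((uint32_t)value);
    return fwrite(&wire, sizeof wire, 1, fp) == 1 ? 0 : -1;
}

/* Hypothetical helper: read the value back. Returns 0 on success, -1 on failure. */
int read_i32_be(FILE *fp, int32_t *value)
{
    assert(fp != NULL && value != NULL);
    uint32_t wire;
    if (fread(&wire, sizeof wire, 1, fp) != 1)
        return -1;
    /* Assumption: two's complement host. For values above INT32_MAX this
     * unsigned-to-signed conversion is implementation-defined. */
    *value = (int32_t)ntohl(wire);
    return 0;
}
```

The reader has to know from the file format that the field is signed; nothing in the four bytes says so.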



Answer 4:

Options:

  • Store numbers as plain text using printf()-like functions for conversion
  • Convert negative numbers to sign + absolute value, store them as unsigned with the extra sign bit
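
A minimal sketch of the first option, assuming one decimal number per line; the names format_i32 and parse_i32 are hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>  /* PRId32, and the <stdint.h> types and limits */
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: format value as decimal text followed by a newline.
 * Returns the number of characters written, or -1 if buf is too small. */
int format_i32(char *buf, size_t size, int32_t value)
{
    assert(buf != NULL);
    int n = snprintf(buf, size, "%" PRId32 "\n", value);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Hypothetical helper: parse decimal text. Returns 0 on success, -1 on error. */
int parse_i32(const char *text, int32_t *value)
{
    assert(text != NULL && value != NULL);
    char *end;
    errno = 0;
    long v = strtol(text, &end, 10);
    /* Reject input with no digits and values outside the int32_t range. */
    if (end == text || errno == ERANGE || v < INT32_MIN || v > INT32_MAX)
        return -1;
    *value = (int32_t)v;
    return 0;
}
```

The buffer produced by format_i32() can be written with fputs() or fwrite(). The widest value, -2147483648, needs 11 characters plus the newline and the terminating null byte.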


Answer 5:

Output a 1-byte sign flag (e.g. 0=positive, 1=negative). If the value is negative, make it positive, and then write the value in big-endian format. If you don't like 0 and 1 you could use '+' and '-'.
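
One way this layout could look, as a sketch with hypothetical names (put_sign_magnitude, get_sign_magnitude) and a 5-byte record: the sign flag followed by a 4-byte big-endian magnitude. The magnitude is computed in unsigned arithmetic because negating INT32_MIN as a signed value would overflow:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: out[0] is the sign flag (0 = non-negative,
 * 1 = negative), out[1..4] hold the magnitude in big-endian order. */
void put_sign_magnitude(int32_t value, unsigned char out[5])
{
    assert(out != NULL);
    /* Negate as unsigned so that INT32_MIN yields 2^31 instead of overflowing. */
    uint32_t mag = value < 0 ? 0u - (uint32_t)value : (uint32_t)value;
    out[0] = value < 0 ? 1 : 0;
    out[1] = (unsigned char)((mag >> 24) & 0xFF);
    out[2] = (unsigned char)((mag >> 16) & 0xFF);
    out[3] = (unsigned char)((mag >> 8) & 0xFF);
    out[4] = (unsigned char)(mag & 0xFF);
}

/* Hypothetical helper: inverse of put_sign_magnitude for well-formed records. */
int32_t get_sign_magnitude(const unsigned char in[5])
{
    assert(in != NULL);
    uint32_t mag = ((uint32_t)in[1] << 24) | ((uint32_t)in[2] << 16)
                 | ((uint32_t)in[3] << 8)  |  (uint32_t)in[4];
    if (in[0] == 0 || mag == 0)
        return (int32_t)mag;  /* assumes mag <= INT32_MAX in a valid record */
    /* mag is in [1, 2^31]: go through mag - 1 so every step fits in int32_t. */
    return -(int32_t)(mag - 1) - 1;
}
```

This costs one extra byte per value compared with the two's complement encoding, but the reader never has to convert an out-of-range unsigned value to a signed type.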



Answer 6:

Store the sign and the absolute value as 2 fields, and recombine them when you read it back.

You said you already know how to convert to/from a well-defined byte order, so all that is left is to determine the sign (hint: < 0 might help here :-)) and take the absolute value (which you could do in combination with determining the sign, or by using abs() or similar).

Something like:

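/* num is assumed to be an int32_t */
uint8_t  negative;
uint32_t magnitude;
uint32_t write_value;

if (num < 0) {
    negative  = 1;
    magnitude = 0u - (uint32_t)num;  /* negate as unsigned so INT32_MIN cannot overflow */
} else {
    negative  = 0;
    magnitude = (uint32_t)num;
}
write_value = htole32(magnitude);
write(file, &negative, 1);     /* 1-byte sign flag */
write(file, &write_value, 4);  /* 4-byte little-endian magnitude */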

As an optimization you could collect the sign bits for values together and store them in a single word before the absolute values.
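
A sketch of that optimization, assuming values are handled in batches of at most 32 so that the signs fit in one 32-bit word; the name split_signs is hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: for up to 32 values, store each magnitude in
 * magnitudes[i] and return a word whose bit i is set when values[i] < 0. */
uint32_t split_signs(const int32_t *values, size_t count, uint32_t *magnitudes)
{
    assert(values != NULL && magnitudes != NULL && count <= 32);
    uint32_t signs = 0;
    for (size_t i = 0; i < count; i++) {
        if (values[i] < 0) {
            signs |= (uint32_t)1 << i;
            /* Negate as unsigned so INT32_MIN cannot overflow. */
            magnitudes[i] = 0u - (uint32_t)values[i];
        } else {
            magnitudes[i] = (uint32_t)values[i];
        }
    }
    return signs;
}
```

The sign word and each magnitude would then go through htole32() before being written, with the sign word first.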