Would making plain int 64-bit break a lot of reaso

2019-04-04 02:16发布

站内文章 / C

74 0

叛逆

女 | 书童

私信

可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

Until recently, I'd considered the decision by most systems implementors/vendors to keep plain int 32-bit even on 64-bit machines a sort of expedient wart. With modern C99 fixed-size types (int32_t and uint32_t, etc.) the need for there to be a standard integer type of each size 8, 16, 32, and 64 mostly disappears, and it seems like int could just as well be made 64-bit.

However, the biggest real consequence of the size of plain int in C comes from the fact that C essentially does not have arithmetic on smaller-than-int types. In particular, if int is larger than 32-bit, the result of any arithmetic on uint32_t values has type signed int, which is rather unsettling.

Is this a good reason to keep int permanently fixed at 32-bit on real-world implementations? I'm leaning towards saying yes. It seems to me like there could be a huge class of uses of uint32_t which break when int is larger than 32 bits. Even applying the unary minus or bitwise complement operator becomes dangerous unless you cast back to uint32_t.

Of course the same issues apply to uint16_t and uint8_t on current implementations, but everyone seems to be aware of and used to treating them as "smaller-than-int" types.

回答1:

As you say, I think that the promotion rules really are the killer. uint32_t would then promote to int and all of a sudden you'd have signed arithmetic where almost everybody expects unsigned.

This would be mostly hidden in places where you do just arithmetic and assign back to an uint32_t. But it could be deadly in places where you do comparison to constants. Whether code that relies on such comparisons without doing an explicit cast is reasonable, I don't know. Casting constants like (uint32_t)1 can become quite tedious. I personally at least always use the suffix U for constants that I want to be unsigned, but this already is not as readable as I would like.

Also have in mind that uint32_t etc are not guaranteed to exist. Not even uint8_t. The enforcement of that is an extension from POSIX. So in that sense C as a language is far from being able to make that move.

回答2:

"Reasonable Code"...

Well... the thing about development is, you write and fix it and then it works... and then you stop!

And maybe you've been burned a lot so you stay well within the safe ranges of certain features, and maybe you haven't been burned in that particular way so you don't realize that you're relying on something that could kind-of change.

Or even that you're relying on a bug.

On olden Mac 68000 compilers, int was 16 bit and long was 32. But even then most extant C code assumed an int was 32, so typical code you found on a newsgroup wouldn't work. (Oh, and Mac didn't have printf, but I digress.)

So, what I'm getting at is, yes, if you change anything, then some things will break.

回答3:

With modern C99 fixed-size types (int32_t and uint32_t, etc.) the need for there to be a standard integer type of each size 8, 16, 32, and 64 mostly disappears,

C99 has fixed-sized typeDEFs, not fixed-size types. The native C integer types are still char, short, int, long, and long long. They are still relevant.

The problem with ILP64 is that it has a great mismatch between C types and C99 typedefs.

int8_t = char
int16_t = short
int32_t = nonstandard type
int64_t = int, long, or long long

From 64-Bit Programming Models: Why LP64?:

Unfortunately, the ILP64 model does not provide a natural way to describe 32-bit data types, and must resort to non-portable constructs such as __int32 to describe such types. This is likely to cause practical problems in producing code which can run on both 32 and 64 bit platforms without #ifdef constructions. It has been possible to port large quantities of code to LP64 models without the need to make such changes, while maintaining the investment made in data sets, even in cases where the typing information was not made externally visible by the application.

回答4:

DEC Alpha and OSF/1 Unix was one of the first 64-bit versions of Unix, and it used 64-bit integers - an ILP64 architecture (meaning int, long and pointers were all 64-bit quantities). It caused lots of problems.

One issue I've not seen mentioned - which is why I'm answering at all after so long - is that if you have a 64-bit int, what size do you use for short? Both 16 bits (the classical, change nothing approach) and 32 bits (the radical 'well, a short should be half as long as an int' approach) will present some problems.

With the C99 <stdint.h> and <inttypes.h> headers, you can code to fixed size integers - if you choose to ignore machines with 36-bit or 60-bit integers (which is at least quasi-legitimate). However, most code is not written using those types, and there are typically deep-seated and largely hidden (but fundamentally flawed) assumptions in the code that will be upset if the model departs from the existing variations.

Notice Microsoft's ultra-conservative LLP64 model for 64-bit Windows. That was chosen because too much old code would break if the 32-bit model was changed. However, code that had been ported to ILP64 or LP64 architectures was not immediately portable to LLP64 because of the differences. Conspiracy theorists would probably say it was deliberately chosen to make it more difficult for code written for 64-bit Unix to be ported to 64-bit Windows. In practice, I doubt whether that was more than a happy (for Microsoft) side-effect; the 32-bit Windows code had to be revised a lot to make use of the LP64 model too.

回答5:

There's one code idiom that would break if ints were 64-bits, and I see it often enough that I think it could be called reasonable:

checking if a value is negative by testing if ((val & 0x80000000) != 0)

This is commonly found in checking error codes. Many error code standards (like Window's HRESULT) uses bit 31 to represent an error. And code will sometimes check for that error either by testing bit 31 or sometimes by checking if the error is a negative number.

Microsoft's macros for testing HRESULT use both methods - and I'm sure there's a ton of code out there that does similar without using the SDK macros. If MS had moved to ILP64, this would be one area that caused porting headaches that are completely avoided with the LLP64 model (or the LP64 model).

Note: if you're not familiar with terms like "ILP64", please see the mini-glossary at the end of the answer.

I'm pretty sure there's a lot of code (not necessarily Windows-oriented) out there that uses plain-old-int to hold error codes, assuming that those ints are 32-bits in size. And I bet there's a lot of code with that error status scheme that also uses both kinds of checks (< 0 and bit 31 being set) and which would break if moved to an ILP64 platform. These checks could be made to continue to work correctly either way if the error codes were carefully constructed so that sign-extension took place, but again, many such systems I've seen construct the error values by or-ing together a bunch a bitfields.

Anyway, I don't think this is an unsolvable problem by any means, but I do think it's a fairly common coding practice that would cause a lot of code to require fixing up if moved to an ILP64 platform.

Note that I also don't think this was one of the foremost reasons for Microsoft to choose the LLP64 model (I think that decision was largely driven by binary data compatibility between 32-bit and 64-bit processes, as mentioned in MSDN and on Raymond Chen's blog).

Mini-Glossary for the 64-bit Platform Programming Model terminology:

ILP64: int, long, pointers are 64-bits
LP64: long and pointers are 64-bits, int is 32-bits (used by many (most?) Unix platforms)
LLP64: long long and pointers are 64-bits, int and long remain 32-bits (used on Win64)

For more information on 64-bit programming models, see "64-bit Programming Models: Why LP64?"

回答6:

While I don't personally write code like this, I'll bet that it's out there in more than one place... and of course it'll break if you change the size of int.

int i, x = getInput();
for (i = 0; i < 32; i++)
{
    if (x & (1 << i))
    {
        //Do something
    }
}

回答7:

Well, it's not like this story is all new. With "most computers" I assume you mean desktop computers. There already has been a transition from 16-bit to 32-bit int. Is there anything at all that says the same progression won't happen this time?