Why do the sizes of data types change as the Operating System changes?

Published 2019-02-17 13:36

Question:

This question was asked to me in an interview: the size of char is 2 bytes on some operating systems, but 4 bytes or something else on others.

Why is that so?

Why is it different from other fundamental types, such as int?

Answer 1:

That was probably a trick question. The sizeof(char) is always 1.

If the size differs, it's probably because of a non-conforming compiler, in which case the question should be about the compiler itself, not about the C or C++ language.

5.3.3 Sizeof [expr.sizeof]

1 The sizeof operator yields the number of bytes in the object representation of its operand. The operand is either an expression, which is not evaluated, or a parenthesized type-id. The sizeof operator shall not be applied to an expression that has function or incomplete type, or to an enumeration type before all its enumerators have been declared, or to the parenthesized name of such types, or to an lvalue that designates a bit-field. sizeof(char), sizeof(signed char) and sizeof(unsigned char) are 1. The result of sizeof applied to any other fundamental type (3.9.1) is implementation-defined. (emphasis mine)

The sizeof of types other than the ones pointed out is implementation-defined, and it varies for various reasons. An int has a larger range if it's represented in 64 bits instead of 32, but it's also more efficient as 32 bits on a 32-bit architecture.
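A quick way to see this on your own toolchain is the following minimal, self-contained sketch; only the char sizes are fixed by the standard, the other printed values are whatever your implementation chose:

    #include <cstdio>

    // sizeof(char), sizeof(signed char) and sizeof(unsigned char) are 1 by definition.
    static_assert(sizeof(char) == 1, "guaranteed by the standard");
    static_assert(sizeof(signed char) == 1 && sizeof(unsigned char) == 1, "also guaranteed");

    int main() {
        // These values are implementation-defined and may differ between compilers/targets.
        std::printf("short=%zu int=%zu long=%zu long long=%zu\n",
                    sizeof(short), sizeof(int), sizeof(long), sizeof(long long));
        std::printf("float=%zu double=%zu void*=%zu\n",
                    sizeof(float), sizeof(double), sizeof(void*));
        return 0;
    }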



Answer 2:

The physical sizes (in terms of the number of bits) of types are usually dictated by the target hardware.

For example, some CPUs cannot access memory in units smaller than 16 bits. For the best performance, char can then be defined as a 16-bit integer. If you want 8-bit chars on such a CPU, the compiler has to generate extra code to pack and unpack 8-bit values into and out of 16-bit memory cells. That extra packing/unpacking code makes your code bigger and slower.

And that's not the end of it. If you subdivide 16-bit memory cells into 8-bit chars, you effectively introduce an extra bit in addresses/pointers. If normal addresses are 16-bit in the CPU, where do you stick this extra, 17th bit? There are two options:

  • make pointers bigger (32-bit, of which 15 are unused) and waste memory and reduce the speed further
  • reduce the addressable address space by half, wasting memory and losing speed

The latter option can sometimes be practical. For example, if the entire address space is divided in halves, one of which is used by the kernel and the other by user applications, then application pointers will never use one bit in their addresses. You can use that bit to select an 8-bit byte in a 16-bit memory cell.
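To make the packing/unpacking cost and the "extra pointer bit" concrete, here is a hypothetical sketch in ordinary C++ that simulates such a word-addressed machine: memory is an array of 16-bit cells, and a "char pointer" is encoded as a 15-bit word index plus a 1-bit byte selector. The names and the encoding are invented for illustration, not taken from any real compiler.

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    using word_t     = std::uint16_t;  // one addressable memory cell on the hypothetical CPU
    using char_ptr_t = std::uint16_t;  // packed: 15-bit word index + 1-bit byte selector

    // Read one 8-bit "char" out of a 16-bit cell.
    std::uint8_t load_char(const std::vector<word_t>& mem, char_ptr_t p) {
        word_t cell = mem[p >> 1];  // fetch the whole 16-bit cell
        // unpack either the high or the low byte, depending on the selector bit
        return static_cast<std::uint8_t>((p & 1) ? (cell >> 8) : (cell & 0xFF));
    }

    // Write one 8-bit "char" into a 16-bit cell without touching its other half.
    void store_char(std::vector<word_t>& mem, char_ptr_t p, std::uint8_t v) {
        word_t& cell = mem[p >> 1];
        if (p & 1)
            cell = static_cast<word_t>((cell & 0x00FF) | (v << 8));  // replace high byte
        else
            cell = static_cast<word_t>((cell & 0xFF00) | v);         // replace low byte
    }

    int main() {
        std::vector<word_t> memory(8, 0);
        store_char(memory, 3, 'A');                 // byte 3 lives in the high half of cell 1
        std::printf("%c\n", load_char(memory, 3));  // prints A
        return 0;
    }

Every char access turns into a shift and a mask (or a read-modify-write for stores), which is exactly the extra code and lost speed described above; and the spare low bit in the pointer only exists because the usable word-address range has been cut in half.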

C was designed to run on as many different CPUs as possible. This is why the physical sizes of char, short, int, long, long long, void*, void(*)(), float, double, long double, wchar_t, etc. can vary.

Now, when we're talking about different physical sizes in different compilers producing code for the same CPU, the choice becomes more arbitrary. However, it may not be as arbitrary as it seems. For example, many compilers for Windows define int = long = 32 bits. They do that to avoid confusing programmers who use the Windows APIs, which expect INT = LONG = 32 bits. Defining int and long as something else would invite bugs caused by lapses in the programmer's attention. So compilers have to follow suit in this case.
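As a hedged illustration of that convention (not something the Windows headers require you to write), a Windows-targeting codebase could assert the assumption it relies on:

    #include <climits>

    // The Win32 API's INT and LONG are 32-bit; code that passes plain int/long
    // to those APIs implicitly assumes the compiler made the same choice.
    static_assert(sizeof(int)  * CHAR_BIT == 32, "int expected to be 32 bits on this target");
    static_assert(sizeof(long) * CHAR_BIT == 32, "long expected to be 32 bits on this target");

On a typical 64-bit Linux compiler the second assertion would fail, because long is 64 bits there; that divergence is exactly what the Windows compilers avoid.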

And lastly, the C (and C++) standard operates in terms of chars and bytes. They are the same concept size-wise. But C's bytes aren't necessarily your typical 8-bit bytes; they can legally be bigger than that, as explained earlier. To avoid confusion you can use the term octet, whose name implies the number 8. A number of protocols use this word for that very purpose.
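A minimal sketch for checking what a byte actually is on your implementation; on mainstream desktop platforms it will report 8 (a byte is an octet), but the standard only guarantees CHAR_BIT >= 8:

    #include <climits>
    #include <cstdio>

    int main() {
        // sizeof counts bytes; CHAR_BIT says how many bits each byte holds.
        std::printf("bits per byte (CHAR_BIT): %d\n", CHAR_BIT);
        std::printf("sizeof(int): %zu bytes = %zu bits\n",
                    sizeof(int), sizeof(int) * CHAR_BIT);
        return 0;
    }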