I notice that modern C and C++ code seems to use size_t
instead of int
/unsigned int
pretty much everywhere - from parameters for C string functions to the STL. I am curious as to the reason for this and the benefits it brings.
问题:
回答1:
The size_t
type is the unsigned integer type that is the result of the sizeof
operator (and the offsetof
operator), so it is guaranteed to be big enough to contain the size of the biggest object your system can handle (e.g., a static array of 8Gb).
The size_t
type may be bigger than, equal to, or smaller than an unsigned int
, and your compiler might make assumptions about it for optimization.
You may find more precise information in the C99 standard, section 7.17, a draft of which is available on the Internet in pdf format, or in the C11 standard, section 7.19, also available as a pdf draft.
回答2:
Classic C (the early dialect of C described by Brian Kernighan and Dennis Ritchie in The C Programming Language, Prentice-Hall, 1978) didn\'t provide size_t
. The C standards committee introduced size_t
to eliminate a portability problem
Explained in detail at embedded.com (with a very good example)
回答3:
In short, size_t
is never negative, and it maximizes performance because it\'s typedef\'d to be the unsigned integer type that\'s big enough -- but not too big -- to represent the size of the largest possible object on the target platform.
Sizes should never be negative, and indeed size_t
is an unsigned type. Also, because size_t
is unsigned, you can store numbers that are roughly twice as big as in the corresponding signed type, because we can use the sign bit to represent magnitude, like all the other bits in the unsigned integer. When we gain one more bit, we are multiplying the range of numbers we can represents by a factor of about two.
So, you ask, why not just use an unsigned int
? It may not be able to hold big enough numbers. In an implementation where unsigned int
is 32 bits, the biggest number it can represent is 4294967295
. Some processors, such as the IP16L32, can copy objects larger than 4294967295
bytes.
So, you ask, why not use an unsigned long int
? It exacts a performance toll on some platforms. Standard C requires that a long
occupy at least 32 bits. An IP16L32 platform implements each 32-bit long as a pair of 16-bit words. Almost all 32-bit operators on these platforms require two instructions, if not more, because they work with the 32 bits in two 16-bit chunks. For example, moving a 32-bit long usually requires two machine instructions -- one to move each 16-bit chunk.
Using size_t
avoids this performance toll. According to this fantastic article, \"Type size_t
is a typedef that\'s an alias for some unsigned integer type, typically unsigned int
or unsigned long
, but possibly even unsigned long long
. Each Standard C implementation is supposed to choose the unsigned integer that\'s big enough--but no bigger than needed--to represent the size of the largest possible object on the target platform.\"
回答4:
The size_t type is the type returned by the sizeof operator. It is an unsigned integer capable of expressing the size in bytes of any memory range supported on the host machine. It is (typically) related to ptrdiff_t in that ptrdiff_t is a signed integer value such that sizeof(ptrdiff_t) and sizeof(size_t) are equal.
When writing C code you should always use size_t whenever dealing with memory ranges.
The int type on the other hand is basically defined as the size of the (signed) integer value that the host machine can use to most efficiently perform integer arithmetic. For example, on many older PC type computers the value sizeof(size_t) would be 4 (bytes) but sizeof(int) would be 2 (byte). 16 bit arithmetic was faster than 32 bit arithmetic, though the CPU could handle a (logical) memory space of up to 4 GiB.
Use the int type only when you care about efficiency as its actual precision depends strongly on both compiler options and machine architecture. In particular the C standard specifies the following invariants: sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long) placing no other limitations on the actual representation of the precision available to the programmer for each of these primitive types.
Note: This is NOT the same as in Java (which actually specifies the bit precision for each of the types \'char\', \'byte\', \'short\', \'int\' and \'long\').
回答5:
Type size_t must be big enough to store the size of any possible object. Unsigned int doesn\'t have to satisfy that condition.
For example in 64 bit systems int and unsigned int may be 32 bit wide, but size_t must be big enough to store numbers bigger than 4G
回答6:
This excerpt from the glibc manual 0.02 may also be relevant when researching the topic:
There is a potential problem with the size_t type and versions of GCC prior to release 2.4. ANSI C requires that size_t always be an unsigned type. For compatibility with existing systems\' header files, GCC defines size_t in stddef.h\' to be whatever type the system\'s
sys/types.h\' defines it to be. Most Unix systems that define size_t in `sys/types.h\', define it to be a signed type. Some code in the library depends on size_t being an unsigned type, and will not work correctly if it is signed.
The GNU C library code which expects size_t to be unsigned is correct. The definition of size_t as a signed type is incorrect. We plan that in version 2.4, GCC will always define size_t as an unsigned type, and the fixincludes\' script will massage the system\'s
sys/types.h\' so as not to conflict with this.
In the meantime, we work around this problem by telling GCC explicitly to use an unsigned type for size_t when compiling the GNU C library. `configure\' will automatically detect what type GCC uses for size_t arrange to override it if necessary.
回答7:
If my compiler is set to 32 bit, size_t
is nothing other than a typedef for unsigned int
. If my compiler is set to 64 bit, size_t
is nothing other than a typedef for unsigned long long
.
回答8:
size_t is the size of a pointer.
So in 32 bits or the common ILP32 (integer, long, pointer) model size_t is 32 bits. and in 64 bits or the common LP64 (long, pointer) model size_t is 64 bits (integers are still 32 bits).
There are other models but these are the ones that g++ use (at least by default)