Question:
What type should be used for an array index in C99? It has to work on LP32, ILP32, ILP64, LP64, LLP64 and more. It doesn't have to be a C89 type.
I have found 5 candidates:
- size_t
- ptrdiff_t
- intptr_t / uintptr_t
- int_fast*_t / uint_fast*_t
- int_least*_t / uint_least*_t
Here is some simple code to better illustrate the problem. What is the best type for i and j in these two particular loops? If there is a good reason, two different types are fine too.
for (i=0; i<imax; i++) {
    do_something(a[i]);
}
/* jmin can be less than 0 */
for (j=jmin; j<jmax; j++) {
    do_something(a[j]);
}
P.S.
In the first version of the question I had forgotten about negative indexes.
P.P.S.
I am not going to write a C99 compiler. However, any answer from a compiler programmer would be very valuable to me.
Similar question:
- size_t vs. uintptr_t
The context of this question is different though.
Answer 1:
I almost always use size_t for array indices/loop counters. Sure, there are some special instances where you may want signed offsets, but in general using a signed type has a lot of problems:
The biggest risk is that if you're passed a huge size/offset by a caller treating things as unsigned (or if you read it from a wrongly-trusted file), you may interpret it as a negative number and fail to catch that it's out of bounds. For instance, with a signed offset, if (offset<size) array[offset]=foo; else error(); will write somewhere it shouldn't when offset is negative.
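A minimal sketch of that failure mode, with hypothetical helper names (accepts_signed and accepts_unsigned are not from the answer). It only models the bounds check itself, not the write:

```c
#include <stddef.h>

/* Hypothetical helper: the bounds check from the one-liner above, with a
   signed offset. Returns 1 if the offset would be accepted. */
int accepts_signed(long offset, long size)
{
    return offset < size;   /* a negative offset also passes this test */
}

/* The same check with size_t: a value that would have been negative
   arrives as a very large positive number and is rejected. */
int accepts_unsigned(size_t offset, size_t size)
{
    return offset < size;
}
```

With the signed version, an offset of -1 passes the check for any positive size; with size_t, the same bit pattern is the largest representable value and fails it.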
Another problem is the possibility of undefined behavior with signed integer overflow. Whether you use unsigned or signed arithmetic, there are overflow issues to be aware of and check for, but personally I find the unsigned behavior a lot easier to deal with.
Yet another reason to use unsigned arithmetic (in general) - sometimes I'm using indices as offsets into a bit array and I want to use %8 and /8 or %32 and /32. With signed types, these will be actual division operations. With unsigned, the expected bitwise-and/bitshift operations can be generated.
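To illustrate the bit-array point, here is a small sketch with hypothetical function names. In C99 the signed remainder truncates toward zero, so it can be negative and cannot be replaced by a single bitwise AND, while the unsigned remainder is exactly i & 7:

```c
/* Bit position inside a byte for a bit-array index (hypothetical helpers). */

/* Signed: the result takes the sign of i, e.g. -1 % 8 is -1 in C99. */
int bit_pos_signed(int i)
{
    return i % 8;
}

/* Unsigned: always in 0..7 and equivalent to i & 7u. */
unsigned bit_pos_unsigned(unsigned i)
{
    return i % 8u;
}

/* Byte that holds bit i; equivalent to i >> 3 for unsigned i. */
unsigned byte_pos_unsigned(unsigned i)
{
    return i / 8u;
}
```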
Answer 2:
I think you should use ptrdiff_t for the following reasons:
- Indices can be negative (thus all unsigned types, including size_t, are out of the question)
- The type of p2 - p1 is ptrdiff_t. The type of i in the reverse operation, *(p1 + i), should be that type too (notice that *(p + i) is equivalent to p[i])
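The symmetry between pointer subtraction and indexing can be sketched as follows. The helper names elem_distance and elem_at are made up for this example:

```c
#include <stddef.h>

/* Distance in elements between two pointers into the same array.
   The result type of p2 - p1 is ptrdiff_t. */
ptrdiff_t elem_distance(const int *p1, const int *p2)
{
    return p2 - p1;
}

/* Read the element i positions away from p. i may be negative as long as
   p + i still points into the same array. */
int elem_at(const int *p, ptrdiff_t i)
{
    return *(p + i);   /* same as p[i] */
}
```

For example, with a pointer to the middle of an array, elem_at(mid, -2) reads two elements before it.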
Answer 3:
Since the type of sizeof(array) (and malloc's argument) is size_t, and the array can't hold more elements than its size, it follows that size_t can be used for the array's index.
EDIT
This analysis is for 0-based arrays, which is the common case. ptrdiff_t will work in any case, but it's a little strange for an index variable to have a pointer-difference type.
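As a small sketch of the 0-based case: the element count derived from sizeof is itself a size_t, so a size_t index covers every element. COUNT_OF and sum_ints are hypothetical names used only here:

```c
#include <stddef.h>

/* Hypothetical macro: number of elements in a true array (not a pointer).
   The result has type size_t because sizeof yields size_t. */
#define COUNT_OF(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Sum n elements using a size_t index. */
long sum_ints(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}
```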
Answer 4:
If you start at 0, use size_t because that type must be able to index any array:
- sizeof returns it, so it is not valid for an array to have more elements than size_t can represent
- malloc takes it as argument, as mentioned by Amnon
If you start below zero, then shift to start at zero, and use size_t, which is guaranteed to work because of the reasons above. So replace:
for (j = jmin; j < jmax; j++) {
    do_something(a[j]);
}
with:
int *b = &a[jmin];
for (size_t i = 0; i < (jmax - jmin); i++) {
    do_something(b[i]);
}
Why not use:
ptrdiff_t: the maximum value this represents may be smaller than the maximum value of size_t.
This is mentioned at cppref, and the possibility of undefined behavior if the array is too large is suggested at C99 6.5.6/9:
When two pointers are subtracted, both shall point to elements of the same array object,
or one past the last element of the array object; the result is the difference of the
subscripts of the two array elements. The size of the result is implementation-defined,
and its type (a signed integer type) is ptrdiff_t defined in the <stddef.h> header.
If the result is not representable in an object of that type, the behavior is undefined.
Out of curiosity, intptr_t might also be larger than size_t on a segmented memory architecture: https://stackoverflow.com/a/1464194/895245
GCC also imposes further limits on the maximum size of static array objects: What is the maximum size of an array in C?
uintptr_t: I'm not sure. So I'd just use size_t because I'm more sure :-)
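The ptrdiff_t concern can be checked at run time with the limits from <stdint.h>. The following is only a sketch, and fits_in_ptrdiff is a hypothetical helper name:

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if an element count n is representable as a ptrdiff_t, so that
   subtracting pointers to the first and one-past-the-last element cannot
   overflow. The comparison is done in uintmax_t to avoid assuming which of
   the two types is wider. */
int fits_in_ptrdiff(size_t n)
{
    return (uintmax_t)n <= (uintmax_t)PTRDIFF_MAX;
}
```

On common platforms SIZE_MAX is larger than PTRDIFF_MAX, so very large counts fail this check, but the standard does not require that relationship.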
Answer 5:
If you know the maximum length of your array in advance you can use
int_fast*_t / uint_fast*_t
int_least*_t / uint_least*_t
In all other cases I would recommend using
or
depending on whether you want to allow negative indexes.
Using
would also be safe, but has slightly different semantics.
Answer 6:
In your situation, I would use ptrdiff_t. It's not just that indices can be negative. You might want to count down to zero, in which case unsigned types yield a nasty, subtle bug:
for(size_t i=5; i>=0; i--) {
    printf("danger, this loops forever\n");
}
That won't happen if you use ptrdiff_t or any other suitable signed type. On POSIX systems, you can use ssize_t.
Personally, I often just use int, even though it is arguably not the Correct Thing To Do.
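If an unsigned index is still wanted, a countdown can be written so that the test happens before the decrement. This is a common idiom rather than something proposed in the answer; reverse_copy is a hypothetical function used to show it:

```c
#include <stddef.h>

/* Copy src into dst in reverse order, walking i from n-1 down to 0.
   The condition i-- > 0 tests the old value and then decrements, so the
   loop body sees n-1, ..., 0 and the loop ends without relying on i
   becoming negative. */
void reverse_copy(int *dst, const int *src, size_t n)
{
    for (size_t i = n; i-- > 0; )
        dst[n - 1 - i] = src[i];
}
```

The loop also handles n == 0 correctly, since the condition is false on the first test.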
Answer 7:
I use unsigned int (though I prefer the shorthand unsigned).
In C99, unsigned int is guaranteed to be able to index any portable array. Only arrays of 65'535 bytes or smaller are guaranteed to be supported, and the maximum unsigned int value is at least 65'535.
From §5.2.4 of the public WG14 N1256 draft of the C99 standard:
5.2.4.1 Translation limits
The implementation shall be able to translate and execute at least one program that contains at least one instance of every one of the following limits: (Implementations should avoid imposing fixed translation limits whenever possible.)
(...)
- 65535 bytes in an object (in a hosted environment only)
(...)
5.2.4.2 Numerical limits
An implementation is required to document all the limits specified in this subclause, which are specified in the headers <limits.h> and <float.h>. Additional limits are specified in <stdint.h>.
5.2.4.2.1 Sizes of integer types <limits.h>
The values given below shall be replaced by constant expressions suitable for use in #if preprocessing directives. Moreover, except for CHAR_BIT and MB_LEN_MAX, the following shall be replaced by expressions that have the same type as would an expression that is an object of the corresponding type converted according to the integer promotions. Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign.
(...)
- maximum value for an object of type unsigned int
  UINT_MAX 65535 // 2^16 - 1
In ANSI C, the maximum portable array size is actually only 32'767 bytes, so even a signed int will do, which has a maximum value of at least 32'767 (Appendix A.4).
From §2.2.4 of a C89 draft:
2.2.4.1 Translation limits
The implementation shall be able to translate and execute at least one program that contains at least one instance of every one of the following limits: (Implementations should avoid imposing fixed translation limits whenever possible.)
(...)
- 32767 bytes in an object (in a hosted environment only)
(...)
2.2.4.2 Numerical limits
A conforming implementation shall document all the limits specified in this section, which shall be specified in the headers <limits.h> and <float.h>.
"Sizes of integral types <limits.h>"
The values given below shall be replaced by constant expressions suitable for use in #if preprocessing directives. Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign.
(...)
- maximum value for an object of type int
  INT_MAX +32767
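Because the quoted limits are usable in #if directives, the assumption behind this answer can be stated in code. This is only a sketch; index_fits_unsigned is a hypothetical helper, and the #error message is illustrative:

```c
#include <limits.h>
#include <stddef.h>

/* C99 requires UINT_MAX to be at least 65535, so this never fires on a
   conforming implementation. */
#if UINT_MAX < 65535
#error "UINT_MAX is below the C99 minimum"
#endif

/* Returns 1 if every index of an array with n elements fits in an
   unsigned int. Objects larger than the guaranteed 65535 bytes may exist
   on a given platform, so a check like this is needed for them. */
int index_fits_unsigned(size_t n)
{
    return n <= UINT_MAX;
}
```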
Answer 8:
My choice: ptrdiff_t
Many have voted for ptrdiff_t, but some have said that it is strange to index using a pointer difference type. To me, it makes perfect sense: the array index is the difference from the origin pointer.
Some have also said that size_t is right because that is designed to hold the size. However, as some have commented: this is the size in bytes, and so can generally hold values several times greater than the maximum possible array index.
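The bytes-versus-elements point can be made concrete with a short sketch. max_elements is a hypothetical helper; it gives an upper bound implied by size_t alone, not the limit any particular implementation actually supports:

```c
#include <stddef.h>
#include <stdint.h>

/* Upper bound on the number of elements of size elem_size (in bytes, must
   be nonzero) in an object whose total size fits in size_t. For elements
   wider than one byte this is several times smaller than SIZE_MAX. */
size_t max_elements(size_t elem_size)
{
    return SIZE_MAX / elem_size;
}
```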