What is the correct type for array indexes in C?

Posted 2019-01-06 14:55

What type should be used for an array index in C99? It has to work on LP32, ILP32, ILP64, LP64, LLP64 and more. It doesn't have to be a C89 type.

I have found 5 candidates:

  • size_t
  • ptrdiff_t
  • intptr_t / uintptr_t
  • int_fast*_t / uint_fast*_t
  • int_least*_t / uint_least*_t

Here is some simple code to better illustrate the problem. What is the best type for i and j in these two particular loops? If there is a good reason, two different types are fine too.

for (i=0; i<imax; i++) {
        do_something(a[i]);
}
/* jmin can be less than 0 */
for (j=jmin; j<jmax; j++) {
        do_something(a[j]);
}

P.S. In the first version of the question I forgot about negative indexes.

P.P.S. I am not going to write a C99 compiler. However, any answer from a compiler programmer would be very valuable to me.

Similar question:

8 answers
仙女界的扛把子
Post #2 · 2019-01-06 14:57

Since the type of sizeof(array) (and malloc's argument) is size_t, and the array can't hold more elements than its size, it follows that size_t can be used for the array's index.

EDIT: This analysis is for 0-based arrays, which is the common case. ptrdiff_t will work in any case, but it's a little strange for an index variable to have a pointer-difference type.
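A minimal sketch of the 0-based case described above, assuming the element count comes from sizeof (so it is already a size_t). The helper name sum_ints is hypothetical; the point is only that the index and its bound share the same unsigned type.

```c
#include <stddef.h>

/* Hypothetical helper: sums n ints using a size_t index.
   n has the same type that sizeof yields, so "i < n" compares
   two values of the same unsigned type (no sign-compare warning). */
long sum_ints(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += a[i];
    }
    return total;
}
```

For example, with int data[] = {1, 2, 3, 4};, the call sum_ints(data, sizeof data / sizeof data[0]) returns 10.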

Ridiculous、
Post #3 · 2019-01-06 14:57

My choice: ptrdiff_t

Many have voted for ptrdiff_t, but some have said that it is strange to index using a pointer difference type. To me, it makes perfect sense: the array index is the difference from the origin pointer.

Some have also said that size_t is right because it is designed to hold sizes. However, as some have commented, this is a size in bytes, so it can generally hold values several times greater than the maximum possible array index.
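To illustrate the "index = difference from the origin pointer" view, here is a sketch with a hypothetical find_int helper: the position it returns is literally p - begin, whose type is ptrdiff_t. The -1 "not found" result is an illustrative convention that a signed type makes possible, not something from the answer.

```c
#include <stddef.h>

/* Hypothetical helper: returns the index of the first element equal
   to key in [begin, end), or -1 if there is none. The index is the
   pointer difference p - begin, so its natural type is ptrdiff_t. */
ptrdiff_t find_int(const int *begin, const int *end, int key)
{
    for (const int *p = begin; p != end; p++) {
        if (*p == key) {
            return p - begin;   /* distance from the origin pointer */
        }
    }
    return -1;                  /* only representable because ptrdiff_t is signed */
}
```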

仙女界的扛把子
Post #4 · 2019-01-06 15:06

I almost always use size_t for array indices/loop counters. Sure, there are some special cases where you may want signed offsets, but in general using a signed type has a lot of problems:

The biggest risk is that if a caller passes you a huge size/offset while treating things as unsigned (or if you read it from a wrongly trusted file), you may interpret it as a negative number and fail to catch that it's out of bounds. For instance, if (offset<size) array[offset]=foo; else error(); will write somewhere it shouldn't when offset has become negative.
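The bounds check from the example can be isolated into two hypothetical helpers to show the difference. The -1 argument stands in for a huge unsigned size that ended up in a signed variable (that conversion is implementation-defined; on common two's-complement platforms SIZE_MAX typically becomes -1).

```c
#include <stddef.h>

/* Signed check: a negative offset satisfies "offset < size",
   so array[offset] would write before the start of the array. */
int in_bounds_signed(int offset, int size)
{
    return offset < size;
}

/* Unsigned check: a value that "looks negative" is a huge size_t,
   so the same comparison rejects it. */
int in_bounds_unsigned(size_t offset, size_t size)
{
    return offset < size;
}
```

Here in_bounds_signed(-1, 10) returns 1, so array[-1] would be written, while in_bounds_unsigned((size_t)-1, 10) returns 0 and the error branch is taken.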

Another problem is the possibility of undefined behavior with signed integer overflow. Whether you use unsigned or signed arithmetic, there are overflow issues to be aware of and check for, but personally I find the unsigned behavior a lot easier to deal with.
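A sketch of the overflow checks mentioned above, using hypothetical helpers and a common check-before-add idiom (not code from the answer). The size_t version guards against a wrapped, wrong result; the int version must test before adding because signed overflow is undefined behavior rather than wraparound.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Stores a + b in *out and returns 1, or returns 0 if the sum would
   exceed SIZE_MAX. Unsigned addition would wrap (well-defined), but
   the wrapped value would be wrong, so check first. */
int add_size_checked(size_t a, size_t b, size_t *out)
{
    if (b > SIZE_MAX - a) {
        return 0;
    }
    *out = a + b;
    return 1;
}

/* Same idea for int: the check has to come before the addition,
   because evaluating an overflowing a + b is already undefined. */
int add_int_checked(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return 0;
    }
    *out = a + b;
    return 1;
}
```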

Yet another reason to use unsigned arithmetic (in general): sometimes I'm using indices as offsets into a bit array and I want to use %8 and /8 or %32 and /32. With signed types, the compiler cannot emit a plain shift or mask, because C99 division truncates toward zero and negative operands need extra correction instructions. With unsigned types, the expected bitwise-and/bitshift operations can be generated.
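A minimal bit-array sketch with hypothetical bit_set/bit_test helpers, assuming bits are packed 8 per byte as in the answer's %8 and /8. Because the index is a size_t, i / 8 and i % 8 can be compiled to a shift and a mask without any sign correction.

```c
#include <stddef.h>

/* Hypothetical helper: sets bit i, packing 8 bits per byte.
   With an unsigned index, i / 8 and i % 8 reduce to i >> 3 and i & 7. */
void bit_set(unsigned char *bits, size_t i)
{
    bits[i / 8] |= (unsigned char)(1u << (i % 8));
}

/* Hypothetical helper: returns 1 if bit i is set, else 0. */
int bit_test(const unsigned char *bits, size_t i)
{
    return (bits[i / 8] >> (i % 8)) & 1;
}
```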

甜甜的少女心
Post #5 · 2019-01-06 15:10

If you start at 0, use size_t because that type must be able to index any array:

  • sizeof returns it, so it is not valid for an array to have more elements than size_t can represent
  • malloc takes it as argument, as mentioned by Amnon

If you start below zero, shift the range to start at zero and use size_t, which is guaranteed to work for the reasons above. So replace:

for (j = jmin; j < jmax; j++) {
    do_something(a[j]);
}

with:

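int *b = &a[jmin];  /* assumes int elements; b[0] is a[jmin] */
/* assumes jmax >= jmin, so the cast to size_t is safe */
for (size_t i = 0; i < (size_t)(jmax - jmin); i++) {
    do_something(b[i]);
}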
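A self-contained version of the rebasing idea, as a sketch: the hypothetical sum_range stands in for the loop calling do_something, a is assumed to point into the middle of a buffer (so a negative jmin is valid), and jmin/jmax are taken as ptrdiff_t, which is an assumption since the question leaves their type open.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums a[jmin] .. a[jmax - 1], where a may point
   into the middle of a buffer. The range is rebased to start at 0 so
   that a size_t index can be used for the loop itself. */
long sum_range(const int *a, ptrdiff_t jmin, ptrdiff_t jmax)
{
    assert(jmin <= jmax);
    const int *b = a + jmin;               /* b[0] is a[jmin] */
    size_t count = (size_t)(jmax - jmin);  /* non-negative, so the cast is safe */
    long total = 0;
    for (size_t i = 0; i < count; i++) {
        total += b[i];
    }
    return total;
}
```

With int buf[] = {10, 20, 30, 40, 50}; and mid = buf + 2, the call sum_range(mid, -2, 3) visits all five elements and returns 150.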

Why not use:

  • ptrdiff_t: the maximum value this represents may be smaller than the maximum value of size_t.

    This is mentioned at cppreference, and the possibility of undefined behavior if the array is too large is suggested by C99 6.5.6/9:

    When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object; the result is the difference of the subscripts of the two array elements. The size of the result is implementation-defined, and its type (a signed integer type) is ptrdiff_t defined in the <stddef.h> header. If the result is not representable in an object of that type, the behavior is undefined.

    Out of curiosity, intptr_t might also be larger than size_t on a segmented memory architecture: https://stackoverflow.com/a/1464194/895245

    GCC also imposes further limits on the maximum size of static array objects; see "What is the maximum size of an array in C?"

  • uintptr_t: I'm not sure. So I'd just use size_t because I'm more sure :-)
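To check the ptrdiff_t limitation on a given platform, here is a small sketch comparing the two limits from <stdint.h> (helper names are hypothetical). On common flat-memory platforms PTRDIFF_MAX is roughly SIZE_MAX / 2, so ptrdiff_covers_size() returns 0 there, and an element count such as SIZE_MAX fails fits_ptrdiff.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if every size_t value also fits in ptrdiff_t.
   Comparing through uintmax_t avoids assuming which limit is larger. */
int ptrdiff_covers_size(void)
{
    return (uintmax_t)PTRDIFF_MAX >= (uintmax_t)SIZE_MAX;
}

/* Returns 1 if an array of n elements can be indexed (and its pointer
   differences taken) with ptrdiff_t without exceeding PTRDIFF_MAX. */
int fits_ptrdiff(size_t n)
{
    return (uintmax_t)n <= (uintmax_t)PTRDIFF_MAX;
}
```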

放荡不羁爱自由
Post #6 · 2019-01-06 15:10

I use unsigned int (though I prefer the shorthand unsigned).

In C99, unsigned int is guaranteed to be able to index any portable array. Only arrays of 65'535 bytes or smaller are guaranteed to be supported, and the maximum unsigned int value is at least 65'535.

From §5.2.4 of the public WG14 N1256 draft of the C99 standard:

5.2.4.1 Translation limits

The implementation shall be able to translate and execute at least one program that contains at least one instance of every one of the following limits: (Implementations should avoid imposing fixed translation limits whenever possible.)

(...)

  • 65535 bytes in an object (in a hosted environment only)

(...)

5.2.4.2 Numerical limits

An implementation is required to document all the limits specified in this subclause, which are specified in the headers <limits.h> and <float.h>. Additional limits are specified in <stdint.h>.

5.2.4.2.1 Sizes of integer types <limits.h>

The values given below shall be replaced by constant expressions suitable for use in #if preprocessing directives. Moreover, except for CHAR_BIT and MB_LEN_MAX, the following shall be replaced by expressions that have the same type as would an expression that is an object of the corresponding type converted according to the integer promotions. Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign.

(...)

  • maximum value for an object of type unsigned int UINT_MAX 65535 // 2^16 - 1

In ANSI C, the maximum portable array size is actually only 32'767 bytes, so even a signed int will do, which has a maximum value of at least 32'767 (Appendix A.4).

From §2.2.4 of a C89 draft:

2.2.4.1 Translation limits

The implementation shall be able to translate and execute at least one program that contains at least one instance of every one of the following limits: (Implementations should avoid imposing fixed translation limits whenever possible.)

(...)

  • 32767 bytes in an object (in a hosted environment only)

(...)

2.2.4.2 Numerical limits

A conforming implementation shall document all the limits specified in this section, which shall be specified in the headers <limits.h> and <float.h>.

"Sizes of integral types <limits.h>"

The values given below shall be replaced by constant expressions suitable for use in #if preprocessing directives. Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign.

(...)

  • maximum value for an object of type int INT_MAX +32767
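A sketch of what this answer's guarantee allows, assuming a hosted C99 implementation: the #if line restates the UINT_MAX minimum (it can never fire on a conforming compiler), and the hypothetical count_byte helper indexes with unsigned, which is enough for any object the standard requires an implementation to support. Larger objects are allowed by most platforms but fall outside that portable guarantee.

```c
#include <limits.h>

/* Restates the C99 minimum quoted above; a conforming compiler
   always has UINT_MAX >= 65535, so this never triggers. */
#if UINT_MAX < 65535
#error "unsigned int is narrower than C99 requires"
#endif

/* Hypothetical helper: counts bytes equal to c in a buffer whose
   length fits in unsigned int (the portable case described above). */
unsigned count_byte(const unsigned char *buf, unsigned len, unsigned char c)
{
    unsigned n = 0;
    for (unsigned i = 0; i < len; i++) {
        if (buf[i] == c) {
            n++;
        }
    }
    return n;
}
```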
(account suspended)
Post #7 · 2019-01-06 15:14

In your situation, I would use ptrdiff_t. It's not just that indices can be negative. You might want to count down to zero, in which case unsigned types yield a nasty, subtle bug:

/* Bug: i is unsigned, so i >= 0 is always true and the loop never ends */
for (size_t i = 5; i >= 0; i--) {
  printf("danger, this loops forever\n");
}

That won't happen if you use ptrdiff_t or any other suitable signed type. On POSIX systems, you can use ssize_t.
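A sketch of a countdown over array indices done correctly both ways, with hypothetical helper names. The signed version is what the answer suggests; the unsigned version uses the common i-- > 0 idiom, which is an alternative not mentioned in the answer.

```c
#include <stddef.h>

/* Copies src[n-1], ..., src[0] into dst[0], ..., dst[n-1].
   The counter is a signed ptrdiff_t, so "i >= 0" eventually becomes
   false and the loop terminates, even when n == 0. */
void reverse_copy_signed(int *dst, const int *src, ptrdiff_t n)
{
    for (ptrdiff_t i = n - 1; i >= 0; i--) {
        dst[n - 1 - i] = src[i];
    }
}

/* Same loop with size_t: "i-- > 0" tests i before decrementing, so
   the body sees n-1 ... 0. The final, failing test wraps i to
   SIZE_MAX, but that value is never used. */
void reverse_copy_unsigned(int *dst, const int *src, size_t n)
{
    for (size_t i = n; i-- > 0; ) {
        dst[n - 1 - i] = src[i];
    }
}
```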

Personally, I often just use int, even though it is arguably not the Correct Thing To Do.
