Recently I was looking at the 'dirent' structure (in dirent.h) and was a little puzzled by its definition.
NOTE: This header file is from a Solaris machine at my school.
typedef struct dirent {
ino_t d_ino;
off_t d_off;
unsigned short d_reclen;
char d_name[1];
} dirent_t;
Particularly the d_name field. How would this work in the operating system? If you need to store a null terminated string what good is an array of a single char? I know that you can get the address of an array by its first element but I am still confused. Obviously something is happening, but I don't know what. On my Fedora Linux system at home this field is simply defined as:
char d_name[256];
Now that makes a lot more sense for obvious reasons. Can someone explain why the Solaris header file defines the struct as it does?
As others have pointed out, the last member of the struct doesn't have any set size. The array is however long the implementation decides it needs to be to accommodate the characters it wants to put in it. It does this by dynamically allocating the memory for the struct, such as with malloc.
It's convenient to declare the member as having size 1, though, because the one declared char already covers the terminating nul, so it's easy to determine how much memory is occupied by any dirent variable d:
sizeof(struct dirent) + strlen(d.d_name)
Using size 1 also discourages the recipient of such struct values from trying to store their own names in it instead of allocating their own dirent values. Using the Linux definition, it's reasonable to assume that any dirent value you have will accept a 255-character string, but Solaris makes no guarantee that its dirent values will store any more characters than they need to.
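As a minimal sketch of the allocation described above, here is how a record with a size-1 trailing array might be created. The type name_rec and the function name_rec_new are hypothetical names made up for illustration; this is not the Solaris implementation, just the general pattern. Note that sizeof can include padding, so the formula may allocate a byte or so more than strictly needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record that mirrors the Solaris-style layout; not the real dirent. */
struct name_rec {
    unsigned short reclen;   /* total bytes allocated for this record */
    char name[1];            /* declared as 1 char, really "as long as needed" */
};

/* Allocate a record large enough to hold `name`.
   The one declared byte of name[] pays for the terminating nul,
   so only strlen(name) extra bytes are requested. */
struct name_rec *name_rec_new(const char *name)
{
    assert(name != NULL);

    size_t len = strlen(name);
    size_t total = sizeof(struct name_rec) + len;

    struct name_rec *r = malloc(total);
    if (r == NULL)
        return NULL;

    r->reclen = (unsigned short)total;
    memcpy(r->name, name, len + 1);   /* copy the characters plus the nul */
    return r;
}
```

The caller owns the returned block and releases it with free(). Writing past name[0] like this is the traditional idiom the answer describes; the C standard does not strictly bless it, which is part of why C99 added a dedicated syntax.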
C99 introduced a special case for the last member of a struct. The struct could be declared like this instead:
typedef struct dirent {
ino_t d_ino;
off_t d_off;
unsigned short d_reclen;
char d_name[];
} dirent_t;
The array has no declared size. This is known as the flexible array member. It accomplishes the same thing as the Solaris version, except that there's no illusion that the struct by itself could hold any name. You know by looking at it that there's more to it.
Using the "flexible" declaration, the amount of memory occupied would be adjusted like so:
sizeof(struct dirent) + strlen(d.d_name) + 1
That's because the flexible array member does not factor into the size of the struct.
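The same idea with a flexible array member could be sketched as follows. Again, flex_rec and flex_rec_new are hypothetical names for illustration only, and the code assumes a C99 or later compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record using a C99 flexible array member. */
struct flex_rec {
    unsigned short reclen;   /* total bytes allocated for this record */
    char name[];             /* contributes no storage to sizeof */
};

/* Allocate a record large enough to hold `name`.
   Nothing in sizeof(struct flex_rec) is reserved for the string,
   so the terminating nul needs its own "+ 1". */
struct flex_rec *flex_rec_new(const char *name)
{
    assert(name != NULL);

    size_t len = strlen(name);
    size_t total = sizeof(struct flex_rec) + len + 1;

    struct flex_rec *r = malloc(total);
    if (r == NULL)
        return NULL;

    r->reclen = (unsigned short)total;
    memcpy(r->name, name, len + 1);   /* copy the characters plus the nul */
    return r;
}
```

The only difference from the size-1 version is the "+ 1" in the size calculation, which is exactly the adjustment described above.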
The reason you don't see flexible declarations like that more often, especially in OS library code, is likely for the sake of compatibility with older compilers that don't support that facility. It's also for compatibility with code written to target the current definition, which would break if the size of the struct changed like that.
The dirent struct will be immediately followed in memory by a block of memory that contains the rest of the name, and that memory is accessible through the d_name field.
This is a pattern used in C to indicate an arbitrary-length array at the end of a structure. Arrays in C have no built-in bounds checking, so when your code tries to access the string starting at d_name, it will continue past the end of the structure. This relies on readdir() allocating enough memory to hold the entire string plus the terminating nul.
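From the caller's side, none of this requires special handling: d_name is read like any other nul-terminated string, whatever size the header declares. A small sketch, assuming a POSIX system; the function name longest_name is hypothetical and only serves the example.

```c
#include <assert.h>
#include <dirent.h>
#include <stddef.h>
#include <string.h>

/* Return the length of the longest entry name in `path`, or -1 if the
   directory cannot be opened. The caller never allocates a dirent;
   readdir() hands back a pointer to storage it manages itself. */
long longest_name(const char *path)
{
    assert(path != NULL);

    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    long longest = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        /* strlen walks to the nul, even past the declared end of d_name */
        size_t len = strlen(ent->d_name);
        if ((long)len > longest)
            longest = (long)len;
    }

    closedir(dir);
    return longest;
}
```

Calling longest_name(".") on either the Solaris or the Linux definition works the same way, because the code only ever follows the pointer readdir() returns and never copies or sizes the struct itself.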
It looks like a micro-optimization to me. Names are commonly short, so why allocate space that you know will go unused? Also, Solaris may support names longer than 255 characters. To use such a struct you just allocate the needed space and ignore the supposed array size.