$ uname -a
Linux crowsnest 2.6.32-28-generic #55-Ubuntu SMP Mon Jan 10 23:42:43 UTC 2011 x86_64 GNU/Linux
$ man readdir:
DESCRIPTION
The readdir() function returns a pointer to a dirent structure representing the next directory entry in the directory stream pointed to by dirp...
...[snip]...
The readdir_r() function is a reentrant version of readdir()...
...[snip]...
RETURN VALUE
On success, readdir() returns a pointer to a dirent structure. (This structure may be statically allocated; do not attempt to free(3) it.) If the end of the directory stream is reached, NULL is returned and errno is not changed. If an error occurs, NULL is returned and errno is set appropriately.
The readdir_r() function returns 0 on success. On error, it returns a positive error number. If the end of the directory stream is reached, readdir_r() returns 0, and returns NULL in *result.
I'm confused about what this means. My application of this function is to collect a dynamically allocated array of pointers to structs with data about the directory entries, and I'm wondering if I can dynamically allocate dirent structs and set the pointers to them. But this line seems to say that the result should never be passed to free(), so I'm wondering if I should allocate a separate dirent struct which will be part of the list and memcpy the returned result into it.
I'm also confused by the word "may" in the above man page. Does this mean that sometimes it's statically allocated, and sometimes it's not?
I'm (vaguely) familiar with what static variables mean in C, but not sure about all the rules and possible gotchas around them. Because I want to pass the dirent structs for a directory around, I would rather they be dynamically allocated. Is this what readdir_r is for? Or will the double pointer be set to point to another statically allocated dirent struct?
And I'm not entirely sure what reentrant means in this context for readdir_r. My understanding of reentrant comes only from Scheme coroutines, and I'm not sure how that would apply to reading Unix directories.
The rule here is really simple: you're free to make a copy of the data readdir() returns, but you don't own the buffer it puts that data in, so you cannot take actions that suggest you do. (I.e., copy the data out to your own buffer; don't store a pointer into the readdir-owned buffer.)
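The copy-out rule can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: struct entry_info, collect_entries() and free_entries() are hypothetical names, and the sketch copies only d_ino and d_name rather than the whole struct, on the assumption that the name and inode number are all the caller needs. Copying the name as a string also avoids depending on how large d_name is declared on a given system.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical record type: holds only the fields we want to keep. */
struct entry_info {
    ino_t ino;
    char *name;   /* heap copy of d_name, owned by the caller */
};

/* Frees an array built by collect_entries(). */
void free_entries(struct entry_info *list, size_t count)
{
    for (size_t i = 0; i < count; i++)
        free(list[i].name);
    free(list);
}

/* Copies every entry of `path` into a malloc'd array.
 * Returns 0 on success and -1 on error. */
int collect_entries(const char *path, struct entry_info **out, size_t *count)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    struct entry_info *list = NULL;
    size_t used = 0, cap = 0;

    for (;;) {
        errno = 0;                       /* lets us tell end-of-stream from error */
        struct dirent *de = readdir(dir);
        if (de == NULL)
            break;

        if (used == cap) {               /* grow the array geometrically */
            size_t new_cap = cap ? cap * 2 : 16;
            struct entry_info *tmp = realloc(list, new_cap * sizeof *tmp);
            if (tmp == NULL)
                goto fail;
            list = tmp;
            cap = new_cap;
        }

        /* Copy the data out; never keep `de` itself. */
        char *copy = strdup(de->d_name);
        if (copy == NULL)
            goto fail;
        list[used].ino = de->d_ino;
        list[used].name = copy;
        used++;
    }
    if (errno != 0)                      /* readdir() failed part-way through */
        goto fail;

    closedir(dir);
    *out = list;
    *count = used;
    return 0;

fail:
    closedir(dir);
    free_entries(list, used);
    return -1;
}
```

The caller owns everything in the returned array and releases it with free_entries(); the pointer readdir() handed back is never stored or freed.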
so I'm wondering if I should allocate a separate dirent struct which will be part of the list and memcpy the returned result into it
- That's exactly what you should do.
I'm also confused by the word "may" in the above man page. Does this mean that sometimes it's statically allocated, and sometimes it's not?
- It means you cannot count on how it will be managed, but it will be managed for you. The details could vary from one system to the next.
Reentrant here effectively means thread-safe. readdir() may use a static entry, which makes it unsafe for multiple threads to use as if each of them controlled the sequence of calls. readdir_r() uses space provided by the caller, letting multiple threads act independently.
The structure might be statically-allocated, it might be thread-local, it might be dynamically allocated. That's up to the implementation. But no matter what, it's not yours to free, which is why you must not free it.
readdir_r doesn't allocate anything for you: you give it a dirent, allocated however you like, and it fills it in. It therefore saves you a little effort compared with calling readdir and copying the data. That's not the main purpose of readdir_r, though; what it's actually for is making calls from different threads at the same time, which you can't do with readdir.
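As a rough sketch of the "you give it a dirent and it fills it in" pattern: count_entries_r() below is a hypothetical helper, and it assumes that sizeof(struct dirent) is large enough for any name in the directory, which holds on Linux with glibc but is not guaranteed everywhere. Newer glibc releases mark readdir_r() as deprecated, so the sketch silences that warning.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <stddef.h>

/* readdir_r() is flagged as deprecated by newer glibc; silence the
 * warning for the purpose of this illustration. */
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"

/* Counts the entries in `path` using caller-owned storage.
 * Returns 0 on success or a positive error number. */
int count_entries_r(const char *path, size_t *count)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return errno;

    struct dirent entry;     /* our storage; assumed big enough (see above) */
    struct dirent *result;   /* set to &entry, or to NULL at end of stream */
    size_t n = 0;
    int rc;

    while ((rc = readdir_r(dir, &entry, &result)) == 0 && result != NULL)
        n++;

    closedir(dir);
    if (rc == 0)
        *count = n;
    return rc;
}
```

Here the entry lives on the caller's stack, so nothing needs to be freed, and a second thread working on its own DIR stream with its own struct dirent shares no buffer with this one.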
What "reentrant" actually means is that the function can be called again before a previous call to it has returned. In general, this might mean from a different thread (which is what most people mean by "thread-safe"), from a handler for a signal that occurred during the first call, or due to recursion. But the C standard (before C11) has no concept of threads, so it uses "reentrant" to mean only the latter two. POSIX defines "thread-safe" to require this form of reentrancy and, in addition, the thing that most people mean by thread-safe.
In POSIX, every function required to be thread-safe is required to be reentrant, and readdir_r is required to be thread-safe. I think reentrancy in the weaker sense is irrelevant to readdir_r, since it doesn't call any user code that could result in recursion, and it's not async-signal-safe, so it must not be called from a signal handler either.
Beware: when some people (Java programmers) say "thread-safe", they mean that the function can be called by different threads on the same arguments at the same time, and will use locks to work correctly. POSIX APIs do not mean this by thread-safe; they only mean that the function can be called on different data at the same time. Any global data that the function uses is protected, by locks or otherwise, but the arguments need not be.
First question
It means readdir could have something like this:
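struct dirent *
readdir(DIR *dirp)
{
    /* One buffer shared by every call; it lives for the whole program. */
    static struct dirent dirent;
    /* Do stuff: fill in dirent from the directory stream. */
    return &dirent;
}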
Clearly it would be illegal to free it (since you didn't obtain it via malloc).
The standard doesn't force anyone to do it like this. An implementation could use its own mechanism (perhaps malloc, with a free later on its own).
Second question
"Reentrant" means that while we are inside readdir_r
, the function can be safely called again (for example from a signal handler). For instance, readdir isn't reentrant. Suppose this happens:
- You call readdir(dir); and it starts modifying dirent
- BEFORE it is done, it is interrupted and someone else calls it (from an async context)
- Its version modifies dirent, returns, and the async context goes on its way
- Your version returns. What does dirent contain?
Reentrant functions are a godsend: a call cannot be corrupted by another call that starts before it finishes, provided each call works on its own data.
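The shared-buffer problem in the scenario above can be shown without signals or threads. The following is only an illustrative sketch: label() and label_r() are made-up functions, not part of any library, and they mimic the readdir / readdir_r split by using a static buffer in one and caller-supplied storage in the other.

```c
#include <stdio.h>
#include <stddef.h>

/* Non-reentrant: every call writes into the same static buffer,
 * so a later call overwrites what an earlier call returned. */
const char *label(int id)
{
    static char buf[32];
    snprintf(buf, sizeof buf, "entry-%d", id);
    return buf;
}

/* Reentrant variant: the caller supplies the storage, so separate
 * calls share no state. */
char *label_r(int id, char *buf, size_t len)
{
    snprintf(buf, len, "entry-%d", id);
    return buf;
}
```

After calling label(1) and then label(2), both returned pointers refer to the same buffer, which now holds "entry-2". With label_r() and two separate buffers, each result stays intact, which is the property that matters when a second call can begin before the first has finished.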