In a terminal I can call `ls -d */`. Now I want a C program to do that for me, like this:
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <unistd.h>

int main( void )
{
    int status;
    char *args[] = { "/bin/ls", "-l", NULL };

    if ( fork() == 0 )
        execv( args[0], args );
    else
        wait( &status );

    return 0;
}
This will `ls -l` everything. However, when I try:
char *args[] = { "/bin/ls", "-d", "*/", NULL };
I get a runtime error:
ls: */: No such file or directory
Unfortunately, all solutions based on shell expansion are limited by the maximum command-line length, which varies (run `true | xargs --show-limits` to find out); on my system, it is about two megabytes. Yes, many will argue that that suffices -- as did Bill Gates on 640 kilobytes, once. (When running certain parallel simulations on non-shared filesystems, I do occasionally have tens of thousands of files in the same directory during the collection phase. Yes, I could do that differently, but that happens to be the easiest and most robust way to collect the data. Very few POSIX utilities are actually silly enough to assume "X is sufficient for everybody".)
Fortunately, there are several solutions. One is to use `find` instead, which also lets you format the output however you wish, independent of the locale.
If you want to sort the output, use `\0` as the separator (since filenames are allowed to contain newlines), and `-t=` for `sort` so that it uses `\0` as the separator, too; `tr` will then convert the separators back to newlines for you.

If you want the names in an array, use the `glob()` function instead.

Finally, as I like to harp every now and then, one can use the POSIX
`nftw()` function to implement this internally: you write a callback that handles each directory entry, plus a single `nftw()` call to walk the tree.

The only "issue" in using `nftw()` is choosing a good number of file descriptors the function may use (`NUM_FDS`). POSIX says a process must always be able to have at least 20 open file descriptors. If we subtract the standard ones (standard input, output, and error), that leaves 17. The walk itself is unlikely to use more than 3, though.

You can find the actual limit using `sysconf(_SC_OPEN_MAX)` and subtracting the number of descriptors your process may use at the same time. On current Linux systems, it is typically limited to 1024 per process.

The good thing is, as long as that number is at least 4 or 5 or so, it only affects performance: it just determines how deep `nftw()` can go in the directory tree structure before it has to use workarounds.

If you want to create a test directory with lots of subdirectories, use something like the following Bash loop: `for i in {1..100000}; do mkdir "directory-$i"; done`.
On my system, running `ls -d */` in that directory yields a

bash: /bin/ls: Argument list too long

error, while the `find` command and the `nftw()` based program both run just fine. You also cannot remove the directories using `rmdir directory-*/` for the same reason; use `find` with `-delete` instead, or just remove the entire directory and its subdirectories with `rm -rf`.
Another, less low-level approach is to use `system()`.
Notice that with `system()`, you don't need to `fork()`. However, I recall that we should avoid using `system()` when possible! As Nominal Animal said, this will fail when the number of subdirectories is too big; see his answer for more.
Just call `system`. Globs on Unixes are expanded by the shell, and `system` will give you a shell.

You can avoid the whole fork-exec thing by doing the `glob(3)` yourself.
You could pass the results to a spawned `ls`, but there's hardly a point in doing that.

(If you do want to fork and exec, you should start with a template that does proper error checking -- each of those calls may fail.)
The lowest-level way to do this is with the same Linux system calls `ls` uses. So look at the output of `strace -efile,getdents ls`: `getdents` is a Linux-specific system call, and its man page says it's used under the hood by libc's `readdir(3)` POSIX API function.

The lowest-level portable way (portable to POSIX systems) is to use the libc functions to open a directory and read the entries. POSIX doesn't specify the exact system call interface, unlike for non-directory files.
These functions, `opendir(3)`, `readdir(3)`, and `closedir(3)`, can be used like this:
There's also a fully compilable example of reading directory entries and printing file info in the Linux `stat(3posix)` man page (not the Linux `stat(2)` man page; that one has a different example).

The man page for `readdir(3)` gives the Linux declaration of `struct dirent`. Its `d_type` field is either `DT_UNKNOWN`, in which case you need to `stat` the entry to learn anything about whether it is itself a directory, or it can be `DT_DIR` or something else, in which case you can be sure it is or isn't a directory without having to `stat` it.

Some filesystems, like ext4 I think, and very recent XFS (with the new metadata version), keep type info in the directory, so it can be returned without having to load the inode from disk. This is a huge speedup for
`find -name`: it doesn't have to `stat` anything to recurse through subdirectories. But for filesystems that don't do this, `d_type` will always be `DT_UNKNOWN`, because filling it in would require reading all the inodes (which might not even be loaded from disk).

Sometimes you're just matching on filenames and don't need type info, so it would be bad if the kernel spent a lot of extra CPU time (or especially I/O time) filling in `d_type` when it's not cheap. `d_type` is just a performance shortcut; you always need a fallback (except maybe when writing for an embedded system where you know what filesystem you're using, know that it always fills in `d_type`, and have some way to detect the breakage when someone in the future tries to use this code on another filesystem type).

If you are looking for a simple way to get a list of folders into your program, I'd rather suggest the spawnless way, not calling an external program, and use the standard POSIX
`opendir()`/`readdir()` functions. It's almost as short as your program, but has several additional advantages: it spawns no extra process, it is not limited by the shell's command-line length, and it gives you each entry's `d_type` directly.