How do you prevent a file descriptor from being copy-inherited across fork() syscalls (without closing it, of course)?
I am looking for a way to mark a single file descriptor as NOT to be (copy-)inherited by children at fork(): something like FD_CLOEXEC, but for forks (an FD_DONTINHERIT feature, if you like). Has anybody done this, or looked into it and has a hint for me to start with?
Thank you
UPDATE:
I could use libc's __register_atfork
__register_atfork(NULL, NULL, fdcleaner, NULL)
to close the fds in the child just before fork() returns. However, the fds are still being copied, so this sounds like a silly hack to me. The question is how to skip the dup()-ing of unneeded fds in the child.
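For reference, here is a minimal sketch of that atfork workaround. It uses the POSIX pthread_atfork() in place of the glibc-internal __register_atfork. The names fd_mark_noinherit, close_noinherit_fds and child_sees_fd, and the fixed-size table, are hypothetical, and the registry is not thread-safe. It only closes the marked descriptors in the child after the kernel has already copied them, so it does not solve the actual problem.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_NOINHERIT 64            /* hypothetical fixed capacity */

static int noinherit_fds[MAX_NOINHERIT];
static size_t noinherit_count;
static int handler_installed;

/* Child-side atfork handler: runs in the child after the kernel has
 * already duplicated the whole fd table. */
static void close_noinherit_fds(void)
{
    for (size_t i = 0; i < noinherit_count; i++)
        close(noinherit_fds[i]);
    noinherit_count = 0;
}

/* Hypothetical helper: mark fd so the child closes it right after fork().
 * Returns 0 on success, -1 if the table is full or the handler
 * could not be installed. */
int fd_mark_noinherit(int fd)
{
    assert(fd >= 0);
    if (noinherit_count == MAX_NOINHERIT)
        return -1;
    if (!handler_installed) {
        if (pthread_atfork(NULL, NULL, close_noinherit_fds) != 0)
            return -1;
        handler_installed = 1;
    }
    noinherit_fds[noinherit_count++] = fd;
    return 0;
}

/* Demo helper (hypothetical): fork and report whether fd is still open
 * in the child. Returns 1 if open, 0 if closed, -1 on error. */
int child_sees_fd(int fd)
{
    int status;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(fcntl(fd, F_GETFD) != -1);   /* exit status 1 if fd survived */
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling fd_mark_noinherit(fd) once per descriptor is enough; child_sees_fd(fd) should then report the descriptor as closed in the child while it stays open in the parent.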
I'm thinking of some scenarios where a fcntl(fd,F_SETFL,F_DONTINHERIT) would be needed:
- fork() will copy an event fd (e.g. epoll); sometimes this isn't wanted. For example, FreeBSD marks the kqueue() event fd as being of a KQUEUE_TYPE, and fds of this type are not copied across forks (the kqueue fds are explicitly skipped; if one wants to use one from a child, it must fork with a shared fd table)
- fork() will copy 100k unneeded fds to fork a child for some cpu-intensive task (suppose the probability of needing a fork() is very low, and the programmer doesn't want to maintain a pool of children for something that normally wouldn't happen)
Some descriptors we want to be copied (0,1,2), some (most of them?) not. I think full fdtable duping is here for historic reasons but I am probably wrong.
How silly does this sound:
- patch fcntl to support a dontinherit flag on file descriptors (not sure whether the flag should be kept per-fd or in an fdtable fd_set, the way the close-on-exec flags are kept)
- modify dup_fd() in the kernel to skip copying dontinherit fds, the same way FreeBSD does for kq fds
Consider the program:
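#include <stdio.h>
#include <unistd.h>
#include <err.h>
#include <stdlib.h>
#include <fcntl.h>
#include <time.h>
#include <sys/wait.h>   /* wait() */
/* glibc-internal symbol; no public header declares it */
extern int __register_atfork(void (*prepare)(void), void (*parent)(void),
                             void (*child)(void), void *dso_handle);
#ifndef NUMFDS
#define NUMFDS 100      /* assumed default; override with -DNUMFDS=... */
#endif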
static int fds[NUMFDS];
clock_t t1;
/* close fds[0] .. fds[i-1] */
static void cleanup(int i)
{
    while (i-- > 0)
        close(fds[i]);
}
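/* prepare handler: runs in the parent right before fork() */
void clk_start(void)
{
    t1 = clock();
}
/* parent handler: runs in the parent right after fork() */
void clk_end(void)
{
    double ticks = (double)clock() - t1;
    double secs = ticks / CLOCKS_PER_SEC;
    printf("fork_cost(%d fds)=%fticks(%f seconds)\n",
           NUMFDS, ticks, secs);
}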
int main(int argc, char **argv)
{
    pid_t pid;
    int i;
    /* time fork() itself: clk_start runs before it, clk_end after it (parent only) */
    __register_atfork(clk_start, clk_end, NULL, NULL);
    for (i = 0; i < NUMFDS; i++) {
        fds[i] = open("/dev/null", O_RDONLY);
        if (fds[i] == -1) {
            cleanup(i);
            err(EXIT_FAILURE, "open");
        }
    }
    t1 = clock();
    pid = fork();
    if (pid < 0) {
        err(EXIT_FAILURE, "fork");
    }
    if (pid == 0) {
        cleanup(NUMFDS);
        exit(0);
    } else {
        wait(&i);
        cleanup(NUMFDS);
    }
    return 0;
}
Of course, this can't be considered a real benchmark, but anyhow:
root@pinkpony:/home/cia/dev/kqueue# time ./forkit
fork_cost(100 fds)=0.000000ticks(0.000000 seconds)
real 0m0.004s
user 0m0.000s
sys 0m0.000s
root@pinkpony:/home/cia/dev/kqueue# gcc -DNUMFDS=100000 -o forkit forkit.c
root@pinkpony:/home/cia/dev/kqueue# time ./forkit
fork_cost(100000 fds)=10000.000000ticks(0.010000 seconds)
real 0m0.287s
user 0m0.010s
sys 0m0.240s
root@pinkpony:/home/cia/dev/kqueue# gcc -DNUMFDS=100 -o forkit forkit.c
root@pinkpony:/home/cia/dev/kqueue# time ./forkit
fork_cost(100 fds)=0.000000ticks(0.000000 seconds)
real 0m0.004s
user 0m0.000s
sys 0m0.000s
forkit ran on a Dell Inspiron 1520, Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz, with 4 GB RAM; average_load=0.00
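Since clock() is too coarse here (the 100-fd runs report 0 ticks), a finer measurement could look like the sketch below. It times a single fork() call from the parent with clock_gettime(CLOCK_MONOTONIC). The name time_fork_seconds is hypothetical, and this is still only a rough measurement under the same caveats as above, not a proper benchmark.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: wall-clock seconds spent in one fork() call,
 * as seen from the parent. Returns -1.0 if fork() or the clock fails. */
double time_fork_seconds(void)
{
    struct timespec start, end;

    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1.0;
    pid_t pid = fork();
    if (pid < 0)
        return -1.0;
    if (pid == 0)
        _exit(0);                         /* child does nothing */
    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        end = start;                      /* degrade to 0.0 rather than garbage */
    waitpid(pid, NULL, 0);                /* reap the child after timing */

    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling it once after opening the NUMFDS descriptors, and once before, would give the two numbers to compare.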