I am currently learning about fork() and execv(), and I had a question regarding the efficiency of the combination. I was shown the following standard code:
pid = fork();
if (pid < 0) {
    // handle fork error
}
else if (pid == 0) {
    execv("son_prog", argv_son);
    // execv only returns on failure
}
else {
    // do father code
}
I know that fork() clones the entire process (copying the entire heap, etc.) and that execv() replaces the current address space with that of the new program. With this in mind, doesn't that make this combination very inefficient? We copy the entire address space of a process and then immediately overwrite it.

So my question: what is the advantage of using this combination (instead of some other solution) that makes people still use it, even though it appears wasteful?
Not any longer. There's something called COW (Copy On Write): a page is copied only when one of the two processes (parent/child) tries to write to shared data.

In the past:

The fork() system call copied the address space of the calling process (the parent) to create a new process (the child). The copying of the parent's address space into the child was the most expensive part of the fork() operation.

Now:

A call to fork() is frequently followed almost immediately by a call to exec() in the child process, which replaces the child's memory with a new program. This is what the shell typically does, for example. In this case, the time spent copying the parent's address space is largely wasted, because the child process will use very little of its memory before calling exec().

For this reason, later versions of Unix took advantage of virtual memory hardware to allow the parent and child to share the memory mapped into their respective address spaces until one of the processes actually modifies it. This technique is known as copy-on-write. To do this, on fork() the kernel would copy the address space mappings from the parent to the child instead of the contents of the mapped pages, and at the same time mark the now-shared pages read-only. When one of the two processes tries to write to one of these shared pages, the process takes a page fault. At this point, the Unix kernel realizes that the page was really a "virtual" or "copy-on-write" copy, and so it makes a new, private, writable copy of the page for the faulting process. In this way, the contents of individual pages aren't actually copied until they are actually written to. This optimization makes a fork() followed by an exec() in the child much cheaper: the child will probably only need to copy one page (the current page of its stack) before it calls exec().
It's not that expensive (relative to spawning a process directly), especially with copy-on-write forks like you find in Linux, and it's kind of elegant in that it lets you run arbitrary setup code in the child before the new process image is loaded.

POSIX now has posix_spawn that effectively allows you to combine fork-and-exec (possibly more efficiently than fork+exec; if it is more efficient, it'll usually be implemented through some cheaper but less robust fork (clone/vfork) followed by exec), but the way it achieves that setup flexibility is through a ton of relatively messy options, which can never be as complete, powerful, and clean as just allowing you to run arbitrary code just before the new process image is loaded.
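For comparison, here is a minimal sketch of the posix_spawn route (the spawned program and its arguments are only illustrative); posix_spawnp searches PATH for the program, and the C library is free to implement it via a cheaper fork variant under the hood:

#include <spawn.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

extern char **environ;

int main(void) {
    pid_t pid;
    char *child_argv[] = { "ls", "-l", NULL };

    /* No file actions or spawn attributes: pass NULL for both. */
    int err = posix_spawnp(&pid, "ls", NULL, NULL, child_argv, environ);
    if (err != 0) {
        fprintf(stderr, "posix_spawnp: %s\n", strerror(err));
        return 1;
    }
    waitpid(pid, NULL, 0);   /* the spawned process is an ordinary child */
    return 0;
}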
Another answer states:

Obviously, one person's bad old days are a lot younger than others remember.

The original UNIX systems did not have the memory for running multiple processes, and they did not have an MMU for keeping several processes in physical memory ready-to-run at the same logical address space: they swapped out to disk the processes that weren't currently running.

The fork system call was almost entirely the same as swapping out the current process to disk, except for the return value and for not replacing the remaining in-memory copy by swapping in another process. Since you had to swap out the parent process anyway in order to run the child, fork+exec did not incur any extra overhead.

It's true that there was a period of time when fork+exec was awkward: when there were MMUs that provided a mapping between logical and physical address space, but page faults did not retain enough information for copy-on-write and a number of other virtual-memory/demand-paging schemes to be feasible.

This situation was painful enough, not just for UNIX, that hardware page-fault handling was adapted to become "replayable" pretty fast.
You have to create a new process somehow. There are very few ways for a userspace program to accomplish that. POSIX used to have vfork() alongside fork(), and some systems may have their own mechanisms, such as Linux-specific clone(), but since 2008, POSIX specifies only fork() and the posix_spawn() family. The fork+exec route is more traditional, is well understood, and has few drawbacks (see below). The posix_spawn family is designed as a special-purpose substitute for use in contexts that present difficulties for fork(); you can find details in the "Rationale" section of its specification.

This excerpt from the Linux man page for
vfork() may be illuminating:

Under Linux, fork(2) is implemented using copy-on-write pages, so the only penalty incurred by fork(2) is the time and memory required to duplicate the parent's page tables, and to create a unique task structure for the child. However, in the bad old days a fork(2) would require making a complete copy of the caller's data space, often needlessly, since usually immediately afterwards an exec(3) is done. Thus, for greater efficiency, BSD introduced the vfork() system call, which did not fully copy the address space of the parent process, but borrowed the parent's memory and thread of control until a call to execve(2) or an exit occurred.

(Emphasis added)
Thus, your concern about waste is not well-founded for modern systems (not limited to Linux), but it was indeed an issue historically, and there were indeed mechanisms designed to avoid it. These days, most of those mechanisms are obsolete.
A process created by exec() et al. will inherit its file handles from the parent process (including stdin, stdout, and stderr). If the forked child changes these after fork() but before calling exec(), it can control the new program's standard streams.
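A minimal sketch of that technique (the program being run and the log file name are only illustrative): the child re-points its stdout at a file between fork() and exec, and the new program inherits it:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return 1;
    }
    if (pid == 0) {
        /* Child: open a log file and make it the new stdout before exec. */
        int fd = open("child.log", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); _exit(127); }
        dup2(fd, STDOUT_FILENO);   /* fd 1 now refers to child.log */
        close(fd);
        execlp("ls", "ls", "-l", (char *)NULL);
        perror("execlp");          /* reached only if exec failed */
        _exit(127);
    }
    waitpid(pid, NULL, 0);         /* parent: the child's output went to child.log */
    return 0;
}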
It turns out all those COW page faults are not at all cheap when the process has a few gigabytes of writable RAM. They're all going to fault once in the parent, even if the child has long since called exec(). Because the child of fork() is no longer allowed to allocate memory even in the single-threaded case (you can thank Apple for that one), arranging to call vfork()/exec() instead is hardly more difficult now.
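A minimal sketch of that vfork()/exec() pattern (the program name is only illustrative); note that POSIX permits the vfork() child to do essentially nothing except call an exec function or _exit():

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    pid_t pid = vfork();   /* child borrows the parent's address space */
    if (pid < 0) {
        perror("vfork");
        return 1;
    }
    if (pid == 0) {
        /* Child: only exec or _exit here -- it shares the parent's memory,
         * and the parent is suspended until the child execs or exits. */
        execlp("ls", "ls", "-l", (char *)NULL);
        _exit(127);        /* exec failed; must use _exit(), never exit() */
    }
    /* Parent resumes only after the child has exec'd or exited. */
    waitpid(pid, NULL, 0);
    return 0;
}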
The real advantage of the vfork()/exec() model is that you can set the child up with an arbitrary current directory, arbitrary environment variables, and arbitrary fs handles (not just stdin/stdout/stderr), an arbitrary signal mask, and some arbitrary shared memory (using the shared memory syscalls) without needing a twenty-argument CreateProcess() API that gets a few more arguments every few years.
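A sketch of that kind of per-child setup in the fork model (the directory, environment variable, and program below are all made up for illustration); everything is ordinary code run in the child before exec:

#include <signal.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return 1;
    }
    if (pid == 0) {
        /* Child: arbitrary setup, invisible to the parent. */
        if (chdir("/tmp") != 0)            /* arbitrary working directory */
            _exit(127);

        sigset_t mask;                     /* arbitrary signal mask */
        sigemptyset(&mask);
        sigaddset(&mask, SIGINT);
        sigprocmask(SIG_BLOCK, &mask, NULL);

        char *child_argv[] = { "env", NULL };
        char *child_envp[] = { "GREETING=hello", NULL };   /* arbitrary environment */
        execve("/usr/bin/env", child_argv, child_envp);
        _exit(127);                        /* exec failed */
    }
    waitpid(pid, NULL, 0);
    return 0;
}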
It turned out the "oops, I leaked handles being opened by another thread" gaffe from the early days of threading was fixable in userspace without process-wide locking, thanks to /proc. The same would not be possible in the giant CreateProcess() model without a new OS version and convincing everybody to call the new API.

So there you have it. An accident of design ended up far better than the directly designed solution.