I want to know how copy-on-write happens in fork().
Assuming we have a process A that has a dynamical int array:
int *array = malloc(1000000*sizeof(int));
Elements in array are initialized to some meaningful values.
Then, we use fork() to create a child process, namely B.
B will iterate the array and do some calculations:
for(a in array){
a = a+1;
}
- I know B will not copy the entire array immediately, but when does the child B allocate memory for array? during fork()?
- Does it allocate the entire array all at once, or only a single integer for
a = a+1
?
a = a+1;
how does this happen? Does B read data from A and write new data to its own array?
I wrote some code to explore how COW works. My environment: ubuntu 14.04, gcc4.8.2
#include <stdlib.h>
#include <stdio.h>
#include <sys/sysinfo.h>
void printMemStat(){
struct sysinfo si;
sysinfo(&si);
printf("===\n");
printf("Total: %llu\n", si.totalram);
printf("Free: %llu\n", si.freeram);
}
int main(){
long len = 200000000;
long *array = malloc(len*sizeof(long));
long i = 0;
for(; i<len; i++){
array[i] = i;
}
printMemStat();
if(fork()==0){
/*child*/
printMemStat();
i = 0;
for(; i<len/2; i++){
array[i] = i+1;
}
printMemStat();
i = 0;
for(; i<len; i++){
array[i] = i+1;
}
printMemStat();
}else{
/*parent*/
int times=10;
while(times-- > 0){
sleep(1);
}
}
return 0;
}
After fork(), the child process modifies a half of numbers in array, and then modifies the entire array. The outputs are:
===
Total: 16694571008
Free: 2129162240
===
Total: 16694571008
Free: 2126106624
===
Total: 16694571008
Free: 1325101056
===
Total: 16694571008
Free: 533794816
It seems that the array is not allocated as a whole. If I slightly change the first modification phase to:
i = 0;
for(; i<len/2; i++){
array[i*2] = i+1;
}
The outputs will be:
===
Total: 16694571008
Free: 2129924096
===
Total: 16694571008
Free: 2126868480
===
Total: 16694571008
Free: 526987264
===
Total: 16694571008
Free: 526987264
Depends on the Operating System, hardware architecture and libc. But yes in case of recent Linux with MMU the fork(2) will work with copy-on-write. It will only (allocate and) copy a few system structures and the page table, but the heap pages actually point to the ones of the parent until written.
More control over this can be exercised with the clone(2) call. And vfork(2) beeing a special variant which does not expect the pages to be used. This is typically used before exec().
As for the allocation: the malloc() has meta information over requested memory blocks (address and size) and the C variable is a pointer (both in process memory heap and stacks). Those two look the same for the child (same values because same underlying memory page seen in the address space of both processes). So from a C program point of view the array is already allocated and the variable initialized when the process comes into existence. The underlying memory pages are however pointing to the original physical ones of the parent process, so no extra memory pages are needed until they are modified.
If the child allocates a new array it depends if it fits into the already existing heap pages or if the brk of the process needs to be increased. In both cases only the modified pages get copied and the new pages get allocated only for the child.
This also means that the physical memory might run out after malloc(). (Which is bad as the program cannot check the error return code of "a operation in a random code line"). Some operating systems will not allow this form of overcommit: So if you fork a process it will not allocate the pages, but it requires them to be available at that moment (kind of reserves them) just in case. In Linux this is configurable and called overcommit-accounting.
Some systems have a system call vfork()
, which was originally
designed as a lower-overhead version of fork()
. Since
fork()
involved copying the entire address space of the process,
and was therefore quite expensive, the vfork()
function was
introduced (in 3.0BSD).
However, since vfork()
was introduced, the
implementation of fork()
has improved drastically, most notably
with the introduction of 'copy-on-write', where the copying of the
process address space is transparently faked by allowing both processes
to refer to the same physical memory until either of them modify
it. This largely removes the justification for vfork();
indeed, a
large proportion of systems now lack the original functionality of
vfork()
completely. For compatibility, though, there may still be
a vfork()
call present, that simply calls fork()
without
attempting to emulate all of the vfork()
semantics.
As a result, it is very unwise to actually make use of any of the
differences between fork()
and vfork()
. Indeed, it is
probably unwise to use vfork()
at all, unless you know exactly
why you want to.
The basic difference between the two is that when a new process is created with vfork()
, the parent process is temporarily suspended, and the child process might borrow the parent's address space. This strange state of affairs continues until the child process either exits, or calls execve()
, at which point the parent process continues.
This means that the child process of a vfork()
must be careful to
avoid unexpectedly modifying variables of the parent process. In
particular, the child process must not return from the function
containing the vfork()
call, and it must not call
exit()
(if it needs to exit, it should use _exit();
actually, this is also true for the child of a normal fork()
).