I'm really new to forking. What is the pid doing in this code? Can someone please explain what comes out at LINE X and LINE Y?
#include <sys/types.h>
#include <sys/wait.h> /* declares wait() */
#include <stdio.h>
#include <unistd.h>
#define SIZE 5
int nums[SIZE] = {0,1,2,3,4};
int main()
{
    int i;
    pid_t pid;
    pid = fork();
    if (pid == 0) {
        for (i = 0; i < SIZE; i++) {
            nums[i] *= -i;
            printf("CHILD: %d ",nums[i]); /* LINE X */
        }
    }
    else if (pid > 0) {
        wait(NULL);
        for (i = 0; i < SIZE; i++)
            printf("PARENT: %d ",nums[i]); /* LINE Y */
    }
    return 0;
}
In the simplest cases, the behaviour of fork() is very simple, if a bit mind-blowing on your first encounter with it. It either returns once with an error, or it returns twice: once in the original (parent) process, and once in a brand new, almost exact duplicate of the original process (the child process). After return, the two processes are nominally independent, though they share a lot of resources.
The child process is a copy of the parent. It has the same set of open file descriptors, for example; each file descriptor N that was open in the parent is open in the child, and they share the same open file description. That means that if one of the processes alters the read or write position in a file, that also affects the other process. On the other hand, if one of the processes closes a file, that has no direct effect on the file in the other process.
It also means that if there was data buffered in the standard I/O package in the parent process (e.g. some data had been read from the standard input file descriptor, STDIN_FILENO, into the data buffer for stdin), then that data is available to both the parent and the child, and both can read that buffered data without affecting the other, which will also see the same data. On the other hand, once the buffered data is read, if the parent reads another buffer full, that moves the current file position for both the parent and the child, so the child won't then see the data that the parent just read (and likewise, if the child reads a block of data, the parent won't see that). This can be confusing. Consequently, it is usually a good idea to make sure that there's no pending standard I/O before forking; fflush(0) is one way to do that.
In the code fragment, assert(original == getppid() || getppid() == 1); allows for the possibility that by the time the child executes the statement, the parent process may have exited, in which case the child will have been inherited by a system process, which normally has PID 1 (I know of no POSIX system where orphaned children are inherited by a different PID, but there probably is one).
Other shared resources, such as memory-mapped files or shared memory, continue to be available in both. The subsequent behaviour of a memory-mapped file depends on the options used to create the mapping; MAP_PRIVATE means that the two processes have independent copies of the data, and MAP_SHARED means that they share the same copy of the data and changes made by one process will be visible in the other.
However, not every program that forks is as simple as the story described so far. For example, the parent process might have acquired some (advisory) locks; those locks are not inherited by the child. The parent may have been multi-threaded; the child has a single thread of execution, and there are constraints placed on what the child may do safely.
The POSIX specification for fork() specifies the differences in detail.
Most of these issues do not affect most programs, but multi-threaded programs that fork need to be very careful. It is worth reading the Rationale section of the POSIX definition of fork().
Inside the kernel, the system manages all the issues highlighted in that definition. Memory page mapping tables have to be replicated. The kernel will typically mark the (writable) memory pages as COW (copy on write) so that until one or the other process modifies memory, they can access the same memory. This minimizes the cost of replicating the process; memory pages are only made distinct when they're modified. Many resources, though, such as file descriptors, have to be replicated, so fork() is quite an expensive operation (though not as expensive as the exec*() functions). Note that replicating a file descriptor leaves both descriptors referring to the same open file description; see the open() and dup2() system calls for a discussion of the distinctions between file descriptors and open file descriptions.
Simplest example for fork()
The return value of fork(): -1 means the call failed; 0 means we are in the child process; a positive value means we are in the parent process (and the return value is the child's process ID).
What is different in the child process than the parent process?
Now let's visualize your program code
The OS now makes two identical copies of the address space, one for the parent and the other for the child.
Both the parent and the child process start their execution right after the fork() system call. Since both processes have identical but separate address spaces, variables initialized before the fork() call have the same values in both address spaces. Every process has its own address space, so any modifications are independent of the others. If the parent changes the value of a variable, the modification only affects the variable in the parent process's address space. Other address spaces created by fork() system calls are not affected, even though they have identical variable names.
Here the parent's pid is non-zero, so it calls the function ParentProcess(). The child, on the other hand, has a zero pid, so it calls ChildProcess().
In your code, the parent process calls wait(), so it pauses at that point until the child exits. So the child's output appears first.
OUTPUT from child process:
CHILD: 0 CHILD: -1 CHILD: -4 CHILD: -9 CHILD: -16
Then after the child exits, the parent continues from after the wait() call and prints its output next.
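OUTPUT from parent process:
PARENT: 0 PARENT: 1 PARENT: 2 PARENT: 3 PARENT: 4
Finally, the combined output of the child and parent processes is shown on the terminal as follows:
CHILD: 0 CHILD: -1 CHILD: -4 CHILD: -9 CHILD: -16 PARENT: 0 PARENT: 1 PARENT: 2 PARENT: 3 PARENT: 4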
The fork() function is special because it actually returns twice: once to the parent process and once to the child process. In the parent process, fork() returns the pid of the child. In the child process, it returns 0. In the event of an error, no child process is created and -1 is returned to the parent.
After a successful call to fork(), the child process is basically an exact duplicate of the parent process. Both have their own copies of all local and global variables, and their own copies of any open file descriptors. Both processes run concurrently, and because they share the same file descriptors, the output of the two processes will likely interleave.
Taking a closer look at the example in the question:
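This will output the following:
CHILD: 0 CHILD: -1 CHILD: -4 CHILD: -9 CHILD: -16 PARENT: 0 PARENT: 1 PARENT: 2 PARENT: 3 PARENT: 4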
Because the parent process calls wait(), it pauses at that point until the child exits. So the child's output appears first. Then after the child exits, the parent continues from after the wait() call and prints its output next.
fork() duplicates the process, so after calling fork there are actually 2 instances of your program running.
How do you know which process is the original (parent) one, and which is the new (child) one?
In the parent process, the PID of the child process (which will be a positive integer) is returned from fork(). That's why the if (pid > 0) { /* PARENT */ } code works. In the child process, fork() just returns 0.
Thus, because of the if (pid > 0) check, the parent process and the child process will produce different output, which you can see here (as provided by @jxh in the comments).
fork() is a function call that creates a process. The process that invokes fork() is called the parent process, and the newly created process is called the child process.
On return from the fork() system call, the two processes have identical copies of their user-level context, except for the return value, pid.
In the parent process, pid is the child process ID (process ID of the newly created child process).
In the child process, pid is 0.
The sequence of operations the kernel performs for fork() includes the following:
It assigns an ID number to the child process.
It returns the ID number of the child to the parent process, and a 0 value to the child process.
Now let's see what happens in your code when you call fork().
Line 01: fork() is called, and the child process is created. fork() returns and the return value is stored in pid.
[Note: There is no error check in the OP's code; that is discussed later.]
Line 02: The value of pid is checked against 0. Note that this check is done in both the parent process and the newly created child process. As mentioned above, the value of pid will be 0 in the child process, and the child process ID in the parent process. So this condition evaluates to true in the child process, and false in the parent process. Hence lines 03-07 are executed in the child process.
Line 03-07: These lines are pretty straightforward. The nums[] array of the child process is changed (nums[i] *= -i;), and is printed out using printf("CHILD: %d ",nums[i]);.
The thing to note here is that the values being printed are those of the nums[] array of the child process. The nums[] array of the parent process is still the same as it was before.
There is a neat trick involved here called copy-on-write. Although it is not asked about in this question, it makes for interesting reading.
Line 08: This line is now checked in the parent process. It is not checked in the child process, because the previous if succeeded there. A process ID is always a positive number, so when the parent process gets the process ID of the newly created child process, it will always pass the test else if (pid > 0) and enter the block.
[Note: It can never be 0 because 0 is reserved. Read here.]
Line 09: This line makes the parent process wait until the child process has completed execution. This is the reason you will see all the printf() output of the child process before any of the printf() output of the parent process.
Line 10-12: This is also a pretty straightforward for loop, which prints the values of the nums[] array. Note that the values are unchanged in the parent process, because the earlier changes were made by the child process, which has its own copy of the nums[] array.
When fork() fails.
There is a possibility that the fork() call might fail. In such a case the return value is -1. This should also be handled for the program to be correct.
Some contents taken from the book The Design of the UNIX Operating System.