Question:
I'm really new to forking. What is the pid doing in this code? Can someone please explain what comes out at line X and line Y?
#include <sys/types.h>
#include <sys/wait.h>   /* declares wait() */
#include <stdio.h>
#include <unistd.h>
#define SIZE 5
int nums[SIZE] = {0,1,2,3,4};
int main()
{
    int i;
    pid_t pid;
    pid = fork();
    if (pid == 0) {
        for (i = 0; i < SIZE; i++) {
            nums[i] *= -i;
            printf("CHILD: %d ",nums[i]); /* LINE X */
        }
    }
    else if (pid > 0) {
        wait(NULL);
        for (i = 0; i < SIZE; i++)
            printf("PARENT: %d ",nums[i]); /* LINE Y */
    }
    return 0;
}
Answer 1:
fork() duplicates the process, so after calling fork() there are actually two instances of your program running.
How do you know which process is the original (parent) one, and which is the new (child) one?
In the parent process, fork() returns the PID of the child process, which is a positive integer. That's why the if (pid > 0) { /* PARENT */ } check works. In the child process, fork() just returns 0.
Thus, because of the if (pid > 0) check, the parent process and the child process produce different output (a demonstration was provided by @jxh in the comments).
Answer 2:
The simplest example of fork():
printf("I'm printed once!\n");
fork();
// Now there are two processes running: one is the parent and the other is the child,
// and each process will print the next line.
printf("You see this line twice!\n");
The return value of fork(): -1 = failed; 0 = in the child process; positive = in the parent process (and the return value is the child's process ID).
pid_t id = fork();
if (id == -1) exit(1); // fork failed
if (id > 0)
{
    // I'm the original parent and
    // I just created a child process with id 'id'
    // Use waitpid to wait for the child to finish
} else { // returned zero
    // I must be the newly made child process
}
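The comment about waitpid in the parent branch can be made concrete. The following is only a sketch, with a hypothetical helper name run_child_and_wait: the child exits with a chosen code and the parent collects it with waitpid().

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that exits with exit_code (0..255)
 * and returns the code the parent collected, or -1 on error. */
int run_child_and_wait(int exit_code)
{
    assert(exit_code >= 0 && exit_code <= 255);

    pid_t id = fork();
    if (id == -1)
        return -1;                      /* fork failed */
    if (id == 0)
        _exit(exit_code);               /* child: finish immediately */

    int status;
    if (waitpid(id, &status, 0) != id)  /* parent: block until that child ends */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, run_child_and_wait(7) should return 7 in the parent.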
How does the child process differ from the parent process?
- The parent is notified via a signal when the child process finishes, but not vice versa.
- The child does not inherit pending signals or timer alarms. For a complete list, see the fork() documentation.
- The child's own process ID is returned by getpid(). Its parent's process ID is returned by getppid().
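The getpid()/getppid() relationship from the list above can be checked with a short sketch (the helper name child_parent_is_caller is an assumption for illustration). It relies on the parent staying alive while the child runs, which the waitpid() call ensures.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if, inside the child, getpid() differs
 * from the caller's PID and getppid() equals it; -1 on error. */
int child_parent_is_caller(void)
{
    pid_t original = getpid();
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* Child: it has a new PID, and its parent is the forking process. */
        int ok = (getpid() != original) && (getppid() == original);
        _exit(ok ? 0 : 1);
    }

    assert(pid != original);            /* parent: child's PID is a new one */
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```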
Now let's visualize your program code
pid_t pid;
pid = fork();
The OS now makes two identical copies of the address space, one for the parent and one for the child.
Both the parent and the child start executing right after the fork() system call. Since both processes have identical but separate address spaces, variables initialized before the fork() call have the same values in both address spaces. Every process has its own address space, so any modifications are independent of the others. If the parent changes the value of a variable, the modification only affects the variable in the parent process's address space. Other address spaces created by fork() system calls are not affected, even though they have identical variable names.
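A minimal sketch of this independence, assuming a made-up global named counter and a made-up helper name: the child overwrites its copy, and the parent's copy is unchanged afterwards.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int counter = 10;    /* illustrative global, copied into the child */

/* Hypothetical helper: returns the parent's value of counter after the
 * child has modified its own copy, or -1 on error. */
int parent_value_after_child_write(void)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        counter = 99;                   /* changes only the child's copy */
        _exit(counter == 99 ? 0 : 1);
    }

    assert(pid > 0);
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return counter;                     /* the parent's copy was never touched */
}
```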
The parent sees a non-zero pid and takes the parent branch, while the child sees a pid of zero and takes the child branch.
In your code the parent process calls wait(), so it pauses at that point until the child exits. So the child's output appears first.
if (pid == 0) {
    // The child runs this part because fork returns 0 to the child
    for (i = 0; i < SIZE; i++) {
        nums[i] *= -i;
        printf("CHILD: %d ",nums[i]); /* LINE X */
    }
}
Output from the child process (what comes out at LINE X):
CHILD: 0 CHILD: -1 CHILD: -4 CHILD: -9 CHILD: -16
Then, after the child exits, the parent continues from after the wait() call and prints its output next.
else if (pid > 0) {
    wait(NULL);
    for (i = 0; i < SIZE; i++)
        printf("PARENT: %d ",nums[i]); /* LINE Y */
}
Output from the parent process (what comes out at LINE Y):
PARENT: 0 PARENT: 1 PARENT: 2 PARENT: 3 PARENT: 4
Finally, the combined output of the child and parent processes appears on the terminal as follows:
CHILD: 0 CHILD: -1 CHILD: -4 CHILD: -9 CHILD: -16 PARENT: 0 PARENT: 1 PARENT: 2 PARENT: 3 PARENT: 4
Answer 3:
The fork() function is special because it actually returns twice: once to the parent process and once to the child process. In the parent process, fork() returns the pid of the child. In the child process, it returns 0. In the event of an error, no child process is created and -1 is returned to the parent.
After a successful call to fork(), the child process is basically an exact duplicate of the parent process. Both have their own copies of all local and global variables, and their own copies of any open file descriptors. Both processes run concurrently, and because their descriptors refer to the same open files (such as the terminal), the output of the two processes can interleave.
Taking a closer look at the example in the question:
pid_t pid;
pid = fork();
// When we reach this line, two processes now exist,
// with each one continuing to run from this point
if (pid == 0) {
    // The child runs this part because fork returns 0 to the child
    for (i = 0; i < SIZE; i++) {
        nums[i] *= -i;
        printf("CHILD: %d ",nums[i]); /* LINE X */
    }
}
else if (pid > 0) {
    // The parent runs this part because fork returns the child's pid to the parent
    wait(NULL); // this causes the parent to wait until the child exits
    for (i = 0; i < SIZE; i++)
        printf("PARENT: %d ",nums[i]); /* LINE Y */
}
This will output the following:
CHILD: 0 CHILD: -1 CHILD: -4 CHILD: -9 CHILD: -16 PARENT: 0 PARENT: 1 PARENT: 2 PARENT: 3 PARENT: 4
Because the parent process calls wait(), it pauses at that point until the child exits. So the child's output appears first. Then, after the child exits, the parent continues from after the wait() call and prints its output next.
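The ordering argument can be checked without a terminal. In this sketch (the helper name ordered_output and the use of a pipe instead of stdout are assumptions made for illustration), both processes write to the same inherited pipe, and the parent writes only after wait() returns:

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: collects what child and parent write to a shared
 * pipe into out. Returns 0 on success, -1 on error. */
int ordered_output(char *out, size_t outlen)
{
    assert(out != NULL && outlen >= 13);

    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        ssize_t w = write(fds[1], "CHILD ", 6);   /* child writes first */
        _exit(w == 6 ? 0 : 1);
    }

    wait(NULL);                                   /* parent pauses until the child exits */
    ssize_t w = write(fds[1], "PARENT", 6);       /* so this lands after the child's bytes */
    close(fds[1]);
    ssize_t n = (w == 6) ? read(fds[0], out, outlen - 1) : -1;
    close(fds[0]);
    if (n < 0)
        return -1;
    out[n] = '\0';
    return 0;
}
```

With a buffer of at least 13 bytes, the collected text should read "CHILD PARENT".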
Answer 4:
fork() is a function call that creates a process. The process that invokes fork() is called the parent process, and the newly created process is called the child process.
On return from the fork() system call, the two processes have identical copies of their user-level context, except for the return value, pid.
In the parent process, pid is the child process ID (the process ID of the newly created child process).
In the child process, pid is 0.
The kernel does the following sequence of operations for fork():
- It allocates a slot in the process table for the new process.
- It assigns a unique ID number to the child process.
- It makes a logical copy of the context of the parent process. Since certain portions of a process, such as the text region, may be shared between processes, the kernel can sometimes increment a region reference count instead of copying the region to a new physical location in memory.
- It increments file and inode table counters for files associated with the process.
- It returns the ID number of the child to the parent process, and a 0 value to the child process.
Now let's see what happens in your code when you call fork():
01: pid = fork();
02: if (pid == 0) {
03:     for (i = 0; i < SIZE; i++) {
04:         nums[i] *= -i;
05:         printf("CHILD: %d ",nums[i]); /* LINE X */
06:     }
07: }
08: else if (pid > 0) {
09:     wait(NULL);
10:     for (i = 0; i < SIZE; i++)
11:         printf("PARENT: %d ",nums[i]); /* LINE Y */
12: }
Line 01: fork() is called and the child process is created. fork() returns and the return value is stored in pid.
[Note: Since there is no error check in the OP's code, error handling is discussed later.]
Line 02: The value of pid is checked against 0. Note that this check is done in both the parent process and the newly created child process. As mentioned above, pid is 0 in the child process and the child's process ID in the parent process. So this condition evaluates to true in the child process and false in the parent process. Hence lines 03-07 are executed only in the child process.
Lines 03-07: These lines are pretty straightforward. The nums[] array of the child process is changed (nums[i] *= -i;) and printed using printf("CHILD: %d ",nums[i]);.
The thing to note here is that the values being printed belong to the nums[] array of the child process. The nums[] array of the parent process is still the same as it was before.
There is a neat trick involved here called copy-on-write. Although it is not asked about in this question, it makes for interesting reading.
Line 08: This line is checked only in the parent process. It is not checked in the child process, because the previous if succeeded there. A process ID is always a positive number, so when the parent process receives the process ID of the newly created child, it always passes the test else if (pid > 0) and enters the block.
[Note: It can never be 0 because 0 is reserved.]
Line 09: This line makes the parent process wait until the child process has completed execution. This is the reason you will see all the printf() output of the child process before any of the printf() output of the parent process.
Lines 10-12: This is also a pretty straightforward for loop, which prints the values of the nums[] array. Note that the values are unchanged in the parent process: the earlier modification was made by the child process, which owns its own copy of nums[].
When fork() fails
There is a possibility that the fork() call might fail. In such a case the return value is -1. This should also be handled for the program to be correct.
pid = fork();
if (pid == -1)
perror("Fork failed");
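The fragment above reports the failure but then carries on. One possible way to handle it more fully, shown here only as a sketch: the helper name fork_with_retry and the policy of retrying on EAGAIN with a one-second pause are illustrative assumptions, not something the original answer prescribes.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: calls fork(), retrying while it fails with EAGAIN
 * (a possibly transient resource limit). Returns what fork() returned on
 * success, or -1 after printing the error. */
pid_t fork_with_retry(int max_attempts)
{
    assert(max_attempts >= 1);

    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        pid_t pid = fork();
        if (pid != -1)
            return pid;             /* 0 in the child, child's PID in the parent */
        if (errno != EAGAIN || attempt == max_attempts) {
            perror("Fork failed");  /* permanent error or out of attempts */
            return -1;
        }
        sleep(1);                   /* back off before trying again */
    }
    return -1;                      /* not reached when max_attempts >= 1 */
}
```

The caller still has to branch on the result exactly as with plain fork(): -1 means no child exists, 0 means "I am the child".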
Some contents taken from the book The Design of the UNIX Operating System.
Answer 5:
In the simplest cases, the behaviour of fork() is very simple, if a bit mind-blowing on your first encounter with it. It either returns once with an error, or it returns twice, once in the original (parent) process, and once in a brand new almost exact duplicate of the original process (the child process). After return, the two processes are nominally independent, though they share a lot of resources.
pid_t original = getpid();
pid_t pid = fork();
if (pid == -1)
{
    /* Failed to fork - one return */
    …handle error situation…
}
else if (pid == 0)
{
    /* Child process - distinct from original process */
    assert(original == getppid() || getppid() == 1);
    assert(original != getpid());
    …be childish here…
}
else
{
    /* Parent process - distinct from child process */
    assert(original != pid);
    …be parental here…
}
The child process is a copy of the parent. It has the same set of open file descriptors, for example; each file descriptor N that was open in the parent is open in the child, and they share the same open file description. That means that if one of the processes alters the read or write position in a file, that also affects the other process. On the other hand, if one of the processes closes a file, that has no direct effect on the file in the other process.
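A sketch of the shared open file description, under the assumptions that /tmp is writable and that the helper name is made up: the child writes five bytes and closes its descriptor, and the parent, which never wrote anything, finds its own file offset moved and its descriptor still usable.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the parent's file offset after the child
 * wrote 5 bytes through the inherited descriptor, or -1 on error. */
long parent_offset_after_child_write(void)
{
    char path[] = "/tmp/fork_offset_XXXXXX";    /* assumed scratch location */
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);                               /* file disappears once closed */

    pid_t pid = fork();
    if (pid == -1) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        ssize_t w = write(fd, "hello", 5);      /* advances the shared offset */
        close(fd);                              /* closes only the child's descriptor */
        _exit(w == 5 ? 0 : 1);
    }

    assert(pid > 0);
    int status;
    if (waitpid(pid, &status, 0) == -1 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        close(fd);
        return -1;
    }
    off_t pos = lseek(fd, 0, SEEK_CUR);         /* parent's fd is still open */
    close(fd);
    return (long)pos;
}
```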
It also means that if there was data buffered in the standard I/O package in the parent process (e.g. some data had been read from the standard input file descriptor (STDIN_FILENO) into the data buffer for stdin), then that data is available to both the parent and the child, and both can read that buffered data without affecting the other, which will also see the same data. On the other hand, once the buffered data is read, if the parent reads another buffer full, that moves the current file position for both the parent and the child, so the child won't then see the data that the parent just read (but if the child also reads a block of data, the parent won't see that). This can be confusing. Consequently, it is usually a good idea to make sure that there's no pending standard I/O before forking; fflush(0) is one way to do that.
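The same buffering effect exists on the output side, which is easier to demonstrate. In this sketch (the helper name is hypothetical, and it assumes tmpfile() yields a fully buffered stream, as is usual for regular files), five bytes sit in the stdio buffer when fork() is called, so both processes later flush their own copy unless the buffer was emptied first.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the size of a scratch file after parent and
 * child have both flushed a stream that held "hello" at fork time. */
long bytes_written_around_fork(int flush_first)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;
    fputs("hello", fp);             /* 5 bytes, still in the stdio buffer */
    if (flush_first)
        fflush(fp);                 /* empty the buffer before forking */

    pid_t pid = fork();
    if (pid == -1) {
        fclose(fp);
        return -1;
    }
    if (pid == 0) {
        fflush(fp);                 /* child writes its copy of the buffer, if any */
        _exit(0);
    }

    assert(pid > 0);
    waitpid(pid, NULL, 0);
    fflush(fp);                     /* parent writes its copy, if any */

    struct stat st;
    long size = (fstat(fileno(fp), &st) == 0) ? (long)st.st_size : -1;
    fclose(fp);
    return size;
}
```

Without the early flush the file should end up holding the text twice (10 bytes); with it, once (5 bytes).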
In the code fragment, assert(original == getppid() || getppid() == 1); allows for the possibility that by the time the child executes the statement, the parent process may have exited, in which case the child will have been inherited by a system process, which normally has PID 1 (I know of no POSIX system where orphaned children are inherited by a different PID, but there probably is one).
Other shared resources, such as memory-mapped files or shared memory, continue to be available in both. The subsequent behaviour of a memory-mapped file depends on the options used to create the mapping; MAP_PRIVATE means that the two processes have independent copies of the data, and MAP_SHARED means that they share the same copy of the data and changes made by one process will be visible in the other.
However, not every program that forks is as simple as the story described so far. For example, the parent process might have acquired some (advisory) locks; those locks are not inherited by the child. The parent may have been multi-threaded; the child has a single thread of execution, and there are constraints placed on what the child may do safely.
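The point about advisory locks can be probed with fcntl(). This is a sketch under stated assumptions: a writable /tmp on a filesystem that supports record locks, and a made-up helper name. The parent takes a write lock; the child then asks F_GETLK whether it could take the same lock and is told that a conflicting lock exists, which shows it did not inherit ownership.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the child finds the parent's lock
 * standing in its way, 0 if not, -1 on error. */
int child_is_blocked_by_parent_lock(void)
{
    char path[] = "/tmp/fork_lock_XXXXXX";  /* assumed scratch location */
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);

    struct flock lk = {0};
    lk.l_type = F_WRLCK;                    /* exclusive lock on the whole file */
    lk.l_whence = SEEK_SET;
    if (fcntl(fd, F_SETLK, &lk) == -1) {
        close(fd);
        return -1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        struct flock probe = {0};
        probe.l_type = F_WRLCK;
        probe.l_whence = SEEK_SET;
        if (fcntl(fd, F_GETLK, &probe) == -1)
            _exit(2);
        /* F_UNLCK would mean "no conflict"; anything else is the parent's lock. */
        _exit(probe.l_type != F_UNLCK ? 0 : 1);
    }

    assert(pid > 0);
    int status;
    int rc = waitpid(pid, &status, 0);
    close(fd);                              /* also releases the parent's lock */
    if (rc == -1 || !WIFEXITED(status) || WEXITSTATUS(status) == 2)
        return -1;
    return WEXITSTATUS(status) == 0;
}
```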
The POSIX specification for fork() specifies the differences in detail:
The fork() function shall create a new process. The new process (child process) shall be an exact copy of the calling process (parent process) except as detailed below:
The child process shall have a unique process ID.
The child process ID also shall not match any active process group ID.
The child process shall have a different parent process ID, which shall be the process ID of the calling process.
The child process shall have its own copy of the parent's file descriptors. Each of the child's file descriptors shall refer to the same open file description with the corresponding file descriptor of the parent.
The child process shall have its own copy of the parent's open directory streams. Each open directory stream in the child process may share directory stream positioning with the corresponding directory stream of the parent.
The child process shall have its own copy of the parent's message catalog descriptors.
The child process values of tms_utime, tms_stime, tms_cutime, and tms_cstime shall be set to 0.
The time left until an alarm clock signal shall be reset to zero, and the alarm, if any, shall be canceled; see alarm.
[XSI] ⌦ All semadj values shall be cleared. ⌫
File locks set by the parent process shall not be inherited by the child process.
The set of signals pending for the child process shall be initialized to the empty set.
[XSI] ⌦ Interval timers shall be reset in the child process. ⌫
Any semaphores that are open in the parent process shall also be open in the child process.
[ML] ⌦ The child process shall not inherit any address space memory locks established by the parent process via calls to mlockall() or mlock(). ⌫
Memory mappings created in the parent shall be retained in the child process. MAP_PRIVATE mappings inherited from the parent shall also be MAP_PRIVATE mappings in the child, and any modifications to the data in these mappings made by the parent prior to calling fork() shall be visible to the child. Any modifications to the data in MAP_PRIVATE mappings made by the parent after fork() returns shall be visible only to the parent. Modifications to the data in MAP_PRIVATE mappings made by the child shall be visible only to the child.
[PS] ⌦ For the SCHED_FIFO and SCHED_RR scheduling policies, the child process shall inherit the policy and priority settings of the parent process during a fork() function. For other scheduling policies, the policy and priority settings on fork() are implementation-defined. ⌫
Per-process timers created by the parent shall not be inherited by the child process.
[MSG] ⌦ The child process shall have its own copy of the message queue descriptors of the parent. Each of the message descriptors of the child shall refer to the same open message queue description as the corresponding message descriptor of the parent. ⌫
No asynchronous input or asynchronous output operations shall be inherited by the child process. Any use of asynchronous control blocks created by the parent produces undefined behavior.
A process shall be created with a single thread. If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread and its entire address space, possibly including the states of mutexes and other resources. Consequently, to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called. Fork handlers may be established by means of the pthread_atfork() function in order to maintain application invariants across fork() calls.
When the application calls fork() from a signal handler and any of the fork handlers registered by pthread_atfork() calls a function that is not async-signal-safe, the behavior is undefined.
[OB TRC TRI] ⌦ If the Trace option and the Trace Inherit option are both supported:
If the calling process was being traced in a trace stream that had its inheritance policy set to POSIX_TRACE_INHERITED, the child process shall be traced into that trace stream, and the child process shall inherit the parent's mapping of trace event names to trace event type identifiers. If the trace stream in which the calling process was being traced had its inheritance policy set to POSIX_TRACE_CLOSE_FOR_CHILD, the child process shall not be traced into that trace stream. The inheritance policy is set by a call to the posix_trace_attr_setinherited() function. ⌫
[OB TRC] ⌦ If the Trace option is supported, but the Trace Inherit option is not supported:
The child process shall not be traced into any of the trace streams of its parent process. ⌫
[OB TRC] ⌦ If the Trace option is supported, the child process of a trace controller process shall not control the trace streams controlled by its parent process. ⌫
[CPT] ⌦ The initial value of the CPU-time clock of the child process shall be set to zero. ⌫
[TCT] ⌦ The initial value of the CPU-time clock of the single thread of the child process shall be set to zero. ⌫
All other process characteristics defined by POSIX.1-2008 shall be the same in the parent and child processes. The inheritance of process characteristics not defined by POSIX.1-2008 is unspecified by POSIX.1-2008.
After fork(), both the parent and the child processes shall be capable of executing independently before either one terminates.
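One item from the list above, the cancelled alarm, is easy to observe because alarm(0) returns the number of seconds that were left on any pending alarm. A sketch follows; the helper name and the 100-second value are arbitrary illustrative choices.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the child has no pending alarm while
 * the parent's alarm is still counting down, 0 otherwise, -1 on error. */
int alarm_is_cancelled_in_child(void)
{
    alarm(100);                         /* schedule SIGALRM far in the future */

    pid_t pid = fork();
    if (pid == -1) {
        alarm(0);
        return -1;
    }
    if (pid == 0) {
        /* Child: alarm(0) reports 0 when no alarm is pending. */
        _exit(alarm(0) == 0 ? 0 : 1);
    }

    assert(pid > 0);
    int status;
    int rc = waitpid(pid, &status, 0);
    unsigned parent_left = alarm(0);    /* cancel the parent's alarm, read time left */
    if (rc == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0 && parent_left > 0;
}
```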
Most of these issues do not affect most programs, but multi-threaded programs that fork need to be very careful. It is worth reading the Rationale section of the POSIX definition of fork().
Inside the kernel, the system manages all the issues highlighted in the definition above. Memory page mapping tables have to be replicated. The kernel will typically mark the (writable) memory pages as COW (copy on write) so that until one or the other process modifies memory, they can access the same memory. This minimizes the cost of replicating the process; memory pages are only made distinct when they're modified. Many resources, though, such as file descriptors, have to be replicated, so fork() is quite an expensive operation (though not as expensive as the exec*() functions). Note that replicating a file descriptor leaves both descriptors referring to the same open file description; see the open() and dup2() system calls for a discussion of the distinctions between file descriptors and open file descriptions.
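The distinction between a file descriptor and an open file description can be seen without fork() at all. In this sketch (helper name and /tmp location are assumptions), a descriptor made with dup() shares the offset of the original, while a second open() of the same file gets an independent description with its own offset.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: writes 3 bytes through one descriptor, then reports
 * the offsets seen through a dup() of it and through a separate open().
 * Returns 0 on success, -1 on error. */
int offsets_after_write(long *dup_off, long *open_off)
{
    assert(dup_off != NULL && open_off != NULL);

    char path[] = "/tmp/fork_dup_XXXXXX";   /* assumed scratch location */
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    int dupfd = dup(fd);                    /* same open file description */
    int fd2 = open(path, O_RDONLY);         /* new, independent description */
    unlink(path);

    int rc = -1;
    if (dupfd != -1 && fd2 != -1 && write(fd, "abc", 3) == 3) {
        *dup_off = (long)lseek(dupfd, 0, SEEK_CUR);   /* moved along with fd */
        *open_off = (long)lseek(fd2, 0, SEEK_CUR);    /* unaffected by the write */
        rc = 0;
    }
    if (dupfd != -1)
        close(dupfd);
    if (fd2 != -1)
        close(fd2);
    close(fd);
    return rc;
}
```

A descriptor inherited across fork() behaves like the dup() case here: it is a new descriptor referring to the same open file description.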