I have a problem to solve and no idea how to approach it, because we are only allowed to use a few system calls and I don't see how they help in this situation.
The Exercise:
I have a matrix of integers with size [10][1000000], and for each line I create a new process with fork(). Each process goes through all the numbers on its line, looks for a specific number, and prints a message about it. This was the first step of the problem and it's done. The second step is to print the number of occurrences of that number on each line, in line order, and at the end the grand total of occurrences.
The Calls:
The system calls I can use are described like this in the document for this exercise:
pid_t fork(void);
void exit(int status);
pid_t wait(int *status);
pid_t waitpid(pid_t pid, int *status, int options);
The Problem:
I have no idea how to do it because the exit() call only allows me to pass a number below 256. What if the number of occurrences is larger than that? How would I return such a number?
Another Problem:
I don't exactly understand the difference between wait() and waitpid(), or how and where to use one over the other. Besides the man pages, is there any other documentation with code examples that would help me understand them better? Or can someone explain the differences to me and provide a basic example demonstrating them?
Use waitpid() to garner the exit statuses of the child processes in sequence; using wait() makes no guarantee about the sequence in which the child corpses will be retrieved.
On Unix, the exit status is limited to 8 bits, which can be treated as signed or unsigned by the program retrieving the data. You also get an 8 bit value identifying the signal number and core dump status of the terminated child. AFAIK, either the status or the signal bits are always zero (and often both - when the process exits successfully).
If you don't know that the numbers to be returned are smaller than 256, then exit status is not the way to go. As others have said, you have to use some other IPC in that case. If the only system calls permitted are those, then you have to conclude that the values will be less than 256, or that overflows don't matter. Neither is satisfactory as a conclusion outside a homework exercise, but in the 'real world', you are not limited to just 4 system calls either.
See also the question "Exit codes bigger than 255?". Note that on Windows, the range of exit codes is much larger - but you don't use the system calls listed in the question.
Observation: when I do exit(1), the value in status from wait() is 256; is there a reason for that?
Answer: yes. The low 8 bits of the status word encode the signal number and so on; the high 8 bits of the (16-bit) status word encode the exit status.
See <sys/wait.h> and the macros WIFEXITED(), WEXITSTATUS(), etc.
I think what you are doing should work fine - just return the number of occurrences as the exit code from the process.
You mention that exit() will only allow numbers below 256. I doubt that this is the case, but it would be simple enough for you to write a test program to find out for sure.
Sounds like this is really just a simplified version of Map-Reduce. You might want to have a look at that algorithm as well for some ideas on how you could further parallelize the program - and maybe get some extra credit :)
As for the difference between wait() and waitpid() - if you just want to wait for any of your child processes to complete, you would use wait(). If you want to wait only for a specific child process, or if you just want to check whether a child process has exited without blocking, you would use waitpid().