Linux Process States

2019-01-16 01:42发布

问题:

In Linux, what happens to the state of a process when it needs to read blocks from a disk? Is it blocked? If so, how is another process chosen to execute?

回答1:

While waiting for read() or write() to/from a file descriptor return, the process will be put in a special kind of sleep, known as "D" or "Disk Sleep". This is special, because the process can not be killed or interrupted while in such a state. A process waiting for a return from ioctl() would also be put to sleep in this manner.

An exception to this is when a file (such as a terminal or other character device) is opened in O_NONBLOCK mode, passed when its assumed that a device (such as a modem) will need time to initialize. However, you indicated block devices in your question. Also, I have never tried an ioctl() that is likely to block on a fd opened in non blocking mode (at least not knowingly).

How another process is chosen depends entirely on the scheduler you are using, as well as what other processes might have done to modify their weights within that scheduler.

Some user space programs under certain circumstances have been known to remain in this state forever, until rebooted. These are typically grouped in with other "zombies", but the term would not be correct as they are not technically defunct.



回答2:

When a process needs to fetch data from a disk, it effectively stops running on the CPU to let other processes run because the operation might take a long time to complete – at least 5ms seek time for a disk is common, and 5ms is 10 million CPU cycles, an eternity from the point of view of the program!

From the programmer point of view (also said "in userspace"), this is called a blocking system call. If you call write(2) (which is a thin libc wrapper around the system call of the same name), your process does not exactly stops at that boundary: it continues, on kernel side, running the system call code. Most of the time it goes all the way up to a specific disk controller driver (filename → filesystem/VFS → block device → device driver), where a command to fetch a block on disk is submitted to the proper hardware: this is a very fast operation most of the time.

THEN the process is put in sleep state (in kernel space, blocking is called sleeping – nothing is ever 'blocked' from the kernel point of view). It will be awoken again once the hardware has finally fetched the proper data, then the process will be marked as runnable, scheduled and run as soon as the scheduler allows it to.

Finally in userspace the blocking system call returns with proper status and data, and the program flow goes on.

It is possible to invoke most I/O system calls in non-blocking mode (see O_NONBLOCK in open(2) and fcntl(2)). In this case, the system calls return immediately and only tells about the proper submission of the disk operation. The programmer will have to explicitly check at a later time if the operation completed, with success or not, and fetch its result (e.g. with select(2)). This is called asynchronous or event based programming.

Most answers here mentioning the D state (which exact name is TASK_UNINTERRUPTIBLE from Linux sate names) are incorrect. The D state is a special sleep mode which is only triggered in a kernel space code path, when that code path can't be interrupted (because it would be to complex to program), most of the time in the hope that it would block very shortly. I believe that most "D states" are actually invisible, they are very short lived and can't be observed by sampling tools such as 'top'.

But you will sometimes encounter those unkillable processes in D state in a few situations. NFS is famous for that, and I've encountered it many times. I think there's a semantic clash between some VFS code paths which assume to always reach local disks and fast error detection (on SATA, an error timeout would be around a few 100 ms), and NFS which actually fetches data from the network which is more resilient and has slow recovery (a TCP timeout of 300 seconds is common). Read this article for the cool solution introduced in Linux 2.6.25 with the TASK_KILLABLE state. Before this era there was a hack where you could actually send signals to NFS process clients by sending a SIGKILL to the kernel thread rpciod, but forget about that ugly trick…



回答3:

A process performing I/O will be put in D state (uninterruptable sleep), which frees the CPU until there is a hardware interrupt which tells the CPU to return to executing the program. See man ps for the other process states.

Depending on your kernel, there is a process scheduler, which keeps track of a runqueue of processes ready to execute. It, along with a scheduling algorithm, tells the kernel which process to assign to which CPU. There are kernel processes and user processes to consider. Each process is allocated a time-slice, which is a chunk of CPU time it is allowed to use. Once the process uses all of its time-slice, it is marked as expired and given lower priority in the scheduling algorithm.

In the 2.6 kernel, there is a O(1) time complexity scheduler, so no matter how many processes you have up running, it will assign CPUs in constant time. It is more complicated though, since 2.6 introduced preemption and CPU load balancing is not an easy algorithm. In any case, it’s efficient and CPUs will not remain idle while you wait for the I/O.



回答4:

As already explained by others, processes in "D" state (uninterruptible sleep) are responsible for the hang of ps process. To me it has happened many times with RedHat 6.x and automounted NFS home directories.

To list processes in D state you can use the following commands:

cd /proc
for i in [0-9]*;do echo -n "$i :";cat $i/status |grep ^State;done|grep D

To know the current directory of the process and, may be, the mounted NFS disk that has issues you can use a command similar to the following example (replace 31134 with the sleeping process number):

# ls -l /proc/31134/cwd
lrwxrwxrwx 1 pippo users 0 Aug  2 16:25 /proc/31134/cwd -> /auto/pippo

I found that giving the umount command with the -f (force) switch, to the related mounted nfs file system, was able to wake-up the sleeping process:

umount -f /auto/pippo

the file system wasn't unmounted, because it was busy, but the related process did wake-up and I was able to solve the issue without rebooting.



回答5:

Assuming your process is a single thread, and that you're using blocking I/O, your process will block waiting for the I/O to complete. The kernel will pick another process to run in the meantime based on niceness, priority, last run time, etc. If there are no other runnable processes, the kernel won't run any; instead, it'll tell the hardware the machine is idle (which will result in lower power consumption).

Processes that are waiting for I/O to complete typically show up in state D in, e.g., ps and top.



回答6:

Yes, the task gets blocked in the read() system call. Another task which is ready runs, or if no other tasks are ready, the idle task (for that CPU) runs.

A normal, blocking disc read causes the task to enter the "D" state (as others have noted). Such tasks contribute to the load average, even though they're not consuming the CPU.

Some other types of IO, especially ttys and network, do not behave quite the same - the process ends up in "S" state and can be interrupted and doesn't count against the load average.



回答7:

Yes, tasks waiting for IO are blocked, and other tasks get executed. Selecting the next task is done by the Linux scheduler.



回答8:

Generally the process will block. If the read operation is on a file descriptor marked as non-blocking or if the process is using asynchronous IO it won't block. Also if the process has other threads that aren't blocked they can continue running.

The decision as to which process runs next is up to the scheduler in the kernel.