Lets say, you have an application, which is consuming up all the computational power. Now you want to do some other necessary work. Is there any way on Linux, to interrupt that application and checkpoint its state, so that later on it could be resumed from the state it was interrupted?
Especially I am interested in a way, where the application could be stopped and restarted on another machine. Is that possible too?
From the man pages
man kill
Interrupting a process requires two steps:
To stop
and
To continue
Where
<pid>
is the process-id.Type:
Control + Z
to suspend a process (it sends a SIGTSTP)then
bg
/fg
to resume it in background or in foregroundOn linux it is achivable by sending this process STOP signal. Leter on you resume it by sending CONT signal. Please refer to kill manual.
In general terms, checkpointing a process is not entirely possible (because a process is not only an address space, but also has other resources likes file descriptors, and TCP/IP sockets ...).
In practice, you can use some checkpointing libraries like BLCR etc. With certain limiting conditions, you might be able to migrate a checkpoint image from one system to another one (very similar to the source one: same kernel, same versions of libraries & compilers, etc.).
Migrating images is also possible at the virtual machine level. Some of them are quite good for that.
You could also design and implement your software with your own checkpointing machinery. Then, you should think of using garbage collection techniques and terminology. Look also into Emacs (or Xemacs) unexec.c file (which is heavily machine dependent).
Some languages implementation & runtime have checkpointing primitives. SBCL (a free Common Lisp implementation) is able to save a core image and restart it later. SML/NJ is able to export an image. Squeak (a Smalltalk implementation) also has such ability.
As an other example of checkpointing, the GCC compiler is actually able to compile a single
*.h
header (into a pre-compiled header file which is a persistent image of GCC heap) by using persistence techniques.Read more about orthogonal persistence. It is also a research subject. serialization is also relevant (and you might want to use textual formats à la JSON, YAML, XML, ...). You might also use hibernation techniques (on the whole system level).
Checkingpointing an individual process is fundamentally impossible on POSIX. That's because processes are not independent; they can interact. If nothing else, a process has a unique process ID, which it might have stored somewhere internally, and if you resume it with a different process ID, all hell could break loose. This is especially true if the process uses any kind of locks/synchronization primitives. Of course you also can't resume the process with the same process ID it originally had, since that might be taken by a new process.
Perhaps you could solve the problem by making process (and thread) ids 128-bit or so, such that they're universally unique...