Will os.fork() use copy on write or do a full copy

I would like to load a rather large data structure into a process and then fork in the hope to reduce total memory consumption. Will os.fork work that way or copy all of the parent process in Linux (RHEL)?

回答1:

Even if COW is employed, CPython uses reference counting and stores the reference count in each object's header. So unless you don't do anything with that data, you'll quickly have spurious writes to the memory in question, which will force the system to copy the data. Pass it to a function? That's another reference, an INCREF, a write to the COW'd memory. Store it in a variable or object attribute? Same. Even just look up a method on it? Ditto. Some builtin data structures allocate the bulk of their data separately from the object (e.g. most collections) for various reasons. If these end up on a different page -- or whatever granularity COW works on -- you may get lucky with those. However, an object referenced from such a collection is not exempt -- using it manipulates its refcount just the same.

In addition, a bit of data will be shared because there are no writes to it by design (e.g., the native CPython code), and some objects your fork'd process does not touch may be shared (I'm honestly not sure; I think the cycle GC does not write to the object). But Python objects used by Python code is virtually guaranteed to get written to. Similar reasoning applies to PyPy, Jython, IronPython, etc. (only that they fiddle with bits in the object header instead of doing reference counting) though I can't vouch for all possible configurations.

回答2:

If you are on a *nix system, then os.fork() will use the system's fork() call which, at least in the case of Linux, is copy-on-write:

http://linux.die.net/man/2/fork

See "notes" section