I've heard that creating a new process on a Windows box is more expensive than on Linux. Is this true? Can somebody explain the technical reasons why it's more expensive, and the historical reasons for the design decisions behind it?
Adding to what JP said: most of the overhead comes from Win32 startup for the process.
The Windows NT kernel actually does support COW fork. SFU (Microsoft's UNIX environment for Windows) uses it. However, Win32 does not support fork. SFU processes are not Win32 processes. SFU is orthogonal to Win32: they are both environment subsystems built on the same kernel.
In addition to the out-of-process LPC calls to CSRSS, in XP and later there is an out-of-process call to the application compatibility engine to find the program in the application compatibility database. This step causes enough overhead that Microsoft provides a group policy option to disable the compatibility engine on WS2003 for performance reasons. The Win32 runtime libraries (kernel32.dll, etc.) also do a lot of registry reads and initialization on startup that don't apply to UNIX, SFU, or native processes.
Native processes (with no environment subsystem) are very fast to create. SFU does a lot less than Win32 for process creation, so its processes are also fast to create.
mweerden: NT was designed for multi-user from day one, so this is not really a reason. However, you are right that process creation plays a less important role on NT than on Unix, as NT, in contrast to Unix, favors multithreading over multiprocessing.
Rob, it is true that fork is relatively cheap when COW is used, but in practice a fork is mostly followed by an exec, and an exec has to load all images as well. Discussing the performance of fork alone is therefore only part of the truth.
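To make that concrete, here is a minimal sketch of the usual fork-then-exec pattern on Unix (ls is just an arbitrary example program). The fork itself is the cheap COW step; the exec is where the image-loading cost lives:

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    pid_t pid = fork();                /* cheap: copy-on-write duplicate */
    if (pid < 0) {
        perror("fork");
        return EXIT_FAILURE;
    }
    if (pid == 0) {
        /* child: replace the COW image with a new program */
        char *argv[] = { "ls", "-l", NULL };
        execvp("ls", argv);            /* the expensive part: load the new image */
        perror("execvp");              /* only reached if exec fails */
        _exit(127);
    }
    int status;
    waitpid(pid, &status, 0);          /* parent: wait for the child */
    return EXIT_SUCCESS;
}
```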
When discussing the speed of process creation, it is probably a good idea to distinguish between NT and Windows/Win32. As far as NT (i.e. the kernel itself) goes, I do not think process creation (NtCreateProcess) and thread creation (NtCreateThread) are significantly slower than on the average Unix. There might be a little bit more going on, but I do not see the primary reason for the performance difference here.
If you look at Win32, however, you'll notice that it adds quite a bit of overhead to process creation. For one, it requires the CSRSS to be notified about process creation, which involves LPC. It requires at least kernel32 to be loaded additionally, and it has to perform a number of additional bookkeeping tasks before the process is considered to be a full-fledged Win32 process. And let's not forget about all the additional overhead imposed by parsing manifests, checking if the image requires a compatibility shim, checking whether software restriction policies apply, yada yada.
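For reference, here is a minimal sketch of what that looks like from the caller's side. The single CreateProcessA call below (notepad.exe is just an arbitrary example target) is what triggers the CSRSS notification, manifest parsing, and shim lookup described above:

```c
#include <windows.h>
#include <stdio.h>

int main(void)
{
    STARTUPINFOA si;
    PROCESS_INFORMATION pi;
    ZeroMemory(&si, sizeof(si));
    si.cb = sizeof(si);
    ZeroMemory(&pi, sizeof(pi));

    char cmdline[] = "notepad.exe";    /* must be a writable buffer */

    /* one call, but a lot of Win32 machinery runs behind it */
    if (!CreateProcessA(NULL, cmdline, NULL, NULL, FALSE,
                        0, NULL, NULL, &si, &pi)) {
        fprintf(stderr, "CreateProcess failed: %lu\n", GetLastError());
        return 1;
    }
    WaitForSingleObject(pi.hProcess, INFINITE);
    CloseHandle(pi.hProcess);
    CloseHandle(pi.hThread);
    return 0;
}
```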
That said, I see the overall slowdown in the sum of all those little things that have to be done in addition to the raw creation of a process, VA space, and initial thread. But as said in the beginning -- due to the favoring of multithreading over multiprocessing, the only software that is seriously affected by this additional expense is poorly ported Unix software. Although this situation changes when software like Chrome and IE8 suddenly rediscover the benefits of multiprocessing and begin to frequently start up and tear down processes...
The short answer is "software layers and components".
The Windows software architecture has a couple of additional layers and components that either don't exist on Unix or are simplified and handled inside the kernel on Unix.
On Unix, fork and exec are direct calls to the kernel.
On Windows, the kernel API is not used directly; Win32 and certain other components sit on top of it, so process creation must go through those extra layers, and the new process must then start up or connect to those layers and components.
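A rough way to observe the per-process cost is to time repeated spawn-and-wait cycles. Below is a minimal, unscientific Unix-side sketch (N and /bin/true are arbitrary choices; a Windows counterpart would loop over CreateProcess instead):

```c
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    const int N = 1000;                /* arbitrary sample size */
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < N; i++) {
        pid_t pid = fork();
        if (pid == 0) {
            execl("/bin/true", "true", (char *)NULL);
            _exit(127);                /* only reached if exec fails */
        }
        waitpid(pid, NULL, 0);
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    double secs = (end.tv_sec - start.tv_sec)
                + (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("%d processes in %.3f s (%.3f ms each)\n",
           N, secs, secs * 1000.0 / N);
    return 0;
}
```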
For quite some time, researchers and corporations have attempted to break up Unix in a vaguely similar way, usually basing their experiments on the Mach kernel; a well-known example is OS X. Every time they try, though, it gets so slow that they end up at least partially merging the pieces back into the kernel, either permanently or for production shipments.
In addition to Rob Walker's answer: nowadays you have things like the Native POSIX Thread Library, if you want. But for a long time the only way to "delegate" work in the Unix world was to use fork() (and it's still preferred in many, many circumstances), e.g. for some kind of socket server, as sketched below.
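Here is the kind of accept-then-fork server alluded to above, as a minimal sketch (the port number and the one-line reply are placeholders for illustration):

```c
#include <string.h>
#include <signal.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(void)
{
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);           /* arbitrary example port */

    bind(listener, (struct sockaddr *)&addr, sizeof(addr));
    listen(listener, 16);
    signal(SIGCHLD, SIG_IGN);              /* reap children automatically */

    for (;;) {
        int conn = accept(listener, NULL, NULL);
        if (conn < 0)
            continue;
        if (fork() == 0) {                 /* one cheap COW fork per client */
            close(listener);               /* child: serve this connection */
            const char msg[] = "hello\n";
            write(conn, msg, sizeof(msg) - 1);
            close(conn);
            _exit(0);
        }
        close(conn);                       /* parent: back to accept() */
    }
}
```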
Therefore the implementation of fork had to be fast, and many optimizations have been implemented over time. Microsoft endorsed CreateThread, or even fibers, plus interprocess communication, instead of creating new processes. I think it's not "fair" to compare CreateProcess to fork, since they are not interchangeable. It's probably more appropriate to compare fork/exec to CreateProcess.
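For contrast, here is a minimal sketch of the thread-based style Microsoft endorsed: a CreateThread call avoids the Win32 process startup machinery discussed in the other answers (the worker function and the thread count here are arbitrary, for illustration only):

```c
#include <windows.h>
#include <stdio.h>

/* trivial placeholder work item */
static DWORD WINAPI worker(LPVOID param)
{
    printf("working on item %d\n", (int)(INT_PTR)param);
    return 0;
}

int main(void)
{
    HANDLE threads[4];
    for (int i = 0; i < 4; i++) {
        /* far less startup work than CreateProcess:
           no new image load, no manifest/shim processing */
        threads[i] = CreateThread(NULL, 0, worker,
                                  (LPVOID)(INT_PTR)i, 0, NULL);
    }
    WaitForMultipleObjects(4, threads, TRUE, INFINITE);
    for (int i = 0; i < 4; i++)
        CloseHandle(threads[i]);
    return 0;
}
```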