I've been reading the "Real World Haskell" book, the chapter on concurrency and parallelism. My question is as follows:
Since Haskell threads are really just multiple "virtual" threads inside one "real" OS-thread, does this mean that creating a lot of them (like 1000) will not have a drastic impact on performance? I.e., can we say that the overhead incurred from creating a Haskell thread with forkIO is (almost) negligible? Please give practical examples if possible.
Doesn't the concept of lightweight threads prevent us from using the benefits of multicore architectures? As I understand, it is not possible for two Haskell threads to execute concurrently on two separate cores, because they are really one single thread from the operating system's point of view. Or does the Haskell runtime do some clever tricks to ensure that multiple CPUs can be made use of?
GHC's runtime provides an execution environment supporting billions of sparks and thousands of lightweight threads, which may be distributed over multiple hardware cores. Compile with -threaded and run with the +RTS -N4 flags to set your desired number of cores (four, in this case).
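To check which runtime a binary actually ended up with, a small sketch like the following can help. It uses getNumCapabilities and rtsSupportsBoundThreads from Control.Concurrent; the helper describeRuntime and its messages are made up for illustration.

```haskell
import Control.Concurrent (getNumCapabilities, rtsSupportsBoundThreads)

-- | Hypothetical helper: turn the two runtime facts into a readable message.
describeRuntime :: Bool -> Int -> String
describeRuntime threaded caps
  | not threaded = "non-threaded RTS: Haskell threads share one OS thread"
  | otherwise    = "threaded RTS with " ++ show caps ++ " capabilities"

main :: IO ()
main = do
  -- Number of capabilities (roughly: cores the RTS will use), as set by +RTS -N.
  caps <- getNumCapabilities
  -- rtsSupportsBoundThreads is True when the program was linked with -threaded.
  putStrLn (describeRuntime rtsSupportsBoundThreads caps)
```

Assuming the file is called Main.hs, building with `ghc -threaded -rtsopts Main.hs` and running `./Main +RTS -N4` should report a threaded runtime. With recent GHC versions, -rtsopts is what allows -N to be passed on the command line.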
Specifically:
does this mean that creating a lot of them (like 1000) will not have a drastic impact on performance?
Well, creating 1,000,000 of them is certainly possible. 1000 is so cheap it won't even show up. You can see in thread-creation benchmarks, such as "thread ring", that GHC is very, very good.
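As a practical example, here is a minimal sketch that forks a large number of threads with forkIO and waits for all of them. The function name spawnAndCollect and the thread count are arbitrary choices for illustration. It is not a rigorous benchmark, so time it on your own machine rather than trusting any particular number.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM)

-- | Fork @n@ lightweight threads. Each thread reports its own index
-- through a private MVar; the caller blocks until every thread has
-- reported and returns the sum of the reported values.
spawnAndCollect :: Int -> IO Int
spawnAndCollect n = do
  boxes <- forM [1 .. n] $ \i -> do
    box <- newEmptyMVar
    _ <- forkIO (putMVar box i)   -- one Haskell thread per index
    return box
  results <- mapM takeMVar boxes  -- wait for all threads to finish
  return (sum results)

main :: IO ()
main = do
  total <- spawnAndCollect 100000
  putStrLn ("sum reported by 100000 threads: " ++ show total)
```

Running the binary with `+RTS -s` prints runtime statistics, which gives a rough idea of how much time and memory the thread creation cost.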
Doesn't the concept of lightweight threads prevent us from using the benefits of multicore architectures?
Not at all. GHC has been running on multicores since 2004. The current status of the multicore runtime is tracked here.
How does it do it? The best place to read up on this architecture is in the paper, "Runtime Support for Multicore Haskell":
The GHC runtime system supports millions of lightweight threads
by multiplexing them onto a handful of operating system threads,
roughly one for each physical CPU. ...
Haskell threads are executed by a set of operating system
threads, which we call worker threads. We maintain roughly one
worker thread per physical CPU, but exactly which worker thread
may vary from moment to moment ...
Since the worker thread may change, we maintain exactly one
Haskell Execution Context (HEC) for each CPU. The HEC is a
data structure that contains all the data that an OS worker thread
requires in order to execute Haskell threads
You can monitor your threads being created, and where they're executing, via ThreadScope. Here, e.g., running the binary-trees benchmark:
Creating 1000 threads is relatively lightweight; don't worry about doing it. As for performance, you should just benchmark it.
As has been pointed out before, multiple cores work just fine. Several Haskell threads can run at the same time by being scheduled on different OS threads.
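One way to see this for yourself is to ask each thread which capability it is currently running on. The sketch below uses myThreadId and threadCapability from Control.Concurrent; the function name capabilitiesUsed is made up for illustration. The result depends on scheduling: very short-lived threads may all report the same capability even with -N4, because new threads start on the capability of the thread that forked them and only migrate when the runtime balances load.

```haskell
import Control.Concurrent (forkIO, myThreadId, threadCapability)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM)
import Data.List (nub, sort)

-- | Fork @n@ threads and record the capability number each one
-- observes when it runs. Returns the distinct capabilities seen.
capabilitiesUsed :: Int -> IO [Int]
capabilitiesUsed n = do
  boxes <- forM [1 .. n] $ \_ -> do
    box <- newEmptyMVar
    _ <- forkIO $ do
      -- threadCapability returns (capability number, is-pinned flag)
      (cap, _pinned) <- threadCapability =<< myThreadId
      putMVar box cap
    return box
  caps <- mapM takeMVar boxes
  return (sort (nub caps))

main :: IO ()
main = do
  caps <- capabilitiesUsed 64
  putStrLn ("capabilities observed: " ++ show caps)
```

Compile with -threaded and run with `+RTS -N4` to give the runtime more than one capability to schedule onto. If you want to force a thread onto a particular capability, forkOn takes the capability number as its first argument.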