I am trying to understand threading in Python. I've looked at the documentation and examples, but quite frankly, many examples are overly sophisticated and I'm having trouble understanding them.
How do you clearly show tasks being divided for multi-threading?
For me, the perfect example for threading is monitoring asynchronous events. Look at this code.
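The snippet from this answer did not survive here, so the following is only a minimal sketch of the idea, not the original code. It assumes the "event" is a new file appearing in a directory; the class name DirectoryMonitor and its attributes are made up for illustration.

```python
import os
import threading
import time


class DirectoryMonitor(threading.Thread):
    """Poll a directory in the background and record newly appearing files.

    Hypothetical example class; the names here are illustrative only.
    """

    def __init__(self, path, interval=0.1):
        super().__init__(daemon=True)
        self.path = path
        self.interval = interval
        self.seen = set(os.listdir(path))  # files present at start-up
        self.new_files = []                # files noticed since then
        self._stop_event = threading.Event()

    def run(self):
        # Event.wait() returns True once stop() has been called,
        # which ends the polling loop.
        while not self._stop_event.wait(self.interval):
            current = set(os.listdir(self.path))
            for name in sorted(current - self.seen):
                self.new_files.append(name)
            self.seen = current

    def stop(self):
        # Ask the thread to finish and wait until it has.
        self._stop_event.set()
        self.join()
```

The main thread stays free while the monitor polls: call start() on an instance, keep working in the session, and inspect new_files whenever you like.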
You can play with this code by opening an IPython session and doing something like:
Wait a few minutes
None of the above solutions actually used multiple cores on my GNU/Linux server (where I don't have admin rights). They just ran on a single core. I used the lower-level os.fork interface to spawn multiple processes. This is the code that worked for me:

Since this question was asked in 2010, there has been real simplification in how to do simple multithreading with Python with map and pool.
The code below comes from an article/blog post that you should definitely check out (no affiliation) - Parallelism in one line: A Better Model for Day to Day Threading Tasks. I'll summarize below - it ends up being just a few lines of code:
Which is the multithreaded version of:
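The snippets themselves are missing here, so this is a minimal sketch of both forms rather than the article's exact code. my_function and my_array are placeholder names, and the squaring function merely stands in for a real I/O-bound task.

```python
from multiprocessing.dummy import Pool as ThreadPool  # thread-backed Pool


def my_function(x):
    # Placeholder work; in practice this would be something I/O-bound,
    # such as fetching a URL or reading a file.
    return x * x


my_array = list(range(10))

# Multithreaded version: four worker threads share the items.
with ThreadPool(4) as pool:
    threaded_results = pool.map(my_function, my_array)

# Single-threaded equivalent.
serial_results = [my_function(x) for x in my_array]
```

pool.map returns its results in the same order as the input, so the two lists are interchangeable.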
Description
Implementation
multiprocessing.dummy replicates the API of the multiprocessing module, but uses threads instead (an important distinction: use multiple processes for CPU-intensive tasks, and threads for, and during, I/O):

And the timing results:
Passing multiple arguments (works like this only in Python 3.3 and later):
To pass multiple arrays:
or to pass a constant and an array:
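Both cases can be sketched with Pool.starmap, which was added in Python 3.3 and unpacks each tuple into positional arguments. The add function and the sample lists below are illustrative assumptions, not part of the original answer.

```python
import itertools
from multiprocessing.dummy import Pool as ThreadPool


def add(a, b):
    # Illustrative two-argument function.
    return a + b


list_a = [1, 2, 3]
list_b = [10, 20, 30]
constant = 100

with ThreadPool(4) as pool:
    # Multiple arrays: zip them so each task receives one element from each.
    pairwise = pool.starmap(add, zip(list_a, list_b))

    # A constant and an array: repeat the constant alongside every element.
    # zip stops at the shorter input, so the endless repeat() is safe here.
    with_constant = pool.starmap(add, zip(itertools.repeat(constant), list_a))
```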
If you are using an earlier version of Python, you can pass multiple arguments via this workaround.
(Thanks to user136036 for the helpful comment)
I found this very useful: create as many threads as cores and let them execute a (large) number of tasks (in this case, calling a shell program):
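The code for this answer is missing here; what follows is a rough sketch of the pattern under some assumptions of mine. The worker and run_all names are hypothetical, and a tiny Python one-liner run through subprocess stands in for the shell program so the example runs anywhere.

```python
import os
import queue
import subprocess
import sys
import threading


def worker(tasks, results):
    # Each thread keeps pulling tasks until the queue is empty.
    while True:
        try:
            n = tasks.get_nowait()
        except queue.Empty:
            return
        # Stand-in for "calling a shell program": run an external process
        # and capture what it prints.
        proc = subprocess.run(
            [sys.executable, "-c", f"print({n} * 2)"],
            capture_output=True, text=True, check=True,
        )
        results.put((n, proc.stdout.strip()))


def run_all(numbers, n_threads=None):
    # Default to one thread per core, as described above.
    n_threads = n_threads or os.cpu_count() or 1
    tasks = queue.Queue()
    for n in numbers:
        tasks.put(n)
    results = queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # All workers have finished, so draining the queue here is safe.
    collected = {}
    while not results.empty():
        key, value = results.get()
        collected[key] = value
    return collected
```

Threads are a reasonable fit here because each one spends its time waiting on an external process, during which the global interpreter lock is released.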
NOTE: For actual parallelization in Python, you should use the multiprocessing module to fork multiple processes that execute in parallel. Because of the global interpreter lock (GIL), threads in CPython are interleaved but execute Python code serially, not in parallel, so they are only useful when interleaving I/O operations.
However, if you are merely looking for interleaving (or are doing I/O operations that can be parallelized despite the global interpreter lock), then the threading module is the place to start. As a really simple example, let's consider the problem of summing a large range by summing subranges in parallel:
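The original snippet is not shown here, so this is a minimal sketch of the idea using the threading module. The function names and the way the range is split into chunks are my own assumptions.

```python
import threading


def sum_subrange(start, stop, results, index):
    # Each thread writes its partial sum into its own slot,
    # so no lock is needed.
    results[index] = sum(range(start, stop))


def threaded_sum(n, n_threads=4):
    """Sum the integers in range(n) by splitting the range across threads."""
    results = [0] * n_threads
    chunk = n // n_threads
    threads = []
    for i in range(n_threads):
        start = i * chunk
        # The last thread also takes whatever is left over.
        stop = n if i == n_threads - 1 else start + chunk
        t = threading.Thread(target=sum_subrange,
                             args=(start, stop, results, i))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()  # wait for every partial sum before combining
    return sum(results)
```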
Note that the above is a very stupid example, as it does absolutely no I/O and will be executed serially albeit interleaved (with the added overhead of context switching) in CPython due to the global interpreter lock.