import threading
threads = []
for n in range(0, 60000):
    t = threading.Thread(target=function, args=(x, n))
    t.start()
    threads.append(t)
for t in threads:
    t.join()
It works well for a range of up to about 800 on my laptop, but if I increase the range beyond 800 I get the error can't create new thread.

How can I control the number of threads that get created, or handle this some other way, such as a timeout? I tried using threading.BoundedSemaphore, but that doesn't seem to work properly.
The problem is that no major platform (as of mid-2013) will let you create anywhere near this number of threads. There are a wide variety of different limitations you could run into, and without knowing your platform, its configuration, and the exact error you got, it's impossible to know which one you ran into. But here are two examples:
For example, each thread needs its own stack, so on a 32-bit platform the reserved stack space alone can exhaust your virtual address space well short of 60000 threads. And on Linux, you will typically hit one of your ulimit values before you get anywhere near running out of page space. (Linux has a variety of different limits beyond the ones required by POSIX.)

Besides, using as many threads as possible is very unlikely to be what you actually want to do. Running 800 threads on an 8-core machine means that you're spending a whole lot of time context-switching between the threads, the cache keeps getting flushed before it ever gets primed, and so on.
Most likely, what you really want is one of the following: a pool with a small, fixed number of worker threads or processes, or an asynchronous framework such as gevent, which multiplexes many small tasks onto a few OS threads.

But if you really do want to create as many threads as your platform allows, it's certainly possible.
Once you've hit whichever limit you're hitting, it's very likely that trying again will fail until a thread has finished its job and been joined, and it's pretty likely that trying again will succeed after that happens. So, given that you're apparently getting an exception, you can handle this the same way as anything else in Python: with a try/except block.

Of course this assumes that the first task launched is likely to be one of the first tasks finished. If that is not true, you'll need some way to explicitly signal doneness (a condition, a semaphore, a queue, etc.), or you'll need to use some lower-level (platform-specific) library that gives you a way to wait on a whole list until at least one thread is finished.
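For example, that retry idea might look something like this (a sketch: function and x are stand-ins for your actual task and data, and the range is shortened for illustration):

```python
import threading

def start_all(function, x, count):
    # Start one thread per task, retrying when the OS refuses to
    # create another thread.
    threads = []
    for n in range(count):
        t = threading.Thread(target=function, args=(x, n))
        while True:
            try:
                t.start()
                break
            except RuntimeError:
                # "can't start new thread": wait for the oldest running
                # thread to finish, then try starting this one again.
                if not threads:
                    raise
                threads.pop(0).join()
        threads.append(t)
    for t in threads:
        t.join()
```

Catching RuntimeError matches what Thread.start raises in Python 3; Python 2 raised thread.error instead.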
Also, note that on some platforms (e.g., Windows XP), you can get bizarre behavior just getting near the limits.
On top of being a lot better, doing the right thing will probably be a lot simpler as well. For example, here's a process-per-CPU pool:
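A sketch along those lines using concurrent.futures (task is a placeholder for the real work, and the range is shortened for illustration):

```python
import concurrent.futures

def task(x, n):
    # Placeholder for the real per-item work.
    return x + n

if __name__ == '__main__':
    # ProcessPoolExecutor defaults to one worker process per CPU core.
    with concurrent.futures.ProcessPoolExecutor() as executor:
        futures = [executor.submit(task, 10, n) for n in range(100)]
        results = [f.result() for f in futures]
```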
… and a fixed-thread-count pool:
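For instance (again with task as a placeholder for the real work):

```python
import concurrent.futures

def task(x, n):
    # Placeholder for the real per-item work.
    return x + n

# A fixed number of worker threads, no matter how many jobs you submit.
with concurrent.futures.ThreadPoolExecutor(max_workers=12) as executor:
    futures = [executor.submit(task, 10, n) for n in range(100)]
    results = [f.result() for f in futures]
```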
… and a balancing-CPU-parallelism-with-numpy-vectorization batching pool:
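One way to sketch that batching idea (batched_task, the batch count, and the squaring computation are all illustrative):

```python
import concurrent.futures
import numpy as np

def batched_task(x, batch):
    # One call handles a whole batch of n values, vectorized with numpy,
    # instead of one tiny task per n.
    return (batch + x) ** 2          # placeholder for the real computation

if __name__ == '__main__':
    ns = np.arange(10000)
    # Split the work into one large, vectorizable chunk per worker.
    batches = np.array_split(ns, 8)
    with concurrent.futures.ProcessPoolExecutor(max_workers=8) as executor:
        futures = [executor.submit(batched_task, 10, b) for b in batches]
        results = np.concatenate([f.result() for f in futures])
```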
In the examples above, I used a list comprehension to submit all of the jobs and gather their futures, because we weren't doing anything else inside the loop. But from your comments, it sounds like you do have other stuff you want to do inside the loop, so let's convert it back into an explicit for statement. Then, whatever you want to add inside that loop, you can.
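For example (task is still a placeholder, and the per-iteration work is just a comment):

```python
import concurrent.futures

def task(x, n):
    return x + n                     # placeholder for the real work

with concurrent.futures.ThreadPoolExecutor(max_workers=12) as executor:
    futures = []
    for n in range(100):
        futures.append(executor.submit(task, 10, n))
        # ... whatever else you want to do per iteration goes here ...
    concurrent.futures.wait(futures)
```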
However, I don't think you actually want to add anything inside that loop. The loop just submits all the jobs as fast as possible; it's the wait call that sits around waiting for them all to finish, and that's probably where you want to exit early.

To do that, you can use wait with the FIRST_COMPLETED flag, but it's much simpler to use as_completed.
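With as_completed, the waiting loop might look like this (a sketch: task and the early-exit condition are hypothetical):

```python
import concurrent.futures

def task(x, n):
    return x + n                     # placeholder for the real work

with concurrent.futures.ThreadPoolExecutor(max_workers=12) as executor:
    futures = [executor.submit(task, 10, n) for n in range(100)]
    # as_completed yields each future as soon as it finishes, so we can
    # react to the first interesting result instead of waiting for all.
    for future in concurrent.futures.as_completed(futures):
        if future.result() == 23:    # hypothetical early-exit condition
            break
```

Note that leaving the with block still waits for the already-submitted jobs to finish, because the executor's shutdown waits by default.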
Also, I'm assuming error is some kind of value that gets set by the tasks. In that case, you will need to put a Lock around it, as with any other mutable value shared between threads. (This is one place where there's slightly more than a one-line difference between a ProcessPoolExecutor and a ThreadPoolExecutor: if you use processes, you need multiprocessing.Lock instead of threading.Lock.)
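Putting that together, with a lock guarding the shared error value (task and its failure condition here are hypothetical):

```python
import concurrent.futures
import threading

error = None
error_lock = threading.Lock()

def task(x, n):
    global error
    # ... do the real work; on failure, record it under the lock:
    if n == 13:                       # hypothetical failure condition
        with error_lock:
            error = 'task %d failed' % n

with concurrent.futures.ThreadPoolExecutor(max_workers=12) as executor:
    futures = [executor.submit(task, 10, n) for n in range(100)]
    for future in concurrent.futures.as_completed(futures):
        with error_lock:
            if error is not None:
                break                 # bail out once any task reports an error
```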
However, you might want to consider a different design. In general, if you can avoid sharing between threads, your life gets a lot easier. And futures are designed to make that easy, by letting you return a value or raise an exception, just like a regular function call; f.result() will give you the returned value or re-raise the raised exception, so the waiting loop can consume results directly instead of polling shared state.

Notice how similar this pattern looks to the ThreadPoolExecutor example in the docs. It is enough to handle almost anything without locks, as long as the tasks don't need to interact with each other.
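A sketch of that style, where each task just returns or raises and nothing is shared (task and its failure case are hypothetical):

```python
import concurrent.futures

def task(x, n):
    if n == 13:                       # hypothetical failure case
        raise ValueError('bad n: %d' % n)
    return x + n

with concurrent.futures.ThreadPoolExecutor(max_workers=12) as executor:
    futures = [executor.submit(task, 10, n) for n in range(100)]
    results = []
    failures = []
    for future in concurrent.futures.as_completed(futures):
        try:
            results.append(future.result())   # re-raises whatever the task raised
        except ValueError as exc:
            failures.append(exc)
```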