How can I throttle Python threads?

Posted 2020-07-10 07:16

Question:

I have a thread doing a lot of CPU-intensive processing, which seems to be blocking out other threads. How do I limit it?

This is for web2py specifically, but a general solution would be fine.

Answer 1:

I ended up diving into this issue not long ago. You won't be able to change the thread priority, but there are ways around this.

To give you a bit of background on the problem: in the CPython implementation, CPU-bound threads can cause other threads to starve because of the way the Global Interpreter Lock (GIL) is released and acquired. Oddly enough, this problem is made worse in a multicore environment. David Beazley did a really detailed analysis and presentation on this issue, which you can find at http://www.dabeaz.com/python/GIL.pdf. He also has several blog posts that go into more detail. They're long but quite fascinating.

The short version is that the CPU-bound thread releases and reacquires the GIL before the other threads can be woken up to grab it, so the CPU-bound thread ends up holding the GIL for more than 90% of the time.

There are some patterns you can use to work around this issue. For example, you can run your CPU-bound tasks in a completely separate process. Each process has its own interpreter and its own GIL, so the operating system scheduler can manage resource sharing much better, and your web2py threads should keep running, since operating systems tend to give preferential treatment to IO-bound threads. The multiprocessing library is provided for cases such as this. It will take some more code to get working, but it should help.
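As a rough illustration of the separate-process idea, here is a minimal sketch using `multiprocessing.Pool`. The function `cpu_heavy` and its workload are hypothetical stand-ins for whatever your CPU-bound task is, and the syntax assumes Python 3; nothing here is specific to web2py.

```python
import multiprocessing


def cpu_heavy(n):
    """Hypothetical stand-in for the CPU-bound work: sum of squares below n."""
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    # One worker process; it has its own interpreter and its own GIL.
    with multiprocessing.Pool(processes=1) as pool:
        pending = pool.apply_async(cpu_heavy, (2_000_000,))
        # The calling thread is free to do other work here while the
        # worker process computes.
        result = pending.get(timeout=60)
    print(result)
```

The arguments and the return value are pickled to cross the process boundary, so this works best when the task takes and returns small, plain data.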



Answer 2:

Which version of Python are you using? In 3.2, the GIL was changed to yield after fixed timeslices rather than after a certain number of high-level opcodes.
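If you are on CPython 3.2 or later, the length of that timeslice can be inspected and adjusted through the `sys` module. This is a small sketch; the value `0.001` is an arbitrary illustration, not a recommendation.

```python
import sys

# Current GIL switch interval in seconds (0.005 by default in CPython 3.2+).
interval = sys.getswitchinterval()
print(interval)

# Request a shorter interval so that waiting threads are offered the GIL sooner.
# 0.001 is an arbitrary illustrative value.
sys.setswitchinterval(0.001)
```

A shorter interval only changes how often the GIL is offered to other threads. It does not let CPU-bound threads run in parallel, and it adds switching overhead.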

Even with that change, running CPU-intensive code can affect the latency of your web app (and conversely, the IO-sensitive part will prevent the CPU-intensive part from occupying an entire core). You should just spin off tasks to worker processes using a queue like beanstalkd, and let the OS scheduler do its thing.
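To show the shape of the queue-plus-worker pattern without depending on an external service, here is a sketch that uses `multiprocessing.Queue` as a stand-in for beanstalkd. It is not the beanstalkd API; `handle_job` and `worker` are hypothetical names, and a real setup would run the worker as a separate long-lived program.

```python
import multiprocessing


def handle_job(n):
    """Hypothetical CPU-bound job: count primes below n by trial division."""
    count = 0
    for candidate in range(2, n):
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            count += 1
    return count


def worker(jobs, results):
    """Pull jobs until the None sentinel arrives, pushing each result back."""
    for n in iter(jobs.get, None):
        results.put((n, handle_job(n)))


if __name__ == "__main__":
    jobs = multiprocessing.Queue()
    results = multiprocessing.Queue()
    proc = multiprocessing.Process(target=worker, args=(jobs, results))
    proc.start()

    # The web-facing side only enqueues work; it never runs the job itself.
    for n in (10, 100, 1000):
        jobs.put(n)
    jobs.put(None)  # sentinel: tell the worker to exit

    for _ in range(3):
        print(results.get(timeout=60))
    proc.join()
```

With a real job queue the producer and the worker would not share Python objects at all, which is what lets the OS scheduler treat them as fully independent processes.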