Maximum pool size when using ThreadPool in Python

Posted 2019-05-24 04:16

Question:

I am using ThreadPool to achieve multiprocessing. When using multiprocessing, the pool size limit should be equal to the number of CPU cores. My question: when using ThreadPool, should the pool size limit also be the number of CPU cores?

This is my code

from multiprocessing.pool import ThreadPool as Pool

class Subject():
    def __init__(self, url):
        #rest of the code
        pass

    def func1(self):
        #returns something
        pass

if __name__=="__main__":
    pool_size = 11
    pool = Pool(pool_size)
    # all_my_urls is defined elsewhere; each Subject needs its url
    objects = [Subject(url) for url in all_my_urls]
    for obj in objects:
        pool.apply_async(obj.func1, ())
    pool.close()
    pool.join()

What should the maximum pool size be? Thanks in advance.

Answer 1:

You cannot use threads for multiprocessing; you can only achieve multithreading. Because of the GIL, multiple threads cannot execute Python code in parallel within a single CPython process, so multithreading is only useful for IO-heavy work (e.g. talking to the Internet), where threads spend a lot of time waiting, rather than CPU-heavy work (e.g. maths), which constantly occupies a core.
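
To make that concrete, here is a rough sketch rather than anything from the question's code. It assumes time.sleep is a fair stand-in for waiting on a network response; fake_io_task, run_serial and run_threaded are made-up names for illustration. Sleeping releases the GIL, so the waits overlap when the tasks run in a ThreadPool.

```python
import time
from multiprocessing.pool import ThreadPool


def fake_io_task(delay):
    """Hypothetical stand-in for a network call: the thread only waits."""
    time.sleep(delay)  # blocking waits like this release the GIL
    return delay


def run_serial(delays):
    """Run the tasks one after another and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = [fake_io_task(d) for d in delays]
    return results, time.perf_counter() - start


def run_threaded(delays, pool_size):
    """Run the tasks in a ThreadPool and return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPool(pool_size) as pool:
        results = pool.map(fake_io_task, delays)  # blocks until all tasks finish
    return results, time.perf_counter() - start


if __name__ == "__main__":
    delays = [0.2] * 8  # eight simulated requests (assumed workload)
    _, serial_time = run_serial(delays)
    _, threaded_time = run_threaded(delays, pool_size=8)
    print(f"serial:   {serial_time:.2f}s")
    print(f"threaded: {threaded_time:.2f}s")
```

If you swap the sleep for a pure-Python number-crunching loop, you should expect the threaded version to lose most of its advantage, which is the GIL effect described above.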

So if you have many IO-heavy tasks running at once, then having that many threads will be useful, even if it's more than the number of CPU cores. A very large number of threads will eventually have a negative impact on performance, but until you actually measure a problem, don't worry. Something like 100 threads should be fine.
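
If you do want to measure, one possible approach is to time the same batch of work at a few pool sizes and compare. This is only a sketch under assumptions: fake_download simulates an IO wait with time.sleep, and the task count and candidate sizes are arbitrary. In practice you would substitute your real func1 calls and your own URL list.

```python
import time
from multiprocessing.pool import ThreadPool


def fake_download(_):
    """Hypothetical IO-bound task; replace with the real work when measuring."""
    time.sleep(0.05)


def time_pool_size(pool_size, n_tasks=40):
    """Return the seconds taken to run n_tasks tasks with pool_size threads."""
    start = time.perf_counter()
    with ThreadPool(pool_size) as pool:
        pool.map(fake_download, range(n_tasks))
    return time.perf_counter() - start


def sweep(pool_sizes, n_tasks=40):
    """Time each candidate pool size and return {pool_size: seconds}."""
    return {size: time_pool_size(size, n_tasks) for size in pool_sizes}


if __name__ == "__main__":
    # Candidate sizes are arbitrary; pick ones that bracket your expected load.
    for size, elapsed in sweep([1, 4, 16, 64]).items():
        print(f"pool_size={size:>3}: {elapsed:.2f}s")
```

With real network calls the numbers will depend on latency, the remote server's limits and your machine, so treat the result as a guide for your own setup rather than a general answer.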