Let's say I have a big list of music files of varying length that need to be converted, or images of varying sizes that need to be resized, or something like that. The order doesn't matter, so it is perfect for splitting across multiple processors.

If I use multiprocessing.Pool's map function, it seems like all the work is divided up ahead of time and doesn't take into account the fact that some files may take longer to process than others.
What happens is that if I have 12 processors... near the end of processing, 1 or 2 processors will have 2 or 3 files left to process while other processors that could be utilized sit idle.
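For concreteness, here is a minimal sketch of the kind of call I mean (the convert function and the glob pattern are just placeholders):

    import glob
    from multiprocessing import Pool

    def convert(filename):
        ...  # expensive, and takes a different amount of time per file

    if __name__ == '__main__':
        files = glob.glob('*.flac')
        with Pool(12) as pool:
            # map() splits the list into chunks up front; a chunk of slow
            # files can leave one worker grinding while the others sit idle
            results = pool.map(convert, files)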
Is there some sort of queue implementation that can keep all processors loaded until there is no more work left to do?
There is a Queue class within the multiprocessing module specifically for this purpose.
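A minimal sketch of that pattern, with the same placeholder convert function as above: each worker pulls the next filename off a shared queue as soon as it finishes the previous one, so nobody sits idle while work remains.

    import glob
    from multiprocessing import Process, Queue

    def convert(filename):
        ...  # the actual conversion

    def worker(queue):
        while True:
            filename = queue.get()
            if filename is None:   # sentinel: no more work left
                break
            convert(filename)

    if __name__ == '__main__':
        files = glob.glob('*.flac')
        queue = Queue()
        workers = [Process(target=worker, args=(queue,)) for _ in range(12)]
        for w in workers:
            w.start()
        for f in files:
            queue.put(f)
        for _ in workers:
            queue.put(None)        # one sentinel per worker
        for w in workers:
            w.join()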
Edit: If you are looking for a complete framework for parallel computing which features a map() function using a task queue, have a look at the parallel computing facilities of IPython. In particular, you can use the TaskClient.map() function to get a load-balanced mapping to the available processors.

That up-front division of the work does not happen if you use Pool.imap_unordered: with the default chunksize of 1, tasks are handed to workers one at a time as they become free.
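A sketch of the difference, again with placeholder names:

    import glob
    from multiprocessing import Pool

    def convert(filename):
        ...  # the actual conversion

    if __name__ == '__main__':
        files = glob.glob('*.flac')
        with Pool(12) as pool:
            # each worker gets one file at a time and immediately takes
            # the next when it finishes, so no worker idles while another
            # still has a private backlog of slow files
            for result in pool.imap_unordered(convert, files):
                print(result)   # results arrive in completion order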
This is trivial to do with jug:
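A minimal jugfile sketch, with the conversion itself left as a placeholder:

    import glob
    from jug import TaskGenerator

    @TaskGenerator
    def convert(filename):
        ...  # the actual conversion goes here

    # each call creates one jug task; workers running `jug execute`
    # pick up the next unfinished task as soon as they are free
    for f in glob.glob('*.flac'):
        convert(f)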
Now, just run jug execute a few times to spawn worker processes.

The Python threading library that has brought me most joy is Parallel Python (PP). It is trivial with PP to use a thread pool approach with a single queue to achieve what you need.
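A rough sketch of that approach; the convert function is a placeholder, and in real code you would also pass convert's dependent functions and modules to submit():

    import glob
    import pp

    def convert(filename):
        ...  # the actual conversion

    files = glob.glob('*.flac')
    job_server = pp.Server()   # worker count defaults to the CPUs found

    # each submit() queues one job; PP's scheduler hands jobs to workers
    # as they free up, so the load stays balanced until the queue drains
    jobs = [job_server.submit(convert, (f,)) for f in files]
    results = [job() for job in jobs]   # calling a job waits for its result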
About queue implementations: there are some. Look at the Celery project: http://celeryproject.org/

So, in your case, you can run 12 conversions (one on each CPU) as Celery tasks, add a callback function (to the conversion or to the task), and in that callback function add a new conversion task that runs when one of the previous conversions is finished.
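A minimal sketch using the current Celery API (the module name, broker URL, and convert body are all assumptions). Rather than wiring up callbacks by hand, you can also simply enqueue one task per file and let the workers drain the queue:

    # tasks.py
    from celery import Celery

    app = Celery('tasks', broker='redis://localhost:6379/0')

    @app.task
    def convert(filename):
        ...  # the actual conversion

    # enqueue one task per file; idle workers pull the next task off
    # the broker's queue as soon as they finish, so none of them stall
    def enqueue_all(files):
        for f in files:
            convert.delay(f)

Running celery -A tasks worker --concurrency=12 then starts 12 worker processes that keep pulling tasks until the queue is empty.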