I have a huge dataset of videos that I process using a Python script called process.py. The problem is that processing the whole dataset, which contains 6000 videos, takes a lot of time. So I came up with the idea of dividing the dataset into, for example, 4 parts, copying the same code into different Python scripts (e.g. process1.py, process2.py, process3.py, process4.py), and running each one in a different shell on one portion of the dataset.
My question is: would that bring me anything in terms of performance? I have a machine with 10 cores, so it would be very beneficial if I could somehow exploit this multicore structure. I heard about the multiprocessing module of Python, but unfortunately I don't know much about it, and I didn't write my script with its features in mind. Is the idea of starting each script in a different shell nonsense? Is there a way to choose which core would be used by each script?
The multiprocessing documentation (https://docs.python.org/2/library/multiprocessing.html) is actually fairly easy to digest. This section (https://docs.python.org/2/library/multiprocessing.html#using-a-pool-of-workers) should be particularly relevant. You definitely do not need multiple copies of the same script. This is an approach you can adopt:
Assume this is the general structure of your existing script (process.py):
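As a minimal sketch (the function name convert_vid, the dataset path, and the loop are placeholders standing in for whatever your process.py actually does), something like:

```python
import os

DATASET_DIR = "/path/to/videos"  # placeholder path

def convert_vid(vid_path):
    # whatever per-video processing the script currently does
    ...

def main():
    # process every video in the dataset, one after another
    for fname in os.listdir(DATASET_DIR):
        convert_vid(os.path.join(DATASET_DIR, fname))

if __name__ == "__main__":
    main()
```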
With multiprocessing, you can run the function convert_vid in separate processes. Here is the general scheme:
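A minimal sketch using multiprocessing.Pool, again assuming the hypothetical convert_vid and dataset path from above:

```python
import os
from multiprocessing import Pool

DATASET_DIR = "/path/to/videos"  # placeholder path

def convert_vid(vid_path):
    # per-video processing, unchanged from the serial version
    ...

def main():
    videos = [os.path.join(DATASET_DIR, f) for f in os.listdir(DATASET_DIR)]
    # With 10 cores, 8-10 worker processes is a reasonable starting point.
    pool = Pool(processes=8)
    pool.map(convert_vid, videos)  # distributes the videos across the workers
    pool.close()
    pool.join()

if __name__ == "__main__":
    main()
```

Pool.map splits the list of videos across the worker processes for you, and the operating system schedules those processes onto the available cores, so you don't have to (and generally shouldn't) pin each one to a specific core yourself.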