I am writing a simple script that crops images and saves them.
The problem is that there are about 150,000+ images, and I want to improve the speed.
At first I wrote the code with a simple for loop, like the following:
import cv2
import sys

textfile = sys.argv[1]
file_list = open(textfile)
files = file_list.read().splitlines()
idx = 0
for eachfile in files:
    image = cv2.imread(eachfile)
    idx += 1
    if image is None:        # unreadable file (or blank line): skip it
        continue
    outName = eachfile.replace('/data', '/changed_data')
    if image.shape[0] == 256:
        image1 = image[120:170, 120:170]
    elif image.shape[0] == 50:
        image1 = image
    else:                    # unexpected size: image1 would be undefined
        continue
    cv2.imwrite(outName, image1)
    print idx, outName
This code took about 38 seconds for 90,000 images. But using two cores took more time than a single process, about 48 seconds for the same 90,000 images:
import cv2
import sys
from multiprocessing import Pool

def crop(eachfile):
    image = cv2.imread(eachfile)
    if image is None:        # unreadable file (or blank line): skip it
        return
    outName = eachfile.replace('/data', '/changed_data')
    if image.shape[0] == 256:
        image1 = image[120:170, 120:170]
    elif image.shape[0] == 50:
        image1 = image
    else:                    # unexpected size: image1 would be undefined
        return
    cv2.imwrite(outName, image1)
    print outName            # idx was dropped: each worker process has its own copy,
                             # so incrementing it here raises UnboundLocalError

if __name__ == '__main__':
    textfile = sys.argv[1]
    file_list = open(textfile)
    files = file_list.read().splitlines()
    pool = Pool(2)
    pool.map(crop, files)
Am I doing the right thing to speed up the process, or should I split the list and send each sublist to a separate process?
Any comments on my code would be great!
Thanks in advance!
You should indeed split the task over two cores. Play around with this example code (slightly modified); the original post can be found here. Where you see
data
that is your hook for providing your images. Note that defs don't work inside a class when using multiprocessing. And if you try to use pathos instead, you'll get errors from cPickle; it's a nagging issue with the latest 2.7 version that doesn't occur in 3.5 or so. Enjoy! But also check here for an approach using kernels and splitting jobs.
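On the splitting question: you don't need to split the list yourself, because Pool.map already divides the input among the workers, and its chunksize argument controls how many items each worker receives per batch, which cuts per-item inter-process overhead for long lists of small items. Here is a minimal sketch of the pattern; crop_one is a hypothetical stand-in for the real cv2.imread / crop / cv2.imwrite work, so the example stays self-contained:

```python
from multiprocessing import Pool

def crop_one(path):
    # stand-in for the real OpenCV read/crop/write; returns the output
    # name so the parent process can log progress in one place
    if not path:                     # skip blank lines left by split('\n')
        return None
    return path.replace('/data', '/changed_data')

if __name__ == '__main__':
    files = ['/data/a.png', '/data/b.png', '']
    pool = Pool(2)
    # chunksize batches many paths into a single task per worker,
    # reducing the inter-process communication overhead per item
    results = pool.map(crop_one, files, chunksize=2)
    pool.close()
    pool.join()
    print([r for r in results if r])  # ['/changed_data/a.png', '/changed_data/b.png']
```

Pool.map preserves input order in its result list, so collecting and printing the names in the parent also avoids interleaved output from multiple workers.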