I have a set of images in a folder that I want to preprocess with some OpenCV functions. The function detectAndaligncrop takes an image path, preprocesses the image with OpenCV and returns the output image. Processing the files sequentially works:

    import os
    import cv2

    for image_path in files_list:
        cropped_image, _ = detectAndaligncrop(image_path)
        cv2.imwrite("output_folder/{}".format(os.path.basename(image_path)), cropped_image * 255.)
However, this does not work:

    import multiprocessing

    if __name__ == "__main__":
        jobs = []
        for im in files_list:
            p = multiprocessing.Process(target=saveIm, args=[im])
            jobs.append(p)
            p.start()
        for j in jobs:
            j.join()

where saveIm is:

    def saveIm(im_path):
        im, lm = detectAndaligncrop(im_path)
        fname = "output_path/cropped2/{}".format(os.path.basename(im_path))
        cv2.imwrite(fname, im)
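Starting one Process per file launches as many processes as there are images, all competing for the same cores. A minimal sketch using a fixed-size pool instead follows. The folder names, the chunksize of 8, and save_gray (a grayscale-only stand-in for detectAndaligncrop) are assumptions. The sketch also uses the spawn start method and calls cv2.setNumThreads(0) in each worker so that OpenCV runs sequentially inside it, which avoids multiplying the number of worker processes by OpenCV's own internal threads.

```python
import glob
import multiprocessing as mp
import os

# Hypothetical locations; adjust to your own folder layout.
INPUT_GLOB = "input_folder/*.jpg"
OUTPUT_DIR = "output_path/cropped2"


def output_path_for(image_path, output_dir=OUTPUT_DIR):
    """The output file keeps the input file's base name."""
    return os.path.join(output_dir, os.path.basename(image_path))


def init_worker():
    """Runs once in each worker process before it receives any task."""
    import cv2  # only workers load OpenCV; the parent just hands out paths

    cv2.setNumThreads(0)  # run OpenCV functions sequentially inside this worker


def save_gray(image_path):
    """Hypothetical stand-in for detectAndaligncrop + imwrite: grayscale only."""
    import cv2  # already loaded by init_worker; this just binds the name

    image = cv2.imread(image_path)
    if image is None:  # missing or unreadable file
        return False
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    return cv2.imwrite(output_path_for(image_path), gray)


def process_folder(paths, processes=None):
    """Process all paths with a fixed-size pool; return how many were written."""
    if not paths:
        return 0
    os.makedirs(OUTPUT_DIR, exist_ok=True)
    ctx = mp.get_context("spawn")  # fresh interpreters, nothing inherited via fork
    # processes=None lets the pool start one worker per available CPU
    with ctx.Pool(processes=processes, initializer=init_worker) as pool:
        # chunksize=8 is an arbitrary batch size that cuts per-task overhead
        return sum(pool.imap_unordered(save_gray, paths, chunksize=8))


if __name__ == "__main__":
    written = process_folder(sorted(glob.glob(INPUT_GLOB)))
    print(f"wrote {written} images")
```

Under spawn, every worker re-imports the main module, so the script has to be run as a file, with the pool started under `if __name__ == "__main__":` as above. To use the real preprocessing, call detectAndaligncrop inside save_gray instead of the grayscale stand-in.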
I have verified that detectAndaligncrop is called, but processing stops at the line

    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

inside detectAndaligncrop: "before cvtcolor" is printed for every image, while "after cvtcolor" never is. The reduced debugging version of the function is:

    import cv2
    import numpy as np

    def detectAndaligncrop(impath):
        image = cv2.imread(impath)
        image_float = np.float32(image) / 255.0  # not used in this reduced version
        print("before cvtcolor")
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        print("after cvtcolor")
        return gray, 1
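One plausible explanation, offered as a hypothesis rather than a confirmed diagnosis: on Linux the default multiprocessing start method has long been fork. A forked child inherits the parent's memory but only the calling thread, so if OpenCV's internal thread pool was already running in the parent, the child can inherit locks that no thread will ever release, and the first internally parallelized OpenCV call (cvtColor is one) can block forever. A quick way to test this is to run the same function in a single child under fork and under spawn, with a timeout. finishes_in_child is a hypothetical helper, not part of the question's code:

```python
import multiprocessing as mp


def finishes_in_child(method, target, arg, timeout=60.0):
    """Run target(arg) in one child started with `method` ("fork", "spawn", ...).

    Returns True if the child exits cleanly within `timeout` seconds and
    False if it fails or is still running (a stuck child is terminated).
    """
    ctx = mp.get_context(method)
    proc = ctx.Process(target=target, args=(arg,))
    proc.start()
    proc.join(timeout)
    if proc.is_alive():  # e.g. stuck inside cv2.cvtColor
        proc.terminate()
        proc.join()
        return False
    return proc.exitcode == 0
```

For example, compare finishes_in_child("fork", detectAndaligncrop, path) with finishes_in_child("spawn", detectAndaligncrop, path); with spawn, detectAndaligncrop must be importable from a module file and the call must sit under `if __name__ == "__main__":`. If spawn finishes while fork times out, fork-inherited state is the likely culprit; switching to spawn, or calling cv2.setNumThreads(0) in the parent before starting workers, are commonly reported workarounds.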
Also, I tried:

    from concurrent.futures import ThreadPoolExecutor

    with ThreadPoolExecutor(max_workers=32) as execr:
        res = execr.map(saveIm, files_list)

This works, but it is no faster than a plain for loop. Is this because of the GIL?
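On the GIL question: OpenCV's Python bindings release the GIL while a wrapped C++ function runs, so threads can overlap most cv2.imread, cv2.cvtColor and cv2.imwrite calls. A more likely limit is that cvtColor and many other OpenCV functions already use several threads internally, and that imread/imwrite wait on the disk, so 32 extra threads may only add contention. Note also that the snippet above never iterates res, so any exception raised inside saveIm is silently discarded. A small stdlib timing sketch (time_runs is a hypothetical name) for measuring both approaches on the real workload:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def time_runs(func, items, max_workers=8):
    """Wall-clock seconds for a plain loop and for a thread pool over `items`."""
    start = time.perf_counter()
    for item in items:
        func(item)
    mid = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Consuming the iterator waits for every task and re-raises the first
        # exception from func; an unconsumed map() result never surfaces it.
        list(executor.map(func, items))
    end = time.perf_counter()
    return {"sequential": mid - start, "threads": end - mid}
```

For example, print(time_runs(saveIm, files_list, max_workers=4)); trying a few values of max_workers shows whether adding threads helps at all on a given machine and disk.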
I needed a multiprocessing approach to preprocess images before feeding them to neural networks. I came across a page called Embarrassingly parallel for loops, where mathematical tasks were run in parallel over the elements of an array or list, and I wanted to know whether this could be extended to images (after all, images are nothing but arrays, big 3D arrays!).
I decided to apply OpenCV's addWeighted operation to a collection of images. This operation applies different weights to two images and adds them; it is used for blending images, as you can see here.
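For reference, OpenCV documents addWeighted (in Python, cv2.addWeighted(src1, alpha, src2, beta, gamma)) as a per-element weighted sum, where saturate clamps the result to the range of the output type (0 to 255 for 8-bit images):

```latex
\mathrm{dst}(I) = \operatorname{saturate}\bigl(\mathrm{src1}(I)\cdot\alpha + \mathrm{src2}(I)\cdot\beta + \gamma\bigr)
```

With alpha = 0.7, beta = 0.3 and gamma = 0, for example, 8-bit pixel values 100 and 200 blend to 100 * 0.7 + 200 * 0.3 = 130.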
I ran this function with and without joblib on a set of images from my desktop and compared the performance. At the end, I list the number of images used and their total size.
Code:
Here is the result I got:
As you can see, using joblib speeds up the operation dramatically! Now let me show you how many images are on my desktop and their total size:
and the result:
After a few experiments I found the error: it lies in the way the image that was read is converted to grayscale. If I use:

instead of

the code works fine.

Perhaps there is a problem with using cv2.cvtColor under multiprocessing. Can someone shed light on the reasons? Is it about picklability?
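On the picklability question: with Process(target=saveIm, args=[im]), the only things that may be pickled are the target function and the path string, and under fork not even those, because the child inherits the parent's memory. The image and its grayscale version are created inside the child and never sent back, so pickling never touches cv2.cvtColor's input or output (and NumPy arrays, which cv2 returns, are picklable anyway). A tiny hypothetical check, is_picklable, shows that a list holding a path string pickles fine, unlike, for example, a threading lock:

```python
import pickle


def is_picklable(obj):
    """True if obj survives a pickle round trip, as spawn-started Process args must."""
    try:
        pickle.loads(pickle.dumps(obj))
    except Exception:  # PicklingError, TypeError, AttributeError, ...
        return False
    return True
```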