How can I process images with OpenCV in parallel using multiprocessing?

Asked 2019-06-04 06:21

I have a set of images in a folder that I want to preprocess using some OpenCV functions. The function

detectAndaligncrop

takes an image path, preprocesses it using OpenCV, and returns the output image. I am able to do it using:

for image_path in files_list:
    cropped_image, _ = detectAndaligncrop(image_path)
    cv2.imwrite("output_folder/{}".format(os.path.basename(image_path)), cropped_image * 255.)

However, this is not working:

jobs=[]
for im_no, im in enumerate(files_list):
    p=multiprocessing.Process(target=saveIm,args=[im])
    jobs.append(p)
    p.start()
for j in jobs:
    j.join()

where saveIm is:

def saveIm(im_path):
    im, lm = detectAndaligncrop(im_path)
    fname = "output_path/cropped2/{}".format(os.path.basename(im_path))
    cv2.imwrite(fname, im)

I have verified that it calls the detectAndaligncrop function, but each worker hangs at the line where

gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

is called inside detectAndaligncrop: "before cvtColor" is printed for every image, while "after cvtColor" never is:

def detectAndaligncrop(impath):
    image=cv2.imread(impath)
    image_float=np.float32(image)/255.0
    print("before cvtColor")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    print("after cvtColor")
    return gray, 1

Also, I tried:

with ThreadPoolExecutor(max_workers=32) as execr:
    res=execr.map(saveIm,files_list)

This works, but it is no faster than simply running a for loop. Is it because of the GIL?
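On the GIL question: threads in a ThreadPoolExecutor share one interpreter lock, so CPU-bound Python code does not run in parallel (OpenCV releases the GIL inside its C++ routines, but the pure-Python glue around them still serializes). A process pool sidesteps this; here is a minimal sketch where the CPU-bound busy_work function is a hypothetical stand-in for saveIm:

```python
import multiprocessing

def busy_work(n):
    # Stand-in for a CPU-bound task like detectAndaligncrop:
    # a pure-Python loop holds the GIL, so threads cannot run it in parallel.
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    # Each worker is a separate process with its own interpreter and GIL.
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(busy_work, [100000] * 4)
    print(len(results))  # 4
```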

2 Answers

Deceive 欺骗 · 2019-06-04 06:54

I was in need of a multiprocessing approach to pre-process images before feeding them to neural networks. I came across a page called Embarrassingly parallel for loops, where mathematical tasks were run in parallel for the elements of an array/list. I wanted to know whether this could be extended to images (after all, images are nothing but arrays, big 3D arrays!).

I decided to perform the addWeighted operation from OpenCV on a collection of images. Using this operation you can apply different weights to two images and add them; it is used for blending images, as you can see here.
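The arithmetic behind addWeighted is just a per-pixel weighted sum, dst = src1*alpha + src2*beta + gamma, rounded and saturated back to the image dtype. A small NumPy sketch of that formula (the add_weighted helper is illustrative, not the OpenCV function itself):

```python
import numpy as np

def add_weighted(src1, alpha, src2, beta, gamma=0.0):
    # Same arithmetic cv2.addWeighted performs:
    # dst = src1*alpha + src2*beta + gamma, rounded and clipped to uint8.
    dst = src1.astype(np.float64) * alpha + src2.astype(np.float64) * beta + gamma
    return np.clip(np.rint(dst), 0, 255).astype(np.uint8)

# Two tiny single-channel "images" to blend:
a = np.full((2, 2), 100, dtype=np.uint8)
b = np.full((2, 2), 200, dtype=np.uint8)

# 100*0.7 + 200*0.3 = 130 for every pixel
print(add_weighted(a, 0.7, b, 0.3)[0, 0])  # 130
```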

I ran this function with and without joblib on a set of images on my desktop and compared their performance. At the end I mention the number of images used and their collective size.

Code:

import os
import time

import cv2

#--- Importing the required functions from joblib ---
from joblib import Parallel, delayed

#--- Choosing all available image formats of images from my desktop ---
path = r'C:\Users\Jackson\Desktop'
img_formats = ['.png', '.jpg', '.jpeg']

#--- Defining the addWeighted function from OpenCV ---
def weight(im):
    addweighted = cv2.addWeighted(im, 0.7, cv2.GaussianBlur(im, (15, 15), 0), 0.3, 0)
    return addweighted


#--- Using joblib library ---
start_time = time.time()

new_dir = os.path.join(path, 'add_Weighted_4_joblib')
if not os.path.exists(new_dir):
    os.makedirs(new_dir)

def joblib_loop():
    files = [f for f in os.listdir(path) if any(c in f for c in img_formats)]
    imgs = [cv2.imread(os.path.join(path, f)) for f in files]
    #--- delayed() only records the call; Parallel() actually runs them across workers ---
    results = Parallel(n_jobs=-1)(delayed(weight)(img) for img in imgs)
    for f, r in zip(files, results):
        cv2.imwrite(os.path.join(new_dir, f + '_add_weighted_.jpg'), r)

joblib_loop()

elapsed_time = time.time() - start_time
print('Using Joblib : ', elapsed_time)

#--- Without joblib ---
start_time = time.time()

#--- Check whether directory exists if not make one
new_dir = os.path.join(path, 'add_Weighted_4')
if not os.path.exists(new_dir):
    os.makedirs(new_dir)

for f in os.listdir(path):
    if any(c in f for c in img_formats):
        img = cv2.imread(os.path.join(path, f))
        r = weight(img)
        cv2.imwrite(os.path.join(new_dir, f + '_add_weighted_.jpg'), r)

elapsed_time = time.time() - start_time
print('Without Joblib : ', elapsed_time)

Here is the result I got:

('Using Joblib : ', 0.09400010108947754)
('Without Joblib : ', 15.386000156402588)

As you can see, using joblib speeds up operations like crazy!!

Now let me show you how many images are on my desktop, and their total size:

overall_size = 0
count = 0
for f in os.listdir(path):
    if any(c in f for c in img_formats):
        img = cv2.imread(os.path.join(path, f))
        overall_size+= img.size
        count+= 1

print('Collective size of all {} images in the predefined path is {} MB'.format(count, overall_size/10**6))

and the result:

Collective size of all 14 images in the predefined path is 58 MB
看我几分像从前 · 2019-06-04 06:55

After a few experiments I found the error: basically, the error is in the method used to convert the read image into a grayscale one. If I use:

gray = cv2.imread(impath,0)

instead of

image = cv2.imread(impath)
image_float = np.float32(image)/255.0
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

the code works fine.
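For reference, cv2.cvtColor with COLOR_BGR2GRAY computes the standard BT.601 luma, Y = 0.299 R + 0.587 G + 0.114 B. A NumPy sketch of the same conversion (bgr_to_gray is an illustrative helper, not part of OpenCV):

```python
import numpy as np

def bgr_to_gray(image):
    # Same weights OpenCV's COLOR_BGR2GRAY uses (ITU-R BT.601):
    # Y = 0.299*R + 0.587*G + 0.114*B; channels arrive in B, G, R order.
    b, g, r = image[..., 0], image[..., 1], image[..., 2]
    y = 0.114 * b.astype(np.float64) + 0.587 * g + 0.299 * r
    return np.clip(np.rint(y), 0, 255).astype(np.uint8)

# A 1x1 pure-green BGR pixel: 0.587 * 255 rounds to 150
px = np.array([[[0, 255, 0]]], dtype=np.uint8)
print(bgr_to_gray(px)[0, 0])  # 150
```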

Perhaps there is some problem in using cv2.cvtColor with multiprocessing. Can someone shed light on the reasons? Is it about picklability?
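One commonly cited culprit, offered here as an assumption rather than a verified diagnosis for this exact case, is that fork() duplicates a parent process that already holds OpenCV's internal thread-pool locks, so the forked child deadlocks inside cv2.cvtColor; a picklability problem would raise an error rather than hang silently. Using the "spawn" start method gives each worker a fresh interpreter, so no locks or thread state are inherited:

```python
import multiprocessing

def square(x):
    # Stand-in for detectAndaligncrop; with "spawn", the target function
    # must live at module top level so the workers can import it.
    return x * x

if __name__ == "__main__":
    # "spawn" launches each worker from a clean interpreter instead of fork(),
    # so nothing is inherited from the parent process.
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(2) as pool:
        print(pool.map(square, [1, 2, 3]))  # [1, 4, 9]
```

Another frequently suggested workaround is calling cv2.setNumThreads(0) inside each worker, which disables OpenCV's own threading before the problematic call runs.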
