Grid engine cluster + OpenCV: strange behaviour

Posted 2019-08-12 05:19

I'm using a Grid Engine cluster to run some OpenCV code. The code runs fine when executed locally, but it does not work when submitted to the grid. I have extracted a minimal example here.

In the directory ~/code/ I have a file test.cpp containing the following code:

#include <opencv2/core.hpp>
#include <iterator>
#include <string>
#include <sys/types.h>
#include <sys/stat.h>
using namespace cv;
using namespace std;


int main(int ac, char** av)
{    
    /// Declare an empty matrix (the only use of OpenCV in this example)
    Mat M;

    /// Create a subfolder
    string folderName = "sub/";
    mkdir(folderName.c_str(),0777);

    return 0;
}

The code compiles without errors.

When executing locally, i.e.

username@machine:~/code$ ./test

it creates a subfolder, i.e. ~/code/sub, as expected.

For submitting to the grid, I created a job script job.sh in the home directory (i.e. ~/job.sh) containing

cd code/
./test

and then submit using

qsub job.sh

Nothing happened: the folder was not created, and no errors were shown.

However, when I removed the line

Mat M;

it did create the folder as expected.

What are the possible reasons for this behaviour? My guess is that the OpenCV shared libraries are not installed on the other machines in the grid, but I'm not sure, and I don't know how to verify it.

Thank you in advance for any suggestions.

1 answer
Bombasti
#2 · 2019-08-12 05:21

The libraries need to be accessible to all execution nodes in the queue you want to submit the job to. If the execution nodes have access to a shared location, such as an NFS mount, you can install the libraries there. Otherwise, you need to install the required libraries on every execution node. Additional link regarding SET_LIB_PATH:

blogs.oracle.com/templedf/entry/inheriting_job_environment
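If the libraries do live in a shared location, one way to point a job at them is to extend LD_LIBRARY_PATH inside the job script before the program starts. The sketch below is only an illustration: prepend_lib_path is a hypothetical helper, and /shared/opencv/lib is a placeholder for wherever OpenCV is actually installed on your cluster.

```shell
#!/bin/sh
# Hypothetical helper: put a directory at the front of LD_LIBRARY_PATH.
# The empty case is handled separately because an empty entry in
# LD_LIBRARY_PATH is treated as the current directory.
prepend_lib_path() {
    if [ -n "${LD_LIBRARY_PATH:-}" ]; then
        LD_LIBRARY_PATH="$1:$LD_LIBRARY_PATH"
    else
        LD_LIBRARY_PATH="$1"
    fi
    export LD_LIBRARY_PATH
}

# Placeholder path (an assumption): replace with the real shared install dir.
prepend_lib_path /shared/opencv/lib
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
```

In job.sh these lines would go before `cd code/` and `./test`, so that the program inherits the variable when it is launched.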

While setting the library path helps the job look in the right location, the libraries themselves still need to be accessible from the execution nodes.
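To check whether the libraries really are missing on an execution node, one option is to run ldd on the binary from inside the job itself. This is a minimal sketch, not a definitive recipe: missing_libs is a hypothetical helper name, and /bin/sh stands in for the binary only so the snippet runs anywhere. In the job script the argument would be ./test.

```shell
#!/bin/sh
# Hypothetical helper: print the shared libraries that the dynamic linker
# cannot resolve for the given binary. ldd marks them with "not found".
# Prints nothing (and returns non-zero) when every dependency resolves.
missing_libs() {
    ldd "$1" | grep "not found"
}

# Report which node the job ran on, then the result of the check.
# /bin/sh is a stand-in; use ./test in the real job script.
echo "node: $(uname -n)"
if missing_libs /bin/sh; then
    echo "some libraries are missing on this node"
else
    echo "all libraries resolved"
fi
```

With default settings, Grid Engine writes a job's output and errors to files such as job.sh.o<jobid> and job.sh.e<jobid> in the home directory rather than to the terminal, so that is where to look for this report and for any loader error from ./test. Your site configuration may place them elsewhere.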
