I'm using a Grid Engine cluster to run some OpenCV code. The code runs fine when executed locally, but it does not work when submitted to the grid. Here is a minimal example.
In the directory ~/code/ I have a file test.cpp containing the following code:
#include <opencv2/core.hpp>
#include <iterator>
#include <string>
#include <sys/types.h>
#include <sys/stat.h>
using namespace cv;
using namespace std;
int main(int ac, char** av)
{
    /// Declare an empty matrix
    Mat M;
    /// Create a subfolder
    string folderName = "sub/";
    mkdir(folderName.c_str(), 0777);
    return 0;
}
The code compiles without errors.
When executed locally, i.e.
username@machine:~/code$ ./test
it creates a subfolder, i.e. ~/code/sub, as expected.
To submit to the grid, I created a job script job.sh in the home directory (i.e. ~/job.sh) containing
cd code/
./test
and then submit using
qsub job.sh
Nothing happened, and no errors were reported.
However, when I removed the line
Mat M;
it did create the folder as expected.
What are the possible reasons for this behaviour? I suspect that the OpenCV shared libraries are not installed on the other machines in the grid, but I'm not sure and I don't know how to verify that.
Thank you in advance for any suggestions.
The libraries need to be accessible to all execution nodes in the queue you submit the job to. If the execution nodes have access to a shared location, such as an NFS mount, you can install the libraries there. Otherwise, you need to install the required libraries on every execution node. An additional link regarding SET_LIB_PATH:
blogs.oracle.com/templedf/entry/inheriting_job_environment
While this would help point the job to the right location, the libraries still need to be accessible from the execution nodes.
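To verify whether the libraries resolve on the node that actually runs the job, one option is to have the job script call ldd on the binary before starting it. The following is only a sketch under some assumptions: the binary is ~/code/test as in the question, the helper names filter_missing and run_checked are made up for illustration, and /shared/opencv/lib is a hypothetical shared install path. Grid Engine normally writes a job's output to files named like job.sh.o<jobid> and job.sh.e<jobid> (typically in the home directory when -cwd is not given), so a loader error may already be sitting there even though nothing appears on the terminal.

```shell
#!/bin/sh
#$ -S /bin/sh
#$ -j y

# Hypothetical: uncomment and adjust if OpenCV lives on a shared mount.
# export LD_LIBRARY_PATH=/shared/opencv/lib:$LD_LIBRARY_PATH

# Keep only the ldd lines that name an unresolved library.
# "|| true" keeps the exit status 0 when grep finds nothing.
filter_missing() {
    grep '=> not found' || true
}

# Run a binary only if ldd reports no unresolved shared library on this host.
run_checked() {
    bin=$1
    shift
    missing=$(ldd "$bin" 2>/dev/null | filter_missing)
    if [ -n "$missing" ]; then
        echo "$(hostname): unresolved libraries for $bin:" >&2
        echo "$missing" >&2
        return 1
    fi
    "$bin" "$@"
}

# Paths from the question; nothing is run if the binary is absent.
if [ -x "$HOME/code/test" ]; then
    cd "$HOME/code" && run_checked ./test
else
    echo "$(hostname): $HOME/code/test not found or not executable" >&2
fi
```

Submitted with qsub job.sh as before, the merged output file should then either stay empty (the folder gets created) or list the host name together with the OpenCV libraries that the loader could not find there.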