Semaphores and locks in MATLAB

Posted 2019-02-21 13:26

Question:

I am working on a MATLAB project where I would like to have two instances of MATLAB running in parallel and sharing data. I will call these instances MAT_1 and MAT_2. More specifically, the architecture of the system is:

  1. MAT_1 processes images sequentially, reading them one by one using imread, and outputs the result for each image using imwrite.
  2. MAT_2 reads the images output by MAT_1 using imread and outputs its result somewhere else.

One of the problems I think I need to address is guaranteeing that MAT_2 reads an image output by MAT_1 only once MAT_1 has fully finished writing it.

My questions are:

  1. How would you approach this problem? Do I need to use semaphores or locks to prevent race conditions?
  2. Does MATLAB provide any mechanism to lock files? (i.e. something similar to flock, but provided by MATLAB directly, and that works on multiple platforms, e.g. Windows & Linux). If not, do you know of any third-party library that I can use to build this mechanism in MATLAB?

EDIT:

  • As @yoda points out below, the Parallel Computing Toolbox (PCT) allows for blocking calls between MATLAB workers, which is great. That said, I am particularly interested in solutions that do not require the PCT.
  • Why do I need MAT_1 and MAT_2 to run in parallel?

    The processing done in MAT_2 is slower on average (and more prone to crashing) than MAT_1, and the output of MAT_1 feeds other programs and processes (including human inspection) that do not need to wait for MAT_2 to do its job.

Answers:

  • For a solution that allows for the implementation of semaphores but does not rely on the PCT, see Jonas' answer below.
  • For other good approaches to the problem, see Yoda's answer below.

Answer 1:

Personally, I'd use the Parallel Computing Toolbox for this.

As far as I know, there is no straightforward way in MATLAB to have system-wide file locks. However, to ensure that MATLAB #2 only reads the output of MATLAB #1 once the file has been completely written, I suggest that after writing e.g. the file results_1.mat, MATLAB #1 writes a second file, results_1.finished, which is an empty text file. Since the second file is written after the first, its existence signals that the results file is complete. MATLAB #2 can thus search for files with the extension .finished, i.e. dir('*.finished'), and use fileparts to get the name of the .mat file to load.
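The marker-file idea above could look roughly like the sketch below. The function names writeResult and readFinished, the variable name result, and the folder argument outDir are made up for illustration; only save, load, fopen, fclose, dir, fileparts and fullfile are standard MATLAB.

```matlab
% Writer side (MATLAB #1): save the result first, then create the empty marker.
% writeResult is a hypothetical helper name, not part of MATLAB.
function writeResult(outDir, baseName, result)
    save(fullfile(outDir, [baseName '.mat']), 'result');
    fid = fopen(fullfile(outDir, [baseName '.finished']), 'w');
    fclose(fid);   % the marker exists only after save has returned
end

% Reader side (MATLAB #2): load every result whose marker file exists.
% readFinished is a hypothetical helper name as well.
function results = readFinished(outDir)
    markers = dir(fullfile(outDir, '*.finished'));
    results = cell(1, numel(markers));
    for k = 1:numel(markers)
        [~, baseName] = fileparts(markers(k).name);   % strips '.finished'
        s = load(fullfile(outDir, [baseName '.mat']));
        results{k} = s.result;
    end
end
```

In a long-running setup, MATLAB #2 would presumably also need to remember which markers it has already handled, for example by deleting or renaming each marker after loading the corresponding file.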



Answer 2:

I would approach this using semaphores; in my experience the PCT is unreasonably slow at synchronization.

dfacto (another answer) has a great implementation of semaphores for MATLAB; however, it does not work on MS Windows, so I extended that work so that it does. The improved version is here: http://www.mathworks.com/matlabcentral/fileexchange/45504-semaphoreposixandwindows

In my experience, this performs better than interfacing with Java, .NET, the PCT, or file locks. It does not use the Parallel Computing Toolbox (PCT), and AFAIK semaphore functionality isn't in the PCT anyway (puzzling that they left it out!). It is possible to use the PCT for synchronization, but everything I tried in it was unreasonably slow.

To install this high-performance semaphore library into MATLAB, run this within the MATLAB interpreter: mex -O -v semaphore.c

You'll need a C compiler installed (and selected via mex -setup) to compile semaphore.c into a binary MEX-file. That MEX-file is then callable from your MATLAB code as shown in the example below.

Usage example:

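function Example()
    semkey = 1234;                    % key that identifies the semaphore
    semaphore('create', semkey, 1);
    funList = {@fun, @fun, @fun};
    parfor i = 1:length(funList)      % runs in parallel only if the PCT is available
        funList{i}(semkey);
    end
end
function fun(semkey)
    semaphore('wait', semkey);        % block until the semaphore is available
    disp('hey');
    semaphore('post', semkey);        % release it again
end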


Answer 3:

I am not sure if you are looking for a MATLAB-only solution, but I have just submitted a semaphore wrapper for use in MATLAB. It works as a generic semaphore, but it was mainly designed with sharedmatrix in mind.

As soon as Mathworks accepts the submission, I will update the link on my research group's blog.

Please note that this MEX-file is a wrapper for the POSIX semaphore functionality. As such, it will work on Linux, Unix and macOS, but will not work out of the box on Windows. It may work when compiled against the Cygwin libraries.



Answer 4:

I don't think there is a foolproof way other than using OS-specific locks. One approach might be to have MAT_1 do:

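imwrite(img, fileName);                 % img is the processed image; write it under a temporary name
movefile(fileName, completedFileName);  % rename only after imwrite has returned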

And have MAT_2 only process completedFileName.
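A slightly fuller sketch of this write-then-rename pattern is shown below. The helper names publishImage and listCompleted and the tmp_/done_ file name prefixes are assumptions made for illustration. The sketch also assumes that movefile within a single folder behaves as a rename rather than a copy, which is worth checking on your platform.

```matlab
% Hypothetical helper for MAT_1: write under a temporary name, then rename.
function publishImage(img, outDir, baseName)
    tmpFile   = fullfile(outDir, ['tmp_'  baseName '.png']);
    finalFile = fullfile(outDir, ['done_' baseName '.png']);
    imwrite(img, tmpFile);          % MAT_2 never looks at tmp_* files
    movefile(tmpFile, finalFile);   % rename only after imwrite has returned
end

% Hypothetical helper for MAT_2: list only the files that were renamed.
function files = listCompleted(outDir)
    listing = dir(fullfile(outDir, 'done_*.png'));
    files = arrayfun(@(f) fullfile(outDir, f.name), listing, ...
                     'UniformOutput', false);
end
```

MAT_2 would then call listCompleted periodically and pass each returned path to imread.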



Answer 5:

EDIT:

After seeing your edit, a simple solution not involving the use of any toolboxes is the following:

Since MAT_2 is much slower than MAT_1, start MAT_2 with a delay, i.e. start it once MAT_1 has finished processing, say, 5 images. If you do this, MAT_2 should never catch up with MAT_1 and hence should never be in a situation where it has to "wait" for images from MAT_1.


I'm still not clear on a few things from your question:

  1. You say MAT_1 processes images sequentially, but does it have to? In other words, does the order in which they are processed matter?
  2. You say MAT_2 reads the output from MAT_1... Does it have to be in the order that MAT_1 finishes or can that be any order?
  3. You say MAT_2 reads the image using imread and outputs it somewhere else. Is there any reason that task cannot be combined into MAT_1?

In any case, you can implement some form of execution blocking using the parallel computing toolbox; but instead of using parfor loops (which is what most people use), you'll have to create a distributed job (example).

The important thing to note is that each worker (lab) has a labindex, and you can use labSend to send data from worker 1 (the equivalent of MAT_1) to worker 2 (the equivalent of MAT_2), which then receives it using labReceive. From the documentation on labReceive:

"This function blocks execution in the lab until the corresponding call to labSend occurs in the sending lab."

which is pretty much what you wanted to do with MAT_1 and MAT_2.
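As a minimal sketch of that blocking hand-off, the snippet below uses an spmd block with two workers instead of a full distributed job. It assumes the Parallel Computing Toolbox is installed and that a pool with at least two workers can be opened. The squared numbers are placeholders for the real image results. In recent MATLAB releases these functions are also available under the names spmdIndex, spmdSend and spmdReceive.

```matlab
% Requires the Parallel Computing Toolbox and a pool with at least 2 workers.
spmd (2)
    received = [];
    if labindex == 1
        % Plays the role of MAT_1: produce results and send each to worker 2.
        for k = 1:3
            labSend(k^2, 2);                   % k^2 stands in for an image result
        end
    else
        % Plays the role of MAT_2: labReceive blocks until worker 1 has sent.
        for k = 1:3
            received(end+1) = labReceive(1);   %#ok<AGROW>
        end
    end
end
fromWorker2 = received{2};   % received is a Composite; index 2 is worker 2's copy
```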

Another way to do this would be to spawn one additional worker in your current session, but only assign tasks performed by MAT_1 to it. You then set the FinishedFcn property for the tasks to execute the set of functions performed by MAT_2, but I wouldn't recommend it as I don't think this was the intent for FinishedFcn, and I don't know if it will break in certain cases.



Answer 6:

I would also recommend looking at the Parallel Computing Toolbox for such a thing; the functionality you want should be in there somewhere. I think it's cleaner that way than trying to synchronize two instances of MATLAB (unless you are forced to use two instances).

In the odd case that there is no such thing, you might also look at different environments to implement what you want. It is a bit of a workaround, but you can always interface your MATLAB code with other languages (e.g. Java, .NET, C, ...) and use the functionality you are accustomed to there. With Java you can be fairly sure that your solution is platform independent, whereas .NET only works on Windows (at least in combination with MATLAB).
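As one possible illustration of the Java route, the sketch below takes a file lock through the standard Java classes java.io.RandomAccessFile and java.nio.channels.FileChannel, called directly from MATLAB. The helper name tryLockFile and the idea of a separate lock file are assumptions, not something the answers above prescribe. Java file locks are advisory on some platforms, so this only helps if both MATLAB instances agree to take the lock before touching the image.

```matlab
% Hypothetical helper: try to take an exclusive lock on lockPath via Java.
% Returns lock = [] if another process currently holds the lock.
function [lock, raf] = tryLockFile(lockPath)
    raf  = java.io.RandomAccessFile(lockPath, 'rw');   % creates the file if needed
    chan = raf.getChannel();
    lock = chan.tryLock();    % non-blocking; Java null arrives in MATLAB as []
    if isempty(lock)
        raf.close();          % nothing acquired, so release the file handle
    end
end
```

A caller would do its reading or writing while holding the lock and then call lock.release() followed by raf.close().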