I am working on a MATLAB project where I would like to have two instances of MATLAB running in parallel and sharing data. I will call these instances MAT_1 and MAT_2. More specifically, the architecture of the system is:
- MAT_1 processes images sequentially, reading them one by one using imread, and outputs the result for each image using imwrite.
- MAT_2 reads the images output by MAT_1 using imread and outputs its result somewhere else.
One of the problems I think I need to address is to guarantee that MAT_2 reads an image output by MAT_1 only once MAT_1 has fully finished writing to it.
My questions are:
- How would you approach this problem? Do I need to use semaphores or locks to prevent race conditions?
- Does MATLAB provide any mechanism to lock files? (i.e. something similar to flock, but provided by MATLAB directly, and that works on multiple platforms, e.g. Windows & Linux). If not, do you know of any third-party library that I can use to build this mechanism in MATLAB?
EDIT:
- As @yoda points out below, the Parallel Computing Toolbox (PCT) allows for blocking calls between MATLAB workers, which is great. That said, I am particularly interested in solutions that do not require the PCT.
- Why do I require MAT_1 and MAT_2 to run in parallel threads? The processing done in MAT_2 is slower on average (and more prone to crashing) than MAT_1, and the output of MAT_1 feeds other programs and processes (including human inspection) that do not need to wait for MAT_2 to do its job.
Answers :
- For a solution that allows for the implementation of semaphores but does not rely on the PCT see Jonas' answer below
- For other good approaches to the problem, see Yoda's answer below
I would approach this using semaphores; in my experience the PCT is unreasonably slow at synchronization.
dfacto (another answer) has a great implementation of semaphores for MATLAB, however it will not work on MS Windows; I improved on that work so that it would. The improved work is here: http://www.mathworks.com/matlabcentral/fileexchange/45504-semaphoreposixandwindows
This will be better performing than interfacing with Java, .NET, the PCT, or file locks. This does not use the Parallel Computing Toolbox (PCT), and AFAIK semaphore functionality isn't in the PCT anyway (puzzling that they left it out!). It is possible to use the PCT for synchronization, but everything I tried in it was unreasonably slow.
To install this high-performance semaphore library into MATLAB, run this within the MATLAB interpreter: mex -O -v semaphore.c
You'll need a C compiler supported by mex installed to compile semaphore.c into a binary MEX-file. That MEX-file is then callable from your MATLAB code as shown in the example below.
Usage example:
I am not sure if you are looking for matlab-only solution but I have just submitted a semaphore wrapper for use in Matlab. It works as a generic semaphore, but it was mainly designed with sharedmatrix in mind.
As soon as Mathworks accepts the submission, I will update the link on my research group's blog.
Please note that this mex file is a wrapper for the POSIX semaphore functionality. As such it will work in Linux, Unix, MacOS but will not work out-of-the-box on Windows. It may work when compiled against cygwin libraries.
Personally, I'd use the parallel processing toolbox for this.
As far as I know, there is no straightforward way in Matlab to have systemwide file locks. However, in order to ensure that Matlab #2 only reads output of Matlab #1 when the file has finished writing, I suggest that after writing e.g. the file results_1.mat, Matlab #1 writes a second file, results_1.finished, which is an empty text file. Since the second file is written after the first, its existence signals that the results-file has been written. You can thus search for files with the extension finished, i.e. dir('*.finished'), and use fileparts to get the name of the .mat file you'd like to load with Matlab #2.
I would also recommend to look at the parallel processing toolbox for such a thing, the functionality you want should be in there somewhere. I think it's cleaner that way than trying to synchronize two instances of MATLAB (unless you are forced to use two instances).
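The marker-file handshake described above (write results_1.mat first, then an empty results_1.finished) can be sketched roughly as follows. This is only an illustration under assumptions: the output folder, the stand-in data, and the choice to delete the marker after loading are not from the answer, and both sides are shown in one script purely so the sketch runs on its own.

```matlab
% Sketch of the ".finished" marker-file handshake (names are illustrative).
outDir = tempname;                      % hypothetical shared output folder
mkdir(outDir);

% --- Writer side (Matlab #1) ---
result = magic(4);                      % stand-in for the real output
save(fullfile(outDir, 'results_1.mat'), 'result');
% Create the empty marker only after save has returned.
fid = fopen(fullfile(outDir, 'results_1.finished'), 'w');
fclose(fid);

% --- Reader side (Matlab #2) ---
markers = dir(fullfile(outDir, '*.finished'));
for k = 1:numel(markers)
    [~, baseName] = fileparts(markers(k).name);   % strips '.finished'
    loaded = load(fullfile(outDir, [baseName '.mat']));
    % ... process loaded.result here ...
    delete(fullfile(outDir, markers(k).name));    % assumption: avoid processing twice
end
```

In a real setup the reader loop would run repeatedly (for example with a short pause between scans) in the second MATLAB instance.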
In the odd case that there is no such thing, you might also look at different environments to implement what you want. It might be a bit of a workaround, but you can always interface your MATLAB code with other languages (e.g. Java, .NET, C, ...) and use the functionality you are accustomed to there. With Java you are quite sure that your solution is platform independent, .NET only works on Windows (at least in combination with MATLAB).
I don't think there is a fool-proof way other than using the OS-specific locks. One approach might be to have MAT_1 do:
And have MAT_2 only process completedFileName.
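The code for the MAT_1 step is not present in the text above. One plausible reading, offered here only as an assumption, is that MAT_1 writes the image under a temporary name and renames it to completedFileName once imwrite has returned. The variable tempFileName, the file names, and the use of movefile are illustrative choices; whether the rename is atomic depends on the operating system and file system.

```matlab
% Sketch: write under a temporary name, then rename when writing is done.
outDir = tempname;                      % hypothetical shared output folder
mkdir(outDir);
img = uint8(magic(8));                  % stand-in for a processed image

tempFileName      = fullfile(outDir, 'image_0001.tmp');   % assumed name
completedFileName = fullfile(outDir, 'image_0001.png');

% --- MAT_1 ---
% The format is passed explicitly because the .tmp extension hides it.
imwrite(img, tempFileName, 'png');
movefile(tempFileName, completedFileName);

% --- MAT_2 ---
% Only files that already carry the final extension are considered.
ready = dir(fullfile(outDir, '*.png'));
for k = 1:numel(ready)
    received = imread(fullfile(outDir, ready(k).name));
    % ... process received here ...
end
```

Keeping the temporary and final names in the same folder means the rename stays on one volume, which is the case where a rename is most likely to be a single file-system operation.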
EDIT:
After seeing your edit, a simple solution not involving the use of any toolboxes is the following:
Since MAT_2 is much slower than MAT_1, start MAT_2 with a delay, i.e., start it when MAT_1 has finished processing, say, 5 images or so. If you do this, MAT_2 will never catch up with MAT_1 and hence will never be in a situation where it has to "wait" for images from MAT_1.
I'm still not clear on a few things from your question:
- MAT_1 processes images sequentially, but does it have to? In other words, does the order in which they are processed matter?
- MAT_2 reads the output from MAT_1... Does it have to be in the order that MAT_1 finishes or can that be any order?
- MAT_2 reads the image using imread and outputs it somewhere else. Is there any reason that task cannot be combined into MAT_1?
In any case, you can implement some form of execution blocking using the parallel computing toolbox; but instead of using parfor loops (which is what most people use), you'll have to create a distributed job (example).
The important thing to note is that each worker (lab) has a labindex, and you can use labSend to send data from worker 1 (equivalent of MAT_1) to worker 2 (equivalent of MAT_2), who then receives it using labReceive. From the documentation on labReceive:
which is pretty much what you wanted to do with MAT_1 and MAT_2.
Another way to do this would be to spawn one additional worker in your current session, but only assign tasks performed by MAT_1 to it. You then set the FinishedFcn property for the tasks to execute the set of functions performed by MAT_2, but I wouldn't recommend it as I don't think this was the intent for FinishedFcn, and I don't know if it will break in certain cases.