Segmentation Faults when Running MEX Files in Para

Posted 2019-07-19 14:53

Question:

I am running repetitions of an experiment that uses MEX files in MATLAB 2012a, and I occasionally hit segmentation faults that I cannot explain.

Some information about the faults:

  • They occur randomly

  • They only occur when I run multiple repetitions of my experiment in parallel on a Linux machine using a parfor loop.

  • They do not occur when I run multiple repetitions of my experiment in parallel on Mac OSX 10.7 using a parfor loop.

  • They do not occur when I run a single repetition, nor do they occur when I run the repetitions sequentially.

  • They seem to occur far less frequently when I run 2 experiments in parallel - as opposed to 12 experiments in parallel.

Some information about my MEX file:

  • It is written in C

  • It uses the IBM CPLEX 12.4 API (this is thread-safe)

  • It was compiled using GCC 4.6.3

My guess is that there is some issue with accessing the MEX file from multiple cores. Can anyone shed light on what might be going on or suggest a fix? I'd be happy to provide more information as necessary.

Answer 1:

I recently sent a stack trace to the people at MATLAB, and it turns out that the culprit is not my code but one of the functions from the CPLEX 12.4 API. That function calls the C function putenv(), which is not necessarily thread-safe.
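For illustration, here is a minimal sketch of one general way to serialize calls into a routine that touches the environment: wrap it in a process-wide mutex. The name library_call_using_putenv() is a hypothetical stand-in for the vendor function (it is not a CPLEX API call), and DEMO_ENV_VAR is an invented variable. The sketch only helps if every thread that reads or writes the environment goes through the same lock, which may not hold for threads owned by MATLAB or by the library itself, and it does nothing across separate worker processes.

```c
#define _XOPEN_SOURCE 700   /* expose putenv() under strict -std= modes */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* One lock shared by every caller in this process. */
static pthread_mutex_t env_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for a vendor routine that calls putenv() internally. */
static int library_call_using_putenv(void)
{
    /* putenv() stores this pointer, so the string must outlive the call. */
    static char entry[] = "DEMO_ENV_VAR=1";
    return putenv(entry);
}

/* Guarded wrapper: only one thread at a time may enter the unsafe routine. */
int guarded_library_call(void)
{
    int err, rc;

    err = pthread_mutex_lock(&env_lock);
    assert(err == 0);
    (void)err;                      /* silence unused warning under NDEBUG */

    rc = library_call_using_putenv();

    pthread_mutex_unlock(&env_lock);
    return rc;                      /* 0 on success, as returned by putenv() */
}
```

In a MEX file, the wrapper would be called from mexFunction in place of the direct library call. Whether this removes the fault depends on which threads actually race on the environment, so treat it as something to try rather than a confirmed fix.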

Unfortunately, I have to keep using this function and the API so I've posted a follow-up thread that focuses on finding ways to avoid this fault.

Any advice would be appreciated.



Answer 2:

"My thoughts are that there may be some issue in accessing the MEX file in multiple cores."

It's much more likely that your MEX file has a bug. Various bugs that are very easy to make in C, such as accessing dangling memory, double-free()ing, or writing past the end of an allocated array, will cause intermittent SIGSEGV.

Your best bet is to run MATLAB under a debugger and see where it crashes.
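As a rough illustration of the bug classes named above, here is a small sketch of defensive idioms in plain C: bounds-check every write, and reset the pointer after free() so that a second free or a later access fails safely instead of corrupting memory. The buffer_t type and its helper functions are hypothetical names invented for this example; they are not part of the MEX or CPLEX APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper type: a heap array that remembers its own length. */
typedef struct {
    double *data;
    size_t  len;
} buffer_t;

/* Allocate a zero-filled array; returns 0 on success, -1 on failure. */
int buffer_init(buffer_t *b, size_t len)
{
    assert(b != NULL);
    b->data = calloc(len, sizeof *b->data);
    b->len  = (b->data != NULL) ? len : 0;
    return (b->data != NULL) ? 0 : -1;
}

/* Bounds-checked write: rejects out-of-range indices and freed buffers. */
int buffer_set(buffer_t *b, size_t i, double value)
{
    assert(b != NULL);
    if (b->data == NULL || i >= b->len) {
        return -1;              /* would have been a write past the end */
    }
    b->data[i] = value;
    return 0;
}

/* Safe to call twice: free(NULL) is a no-op, and the pointer is reset. */
void buffer_free(buffer_t *b)
{
    assert(b != NULL);
    free(b->data);
    b->data = NULL;             /* no dangling pointer left behind */
    b->len  = 0;
}
```

Checks like these do not replace a debugger, but they turn some intermittent crashes into deterministic error codes that are easier to trace.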