Access Violation Exception mystery

Posted 2019-02-07 20:26

Question:

I've been working with EMGU+OpenCV for quite some time and ran into this AccessViolationException mystery.

First things first, the code:

using System;
using System.Threading.Tasks;
using Emgu.CV.Structure;

class AVE_Simulation
    {
        public static int Width = 7500;
        public static int Height = 7500;
        public static Emgu.CV.Image<Rgb, float>[] Images;

        static void Main(string[] args)
        {
            int N = 50;
            int Threads = 5;

            Images = new Emgu.CV.Image<Rgb, float>[N];
            Console.WriteLine("Start");

            ParallelOptions po = new ParallelOptions();
            po.MaxDegreeOfParallelism = Threads;
            System.Threading.Tasks.Parallel.For(0, N, po, new Action<int>((i) =>
            {
                Images[i] = GetRandomImage();
                Console.WriteLine("Prossing image: " + i);
                Images[i].SmoothBilatral(15, 50, 50);
                GC.Collect();
            }));
            Console.WriteLine("End");
        }

        public static Emgu.CV.Image<Rgb, float> GetRandomImage()
        {
            Emgu.CV.Image<Rgb, float> im = new Emgu.CV.Image<Rgb, float>(Width, Height);

            float[, ,] d = im.Data;
            Random r = new Random((int)DateTime.Now.Ticks);

            for (int y = 0; y < Height; y++)
            {
                for (int x = 0; x < Width; x++)
                {
                    d[y, x, 0] = (float)r.Next(255);
                    d[y, x, 1] = (float)r.Next(255);
                    d[y, x, 2] = (float)r.Next(255);
                }
            }
            return im;
        }

    }

The code is simple: allocate an array of images, generate each image and fill it with random numbers, then run a bilateral filter over it. That's it.

If I run this program with a single thread (Threads = 1), everything works normally. But if I raise the number of concurrent threads to 5, I get an AccessViolationException very quickly.

I've gone over the OpenCV code and verified that there are no allocations on the OpenCV side. I also went over the Emgu CV code looking for unpinned objects or other problems, and everything seems correct.

Some notes:

  1. If you remove the GC.Collect(), you will get the AccessViolationException less often, but it will eventually happen.
  2. This happens only when executed in Release mode. In Debug mode I didn't experience any exceptions.
  3. Although each image is 675 MB, allocation is not the problem (I have A LOT of memory), and an OutOfMemoryException would be thrown if the system ran out of memory.
  4. I used the bilateral filter here, but I get this exception with other filters/functions as well.
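Note 3 is easier to judge with the numbers worked out. An `Image<Rgb, float>` holds 3 channels of 4-byte floats, so its pixel data takes roughly width × height × 3 × 4 bytes. Below is a minimal sketch of that arithmetic. The `ImageMemory` helper is hypothetical, and it ignores any per-row padding the image header might add.

```csharp
public static class ImageMemory
{
    // Pixel-data size in bytes for a width x height image with `channels`
    // channels of `bytesPerChannel` bytes each. Row padding is ignored.
    public static long PixelBytes(int width, int height, int channels, int bytesPerChannel)
    {
        // Widen to long first so larger images or totals cannot overflow int.
        return (long)width * height * channels * bytesPerChannel;
    }
}
```

`PixelBytes(7500, 7500, 3, sizeof(float))` gives 675,000,000 bytes, which is the 675 MB quoted above. The static `Images` array keeps every input reachable, so the N = 50 inputs come to 50 × 675,000,000 = 33,750,000,000 bytes, before counting any filter outputs.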

Any help would be appreciated. I've been trying to fix this for more than a week.

i7 (no overclock), Win7 64bit, 32GB RAM, VS 2010, Framework 4.0, OpenCV 2.4.3

Console output and stack traces:

Start
Prossing image: 20
Prossing image: 30
Prossing image: 40
Prossing image: 0
Prossing image: 10
Prossing image: 21

Unhandled Exception: System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at Emgu.CV.CvInvoke.cvSmooth(IntPtr src, IntPtr dst, SMOOTH_TYPE type, Int32 param1, Int32 param2, Double param3, Double param4)
   at TestMemoryViolationCrash.AVE_Simulation.<Main>b__0(Int32 i) in C:\branches\1.1\TestMemoryViolationCrash\Program.cs:line 32
   at System.Threading.Tasks.Parallel.<>c__DisplayClassf`1.<ForWorker>b__c()
   at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
   at System.Threading.Tasks.Task.<>c__DisplayClass10.<ExecuteSelfReplicating>b__f(Object param0)
   at System.Threading.Tasks.Task.Execute()
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot)
   at System.Threading.Tasks.Task.ExecuteEntry(Boolean bPreventDoubleExecution)
   at System.Threading.Tasks.ThreadPoolTaskScheduler.TryExecuteTaskInline(Task task, Boolean taskWasPreviouslyQueued)
   at System.Threading.Tasks.TaskScheduler.TryRunInline(Task task, Boolean taskWasPreviouslyQueued)
   at System.Threading.Tasks.Task.InternalRunSynchronously(TaskScheduler scheduler, Boolean waitForCompletion)
   at System.Threading.Tasks.Parallel.ForWorker[TLocal](Int32 fromInclusive, Int32 toExclusive, ParallelOptions parallelOptions, Action`1 body, Action`2 bodyWithState, Func`4 bodyWithLocal, Func`1 localInit, Action`1 localFinally)
   at System.Threading.Tasks.Parallel.For(Int32 fromInclusive, Int32 toExclusive, ParallelOptions parallelOptions, Action`1 body)
   at TestMemoryViolationCrash.AVE_Simulation.Main(String[] args) in C:\branches\1.1\TestMemoryViolationCrash\Program.cs:line 35

Unhandled Exception: System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at Emgu.CV.CvInvoke.cvSmooth(IntPtr src, IntPtr dst, SMOOTH_TYPE type, Int32 param1, Int32 param2, Double param3, Double param4)
   at TestMemoryViolationCrash.AVE_Simulation.<Main>b__0(Int32 i) in C:\branches\1.1\TestMemoryViolationCrash\Program.cs:line 32
   at System.Threading.Tasks.Parallel.<>c__DisplayClassf`1.<ForWorker>b__c()
   at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
   at System.Threading.Tasks.Task.<>c__DisplayClass10.<ExecuteSelfReplicating>b__f(Object param0)
   at System.Threading.Tasks.Task.Execute()
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot)
   at System.Threading.Tasks.Task.ExecuteEntry(Boolean bPreventDoubleExecution)
   at System.Threading.ThreadPoolWorkQueue.Dispatch()

Unhandled Exception: System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at Emgu.CV.CvInvoke.cvSmooth(IntPtr src, IntPtr dst, SMOOTH_TYPE type, Int32 param1, Int32 param2, Double param3, Double param4)
   at TestMemoryViolationCrash.AVE_Simulation.<Main>b__0(Int32 i) in C:\branches\1.1\TestMemoryViolationCrash\Program.cs:line 32
   at System.Threading.Tasks.Parallel.<>c__DisplayClassf`1.<ForWorker>b__c()
   at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
   at System.Threading.Tasks.Task.<>c__DisplayClass10.<ExecuteSelfReplicating>b__f(Object param0)
   at System.Threading.Tasks.Task.Execute()
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot)
   at System.Threading.Tasks.Task.ExecuteEntry(Boolean bPreventDoubleExecution)
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
Press any key to continue . . .

Answer 1:

Your example doesn't keep a reference to the result image returned by Image.SmoothBilatral. The input images are rooted in a static array, so they are fine.

An Emgu.CV Image's data array is pinned by a GCHandle held inside the image itself. That is no different from the image simply containing the array: it does not prevent the image from being collected while the GCHandle's pointer is still in use by unmanaged code (in the absence of a managed root to the image).

The Image.SmoothBilatral method does nothing with its temporary result image except pass its pointer and return it. Because of that, I think it gets optimised to the point where the result image can be collected while the smoothing is still running.

Because there's no finalizer for this class, OpenCV is never asked to release its unmanaged image header (which points to the managed image data), so OpenCV still thinks it has a usable image structure.

You can fix it by keeping a reference to the result of SmoothBilatral and doing something with it (such as disposing it).
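Here is the same fix in plain .NET terms, as a minimal sketch. Once a method has read `obj.Ptr`, an optimizing JIT may treat `obj` as unreachable even though unmanaged code is still using that pointer. Unoptimized Debug code typically keeps locals alive until the end of the method, which would be consistent with note 2 in the question. `UnmanagedIntBuffer` and `NativeLikeSum` are hypothetical stand-ins, not Emgu CV classes: the wrapper frees its unmanaged memory in a finalizer. The two methods show two ways to keep the owner reachable until the call returns: disposing it afterwards, or calling `GC.KeepAlive`.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper: owns an unmanaged int buffer and frees it either in
// Dispose() or, if the owner is never disposed, in its finalizer.
public sealed class UnmanagedIntBuffer : IDisposable
{
    public IntPtr Ptr { get; private set; }
    public int Length { get; }

    public UnmanagedIntBuffer(int length)
    {
        Length = length;
        Ptr = Marshal.AllocHGlobal(length * sizeof(int));
        for (int i = 0; i < length; i++)
            Marshal.WriteInt32(Ptr, i * sizeof(int), i + 1); // store 1..length
    }

    ~UnmanagedIntBuffer() => Free();

    public void Dispose()
    {
        Free();
        GC.SuppressFinalize(this);
    }

    private void Free()
    {
        if (Ptr != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(Ptr);
            Ptr = IntPtr.Zero;
        }
    }
}

public static class KeepAliveDemo
{
    // Stand-in for a native call: it only receives the raw pointer, so nothing
    // in here keeps the owning UnmanagedIntBuffer reachable.
    static long NativeLikeSum(IntPtr p, int length)
    {
        long sum = 0;
        for (int i = 0; i < length; i++)
            sum += Marshal.ReadInt32(p, i * sizeof(int));
        return sum;
    }

    // Option 1: GC.KeepAlive keeps `buffer` reachable up to this point,
    // so its finalizer cannot run while NativeLikeSum is still reading.
    public static long SumKeepingAlive(int length)
    {
        var buffer = new UnmanagedIntBuffer(length);
        long sum = NativeLikeSum(buffer.Ptr, buffer.Length);
        GC.KeepAlive(buffer);
        return sum;
    }

    // Option 2: the Dispose() generated by `using` runs after the call returns,
    // which also keeps `buffer` reachable, and frees the memory promptly.
    public static long SumDisposing(int length)
    {
        using (var buffer = new UnmanagedIntBuffer(length))
        {
            return NativeLikeSum(buffer.Ptr, buffer.Length);
        }
    }
}
```

`SumKeepingAlive(10)` returns 55 and `SumDisposing(100)` returns 5050. The sums themselves are not the point; what matters is that `buffer` cannot be finalized mid-call. In the question's loop, the equivalent is keeping the return value of `SmoothBilatral` and disposing it.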

This extension method would also work (i.e. it lets the filter be called successfully for benchmarking without the result being used):

using System.Runtime.InteropServices;
using Emgu.CV;
using Emgu.CV.Structure;

public static class BilateralExtensionFix
{
    // The GCHandle roots `result` until cvSmooth returns, so it cannot be collected mid-call.
    public static Image<Rgb, float> SmoothBilateral(this Image<Rgb, float> image, int kernelSize, int colorSigma, int spaceSigma)
    {
        var result = image.CopyBlank();
        var handle = GCHandle.Alloc(result);
        Emgu.CV.CvInvoke.cvSmooth(image.Ptr, result.Ptr, Emgu.CV.CvEnum.SMOOTH_TYPE.CV_BILATERAL, kernelSize, kernelSize, colorSigma, spaceSigma);
        handle.Free();
        return result;
    }
}

I think Emgu CV should pin the managed data only while its pointer is being passed to OpenCV in an interop call.
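Below is a minimal sketch of that pattern using only the BCL. It assumes a hypothetical `NativeLikeSum` that stands in for the OpenCV routine; all it does is copy the floats back out through the pointer with `Marshal.Copy`. The managed array is pinned with `GCHandleType.Pinned` only for the duration of the call, and the handle is released in a `finally` block.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PinDuringCall
{
    // Hypothetical stand-in for an unmanaged routine that reads `count` floats from `p`.
    static float NativeLikeSum(IntPtr p, int count)
    {
        var copy = new float[count];
        Marshal.Copy(p, copy, 0, count);
        float sum = 0f;
        foreach (float v in copy) sum += v;
        return sum;
    }

    // Pin the managed array only while its address is in use, then unpin it.
    public static float Sum(float[] data)
    {
        GCHandle handle = GCHandle.Alloc(data, GCHandleType.Pinned);
        try
        {
            return NativeLikeSum(handle.AddrOfPinnedObject(), data.Length);
        }
        finally
        {
            handle.Free(); // the GC may move the array again once the call returns
        }
    }
}
```

When a blittable array such as `float[]` is passed directly as a P/Invoke argument, the interop marshaler pins it for that call automatically. An explicit handle like this is only needed when a raw `IntPtr` is handed over.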

P.S. The OpenCV bilateral filter crashes (with an error very similar to yours) on any float image with zero variation (min() = max()) across all channels. I think this is because of how it builds its binned exp() lookup table.

This can be reproduced with:

    // create new blank image
    var zeroesF1 = new Emgu.CV.Image<Rgb, float>(75, 75);
    // uncomment next line for failure
    zeroesF1.Data[0, 0, 0] += 1.2037063600E-035f;
    zeroesF1.SmoothBilatral(15, 50, 50);

This was confusing me as I was actually sometimes getting this error due to a bug in my test code...
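The lookup-table suspicion in the P.S. can be illustrated with a hypothetical sketch. This is not OpenCV's source. It models a table index scaled by `bins / (max - min)`, and the bin count of 12288 used below (4096 bins per channel × 3 channels) is an assumption for illustration. With zero range the scale is +Infinity. With the tiny range from the repro (about 1.2e-35), the quotient exceeds `float.MaxValue` and also becomes +Infinity. Either way, any index computed from it is meaningless unless it is checked.

```csharp
public static class BinScaleSketch
{
    // Hypothetical: scale that maps a value difference to a table position.
    public static float IndexScale(float min, float max, int bins)
    {
        return bins / (max - min); // +Infinity when max - min is 0 or tiny enough
    }

    // Returns a table index, or -1 when the scaled position is unusable.
    public static int SafeIndex(float diff, float scale, int tableSize)
    {
        float pos = diff * scale;
        // NaN and +/-Infinity fail this range test, so they never reach the cast.
        if (!(pos >= 0f && pos < tableSize)) return -1;
        return (int)pos;
    }
}
```

`IndexScale(0f, 1.2037063600E-035f, 12288)` is +Infinity, so `SafeIndex` rejects it. A range of 0 to 255 gives a finite scale of about 48.19, and `SafeIndex(1f, scale, 12289)` returns 48.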



Answer 2:

What version of Emgu CV are you using? I couldn't find a 2.4.3 version of it.

Pretty sure your code is not the problem.

It seems possible that the Emgu.CV.Image constructor has a concurrency issue (either in the managed wrapper or in the unmanaged code). The way the managed data array is handled in the Emgu CV trunk seems solid. However, some unmanaged data is allocated in the image constructor, and I suppose that might have gone wrong.

What happens if you try:

  • Moving Images[i] = GetRandomImage(); outside of the parallel For().
  • Slapping a lock() around the Image constructor in GetRandomImage()
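The second suggestion can be sketched as a small helper that runs a factory under one shared lock. At most one image is constructed at a time, while the rest of each `Parallel.For` iteration still runs concurrently. `SerializedFactory` is a hypothetical name. In `GetRandomImage()` it would wrap the constructor as `SerializedFactory.Create(() => new Emgu.CV.Image<Rgb, float>(Width, Height))`.

```csharp
using System;

public static class SerializedFactory
{
    // One lock shared by every caller, so constructions never overlap.
    private static readonly object CtorLock = new object();

    public static T Create<T>(Func<T> factory)
    {
        lock (CtorLock)
        {
            return factory();
        }
    }
}
```

Only the construction is serialized, while filling the pixels and the filter call stay parallel. That isolates whether concurrent construction is the trigger.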

I noticed there's a closed bug report of someone having a similar issue (calls to the image constructor occurring in parallel, but the images themselves not shared among threads) here.

[Edit]

Yes, this is a strange one. I can reproduce it with the stock 2.4.2 version and OpenCV binaries.

It only seems to crash for me if the number of threads in the Parallel.For exceeds the number of cores, i.e. more than 2 on my machine. It would be interesting to know how many cores your test system has.

Also, I only get the crash when the debugger is not attached and Optimize Code is enabled. Have you ever observed it in Release mode with the debugger attached?

The SmoothBilatral function is CPU-bound, so setting MaxDegreeOfParallelism higher than the number of cores doesn't really add any benefit. That makes for a perfect workaround, assuming what I found about the number of threads vs. cores also holds for your rig (Sod's law predicts it won't).
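Below is a minimal sketch of that workaround, which clamps the requested degree of parallelism to the machine. The `ParallelismCap` helper is hypothetical. Note that `Environment.ProcessorCount` reports logical processors, so on a hyper-threaded CPU it is larger than the physical core count the answer talks about. Pass a smaller value if physical cores are what matters.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelismCap
{
    // Clamp the requested degree of parallelism to [1, Environment.ProcessorCount];
    // for CPU-bound work, workers beyond that add no throughput.
    public static ParallelOptions CappedOptions(int requested)
    {
        int degree = Math.Max(1, Math.Min(requested, Environment.ProcessorCount));
        return new ParallelOptions { MaxDegreeOfParallelism = degree };
    }
}
```

In the question's `Main`, `ParallelOptions po = ParallelismCap.CappedOptions(Threads);` would replace the two lines that build `po`.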

So my guess is there is a concurrency/volatile issue in Emgu that only manifests when JIT optimisation is run, and when the GC is moving managed data around. But, as you say, there are no obvious unpinned-pointer-to-managed-object issues in the Emgu code.

Although I still can't explain it properly, here's what I found so far:

With the GC.Collect and console logs removed, the calls to GetRandomImage() serialised, and the code run outside of MSVC, I couldn't reproduce the issue (although this may have just reduced the frequency):

            public static int Width = 750;
            public static int Height = 750;
...
                int N = 500;
                int Threads = 11;
                Images = new Emgu.CV.Image<Rgb, float>[N];
                Console.WriteLine("Start");
                ParallelOptions po = new ParallelOptions();
                po.MaxDegreeOfParallelism = Threads;
                for (int i = 0; i < N; i++)
                {
                    Images[i] = GetRandomImage();
                }
                System.Threading.Tasks.Parallel.For(0, N, po, new Action<int>((i) =>
                {
                    //Console.WriteLine("CallingSmooth");
                    Images[i].SmoothBilatral(15, 50, 50);
                    //Console.WriteLine("SmoothCompleted");
                }));
                Console.WriteLine("End");

I added a timer that fires GC.Collect outside of the Parallel.For, but still more often than a collection would normally happen:

        var t = new System.Threading.Timer((dummy) => { 
            GC.Collect(); 
        }, null, 100,100);

With this change I still can't reproduce the issue. However, GC.Collect is being called less consistently than in your demo because the thread pool is busy, and there are no (or very few) managed allocations in the main loop for it to collect. Uncommenting the console logs around the SmoothBilatral call then reproduces the error fairly swiftly (by giving the GC something to collect, I guess).

[Another edit]

The OpenCV 2.4.2 reference manual states that cvSmooth is deprecated AND that "Median and bilateral filters work with 1- or 3-channel 8-bit images and can not process images in-place."... not very encouraging!
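The quoted restriction is easy to check before calling `cvSmooth`. Below is a hypothetical pre-flight helper that encodes exactly what the manual note says for the median and bilateral modes: 1 or 3 channels, 8 bits per channel, and no in-place processing. It is a sketch, not an Emgu CV or OpenCV API.

```csharp
public static class CvSmoothConstraints
{
    // True when (channels, depth, in-place) fall inside the documented limits
    // quoted from the OpenCV 2.4.2 manual for median and bilateral smoothing.
    public static bool IsDocumentedSupported(int channels, int bitsPerChannel, bool inPlace)
    {
        bool channelsOk = channels == 1 || channels == 3;
        bool depthOk = bitsPerChannel == 8;
        return channelsOk && depthOk && !inPlace;
    }
}
```

The question's `Image<Rgb, float>` has 3 channels of 32-bit floats, and `IsDocumentedSupported(3, 32, false)` is false for it.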

I find that the median filter on byte or float images, and the bilateral filter on byte images, both work fine. I can't see why any CLR/GC issue wouldn't affect those cases too.

So despite the strange effects on the C# test program I still reckon this is an Emgu/OpenCV bug.

If you haven't already, you should test with OpenCV binaries that you've compiled yourself. If it still fails, convert your test to C++.

N.B. OpenCV has its own parallelism implementation, which would probably work out faster.



Tags: c# opencv emgucv