.NET C# unsafe/fixed doesn't pin passthrough a

Posted 2019-01-28 10:48

Question:

I have some concurrent code which has an intermittent failure and I've reduced the problem down to two cases which seem identical, but where one fails and the other doesn't.

I've now spent way too much time trying to create a minimal, complete example that fails, but without success, so I'm just posting the lines that fail in case anyone can see an obvious problem.

Object m_lock = new Object();

struct MyValueType { readonly public int i1, i2; };
class Node { public MyValueType x; public int y; public Node z; };
volatile Node[] m_rg = new Node[300];

unsafe void Foo()
{
    Node[] temp;
    while (true)
    {
        temp = m_rg;
        /* ... */
        Monitor.Enter(m_lock);
        if (temp == m_rg)
            break;
        Monitor.Exit(m_lock);
    }

#if OK                                      // this works:
    Node cur = temp[33];
    fixed (MyValueType* pe = &cur.x)
        *(long*)pe = *(long*)&e;
#else                                       // this reliably causes random corruption:
    fixed (MyValueType* pe = &temp[33].x)
        *(long*)pe = *(long*)&e;
#endif

    Monitor.Exit(m_lock);
}

I have studied the IL code and it looks like what's happening is that the Node object at array position 33 is moving (in very rare cases) despite the fact that we are holding a pointer to a value type within it.

It's as if the CLR doesn't notice that we are passing through a heap (movable) object--the array element--in order to access the value type. The 'OK' version has never failed under extended testing on an 8-way machine, but the alternate path fails quickly every time.

  • Is this never supposed to work, and is the 'OK' version just too streamlined to fail under stress?
  • Do I need to pin the object myself using GCHandle (I notice in the IL that the fixed statement alone is not doing so)?
  • If manual pinning is required here, why is the compiler allowing access through a heap object (without pinning) in this way?

Note: this question is not about the elegance of reinterpreting the blittable value type in a nasty way, so please, no criticism of this aspect of the code unless it is directly relevant to the problem at hand. Thanks.

[edit: jitted asm] Thanks to Hans' reply, I understand better why the jitter is placing things on the stack in what otherwise seem like vacuous asm operations. See [rsp + 50h] for example, and how it gets nulled out after the 'fixed' region. The remaining unresolved question is whether [cur+18h] (lines 207-20C) on the stack is somehow sufficient to protect the access to the value type in a way that is not adequate for [temp+33*IntPtr.Size+18h] (line 24A).

[edit]

summary of conclusions, minimal example

Comparing the two code fragments below, I now believe that #1 is not ok, whereas #2 is acceptable.

(1.) The following fails (on x64 jit at least); GC can still move the MyClass instance if you try to fix it in-situ, via an array reference. There's no place on the stack for the reference of the particular object instance (the array element that needs to be fixed) to be published, for the GC to notice.

struct MyValueType { public int foo; };
class MyClass { public MyValueType mvt; };
MyClass[] rgo = new MyClass[2000];

fixed (MyValueType* pvt = &rgo[1234].mvt)
    *(int*)pvt = 1234;

(2.) But you can access a structure inside a (movable) object using fixed (without pinning) if you provide an explicit reference on the stack which can be advertised to the GC:

struct MyValueType { public int foo; };
class MyClass { public MyValueType mvt; };
MyClass[] rgo = new MyClass[2000];

MyClass mc = rgo[1234];               // <-- only difference -- add this line
fixed (MyValueType* pvt = &mc.mvt)    // <-- and adjust accordingly here
    *(int*)pvt = 1234;

This is where I'll leave it unless someone can provide corrections or more information...

Answer 1:

Modifying objects of managed type through fixed pointers can result in undefined behavior
(C# Language specification, chapter 18.6.)

Well, you are doing just that. In spite of the verbiage in the spec and the MSDN library, the fixed keyword does not in fact make the object unmovable; it doesn't get pinned. You probably found that out from looking at the IL. It uses a clever trick: it generates a pointer + offset and lets the garbage collector adjust the pointer. I don't have a great explanation for why this fails in one case but not the other. I don't see a fundamental difference in the generated machine code. But then I probably didn't reproduce your exact machine code either; the snippet isn't great.

As near as I can tell it should fail in both cases because of the structure member access. That causes the pointer + offset to collapse to a single pointer with a LEA instruction, preventing the garbage collector from recognizing the reference. Structures have always been trouble for the jitter. Thread timing could explain the difference, perhaps.

You could post to connect.microsoft.com for a second opinion. It is however going to be difficult to navigate around the spec violation. If my theory is correct then a read could fail too, much harder to prove though.

Fix it by actually pinning the array with GCHandle.
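
A minimal sketch of what pinning with GCHandle can look like. One caveat worth stating: GCHandle.Alloc with GCHandleType.Pinned throws an ArgumentException for objects that contain non-blittable data, and an array of object references such as the Node[] in the question falls into that category. The sketch therefore assumes the values are stored in an array of blittable structs instead. The PinSketch class and the WriteRaw helper are hypothetical names, not something from the question.

```csharp
using System;
using System.Runtime.InteropServices;

// Same shape as the struct in the question (two ints, 8 bytes).
struct MyValueType { public int i1, i2; }

static class PinSketch
{
    // Hypothetical helper: overwrites values[index] with a raw 64-bit pattern
    // while the array is pinned. Assumes the element type is blittable.
    public static unsafe void WriteRaw(MyValueType[] values, int index, long raw)
    {
        if ((uint)index >= (uint)values.Length)
            throw new ArgumentOutOfRangeException(nameof(index));

        // Pin the whole array so the GC cannot move it while we hold a raw pointer.
        GCHandle handle = GCHandle.Alloc(values, GCHandleType.Pinned);
        try
        {
            // Address of element 0 of the pinned array.
            MyValueType* first = (MyValueType*)handle.AddrOfPinnedObject().ToPointer();
            *(long*)(first + index) = raw;
        }
        finally
        {
            // Always release the handle; a leaked pinned handle keeps the array pinned.
            handle.Free();
        }
    }
}
```

A call such as PinSketch.WriteRaw(values, 33, raw) then replaces both int fields of element 33 with a single 64-bit store. The code has to be compiled with unsafe code enabled (/unsafe).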



Answer 2:

Puzzling over this (and I'm guessing here), it looks like the compiler is taking &temp (a fixed pointer to the temp array) and then indexing that with [33]. So you're pinning the temp array rather than the node. Try...

fixed (MyValueType* pe = &(temp[33]).x)
    *(long*)pe = *(long*)&e;