C# Automatic deep copy of struct

Published 2019-02-21 19:07

Question:

I have a struct, `MyStruct`, with a private field `private bool[] boolArray;` and a method `ChangeBoolValue(int index, bool Value)`.

I have a class, `MyClass`, that has a property `public MyStruct bools { get; private set; }`.

When I create a new MyStruct object from an existing one and then call ChangeBoolValue(), the bool array in both objects is changed, because the reference, not the array it refers to, was copied to the new object. For example:

MyStruct A = new MyStruct();
MyStruct B = A;  //Copies A, but B shares A's boolArray
B.ChangeBoolValue(0,true);
//Now boolArray[0] is true in both A and B

Is there a way to force the copy to be a deep copy, or a way to implement this that will not have the same issue?

I had specifically made MyStruct a struct because it is a value type, and I did not want references propagating.

Answer 1:

The runtime performs a fast memory copy of structs, and as far as I know, it's not possible to introduce or force your own copying procedure for them. You could introduce your own Clone method or even a copy constructor, but you could not force callers to use them.
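The Clone/copy-constructor option can be sketched as follows. Only `boolArray` and `ChangeBoolValue` come from the question; the `MyStruct(int length)` and `MyStruct(MyStruct other)` constructors, `Clone`, and `GetBoolValue` are hypothetical additions for illustration. Note that plain assignment still copies only the array reference, which is exactly why callers cannot be forced to use the deep copy.

```csharp
// Sketch only: an opt-in deep copy for the question's MyStruct.
public struct MyStruct
{
    private bool[] boolArray;

    public MyStruct(int length)
    {
        boolArray = new bool[length];
    }

    // Copy constructor: the new struct gets its own copy of the array.
    public MyStruct(MyStruct other)
    {
        boolArray = (bool[])other.boolArray.Clone();
    }

    // Same deep copy, exposed as a method.
    public MyStruct Clone() => new MyStruct(this);

    public void ChangeBoolValue(int index, bool value) => boolArray[index] = value;

    public bool GetBoolValue(int index) => boolArray[index];
}
```

With this, `var b = a.Clone();` gives `b` an independent array, but `var d = a;` still shares `a`'s array.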

Your best bet, if possible, is to make your struct immutable (or use an immutable class) or to redesign in general to avoid this issue. If you are the sole consumer of the API, then perhaps you can just remain extra vigilant.
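A minimal sketch of the immutable option, assuming a hypothetical `ImmutableBools` type in place of `MyStruct` and a hypothetical `WithBoolValue` method in place of `ChangeBoolValue`. The array is fully built before it is stored and is never handed out, so copies of the struct can safely share it; every "change" returns a new value instead of mutating the existing one.

```csharp
// Sketch: immutable replacement for MyStruct (names are hypothetical).
public readonly struct ImmutableBools
{
    // Never modified after construction and never exposed,
    // so sharing it between copies is harmless.
    private readonly bool[] values;

    public ImmutableBools(int length)
    {
        values = new bool[length];
    }

    // Takes ownership of an array that the caller has finished writing.
    private ImmutableBools(bool[] values)
    {
        this.values = values;
    }

    public int Length => values.Length;

    public bool this[int index] => values[index];

    // Returns a new struct with its own modified array; this one is untouched.
    public ImmutableBools WithBoolValue(int index, bool value)
    {
        var copy = (bool[])values.Clone();
        copy[index] = value;
        return new ImmutableBools(copy);
    }
}
```

With a property like `bools { get; private set; }`, `MyClass` would then update it as `bools = bools.WithBoolValue(0, true);`.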

Jon Skeet (and others) have described this issue, and although there can be exceptions, generally speaking, mutable structs are evil. See also the question "Can structs contain fields of reference types".



Answer 2:

One simple method to make a (deep) copy, though not the fastest one (because it uses reflection), is to use BinaryFormatter to serialize the original object to a MemoryStream and then deserialize from that MemoryStream to a new MyStruct.

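    using System.IO;
    using System.Runtime.Serialization.Formatters.Binary;

    // Deep copy by round-tripping through BinaryFormatter.
    // T (and every type it references) must be marked [Serializable].
    public static T DeepCopy<T>(T obj)
    {
        var formatter = new BinaryFormatter();
        using (var ms = new MemoryStream())
        {
            formatter.Serialize(ms, obj);
            ms.Position = 0;  // rewind before reading the copy back
            return (T)formatter.Deserialize(ms);
        }
    }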

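Works for classes and structs, provided the type (and every type it references) is marked `[Serializable]`. Be aware that `BinaryFormatter` has been obsolete since .NET 5 and its implementation was removed in .NET 9, so this approach is only practical on .NET Framework or older .NET versions.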



Answer 3:

As a workaround, I am going to implement the following.

There are two methods in the struct that can modify the contents of BoolArray. Since the array cannot be copied when the struct is copied, each of these methods will instead replace BoolArray with a fresh copy before changing it, as follows:

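public void ChangeBoolValue(int index, bool value)
{
    //Copy-on-write: give this struct its own array before changing it,
    //so other copies that still share the old array are unaffected.
    bool[] copy = new bool[BoolArray.Length];
    BoolArray.CopyTo(copy, 0);
    BoolArray = copy;

    BoolArray[index] = value;
}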

This would perform badly if BoolArray were changed often, but my use of the struct involves a lot of copying and very little changing. The array is only reallocated when a change is actually made.



Answer 4:

To avoid weird semantics, any struct which holds a field of a mutable reference type must do one of two things:

  1. It should make very clear that, from its perspective, the content of the field serves not to "hold" an object, but merely to identify one. For example, a `KeyValuePair<String, Control>` would be a perfectly reasonable type, since although `Control` is mutable, the identity of a control referenced by such a type would be immutable.
  2. The mutable object must be one which is created by the value type and will never be exposed outside it. Further, any mutations that will ever be performed upon the object must be performed before a reference to it is stored into any field of the struct.

As others have noted, one way to allow a struct to simulate an array would be for it to hold an array and make a new copy of that array any time an element is modified. Such a thing would, of course, be outrageously slow, since every single write costs a copy of the whole array. An alternative approach would be to add some logic to store the indices and values of the last few mutation requests; any time an attempt is made to read the array, check whether the index is one of the recently written ones and, if so, use the value stored in the struct instead of the one in the array. Once all of the "slots" within the struct are filled up, make a copy of the array with the pending writes applied. This approach would at best "only" offer a constant-factor speedup versus regenerating the array if updates hit many different elements, but could be helpful if the overwhelming majority of updates hit a small number of elements.
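The "recent writes" idea can be sketched as follows, assuming two inline slots (a hypothetical choice; the answer does not fix a number) and a hypothetical type name `OverlayBoolArray`. The shared array is never written after it has been stored, so copying the struct copies the pending writes by value and copies never affect each other.

```csharp
using System;

// Sketch: a struct that behaves like a by-value bool array.
// Up to two recent writes live in the struct itself; the shared array
// is treated as immutable once stored.
public struct OverlayBoolArray
{
    private bool[] array;  // shared between copies, never mutated after storing
    private int count;     // number of slots in use (0, 1 or 2)
    private int index0, index1;
    private bool value0, value1;

    public OverlayBoolArray(int length)
    {
        array = new bool[length];
        count = 0;
        index0 = index1 = 0;
        value0 = value1 = false;
    }

    public int Length => array.Length;

    public bool this[int index]
    {
        get
        {
            CheckIndex(index);
            // The two slots never hold the same index, so order does not matter.
            if (count > 0 && index0 == index) return value0;
            if (count > 1 && index1 == index) return value1;
            return array[index];
        }
        set
        {
            CheckIndex(index);
            // Overwrite a slot that already holds this index.
            if (count > 0 && index0 == index) { value0 = value; return; }
            if (count > 1 && index1 == index) { value1 = value; return; }
            // Otherwise use a free slot if there is one.
            if (count == 0) { index0 = index; value0 = value; count = 1; return; }
            if (count == 1) { index1 = index; value1 = value; count = 2; return; }

            // Slots are full: build a new array with every pending write
            // applied, and only then store it.
            var copy = (bool[])array.Clone();
            copy[index0] = value0;
            copy[index1] = value1;
            copy[index] = value;
            array = copy;
            count = 0;
        }
    }

    private void CheckIndex(int index)
    {
        if ((uint)index >= (uint)array.Length)
            throw new ArgumentOutOfRangeException(nameof(index));
    }
}
```

A write to an index already held in a slot is O(1); otherwise the array is cloned at most once every three writes, which is the constant-factor gain described above.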

Another approach, when updates are likely to have a high spatial concentration but hit too many elements for them all to fit within a struct, would be to keep a reference to a "main" array, as well as an "updates" array along with an integer indicating which part of the main array the "updates" array represents. Updates would often require regenerating the "updates" array, but that could be much smaller than the main array; if the "updates" array gets too big, the main array can be regenerated with the changes represented by the "updates" array incorporated into it.
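The "main array plus updates array" variant can be sketched like this. The `WindowedBoolArray` name and the `MaxWindow` threshold of 16 are hypothetical; the updates array always covers one contiguous range of indices starting at `offset`, and both arrays are treated as immutable once stored.

```csharp
using System;

// Sketch: a "main" array plus a small "updates" array covering a contiguous
// window of indices. Neither array is modified after it is stored, so copies
// of the struct can share them safely.
public struct WindowedBoolArray
{
    private const int MaxWindow = 16;  // hypothetical threshold

    private bool[] main;
    private bool[] updates;  // null when there are no pending updates
    private int offset;      // index in main that updates[0] stands for

    public WindowedBoolArray(int length)
    {
        main = new bool[length];
        updates = null;
        offset = 0;
    }

    public int Length => main.Length;

    public bool this[int index]
    {
        get
        {
            if (updates != null && index >= offset && index < offset + updates.Length)
                return updates[index - offset];
            return main[index];  // throws IndexOutOfRangeException if out of range
        }
        set
        {
            if ((uint)index >= (uint)main.Length)
                throw new ArgumentOutOfRangeException(nameof(index));

            // Smallest window [start, end) covering the old window and this index.
            int start = updates == null ? index : Math.Min(offset, index);
            int end = updates == null ? index + 1 : Math.Max(offset + updates.Length, index + 1);

            if (end - start > MaxWindow)
            {
                // Window too large: regenerate main with all updates folded in.
                var newMain = (bool[])main.Clone();
                if (updates != null)
                    Array.Copy(updates, 0, newMain, offset, updates.Length);
                newMain[index] = value;
                main = newMain;
                updates = null;
                offset = 0;
                return;
            }

            // Otherwise regenerate only the (small) updates array.
            var newUpdates = new bool[end - start];
            for (int i = start; i < end; i++)
                newUpdates[i - start] = this[i];  // reads the current state
            newUpdates[index - start] = value;
            updates = newUpdates;
            offset = start;
        }
    }
}
```

Writes within a compact region only copy the window (at most `MaxWindow` elements); a write far from the current window folds everything into a new main array.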

The biggest problem with any of these approaches is that, although the struct could be engineered to present consistent value-type semantics while allowing efficient copying, a glance at the struct's code would hardly make that obvious (as compared with plain-old-data structs, where the fact that the struct has a public field called Foo makes it very clear how Foo will behave).



Answer 5:

I was thinking about a similar issue related to value types, and found a "solution" to this. You cannot customize how a struct is copied in C# the way you can with a copy constructor in C++, because struct copying is intended to be lightweight and free of side effects. However, what you can do is wait until the struct is actually accessed, and then check whether it was copied.

The problem with this is that unlike reference types, structs have no real identity; there is only by-value equality. However, they still have to be stored at some place in memory, and this address can be used to identify (albeit temporarily) a value type. The GC is a concern here, because it can move objects around, and therefore change the address at which the struct is located, so you would have to be able to cope with that (e.g. make the struct's data private).
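The difference between a struct's value and its storage location can be observed with `System.Runtime.CompilerServices.Unsafe.AreSame`, which compares two references by location and does not require `unsafe` code. This is a swapped-in illustration, not the answer's library: it does not produce a numeric address, and `Point2` and `LocationDemo` are hypothetical names.

```csharp
using System.Runtime.CompilerServices;

// Hypothetical value type used only for this illustration.
public struct Point2
{
    public int X, Y;
}

public static class LocationDemo
{
    // True when both refs point at the same storage location,
    // regardless of whether the values are equal.
    public static bool SameLocation(ref Point2 a, ref Point2 b) =>
        Unsafe.AreSame(ref a, ref b);
}
```

For `var q = p;`, `p.Equals(q)` is true while `SameLocation(ref p, ref q)` is false; a `ref` local aliasing `p` reports the same location as `p`.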

In practice, the address of the struct can be obtained from the `this` reference, because in a value type `this` is simply a `ref T`. I leave obtaining the address from a reference to my library, but it's quite simple to emit custom CIL for that. In this example, I create something that is essentially a by-value array.

using System;
using System.Collections.Generic;

public struct ByValArray<T>
{
    //Backing field; copies are cloned from it.
    T[] array;

    public ByValArray(int size)
    {
        array = new T[size];
        //Updating the instance is really not necessary until we access it.
    }

    private void Update()
    {
        //This should be called from any public method on this struct.
        T[] inst = FindInstance(ref this);
        if(inst != array)
        {
            //A new array was cloned for this address.
            array = inst;
        }
    }

    //I suppose a GCHandle would be better than WeakReference,
    //but this is sufficient for illustration.
    static readonly Dictionary<IntPtr, WeakReference<T[]>> Cache = new Dictionary<IntPtr, WeakReference<T[]>>();

    static T[] FindInstance(ref ByValArray<T> arr)
    {
        T[] orig = arr.array;
        return UnsafeTools.GetPointer(
            //Obtain the address from the reference.
            //It uses a lambda to minimize the chance of the reference
            //being moved around by the GC.
            out arr,
            ptr => {
                WeakReference<T[]> wref;
                T[] inst;
                if(Cache.TryGetValue(ptr, out wref) && wref.TryGetTarget(out inst))
                {
                    //An object is found on this address.
                    if(inst != orig)
                    {
                        //This address was overwritten with a new value,
                        //clone the instance.
                        inst = (T[])orig.Clone();
                        Cache[ptr] = new WeakReference<T[]>(inst);
                    }
                    return inst;
                }else{
                    //No object was found on this address,
                    //clone the instance.
                    inst = (T[])orig.Clone();
                    Cache[ptr] = new WeakReference<T[]>(inst);
                    return inst;
                }
            }
        );
    }

    //All subsequent methods should always update the state first.
    public T this[int index]
    {
        get{
            Update();
            return array[index];
        }
        set{
            Update();
            array[index] = value;
        }
    }

    public int Length{
        get{
            Update();
            return array.Length;
        }
    }

    public override bool Equals(object obj)
    {
        Update();
        return base.Equals(obj);
    }

    public override int GetHashCode()
    {
        Update();
        return base.GetHashCode();
    }

    public override string ToString()
    {
        Update();
        return base.ToString();
    }
}

var a = new ByValArray<int>(10);
a[5] = 11;
Console.WriteLine(a[5]); //11

var b = a;
b[5]++;
Console.WriteLine(b[5]); //12
Console.WriteLine(a[5]); //11

var c = a;
a = b;
Console.WriteLine(a[5]); //12
Console.WriteLine(c[5]); //11

As you can see, this value type behaves exactly as if the underlying array were copied to a new location every time the struct is copied.

WARNING!!! Use this code only at your own risk, and preferably never in production code. This technique is wrong and evil on so many levels, because it assumes identity for something that shouldn't have it. Although it tries to "enforce" value-type semantics for this struct ("the ends justify the means"), there are certainly better solutions to the real problem in almost every case. Also note that although I have tried to anticipate the issues I could foresee, there could be cases where this type shows quite unexpected behaviour.