I wonder what is the fastest way to do shallow copying in C#? I only know there are 2 ways to do shallow copy:
- MemberwiseClone
- Copy each field one by one (manual)
I found that (2) is faster than (1). I'm wondering if there's another way to do shallow copying?
This is a complex subject with lots of possible solutions and many pros and cons to each. There is a wonderful article here that outlines several different ways of making a copy in C#. To summarize:
Clone Manually
Tedious, but high level of control.
Clone with MemberwiseClone
Only creates a shallow copy, i.e. for reference-type fields the original object and its clone refer to the same object.
Clone with Reflection
Shallow copy by default, can be re-written to do deep copy. Advantage: automated. Disadvantage: reflection is slow.
Clone with Serialization
Easy, automated. Give up some control and serialization is slowest of all.
Clone with IL, Clone with Extension Methods
More advanced solutions, not as common.
I'm confused. MemberwiseClone()
should annihilate the performance of anything else for a shallow copy. In the CLI, any type other than an RCW should be able to be shallow-copied by the following sequence:
- Allocate memory in the nursery for the type.
memcpy
the data from the original to the new. Since the target is in the nursery, no write barriers are required.
- If the object has a user-defined finalizer, add it to the GC list of items pending finalization.
- If the source object has
SuppressFinalize
called on it and such a flag is stored in the object header, unset it in the clone.
Can someone on the CLR internals team explain why this is not the case?
I'd like to start with a few quotes:
In fact, MemberwiseClone is usually much better than others, especially for complex type.
and
I'm confused. MemberwiseClone() should annihilate the performance of anything else for a shallow copy. [...]
Theoretically the best implementation of a shallow copy is a C++ copy constructor: it knows the size compile-time, and then does a memberwise clone of all fields. The next best thing is using memcpy
or something similar, which is basically how MemberwiseClone
should work. This means, in theory it should obliterate all other possibilities in terms of performance. Right?
... but apparently it isn't blazing fast and it doesn't obliterate all the other solutions. At the bottom I've actually posted a solution that's over 2x faster. So: Wrong.
Testing the internals of MemberwiseClone
Let's start with a little test using a simple blittable type to check the underlying assumptions here about performance:
[StructLayout(LayoutKind.Sequential)]
public class ShallowCloneTest
{
public int Foo;
public long Bar;
public ShallowCloneTest Clone()
{
return (ShallowCloneTest)base.MemberwiseClone();
}
}
The test is devised in such a way that we can check the performance of MemberwiseClone
agaist raw memcpy
, which is possible because this is a blittable type.
To test by yourself, compile with unsafe code, disable the JIT suppression, compile release mode and test away. I've also put the timings after every line that's relevant.
Implementation 1:
ShallowCloneTest t1 = new ShallowCloneTest() { Bar = 1, Foo = 2 };
Stopwatch sw = Stopwatch.StartNew();
int total = 0;
for (int i = 0; i < 10000000; ++i)
{
var cloned = t1.Clone(); // 0.40s
total += cloned.Foo;
}
Console.WriteLine("Took {0:0.00}s", sw.Elapsed.TotalSeconds);
Basically I ran these tests a number of times, checked the assembly output to ensure that the thing wasn't optimized away, etc. The end result is that I know approximately how much seconds this one line of code costs, which is 0.40s on my PC. This is our baseline using MemberwiseClone
.
Implementation 2:
sw = Stopwatch.StartNew();
total = 0;
uint bytes = (uint)Marshal.SizeOf(t1.GetType());
GCHandle handle1 = GCHandle.Alloc(t1, GCHandleType.Pinned);
IntPtr ptr1 = handle1.AddrOfPinnedObject();
for (int i = 0; i < 10000000; ++i)
{
ShallowCloneTest t2 = new ShallowCloneTest(); // 0.03s
GCHandle handle2 = GCHandle.Alloc(t2, GCHandleType.Pinned); // 0.75s (+ 'Free' call)
IntPtr ptr2 = handle2.AddrOfPinnedObject(); // 0.06s
memcpy(ptr2, ptr1, new UIntPtr(bytes)); // 0.17s
handle2.Free();
total += t2.Foo;
}
handle1.Free();
Console.WriteLine("Took {0:0.00}s", sw.Elapsed.TotalSeconds);
If you look closely at these numbers, you'll notice a few things:
- Creating an object and copying it will take roughly 0.20s. Under normal circumstances this is the fastest possible code you can have.
- However, to do that, you need to pin and unpin the object. That will take you 0.81 seconds.
So why is all of this so slow?
My explanation is that it has to do with the GC. Basically the implementations cannot rely on the fact that memory will stay the same before and after a full GC (The address of the memory can be changed during a GC, which can happen at any moment, including during your shallow copy). This means you only have 2 possible options:
- Pinning the data and doing a copy. Note that
GCHandle.Alloc
is just one of the ways to do this, it's well known that things like C++/CLI will give you better performance.
- Enumerating the fields. This will ensure that between GC collects you don't need to do anything fancy, and during GC collects you can use the GC ability to modify the addresses on the stack of moved objects.
MemberwiseClone
will use method 1, which means you'll get a performance hit because of the pinning procedure.
A (much) faster implementation
In all cases our unmanaged code cannot make assumptions about the size of the types and it has to pin data. Making assumptions about size enables the compiler to do better optimizations, like loop unrolling, register allocation, etc. (just like a C++ copy ctor is faster than memcpy
). Not having to pin data means we don't get an extra performance hit. Since .NET JIT's to assembler, in theory this means that we should be able to make a faster implementation using simple IL emitting, and allowing the compiler to optimize it.
So to summarize on why this can be faster than the native implementation?
- It doesn't require the object to be pinned; objects that are moving around are handled by the GC -- and really, this is relentlessly optimized.
- It can make assumptions about the size of the structure to copy, and therefore allows for better register allocation, loop unrolling, etc.
What we're aiming for is the performance of raw memcpy
or better: 0.17s.
To do that, we basically cannot use more than just a call
, create the object, and perform a bunch of copy
instructions. It looks a bit like the Cloner
implementation above, but some important differences (most significant: no Dictionary
and no redundant CreateDelegate
calls). Here goes:
public static class Cloner<T>
{
private static Func<T, T> cloner = CreateCloner();
private static Func<T, T> CreateCloner()
{
var cloneMethod = new DynamicMethod("CloneImplementation", typeof(T), new Type[] { typeof(T) }, true);
var defaultCtor = typeof(T).GetConstructor(new Type[] { });
var generator = cloneMethod .GetILGenerator();
var loc1 = generator.DeclareLocal(typeof(T));
generator.Emit(OpCodes.Newobj, defaultCtor);
generator.Emit(OpCodes.Stloc, loc1);
foreach (var field in typeof(T).GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic))
{
generator.Emit(OpCodes.Ldloc, loc1);
generator.Emit(OpCodes.Ldarg_0);
generator.Emit(OpCodes.Ldfld, field);
generator.Emit(OpCodes.Stfld, field);
}
generator.Emit(OpCodes.Ldloc, loc1);
generator.Emit(OpCodes.Ret);
return ((Func<T, T>)cloneMethod.CreateDelegate(typeof(Func<T, T>)));
}
public static T Clone(T myObject)
{
return cloner(myObject);
}
}
I've tested this code with the result: 0.16s. This means it's approximately 2.5x faster than MemberwiseClone
.
More importantly, this speed is on-par with memcpy
, which is more or less the 'optimal solution under normal circumstances'.
Personally, I think this is the fastest solution - and the best part is: if the .NET runtime will get faster (proper support for SSE instructions etc), so will this solution.
Why complicate things? MemberwiseClone would suffice.
public class ClassA : ICloneable
{
public object Clone()
{
return this.MemberwiseClone();
}
}
// let's say you want to copy the value (not reference) of the member of that class.
public class Main()
{
ClassA myClassB = new ClassA();
ClassA myClassC = new ClassA();
myClassB = (ClassA) myClassC.Clone();
}
This is a way to do it using dynamic IL generation. I found it somewhere online:
public static class Cloner
{
static Dictionary<Type, Delegate> _cachedIL = new Dictionary<Type, Delegate>();
public static T Clone<T>(T myObject)
{
Delegate myExec = null;
if (!_cachedIL.TryGetValue(typeof(T), out myExec))
{
var dymMethod = new DynamicMethod("DoClone", typeof(T), new Type[] { typeof(T) }, true);
var cInfo = myObject.GetType().GetConstructor(new Type[] { });
var generator = dymMethod.GetILGenerator();
var lbf = generator.DeclareLocal(typeof(T));
generator.Emit(OpCodes.Newobj, cInfo);
generator.Emit(OpCodes.Stloc_0);
foreach (var field in myObject.GetType().GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic))
{
// Load the new object on the eval stack... (currently 1 item on eval stack)
generator.Emit(OpCodes.Ldloc_0);
// Load initial object (parameter) (currently 2 items on eval stack)
generator.Emit(OpCodes.Ldarg_0);
// Replace value by field value (still currently 2 items on eval stack)
generator.Emit(OpCodes.Ldfld, field);
// Store the value of the top on the eval stack into the object underneath that value on the value stack.
// (0 items on eval stack)
generator.Emit(OpCodes.Stfld, field);
}
// Load new constructed obj on eval stack -> 1 item on stack
generator.Emit(OpCodes.Ldloc_0);
// Return constructed object. --> 0 items on stack
generator.Emit(OpCodes.Ret);
myExec = dymMethod.CreateDelegate(typeof(Func<T, T>));
_cachedIL.Add(typeof(T), myExec);
}
return ((Func<T, T>)myExec)(myObject);
}
}
In fact, MemberwiseClone is usually much better than others, especially for complex type.
The reason is that:if you manual create a copy, it must call one of the type's constructor, but use memberwise clone, I guess it just copy a block of memory. for those types has very expensive construct actions, memberwise clone is absolutely the best way.
Onece i wrote such type:
{string A = Guid.NewGuid().ToString()}, I found memberwise clone is muct faster than create a new instance and manual assign members.
The code below's result:
Manual Copy:00:00:00.0017099
MemberwiseClone:00:00:00.0009911
namespace MoeCard.TestConsole
{
class Program
{
static void Main(string[] args)
{
Program p = new Program() { AAA = Guid.NewGuid().ToString(), BBB = 123 };
Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < 10000; i++)
{
p.Copy1();
}
sw.Stop();
Console.WriteLine("Manual Copy:" + sw.Elapsed);
sw.Restart();
for (int i = 0; i < 10000; i++)
{
p.Copy2();
}
sw.Stop();
Console.WriteLine("MemberwiseClone:" + sw.Elapsed);
Console.ReadLine();
}
public string AAA;
public int BBB;
public Class1 CCC = new Class1();
public Program Copy1()
{
return new Program() { AAA = AAA, BBB = BBB, CCC = CCC };
}
public Program Copy2()
{
return this.MemberwiseClone() as Program;
}
public class Class1
{
public DateTime Date = DateTime.Now;
}
}
}
finally, I provide my code here:
#region 数据克隆
/// <summary>
/// 依据不同类型所存储的克隆句柄集合
/// </summary>
private static readonly Dictionary<Type, Func<object, object>> CloneHandlers = new Dictionary<Type, Func<object, object>>();
/// <summary>
/// 根据指定的实例,克隆一份新的实例
/// </summary>
/// <param name="source">待克隆的实例</param>
/// <returns>被克隆的新的实例</returns>
public static object CloneInstance(object source)
{
if (source == null)
{
return null;
}
Func<object, object> handler = TryGetOrAdd(CloneHandlers, source.GetType(), CreateCloneHandler);
return handler(source);
}
/// <summary>
/// 根据指定的类型,创建对应的克隆句柄
/// </summary>
/// <param name="type">数据类型</param>
/// <returns>数据克隆句柄</returns>
private static Func<object, object> CreateCloneHandler(Type type)
{
return Delegate.CreateDelegate(typeof(Func<object, object>), new Func<object, object>(CloneAs<object>).Method.GetGenericMethodDefinition().MakeGenericMethod(type)) as Func<object, object>;
}
/// <summary>
/// 克隆一个类
/// </summary>
/// <typeparam name="TValue"></typeparam>
/// <param name="value"></param>
/// <returns></returns>
private static object CloneAs<TValue>(object value)
{
return Copier<TValue>.Clone((TValue)value);
}
/// <summary>
/// 生成一份指定数据的克隆体
/// </summary>
/// <typeparam name="TValue">数据的类型</typeparam>
/// <param name="value">需要克隆的值</param>
/// <returns>克隆后的数据</returns>
public static TValue Clone<TValue>(TValue value)
{
if (value == null)
{
return value;
}
return Copier<TValue>.Clone(value);
}
/// <summary>
/// 辅助类,完成数据克隆
/// </summary>
/// <typeparam name="TValue">数据类型</typeparam>
private static class Copier<TValue>
{
/// <summary>
/// 用于克隆的句柄
/// </summary>
internal static readonly Func<TValue, TValue> Clone;
/// <summary>
/// 初始化
/// </summary>
static Copier()
{
MethodFactory<Func<TValue, TValue>> method = MethodFactory.Create<Func<TValue, TValue>>();
Type type = typeof(TValue);
if (type == typeof(object))
{
method.LoadArg(0).Return();
return;
}
switch (Type.GetTypeCode(type))
{
case TypeCode.Object:
if (type.IsClass)
{
method.LoadArg(0).Call(Reflector.GetMethod(typeof(object), "MemberwiseClone")).Cast(typeof(object), typeof(TValue)).Return();
}
else
{
method.LoadArg(0).Return();
}
break;
default:
method.LoadArg(0).Return();
break;
}
Clone = method.Delegation;
}
}
#endregion
MemberwiseClone requires less maintenance. I don't know if having default property values helps any, maybe if could ignore items with default values.
Here is a small helper class that uses reflection to access MemberwiseClone
and then caches the delegate to avoid using reflection more than necessary.
public static class CloneUtil<T>
{
private static readonly Func<T, object> clone;
static CloneUtil()
{
var cloneMethod = typeof(T).GetMethod("MemberwiseClone", System.Reflection.BindingFlags.Instance | System.Reflection.BindingFlags.NonPublic);
clone = (Func<T, object>)cloneMethod.CreateDelegate(typeof(Func<T, object>));
}
public static T ShallowClone(T obj) => (T)clone(obj);
}
public static class CloneUtil
{
public static T ShallowClone<T>(this T obj) => CloneUtil<T>.ShallowClone(obj);
}
You can call it like this:
Person b = a.ShallowClone();