What is the fastest way to do a shallow copy in C#? I only know of 2 ways to do it:
1. MemberwiseClone
2. Copy each field one by one (manual)
I found that (2) is faster than (1). I'm wondering if there's another way to do shallow copying?
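For reference, the two approaches listed above might look like this on a small class. This is only an illustrative sketch; the `Person` type and its method names are hypothetical and not part of the question.

```csharp
using System;

// Hypothetical sample type used only for illustration.
public sealed class Person
{
    public string Name;
    public int Age;

    // Option 1: let the runtime copy every field via the protected Object.MemberwiseClone.
    public Person CloneWithMemberwiseClone() => (Person)MemberwiseClone();

    // Option 2: copy each field by hand into a new instance.
    public Person CloneManually() => new Person { Name = Name, Age = Age };
}
```

Both methods return a new object whose reference-type fields (here `Name`) still point at the same instances as the original, which is what makes the copy shallow.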
Here is a small helper class that uses reflection to access `MemberwiseClone` and then caches the delegate to avoid using reflection more than necessary.
You can call it like this:
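A minimal sketch of such a helper, assuming the idea is to bind a delegate to the protected `Object.MemberwiseClone` once and reuse it. The `ShallowCloner` name and its shape are hypothetical and not necessarily what the answer's author wrote.

```csharp
using System;
using System.Reflection;

public static class ShallowCloner
{
    // Reflection is used exactly once: the protected, non-virtual
    // Object.MemberwiseClone is bound to an open instance delegate and cached.
    private static readonly Func<object, object> CloneDelegate =
        (Func<object, object>)typeof(object)
            .GetMethod("MemberwiseClone", BindingFlags.Instance | BindingFlags.NonPublic)
            .CreateDelegate(typeof(Func<object, object>));

    // Returns a shallow copy of source, or null when source is null.
    public static T ShallowClone<T>(T source) where T : class
        => source == null ? null : (T)CloneDelegate(source);
}
```

With this sketch, a call would look like `var copy = ShallowCloner.ShallowClone(original);`.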
I'd like to start with a few quotes:
and
Theoretically the best implementation of a shallow copy is a C++ copy constructor: it knows the size at compile time, and then does a memberwise clone of all fields. The next best thing is using `memcpy` or something similar, which is basically how `MemberwiseClone` should work. This means, in theory, it should obliterate all other possibilities in terms of performance. Right?
... but apparently it isn't blazing fast and it doesn't obliterate all the other solutions. At the bottom I've actually posted a solution that's over 2x faster. So: Wrong.
Testing the internals of MemberwiseClone
Let's start with a little test using a simple blittable type to check the underlying assumptions here about performance:
The test is devised in such a way that we can check the performance of `MemberwiseClone` against raw `memcpy`, which is possible because this is a blittable type.
To test by yourself, compile with unsafe code, disable the JIT suppression, compile in release mode and test away. I've also put the timings after every line that's relevant.
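As a rough illustration of how such a timing test can be set up, here is a safe-code sketch that swaps the raw `memcpy` comparison for a hand-written field copy, so it compiles without unsafe code. The `BenchSample` and `CloneBenchmark` names are hypothetical, and it will not reproduce the timings quoted in this answer.

```csharp
using System;
using System.Diagnostics;

// Hypothetical sample type containing only primitive fields.
public sealed class BenchSample
{
    public int A, B, C, D;

    public BenchSample ViaMemberwiseClone() => (BenchSample)MemberwiseClone();
    public BenchSample ViaManualCopy() => new BenchSample { A = A, B = B, C = C, D = D };
}

public static class CloneBenchmark
{
    // Times `iterations` calls of the given clone function.
    public static TimeSpan Measure(Func<BenchSample, BenchSample> clone, int iterations)
    {
        var source = new BenchSample { A = 1, B = 2, C = 3, D = 4 };
        long checksum = 0;
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            // Use the result so the call cannot be optimized away.
            checksum += clone(source).D;
        }
        watch.Stop();
        if (checksum != 4L * iterations)
            throw new InvalidOperationException("Unexpected checksum.");
        return watch.Elapsed;
    }
}
```

For example, `CloneBenchmark.Measure(s => s.ViaMemberwiseClone(), 10000000)` and `CloneBenchmark.Measure(s => s.ViaManualCopy(), 10000000)` can be compared against each other. Both measurements include the cost of the delegate call, so only the relative difference is meaningful.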
Implementation 1:
Basically I ran these tests a number of times, checked the assembly output to ensure that the thing wasn't optimized away, etc. The end result is that I know approximately how many seconds this one line of code costs, which is 0.40s on my PC. This is our baseline using `MemberwiseClone`.
Implementation 2:
If you look closely at these numbers, you'll notice a few things:
So why is all of this so slow?
My explanation is that it has to do with the GC. Basically the implementations cannot rely on the fact that memory will stay the same before and after a full GC (The address of the memory can be changed during a GC, which can happen at any moment, including during your shallow copy). This means you only have 2 possible options:
`GCHandle.Alloc` is just one of the ways to do this; it's well known that things like C++/CLI will give you better performance.
`MemberwiseClone` will use method 1, which means you'll get a performance hit because of the pinning procedure.
A (much) faster implementation
In all cases our unmanaged code cannot make assumptions about the size of the types and it has to pin data. Making assumptions about size enables the compiler to do better optimizations, like loop unrolling, register allocation, etc. (just like a C++ copy ctor is faster than `memcpy`). Not having to pin data means we don't get an extra performance hit. Since .NET code is JIT-compiled to native code, in theory this means that we should be able to make a faster implementation using simple IL emitting, and allowing the compiler to optimize it.
So, to summarize: why can this be faster than the native implementation?
What we're aiming for is the performance of raw `memcpy` or better: 0.17s.
To do that, we basically cannot use more than just a `call`, create the object, and perform a bunch of `copy` instructions. It looks a bit like the `Cloner` implementation above, but with some important differences (most significant: no `Dictionary` and no redundant `CreateDelegate` calls). Here goes:
I've tested this code with the result: 0.16s. This means it's approximately 2.5x faster than `MemberwiseClone`.
More importantly, this speed is on par with `memcpy`, which is more or less the 'optimal solution under normal circumstances'.
Personally, I think this is the fastest solution - and the best part is: if the .NET runtime gets faster (proper support for SSE instructions etc.), so will this solution.
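A sketch of the IL-emitting approach described above, under some assumptions: the type has a parameterless constructor (which will run on every clone), and the emitted code simply loads and stores each instance field. The `IlCloner<T>` and `Node` names are hypothetical; this is not necessarily the exact code that produced the 0.16s result.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// A static generic class gives one cached delegate per type,
// with no Dictionary lookup and a single CreateDelegate call.
public static class IlCloner<T> where T : class
{
    public static readonly Func<T, T> Clone = Build();

    private static Func<T, T> Build()
    {
        const BindingFlags Flags =
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;

        // Assumption: T has a parameterless constructor (public or not).
        ConstructorInfo ctor = typeof(T).GetConstructor(Flags, null, Type.EmptyTypes, null);
        if (ctor == null)
            throw new InvalidOperationException(typeof(T) + " needs a parameterless constructor.");

        var method = new DynamicMethod(
            "ShallowClone", typeof(T), new[] { typeof(T) }, typeof(T), true);
        ILGenerator il = method.GetILGenerator();

        il.Emit(OpCodes.Newobj, ctor);            // stack: copy

        // Walk the inheritance chain so private base-class fields are copied too.
        for (Type t = typeof(T); t != null && t != typeof(object); t = t.BaseType)
        {
            foreach (FieldInfo field in t.GetFields(Flags | BindingFlags.DeclaredOnly))
            {
                il.Emit(OpCodes.Dup);             // stack: copy, copy
                il.Emit(OpCodes.Ldarg_0);         // stack: copy, copy, source
                il.Emit(OpCodes.Ldfld, field);    // stack: copy, copy, value
                il.Emit(OpCodes.Stfld, field);    // stack: copy
            }
        }

        il.Emit(OpCodes.Ret);                     // return copy
        return (Func<T, T>)method.CreateDelegate(typeof(Func<T, T>));
    }
}

// Hypothetical sample type used only for illustration.
public sealed class Node
{
    public int Value;
    public Node Next;
}
```

Usage would be `Node copy = IlCloner<Node>.Clone(original);`. Because the emitted method contains only field loads and stores, the JIT is free to optimize it like any other managed code, and no pinning is involved.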