I'm trying to write a DynamicMethod to wrap the cpblk IL opcode. I need to copy chunks of byte arrays, and on x64 platforms this is supposedly the fastest way to do it. Array.Copy and Buffer.BlockCopy both work, but I'd like to explore all options.
My goal is to copy managed memory from one byte array to a new managed byte array. My concern is how to correctly "pin" the memory locations: I don't want the garbage collector to move the arrays and break everything. So far it works, but I'm not sure how to test whether this is GC-safe.
// copying 'count' bytes from offset 'index' in 'source' to offset 0 in 'target'
// i.e. void _copy(byte[] source, int index, int count, byte[] target)
static Action<byte[], int, int, byte[]> Init()
{
    var dmethod = new DynamicMethod(
        "copy",
        typeof(void),
        new[] { typeof(object), typeof(byte[]), typeof(int), typeof(int), typeof(byte[]) },
        typeof(object),
        true);
    var il = dmethod.GetILGenerator();

    il.DeclareLocal(typeof(byte).MakeByRefType(), true);
    il.DeclareLocal(typeof(byte).MakeByRefType(), true);

    // pin the source
    il.Emit(OpCodes.Ldarg_1);
    il.Emit(OpCodes.Ldarg_2);
    il.Emit(OpCodes.Ldelema, typeof(byte));
    il.Emit(OpCodes.Stloc_0);

    // pin the target
    il.Emit(OpCodes.Ldarg_S, (byte)4);
    il.Emit(OpCodes.Ldc_I4_0);
    il.Emit(OpCodes.Ldelema, typeof(byte));
    il.Emit(OpCodes.Stloc_1);

    il.Emit(OpCodes.Ldloc_1);
    il.Emit(OpCodes.Ldloc_0);

    // load the length
    il.Emit(OpCodes.Ldarg_3);

    // perform the memcpy
    il.Emit(OpCodes.Unaligned, (byte)1);
    il.Emit(OpCodes.Cpblk);
    il.Emit(OpCodes.Ret);

    return dmethod.CreateDelegate(typeof(Action<byte[], int, int, byte[]>)) as Action<byte[], int, int, byte[]>;
}
I believe that your usage of pinned local variables is correct.
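For the "how do I test this" part, one option is a stress run that keeps copying freshly allocated arrays while a second thread forces compacting collections. The sketch below is only that, a sketch: `GcStress.Run` is a hypothetical helper name, the buffer sizes and seed are arbitrary, and a passing run raises confidence without proving GC safety. It accepts any delegate with the question's signature.

```csharp
using System;
using System.Threading;

public static class GcStress
{
    // Hypothetical helper: returns true if every copy matched the source bytes.
    public static bool Run(Action<byte[], int, int, byte[]> copy, int iterations)
    {
        bool stop = false;
        var collector = new Thread(() =>
        {
            while (!Volatile.Read(ref stop))
            {
                // Blocking, compacting collection: surviving young arrays may be relocated.
                GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced, true, true);
            }
        });
        collector.Start();

        bool ok = true;
        var rng = new Random(12345);
        try
        {
            for (int i = 0; i < iterations && ok; i++)
            {
                // Fresh arrays each round so they start in gen 0 and are movable.
                var source = new byte[4096];
                rng.NextBytes(source);
                int index = rng.Next(0, 2048);
                int count = rng.Next(1, 2048); // at least 1: target[0] must exist
                var target = new byte[count];

                copy(source, index, count, target);

                for (int j = 0; j < count; j++)
                {
                    if (target[j] != source[index + j]) { ok = false; break; }
                }
            }
        }
        finally
        {
            Volatile.Write(ref stop, true);
            collector.Join();
        }
        return ok;
    }
}
```

To exercise the harness itself, pass a known-good copy such as `(s, i, c, t) => Buffer.BlockCopy(s, i, t, 0, c)`, then swap in the delegate returned by `Init()`.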
You don't need to pin anything in this method; if you want to pin, pin your arrays before passing them to this method. You don't need to pin any pointer, because the address of an element always stays the same unless you restart your program; you can even store it in an IntPtr without any problem.
.maxstack 3
ldarg.0
ldarg.1
ldelema uint8
ldarg.2
ldarg.3
ldelema uint8
ldarg.s 4
cpblk
ret
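As a rough illustration, the IL listing above could be emitted through a DynamicMethod as follows. The answer gives no signature, so the parameter order (destination array, destination index, source array, source index, byte count) is an assumption inferred from cpblk taking its destination first; `NoPinCopy`, `Copy`, and `CopyPinned` are hypothetical names. `CopyPinned` shows the caller-side pinning the answer mentions, using GCHandle, for anyone who prefers to pin.

```csharp
using System;
using System.Reflection.Emit;
using System.Runtime.InteropServices;

public static class NoPinCopy
{
    // Assumed parameter order: (dst, dstIndex, src, srcIndex, byteCount).
    public static readonly Action<byte[], int, byte[], int, int> Copy = Build();

    static Action<byte[], int, byte[], int, int> Build()
    {
        var dm = new DynamicMethod(
            "copy_nopin",
            typeof(void),
            new[] { typeof(byte[]), typeof(int), typeof(byte[]), typeof(int), typeof(int) },
            typeof(NoPinCopy).Module,
            true);
        var il = dm.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldarg_1);
        il.Emit(OpCodes.Ldelema, typeof(byte)); // &dst[dstIndex]
        il.Emit(OpCodes.Ldarg_2);
        il.Emit(OpCodes.Ldarg_3);
        il.Emit(OpCodes.Ldelema, typeof(byte)); // &src[srcIndex]
        il.Emit(OpCodes.Ldarg_S, (byte)4);      // number of bytes
        il.Emit(OpCodes.Cpblk);
        il.Emit(OpCodes.Ret);
        return (Action<byte[], int, byte[], int, int>)
            dm.CreateDelegate(typeof(Action<byte[], int, byte[], int, int>));
    }

    // Optional: pin both arrays in the caller for the duration of the copy.
    public static void CopyPinned(byte[] dst, int dstIndex, byte[] src, int srcIndex, int count)
    {
        var hSrc = GCHandle.Alloc(src, GCHandleType.Pinned);
        try
        {
            var hDst = GCHandle.Alloc(dst, GCHandleType.Pinned);
            try { Copy(dst, dstIndex, src, srcIndex, count); }
            finally { hDst.Free(); }
        }
        finally { hSrc.Free(); }
    }
}
```

Usage, under the same assumptions: `NoPinCopy.Copy(dst, 1, src, 2, 3)` copies three bytes starting at `src[2]` into `dst[1]`.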
void cpblk<T>(ref T src, ref T dst, int c_elem)
Copies c_elem elements of type T from src to dst using the cpblk IL instruction. Note that c_elem indicates the number of elements, not the number of bytes. Tested with C#7 and .NET 4.7. See usage example below.
public static class IL<T>
{
    public delegate void _cpblk_del(ref T src, ref T dst, int c_elem);
    public static readonly _cpblk_del cpblk;

    static IL()
    {
        var dm = new DynamicMethod("cpblk+" + typeof(T).FullName,
            typeof(void),
            new[] { typeof(T).MakeByRefType(), typeof(T).MakeByRefType(), typeof(int) },
            typeof(T),
            true);

        var il = dm.GetILGenerator();

        // cpblk expects: destination address, source address, byte count
        il.Emit(OpCodes.Ldarg_1);
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldarg_2);

        // convert the element count to a byte count
        int cb = Marshal.SizeOf<T>();
        if (cb > 1)
        {
            il.Emit(OpCodes.Ldc_I4, cb);
            il.Emit(OpCodes.Mul);
        }

        // emit 'unaligned.' with the lowest set bit of cb among 1, 2, 4 (none if cb is a multiple of 8)
        byte align;
        if ((cb & (align = 1)) != 0 ||
            (cb & (align = 2)) != 0 ||
            (cb & (align = 4)) != 0)
            il.Emit(OpCodes.Unaligned, align);

        il.Emit(OpCodes.Cpblk);
        il.Emit(OpCodes.Ret);

        cpblk = (_cpblk_del)dm.CreateDelegate(typeof(_cpblk_del));
    }
}
Note that this code assumes that the elements are byte-packed (i.e., no padding between individual elements) and aligned according to their size. Specifically, the source and destination addresses should be divisible by the largest power of two, up to 8, that divides sizeof(T). Said another way, if sizeof(T) % 8 is non-zero, then the OpCodes.Unaligned prefix is emitted specifying the highest divisor of that remainder amongst {1, 2, 4}. For 8-byte alignment, no prefix is needed.
As an example, an 11-byte struct requires alignment prefix 1 because even if the first element in the range happens to be quad-aligned, byte-packing means the adjacent ones won't be. Normally, the CLR arranges arrays this way and you don't have to worry about these issues.
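To make the prefix rule concrete, here is the same bit test pulled out into a small standalone function. `AlignmentRule.PrefixFor` is a hypothetical helper that is not part of the answer's class; it simply mirrors the checks in the static constructor and returns 0 where the constructor emits no prefix.

```csharp
using System;

public static class AlignmentRule
{
    // Hypothetical helper mirroring the static constructor's checks:
    // returns the operand for the 'unaligned.' prefix, or 0 when no prefix is emitted.
    public static int PrefixFor(int cb)
    {
        if ((cb & 1) != 0) return 1; // odd sizes: only byte alignment can be assumed
        if ((cb & 2) != 0) return 2; // sizes such as 2, 6, 10
        if ((cb & 4) != 0) return 4; // sizes such as 4, 12, 20
        return 0;                    // multiples of 8: no prefix
    }
}
```

For instance, `PrefixFor(11)` gives 1, `PrefixFor(6)` gives 2, `PrefixFor(4)` gives 4, and `PrefixFor(8)` gives 0.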
Usage:
var src = new[] { 1, 2, 3, 4, 5, 6 };
var dst = new int[6];
IL<int>.cpblk(ref src[2], ref dst[3], 2); // dst => { 0, 0, 0, 3, 4, 0 }
Automatic type inference (optional):
To let the compiler infer T, you can also include the following class:
public static class IL
{
    public static void cpblk<T>(ref T src, ref T dst, int c_elem)
        => IL<T>.cpblk(ref src, ref dst, c_elem);
}
With this, you don't need to specify the type arguments and the previous example becomes simply:
IL.cpblk(ref src[2], ref dst[3], 2);
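One caveat that may be worth checking before using this with arbitrary T: the element-to-byte conversion relies on Marshal.SizeOf<T>(), which reports the marshaled (unmanaged) size. For some types, such as char and bool, that differs from the size of an element in a managed array. The sketch below compares the two by evaluating the IL sizeof instruction in a tiny DynamicMethod; `SizeCheck.ManagedSizeOf` is a hypothetical helper, not part of the code above.

```csharp
using System;
using System.Reflection.Emit;
using System.Runtime.InteropServices;

public static class SizeCheck
{
    // Hypothetical helper: evaluates the IL 'sizeof' instruction for T,
    // i.e. the size of T as stored in a managed array.
    public static int ManagedSizeOf<T>()
    {
        var dm = new DynamicMethod(
            "sizeof+" + typeof(T).FullName,
            typeof(int),
            Type.EmptyTypes,
            typeof(SizeCheck).Module,
            true);
        var il = dm.GetILGenerator();
        il.Emit(OpCodes.Sizeof, typeof(T));
        il.Emit(OpCodes.Ret);
        return ((Func<int>)dm.CreateDelegate(typeof(Func<int>)))();
    }

    public static void Main()
    {
        // Print managed vs. marshaled sizes side by side for a few types.
        Console.WriteLine("int:  " + ManagedSizeOf<int>() + " vs " + Marshal.SizeOf<int>());
        Console.WriteLine("char: " + ManagedSizeOf<char>() + " vs " + Marshal.SizeOf<char>());
        Console.WriteLine("bool: " + ManagedSizeOf<bool>() + " vs " + Marshal.SizeOf<bool>());
    }
}
```

Where the two values disagree, the byte count passed to cpblk would be off for that T, so restricting use to types where they match (int, long, double, and similar) seems the safer choice.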