I have C# code that interacts with C++ code, which performs operations with strings.
I have this piece of code in a static helper class:
    internal static unsafe byte* GetConstNullTerminated(string text, Encoding encoding)
    {
        int charCount = text.Length;
        fixed (char* chars = text)
        {
            int byteCount = encoding.GetByteCount(chars, charCount);
            byte* bytes = stackalloc byte[byteCount + 1];
            encoding.GetBytes(chars, charCount, bytes, byteCount);
            *(bytes + byteCount) = 0;
            return bytes;
        }
    }
As you can see, it returns a pointer to the bytes created with the stackalloc keyword.
However, section 18.8 of the C# specification says:
All stack allocated memory blocks created during the execution of a function member are automatically discarded when that function member returns.
Does it mean that the pointer is actually invalid as soon as the method returns?
Current usage of the method:
    byte* bytes = StringHelper.GetConstNullTerminated(value ?? string.Empty, Encoding);
    DirectFunction(NativeMethods.SCI_SETTEXT, UIntPtr.Zero, (IntPtr) bytes);
Should the code be changed to
    ...
    int byteCount = encoding.GetByteCount(chars, charCount);
    byte[] byteArray = new byte[byteCount + 1];
    fixed (byte* bytes = byteArray)
    {
        encoding.GetBytes(chars, charCount, bytes, byteCount);
        *(bytes + byteCount) = 0;
    }
    return byteArray;
And use fixed again on the returned array, to pass the pointer to the DirectFunction method?
I'm trying to minimise the number of fixed usages (including the fixed statements in other overloads of GetByteCount() and GetBytes() of Encoding).
tl;dr
Is the pointer invalid as soon as the method returns? Is it invalid at the point of being passed to DirectFunction()?
If so, what is the best way to achieve the task with the fewest fixed statements?
Does it mean that the pointer is actually invalid as soon as the method returns?
Yes, it is technically invalid, although it almost certainly won't be detected. This scenario is self-inflicted via unsafe. Any action on that memory now has undefined behavior. Anything you do, but in particular calling methods, may randomly overwrite that memory (or not), depending on the relative stack-frame sizes and depth.
This scenario is specifically one of the ones that the proposed future ref changes hope to target, meaning: allowing stackalloc into ref (rather than a pointer), with the compiler knowing that it is a stack-referring ref or ref-like type, and thus disallowing ref-return of that value.
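Those changes later shipped in C# 7.2 as Span<T> and ref struct types. As a minimal sketch of what that looks like, assuming a runtime that has the span-based Encoding.GetBytes overload (.NET Core 2.1 or later), the buffer can be stack-allocated in safe code and handed to a callback. The names SpanStackallocSketch, ByteSpanAction and WithNullTerminated are hypothetical and only for illustration:

```csharp
using System;
using System.Text;

internal static class SpanStackallocSketch
{
    // Hypothetical delegate type: Span<T> cannot be used as a generic argument,
    // so Action<ReadOnlySpan<byte>> is not available.
    internal delegate void ByteSpanAction(ReadOnlySpan<byte> bytes);

    internal static void WithNullTerminated(string text, Encoding encoding, ByteSpanAction action)
    {
        int byteCount = encoding.GetByteCount(text);

        // No unsafe keyword needed: the compiler tracks that this span refers to the stack.
        Span<byte> bytes = stackalloc byte[byteCount + 1];

        int written = encoding.GetBytes(text.AsSpan(), bytes);
        bytes[written] = 0; // explicit terminator; stackalloc memory is not guaranteed to be zeroed

        // The span is only valid while this method is on the stack, so use it here.
        action(bytes);

        // Changing the return type to Span<byte> and writing "return bytes;"
        // is rejected at compile time, which is exactly the protection described above.
    }
}
```

The callback shape mirrors the delegate approach: the data never outlives the frame that owns it.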
Ultimately, the moment you type unsafe you're saying "I take full responsibility if this goes wrong". In this case, it is indeed wrong.
It is valid to use the pointer before leaving the method, so one viable approach might be (assuming you want a fairly general-purpose API) to allow the caller to pass in a delegate or interface that specifies what the caller wants you to do with the pointer, i.e.
    StringHelper.GetConstNullTerminated(value ?? string.Empty, Encoding,
        ptr => DirectFunction(NativeMethods.SCI_SETTEXT, UIntPtr.Zero, (IntPtr) ptr));
with:
    unsafe delegate void PointerAction(byte* ptr);

    internal static unsafe void GetConstNullTerminated(string text, Encoding encoding,
        PointerAction action)
    {
        int charCount = text.Length;
        fixed (char* chars = text)
        {
            int byteCount = encoding.GetByteCount(chars, charCount);
            // One extra byte for the null terminator.
            byte* bytes = stackalloc byte[byteCount + 1];
            encoding.GetBytes(chars, charCount, bytes, byteCount);
            *(bytes + byteCount) = 0;
            // The stack buffer is still alive here, so the callback may use the pointer.
            action(bytes);
        }
    }
Note also that very large strings may cause you to stack-overflow.
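One way to guard against that, sketched here under assumptions, is to use stackalloc only below a size threshold and fall back to a pinned heap array above it. The class name GuardedStringHelperSketch, the method WithNullTerminated, the helper Fill, and the 1024-byte limit MaxStackBytes are all hypothetical choices for illustration, not measured or recommended values:

```csharp
using System;
using System.Text;

internal static unsafe class GuardedStringHelperSketch
{
    internal unsafe delegate void PointerAction(byte* ptr);

    // Assumed threshold; pick a value that suits your call depth and stack size.
    private const int MaxStackBytes = 1024;

    internal static void WithNullTerminated(string text, Encoding encoding, PointerAction action)
    {
        int byteCount = encoding.GetByteCount(text);

        if (byteCount < MaxStackBytes)
        {
            // Small payload: stack allocation, valid until this method returns.
            byte* bytes = stackalloc byte[byteCount + 1];
            Fill(text, encoding, bytes, byteCount);
            action(bytes);
        }
        else
        {
            // Large payload: heap array, pinned only for the duration of the callback.
            byte[] buffer = new byte[byteCount + 1];
            fixed (byte* bytes = buffer)
            {
                Fill(text, encoding, bytes, byteCount);
                action(bytes);
            }
        }
    }

    // Encodes text into bytes and writes the null terminator after it.
    private static void Fill(string text, Encoding encoding, byte* bytes, int byteCount)
    {
        fixed (char* chars = text)
        {
            encoding.GetBytes(chars, text.Length, bytes, byteCount);
        }
        bytes[byteCount] = 0;
    }
}
```

The caller's code is the same on both paths, so the threshold can be tuned without touching call sites.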
stackalloc causes memory to be allocated on the stack. The stack is automatically unwound when the function returns. In an unsafe context the compiler will still let you return the pointer, but what you get back is a dangling pointer: there is no possible way the memory could still be valid after the stack is unwound when the function returns.
If you want the memory to live beyond the scope of the function allocating it, you cannot allocate it on the stack. You have to allocate on the heap via new.
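As a rough sketch of the heap approach, the helper below needs no fixed statement at all, because Encoding.GetBytes has an overload that writes from a string straight into a byte[]. The names HeapStringHelperSketch and GetNullTerminatedBytes are hypothetical; DirectFunction and NativeMethods.SCI_SETTEXT in the usage comment are taken from the question:

```csharp
using System;
using System.Text;

internal static class HeapStringHelperSketch
{
    internal static byte[] GetNullTerminatedBytes(string text, Encoding encoding)
    {
        int byteCount = encoding.GetByteCount(text);

        // New arrays are zero-initialized, so the extra last byte already is the terminator.
        byte[] buffer = new byte[byteCount + 1];

        // Overload: GetBytes(string s, int charIndex, int charCount, byte[] bytes, int byteIndex).
        encoding.GetBytes(text, 0, text.Length, buffer, 0);
        return buffer;
    }
}

// Usage at the call site, with a single fixed statement (inside an unsafe context):
//
//     byte[] buffer = HeapStringHelperSketch.GetNullTerminatedBytes(value ?? string.Empty, Encoding);
//     fixed (byte* bytes = buffer)
//     {
//         DirectFunction(NativeMethods.SCI_SETTEXT, UIntPtr.Zero, (IntPtr) bytes);
//     }
```

The array is only pinned while the native call runs, and the pointer must not be kept after the fixed block ends.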