I have a few questions about the functionality of the stackalloc
operator.
How does it actually allocate? I thought it does something like:
void* stackalloc(int sizeInBytes)
{
    void* p = StackPointer (esp);
    StackPointer += sizeInBytes;
    if (StackPointer exceeds stack size)
        throw new StackOverflowException(...);
    return p;
}
But I have done a few tests, and I'm not sure that's how it works. We can't know exactly what it does and how it does it, but I want to know the basics.
I thought that stack allocation (well, I am actually sure about it) is faster than heap allocation. So why does this example:
using System;
using System.Diagnostics;

class Program
{
    static void Main(string[] args)
    {
        Stopwatch sw1 = new Stopwatch();
        sw1.Start();
        StackAllocation();
        Console.WriteLine(sw1.ElapsedTicks);

        Stopwatch sw2 = new Stopwatch();
        sw2.Start();
        HeapAllocation();
        Console.WriteLine(sw2.ElapsedTicks);
    }

    static unsafe void StackAllocation()
    {
        for (int i = 0; i < 100; i++)
        {
            int* p = stackalloc int[100];
        }
    }

    static void HeapAllocation()
    {
        for (int i = 0; i < 100; i++)
        {
            int[] a = new int[100];
        }
    }
}
give average results of ~280 ticks for stack allocation, and usually 0-1 ticks for heap allocation? (On my personal computer, an Intel Core i7.)
On the computer I am using now (an Intel Core 2 Duo), the results make more sense than the previous ones (probably because "Optimize code" was not checked in VS):
~460 ticks for stack allocation, and about 380 ticks for heap allocation.
But this still doesn't make sense. Why is that? I guess that the CLR notices that we don't use the array, so maybe it doesn't even allocate it?
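For example, to rule that out, a variant that actually consumes the arrays (hypothetical method names, and I haven't measured this version) would have to look something like:
static unsafe long StackAllocationUsed()
{
    long sum = 0;
    for (int i = 0; i < 100; i++)
    {
        int* p = stackalloc int[100];
        p[i] = i;      // write so the buffer is actually touched
        sum += p[i];   // read it back so the work is observable
    }
    return sum;        // the caller would print this so it can't be discarded
}

static long HeapAllocationUsed()
{
    long sum = 0;
    for (int i = 0; i < 100; i++)
    {
        int[] a = new int[100];
        a[i] = i;
        sum += a[i];
    }
    return sum;
}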
A case where stackalloc is faster:
using System;
using System.Threading;

class Program
{
    private static volatile int _dummy; // just to avoid any optimisations
                                        // that have us measuring the wrong
                                        // thing. Especially since the difference
                                        // is more noticeable in a release build
                                        // (also more noticeable on a multi-core
                                        // machine than single- or dual-core).

    static void Main(string[] args)
    {
        System.Diagnostics.Stopwatch sw1 = new System.Diagnostics.Stopwatch();
        Thread[] threads = new Thread[20];
        sw1.Start();
        for (int t = 0; t != 20; ++t)
        {
            threads[t] = new Thread(DoSA);
            threads[t].Start();
        }
        for (int t = 0; t != 20; ++t)
            threads[t].Join();
        Console.WriteLine(sw1.ElapsedTicks);

        System.Diagnostics.Stopwatch sw2 = new System.Diagnostics.Stopwatch();
        threads = new Thread[20];
        sw2.Start();
        for (int t = 0; t != 20; ++t)
        {
            threads[t] = new Thread(DoHA);
            threads[t].Start();
        }
        for (int t = 0; t != 20; ++t)
            threads[t].Join();
        Console.WriteLine(sw2.ElapsedTicks);
        Console.Read();
    }

    private static void DoSA()
    {
        Random rnd = new Random(1);
        for (int i = 0; i != 100000; ++i)
            StackAllocation(rnd);
    }

    static unsafe void StackAllocation(Random rnd)
    {
        int size = rnd.Next(1024, 131072);
        int* p = stackalloc int[size];
        _dummy = *(p + rnd.Next(0, size));
    }

    private static void DoHA()
    {
        Random rnd = new Random(1);
        for (int i = 0; i != 100000; ++i)
            HeapAllocation(rnd);
    }

    static void HeapAllocation(Random rnd)
    {
        int size = rnd.Next(1024, 131072);
        int[] a = new int[size];
        _dummy = a[rnd.Next(0, size)];
    }
}
Important differences between this code and that in the question:
- We have several threads running. With stack allocation, they are allocating in their own stack. With heap allocation, they are allocating from a heap shared with other threads.
- Larger sizes allocated.
- Different sizes allocated each time (though I seeded the random generator to make the tests more deterministic). This makes heap fragmentation more likely to happen, making heap allocation less efficient than with identical allocations each time.
As well as this, it's also worth noting that stackalloc would often be used as an alternative to using fixed to pin an array on the heap. Pinning arrays is bad for heap performance (not just for that code, but also for other threads using the same heap), so the performance impact would be even greater in that case, if the memory claimed would be in use for any reasonable length of time.
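To illustrate that trade-off, here is a sketch of the two approaches for handing a temporary buffer to native code; the library name somenative.dll and the Fill import are hypothetical placeholders, not a real API:
using System.Runtime.InteropServices;

class InteropSketch
{
    // Hypothetical native routine that fills a caller-supplied buffer.
    [DllImport("somenative.dll")]
    static extern unsafe void Fill(int* buffer, int length);

    // Heap buffer: must be pinned with fixed for the duration of the call,
    // which constrains the GC while the pin is held.
    static unsafe void WithFixed()
    {
        int[] buffer = new int[256];
        fixed (int* p = buffer)
        {
            Fill(p, buffer.Length);
        }
    }

    // Stack buffer: nothing for the GC to move, so no pinning is needed.
    static unsafe void WithStackalloc()
    {
        int* p = stackalloc int[256];
        Fill(p, 256);
    }
}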
While my code demonstrates a case where stackalloc gives a performance benefit, that in the question is probably closer to most cases where someone might eagerly "optimise" by using it. Hopefully the two pieces of code together show that while stackalloc can give a boost, it can also hurt performance a lot.
Generally, you shouldn't even consider stackalloc unless you are going to need to use pinned memory for interacting with unmanaged code anyway, and it should be considered an alternative to fixed rather than an alternative to general heap allocation. Use in this case still requires caution, forethought before you start, and profiling after you finish.
Use in other cases could give a benefit, but it should be far down the list of performance improvements you would try.
Edit:
To answer part 1 of the question: stackalloc is conceptually much as you describe. It obtains a chunk of stack memory and then returns a pointer to that chunk. It doesn't check that the memory will fit as such; rather, if it attempts to obtain memory past the end of the stack - which is protected by .NET on thread creation - this will cause the OS to return an exception to the runtime, which it then turns into a .NET managed exception. Much the same happens if you just allocate a single byte in a method with infinite recursion - unless the call got optimised to avoid that stack allocation (sometimes possible), a single byte will eventually add up to enough to trigger the stack overflow exception.
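As a sketch of that last point (the Recurse method is hypothetical, and since .NET 2.0 the resulting StackOverflowException cannot normally be caught, so only try this in a throwaway test process):
static unsafe void Recurse()
{
    byte* b = stackalloc byte[1]; // a single byte of stack memory per call
    *b = 0;
    Recurse();                    // each frame (plus its one byte) adds up until the
                                  // protected end of the stack is reached, triggering
                                  // the stack overflow exception described above
}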