I recently used the /FAsu Visual C++ compiler option to output the source and assembly of a particularly long member function definition. In the assembly output, after the stack frame is set up, there is a single call to a mysterious _chkstk() function.
The MSDN page on _chkstk() does not explain why this function is called. I have also seen the Stack Overflow question "Allocating a buffer of more a page size on stack will corrupt memory?", but I do not understand what the OP and the accepted answer are talking about.
What is the purpose of the _chkstk() CRT function? What does it do?
Windows pages in extra stack for your thread as it is used. At the end of the stack, there is one guard page mapped as inaccessible memory -- if the program accesses it (because it is trying to use more stack than is currently mapped), there's an access violation. The OS catches the fault, maps in another page of stack at the same address as the old guard page, creates a new guard page just beyond the old one, and resumes from the instruction that caused the violation.
If a function has more than one page of local variables, then the first address it accesses might be more than one page beyond the current end of the stack. Hence it would miss the guard page and trigger an access violation that the OS doesn't realise is because more stack is needed. If the total stack required is particularly huge, it could perhaps even reach beyond the guard page, beyond the end of the virtual address space assigned to stack, and into memory that's actually in use for something else.
So, _chkstk ensures that there is enough space for the local variables. You can imagine that it does this by touching the memory for the local variables at page-sized intervals, working outward from the current end of the stack one page at a time, so that it cannot miss the guard page (so-called "stack probes"). I don't know whether it actually does that, though. Possibly it takes a more direct route and instructs the OS to map in a certain amount of stack. Either way, if the total required is greater than the virtual address space available for stack, then the OS can complain about it instead of doing something undefined.
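To make the difference between a direct access and page-by-page probing concrete, here is a toy model in portable C++. It is only a sketch: `ToyStack`, `probe_frame`, and the addresses used with them are hypothetical names and numbers invented for illustration, and the model simplifies what Windows really does (for instance, it treats running out of reserved address space as a plain failure).

```cpp
#include <cassert>
#include <cstdint>

// Assumed page size: 4 KiB, the usual value on x86/x64.
constexpr std::uint64_t kPageSize = 0x1000;

// Hypothetical toy model of a downward-growing thread stack.
struct ToyStack {
    std::uint64_t committed_low;  // lowest mapped address; the guard page sits just below it
    std::uint64_t reserve_low;    // lowest address reserved for this stack

    // Simulates one memory access. Returns false for an access violation
    // that the OS would not interpret as "this thread needs more stack".
    bool touch(std::uint64_t addr) {
        if (addr >= committed_low) return true;  // already mapped
        const std::uint64_t guard_low = committed_low - kPageSize;
        if (addr >= guard_low && guard_low >= reserve_low) {
            committed_low = guard_low;  // guard page hit: map it, move the guard down one page
            return true;
        }
        return false;  // the access skipped the guard page or left the reservation
    }
};

// Touches one address per page between sp and sp - size, nearest first,
// so that each access lands at most one page below the previous one.
bool probe_frame(ToyStack& stack, std::uint64_t sp, std::uint64_t size) {
    assert(size <= sp);
    const std::uint64_t target = sp - size;
    std::uint64_t addr = sp;
    while (addr - target > kPageSize) {
        addr -= kPageSize;
        if (!stack.touch(addr)) return false;
    }
    return stack.touch(target);
}
```

In this model, a stack with `committed_low = 0x100000` rejects a direct `touch(0x100800 - 0x3000)` because that address lies two pages below the guard page, whereas `probe_frame(stack, 0x100800, 0x3000)` succeeds and leaves `committed_low` three pages lower.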
I looked at the code for __chkstk and it does do the repeated stack probes at one-page intervals. This way, it doesn't need to make any calls to the OS. The parameter in rax is the size of the data you want to add. It ensures that the target address (current rsp - rax) is accessible. If rax > rsp, it does this for address 0. As an interesting shortcut, it first compares the target address with gs:[10h], which holds the address of the lowest stack page currently mapped; if the target address is >= this, it does nothing.
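The logic described above can be modelled in portable C++. This is a sketch of the description, not the CRT source: `chkstk_probes` is a hypothetical function that returns the page addresses a probe loop would touch instead of actually touching them. Starting the loop at the stack limit and rounding the target down to a page boundary are assumptions made for the model, and the real routine also has to account for details such as its own return address.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed page size: 4 KiB.
constexpr std::uint64_t kPageSize = 0x1000;

// Hypothetical model: rsp and rax are the register values on entry,
// stack_limit stands in for the value read from gs:[10h].
// Returns the page addresses that would be probed, highest first.
std::vector<std::uint64_t> chkstk_probes(std::uint64_t rsp,
                                         std::uint64_t rax,
                                         std::uint64_t stack_limit) {
    assert(stack_limit % kPageSize == 0);  // the model assumes a page-aligned limit

    std::vector<std::uint64_t> touched;

    // Target address is rsp - rax, clamped to 0 if the subtraction would wrap.
    const std::uint64_t target = (rax > rsp) ? 0 : rsp - rax;

    // Shortcut: the target is already inside mapped stack, so nothing to do.
    if (target >= stack_limit) return touched;

    // Probe one page at a time, from just below the limit down to the target's page.
    const std::uint64_t target_page = target & ~(kPageSize - 1);
    std::uint64_t page = stack_limit;
    while (page > target_page) {
        page -= kPageSize;
        touched.push_back(page);
    }
    return touched;
}
```

For example, with `rsp = 0x100800` and a limit of `0x100000`, a request of `0x500` bytes yields no probes at all, while a request of `0x3000` bytes yields probes at `0xFF000`, `0xFE000`, and `0xFD000`.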
By the way, for 64-bit code at least, it is spelled with two underscores: __chkstk.