In this question, I give some background on a parallel language I have implemented. The compiler generates native x86-32 code.
A key implementation decision is to allocate stack space from the heap for every function call. This allows recursion until you run out of virtual memory, and enables a cactus stack for lexical scopes, even for nested parallel children.
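A minimal sketch of what a heap-allocated, cactus-stack activation record could look like, assuming each frame carries a link to its lexically enclosing frame. The names `Frame`, `frame_push`, `frame_pop` and `frame_up` are hypothetical and are not taken from the compiler described here; the point is only that sibling (possibly parallel) children can share one parent frame, and that recursion depth is bounded by the heap rather than by a fixed stack.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical heap-allocated activation record, one per call.
   lexical_parent points at the enclosing scope's frame, so several
   children can hang off the same parent: a cactus stack. */
typedef struct Frame {
    struct Frame *lexical_parent; /* enclosing lexical scope, NULL at top level */
    size_t        size;           /* bytes available in locals[] */
    unsigned char locals[];       /* flexible array member for locals/temps */
} Frame;

/* Allocate a frame for one call; returns NULL when the heap is exhausted. */
static Frame *frame_push(Frame *lexical_parent, size_t locals_bytes) {
    Frame *f = malloc(sizeof(Frame) + locals_bytes);
    if (f == NULL)
        return NULL;
    f->lexical_parent = lexical_parent;
    f->size = locals_bytes;
    memset(f->locals, 0, locals_bytes);
    return f;
}

/* Release a frame when its call returns. The parent is not touched,
   so other children that still reference it remain valid. */
static void frame_pop(Frame *f) {
    free(f);
}

/* Walk `depth` lexical levels outward, as a nested function would
   to reach a variable declared in an enclosing scope. */
static Frame *frame_up(Frame *f, unsigned depth) {
    while (depth-- > 0) {
        assert(f != NULL);
        f = f->lexical_parent;
    }
    return f;
}
```

Because frames are freed individually, popping one child leaves its siblings' view of the shared parent intact, which is what a single contiguous stack cannot offer.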
The compiler's code generator can compute how much stack space the function itself needs; that's messy but straightforward, and it already does that well. There's no problem with stack demands from OS calls, because my functions don't make any (when one is needed, the code switches to a standard "big stack", makes the system call, and then switches back). To be safe in the face of exceptions and asynchronous calls, it adds an egregious constant, presently about 500 bytes, to the stack space a function needs. That constant is intended to cover a complete x86-32 context save and was calibrated from Windows 32 experience.
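The sizing rule above can be sketched as follows. `EXCEPTION_RESERVE`, `round_up` and `frame_alloc_size` are hypothetical names, and the 4-byte (DWORD) rounding is an assumption chosen to match x86-32 stack alignment, not something the question specifies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reserve added to every frame to absorb whatever the OS
   pushes onto the current stack when it delivers an exception (the
   ~500 bytes described above; the measurements later in this post
   suggest it needs to be larger). */
#define EXCEPTION_RESERVE 500u

/* Round n up to a multiple of align (align must be a power of two). */
static size_t round_up(size_t n, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Bytes to allocate for one activation: what the code generator computed
   for the function body plus the exception reserve, DWORD-aligned
   (alignment is an assumption of this sketch). */
static size_t frame_alloc_size(size_t body_bytes) {
    return round_up(body_bytes + EXCEPTION_RESERVE, 4);
}
```

With this rule, a function whose body needs 100 bytes gets a 600-byte frame, so the reserve dominates the cost of small activation records.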
This language and the asynchronous exception handling all work great on x86-32 systems. We have occasional problems running this 32-bit implementation on x86-64 systems, and I suspect a stack overflow on an exception.
The question is: how much can Windows push onto a stack for a hardware exception (divide by zero), or for a StopThread call, when running my 32-bit implementation on a 64-bit Windows box? I'm nervous that Windows pushes a complete x86-64 context, which is much bigger than an x86-32 context. Does anybody know? Is there a document that answers this chapter-and-verse?
I'm about ready to run some dynamic experiments to see.
[Answer complete; see specific values for both Win32 on Vista and Win64 WOW64 on Windows 7]
Running on 32-bit Windows Vista, doing an IDIV with a zero divisor, I get the following values:
So from ESP = 0x1C00FF8 at the point of the divide to the bottom of the pushed context block, 0x1C00FF8 - 0x1C00D2C = 0x2CC = 716 bytes are pushed. From the bottom of the pushed context block to the entry at SEH, 0x1C00D2C - 0x1C00C30 = 0xFC = 252 bytes are pushed. So it appears that 716 + 252 = 968 bytes get pushed (which I find ridiculous).
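The 716-byte figure is exactly the size of the x86-32 `CONTEXT` structure declared in `winnt.h` (0x2CC bytes), which is consistent with the block just below the faulting ESP being a full 32-bit `CONTEXT` record. The sketch below mirrors that layout with fixed-width types so the size can be checked on any compiler; `FloatSave32`, `Context32` and `pushed_bytes` are hypothetical names for this reconstruction, not Windows' own definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Mirror of the x86-32 FLOATING_SAVE_AREA from winnt.h, using
   fixed-width types so it has the same size on any host. */
typedef struct {
    uint32_t ControlWord, StatusWord, TagWord;
    uint32_t ErrorOffset, ErrorSelector, DataOffset, DataSelector;
    uint8_t  RegisterArea[80];        /* x87 register stack, 8 x 10 bytes */
    uint32_t Cr0NpxState;
} FloatSave32;

/* Mirror of the x86-32 CONTEXT structure from winnt.h. */
typedef struct {
    uint32_t    ContextFlags;
    uint32_t    Dr0, Dr1, Dr2, Dr3, Dr6, Dr7;  /* debug registers */
    FloatSave32 FloatSave;
    uint32_t    SegGs, SegFs, SegEs, SegDs;
    uint32_t    Edi, Esi, Ebx, Edx, Ecx, Eax;
    uint32_t    Ebp, Eip, SegCs, EFlags, Esp, SegSs;
    uint8_t     ExtendedRegisters[512];        /* FXSAVE image */
} Context32;

/* Bytes between ESP at the fault and the bottom of the pushed context
   (the stack grows downward, so the bottom is the lower address). */
static uint32_t pushed_bytes(uint32_t esp_at_fault, uint32_t context_bottom) {
    assert(context_bottom <= esp_at_fault);
    return esp_at_fault - context_bottom;
}
```

`sizeof(Context32)` comes out to 716, matching the Vista delta. The WOW64 delta measured below (744 bytes) does not match it exactly, so this layout alone does not account for that case.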
It gets worse. What follows is a dump of the stack frame at entry to SEH; notice the values below 0x1C00C30 down to 0x1C00B78 that are not cdcdcdcd (see at least the "obvious Win32 return address" 0x77c39534 at 0x1C00BD8). I believe that Windows has stepped on these values while passing control to my SEH. That's 0x1C00C30 - 0x1C00B78 = 0xB8 = 184 additional bytes. So (ridiculous + unbelievable) = 968 + 184 = 1152 bytes are needed to get to the SEH, minimum. [Weirdly, a Win32 ThreadStop executed by another thread appears to push nothing on the stopped thread's stack.]
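The "not cdcdcdcd" observation is a stack-painting (high-water-mark) measurement: fill the stack with a known byte pattern before the fault, then find the lowest address whose contents changed. A minimal sketch of that technique follows; `paint_stack` and `stack_high_water` are hypothetical helpers, and the 0xCD pattern is simply the one mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAINT_BYTE 0xCD /* fill pattern, matching the cdcdcdcd values above */

/* Fill a region that will serve as a stack with the known pattern. */
static void paint_stack(uint8_t *base, size_t len) {
    memset(base, PAINT_BYTE, len);
}

/* The stack grows downward from base + len, so scan upward from the
   lowest address and report how many bytes at the top were touched.
   Caveat: a byte that happens to be written with 0xCD is invisible,
   so the result is a lower bound on actual usage. */
static size_t stack_high_water(const uint8_t *base, size_t len) {
    size_t i = 0;
    while (i < len && base[i] == PAINT_BYTE)
        i++;
    return len - i;
}
```

Run against a painted stack before and after the fault, the difference between the handler's ESP and the high-water mark is exactly the "additional bytes" figure computed above.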
Running on 64-bit Windows 7, with a 32-bit process under WOW64, doing an IDIV with a zero divisor, I get the following values:
So from ESP = 0x02100FF8 at the point of the divide to the bottom of the pushed context block, 0x02100FF8 - 0x02100D10 = 0x2E8 = 744 bytes are pushed (Win32 pushed 716). From the bottom of the pushed context block to the entry at SEH, 0x02100D10 - 0x02100BD4 = 0x13C = 316 bytes are pushed (Win32 pushed 252). So it appears that 744 + 316 = 1060 bytes get pushed (which I find even worse than the ridiculous amount pushed by Win32).
It gets worse. What follows is a dump of the stack frame at entry to SEH; notice the values below 0x02100BD4 down to 0x021009D8 that are not cdcdcdcd (see at least the "obvious Win32 return address" 0x77c39534 at 0x021009D8). I believe that Windows has stepped on these values while passing control to my SEH. That's 0x02100BD4 - 0x021009D8 = 0x1FC = 508 additional bytes. So (ridiculous + unbelievable) = 1060 + 508 = 1568 bytes are needed to get to the SEH, minimum.
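The per-platform totals can be recomputed directly from the four addresses recorded in each experiment. In this sketch, `SehProbe` and `seh_stack_cost` are hypothetical names; the addresses are the ones quoted in this post.

```c
#include <assert.h>
#include <stdint.h>

/* Addresses read off one divide-by-zero experiment (stack grows down). */
typedef struct {
    uint32_t esp_at_fault;   /* ESP at the IDIV */
    uint32_t context_bottom; /* lowest address of the pushed context block */
    uint32_t seh_entry_esp;  /* ESP on entry to the SEH handler */
    uint32_t lowest_clobber; /* lowest address found overwritten (not 0xCD) */
} SehProbe;

/* Total stack bytes consumed between the fault and the handler's entry,
   including the area below the handler's ESP that Windows overwrote. */
static uint32_t seh_stack_cost(const SehProbe *p) {
    assert(p->lowest_clobber <= p->seh_entry_esp &&
           p->seh_entry_esp <= p->context_bottom &&
           p->context_bottom <= p->esp_at_fault);
    uint32_t context   = p->esp_at_fault - p->context_bottom;  /* context block */
    uint32_t dispatch  = p->context_bottom - p->seh_entry_esp; /* up to SEH entry */
    uint32_t clobbered = p->seh_entry_esp - p->lowest_clobber; /* stepped on */
    return context + dispatch + clobbered; /* == esp_at_fault - lowest_clobber */
}
```

Plugging in the Vista probe {0x1C00FF8, 0x1C00D2C, 0x1C00C30, 0x1C00B78} gives 1152 bytes, and the WOW64 probe {0x02100FF8, 0x02100D10, 0x02100BD4, 0x021009D8} gives 1568 bytes, matching the hand arithmetic.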
Final summary for cost to enter SEH: 32-bit Windows Vista, 1152 bytes; 32-bit process under WOW64 on 64-bit Windows 7, 1568 bytes.
It's damn hard to define a "small activation record" scheme in the face of Windows' profligate use of stack space.
I'd guess that exception handling under Windows must be exceedingly slow, to boot; it takes time to read and write all those bytes.
I'll likely try this again with a beta version of Windows 8. I expect to be disgusted.
If you're talking about the emulated x32 environment on an x64 box, the stack context is exactly the same size as on native x32, which in my case is 0x3E0 bytes (992), DWORD-aligned.
Everything emulated in a WOW64 process should be handled exactly the same as its x32 counterpart, at least when it comes to functionality. If you rely on the TEB32 to inspect the stack, that's a different case, as you can see in this article:
http://www.dumpanalysis.org/blog/index.php/2009/07/07/raw-stack-dump-of-wow64-process/
Sadly, I couldn't find an official document that addresses your question.
Also, here is an interesting article about WOW64-emulated processes:
http://blog.rewolf.pl/blog/?p=102#.UBTmHaBEUXw
Finally, if what you mean is having a stack-allocated function handle exceptions, that can be done without any worry. In my trace logs, after the exception is triggered, I can see a function allocated on the stack receiving the exception before my SEH does. It seems to be some kind of Avast engine or maybe some spyware; I can't trace it to any known module, because it is deallocated once the function returns.
Hope I've helped somewhat.
PS: If you can post some extra info, such as the stack log and your exception-handling function, we could help more.