Writing naked functions with custom prolog and epilog

Published 2019-04-10 09:13

Question:

I'm writing some plugin code in a dll that is called by a host over which I have no control.

The host assumes that the plugins are exported as __stdcall functions. The host is told the name of the function and the details of the arguments that it expects, and it dynamically builds the call itself: it uses LoadLibrary and GetProcAddress to find the function and then pushes the arguments onto the stack manually.

Usually plugin dlls expose a constant interface. My plugin exposes an interface that is configured at dll load time. To achieve this my plugin exposes a set of standard entry points that are defined at the time the dll is compiled and it allocates them as needed to internal functionality that's being exposed.

Each of the internal functions may take different arguments but this is communicated to the host along with the physical entrypoint name. All of my physical dll entrypoints are defined to take a single void * pointer and I marshal subsequent parameters from the stack myself by working from offsets from the first argument and the known argument list that has been communicated to the host.
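A minimal sketch of the offset-based marshalling described above, assuming a 32-bit __stdcall layout in which every argument occupies a slot rounded up to 4 bytes and later arguments sit at higher addresses than the first one. `ArgReader` is a hypothetical helper, not part of any API; it reads from a plain byte pointer so the logic can be checked without a real host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: walks the caller-pushed argument area of a 32-bit
// __stdcall call, starting at the address of the first parameter.
// Assumption: each argument occupies a slot rounded up to 4 bytes, and
// arguments pushed right to left end up at increasing addresses.
class ArgReader {
public:
    explicit ArgReader(const unsigned char* firstArg) : base_(firstArg), offset_(0) {
        assert(firstArg != nullptr);
    }

    // Copies the next argument out of its slot and advances past the slot.
    template <typename T>
    T next() {
        T value;
        std::memcpy(&value, base_ + offset_, sizeof(T));
        offset_ += slotSize(sizeof(T));
        return value;
    }

    // Bytes consumed so far: what a callee-cleans epilog would have to remove.
    std::size_t bytesUsed() const { return offset_; }

    // Rounds an argument size up to the 4-byte stack slot granularity.
    static std::size_t slotSize(std::size_t n) {
        return (n + 3) & ~static_cast<std::size_t>(3);
    }

private:
    const unsigned char* base_;
    std::size_t offset_;
};
```

The value of `bytesUsed()` after reading the last declared argument is the "argument space used" figure the question wants to hand back to the epilog. An 8-byte argument such as a `double` takes two slots under this assumption.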

The host can successfully call the functions in my plugin with the correct arguments and all works well... However, I'm aware that a) my functions aren't cleaning up the stack as they're supposed to: because they're defined as __stdcall functions that take a single 4-byte pointer, they always do a 'ret 4' at the end, even if the caller has pushed more arguments onto the stack; and b) I can't deal with functions that take no arguments, as the 'ret 4' will pop 4 bytes too many off the stack on my return.
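The mismatch behind a) and b) is simple arithmetic on ESP. The toy model below is purely illustrative (the function name and starting ESP value are made up): it tracks ESP through the caller's pushes, the CALL, and a `ret N`, and reports how far ESP ends up from where a __stdcall caller expects it.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of ESP around one __stdcall call on 32-bit x86 (illustration only).
// Returns expectedEsp - actualEsp after the callee returns:
//   > 0 : argument bytes were left on the stack (problem a)
//   < 0 : the callee popped bytes that were not its arguments (problem b)
std::int64_t espDriftAfterCall(std::uint32_t argBytesPushed, std::uint32_t retImmediate) {
    assert(retImmediate <= 0xFFFF);        // RET's immediate operand is 16 bits
    std::int64_t esp = 0x1000;             // arbitrary ESP before the caller pushes args
    const std::int64_t expected = esp;     // a __stdcall caller expects ESP back here

    esp -= argBytesPushed;                 // caller pushes the arguments
    esp -= 4;                              // CALL pushes the return address
    esp += 4;                              // RET pops the return address...
    esp += retImmediate;                   // ...then adds its immediate to ESP
    return expected - esp;
}
```

With three 4-byte arguments and `ret 4`, `espDriftAfterCall(12, 4)` is 8 (8 bytes left behind); with no arguments, `espDriftAfterCall(0, 4)` is -4 (4 bytes of the caller's data popped).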

Having traced out of my plugin into the host's calling code, I can see that a) actually isn't that big a deal: the host loses some stack space until it returns from the dispatch call, at which point it cleans up its stack frame, and that cleans up my rubbish too; however...

I can solve b) by switching to __cdecl and not cleaning up at all. I assume I can solve a) by switching to naked functions and writing my own generic argument clean up code.

Since I know the amount of argument space used by the function that was just called, I had hoped that it would be as simple as:

extern "C" __declspec(naked) __declspec(dllexport) void  * __stdcall EntryPoint(void *pArg1)
{                                                                                                        
   size_t argumentSpaceUsed;
   {
      void *pX = RealEntryPoint(
         reinterpret_cast<ULONG_PTR>(&pArg1), 
         argumentSpaceUsed);

      __asm
      {
         mov eax, dword ptr pX
      }
   }
   __asm
   {
      ret argumentSpaceUsed
   }
}

But that doesn't work, as ret needs a compile-time constant... Any suggestions?

UPDATED:

Thanks to Rob Kennedy's suggestions I've got to this, which seems to work...

extern "C" __declspec(naked) __declspec(dllexport) void  * __stdcall EntryPoint(void *pArg1)
{      
   __asm {
      push ebp                    // Set up our stack frame
      mov ebp, esp
      mov eax, 0x0                // Space for called func to return arg space used, init to 0
      push eax                    // This local lives at [ebp-4]
      push esp                    // 2nd arg to RealEntryPoint: address of that local
      lea eax, pArg1              // 1st arg: address of our first parameter ([ebp+8])
      push eax
      call RealEntryPoint         // Result is left in eax, we leave it there for our caller....
      mov ecx, dword ptr [ebp-4]  // Arg space used; reading via ebp works whether RealEntryPoint is __cdecl or __stdcall
      mov esp, ebp                // Remove our stack frame
      pop ebp
      pop edx                     // Return address off
      add esp, ecx                // Remove 'x' bytes of caller args
      push edx                    // Return address back on
      ret
   }
}

Does this look right?

Answer 1:

Since ret requires a constant argument, you need to arrange for your function to have a constant number of parameters, but that only needs to hold at the point where you're ready to return from the function. So, just before the end of the function, do this:

  1. Pop the return address off the top of the stack and store it in a temporary; ECX is a good place.
  2. Remove the variable number of arguments from the stack, either by popping each one off individually, or by adjusting ESP directly.
  3. Push the return address back onto the stack.
  4. Use ret with a constant argument.
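The four steps above can be checked on a toy model of the stack. This is a hedged illustration, not real assembly: `Stack` and `variableCleanupReturn` are hypothetical names, each vector element stands for one 4-byte slot, and `back()` plays the role of the slot ESP points at.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy 32-bit stack: one element per 4-byte slot; back() is the top (where ESP points).
using Stack = std::vector<std::uint32_t>;

// Models the four-step epilog and returns the address RET would jump to.
// argBytes is the run-time argument size (assumed to be a multiple of 4 here).
std::uint32_t variableCleanupReturn(Stack& stack, std::uint32_t argBytes) {
    assert(argBytes % 4 == 0);
    assert(stack.size() >= 1 + argBytes / 4);

    std::uint32_t ecx = stack.back();          // 1. pop the return address into ECX
    stack.pop_back();

    stack.resize(stack.size() - argBytes / 4); // 2. add esp, argBytes

    stack.push_back(ecx);                      // 3. push the return address back

    std::uint32_t target = stack.back();       // 4. ret 0: pop the address and jump there
    stack.pop_back();
    return target;
}
```

For a stack holding `{caller data, 3, 2, 1, return address}` and `argBytes == 12`, only the caller's data remains afterwards, and the same call with `argBytes == 0` handles the no-argument case. Steps 3 and 4 together leave the stack exactly as a single `jmp ecx` would.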

Incidentally, the issue you refer to as (a) really is a problem in the general case. You've just been lucky that the caller seems to always refer to its own local variables using a frame pointer instead of the stack pointer. Functions aren't required to do that, though, and there's no guarantee that a future version of the host program will continue to work that way. The compiler is also liable to save some register values on the stack only for the duration of the call, and then expect to be able to pop them off again afterward. Your code would break that.
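To make the last point concrete, here is a hypothetical toy model (not taken from any real host) of a caller that keeps a register value on the stack across the call, for example via `push ecx` before the arguments and `pop ecx` afterwards. If the callee removes fewer argument slots than were pushed, the caller's pop retrieves a leftover argument instead of the saved value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy 32-bit stack: one element per 4-byte slot; back() is the top.
using Stack = std::vector<std::uint32_t>;

// Illustration only: the caller saves a value on the stack, pushes args right
// to left, and calls a __stdcall callee that removes slotsCalleeRemoves slots.
// Returns what the caller's later pop actually retrieves.
std::uint32_t popSavedAfterCall(std::uint32_t saved,
                                const std::vector<std::uint32_t>& args,
                                std::size_t slotsCalleeRemoves) {
    assert(slotsCalleeRemoves <= args.size());  // keep the toy model in bounds
    Stack stack;
    stack.push_back(saved);                     // caller: e.g. push ecx to keep it across the call
    for (auto it = args.rbegin(); it != args.rend(); ++it)
        stack.push_back(*it);                   // caller pushes arguments right to left
    stack.push_back(0x00401000u);               // CALL pushes a (made-up) return address
    stack.pop_back();                           // callee's RET pops the return address...
    stack.resize(stack.size() - slotsCalleeRemoves);  // ...and removes its immediate's worth
    std::uint32_t restored = stack.back();      // caller: pop ecx
    stack.pop_back();
    return restored;
}
```

With three arguments and a callee that removes all three slots, the saved value comes back intact; with the fixed `ret 4` from the question (one slot removed), the caller instead gets the second argument back.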