I have an x86 NASM routine that appears to work, but I have problems using the value it returns. This is 32-bit Windows using MSVC++. I expect the return value in ST0.
A minimal example demonstrating the problem with the returned values can be seen in this C++ and NASM assembly code:
#include <cstdio>
#include <iostream>
extern "C" float arsinh(float);
int main()
{
float test = arsinh(5.0);
printf("%f\n", test);
printf("%f\n", arsinh(5.0));
std::cout << test << std::endl;
std::cout << arsinh(5.0) << std::endl;
}
Assembly code:
section .data
value: dq 1.0
section .text
global _arsinh
_arsinh:
fld dword[esi-8] ;loads the given value into st0
ret
I can't figure out how to use the return value though, as I always get the wrong value no matter which data type I use. In this example the value 5 should be returned and I'd expect output like:
5.000000
5.000000
5
5
Instead I get output similar to:
-9671494178951383518019584.000000
-9671494178951383518019584.000000
-9.67149e+24
5
Only the final value appears to be correct. What is wrong with this code? Why doesn't it always return the floating point value I am expecting from my function? How can I fix this code?
The primary issue is not a failure to return a value in the floating-point register ST0, but the way you attempt to load the 32-bit (single-precision) float parameter from the stack. The problem is this line:
fld dword[esi-8] ;loads the given value into st0
This should read:
fld dword[esp+4] ;loads the DWORD parameter from stack into st0
fld dword[esi-8] only works sometimes because of the way the calling function uses ESI internally. With different C compilers and optimizations enabled, you may find the code fails to work altogether.
With 32-bit C/C++ code, parameters are passed on the stack from right to left. When a CALL instruction executes in 32-bit code, the 4-byte return address is pushed onto the stack. On entry to the function, memory address esp+0 contains the return address and the first parameter is at esp+4. If you had a second parameter it would be at esp+8. A good description of the Microsoft 32-bit CDECL calling convention can be found in this WikiBook entry. Of importance:
- Function arguments are passed on the stack, in right-to-left order.
- Function result is stored in EAX/AX/AL
- Floating point return values will be returned in ST0
- 8-bit and 16-bit integer arguments are promoted to 32-bit arguments.
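To make the offsets concrete, here is a small C++ sketch that models the stack at the moment the callee starts executing. It is only an illustration: StackModel, simulate_cdecl_call, float_bits, bits_to_float and the return address used below are hypothetical names and values invented for this example, and the model assumes every argument occupies exactly one 4-byte slot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical model of a 32-bit stack made of 4-byte slots.
// The most recently pushed slot is the one at [esp+0].
struct StackModel {
    std::vector<std::uint32_t> slots;

    void push(std::uint32_t value) { slots.push_back(value); }

    // Read the DWORD at [esp + offset]; offset must be a multiple of 4.
    std::uint32_t at_esp_plus(std::size_t offset) const {
        assert(offset % 4 == 0 && offset / 4 < slots.size());
        return slots[slots.size() - 1 - offset / 4];
    }
};

// Reinterpret a float as its raw 32-bit pattern and back.
std::uint32_t float_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

float bits_to_float(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Model a cdecl call f(first, second): arguments are pushed right to left,
// then CALL pushes the return address.
StackModel simulate_cdecl_call(float first, float second,
                               std::uint32_t return_address) {
    StackModel stack;
    stack.push(float_bits(second));   // rightmost argument is pushed first
    stack.push(float_bits(first));    // leftmost argument is pushed last
    stack.push(return_address);       // pushed by the CALL instruction
    return stack;
}
```

With simulate_cdecl_call(5.0f, 7.0f, 0x00401000), the model has the return address at offset 0, the first argument at offset 4 and the second at offset 8, which matches the esp+0, esp+4 and esp+8 layout described above.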
When dealing with x87 FPU instructions, it is very important that the only value on the FPU stack when returning a FLOAT is the value in ST0. Failing to release (by popping or freeing) anything else you put on the FPU stack can cause your function to fail when it is called multiple times. The x87 FPU stack only has 8 slots (not very many). If you don't clean off the FPU stack before the function returns, later instructions that need to load a new value can overflow the FPU stack.
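The effect of leaving an extra value behind can be sketched with a toy model in C++. This is a deliberately simplified illustration, not a description of real hardware: it only counts how many of the 8 registers are in use, and FpuStackModel, leaky_function, clean_function and calls_before_overflow are hypothetical names made up for this example. On a real x87 with the invalid-operation exception masked, an overflowing load typically produces a NaN rather than setting a convenient flag like the one used here.

```cpp
#include <cassert>

// Hypothetical, simplified model: tracks only how many of the
// 8 x87 register slots are currently occupied.
struct FpuStackModel {
    static constexpr int kSlots = 8;
    int depth = 0;
    bool overflowed = false;

    // Models an instruction such as fld that pushes a value.
    void load() {
        if (depth == kSlots) {
            overflowed = true;   // no free slot left
        } else {
            ++depth;
        }
    }

    // Models an instruction such as fstp that pops a value.
    void pop() {
        assert(depth > 0);
        --depth;
    }
};

// Loads a scratch value and a result, but forgets to pop the scratch value.
void leaky_function(FpuStackModel& fpu) {
    fpu.load();
    fpu.load();
}

// Loads a scratch value and a result, then pops the scratch value
// so that only the result remains.
void clean_function(FpuStackModel& fpu) {
    fpu.load();
    fpu.load();
    fpu.pop();
}

// Calls fn repeatedly the way a caller would: after each call it pops the
// returned value. Returns how many calls completed without an overflow.
template <typename Fn>
int calls_before_overflow(Fn fn, int max_calls) {
    FpuStackModel fpu;
    for (int i = 0; i < max_calls; ++i) {
        fn(fpu);
        if (fpu.overflowed) {
            return i;
        }
        fpu.pop();   // caller consumes the value returned in ST0
    }
    return max_calls;
}
```

In this model the leaky routine gains one occupied slot per call, so calls_before_overflow(leaky_function, 100) stops after 7 successful calls, while the clean routine completes all 100.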
An example implementation of your function could look like this:
use32
section .text
; _arsinh takes a single float (angle) as a parameter
; angle is at memory location esp+4 on the stack
; arcsinh(x) = ln(x + sqrt(x^2+1))
global _arsinh
_arsinh:
fldln2 ; st(0) = ln2
fld dword[esp+4] ; st(0) = angle, st(1)=ln2
fld st0 ; st(0) = angle, st(1) = angle, st(2)=ln2
fmul st0 ; st(0) = angle^2, st(1) = angle, st(2)=ln2
fld1 ; st(0) = 1, st(1) = angle^2, st(2) = angle, st(3)=ln2
faddp ; st(0) = 1 + angle^2, st(1) = angle, st(2)=ln2
fsqrt ; st(0) = sqrt(1 + angle^2), st(1) = angle, st(2)=ln2
faddp ; st(0) = sqrt(1 + angle^2) + angle, st(1)=ln2
fyl2x ; st(0) = log2(sqrt(1 + angle^2) + angle)*ln2
; st(0) = asinh(angle)
ret
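As a cross-check for the assembly above, the same arithmetic can be written in C++ and compared against std::asinh. This is only a reference sketch: arsinh_reference is a hypothetical name, and it does its intermediate work in double rather than the x87's extended precision, so the low-order digits may differ slightly from what the NASM routine produces.

```cpp
#include <cmath>

// Mirrors the x87 sequence step by step. The final instruction, fyl2x,
// computes st(1) * log2(st(0)), so with ln(2) in st(1) the product is
// the natural logarithm of st(0).
float arsinh_reference(float x) {
    const double ln2 = std::log(2.0);                 // fldln2
    const double xd = x;                              // fld dword [esp+4]
    const double sum = std::sqrt(xd * xd + 1.0) + xd; // fmul, fld1, faddp, fsqrt, faddp
    return static_cast<float>(ln2 * std::log2(sum));  // fyl2x
}
```

For the input 5.0 used in the question, both this function and std::asinh give approximately 2.3124, which is the value the corrected assembly should return instead of 5.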