I am trying to make a program that calculates equations (what equation doesn't matter currently) that use 64-bit registers, floats, and coprocessor instructions. Unfortunately I don't know how to access the final outcome of the equation as a float. I can do:
fist qword ptr [bla]
mov rax,bla
and change the function type to INT and get my value, but I cannot access it as a FLOAT. Even when I leave the result in ST(0) (the top of the coprocessor stack) it doesn't work as expected and my C++ program gets the wrong result. My assembly code is:
public funct
.data
bla qword ?
bla2 qword 10.0
.code
funct PROC
push rbp
mov rbp, rsp
push rbx
mov bla,rcx
fild qword ptr[bla]
fld qword ptr [bla2]
fmul st(0), st(1)
fist dword ptr [bla]
pop rbx
pop rbp
ret
funct ENDP
END
My C++ code is:
#include <stdlib.h>
#include <cstdlib>
#include <stdio.h>
extern "C" float funct(long long n);
int main(){
float value1= funct(3);
return 0;
}
What is the problem, and how can I fix it?
Your question is a bit ambiguous, and so is your code. I'll present a few ideas using the x87 FPU and SSE instructions. Using x87 FPU instructions is discouraged in 64-bit code; SSE/SSE2 is preferred, and SSE/SSE2 are available on all 64-bit AMD and 64-bit Intel x86 processors.
32-bit float in 64-bit code using x87 FPU
If your question is "How do I write 64-bit assembler code that uses 32-bit floats using the x87 FPU?" then your C++ code looks fine, but your assembler code needs some work. Your C++ code suggests the output type of the function is a 32-bit float:
extern "C" float funct(long long n);
We need to create a function that returns a 32-bit float. Your assembler code could be modified in the following fashion. I am keeping the stack frame code and the push/pop of RBX in your code, since I assume you were just giving us a minimal example and that your real code is using RBX. With that in mind, the following code should work:
I've commented the code, but the thing that might be of interest is that I don't use a second variable in the DATA section. The 64-bit Windows calling convention requires the caller of a function to ensure the stack is aligned on a 16-byte boundary and that there is a 32-byte shadow space (AKA register parameter area) allocated before making a call. This area can be used as a scratch area. Since we set up a stack frame, the caller's saved RBP is at RBP+0, the return address is at RBP+8 and the scratch area starts at RBP+16. If you weren't using a stack frame, then on entry to the function the return address would be at RSP+0 and the shadow space would start at RSP+8.
We can store the result of our floating point operation there instead of in the QWORD you labelled bla. It is a reasonable idea to unwind the floating point stack so nothing remains on it before we exit our function, so I use the x87 instructions that pop the registers after we are done using them.
The 64-bit Microsoft calling convention requires floating point values to be returned in XMM0. We use the SSE instruction MOVSS to move a scalar single (32-bit float) to the XMM0 register. That is where the C++ code will expect that value to be returned.
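When checking the assembly, it can help to have a plain C++ function with the same signature to compare against. The sketch below is only a reference model (the name `funct_ref` is hypothetical and not part of the original code). It computes n * 10.0 as a 32-bit float, and on x64 the compiler returns its result in XMM0, which is exactly where the hand-written `funct` must leave its result.

```cpp
// Hypothetical reference model of funct: converts the 64-bit integer
// argument to a 32-bit float and multiplies it by 10.0, which is what
// the FILD / FLD / FMUL sequence in the question is meant to compute.
float funct_ref(long long n) {
    return static_cast<float>(n) * 10.0f;
}
```

`funct_ref(3)` yields 30.0f; if the assembly `funct(3)` returns something else, the value is probably not ending up in XMM0.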
32-bit float in 64-bit code using SSE
Building on the ideas in the section above, we can modify the code to use SSE instructions with 32-bit floats. An example of such code is as follows:
This code removes the usage of the x87 FPU by using SSE instructions. In particular we use:
CVTSI2SS converts a scalar integer to a scalar single (float). In this case the 64-bit integer value in RCX is converted to a 32-bit float and stored in XMM0. XMM0 is the register we'll be placing our returned value into. XMM0 to XMM5 are considered volatile, so we don't need to save their values.
MULSS multiplies scalar singles (floats). In this case MULSS computes XMM0 = XMM0 * (32-bit float memory operand), which multiplies the float in XMM0 by the 32-bit float 10.0. Since XMM0 already contains our final result, we have nothing more to do but properly exit the function.
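The CVTSI2SS / MULSS sequence can also be expressed from C++ with compiler intrinsics, which is a convenient way to experiment before writing the assembly. This is a sketch assuming an x86-64 compiler that provides `<xmmintrin.h>` (MSVC, GCC and Clang do). The function name `funct_sse` is hypothetical, and the `#else` branch is a plain C++ fallback for non-x86-64 targets.

```cpp
#if defined(__x86_64__) || defined(_M_X64)
#include <xmmintrin.h>
#endif

// Hypothetical intrinsic version of the 32-bit float SSE example.
float funct_sse(long long n) {
#if defined(__x86_64__) || defined(_M_X64)
    // CVTSI2SS: convert the 64-bit integer to a float in the low lane.
    __m128 x = _mm_cvtsi64_ss(_mm_setzero_ps(), n);
    // MULSS: multiply the low lane by the scalar single 10.0.
    x = _mm_mul_ss(x, _mm_set_ss(10.0f));
    // The low lane is the value that would be returned in XMM0.
    return _mm_cvtss_f32(x);
#else
    // Portable fallback producing the same result.
    return static_cast<float>(n) * 10.0f;
#endif
}
```

With optimizations enabled, compilers typically turn this into essentially the same CVTSI2SS/MULSS pair, which can be checked in the generated assembly listing.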
64-bit double float in 64-bit code using x87 FPU
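This is a variation on the first example, but now we are using 64-bit floats, also known as the double type in C++, REAL8 (or QWORD) in assembler, and scalar double in SSE2. Since we are now using double as the return type, we have to modify the C++ code to be:
#include <stdlib.h>
#include <cstdlib>
#include <stdio.h>
extern "C" double funct(long long n);
int main(){
double value1= funct(3);
return 0;
}
The assembly code would look like: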
This code is nearly identical to the x87 code using 32-bit float. We use REAL8 (same as QWORD) to store a 64-bit float and use MOVSD to move a 64-bit double float (scalar double) to XMM0. MOVSD is an SSE2 instruction. It is important to return the proper size float in XMM0: had you used MOVSS, only the low 32 bits of the stored double would be loaded, and the value returned to the C++ function would be incorrect.
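To see why the size matters, the following C++ sketch models what the caller observes. It assumes the double result has been stored to memory and then loaded into XMM0 with MOVSS, which reads only the first 4 bytes (the low 32 bits on little-endian x86) and zeroes the rest of the register; the caller then interprets the low 64 bits of XMM0 as a double. The helper names are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Models the caller's view when a stored double is loaded with MOVSS:
// only the low 32 bits survive and the upper 32 bits become zero.
double caller_sees_after_movss(double stored) {
    std::uint64_t bits = 0;
    std::memcpy(&bits, &stored, sizeof bits);
    std::uint64_t xmm0_low64 = bits & 0xFFFFFFFFu;  // upper half zeroed
    double seen = 0.0;
    std::memcpy(&seen, &xmm0_low64, sizeof seen);
    return seen;
}

// Models MOVSD: all 64 bits of the double reach XMM0 unchanged.
double caller_sees_after_movsd(double stored) {
    return stored;
}
```

For 30.0 (bit pattern 0x403E000000000000) the low 32 bits are all zero, so the MOVSS path hands the caller 0.0 instead of 30.0; for most other values it produces a tiny denormal instead.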
64-bit double float in 64-bit code using SSE2
This is a variation on the second example, but now we are using 64-bit floats, also known as the double type in C++, REAL8 (or QWORD) in assembler, and scalar double in SSE2. The C++ code should use the code from the previous section so that double is used instead of float. The assembler code would be similar to this:
The primary difference from the second example is that we use CVTSI2SD instead of CVTSI2SS. SD in the instruction means we are converting to a scalar double (64-bit double float). Similarly, we use the MULSD instruction for multiplication using scalar doubles. XMM0 will hold the 64-bit scalar double (double float) that will be returned to the calling function.
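The same intrinsic approach works for the double version. This sketch assumes an x86-64 compiler providing `<emmintrin.h>` (SSE2); `funct_sse2` is a hypothetical name, and the `#else` branch is a plain C++ fallback for other targets.

```cpp
#if defined(__x86_64__) || defined(_M_X64)
#include <emmintrin.h>
#endif

// Hypothetical intrinsic version of the 64-bit double SSE2 example.
double funct_sse2(long long n) {
#if defined(__x86_64__) || defined(_M_X64)
    // CVTSI2SD: convert the 64-bit integer to a double in the low lane.
    __m128d x = _mm_cvtsi64_sd(_mm_setzero_pd(), n);
    // MULSD: multiply the low lane by the scalar double 10.0.
    x = _mm_mul_sd(x, _mm_set_sd(10.0));
    // The low lane is the value that would be returned in XMM0.
    return _mm_cvtsd_f64(x);
#else
    // Portable fallback producing the same result.
    return static_cast<double>(n) * 10.0;
#endif
}
```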
You could pass the address of the result as a parameter:
main.c:
callee.asm:
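A minimal C++ sketch of that interface, assuming a hypothetical name `funct_out` and a `double` result: the function returns nothing and instead writes through the pointer. If it were written in assembly under the Microsoft x64 convention, `n` would arrive in RCX and the pointer in RDX, so the result could be stored with something like `fstp qword ptr [rdx]` (x87) or `movsd qword ptr [rdx], xmm0` (SSE2). The C++ body below only models that behavior.

```cpp
// Hypothetical stand-in for an assembly routine that writes its result
// through a pointer instead of returning it in XMM0. In a mixed
// C++/assembly program it would instead be declared as
//   extern "C" void funct_out(long long n, double* result);
// and implemented in the .asm file.
void funct_out(long long n, double* result) {
    *result = static_cast<double>(n) * 10.0;
}
```

The caller would then write `double value1; funct_out(3, &value1);`, which sidesteps the question of which register holds a floating point return value.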