According to Intel documentation, this is what FPTAN does:
Replace ST(0) with its approximate tangent and push 1 onto the FPU stack.
And this is the code I wrote in NASM:
section .data
fVal: dd 4
fSt0: dq 0.0
fSt1: dq 0.0
section .text
fldpi
fdiv dword[fVal] ; divide pi by 4 and store result in ST(0).
fptan
fstp qword[fSt0] ; store ST(0)
fstp qword[fSt1] ; store ST(1)
At this point, the values I find for fSt0 and fSt1 are:
fSt0 = 5.60479e+044
fSt1 = -1.#IND
But shouldn't fSt0 and fSt1 both be 1?
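As Michael Petch has already pointed out in a comment, you have a simple typo. Instead of declaring fVal as a floating-point value (as intended), you declared it as a 32-bit integer. Change the declaration from dd 4 to dd 4.0.
Then your code will work as intended; the rest of it is correctly written.
This also explains the values you saw. With dd 4, FDIV reinterprets the bit pattern 0x00000004 as a single-precision denormal (about 5.6e-45), so the quotient is about 5.6e+44. That is outside the range FPTAN accepts (magnitude below 2^63), so FPTAN leaves ST(0) unchanged and pushes nothing. The first FSTP stores that huge quotient, and the second FSTP pops an empty stack, which produces the indefinite NaN in fSt1.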
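For reference, here is a sketch of the corrected listing, assuming the only change is that fVal is declared as dd 4.0. It keeps the question's layout and is a fragment rather than a complete program; the comments describe what the x87 stack should hold at each step.

```nasm
section .data
fVal: dd 4.0                ; single-precision 4.0 (was "dd 4", a 32-bit integer)
fSt0: dq 0.0
fSt1: dq 0.0

section .text
    fldpi                   ; ST(0) = pi
    fdiv  dword [fVal]      ; ST(0) = pi / 4.0
    fptan                   ; ST(1) = tan(pi/4), ST(0) = 1.0 (the pushed constant)
    fstp  qword [fSt0]      ; store the pushed 1.0 and pop
    fstp  qword [fSt1]      ; store tan(pi/4), approximately 1.0, and pop
```

Note that fSt0 receives the constant 1.0 that FPTAN pushes, while the tangent itself ends up in fSt1.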
If you wanted to take an integer input, you could do it by changing your code to use the FIDIV instruction. This instruction first converts the integer operand to an extended-precision floating-point value and then does the divide. Because the conversion is required, this is slightly less efficient than if you had just given the input as a floating-point value.
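A minimal sketch of the FIDIV variant, assuming fVal stays declared as a 32-bit integer exactly as in the question. Only the divide instruction changes; the rest of the fragment is unchanged.

```nasm
section .data
fVal: dd 4                  ; 32-bit integer, as in the original listing
fSt0: dq 0.0
fSt1: dq 0.0

section .text
    fldpi                   ; ST(0) = pi
    fidiv dword [fVal]      ; convert the integer 4, then ST(0) = pi / 4
    fptan                   ; ST(1) = tan(pi/4), ST(0) = 1.0
    fstp  qword [fSt0]      ; store 1.0 and pop
    fstp  qword [fSt1]      ; store tan(pi/4) and pop
```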
Note that, if you were going to do this, it would be more efficient on certain older CPUs to break up the load so that it is done separately from the division. In other words, we break the FIDIV instruction apart into separate FILD (integer load) and FDIVP (divide-and-pop) instructions. This improves overlapping, and thus shaves a couple of clock cycles off the execution time of the code. (On newer CPUs, from AMD Family 15h [Bulldozer] and Intel Pentium II onward, there's no real advantage to breaking up FIDIV into FILD + FDIV; either way you write it should be equally performant.)
Of course, since everything you have here is a constant, and tan(pi/4) == 1, your code is equivalent to storing 1.0 in both fSt0 and fSt1, which is what an optimizing compiler would generate. :-)
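The two ideas above, the FILD + FDIVP split and the constant-folded form, could be sketched roughly as follows. The labels split_load and folded are hypothetical names used only for illustration, and the fragment assumes a 32-bit build in which fVal, fSt0, and fSt1 can be addressed absolutely, as in the question.

```nasm
section .data
fVal: dd 4                  ; 32-bit integer input
fSt0: dq 0.0
fSt1: dq 0.0

section .text

; (a) Integer load split from the divide (hypothetical label)
split_load:
    fldpi                   ; ST(0) = pi
    fild  dword [fVal]      ; ST(0) = 4.0, ST(1) = pi
    fdivp st1, st0          ; ST(1) = pi / 4.0, then pop: ST(0) = pi / 4
    fptan                   ; ST(1) = tan(pi/4), ST(0) = 1.0
    fstp  qword [fSt0]      ; store 1.0 and pop
    fstp  qword [fSt1]      ; store tan(pi/4) and pop
    ret

; (b) Constant-folded form: both results are 1.0 (hypothetical label)
folded:
    fld1                    ; ST(0) = 1.0
    fst   qword [fSt0]      ; store 1.0, keep it on the stack
    fstp  qword [fSt1]      ; store 1.0 again and pop
    ret
```

Variant (b) stores an exact 1.0 in fSt1, whereas the computed tangent in variant (a) may differ from 1.0 in the last bits, since pi/4 is not exactly representable.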