Question:
I know that x87 has higher internal precision, which is probably the biggest difference people see between it and SSE operations. But I have to wonder: is there any other benefit to using x87? I have a habit of typing -mfpmath=sse automatically in any project, and I wonder if I'm missing anything else that the x87 FPU offers.
Answer 1:
x87 has some instructions that don't exist in the SSE instruction set.
Off the top of my head, that covers the trigonometric instructions such as fsin, fcos, fptan and fpatan, plus some of the exponential and logarithm instructions (f2xm1, fyl2x).
If your code spends most of the time doing trigonometry you may see a slight performance boost if you use x87. Some DSP algorithms would fall into this category.
However, for math code where you spend most of your time doing additions, multiplications, etc., SSE is usually faster.
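To make the point about x87-only instructions concrete, here is a minimal sketch that calls fsin directly through inline assembly. It assumes GCC or Clang targeting x86 or x86-64; the function name x87_sin is made up for illustration, and on any other compiler or target the sketch simply falls back to the library sin(). Whether this actually beats the library routine on a given CPU is something to measure, not assume.

```c
#include <math.h>

/* Hypothetical helper: computes sin(x) with the x87 FSIN instruction
   when built by GCC/Clang for x86 or x86-64, and with libm otherwise. */
double x87_sin(double x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    double result;
    /* "=t" binds the output to st(0), the top of the x87 register stack;
       "0" places the input in that same register before FSIN runs. */
    __asm__ ("fsin" : "=t" (result) : "0" (x));
    return result;
#else
    /* Assumption: no x87 available here, so use the C library instead. */
    return sin(x);
#endif
}
```

Note that fsin only reduces arguments with a magnitude below 2^63 and loses accuracy near multiples of pi, so a library sin() is usually the safer default outside of size- or speed-critical inner loops.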
Answer 2:
- It's present on really old machines.
EOF
Answer 3:
FPU instructions are smaller than SSE instructions, so they are ideal for demoscene stuff.
Answer 4:
There is considerable legacy and small system compatibility with the x87: SSE is a relatively new processor feature. If your code is to run on an embedded microcontroller, there's a good chance it won't support SSE instructions.
Even systems that don't have an FPU installed will often provide 80x87 emulators, which make the code run transparently (more or less). I don't know of any SSE emulators; certainly one of my systems doesn't have one, so the newest versions of Adobe Photoshop Elements refuse to run.
The 80x87 instructions have good parallel operation characteristics, which have been thoroughly explored and analyzed since the coprocessor's introduction in 1982 or so. Various x86 clones might stall on SSE instructions.
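If a binary has to run on machines that may lack SSE, one option is to check at run time and pick a code path. The sketch below assumes GCC or Clang on x86, where the __builtin_cpu_init and __builtin_cpu_supports builtins are available; the function names cpu_has_sse2 and pick_math_path are hypothetical, and on other compilers or targets the check conservatively reports no SSE2.

```c
/* Hypothetical helper: returns 1 if the running CPU reports SSE2 support,
   0 otherwise (including on targets where we cannot ask). */
int cpu_has_sse2(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();  /* harmless if the CPU model was already probed */
    return __builtin_cpu_supports("sse2") ? 1 : 0;
#else
    return 0;              /* assumption: treat unknown targets as "no SSE2" */
#endif
}

/* Hypothetical dispatcher: names the code path a program might select. */
const char *pick_math_path(void)
{
    return cpu_has_sse2() ? "sse2" : "fallback";
}
```

In a real program the two branches would select between separately compiled routines, since code built with -mfpmath=sse cannot itself run on a CPU without SSE.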
Answer 5:
Conversion between float and double is faster with x87 (usually free) than with SSE. With x87, you can load and store a float, double or long double to or from the register stack, and it is converted to or from extended precision without extra cost. With SSE, additional instructions are required to do the type conversion if types are mixed, because the registers contain float or double values. These conversion instructions are fairly fast but do take extra time.
The real fix is to refrain from mixing float and double excessively, not to use x87, of course.
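A common way the mixing creeps in is an unsuffixed literal. The following sketch (function names are made up for illustration) shows the same loop written both ways. In the first version the literal 0.1 has type double, so under the C rules each element is widened to double and the product is narrowed back to float; with SSE that typically means cvtss2sd and cvtsd2ss around the multiply, although what the compiler actually emits depends on the optimization level and the constant.

```c
#include <stddef.h>

/* Mixed types: 0.1 is a double literal, so v[i] is promoted to double
   for the multiply and the result is converted back to float on store. */
void scale_mixed(float *v, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        v[i] = v[i] * 0.1;
}

/* Uniform types: 0.1f is a float literal, so the whole expression
   stays in single precision and no conversion is needed. */
void scale_uniform(float *v, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        v[i] = v[i] * 0.1f;
}
```

The two versions can differ in the last bit of the result, because 0.1 and 0.1f are not the same value, so switching literals is a small numerical change as well as a performance one.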