Is using float type slower than using double type?
I heard that modern Intel and AMD CPUs can do calculations with doubles faster than with floats.
What about standard math functions (sqrt, pow, log, sin, cos, etc.)? Computing them in single precision should be considerably faster because it should require fewer floating-point operations. For example, single-precision sqrt can use a simpler formula than double-precision sqrt. Also, I heard that standard math functions are faster in 64-bit mode (when compiled and run on a 64-bit OS). What is the definitive answer on this?
The classic x86 architecture uses a floating-point unit (FPU) to perform floating-point calculations. The FPU performs all calculations in its internal registers, each of which is 80 bits wide. Every time you work with a float or a double, the value is first loaded from memory into an internal FPU register. This means that there is no difference in the speed of the actual calculations, since in either case they are carried out with full 80-bit precision. The only thing that might differ is the speed of loading the value from memory and storing the result back to memory. Naturally, on a 32-bit platform it might take longer to load or store a double than a float. On a 64-bit platform there shouldn't be any difference.
Modern x86 architectures support extended instruction sets (SSE/SSE2) with new instructions that can perform the very same floating-point calculations without involving the "old" FPU instructions. However, again, I wouldn't expect to see any difference in calculation speed between float and double for scalar operations. And since these modern platforms are 64-bit, the load/store speed should be the same as well.
On a different hardware platform the situation could be different. But normally a smaller floating-point type should not provide any performance benefits. The main purpose of smaller floating-point types is to save memory, not to improve performance.
Edit: (To address @MSalters' comment)
What I said above applies to fundamental arithmetic operations. When it comes to library functions, the answer depends on several implementation details. If the platform's floating-point instruction set contains an instruction that implements a given library function, then what I said above will normally apply to that function as well (this would normally include functions like sin, cos, and sqrt). For other functions, whose functionality is not directly supported by the FP instruction set, the situation might be significantly different. It is quite possible that the float versions of such functions can be implemented more efficiently than their double versions.
Your first question has already been answered here on SO.
Your second question depends entirely on the size of the data you are working with. It all boils down to the low-level architecture of the system and how it handles large values. 64 bits of data on a 32-bit system would require 2 cycles to access 2 registers. The same data on a 64-bit system should only take 1 cycle to access 1 register.
Everything always depends on what you're doing. I find there are no hard and fast rules, so you need to analyze the current task and choose what works best for that specific task.
From some research and empirical measurements I have made in Java:
- basic arithmetic operations on doubles and floats essentially perform identically on Intel hardware, with the exception of division;
- on the other hand, on the Cortex-A8 as used in the iPhone 4 and iPad, even "basic" arithmetic on doubles takes around twice as long as on floats (a register FP addition on a float taking around 4ns vs a register FP addition on a double taking around 9ns);
- I've made some timings of methods on java.lang.Math (trigonometric functions etc.) which may be of interest -- in principle, some of these may well be faster on floats, as fewer terms would be required to calculate to the precision of a float; on the other hand, many of these end up being "not as bad as you'd think".
It is also true that there may be special circumstances in which e.g. memory bandwidth issues outweigh "raw" calculation times.
The "native" internal floating-point representation in the x86 FPU is 80 bits wide. This is different from both float (32 bits) and double (64 bits). Every time a value moves into or out of the FPU, a conversion is performed. There is only one FPU instruction that performs a sin operation, and it works on the internal 80-bit representation.
Whether this conversion is faster for float or for double depends on many factors, and must be measured for a given application.
While on most systems double will be the same speed as float for individual values, you're right that computing functions like sqrt, sin, etc. in single precision should be a lot faster than computing them in double precision. In C99, you can use sqrtf, sinf, and the other float variants even if your variables are double, and get the benefit, at the cost of getting only float accuracy in the result.
Another issue I've seen mentioned is memory (and likewise storage device) bandwidth. If you have millions or billions of values to deal with, float can be close to twice as fast as double, since everything will be memory-bound or I/O-bound. This is a good reason to use float as the type in an array or for on-disk storage in some cases, but I would not consider it a good reason to use float for the variables you do your computations with.
It depends on the processor. If the processor has native double-precision instructions, it'll usually be faster to just do double-precision arithmetic than to be given a float, convert it to a double, do the double-precision arithmetic, then convert it back to a float.