I am reading "Accelerated C++". I found one sentence which states "sometimes double is faster in execution than float in C++". After reading this sentence I got confused about how float and double work. Please explain this point to me.
Depends on what the native hardware does.
If the hardware implements double (like the x86 does), then float is emulated by extending it to double, and the conversion will cost time. In this case, double will be faster.
If the hardware implements float only, then emulating double with it will cost even more time. In this case, float will be faster.
And if the hardware implements neither, both have to be implemented in software. In this case, both will be slow, but double will be slightly slower (more load and store operations at the least).
The quote you mention is probably referring to the x86 platform, where the first case applies. But this doesn't hold true in general.
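One way to see which of these cases a given compiler and target fall into is the standard `FLT_EVAL_METHOD` macro from `<cfloat>`. The sketch below is only illustrative: the helper name `describe_eval_method` and the two constants are made up for this example, and the 24/53-bit figures assume IEEE 754 `float` and `double`.

```cpp
#include <cfloat>   // FLT_EVAL_METHOD
#include <limits>
#include <string>

// Hypothetical helper: describes how floating-point expressions are
// evaluated, based on the standard FLT_EVAL_METHOD macro.
std::string describe_eval_method() {
    switch (FLT_EVAL_METHOD) {
        case 0:  return "each type is evaluated in its own precision";
        case 1:  return "float and double are both evaluated as double";
        case 2:  return "everything is evaluated as long double";
        default: return "indeterminate or implementation-defined";
    }
}

// Significand bits per type (24 and 53 on IEEE 754 platforms).
constexpr int float_bits  = std::numeric_limits<float>::digits;
constexpr int double_bits = std::numeric_limits<double>::digits;
```

Printing `describe_eval_method()` typically reports the third case for classic x87 code generation and the first case for x86-64 builds that use SSE, but that depends on the compiler and its flags.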
You can find a complete answer in this article:
What Every Computer Scientist Should Know About Floating-Point Arithmetic
This is a quote from a previous Stack Overflow thread on float vs. double, regarding memory bandwidth:
If a double requires more storage than a float, then it will take longer to read the data. That's the naive answer. On a modern IA32, it all depends on where the data is coming from. If it's in L1 cache, the load is negligible provided the data comes from a single cache line. If it spans more than one cache line, there's a small overhead. If it's from L2, it takes a while longer; if it's in RAM, it's longer still; and finally, if it's on disk, it takes a huge amount of time. So the choice of float or double is less important than the way the data is used. If you want to do a small calculation on lots of sequential data, a small data type is preferable. Doing a lot of computation on a small data set would allow you to use bigger data types without any significant effect. If you're accessing the data very randomly, then the choice of data size is unimportant: data is loaded in pages / cache lines. So even if you only want a byte from RAM, you could get 32 bytes transferred (this is very dependent on the architecture of the system). On top of all of this, the CPU/FPU could be super-scalar (aka pipelined). So, even though a load may take several cycles, the CPU/FPU could be busy doing something else (a multiply, for instance) that hides the load time to a degree.
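To put rough numbers on the bandwidth argument, here is a small sketch. The function names are invented for illustration, and the 64-byte default line size is an assumption (common on current x86 CPUs; the quote above mentions 32 bytes).

```cpp
#include <cstddef>

// Number of elements of type T that fit in one cache line.
// line_bytes is an assumed parameter, not something queried from the CPU.
template <typename T>
constexpr std::size_t elements_per_cache_line(std::size_t line_bytes = 64) {
    return line_bytes / sizeof(T);
}

// Bytes that have to move through the memory hierarchy to stream
// n elements of type T once.
template <typename T>
constexpr std::size_t bytes_streamed(std::size_t n) {
    return n * sizeof(T);
}
```

With 4-byte floats and 8-byte doubles, one 64-byte line holds 16 floats but only 8 doubles, so a sequential pass over the same number of values touches twice as many cache lines with double.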
Short answer is: it depends.
A CPU with x87 will crunch floats and doubles equally fast. Vectorized code will run faster with floats, because SSE can crunch 4 floats or 2 doubles in one pass.
Another thing to consider is memory speed. Depending on your algorithm, your CPU could be idling a lot while waiting for the data. Memory-intensive code will benefit from using floats, but ALU-limited code won't (unless it is vectorized).
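The "4 floats or 2 doubles" point can be seen directly with SSE intrinsics. This is a minimal sketch that assumes an x86 or x86-64 target with SSE2; the function names `add4_floats` and `add2_doubles` are made up for the example.

```cpp
#include <emmintrin.h>  // SSE/SSE2 intrinsics (x86 and x86-64 only)

// Adds four floats with one packed-single add (addps).
void add4_floats(const float* a, const float* b, float* out) {
    const __m128 va = _mm_loadu_ps(a);   // unaligned load of 4 floats
    const __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
}

// Adds two doubles with one packed-double add (addpd).
void add2_doubles(const double* a, const double* b, double* out) {
    const __m128d va = _mm_loadu_pd(a);  // unaligned load of 2 doubles
    const __m128d vb = _mm_loadu_pd(b);
    _mm_storeu_pd(out, _mm_add_pd(va, vb));
}
```

Both functions fill one 128-bit register per operand, so the float version does twice as many additions per instruction.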
I can think of three basic cases when doubles are faster than floats:
- Your hardware supports double operations but not float operations, so floats will be emulated by software and therefore be slower.
- You really need the precision of doubles. If you use floats anyway, you will have to use two floats to reach similar precision to a double. The emulation of a true double with floats will be slower than using doubles in the first place.
- You do not necessarily need doubles, but your numeric algorithm converges faster due to the enhanced precision of doubles. Also, doubles might offer enough precision to use a faster but numerically less stable algorithm.
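To give a feel for why emulating extra precision with two floats is slow, here is a sketch of Knuth's TwoSum, the usual building block of such "float-float" arithmetic. It is not taken from the answer above; the names `FloatPair` and `two_sum` are hypothetical, and the code assumes strict IEEE float arithmetic (no `-ffast-math`, no extended-precision intermediates).

```cpp
// A value held as an unevaluated sum hi + lo of two floats.
struct FloatPair {
    float hi;  // the rounded float result
    float lo;  // the rounding error that hi could not hold
};

// Knuth's TwoSum: hi is the rounded sum a + b, lo is the exact rounding
// error. It costs six floating-point operations instead of one.
FloatPair two_sum(float a, float b) {
    const float s   = a + b;
    const float bb  = s - a;                      // part of b that made it into s
    const float err = (a - (s - bb)) + (b - bb);  // what was rounded away
    return {s, err};
}
```

For example, `two_sum(16777216.0f, 1.0f)` cannot store 16777217 in `hi`, so the missing 1 ends up in `lo`. A full double-float addition needs further steps on top of this, whereas a hardware double addition is a single instruction.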
For completeness' sake I also give some reasons for the opposite case of floats being faster. You can see for yourself which reasons dominate in your case:
- Floats are faster than doubles when you don't need double's precision and you are memory-bandwidth bound and your hardware doesn't carry a penalty on floats.
- They conserve memory-bandwidth because they occupy half the space per number.
- There are also platforms that can process more floats than doubles in parallel.
On Intel, the coprocessor (nowadays integrated) will handle both equally fast, but as some others have noted, doubles require more memory bandwidth, which can cause bottlenecks. If you're using scalar SSE instructions (default for most compilers on 64-bit), the same applies. So generally, unless you're working on a large set of data, it doesn't matter much.
However, parallel SSE instructions will allow four floats to be handled in one instruction, but only two doubles, so here float can be significantly faster.
There is only one reason 32-bit floats can be slower than 64-bit doubles (or 80-bit x87 extended precision), and that is alignment. Other than that, floats take less memory, which generally means faster access and better cache performance. It also takes fewer cycles to process 32-bit instructions. And even when the (co)processor has no 32-bit instructions, it can perform them on 64-bit registers at the same speed. It is probably possible to create a test case where doubles are faster than floats, and vice versa, but my measurements of real statistics algorithms didn't show a noticeable difference.
In an experiment that adds 3.3 to a running sum 2000000000 times, the results are:
Summation time in s: 2.82 summed value: 6.71089e+07 // float
Summation time in s: 2.78585 summed value: 6.6e+09 // double
Summation time in s: 2.76812 summed value: 6.6e+09 // long double
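So double is (marginally) faster here, and it is the default in C and C++. It's more portable and the default across the C and C++ library functions. Also, double has significantly higher precision than float: the float sum above stopped growing at 6.71089e+07, while the correct total is 6.6e+09.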
Even Stroustrup recommends double over float:
"The exact meaning of single-, double-, and extended-precision is implementation-defined. Choosing the right precision for a problem where the choice matters requires significant understanding of floating-point computation. If you don't have that understanding, get advice, take the time to learn, or use double and hope for the best."
Perhaps the only case where you should use float instead of double is on 64-bit hardware with a modern gcc, because float is smaller: double is 8 bytes and float is 4 bytes.
float is usually faster; double offers greater precision. However, performance may vary in some cases if special processor extensions such as 3DNow! or SSE are used.