This may seem like a bit of a stupid question, but after seeing Alexandre C's reply in another topic, I'm curious to know whether there is any performance difference between the built-in types:
char vs. short vs. int vs. float vs. double
Usually we don't consider such a performance difference (if any) in our real-life projects, but I would like to know this for educational purposes. The general questions that can be asked are:
Is there any performance difference between integer arithmetic and floating-point arithmetic?
Which is faster? What is the reason for one being faster? Please explain.
Yes. However, this is very much platform and CPU specific. Different platforms can do different arithmetic operations at different speeds.
That being said, the reply in question was a bit more specific.
pow() is a general-purpose routine that works on double values. By feeding it integer values, it's still doing all of the work that would be required to handle non-integer exponents. Using direct multiplication bypasses a lot of that complexity, which is where the speed comes into play. This is really not an issue (so much) of different types, but rather of bypassing a large amount of complex code required to make pow work with any exponent.
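To make that point concrete, here is a minimal sketch; the ipow helper is purely illustrative, not a standard function. Raising to a known small integer power needs only multiplications, while std::pow must be prepared to handle any real exponent.

```cpp
#include <cmath>

// General-purpose routine: must handle arbitrary (non-integer) exponents.
double cube_with_pow(double x) { return std::pow(x, 3.0); }

// Direct multiplication: just two multiplies, no exponent-handling machinery.
double cube_with_mul(double x) { return x * x * x; }

// Illustrative helper for integer exponents (exponentiation by squaring).
double ipow(double base, unsigned exp) {
    double result = 1.0;
    while (exp != 0) {
        if (exp & 1u) result *= base;
        base *= base;
        exp >>= 1u;
    }
    return result;
}
```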
Absolutely. First, of course, it depends entirely on the CPU architecture in question.
However, integral and floating-point types are handled very differently, so the following is nearly always the case:
On some CPUs, doubles may be significantly slower than floats. On some architectures, there is no dedicated hardware for doubles, and so they are handled by passing two float-sized chunks through, giving you worse throughput and twice the latency. On others (the x86 FPU, for example), both types are converted to the same internal format (80-bit floating point, in the case of x86), so performance is identical. On yet others, both float and double have proper hardware support, but because float has fewer bits, it can be done a bit faster, typically reducing the latency a bit relative to double operations.
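Just to illustrate the sizes involved, here is a small sketch, assuming a typical x86 toolchain where long double maps onto the 80-bit x87 format (sizeof often reports 12 or 16 bytes because of alignment padding):

```cpp
#include <cstdio>

int main() {
    std::printf("float:       %zu bytes\n", sizeof(float));       // typically 4
    std::printf("double:      %zu bytes\n", sizeof(double));      // typically 8
    std::printf("long double: %zu bytes\n", sizeof(long double)); // 80-bit x87 format, padded, on many x86 ABIs
    return 0;
}
```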
Disclaimer: all the mentioned timings and characteristics are just pulled from memory. I didn't look any of it up, so it may be wrong. ;)
For different integer types, the answer varies wildly depending on CPU architecture. The x86 architecture, due to its long, convoluted history, has to support 8-, 16-, 32- (and today 64-) bit operations natively, and in general they're all equally fast (they use basically the same hardware, and just zero out the upper bits as needed).
However, on other CPUs, datatypes smaller than an int may be more costly to load/store (writing a byte to memory might have to be done by loading the entire 32-bit word it is located in, doing bit masking to update the single byte in a register, and then writing the whole word back). Likewise, for datatypes larger than int, some CPUs may have to split the operation into two, loading/storing/computing the lower and upper halves separately.
But on x86, the answer is that it mostly doesn't matter. For historical reasons, the CPU is required to have pretty robust support for each and every data type. So the only difference you're likely to notice is that floating-point ops have more latency (but similar throughput, so they're not slower per se, at least if you write your code correctly).
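One more C/C++-specific wrinkle worth keeping in mind here: operands narrower than int are promoted to int before arithmetic anyway, so the narrow type mostly affects storage and loads/stores rather than the arithmetic itself. A minimal sketch:

```cpp
#include <cstdint>

// a + b is computed at int width, then truncated back to 8 bits on return.
std::uint8_t add8(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(a + b);
}

// Native-word-sized arithmetic on a 32-bit CPU: a single add instruction.
std::uint32_t add32(std::uint32_t a, std::uint32_t b) { return a + b; }

// On a 32-bit CPU this may be split into two operations (low and high halves).
std::uint64_t add64(std::uint64_t a, std::uint64_t b) { return a + b; }
```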
No, not really. This of course depends on CPU and compiler, but the performance difference is typically negligible, if there even is one.
There is certainly a difference between floating-point and integer arithmetic. Depending on the CPU's specific hardware and micro-instructions, you get different performance and/or precision. Good Google terms for the precise descriptions (I don't know exactly either):
With regards to the size of the integers, it is best to use the platform/architecture word size (or double that), which comes down to int32_t on x86 and int64_t on x86_64. Some processors might have intrinsic instructions that handle several of these values at once (like SSE (floating point) and MMX), which will speed up parallel additions or multiplications.
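As a rough illustration of "several of these values at once", here is a sketch using SSE2 integer intrinsics (x86-specific; SSE2 is baseline on x86_64):

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Adds four 32-bit integers to four others with a single vector instruction.
void add4(const std::int32_t* a, const std::int32_t* b, std::int32_t* out) {
    __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), _mm_add_epi32(va, vb));
}
```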
Float vs. integer:
Historically, floating-point could be much slower than integer arithmetic. On modern computers, this is no longer really the case (it is somewhat slower on some platforms, but unless you write perfect code and optimize for every cycle, the difference will be swamped by the other inefficiencies in your code).
On somewhat limited processors, like those in high-end cell phones, floating-point may be somewhat slower than integer, but it's generally within an order of magnitude (or better), so long as there is hardware floating-point available. It's worth noting that this gap is closing pretty rapidly as cell phones are called on to run more and more general computing workloads.
On very limited processors (cheap cell phones and your toaster), there is generally no floating-point hardware, so floating-point operations need to be emulated in software. This is slow -- a couple orders of magnitude slower than integer arithmetic.
As I said though, people are expecting their phones and other devices to behave more and more like "real computers", and hardware designers are rapidly beefing up FPUs to meet that demand. Unless you're chasing every last cycle, or you're writing code for very limited CPUs that have little or no floating-point support, the performance distinction doesn't matter to you.
Different size integer types:
Typically, CPUs are fastest at operating on integers of their native word size (with some caveats about 64-bit systems). 32-bit operations are often faster than 8- or 16-bit operations on modern CPUs, but this varies quite a bit between architectures. Also, remember that you can't consider the speed of a CPU in isolation; it's part of a complex system. Even if operating on 16-bit numbers is 2x slower than operating on 32-bit numbers, you can fit twice as much data into the cache hierarchy when you represent it with 16-bit numbers instead of 32-bit ones. If that makes the difference between having all your data come from cache instead of taking frequent cache misses, then the faster memory access will trump the slower operation of the CPU.
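A minimal sketch of that trade-off: for the same element count, the 16-bit data occupies half the memory, so more of the working set stays in cache while you scan it. The function names are illustrative only.

```cpp
#include <cstdint>
#include <vector>

// For the same element count, the 16-bit version needs half the memory,
// so more of it fits in L1/L2 cache during the scan.
long long sum16(const std::vector<std::int16_t>& v) {
    long long s = 0;
    for (auto x : v) s += x;
    return s;
}

long long sum32(const std::vector<std::int32_t>& v) {
    long long s = 0;
    for (auto x : v) s += x;
    return s;
}
```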
Other notes:
Vectorization tips the balance further in favor of narrower types (float and 8- and 16-bit integers): you can do more operations in a vector of the same width. However, good vector code is hard to write, so it's not as though you get this benefit without a lot of careful work.
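A sketch of why narrower types help once a compiler vectorizes these loops (assuming, say, 256-bit AVX registers and an optimizing build such as -O3): the float loop can process 8 elements per vector instruction, the double loop only 4.

```cpp
#include <cstddef>

// With 256-bit vectors: 8 float lanes per instruction.
void scale(float* x, std::size_t n, float k) {
    for (std::size_t i = 0; i < n; ++i) x[i] *= k;
}

// With 256-bit vectors: only 4 double lanes per instruction.
void scale(double* x, std::size_t n, double k) {
    for (std::size_t i = 0; i < n; ++i) x[i] *= k;
}
```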
Why are there performance differences?
There are really only two factors that affect whether an operation is fast on a CPU: the circuit complexity of the operation, and user demand for the operation to be fast.
(Within reason) any operation can be made fast, if the chip designers are willing to throw enough transistors at the problem. But transistors cost money (or rather, using lots of transistors makes your chip larger, which means you get fewer chips per wafer and lower yields, which costs money), so chip designers have to balance how much complexity to use for which operations, and they do this based on (perceived) user demand. Roughly, you might think of breaking operations into four categories:
high-demand, low-complexity operations will be fast on nearly any CPU: they're the low-hanging fruit, and confer maximum user benefit per transistor.
high-demand, high-complexity operations will be fast on expensive CPUs (like those used in computers), because users are willing to pay for them. You're probably not willing to pay an extra $3 for your toaster to have a fast FP multiply, however, so cheap CPUs will skimp on these instructions.
low-demand, high-complexity operations will generally be slow on nearly all processors; there just isn't enough benefit to justify the cost.
low-demand, low-complexity operations will be fast if someone bothers to think about them, and non-existent otherwise.
Generally, integer math is faster than floating-point math. This is because integer math involves simpler computations. However, for most operations we're talking about less than a dozen clocks. Not millis, micros, nanos, or ticks; clocks. The ones that happen 2-3 billion times per second in modern cores. Also, since the 486, a lot of cores have had floating-point units (FPUs), which are hard-wired to perform floating-point arithmetic efficiently, and often in parallel with the CPU.
As a result, even though it's technically slower, floating-point calculation is still so fast that any attempt to time the difference would have more error inherent in the timing mechanism and thread scheduling than the time it actually takes to perform the calculation. Use ints when you can, but understand when you can't, and don't worry too much about relative calculation speed.
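For what it's worth, a rough timing sketch follows (not a rigorous benchmark; loop overhead, timer resolution, and scheduling noise can easily dominate the handful of clocks each operation takes, which is exactly the point above):

```cpp
#include <chrono>
#include <cstdio>

template <typename T>
double time_ops(T k) {
    volatile T x = 1;  // volatile keeps the loop from being optimized away
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < 100'000'000; ++i) x = x * k + 1;
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

int main() {
    std::printf("unsigned multiply-add: %.3f s\n", time_ops<unsigned>(3u));
    std::printf("double   multiply-add: %.3f s\n", time_ops<double>(0.5));
    return 0;
}
```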