Question:
I have heard a teacher drop this once, and it has been bugging me ever since. Let's say we want to check if the integer x is bigger than or equal to 0. There are two ways to check this:
if (x > -1) {
    // do stuff
}
and
if (x >= 0) {
    // do stuff
}
According to this teacher, > would be slightly faster than >=. In this case it was Java, but according to him this also applied for C, C++ and other languages. Is there any truth to this statement?
Answer 1:
There's no difference in any real-world sense.
Let's take a look at some code generated by various compilers for various targets.
- I'm assuming a signed int operation (which seems to be the intent of the OP)
- I've limited my survey to C and to compilers that I have readily at hand (admittedly a pretty small sample: GCC, MSVC and IAR)
- basic optimizations enabled (-O2 for GCC, /Ox for MSVC, -Oh for IAR)
using the following module:
void my_puts(char const* s);

void cmp_gt(int x)
{
    if (x > -1) {
        my_puts("non-negative");
    }
    else {
        my_puts("negative");
    }
}

void cmp_gte(int x)
{
    if (x >= 0) {
        my_puts("non-negative");
    }
    else {
        my_puts("negative");
    }
}
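(If you want to reproduce listings like the ones below, assembly output can be generated with something like gcc -O2 -S cmp.c for GCC, or cl /Ox /FA cmp.c for MSVC; the file name cmp.c here is just whatever you saved the module as.)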
And here's what each of them produced for the comparison operations:
MSVC 11 targeting ARM:
// if (x > -1) {...
00000 |cmp_gt| PROC
00000 f1b0 3fff cmp r0,#0xFFFFFFFF
00004 dd05 ble |$LN2@cmp_gt|
// if (x >= 0) {...
00024 |cmp_gte| PROC
00024 2800 cmp r0,#0
00026 db05 blt |$LN2@cmp_gte|
MSVC 11 targeting x64:
// if (x > -1) {...
cmp_gt PROC
00000 83 f9 ff cmp ecx, -1
00003 48 8d 0d 00 00 // speculative load of argument to my_puts()
00 00 lea rcx, OFFSET FLAT:$SG1359
0000a 7f 07 jg SHORT $LN5@cmp_gt
// if (x >= 0) {...
cmp_gte PROC
00000 85 c9 test ecx, ecx
00002 48 8d 0d 00 00 // speculative load of argument to my_puts()
00 00 lea rcx, OFFSET FLAT:$SG1367
00009 79 07 jns SHORT $LN5@cmp_gte
MSVC 11 targeting x86:
// if (x > -1) {...
_cmp_gt PROC
00000 83 7c 24 04 ff cmp DWORD PTR _x$[esp-4], -1
00005 7e 0d jle SHORT $LN2@cmp_gt
// if (x >= 0) {...
_cmp_gte PROC
00000 83 7c 24 04 00 cmp DWORD PTR _x$[esp-4], 0
00005 7c 0d jl SHORT $LN2@cmp_gte
GCC 4.6.1 targeting x64
// if (x > -1) {...
cmp_gt:
.seh_endprologue
test ecx, ecx
js .L2
// if (x >= 0) {...
cmp_gte:
.seh_endprologue
test ecx, ecx
js .L5
GCC 4.6.1 targeting x86:
// if (x > -1) {...
_cmp_gt:
mov eax, DWORD PTR [esp+4]
test eax, eax
js L2
// if (x >= 0) {...
_cmp_gte:
mov edx, DWORD PTR [esp+4]
test edx, edx
js L5
GCC 4.4.1 targeting ARM:
// if (x > -1) {...
cmp_gt:
.fnstart
.LFB0:
cmp r0, #0
blt .L8
// if (x >= 0) {...
cmp_gte:
.fnstart
.LFB1:
cmp r0, #0
blt .L2
IAR 5.20 targeting an ARM Cortex-M3:
// if (x > -1) {...
cmp_gt:
80B5 PUSH {R7,LR}
.... LDR.N R1,??DataTable1 ;; `?<Constant "non-negative">`
0028 CMP R0,#+0
01D4 BMI.N ??cmp_gt_0
// if (x >= 0) {...
cmp_gte:
80B5 PUSH {R7,LR}
.... LDR.N R1,??DataTable1 ;; `?<Constant "non-negative">`
0028 CMP R0,#+0
01D4 BMI.N ??cmp_gte_0
If you're still with me, here are the differences of any note between evaluating (x > -1) and (x >= 0) that show up:
- MSVC targeting ARM uses cmp r0,#0xFFFFFFFF for (x > -1) vs. cmp r0,#0 for (x >= 0). The first instruction's encoding is two bytes longer. I suppose that may introduce some additional time, so we'll call this an advantage for (x >= 0).
- MSVC targeting x64 uses cmp ecx, -1 for (x > -1) vs. test ecx, ecx for (x >= 0). The first instruction's encoding is one byte longer. I suppose that may introduce some additional time, so we'll call this an advantage for (x >= 0).
Note that GCC and IAR generated identical machine code for the two kinds of comparison (with the possible exception of which register was used). So according to this survey, it appears that (x >= 0) has an ever-so-slight chance of being 'faster'. But whatever advantage the minimally shorter opcode byte encoding might have (and I stress might have) will certainly be completely overshadowed by other factors.
I'd be surprised if you found anything different for the jitted output of Java or C#. I doubt you'd find any difference of note even for a very small target like an 8 bit AVR.
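(If you do want to check the jitted output for Java yourself, HotSpot can dump it via -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly, although that requires the hsdis disassembler plugin to be installed.)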
In short, don't worry about this micro-optimization. I think my write up here has already spent more time than will be spent by any difference in the performance of these expressions accumulated across all the CPUs executing them in my lifetime. If you have the capability to measure the difference in performance, please apply your efforts to something more important like studying the behavior of sub-atomic particles or something.
Answer 2:
It is very much dependent on the underlying architecture, but any difference will be minuscule.
If anything, I'd expect (x >= 0) to be slightly faster, as comparison with 0 comes for free on some instruction sets (such as ARM).
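(On ARM, for example, most data-processing instructions can optionally set the condition flags as they execute, e.g. subs instead of sub, so a result is often already "compared" against zero as a side effect of computing it, with no separate cmp instruction needed.)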
Of course, any sensible compiler will choose the best implementation regardless of which variant is in your source.
Answer 3:
Your teacher has been reading some really old books. It used to be the case with some architectures lacking a greater-than-or-equal instruction that evaluating > required fewer machine cycles than >=, but these platforms are rare these days. I suggest going for readability, and using >= 0.
Answer 4:
A bigger concern here is premature optimisation. Many consider writing readable code more important than writing efficient code [1, 2]. I would apply these optimisations as a last stage in a low-level library once the design has been proven to work.
You shouldn't be constantly considering making minuscule optimisations in your code at the cost of readability, since it'll make reading and maintaining the code harder. If these optimisations need to take place, abstract them into lower-level functions so you're still left with code that's easier for humans to read.
As a crazy example, compare someone who writes their programs in assembly with someone who's willing to forgo that extra efficiency and use Java for its benefits in design, ease of use and maintainability.
As a side note, if you're using C, perhaps writing a macro which uses the slightly more efficient code is a more feasible solution (a sketch follows below), since it will achieve efficiency, readability and maintainability better than scattered operations.
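For instance, a minimal sketch of what such a macro might look like (the macro and function names here are invented for illustration):

/* Hypothetical helper: hide the comparison behind a readable name, so
   call sites say what they mean and the "optimised" form lives in
   exactly one place. */
#define IS_NON_NEGATIVE(x) ((x) >= 0)

void do_stuff(void);

void example(int x)
{
    if (IS_NON_NEGATIVE(x)) {
        do_stuff();
    }
}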
And of course the trade-off between efficiency and readability depends on your application. If that loop is running 10000 times a second then it's possibly a bottleneck and you may want to invest time in optimising it, but if it's a single statement that's called occasionally it's probably not worth it for the minute gain.
Answer 5:
Yes, there is a difference; you can see it in the bytecode (e.g., by disassembling the class with javap -c).
for
if (x >= 0) {
}
the bytecode is
ILOAD 1
IFLT L1
for
if (x > -1) {
}
the bytecode is
ILOAD 1
ICONST_M1
IF_ICMPLE L3
Version 1 is faster, because it uses a special zero-operand operation:
iflt : jump if less than zero
But it is possible to see the difference only when running the JVM in interpret-only mode (java -Xint ...), e.g. with this test:
int n = 0;
for (;;) {                       // repeat the measurement indefinitely
    long t0 = System.currentTimeMillis();
    int j = 0;
    for (int i = 100000000; i >= n; i--) {
        j++;                     // dummy work so the loop body isn't empty
    }
    System.out.println(System.currentTimeMillis() - t0);  // elapsed ms
}
shows 690 ms for n = 0 and 760 ms for n = 1. (I used 1 instead of -1 because it's easier to demonstrate; the idea stays the same.)
Answer 6:
In fact I believe the second version should be slightly faster, as it requires a single bit check (assuming you compare against zero, as shown above). However, such optimizations never really show, as most compilers will optimize such comparisons.
Answer 7:
">=" is single operation, just like ">". Not 2 separate operations with OR.
But >=0 is probably faster, because computer need to check only one bit (negative sign).
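To make the sign-bit idea concrete, here is a minimal sketch (the shift count assumes a 32-bit two's-complement int):

#include <stdio.h>

int main(void)
{
    int x = -5;
    /* For a 32-bit two's-complement int, x >= 0 exactly when the
       sign bit (bit 31) is clear; this is why compilers can emit a
       single test/js pair on x86 for the >= 0 form. */
    unsigned sign = (unsigned)x >> 31;
    printf("sign bit: %u, x >= 0: %d\n", sign, x >= 0);
    return 0;
}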
Answer 8:
According to this teacher > would be slightly faster than >=. In this case it was Java, but according to him this also applied for C, C++ and other languages. Is there any truth to this statement?
Your teacher is fundamentally wrong.
Not only because, if anything, it is the comparison with 0 that has a chance of being slightly faster, but because this sort of local optimization is done well by your compiler / interpreter, and you can mess things up by trying to help. It is definitely not a good thing to teach.
You can read:
this or this
Answer 9:
Sorry to barge in on this conversation about performance.
Before I digress, let's note that the JVM has special instructions for handling not only zero, but also the constants one through three. With that said, it's likely that the architecture's ability to handle zero specially gets lost behind not only compiler optimization but also bytecode-to-machine-code translation and the like.
I remember from my x86 assembler language days that there were instructions in the set for both greater than (ja) and greater than or equal to (jae); strictly speaking these are the unsigned-compare jumps, and their signed counterparts jg and jge make the same point. You would do one of these:
; x >= 0
mov ax, [x]
mov bx, 0
cmp ax, bx
jae above
; x > -1
mov ax, [x]
mov bx, -1
cmp ax, bx
ja above
These alternatives take the same amount of time, because the instructions are identical or similar, and they consume a predictable number of clock cycles. See, for example, this. ja and jae may indeed check a different number of arithmetic flags, but that check is dominated by the need for the instruction to take a predictable time. This in turn is needed to keep the CPU architecture manageable.
But I did come here to digress.
The answers before me tend to be pertinent, and also indicative that you're gonna be in the same ballpark insofar as performance is concerned, regardless of which approach you choose.
Which leaves you with choosing based on other criteria. And this is where I wanted to make a note. When testing indices, prefer the tight-bound style of check, chiefly x >= lowerBound, to x > lowerBound - 1. The argument is bound to be contrived, but it boils down to readability, as here all else truly is equal.
Since conceptually you're testing against a lower bound, x >= lowerBound is the canonical test that elicits the most adapted cognition from readers of your code. x + 10 > lowerBound + 9, x - lowerBound >= 0, and x > -1 are all roundabout ways to test against a lower bound.
Again, sorry to barge in, but I felt like this was important beyond the academics of things. I always think in these terms and let the compiler worry about the minute optimizations that it thinks it can get out of fiddling with the constants and the strictness of the operators.
Answer 10:
First of all, it highly depends on the hardware platform.
For modern PCs and ARM SoCs, the difference comes down mostly to compiler optimisations.
But for CPUs without an FPU, signed math would be a disaster.
For example, simple 8-bit CPUs such as the Intel 8008, 8048, 8051, Zilog Z80, Motorola 6800, or even modern RISC PIC or Atmel microcontrollers do all math via an ALU with 8-bit registers and have basically only a carry flag bit and a Z (zero value indicator) flag bit. All serious math is done via libraries, and the expression
BYTE x;  /* assuming BYTE is a signed 8-bit type; for an unsigned type the test is trivially true */
if (x >= 0)
would definitely win, using JZ or JNZ asm instructions without very costly library calls.