strcmp behaviour in 32-bit and 64-bit systems

Posted 2020-02-07 04:45

Question:

The following piece of code behaves differently in 32-bit and 64-bit operating systems.

char *cat = "v,a";
if (strcmp(cat, ",") == 1)
    ...

The above condition is true on 32-bit but false on 64-bit. Why is this different? Both the 32-bit and 64-bit systems run Linux (Fedora).

Answer 1:

The strcmp() function is only defined to return a negative value if argument 1 precedes argument 2, zero if they're identical, or a positive value if argument 1 follows argument 2.

There is no guarantee of any sort that the value returned will be +1 or -1 at any time. Any equality test based on that assumption is faulty. It is conceivable that the 32-bit and 64-bit versions of strcmp() return different numbers for a given string comparison, but any test that looks for +1 from strcmp() is inherently flawed.

Your comparison code should be one of:

if (strcmp(cat, ",") >  0)    // cat >  ","
if (strcmp(cat, ",") == 0)    // cat == ","
if (strcmp(cat, ",") >= 0)    // cat >= ","
if (strcmp(cat, ",") <= 0)    // cat <= ","
if (strcmp(cat, ",") <  0)    // cat <  ","
if (strcmp(cat, ",") != 0)    // cat != ","

Note the common theme — all the tests compare with 0. You'll also see people write:

if (strcmp(cat, ","))   // != 0
if (!strcmp(cat, ","))  // == 0

Personally, I prefer the explicit comparisons with zero; I mentally translate the shorthands into the appropriate longhand (and resent being made to do so).
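
To make the pattern concrete, here is a minimal sketch that classifies two strings using only the sign of the result. describe_order is a hypothetical helper name chosen for illustration; it is not part of the standard library.

```c
#include <string.h>

/* Hypothetical helper (not a standard function): reports how a orders
   relative to b, looking only at the sign of strcmp()'s result. */
const char *describe_order(const char *a, const char *b)
{
    int r = strcmp(a, b);

    if (r < 0)
        return "less";      /* a sorts before b */
    if (r > 0)
        return "greater";   /* a sorts after b */
    return "equal";         /* r == 0: identical strings */
}
```

With the strings from the question, describe_order("v,a", ",") yields "greater" whether strcmp() happens to return 1 or some other positive number.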


Note that the specification of strcmp() says:

ISO/IEC 9899:2011 §7.24.4.2 The strcmp function

¶3 The strcmp function returns an integer greater than, equal to, or less than zero, accordingly as the string pointed to by s1 is greater than, equal to, or less than the string pointed to by s2.

It says nothing about +1 or -1; you cannot rely on the magnitude of the result, only on its sign (or that it is zero when the strings are equal).
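
If some code genuinely needs exactly -1, 0 or +1 (to drive a switch, say), the result can be normalised first. This is only a sketch; strcmp_sign is a hypothetical name, not a standard function.

```c
#include <string.h>

/* Hypothetical wrapper: collapses whatever strcmp() returns to
   exactly -1, 0 or +1, so the caller no longer depends on the
   implementation-specific magnitude. */
int strcmp_sign(const char *s1, const char *s2)
{
    int r = strcmp(s1, s2);

    /* (r > 0) and (r < 0) each evaluate to 0 or 1 in C. */
    return (r > 0) - (r < 0);
}
```

With this wrapper, strcmp_sign(cat, ",") == 1 is a portable test, although strcmp(cat, ",") > 0 says the same thing more directly.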



Answer 2:

Standard functions don't exhibit different behaviour based on the "bittedness" of your OS unless you're doing something silly like, for example, not including the relevant header file. They are required to exhibit exactly the behaviour specified in the standard, unless you violate the rules. Otherwise, your compiler, while close, will not be a C compiler.

However, as per the standard, the return value from strcmp() is zero, positive or negative; it's not guaranteed to be +/-1 when non-zero.

Your expression would be better written as:

strcmp (cat, ",") > 0

The faultiness of using strcmp (cat, ",") == 1 has nothing to do with whether your OS is 32 or 64 bits, and everything to do with the fact that you've misunderstood the return value. From the ISO C11 standard:

The strcmp function returns an integer greater than, equal to, or less than zero, accordingly as the string pointed to by s1 is greater than, equal to, or less than the string pointed to by s2.



Answer 3:

The semantics guaranteed by strcmp() are well explained above in Jonathan's answer.


Coming back to your original question, i.e.

Q. Why does strcmp() behaviour differ between 32-bit and 64-bit systems?

Answer: strcmp() is implemented in glibc, wherein there exist different implementations for various architectures, all highly optimised for the corresponding architecture.

  • strcmp() on x86
  • strcmp() on x86-64

As the spec simply defines that the return value is one of 3 possibilities (-ve, 0, +ve), the various implementations are free to return any value as long as the sign indicates the result appropriately.

  • On certain architectures (in this case x86), it is faster to simply compare each byte without storing the result. Hence it's quicker to simply return -/+1 on a mismatch.

    (Note that one could use subb instead of cmpb on x86 to obtain the difference in magnitude of the non-matching bytes. But this would require 1 additional clock cycle per byte. This would mean an additional 3% increase in total time taken, as each complete iteration runs in less than 30 clock cycles.)

  • On other architectures (in this case x86-64), the difference between the byte values of the corresponding characters is already available as a by-product of the comparison. Hence it is faster to simply return it rather than test them again and return -/+1.

Both are perfectly valid outputs, as strcmp() is ONLY guaranteed to indicate the result through the sign of its return value; the magnitude is architecture/implementation specific.
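
The two styles described above can be sketched in portable C. These are illustrations only, not glibc's actual routines, and the names cmp_sign_only and cmp_difference are hypothetical.

```c
/* Style 1 (the x86 behaviour described above): report only the
   direction of the first mismatch, as -1 or +1. */
int cmp_sign_only(const char *s1, const char *s2)
{
    const unsigned char *p = (const unsigned char *)s1;
    const unsigned char *q = (const unsigned char *)s2;

    /* Advance while the bytes match and s1 has not ended. */
    while (*p != '\0' && *p == *q) {
        p++;
        q++;
    }
    if (*p == *q)
        return 0;               /* both strings ended together */
    return (*p < *q) ? -1 : 1;
}

/* Style 2 (the x86-64 behaviour described above): return the
   difference between the first pair of mismatching bytes. */
int cmp_difference(const char *s1, const char *s2)
{
    const unsigned char *p = (const unsigned char *)s1;
    const unsigned char *q = (const unsigned char *)s2;

    while (*p != '\0' && *p == *q) {
        p++;
        q++;
    }
    return (int)*p - (int)*q;   /* 0 when the strings are identical */
}
```

For the strings in the question, cmp_sign_only("v,a", ",") gives 1 while cmp_difference("v,a", ",") gives 'v' - ',', which is 74 in ASCII. Both are positive, so both satisfy the specification, but only the first passes an == 1 test.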