What's wrong with strcmp?

Posted 2019-01-26 11:33

Question:

In the responses to the question Reading In A String and comparing it C, more than one person discouraged the use of strcmp(), saying things like

I also strongly, strongly advise you to get used to using strncmp() now, ... to avoid many problems down the road.

or (in Why does my string comparison fail? )

Make certain you use strncmp and not strcmp. strcmp is profoundly unsafe.

What problems are they alluding to?

The reason scanf() with string specifiers and gets() are strongly discouraged is because they almost inevitably lead to buffer overflow vulnerabilities. However, it's not possible to overflow a buffer with strcmp(), right?

"A buffer overflow, or buffer overrun, is an anomaly where a program, while writing data to a buffer, overruns the buffer's boundary and overwrites adjacent memory."

( -- Wikipedia: buffer overflow).

Since the strcmp() function never writes to any buffer, the strcmp() function cannot cause a buffer overflow, right?

What is the reason people discourage the use of strcmp(), and recommend strncmp() instead?

Answer 1:

While strncmp can prevent you from overrunning a buffer, its primary purpose isn't safety. Rather, it exists for the case where one wants to compare only the first N characters of a (possibly NUL-terminated) string.

From the man page:

The strcmp() function compares the two strings s1 and s2. It returns an integer less than, equal to, or greater than zero if s1 is found, respectively, to be less than, to match, or be greater than s2.

The strncmp() function is similar, except it compares only the first (at most) n bytes of s1 and s2.

Note that strncmp in this case cannot be replaced with a simple memcmp, because you still need to take advantage of its stop-on-NUL behavior, in case one of the strings is shorter than n.
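As a rough illustration of that legitimate use, here is a minimal sketch of a prefix test. The helper name has_prefix is made up for this example and is not part of any library; it assumes both arguments are valid NUL-terminated strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if s begins with prefix, 0 otherwise.
 * Assumes both arguments are valid NUL-terminated strings. */
int has_prefix(const char *s, const char *prefix)
{
    assert(s != NULL && prefix != NULL);
    /* strncmp stops at the first NUL, so it does not read past the end
     * of s even when s is shorter than prefix. memcmp with the same
     * length would be allowed to read all n bytes of both buffers. */
    return strncmp(s, prefix, strlen(prefix)) == 0;
}
```

With this sketch, has_prefix("--verbose", "--") is true, while has_prefix("-", "--") is false, because the comparison ends at the terminator of the shorter string.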

If strcmp causes a buffer overrun, then one of two things is true:

  1. Your data isn't expected to be NUL-terminated, and you should be using memcmp instead.
  2. Your data is expected to be NUL-terminated, but you've already screwed up when you populated the buffer, by somehow not NUL-terminating it.
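For the first case, a sketch of what using memcmp might look like, assuming a made-up fixed-width 4-byte tag field (TAG_LEN and tag_equal are hypothetical names chosen for this example):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TAG_LEN 4 /* hypothetical fixed field width, no terminator stored */

/* Hypothetical helper: returns 1 if two fixed-width tags are identical.
 * Both pointers must refer to at least TAG_LEN readable bytes. */
int tag_equal(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    /* The length is known up front, so no NUL terminator is needed
     * and none is searched for. */
    return memcmp(a, b, TAG_LEN) == 0;
}
```

Because the length is part of the data format here, there is no terminator to forget.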

Note that reading past the end of a buffer is still considered a buffer overrun. While it may seem harmless, it can be just as dangerous as writing past the end.

Reading, writing, executing... it doesn't matter. Any memory reference to an unintended address is undefined behavior. In the most obvious scenario, you attempt to access a page that isn't mapped into your process's address space, causing a page fault and a subsequent SIGSEGV. In the worst case, you sometimes run into a \0 byte, but other times you run into some other buffer, causing inconsistent program behavior.



Answer 2:

A string is by definition "a contiguous sequence of characters terminated by and including the first null character".

The only case where strncmp() would be safer than strcmp() is when you're comparing two character arrays as strings, you're certain that both arrays are at least n bytes long (the 3rd argument passed to strncmp()), and you're not certain that both arrays contain strings (i.e., contain a '\0' null character terminator).
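A sketch of that one case, assuming a hypothetical record layout with a fixed-width name field that is NUL-padded but not terminated when every byte is used (struct record, NAME_LEN and same_name are invented for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAME_LEN 8 /* hypothetical field width */

struct record {
    char name[NAME_LEN]; /* NUL-padded; no terminator if all 8 bytes are used */
};

/* Hypothetical helper: returns 1 if both records hold the same name.
 * Both arrays are exactly NAME_LEN bytes long, so n = NAME_LEN is a
 * correct bound even when neither array contains a '\0'. */
int same_name(const struct record *a, const struct record *b)
{
    assert(a != NULL && b != NULL);
    return strncmp(a->name, b->name, NAME_LEN) == 0;
}
```

Calling strcmp on a full 8-byte name here would read past the end of the array; strncmp with the array size as the bound does not.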

In most cases, your code (if it's correct) will guarantee that any arrays that are supposed to contain null-terminated strings actually do contain null-terminated strings.

That added n in strncmp() is not a magic wand that makes unsafe code safe. It doesn't guard against null pointers, uninitialized pointers, uninitialized arrays, an incorrect value of n, or just passing incorrect data. You can shoot yourself in the foot with either function.
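To make the "incorrect value of n" point concrete, here is a small sketch with two hypothetical helpers (the names are invented for this example). The strncmp version looks more careful, yet it quietly turns an equality test into a prefix test:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: looks "safe", but n = 4 means only the first
 * four characters are compared, so any input starting with "quit"
 * is accepted. */
int is_quit_prefix_only(const char *input)
{
    assert(input != NULL);
    return strncmp(input, "quit", 4) == 0;
}

/* Hypothetical helper: whole-string comparison. Correct as long as
 * input really is a NUL-terminated string. */
int is_quit(const char *input)
{
    assert(input != NULL);
    return strcmp(input, "quit") == 0;
}
```

Given the input "quitter", is_quit_prefix_only reports a match and is_quit does not.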

And if you're trying to call strcmp or strncmp with an array that you thought contained a null-terminated string but actually doesn't, then your code already has a bug. Using strncmp() might help you avoid the immediate symptom of that bug, but it won't fix it.



Answer 3:

strcmp compares two strings character by character until a difference is detected or a \0 is found in one of them.

On the other hand, strncmp lets you limit the number of characters compared, so if the strings do not end with \0, the function won't keep checking past the size limit.

Imagine what would happen if you were comparing two strings in these two memory regions:

0x40, 0x41, 0x42,... 0x40, 0x41, 0x42,...

And you are only interested in the first two characters. Somehow the \0 has been removed from the end of the strings, and the third byte happens to be the same in both regions. strncmp would avoid comparing this third byte if the num parameter is 2.
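A sketch of that scenario, using a hypothetical helper name and assuming each region has at least two readable bytes:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: compares only the first two bytes of each region.
 * With n = 2, strncmp examines at most two characters from each array,
 * so a missing '\0' after them is never reached. */
int first_two_equal(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strncmp(a, b, 2) == 0;
}
```

For two unterminated arrays that both start with 0x40, 0x41, this reports a match without looking at the third byte. Passing the same arrays to strcmp would be undefined behavior, since they are not strings.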

EDIT: As the comments indicate, this situation only arises from incorrect or very specific use of the language.