Microsoft's strncat reads bytes beyond source

Posted 2019-04-28 23:05

Question:

I have run into an interesting problem with Microsoft's implementation of strncat: it reads 1 byte beyond the source buffer. Consider the following code:

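#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char dstBuf[1024];
    char* src = malloc(112);     /* exactly 112 bytes, deliberately not NUL-terminated */
    if (src == NULL)
        return 1;
    memset(src, 'a', 112);
    dstBuf[0] = 0;
    strncat(dstBuf, src, 112);   /* count equals the size of the source block */
    free(src);
    return 0;
}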

strncat reads 1 byte past the end of the 112-byte block. So if you are unlucky enough to get an allocation that ends right at the boundary of an inaccessible page, your application crashes. Large applications can crash intermittently in such places. (Note that this condition can be simulated with the gflags PageHeap setting; the block size has to be divisible by the pointer size for proper alignment.)

Is this the expected behavior or a bug? Any links confirming that? (I read several descriptions of strncat, but they can be interpreted both ways depending on your initial mindset...)

Update (to answer questions about evidence): I apologize if it is not clear from the text above, but this is an experimental fact. I observe intermittent crashes in an application at strncat reading address src+srcBufSize. When this small example is run with gflags PageHeap enabled, the crash reproduces consistently (100% of the time). So as far as I can see the evidence is very solid.

Update 2 (compiler info): MS Visual Studio 2005, version 8.0.50727.867. Build platform: 64-bit release (no repro for 32-bit). OS used to reproduce the crash: Windows Server 2008 R2.

Update 3: The problem also reproduces with a binary built in MS Visual Studio 2012, version 11.0.50727.1.

Update 4: Link to issue on Microsoft Connect; link to discussion on MSDN Forums.

Update 5: The problem will be fixed in the next VS release. No fix is planned for old versions. See the "Microsoft Connect" link above.

Answer 1:

The documentation for strncat states:

src - pointer to the null-terminated byte string to copy from

Therefore, the implementation can assume that the src input parameter is in fact NUL-terminated, even if it is longer than count characters.

For further confirmation, Microsoft's own documentation states:

strSource

Null-terminated source string.

On the other hand, the actual C standard states something like:

The strncat function appends not more than n characters (a null character and characters that follow it are not appended) from the array pointed to by s2 to the end of the string pointed to by s1.

As pointed out in the comments below, this identifies the second parameter s2 as an array and not a NUL-terminated string. However, this is still ambiguous with respect to the original question, because this documentation describes the ultimate effect on s1, rather than the behaviour of the function when reading from s2.

This could of course be settled with respect to the specific Microsoft implementation by consulting the C Runtime Library source code.



Answer 2:

s2 is not a "string" in strncat(s1, s2, n).

So if Microsoft is reading past n bytes, it is not C11 compliant.

C11 7.24.3.1 (strcat) mentions
"appends a copy of the string pointed to by s2 (including the terminating null character) to the end of the string pointed to by s1".

C11 7.24.3.2 (strncat) says
"The strncat function appends not more than n characters (a null character and characters that follow it are not appended) from the array pointed to by s2 to the end of the string pointed to by s1. ... A terminating null character is always appended to the result"

Clearly in the strncat case, s2 is viewed as an "array" with a string-like limitation on how much is appended to s1. Thus during the concatenation, there is no need to inspect s2 more than is absolutely needed. The final written \0 comes from code, not s2.
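To make that reading concrete, here is a minimal sketch of a strncat that treats s2 strictly as an array of at most n characters. The name strncat_bounded is hypothetical, and this is neither Microsoft's code nor the standard's reference implementation, just one straightforward way to satisfy the quoted wording.

```c
#include <stddef.h>

/*
 * Hypothetical reference version: appends at most n characters from the
 * array s2 to the string s1 and never reads s2[n] or anything beyond it.
 */
char *strncat_bounded(char *s1, const char *s2, size_t n)
{
    char *out = s1;

    /* Find the end of the destination string. */
    while (*out != '\0')
        out++;

    /* n is checked before s2 is read, so s2[n] is never touched. */
    while (n > 0 && *s2 != '\0') {
        *out++ = *s2++;
        n--;
    }

    /* The terminator is supplied by the code, not copied from s2. */
    *out = '\0';
    return s1;
}
```

Called as strncat_bounded(dstBuf, src, 112) on the 112-byte block from the question, this version reads offsets 0 through 111 of src and nothing else.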

I don't know about the older C99 standard.



Answer 3:

English is an imperfect language, more so than C.

The documentation says "at most n characters" (my emphasis). There is no evidence to indicate that strncat copies more than 112 characters. What makes you believe it does?

The code of strncat might compute an index past offset 112 without actually referencing offset 113, which is what could cause a storage fault. This pointer behavior is defined as acceptable in K&R.
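The distinction being drawn here is between forming a pointer just past the end of an array, which C allows, and dereferencing it, which it does not. A small sketch of a bounded scan that relies on this (bounded_length is a hypothetical helper, not part of any library):

```c
#include <stddef.h>

/*
 * Hypothetical helper: number of characters before the first '\0' in
 * buf[0..n-1], or n if there is none.
 */
size_t bounded_length(const char *buf, size_t n)
{
    const char *end = buf + n;   /* one past the last element: valid to form and compare */
    const char *p = buf;

    /* p != end is tested first, so *end is never read. */
    while (p != end && *p != '\0')
        p++;

    return (size_t)(p - buf);
}
```

For the 112-byte block in the question, bounded_length(src, 112) computes src + 112 as a limit but only ever reads offsets 0 through 111.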

Finally, and again this is an English/reasoning problem, the documentation probably does say "null-terminated string". But really, isn't it redundant to say a string is null-terminated? Strings are null-terminated by definition; otherwise they would just be arrays of characters. So the documentation is vague and non-specific, and the programmer is left to read between the lines. Software documentation is not a legal tome; it is a description meant to be understood by someone practiced in the art.



Tags: c pageheap