I am getting an invalid read error when the src string ends with \n
, the error disappear when i remove \n
:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main (void)
{
char *txt = strdup ("this is a not socket terminated message\n");
printf ("%d: %s\n", strlen (txt), txt);
free (txt);
return 0;
}
valgrind output:
==18929== HEAP SUMMARY:
==18929== in use at exit: 0 bytes in 0 blocks
==18929== total heap usage: 2 allocs, 2 frees, 84 bytes allocated
==18929==
==18929== All heap blocks were freed -- no leaks are possible
==18929==
==18929== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
==18929==
==18929== 1 errors in context 1 of 1:
==18929== Invalid read of size 4
==18929== at 0x804847E: main (in /tmp/test)
==18929== Address 0x4204050 is 40 bytes inside a block of size 41 alloc'd
==18929== at 0x402A17C: malloc (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==18929== by 0x8048415: main (in /tmp/test)
==18929==
==18929== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
How to fix this without sacrificing the new line character?
It's not about the newline character, nor the printf format specifier. You've found what is arguably a bug in
strlen()
, and I can tell you must be using gcc.Your program code is perfectly fine. The printf format specifier could be a little better, but it won't cause the valgrind error you are seeing. Let's look at that valgrind error:
"Invalid read of size 4" is the first message we must understand. It means that the processor ran an instruction which would load 4 consecutive bytes from memory. The next line indicates that the address attempted to be read was "Address 0x4204050 is 40 bytes inside a block of size 41 alloc'd."
With this information, we can figure it out. First, if you replace that
'\n'
with a'$'
, or any other character, the same error will be produced. Try it.Secondly, we can see that your string has 40 characters in it. Adding the
\0
termination character brings the total bytes used to represent the string to 41.Because we have the message "Address 0x4204050 is 40 bytes inside a block of size 41 alloc'd," we now know everything about what is going wrong.
strdup()
allocated the correct amount of memory, 41 bytes.strlen()
attempted to read 4 bytes, starting at the 40th, which would extend to a non-existent 43rd byte.This is a glib() bug. Once upon a time, a project called Tiny C Compiler (TCC) was starting to take off. Coincidentally, glib was completely changed so that the normal string functions, such as
strlen()
no longer existed. They were replaced with optimized versions which read memory using various methods such as reading four bytes at a time. gcc was changed at the same time to generate calls to the appropriate implementations, depending on the alignment of the input pointer, the hardware compiled for, etc. The TCC project was abandoned when this change to the GNU environment made it so difficult to produce a new C compiler, by taking away the ability to use glib for the standard library.If you report the bug, glib maintainers probably won't fix it. The reason is that under practical use, this will likely never cause an actual crash. The
strlen
function is reading bytes 4 at a time because it sees that the addresses are 4-byte aligned. It's always possible to read 4 bytes from a 4-byte-aligned address without segfaulting, given that reading 1 byte from that address would succeed. Therefore, the warning from valgrind doesn't reveal a potential crash, just a mismatch in assumptions about how to program. I consider valgrind technically correct, but I think there is zero chance that glib maintainers will do anything to squelch the warning.The error message seems to indicate that it's
strlen
that read past themalloc
ed buffer allocated bystrdup
. On a 32-bit platform, an optimalstrlen
implementation could read 4 bytes at a time into a 32-bit register and do some bit-twiddling to see if there's a null byte in there. If near the end of the string, there are less than 4 bytes left, but 4 bytes are still read to perform the null byte check, then I could see this error getting printed. In that case, presumably thestrlen
implementer would know if it's "safe" to do this on the particular platform, in which case the valgrind error is a false positive.