Suppose I want to read from stdin, and let the user input strings that contain null characters. Is this possible with string-input functions like fgets or gets_s? Or do I have to use e.g. fgetc or fread?
Someone here wanted to do this.
Not truly.
fgets() is not specified to leave the rest of the buffer alone (after the appended '\0'), so pre-loading the buffer for later analysis is not specified to work.
In the read error case, the buffer is specified as "array contents are indeterminate", yet that case can be eliminated from further concern by checking the return value.
If not for that, then doing the various tests suggested by @R.. would work.
fgets() is just not fully up to the job of reading user input that may contain null characters. It remains a hole in C.
I've tried to code this fgets() Alternative, though I am not fully satisfied with it.
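To make the idea of an fgets() alternative concrete, here is a minimal sketch built on fgetc(). It is not the code linked above; the name read_line_len and its len out-parameter are hypothetical, chosen only for illustration. It reports how many bytes were stored, so the caller never has to guess which '\0' is the terminator.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not a standard function): reads at most size - 1
 * bytes from fp, stopping after a newline or at EOF, much like fgets(),
 * but also stores the number of bytes read in *len so that embedded '\0'
 * bytes are not mistaken for the terminator.
 * Returns buf on success, NULL if nothing was read or on a read error.
 */
char *read_line_len(char *buf, size_t size, FILE *fp, size_t *len)
{
    assert(buf != NULL && size > 1 && fp != NULL && len != NULL);

    size_t n = 0;
    int c;

    while (n + 1 < size && (c = fgetc(fp)) != EOF) {
        buf[n++] = (char)c;
        if (c == '\n')
            break;              /* the newline is kept, as with fgets() */
    }
    buf[n] = '\0';              /* convenience only; *len is authoritative */
    *len = n;

    if (ferror(fp) || n == 0)
        return NULL;
    return buf;
}
```

A caller would pass a buffer, its size, stdin, and the address of a size_t, then process exactly *len bytes rather than relying on strlen().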
As some of the other answers show, the answer is apparently "Yes -- just barely." Similarly, it is possible to hammer in nails using a screwdriver. Similarly, it is possible to write (what amounts to) BASIC or FORTRAN code in C.
But none of these things is remotely a good idea. Use the right tool for the job. If you want to drive nails, use a hammer. If you want to write BASIC or FORTRAN, use a BASIC interpreter or a FORTRAN compiler. And if you want to read binary data that might contain null characters, use fread (or maybe getc). Don't use fgets, because its interface was never designed for this task.
There's a way to reliably detect the presence of
\0 characters read by fgets(3), but it is very inefficient. To reliably detect a null character read from the input stream, you first have to fill the buffer with non-null characters. The reason is that fgets() delimits its input by placing a \0 at the end of the input and (supposedly) does not write anything past that character.
After filling the input buffer with, say, \001 chars, call fgets() on your buffer and begin searching from the end of the buffer backwards for a \0 character: that is the end of the input. There is no need to check the character before it (the only cases in which it is not a \n are when the \0 is the last character of the buffer and the input line was longer than the space available for a complete, NUL-terminated string, or a bogus implementation of fgets(3); there are some). Before that point you can have as many \0s as you like, but don't worry, they come from the input stream.
As you see, this is quite inefficient.
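The procedure just described can be sketched roughly as follows. This is a minimal illustration, not the original sample code from this answer; the function name fgets_len_backward is hypothetical, and the whole approach assumes that fgets() leaves the bytes after its terminating \0 untouched, which the C standard does not guarantee.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper illustrating the backward search.
 * ASSUMPTION: fgets() does not modify bytes past its terminating '\0'.
 * Returns the number of bytes read (embedded '\0' bytes included), or -1
 * if fgets() returned NULL (nothing read before EOF, or a read error).
 */
long fgets_len_backward(char *buf, size_t size, FILE *fp)
{
    assert(buf != NULL && size > 1 && fp != NULL);

    memset(buf, '\001', size);          /* any non-null filler works */
    if (fgets(buf, (int)size, fp) == NULL)
        return -1;

    /* Scan from the end: the last '\0' in the buffer is the terminator. */
    size_t i = size;
    while (i > 0 && buf[i - 1] != '\0')
        i--;
    if (i == 0)
        return -1;                      /* no terminator found: broken fgets() */

    return (long)(i - 1);               /* index of terminator == data length */
}
```

The cost is a full memset() plus a backward scan of the whole buffer on every call, which is the inefficiency mentioned above.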
Example
The following sample code will illustrate the thing, first an execution example:
Makefile
pru.c
with auxiliary function, to print the buffer contents:
fprintbuf.h
fprintbuf.c
For fgets, yes. fgets is specified to behave as if by repeated fgetc and storage of the resulting characters into the array. No special provision is made for the null character, except that, in addition to the characters read, a null character is stored at the end (after the last character).
To successfully distinguish embedded null characters from the termination, however, requires some work.
First, prefill your buffer with '\n' (e.g. using memset). Now, when fgets returns, look for the first '\n' in the buffer (e.g. using memchr).
If there is no '\n', fgets stopped due to filling up the output buffer, and everything but the last byte (null terminator) is data that was read from the file.
If the first '\n' is immediately followed by a '\0' (null termination), fgets stopped due to reaching the newline, and everything up through that newline was read from the file.
If the first '\n' is not followed by a '\0' (either it is at the end of the buffer, or it is followed by another '\n'), then fgets stopped due to EOF or error, and everything up to, but not including, the byte just before the '\n' (which is necessarily a '\0') was read from the file.
For gets_s, I have no idea, and I would strongly recommend against using it. The only widely-implemented version of the Annex K "*_s" functions, Microsoft's, does not even comply with the specifications they pushed into an annex of the C standard, and reportedly has issues that might make this approach not work.