Is it possible to read null characters correctly using fgets or gets_s?

Posted 2019-07-14 05:06

Question:

Suppose I want to read from stdin, and let the user input strings that contain null characters. Is this possible with string-input functions like fgets or gets_s? Or do I have to use e.g. fgetc or fread?

Someone here wanted to do this.

Answer 1:

Is it possible to read null characters correctly using fgets or gets_s?

As some of the other answers show, the answer is apparently "Yes -- just barely." Similarly, it is possible to hammer in nails using a screwdriver. Similarly, it is possible to write (what amounts to) BASIC or FORTRAN code in C.

But none of these things is remotely a good idea. Use the right tool for the job. If you want to drive nails, use a hammer. If you want to write BASIC or FORTRAN, use a BASIC interpreter or a FORTRAN compiler. And if you want to read binary data that might contain null characters, use fread (or maybe getc). Don't use fgets, because its interface was never designed for this task.
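As a minimal sketch of the fread approach recommended above: because fread reports how many bytes it stored, embedded null bytes are just ordinary data. The helper names `read_block` and `count_nulls` are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read up to cap bytes from in into buf.
 * Returns the number of bytes actually read. Embedded '\0' bytes need
 * no special handling because the length is reported separately. */
size_t read_block(FILE *in, unsigned char *buf, size_t cap)
{
    assert(in != NULL && buf != NULL);
    return fread(buf, 1, cap, in);
}

/* Hypothetical helper: count the null bytes among the first len bytes. */
size_t count_nulls(const unsigned char *buf, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] == '\0')
            n++;
    return n;
}
```

A caller would typically invoke `read_block(stdin, buf, sizeof buf)` in a loop until it returns 0, then use ferror(stdin) to tell a read error from end of file.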



Answer 2:

For fgets, yes. fgets is specified to behave as if by repeated fgetc and storage of the resulting characters into the array. No special provision is made for the null character, except that, in addition to the characters read, a null character is stored at the end (after the last character).

To successfully distinguish embedded null characters from the termination, however, requires some work.

First, prefill your buffer with '\n' (e.g. using memset). Now, when fgets returns, look for the first '\n' in the buffer (e.g. using memchr).

  • If there is no '\n', fgets stopped due to filling up the output buffer, and everything but the last byte (null terminator) is data that was read from the file.

  • If the first '\n' is immediately followed by a '\0' (null termination), fgets stopped due to reaching the newline, and everything up through that newline was read from the file.

  • If the first '\n' is not followed by a '\0' (either at the end of the buffer, or followed by another '\n') then fgets stopped due to EOF or error, and everything up to the byte just before the '\n' (which is necessarily a '\0') but not including it, was read from the file.
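The three cases above can be sketched as a small helper. This is only a sketch: `fgets_data_len` is a hypothetical name, and the whole approach assumes that fgets leaves the bytes after its terminating null untouched, which is an assumption about the implementation rather than something to rely on blindly.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: call fgets on a '\n'-prefilled buffer and return
 * the number of data bytes read (embedded '\0' bytes included), or -1 if
 * fgets returned NULL (EOF with nothing read, or a read error).
 * Assumes fgets does not modify bytes after its terminating '\0'. */
long fgets_data_len(char *buf, size_t size, FILE *in)
{
    assert(buf != NULL && size >= 2 && size <= (size_t)INT_MAX);
    memset(buf, '\n', size);
    if (fgets(buf, (int)size, in) == NULL)
        return -1;

    char *nl = memchr(buf, '\n', size);
    if (nl == NULL)
        return (long)(size - 1);     /* buffer filled: all but the terminator is data */

    size_t pos = (size_t)(nl - buf);
    if (pos + 1 < size && nl[1] == '\0')
        return (long)(pos + 1);      /* a newline was read: data runs through it */

    /* The first '\n' is prefill, so the byte before it is the terminator. */
    return (long)(pos - 1);
}
```

On success the data occupies `buf[0]` through `buf[len - 1]`, where `len` is the returned value, and any '\0' inside that range came from the input.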

For gets_s, I have no idea, and I would strongly recommend against using it. The only widely implemented version of the Annex K "*_s" functions, Microsoft's, does not even comply with the specification that Microsoft pushed into an annex of the C standard, and it reportedly has issues that might keep this approach from working.



Answer 3:

Is it possible to read null characters correctly using fgets or gets_s?

Not truly.

fgets() is not specified to leave the rest of the buffer alone (after the appended '\0'), so pre-loading the buffer for later analysis is not guaranteed to work.

In the read error case, the buffer is specified as "array contents are indeterminate", yet that case can be eliminated from further concern by checking the return value.

Were it not for that, the various tests suggested by @R.. would work.

  char buf[80];
  int length = 0;
  // Prefill with '\n'. Argument order is memset(dest, value, count).
  memset(buf, '\n', sizeof buf);
  // Check return value before reading `buf`.
  if (fgets(buf, sizeof buf, stdin)) {
    // The buffer should end with a \0 and 0 to 78 \n
    // Starting at the end, look for the first non-\n
    int i = sizeof buf - 1;
    while (i > 0) {
      if (buf[i] != '\n') {
        if (buf[i] == '\0') {
          // found appended null
          length = i;
        } else {
          length = -1;  // indeterminate length
        }
        break;
      }
      i--;
    }
    if (i == 0) {
      // entire buffer was \n
      length = -1;  // indeterminate length
    }
  }

fgets() is just not fully up to the job to read user input that may contain null characters. It remains a hole in C.

I've tried to code this fgets() Alternative, though I am not fully satisfied with it.
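For illustration only, and not the alternative mentioned above, here is a minimal sketch of what a length-returning line reader built on getc could look like. The name `read_line_bytes` and its interface are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical fgets-like reader: stores at most cap bytes of one line
 * (including the '\n', if one is read) and returns how many bytes were
 * stored. No terminator is appended, so '\0' bytes in the input are
 * ordinary data. Returns 0 at end of file or on a read error. */
size_t read_line_bytes(FILE *in, unsigned char *buf, size_t cap)
{
    assert(in != NULL && buf != NULL);
    size_t n = 0;
    int c;
    while (n < cap && (c = getc(in)) != EOF) {
        buf[n++] = (unsigned char)c;
        if (c == '\n')
            break;               /* stop after the newline, as fgets does */
    }
    return n;
}
```

Since the length is returned rather than inferred from a terminator, there is no need to prefill or scan the buffer. A return value equal to `cap` with no trailing '\n' means the line was longer than the buffer, and the next call continues where this one stopped.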



Answer 4:

There's a way to reliably detect the presence of \0 characters read by fgets(3), but it's very inefficient. To reliably detect a null character read from the input stream, you first have to fill the buffer with non-null characters. The reason is that fgets() delimits its input by placing a \0 at the end of the input and is not supposed to write anything past that character.

After filling the input buffer with, say, \001 characters, call fgets() on the buffer and search backwards from the end of the buffer for a \0 character: that is the end of the input. There is no need to check the character before it. The only cases in which it is not a \n are when the \0 is the last character of the buffer and the input line was longer than the space the buffer has for a complete, null-terminated string, or when the implementation of fgets(3) is bogus (there are some). Before that point there can be any number of \0s, but don't worry: they come from the input stream.

As you can see, this is quite inefficient.

#include <stdio.h>
#include <string.h>
#include <sys/types.h>      /* ssize_t */

#define NON_ZERO         1
#define BOGUS_FGETS      (-2) /* -1 is used by EOF */

/**
 * variant of fgets that returns the number of characters actually read */
ssize_t variant_of_fgets(char *buffer, const size_t sz, FILE *in)
{
    /* set buffer to non zero value */
    memset(buffer, NON_ZERO, sz);

    /* do actual fgets: use sz, because sizeof buffer would only be
     * the size of a pointer, and read from in, not stdin */
    if (!fgets(buffer, (int)sz, in)) {
        /* EOF */
        return EOF;
    }
    /* search backwards for the last \0 without stepping before buffer */
    char *p = buffer + sz;
    while (p > buffer && *--p)
        continue; /* if char is a \0 we're out */
    /* ASSERT: (p == buffer)[not-found, or \0 at offset 0] || (p > buffer)[found] */
    if (p == buffer) {
        /* Why do we check for p == buffer ?
         * p must be > buffer, as if p == buffer
         * the implementation must be also bogus, because
         * the returned string should be an empty string "".
         * this can happen only with a bogus implementation
         * or an absurd buffer of length one (with only place for
         * the \0 char).  Else, it must be a read character
         * (it can be a \0, but then it must have another \0
         * behind, and p must be greater than this) */
        return BOGUS_FGETS;
    }
    /* ASSERT: p > buffer && p < buffer + sz  [found a \0]
     * p points to the position of the last \0 in the buffer */

    return p - buffer;  /* this is the string length */
} /* variant_of_fgets */

Example

The following sample code illustrates this. First, an example run:

$ pru
===============================================
<OFFSET> : pru.c:24:main: buffer initial contents
00000000 : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 : ................
00000010 : e0 dd cf eb 02 56 00 00 e0 d7 cf eb 02 56 00 00 : .....V.......V..
00000020
<OFFSET> : pru.c:30:main: buffer after memset
00000000 : fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa : ................
00000010 : fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa : ................
00000020
^@^@^@^@^D^D
<OFFSET> : pru.c:41:main: buffer after fgets(returned size should be 4)
00000000 : 00 00 00 00 00 fa fa fa fa fa fa fa fa fa fa fa : ................
00000010 : fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa : ................
00000020
===============================================
<OFFSET> : pru.c:24:main: buffer initial contents
00000000 : 00 00 00 00 00 fa fa fa fa fa fa fa fa fa fa fa : ................
00000010 : fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa : ................
00000020
<OFFSET> : pru.c:30:main: buffer after memset
00000000 : fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa : ................
00000010 : fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa : ................
00000020
^D
<OFFSET> : pru.c:41:main: buffer after fgets(returned size should be 0)
00000000 : fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa : ................
00000010 : fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa : ................
00000020
===============================================
pru.c:45:main: END OF PROGRAM
$ _

Makefile

RM ?= rm -f

targets = pru
toclean += $(targets)

all: $(targets)
clean:
	$(RM) $(toclean)

pru_objs = pru.o fprintbuf.o
toclean += $(pru_objs)

pru: $(pru_objs)
	$(CC) -o $@ $($@_objs)

pru.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

#include "fprintbuf.h"

#define F(fmt) __FILE__":%d:%s: " fmt, __LINE__, __func__

void line(void)
{
    puts("===============================================");
}
int main(void)
{
    uint8_t buffer[32];
    int eof;

    line();
    do {
        fprintbuf(stdout,
                buffer, sizeof buffer, 
                F("buffer initial contents"));

        memset(buffer, 0xfa, sizeof buffer);

        fprintbuf(stdout,
                buffer, sizeof buffer, 
                F("buffer after memset"));

        /* fgets expects a char *, so cast the uint8_t buffer */
        eof = !fgets((char *)buffer, sizeof buffer, stdin);

        /* search for the last \0 */
        uint8_t *p = buffer + sizeof buffer;
        while (*--p && (p > buffer))
            continue;

        if (p <= buffer)
            printf(F("BOGUS implementation\n"));

        /* p - buffer is a ptrdiff_t, so convert it to match %u */
        fprintbuf(stdout,
                buffer, sizeof buffer,
                F("buffer after fgets(size should be %u)"),
                (unsigned)(p - buffer));
        line();
    } while(!eof);
}

with an auxiliary function to print the buffer contents:

fprintbuf.h

/* $Id: fprintbuf.h,v 2.0 2005-10-04 14:54:49 luis Exp $
 * Author: Luis Colorado <Luis.Colorado@HispaLinux.ES>
 * Date: Thu Aug 18 15:47:09 CEST 2005
 *
 * Disclaimer:
 *  This program is free software; you can redistribute it and/or modify
 *  it under the terms of the GNU General Public License as published by
 *  the Free Software Foundation; either version 2 of the License, or
 *  (at your option) any later version.
 *  
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU General Public License for more details.
 *  
 *  You should have received a copy of the GNU General Public License
 *  along with this program; if not, write to the Free Software
 *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
 */
#ifndef FPRINTBUF_H
#define FPRINTBUF_H

#ifdef __cplusplus
extern "C" {
#endif /* __cplusplus */

#include <stdio.h>
#include <stdint.h>

size_t fprintbuf (
    FILE               *f,      /* output file */
    const uint8_t      *b,      /* pointer to the buffer */
    size_t              t,      /* buffer size */
    const char         *fmt,    /* header label */
                        ...);

#ifdef __cplusplus
} /* extern "C" */
#endif /* __cplusplus */

#endif /* FPRINTBUF_H */

fprintbuf.c

/* $Id: fprintbuf.c,v 2.0 2005-10-04 14:54:49 luis Exp $
 * AUTHOR: Luis Colorado <licolorado@indra.es>
 * DATE: 7.10.92.
 * DESC: displays a data buffer in hexadecimal and ASCII.
 */

#include <sys/types.h>
#include <ctype.h>
#include <stdio.h>
#include <stdarg.h>
#include "fprintbuf.h"

#define     TAM_REG         16

size_t
fprintbuf(
        FILE           *f,      /* output file */
        const uint8_t  *b,      /* pointer to the buffer */
        size_t          t,      /* buffer size */
        const char     *fmt,    /* header label */
                        ...)
{
    size_t off, i;
    uint8_t c;
    va_list lista;
    size_t escritos = 0;

    if (fmt)
            escritos += fprintf (f, "<OFFSET> : ");
    va_start (lista, fmt);
    escritos += vfprintf (f, fmt, lista);
    va_end (lista);
    escritos += fprintf (f, "\n");
    off = 0;
    while (t > 0) {
            escritos += fprintf (f, "%08lx : ", (unsigned long) off);
            for (i = 0; i < TAM_REG; i++) {
                    if (t > 0)
                            escritos += fprintf (f, "%02x ", *b);
                    else escritos += fprintf (f, "   ");
                    off++;
                    t--;
                    b++;
            }
            escritos += fprintf (f, ": ");
            t += TAM_REG;
            b -= TAM_REG;
            off -= TAM_REG;
            for (i = 0; i < TAM_REG; i++) {
                    c = *b++;
                    if (t > 0)
                            if (isprint (c))
                                    escritos += fprintf (f, "%c", c);
                            else    escritos += fprintf (f, ".");
                    else break;
                    off++;
                    t--;
            }
            escritos += fprintf (f, "\n");
    }
    escritos += fprintf (f, "%08lx\n", (unsigned long) off);

    return escritos;
} /* fprintbuf */