fgets() call with redirection get abnormal data st

2019-02-17 10:43发布

I was about to write a shell with C language. Here is the source code below:

#include <unistd.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <stdlib.h>

int
getcmd(char *buf, int nbuf)
{
  memset(buf, 0, nbuf);
  fgets(buf, nbuf, stdin);
  printf("pid: %d, ppid: %d\n", getpid(), getppid());
  printf("buf: %s", buf);
  if(buf[0] == 0) {// EOF
    printf("end of getcmd\n");
    return -1;
  }
  return 0;
}

int
main(void)
{
  static char buf[100];
  int fd, r, ret;

  // Read and run input commands.
  while((ret = getcmd(buf, sizeof(buf))) >= 0){
    if(fork() == 0)
      exit(0);
    wait(&r);
  }
  exit(0);
}

When I execute the compiled executable with redirection of stdin to a file named t.sh which the content of is "1111\n2222\n" like ./myshell < t.sh, the output is:

pid: 2952, ppid: 2374
buf: 1111
pid: 2952, ppid: 2374
buf: 2222
pid: 2952, ppid: 2374
buf: 2222
pid: 2952, ppid: 2374
buf: end of getcmd

Obviously, function getcmd() get 3 lines(1111, 2222, 2222), while there are only 2 lines in t.sh. And these situation get even worse when putting more lines in t.sh.

And the main process is the only process execute getcmd, which we can tell by the output of pid.

By the way, I find if the line of code wait(&r) is removed, the output can get normal.

1条回答
该账号已被封号
2楼-- · 2019-02-17 11:24

The wait ensures that the child process gets time to run before the parent finishes with the file. If I strace the file under Linux, I get

% strace -f ./a.out
[lots of stuff]
wait4(-1, strace: Process 29317 attached
 <unfinished ...>
[pid 29317] lseek(0, -2, SEEK_CUR)      = 0
[pid 29317] exit_group(0)               = ?
[pid 29317] +++ exited with 0 +++
<... wait4 resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 29317
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=29317, si_uid=1000, si_status=0
    _utime=0, si_stime=0} ---
[lots of stuff]

The child process rewinds the standard input as one of the first operations after the fork, after which it will promptly exit. Specifically it rewinds back as many bytes from the stream as was read into it by fgets into buffer but still unused. libc does that automatically after the fork. I also saw the child process flushing the stdout.

I am not sure what to think about this... but clearly if you want to write a shell, you mustn't interact with the standard streams with <stdio.h> at all. If the lseek didn't occur, then the child process would see up to 4095 bytes of the stdin being skipped! You must always use just read and write from <unistd.h> instead. Alternatively, you might have some luck with adding the following call into the beginning of main before anything is read from stdin:

if (setvbuf(stdin, NULL, _IONBF, 0) != 0) {
    perror("setvbuf:");
   exit(1);
}

This will set the stdin stream to unbuffered mode, so it shouldn't read too much. Nevertheless, the Linux manual page for fgets say:

It is not advisable to mix calls to input functions from the stdio library with low-level calls to read(2) for the file descriptor associated with the input stream; the results will be undefined and very probably not what you want.

BTW, this cannot be reproduced if stdin comes from a pipe instead:

% echo -e '1\n2' | ./a.out  
pid: 498, ppid: 21285
buf: 1
pid: 498, ppid: 21285
buf: 2
pid: 498, ppid: 21285
buf: end of getcmd

But naturally that makes the other problem visible - that the child sees input being skipped.


P.S.

You never check the return value of fgets so you do not know when a read error occurs.

If a read error occurs during the operation, the array contents are indeterminate and a null pointer is returned.

查看更多
登录 后发表回答