Read from file or stdin

2019-01-08 12:17发布

问题:

I am writing a utility which accepts either a filename, or reads from stdin.

I would like to know the most robust / fastest way of checking to see if stdin exists (data is being piped to the program) and if so reading that data in. If it doesn't exist, the processing will take place on the filename given. I have tried using the following the test for size of stdin but I believe since it's a stream and not an actual file, it's not working as I suspected it would and it's always printing -1. I know I could always read the input 1 character at a time while != EOF but I would like a more generic solution so I could end up with either a fd or a FILE* if stdin exists so the rest of the program will function seamlessly. I would also like to be able to know its size, pending the stream has been closed by the previous program.

long getSizeOfInput(FILE *input){
  long retvalue = 0;
  fseek(input, 0L, SEEK_END);
  retvalue = ftell(input);
  fseek(input, 0L, SEEK_SET);
  return retvalue;
}

int main(int argc, char **argv) {
  printf("Size of stdin: %ld\n", getSizeOfInput(stdin));
  exit(0);
}

Terminal:

$ echo "hi!" | myprog
Size of stdin: -1

回答1:

First, ask the program to tell you what is wrong by checking the errno, which is set on failure, such as during fseek or ftell.

Others (tonio & LatinSuD) have explained the mistake with handling stdin versus checking for a filename. Namely, first check argc (argument count) to see if there are any command line parameters specified if (argc > 1), treating - as a special case meaning stdin.

If no parameters are specified, then assume input is (going) to come from stdin, which is a stream not file, and the fseek function fails on it.

In the case of a stream, where you cannot use file-on-disk oriented library functions (i.e. fseek and ftell), you simply have to count the number of bytes read (including trailing newline characters) until receiving EOF (end-of-file).

For usage with large files you could speed it up by using fgets to a char array for more efficient reading of the bytes in a (text) file. For a binary file you need to use fopen(const char* filename, "rb") and use fread instead of fgetc/fgets.

You could also check the for feof(stdin) / ferror(stdin) when using the byte-counting method to detect any errors when reading from a stream.

The sample below should be C99 compliant and portable.

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>

long getSizeOfInput(FILE *input){
   long retvalue = 0;
   int c;

   if (input != stdin) {
      if (-1 == fseek(input, 0L, SEEK_END)) {
         fprintf(stderr, "Error seek end: %s\n", strerror(errno));
         exit(EXIT_FAILURE);
      }
      if (-1 == (retvalue = ftell(input))) {
         fprintf(stderr, "ftell failed: %s\n", strerror(errno));
         exit(EXIT_FAILURE);
      }
      if (-1 == fseek(input, 0L, SEEK_SET)) {
         fprintf(stderr, "Error seek start: %s\n", strerror(errno));
         exit(EXIT_FAILURE);
      }
   } else {
      /* for stdin, we need to read in the entire stream until EOF */
      while (EOF != (c = fgetc(input))) {
         retvalue++;
      }
   }

   return retvalue;
}

int main(int argc, char **argv) {
   FILE *input;

   if (argc > 1) {
      if(!strcmp(argv[1],"-")) {
         input = stdin;
      } else {
         input = fopen(argv[1],"r");
         if (NULL == input) {
            fprintf(stderr, "Unable to open '%s': %s\n",
                  argv[1], strerror(errno));
            exit(EXIT_FAILURE);
         }
      }
   } else {
      input = stdin;
   }

   printf("Size of file: %ld\n", getSizeOfInput(input));

   return EXIT_SUCCESS;
}


回答2:

You're thinking it wrong.

What you are trying to do:

If stdin exists use it, else check whether the user supplied a filename.

What you should be doing instead:

If the user supplies a filename, then use the filename. Else use stdin.

You cannot know the total length of an incoming stream unless you read it all and keep it buffered. You just cannot seek backwards into pipes. This is a limitation of how pipes work. Pipes are not suitable for all tasks and sometimes intermediate files are required.



回答3:

You may want to look at how this is done in the cat utility, for example.

See code here. If there is no filename as argument, or it is "-", then stdin is used for input. stdin will be there, even if no data is pushed to it (but then, your read call may wait forever).



回答4:

You can just read from stdin unless the user supply a filename ?

If not, treat the special "filename" - as meaning "read from stdin". The user would have to start the program like cat file | myprogram - if he wants to pipe data to it, and myprogam file if he wants it to read from a file.

int main(int argc,char *argv[] ) {
  FILE *input;
  if(argc != 2) {
     usage();
     return 1;
   }
   if(!strcmp(argv[1],"-")) {
     input = stdin;
    } else {
      input = fopen(argv[1],"rb");
      //check for errors
    }

If you're on *nix, you can check whether stdin is a fifo:

 struct stat st_info;
 if(fstat(0,&st_info) != 0)
   //error
  }
  if(S_ISFIFO(st_info.st_mode)) {
     //stdin is a pipe
  }

Though that won't handle the user doing myprogram <file

You can also check if stdin is a terminal/console

if(isatty(0)) {
  //stdin is a terminal
}


回答5:

Just testing for end of file with feof would do, I think.



回答6:

Note that what you want is to know if stdin is connected to a terminal or not, not if it exists. It always exists but when you use the shell to pipe something into it or read a file, it is not connected to a terminal.

You can check that a file descriptor is connected to a terminal via the termios.h functions:

#include <termios.h>
#include <stdbool.h>

bool stdin_is_a_pipe(void)
{
    struct termios t;
    return (tcgetattr(STDIN_FILENO, &t) < 0);
}

This will try to fetch the terminal attributes of stdin. If it is not connected to a pipe, it is attached to a tty and the tcgetattr function call will succeed. In order to detect a pipe, we check for tcgetattr failure.