Efficient stdin reading in C

Posted 2019-02-19 14:01

Question:

Can anyone help me optimize my code for reading standard input? Here is what I have now:

unsigned char *msg;
size_t msgBytes = 0;
size_t inputMsgBuffLen = 1024;
if ( (msg = (unsigned char *) malloc(sizeof(unsigned char) * inputMsgBuffLen) ) == NULL ) {
    quitErr("Couldn't allocate memmory!", EXIT_FAILURE);
}
for (int c; (c = getchar()) != EOF; msgBytes++) {
    if (msgBytes >= (inputMsgBuffLen)) {
        inputMsgBuffLen <<= 1;
        if ( ( msg = (unsigned char *)realloc(msg, sizeof(unsigned char) * inputMsgBuffLen) ) == NULL) {
            free(msg);
            quitErr("Couldn't allocate more memmory!", EXIT_FAILURE);
        }
    }
    msg[msgBytes] = (unsigned char)c;
}

Answer 1:

Question: are you reading binary or text data from stdin? If text, why are you using unsigned char?

Some advice:

  1. Drop all the casts on malloc and realloc; they aren't necessary and clutter up the code;
  2. Instead of repeatedly calling getchar, use fread or fgets (depending on whether you're reading binary or text);
  3. Remember that realloc can potentially return NULL, so you want to assign the result to a temporary value, otherwise you'll lose track of the original pointer and wind up leaking memory;
  4. Use a statically allocated buffer for each chunk of input;
  5. Use sizeof on objects, not types; it's a little cleaner, and it protects you in case the types change (e.g., T *p = malloc(sizeof *p * number_of_elements);).
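
Points 3 and 5 can be sketched as a small helper. This is only an illustration: the name grow_buffer, the doubling policy, and the 1024-byte starting capacity are assumptions, not part of the original code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: doubles the capacity of *buf (or starts at 1024).
 * Returns 0 on success. On failure it returns -1 and leaves *buf and *cap
 * untouched, so the caller still owns the old block and can free it. */
static int grow_buffer(unsigned char **buf, size_t *cap)
{
    assert(buf != NULL && cap != NULL);

    if (*cap > SIZE_MAX / 2)
        return -1;                        /* doubling would overflow */

    size_t new_cap = *cap ? *cap * 2 : 1024;

    /* sizeof **buf follows the element type if it ever changes. */
    unsigned char *tmp = realloc(*buf, new_cap * sizeof **buf);
    if (tmp == NULL)
        return -1;                        /* *buf is still valid here */

    *buf = tmp;
    *cap = new_cap;
    return 0;
}
```

Because the result of realloc goes into tmp first, a failed call never overwrites the only pointer to the existing data.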

Cleaned-up version assuming you intend to use unsigned chars:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define inputBufSize 1024

unsigned char *msg = NULL;
size_t inputMsgBufSize = 0;   // number of bytes stored in msg so far
unsigned char inputBuffer[inputBufSize];
size_t bytesRead = 0;

while ((bytesRead = fread(
    inputBuffer,            // target buffer
    1,                      // size of each element (one byte)
    sizeof inputBuffer,     // maximum number of elements to read
    stdin)) > 0)            // with size 1, fread returns a byte count
{
  // assign to a temporary so msg is not lost if realloc fails
  unsigned char *tmp = realloc(msg, inputMsgBufSize + bytesRead);
  if (tmp)
  {
    msg = tmp;
    memmove(&msg[inputMsgBufSize], inputBuffer, bytesRead);
    inputMsgBufSize += bytesRead;
  }
  else
  {
    fprintf(stderr, "Ran out of memory\n");
    free(msg);
    msg = NULL;             // avoid leaving a dangling pointer
    inputMsgBufSize = 0;
    break;
  }
}


Answer 2:

Try to read fixed chunks of at least 8192 bytes. Don't use single char input since it's quite slow.
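
A minimal sketch of chunked reading, assuming the whole input should end up in one heap buffer. The helper name read_all, the CHUNK_SIZE macro, and the doubling growth policy are assumptions made for this example; whether 8192 is the best chunk size is something to measure on your system.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define CHUNK_SIZE 8192   /* chunk size suggested above */

/* Hypothetical helper: reads `stream` until EOF into a malloc'd buffer and
 * stores the byte count in *out_len. Returns NULL on allocation or read
 * error. An empty stream yields a non-NULL buffer with *out_len == 0. */
static unsigned char *read_all(FILE *stream, size_t *out_len)
{
    assert(stream != NULL && out_len != NULL);

    size_t cap = CHUNK_SIZE, len = 0;
    unsigned char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        /* Make sure a full chunk fits before reading into the buffer. */
        if (cap - len < CHUNK_SIZE) {
            size_t new_cap = cap * 2;
            unsigned char *tmp = (new_cap > cap) ? realloc(buf, new_cap) : NULL;
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap = new_cap;
        }

        /* Element size is 1, so the return value is a byte count. */
        size_t n = fread(buf + len, 1, CHUNK_SIZE, stream);
        len += n;
        if (n < CHUNK_SIZE)
            break;                        /* EOF or read error */
    }

    if (ferror(stream)) {
        free(buf);
        return NULL;
    }
    *out_len = len;
    return buf;
}
```

A caller would use it as `msg = read_all(stdin, &msgBytes);` and free the result when done. Reading straight into the destination buffer also avoids copying each chunk a second time.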



Answer 3:

Why do you want to "optimize" the code?

Did you time it?
Did you find it was too slow?
Are you ready to time the new versions?
Do you realize that the run time of code depends on many factors (such as current processor load, number of active users, disk activity, and so on)?

The best optimization you can do is to start with a very large value for malloc (and possibly realloc down after all the data has been read).

size_t inputMsgBuffLen = 400000000; /* approx 400 mega */
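
A rough sketch of that idea follows. The helper name read_then_shrink and its max_bytes parameter are assumptions for illustration, and whether a very large up-front request succeeds depends on the platform and available memory.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads up to `max_bytes` from `stream` into a single
 * up-front allocation, then shrinks the block to the bytes actually read.
 * Input beyond max_bytes is left unread. Returns NULL on failure. */
static unsigned char *read_then_shrink(FILE *stream, size_t max_bytes,
                                       size_t *out_len)
{
    assert(stream != NULL && out_len != NULL && max_bytes > 0);

    unsigned char *buf = malloc(max_bytes);
    if (buf == NULL)
        return NULL;

    size_t len = fread(buf, 1, max_bytes, stream);
    if (ferror(stream)) {
        free(buf);
        return NULL;
    }

    /* Shrink to fit; keep at least 1 byte since realloc(ptr, 0) is not portable. */
    unsigned char *tmp = realloc(buf, len ? len : 1);
    if (tmp != NULL)
        buf = tmp;    /* if shrinking fails, the larger block is still valid */

    *out_len = len;
    return buf;
}
```

For example, `msg = read_then_shrink(stdin, inputMsgBuffLen, &msgBytes);` replaces the whole growth loop, at the cost of a hard upper limit on the input size.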