Can anyone help me optimize my code for reading standard input? Here is what I have now:
unsigned char *msg;
size_t msgBytes = 0;
size_t inputMsgBuffLen = 1024;
if ( (msg = (unsigned char *) malloc(sizeof(unsigned char) * inputMsgBuffLen) ) == NULL ) {
    quitErr("Couldn't allocate memory!", EXIT_FAILURE);
}
for (int c; (c = getchar()) != EOF; msgBytes++) {
    if (msgBytes >= (inputMsgBuffLen)) {
        inputMsgBuffLen <<= 1;
        if ( ( msg = (unsigned char *)realloc(msg, sizeof(unsigned char) * inputMsgBuffLen) ) == NULL) {
            free(msg);
            quitErr("Couldn't allocate more memory!", EXIT_FAILURE);
        }
    }
    msg[msgBytes] = (unsigned char)c;
}
Question: are you reading binary or text data from stdin? If text, why are you using unsigned char?
Some advice:
- Drop all the casts on malloc and realloc; they aren't necessary and clutter up the code;
- Instead of repeatedly calling getchar, use fread or fgets (depending on whether you're reading binary or text);
- Remember that realloc can potentially return NULL, so you want to assign the result to a temporary value, otherwise you'll lose track of the original pointer and wind up leaking memory;
- Use a fixed-size array as the buffer for each chunk of input;
- Use sizeof on objects, not types; it's a little cleaner, and it protects you in case the types change (e.g., T *p = malloc(sizeof *p * number_of_elements);).
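To make the realloc and sizeof points concrete, here is a minimal sketch. The helper name grow_buffer and the 1024-byte starting capacity are assumptions made up for this example; they are not from the original code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: doubles the capacity of *buf (starting at 1024).
 * Returns 1 on success and 0 on failure. On failure *buf and *cap are
 * left untouched, so the caller still owns the old block and can free it. */
static int grow_buffer(unsigned char **buf, size_t *cap)
{
    assert(buf != NULL && cap != NULL);

    if (*cap > SIZE_MAX / 2)
        return 0;                       /* doubling would overflow size_t */

    size_t new_cap = (*cap == 0) ? 1024 : *cap * 2;

    /* No cast on realloc; sizeof is applied to the object, not the type.
     * The result goes into a temporary so the old pointer is not lost. */
    unsigned char *tmp = realloc(*buf, new_cap * sizeof **buf);
    if (tmp == NULL)
        return 0;

    *buf = tmp;
    *cap = new_cap;
    return 1;
}
```

Start with buf = NULL and cap = 0, call grow_buffer(&buf, &cap) whenever the buffer is full, and free(buf) yourself if it ever returns 0.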
Cleaned-up version assuming you intend to use unsigned chars:
// needs <stdio.h>, <stdlib.h> and <string.h>
#define inputBufSize 1024

unsigned char *msg = NULL;
size_t inputMsgBufSize = 0;               // bytes stored in msg so far
unsigned char inputBuffer[inputBufSize];
size_t bytesRead = 0;

while ((bytesRead = fread(
    inputBuffer,          // target buffer
    1,                    // size of each element (one byte)
    sizeof inputBuffer,   // maximum number of elements to read
    stdin)) > 0)
{
    unsigned char *tmp = realloc(msg, inputMsgBufSize + bytesRead);
    if (tmp)
    {
        msg = tmp;
        memmove(&msg[inputMsgBufSize], inputBuffer, bytesRead);
        inputMsgBufSize += bytesRead;
    }
    else
    {
        printf("Ran out of memory\n");
        free(msg);
        msg = NULL;       // don't leave a dangling pointer behind
        break;
    }
}
Try to read fixed chunks of at least 8192 bytes. Don't use single char input since it's quite slow.
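For what it's worth, here is a rough sketch of what reading in 8192-byte chunks could look like. The function name read_all, the CHUNK_SIZE macro and the doubling growth policy are my own assumptions for illustration. It reads straight into the destination buffer, so no intermediate copy is needed.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define CHUNK_SIZE 8192   /* chunk size suggested above */

/* Hypothetical helper: reads stream until EOF. Returns a malloc'd buffer
 * and stores the number of bytes read in *out_len. Returns NULL on
 * allocation failure or read error. The caller frees the result. */
unsigned char *read_all(FILE *stream, size_t *out_len)
{
    assert(stream != NULL && out_len != NULL);

    unsigned char *buf = NULL;
    size_t len = 0, cap = 0;

    for (;;) {
        /* Make sure there is room for one more full chunk. */
        if (cap - len < CHUNK_SIZE) {
            size_t new_cap = cap ? cap * 2 : CHUNK_SIZE;
            unsigned char *tmp = realloc(buf, new_cap);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap = new_cap;
        }

        size_t n = fread(buf + len, 1, CHUNK_SIZE, stream);
        len += n;
        if (n < CHUNK_SIZE)   /* short read: EOF or error */
            break;
    }

    if (ferror(stream)) {
        free(buf);
        return NULL;
    }

    *out_len = len;
    return buf;
}
```

You would call it as msg = read_all(stdin, &msgBytes); and check for NULL. Note that an empty input gives back a valid buffer with a length of 0, not NULL.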
Why do you want to optimize the code?
Did you time it?
Did you find it was too slow?
Are you ready to time the new versions?
Do you realize that the measured run time of code depends on many, many factors (like current processor load, number of active users, disk activity, ...)?
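If you do want numbers, something along these lines is one possible way to get a first measurement. The helper name time_getc_drain is made up for this sketch. Keep in mind that clock() reports processor time, so time spent blocked waiting for input is not counted.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: consumes stream one character at a time,
 * stores the number of bytes seen in *count, and returns the CPU
 * time used, in seconds. */
double time_getc_drain(FILE *stream, size_t *count)
{
    assert(stream != NULL && count != NULL);

    size_t n = 0;
    clock_t start = clock();
    while (getc(stream) != EOF)
        n++;
    clock_t end = clock();

    *count = n;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Run it several times on the same input and compare against a block-read version measured the same way; a single run tells you very little.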
The best optimization you can do is to start with a very large value for malloc (and possibly realloc down after all the data has been read).
size_t inputMsgBuffLen = 400000000; /* approx. 400 MB */
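The "realloc down" step could be sketched like this. The helper name shrink_to_fit is an assumption for the example. It keeps at least one byte, because the behavior of realloc with a size of 0 is implementation-defined.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: shrinks buf to the number of bytes actually used.
 * If realloc fails, the original (larger) block is still valid, so it
 * is returned unchanged. */
unsigned char *shrink_to_fit(unsigned char *buf, size_t used)
{
    assert(buf != NULL);

    unsigned char *tmp = realloc(buf, used ? used : 1);
    return tmp ? tmp : buf;
}
```

After the read loop you would write msg = shrink_to_fit(msg, msgBytes);. Whether the memory actually goes back to the system depends on the allocator.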