I have the following ANSI C code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void) {
    char *buffer = 0;
    int length = 0;
    FILE *f = fopen("text.txt", "r");
    if(f) {
        fseek(f, 0, SEEK_END);
        length = ftell(f);
        fseek(f, 0, SEEK_SET);
        buffer = malloc(length);
        fread(buffer, 1, length, f);
        fclose (f);
    }
    printf("File size: %d\nBuffer size: %d\nContent: %s\n=END=", length, strlen(buffer), buffer);
    return 0;
}
For some reason, the malloc seems to allocate more memory than needed, and the program outputs extra garbage from memory. For example:
First run:
File size: 12
Buffer size: 22
Content: 123456789012les=$#▬rW|
=END=
Second run:
File size: 12
Buffer size: 22
Content: 123456789012les↔1↕.'
=END=
Third run:
File size: 12
Buffer size: 22
Content: 123456789012les=▬kπà
=END=
Could someone please help me with this and also explain why my version is behaving weird?
I use MinGW TDM-GCC 4.9.2 32-bit for compilation (gcc).
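You have undefined behavior (and you should be afraid of UB) because of a buffer overflow. You forgot to add a terminating null byte, so strlen and the %s conversion in printf keep reading past the end of the buffer until they happen to hit a zero byte. malloc did not allocate more memory than you asked for; the extra characters are whatever was lying in memory after your 12 bytes.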
Replace the faulty lines:
// WRONG CODE:
buffer = malloc(length);
fread(buffer, 1, length, f);
with
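buffer = malloc(length+1);
if (!buffer) {
    perror("malloc");
    exit(EXIT_FAILURE);
}
memset(buffer, 0, length+1);
/* A short count is treated as an error here; in text mode ("r") it can
   also happen when line endings are translated. */
if (fread(buffer, 1, length, f) < (size_t)length) {
    perror("fread");
    exit(EXIT_FAILURE);
}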
(You could zero just the ending byte; I prefer to clear the entire buffer with memset.)
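As a rough sketch of how the fix above could be packaged, the helper below allocates one extra byte and places the terminator right after the characters that fread actually returned, so a short read in text mode is not treated as fatal. The name read_text_file and its out_len parameter are hypothetical and do not come from the original post; the size probe with fseek/ftell is kept exactly as in the question.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not from the original post): reads a whole file
 * into a freshly allocated, null-terminated buffer.
 * Returns NULL on failure; the caller must free() the result. */
char *read_text_file(const char *path, size_t *out_len)
{
    FILE *f = fopen(path, "r");
    char *buffer;
    long length;
    size_t nread;

    if (!f)
        return NULL;

    /* Same size probe as in the question, with error checks added. */
    if (fseek(f, 0, SEEK_END) != 0 || (length = ftell(f)) < 0 ||
        fseek(f, 0, SEEK_SET) != 0) {
        fclose(f);
        return NULL;
    }

    buffer = malloc((size_t)length + 1);   /* +1 for the terminator */
    if (!buffer) {
        fclose(f);
        return NULL;
    }

    /* fread may return fewer than `length` characters in text mode,
     * so terminate after what was actually read. */
    nread = fread(buffer, 1, (size_t)length, f);
    if (ferror(f)) {
        free(buffer);
        fclose(f);
        return NULL;
    }
    buffer[nread] = '\0';
    fclose(f);

    if (out_len)
        *out_len = nread;
    return buffer;
}
```

A caller would check the result for NULL before printing it. Note also that strlen returns a size_t, so printing it with %d as in the question is a type mismatch; casting to unsigned long and using %lu is one portable way around that.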
BTW, ANSI C is obsolete. You should use a C11 compliant compiler (e.g. a recent GCC used as gcc -std=c11 -Wall -Wextra -g) and target C11 compliance (or at least C99). Learn to use the debugger (e.g. gdb).
Read carefully the documentation of malloc(3), fread(3), perror(3), etc.
The use of fseek(f, 0, SEEK_END); invokes undefined behavior. First, you're not reading in binary mode, so the number of bytes in the file isn't necessarily the number of bytes that will be read (on Windows, for example, each CR LF pair in the file is read as a single '\n' in text mode).
But if you switch to a binary stream, per 7.19.9.2 of the C Standard:
A binary stream need not meaningfully support fseek calls with a whence value of SEEK_END.
and
Setting the file position indicator to end-of-file, as with fseek(file, 0, SEEK_END), has undefined behavior for a binary stream (because of possible trailing null characters) or for any stream with state-dependent encoding that does not assuredly end in the initial shift state.
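One way to avoid relying on fseek/ftell for the size is to read the stream in chunks and grow the buffer as needed. The following is only a minimal sketch of that idea; the function name read_all, its out_len parameter, and the initial capacity of 4096 bytes are assumptions made for illustration and are not part of the original answers.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads everything from an already opened stream
 * without fseek/ftell, doubling the buffer when it fills up.
 * Returns a null-terminated buffer the caller must free(), or NULL. */
char *read_all(FILE *f, size_t *out_len)
{
    size_t cap = 4096;   /* arbitrary starting capacity */
    size_t len = 0;
    size_t n;
    char *buf = malloc(cap);
    char *tmp;

    if (!buf)
        return NULL;

    /* Always keep one byte in reserve for the terminator. */
    while ((n = fread(buf + len, 1, cap - len - 1, f)) > 0) {
        len += n;
        if (len == cap - 1) {
            /* Buffer is full: double it, guarding against overflow. */
            if (cap > (size_t)-1 / 2) {
                free(buf);
                return NULL;
            }
            cap *= 2;
            tmp = realloc(buf, cap);
            if (!tmp) {
                free(buf);
                return NULL;
            }
            buf = tmp;
        }
    }

    if (ferror(f)) {
        free(buf);
        return NULL;
    }

    buf[len] = '\0';
    if (out_len)
        *out_len = len;
    return buf;
}
```

Because the length comes from the fread return values rather than from ftell, the same function works for text and binary streams, and also for input that cannot be seeked at all, such as stdin.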