Consider the following simple C program that reads a file into a buffer and displays that buffer on the console:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *file;
    char *buffer;
    unsigned long fileLen;

    //Open file
    file = fopen("HelloWorld.txt", "rb");
    if (!file)
    {
        fprintf(stderr, "Unable to open file %s", "HelloWorld.txt");
        return 1;
    }

    //Get file length
    fseek(file, 0, SEEK_END);
    fileLen=ftell(file);
    fseek(file, 0, SEEK_SET);

    //Allocate memory
    buffer=(char *)malloc(fileLen+1);
    if (!buffer)
    {
        fprintf(stderr, "Memory error!");
        fclose(file);
        return 1;
    }

    //Read file contents into buffer
    fread(buffer, fileLen, 1, file);

    //Send buffer contents to stdout
    printf("%s\n",buffer);
    fclose(file);
    return 0;
}
The file it will read simply contains:
Hello World!
The output is:
Hello World!²²²²▌▌▌▌▌▌▌↔☺
It has been a while since I did anything significant in C/C++. Normally I would assume the buffer was being allocated larger than necessary, but that does not appear to be the case.
fileLen ends up being 12, which is accurate.
I am thinking now that I must just be displaying the buffer wrong, but I am not sure what I am doing wrong.
Can anyone clue me in to what I am doing wrong?
You need to NUL-terminate your string. Add
buffer[fileLen] = 0;
before printing it.
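To show where the terminator goes in context, here is a minimal sketch. The helper name read_as_string is made up for illustration and is not part of the original program. It also terminates at the number of bytes fread actually returned rather than at fileLen, which is an extra precaution I am assuming is wanted in case the read comes up short.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads up to len bytes from fp into a freshly
   allocated buffer and NUL-terminates it. The caller frees the result. */
char *read_as_string(FILE *fp, size_t len)
{
    char *buffer = malloc(len + 1);   /* +1 leaves room for the terminator */
    if (!buffer)
        return NULL;

    /* Element size 1, count len: fread returns the number of bytes read. */
    size_t got = fread(buffer, 1, len, fp);

    /* Terminate right after the last byte read, so printf("%s") stops there. */
    buffer[got] = '\0';
    return buffer;
}
```

With the file from the question open as file, something like char *s = read_as_string(file, fileLen); followed by printf("%s\n", s); should print the text without the trailing garbage.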
JesperE's approach will work, but you may be interested to know that there's an alternate way of handling this.
You can always print a string of known length, even when there's no NUL-terminator, by providing the length to printf as the precision for the string field:
printf("%.*s\n", (int)fileLen, buffer);
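This allows you to print the string without modifying the buffer. Note that the * precision takes an int argument, so a length held in a wider type such as unsigned long must be cast.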
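As a small sketch of the precision technique, the hypothetical helper below (format_counted is my own name) uses snprintf instead of printf so the result is easy to inspect. The format string works the same way in both. It assumes the length fits in an int.

```c
#include <stdio.h>

/* Hypothetical helper: formats the first len bytes of src into dst
   without requiring src to be NUL-terminated. Returns snprintf's result. */
int format_counted(char *dst, size_t dstsize, const char *src, size_t len)
{
    /* The '*' precision consumes an int argument, hence the cast.
       Assumption: len is small enough to fit in an int. */
    return snprintf(dst, dstsize, "%.*s", (int)len, src);
}
```

Only the first len bytes of src are examined, so bytes beyond that point (including uninitialised ones) never reach the output.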
JesperE is correct regarding the NUL-termination issue in your example. I'll just add that if you are processing text files, it would be better to use fgets() or something similar, as this will properly handle newline sequences across different platforms and will always NUL-terminate the string for you. If you are really working with binary data, then you don't want to use printf() to output it: the printf functions expect strings, and a NUL byte in the data will truncate the output.
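A rough sketch of the fgets() approach, for illustration only. The helper name copy_lines and the 256-byte buffer are my own choices; lines longer than the buffer simply arrive in several pieces.

```c
#include <stdio.h>

/* Hypothetical helper: copies a text stream to out piece by piece and
   returns the number of fgets calls that produced data. */
size_t copy_lines(FILE *in, FILE *out)
{
    char line[256];   /* assumed chunk size; longer lines arrive in pieces */
    size_t chunks = 0;

    /* fgets stores at most sizeof line - 1 characters and always
       NUL-terminates what it stored, so fputs is safe here. */
    while (fgets(line, sizeof line, in) != NULL) {
        fputs(line, out);
        chunks++;
    }
    return chunks;
}
```

For the program in the question, calling copy_lines(file, stdout) on a stream opened in text mode would replace the size calculation, the allocation, and the printf. For binary data, fwrite() with an explicit byte count is the usual alternative to printf().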
Your approach to determine file size by seeking to the end of the file and then using ftell() is wrong:
- If it is a text file, opened without "b" in the second parameter to the fopen() call, then ftell() may not tell you the number of characters that you can read from the file. For example, Windows uses two bytes for end of line, but when read, it is one char. In fact, the return value of ftell() for streams opened in text mode is useful only in calls to fseek(), and not to determine file size.
- If it is a binary file, opened with "b" in the second parameter to fopen(), then the C standard has this to say:
  Setting the file position indicator to end-of-file, as with fseek(file, 0, SEEK_END), has undefined behavior for a binary stream (because of possible trailing null characters) or for any stream with state-dependent encoding that does not assuredly end in the initial shift state.
So, what you are doing isn't necessarily going to work in standard C. Your best bet is to use fread() to read, and if you happen to need more memory, use realloc(). Your system may provide mmap(), or may make guarantees about setting the file position indicator to end-of-file for binary streams, but relying on those is not portable.
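One way the fread() plus realloc() suggestion could look is sketched below. The helper name slurp, the 4096-byte starting capacity, and the doubling strategy are all my own assumptions, not something the standard prescribes.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads fp to end-of-file without asking for its size.
   Returns a NUL-terminated buffer the caller must free, or NULL on failure.
   If out_len is non-NULL it receives the number of bytes read. */
char *slurp(FILE *fp, size_t *out_len)
{
    size_t cap = 4096;   /* arbitrary starting capacity */
    size_t len = 0;
    char *buf = malloc(cap);
    if (!buf)
        return NULL;

    for (;;) {
        /* Fill the free space, keeping one byte for the terminator. */
        len += fread(buf + len, 1, cap - len - 1, fp);
        if (len < cap - 1)
            break;               /* short read: end-of-file or error */

        /* Buffer is full: double it and keep reading. */
        char *bigger = realloc(buf, cap * 2);
        if (!bigger) {
            free(buf);
            return NULL;
        }
        buf = bigger;
        cap *= 2;
    }

    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    if (out_len)
        *out_len = len;
    return buf;
}
```

Because the length comes from what fread() actually delivered, this works the same for text and binary streams and never calls fseek() or ftell().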
See also this C-FAQ: What's the difference between text and binary I/O?.
You can use calloc instead of malloc to allocate memory that is already initialised to zero. calloc takes one extra argument. It's useful for allocating arrays; the first parameter of calloc indicates the number of elements in the array that you would like to allocate memory for, and the second argument is the size of each element. Since the size of a char is always 1, we can just pass 1 as the second argument:
buffer = calloc (fileLen + 1, 1);
In C, there is no need to cast the return value of malloc or calloc. The calloc call above ensures that the string will be NUL-terminated even if the reading of the file ended prematurely for whatever reason. calloc can take longer than malloc because it has to zero out all the memory you asked for before giving it to you.
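To illustrate the "ended prematurely" case, here is a small sketch. The helper name read_zero_filled and its got parameter are hypothetical. It asks for more bytes than the stream holds and still ends up with a valid string.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads up to len bytes into a zero-filled buffer.
   *got receives the number of bytes read. The caller frees the result. */
char *read_zero_filled(FILE *fp, size_t len, size_t *got)
{
    /* len + 1 elements of size 1, every byte set to zero. */
    char *buffer = calloc(len + 1, 1);
    if (!buffer)
        return NULL;

    /* Even if fewer than len bytes arrive, the untouched tail stays zero,
       so the buffer is a valid C string without an explicit terminator. */
    *got = fread(buffer, 1, len, fp);
    return buffer;
}
```

If the stream holds only 2 bytes and len is 10, the result is a 2-character string followed by zeros, so printing it with %s is safe.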