Lately I've been asked to write a function that reads a binary file into a `std::vector<BYTE>`, where `BYTE` is an `unsigned char`. Quite quickly I came up with something like this:
```cpp
#include <fstream>
#include <vector>

typedef unsigned char BYTE;

std::vector<BYTE> readFile(const char* filename)
{
    // open the file:
    std::streampos fileSize;
    std::ifstream file(filename, std::ios::binary);

    // get its size:
    file.seekg(0, std::ios::end);
    fileSize = file.tellg();
    file.seekg(0, std::ios::beg);

    // read the data:
    std::vector<BYTE> fileData(fileSize);
    file.read((char*) &fileData[0], fileSize);
    return fileData;
}
```
which seems unnecessarily complicated, and the explicit cast to `char*` that I was forced to use when calling `file.read` doesn't make me feel any better about it.
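For comparison, here is a minimal sketch of the same size-then-`read` approach with the checks the short version skips. It is an illustration, not a definitive answer: the name `readFileChecked` and the choice to throw `std::runtime_error` are assumptions. It uses `data()` instead of `&fileData[0]` (indexing an empty vector is undefined behavior, which matters for zero-length files), checks for `tellg()` returning -1, and spells the cast as `reinterpret_cast` to make the intent explicit. Writing into `unsigned char` storage through a `char*` is well defined.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

typedef unsigned char BYTE;

// Hypothetical hardened variant of the size-then-read method.
std::vector<BYTE> readFileChecked(const char* filename)
{
    std::ifstream file(filename, std::ios::binary);
    if (!file.is_open())
        throw std::runtime_error(std::string("cannot open ") + filename);

    // Determine the size; tellg() reports -1 on failure.
    file.seekg(0, std::ios::end);
    const std::streamoff end = file.tellg();
    if (end < 0)
        throw std::runtime_error("cannot determine file size");
    file.seekg(0, std::ios::beg);

    std::vector<BYTE> data(static_cast<std::size_t>(end));
    // data() is safe to call on an empty vector, unlike &data[0];
    // an empty file simply skips the read.
    if (!data.empty() &&
        !file.read(reinterpret_cast<char*>(data.data()),
                   static_cast<std::streamsize>(data.size())))
        throw std::runtime_error("short read");
    return data;
}
```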
Another option is to use `std::istreambuf_iterator`:
```cpp
std::vector<BYTE> readFile(const char* filename)
{
    // open the file:
    std::ifstream file(filename, std::ios::binary);

    // read the data:
    return std::vector<BYTE>((std::istreambuf_iterator<char>(file)),
                             std::istreambuf_iterator<char>());
}
```
which is pretty simple and short, but I still have to use `std::istreambuf_iterator<char>` even when I'm reading into a `std::vector<unsigned char>`.
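One cost of the iterator-pair version is that `std::istreambuf_iterator` is a single-pass input iterator, so the vector cannot learn the length in advance and may reallocate several times as it grows. A minimal sketch, assuming the stream is seekable, reserves the file size first and then inserts through the same iterators. `readFileReserved` is a hypothetical name, and returning an empty vector on open failure is an arbitrary policy choice. Each `char` is converted to `unsigned char` as it is stored, which is why `istreambuf_iterator<char>` works for a `std::vector<unsigned char>`.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <vector>

typedef unsigned char BYTE;

// Reads the whole file through a streambuf iterator range after reserving
// capacity, so the vector does not have to regrow while inserting.
std::vector<BYTE> readFileReserved(const char* filename)
{
    std::vector<BYTE> data;
    std::ifstream file(filename, std::ios::binary);
    if (!file)
        return data; // assumed policy: empty result if the file cannot be opened

    // Ask for the size; tellg() yields -1 if seeking failed.
    file.seekg(0, std::ios::end);
    const std::streamoff size = file.tellg();
    file.seekg(0, std::ios::beg);
    if (size > 0)
        data.reserve(static_cast<std::size_t>(size));

    // istreambuf_iterator reads raw chars from the stream buffer (no operator>>,
    // no whitespace skipping); each char is converted to BYTE on insertion.
    data.insert(data.end(),
                std::istreambuf_iterator<char>(file),
                std::istreambuf_iterator<char>());
    return data;
}
```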
The last option, which seems perfectly straightforward, is to use `std::basic_ifstream<BYTE>`, which kind of expresses explicitly that "I want an input file stream and I want to use it to read `BYTE`s":
```cpp
std::vector<BYTE> readFile(const char* filename)
{
    // open the file:
    std::basic_ifstream<BYTE> file(filename, std::ios::binary);

    // read the data:
    return std::vector<BYTE>((std::istreambuf_iterator<BYTE>(file)),
                             std::istreambuf_iterator<BYTE>());
}
```
but I'm not sure whether `basic_ifstream` is an appropriate choice in this case.
What is the best way of reading a binary file into the `vector`? I'd also like to know what's happening "behind the scenes" and what possible problems I might encounter (apart from the stream not being opened properly, which can be avoided with a simple `is_open` check).
Is there any good reason why one would prefer to use `std::istreambuf_iterator` here? (The only advantage I can see is simplicity.)
Since you are loading the entire file into memory, the most efficient approach is to map the file into memory. This is because the kernel loads the file into its page cache anyway, and by mapping the file you just expose those cached pages to your process. This is also known as zero-copy.
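A minimal POSIX sketch of such a mapping, assuming a Unix-like system (on Windows the equivalent would use `CreateFileMapping` and `MapViewOfFile`). The hypothetical `MappedFile` class maps the whole file read-only and exposes it as a pointer plus a size, so no bytes are copied into a `std::vector`. A zero-length file is handled separately because `mmap` rejects a length of 0.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

#include <fcntl.h>     // open
#include <sys/mman.h>  // mmap, munmap
#include <sys/stat.h>  // fstat
#include <unistd.h>    // close

// Hypothetical RAII wrapper: read-only mapping of an entire file.
class MappedFile
{
public:
    explicit MappedFile(const char* filename)
    {
        const int fd = ::open(filename, O_RDONLY);
        if (fd < 0)
            throw std::runtime_error(std::string("cannot open ") + filename);
        struct stat st;
        if (::fstat(fd, &st) != 0) {
            ::close(fd);
            throw std::runtime_error("fstat failed");
        }
        size_ = static_cast<std::size_t>(st.st_size);
        if (size_ > 0) { // mmap fails with EINVAL for a zero length
            void* p = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
            if (p == MAP_FAILED) {
                ::close(fd);
                throw std::runtime_error("mmap failed");
            }
            data_ = static_cast<const unsigned char*>(p);
        }
        ::close(fd); // the mapping stays valid after the descriptor is closed
    }

    ~MappedFile()
    {
        if (data_)
            ::munmap(const_cast<unsigned char*>(data_), size_);
    }

    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    const unsigned char* data() const { return data_; }
    std::size_t size() const { return size_; }

private:
    const unsigned char* data_ = nullptr;
    std::size_t size_ = 0;
};
```

The trade-off is that the bytes are only valid while the `MappedFile` object lives, and if another process truncates the file while it is mapped, touching the vanished pages can raise `SIGBUS`.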
When you use `std::vector<>`, the data is copied from the kernel page cache into the `std::vector<>`, which is unnecessary when you just want to read the file. Also, when you pass two input iterators to `std::vector<>`, it grows its buffer while reading because it does not know the file size. When you resize the `std::vector<>` to the file size first, it needlessly zeroes out its contents, which are going to be overwritten with file data anyway. Both methods are sub-optimal in terms of space and time.

I would have thought that the first method, using the size and `std::istream::read()`, would be the most efficient. The "cost" of casting to `char *` is most likely zero: casts of this kind simply tell the compiler "Hey, I know you think this is a different type, but I really want this type here", and do not add any extra instructions. If you wish to confirm this, try reading the file into a `char` array and compare the generated assembly. Aside from a little extra work to figure out the address of the buffer inside the vector, there shouldn't be any difference.

As always, the only way to tell for sure IN YOUR CASE which method is the most efficient is to measure it. "Asking on the internet" is not proof.
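A minimal timing sketch along those lines, using `std::chrono::steady_clock`. The function names are hypothetical, and only two of the question's variants are included to keep it short; `readWithSizeAndRead` follows the first method but uses `data()` so that empty files are safe. Repeating the read is deliberate: after the first call the file usually sits in the page cache, so cold and warm runs should be reported separately.

```cpp
#include <chrono>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <vector>

typedef unsigned char BYTE;

// First method from the question: query the size, then read in one call.
std::vector<BYTE> readWithSizeAndRead(const char* filename)
{
    std::ifstream file(filename, std::ios::binary);
    file.seekg(0, std::ios::end);
    const std::streamoff size = file.tellg();
    file.seekg(0, std::ios::beg);
    std::vector<BYTE> data(size > 0 ? static_cast<std::size_t>(size) : 0);
    file.read(reinterpret_cast<char*>(data.data()),
              static_cast<std::streamsize>(data.size()));
    return data;
}

// Second method from the question: let the vector consume an iterator range.
std::vector<BYTE> readWithIterators(const char* filename)
{
    std::ifstream file(filename, std::ios::binary);
    return std::vector<BYTE>((std::istreambuf_iterator<char>(file)),
                             std::istreambuf_iterator<char>());
}

// Hypothetical helper: average wall-clock milliseconds per call over `runs` calls.
template <typename Reader>
double averageMillis(Reader reader, const char* filename, int runs)
{
    using Clock = std::chrono::steady_clock;
    const Clock::time_point start = Clock::now();
    for (int i = 0; i < runs; ++i)
        reader(filename); // result discarded; the file I/O still runs
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return elapsed.count() / runs;
}
```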
When testing for performance, I would include a test case for:
My thinking is that the constructor of Method 1 touches the elements in the `vector`, and then the `read` touches each element again.

Method 2 and Method 3 look most promising, but could suffer one or more reallocations. Hence the reason to `reserve` before reading or inserting.

I would also test with `std::copy`:

In the end, I think the best solution will avoid `operator >>` from `istream_iterator` (and all the overhead and goodness from `operator >>` trying to interpret binary data). But I don't know what to use that allows you to directly copy the data into the vector.

Finally, my testing with binary data shows that `ios::binary` does not stop whitespace bytes from being skipped: binary mode only disables newline translation, while the skipping is done by the formatted `operator >>`. Hence the reason for `noskipws` from `<ios>`.
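To illustrate the last two points, a minimal sketch that reads the same bytes with `std::istream_iterator` (with and without `noskipws`) and with `std::istreambuf_iterator`. A `std::istringstream` stands in for the file so the example is self-contained; that is an assumption, but whitespace skipping is performed by the formatted `operator >>` for every stream type, so a `std::ifstream` opened with `std::ios::binary` behaves the same way. `std::istreambuf_iterator`, which reads characters straight from the stream buffer without `operator >>`, is one candidate for copying the data directly into the vector. The function names are hypothetical.

```cpp
#include <ios>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Formatted path: istream_iterator<char> extracts with operator>>, which skips
// whitespace bytes (' ', '\n', '\t', ...) unless skipws is turned off.
std::vector<char> viaIstreamIterator(const std::string& bytes, bool noskip)
{
    std::istringstream in(bytes, std::ios::in | std::ios::binary);
    if (noskip)
        in >> std::noskipws; // std::noskipws is declared in <ios>
    return std::vector<char>((std::istream_iterator<char>(in)),
                             std::istream_iterator<char>());
}

// Unformatted path: istreambuf_iterator reads straight from the stream buffer,
// so nothing is skipped and no operator>> is involved.
std::vector<char> viaStreambufIterator(const std::string& bytes)
{
    std::istringstream in(bytes, std::ios::in | std::ios::binary);
    return std::vector<char>((std::istreambuf_iterator<char>(in)),
                             std::istreambuf_iterator<char>());
}
```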