I would like to read a file into a string, and I am looking for different ways to do it efficiently.
Using a fixed-size char* buffer
I received an answer from Tony that creates a 16 KB buffer, reads into that buffer, and appends the buffer to the string until there is nothing more to read. I understand how it works and I found it very fast. What I don't understand is that the comments on that answer say this approach copies everything twice. But as I understand it, the extra copy only happens in memory, not from the disk, so it is almost unnoticeable. Is it a problem that it copies from the buffer to the string in memory?
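For reference, the chunked approach described above might look roughly like the following. This is only a sketch of the idea, not Tony's actual code; the function name read_with_buffer is made up for illustration.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: reads a whole file through a fixed 16 KB buffer.
std::string read_with_buffer(const char* path)
{
    std::ifstream in(path, std::ios::binary);
    std::string result;
    char buffer[16 * 1024];

    // Each pass copies disk -> buffer (read) and buffer -> string (append);
    // the second step is the extra in-memory copy the comments refer to.
    while (in) {
        in.read(buffer, sizeof buffer);
        // gcount() is the number of bytes the last read() actually got,
        // which is smaller than the buffer on the final chunk.
        result.append(buffer, static_cast<std::size_t>(in.gcount()));
    }
    return result;
}
```

If the file cannot be opened, the loop never runs and the function returns an empty string.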
Using istreambuf_iterator
The other answer I received uses istreambuf_iterator. The code looks beautiful and minimal, but it is extremely slow. I don't know why that happens. Why are those iterators so slow?
Using memcpy()
For this question I received comments that I should use memcpy() as it is the fastest native method. But how can I use memcpy() with a string and an ifstream object? Isn't ifstream supposed to work with its own read function? Why does using memcpy() ruin portability? I am looking for a solution which is compatible with VS2010 as well as GCC. Why would memcpy() not work with those?
+ Any other efficient way possible?
What do you recommend I use for small (< 10 MB) binary files?
(I did not want to split this question into parts, as I am more interested in a comparison of the different ways to read an ifstream into a string.)
That is indeed correct. Still, a solution that doesn’t do that may be faster.
The code is slow not because of the iterators but because the string doesn't know how much memory to allocate: the istreambuf_iterators can only be traversed once, so the string is essentially forced to perform repeated concatenations with resulting memory reallocations, which are very slow.
My favourite one-liner, from another answer, is streaming directly from the underlying buffer:
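The original snippet is not reproduced here, so the following is only a sketch of what streaming from the underlying buffer could look like, spelled out over a few lines rather than as a one-liner. The function name read_via_rdbuf is an assumption, and std::ostringstream is used as the target because operator<< needs an output stream.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: streams the file's stream buffer into a string stream.
std::string read_via_rdbuf(const char* path)
{
    std::ifstream in(path, std::ios::binary);
    std::ostringstream contents;

    // operator<< has an overload taking a streambuf*; it copies characters
    // from the file's buffer without going through formatted input.
    contents << in.rdbuf();

    // str() returns a copy of the accumulated characters.
    return contents.str();
}
```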
On recent platforms this will indeed pre-allocate the buffer. It will, however, still result in a redundant copy (from the stringstream to the final string).
The most general way would probably be the response using the istreambuf_iterator:
Although exact performance is very dependent on the implementation, it's highly unlikely that this is the fastest solution.
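The istreambuf_iterator approach mentioned above is usually written along these lines. This is a minimal sketch rather than the code from the referenced answer, and the function name read_via_iterators is hypothetical.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: constructs the string from a pair of stream iterators.
std::string read_via_iterators(const char* path)
{
    std::ifstream in(path, std::ios::binary);

    // The extra parentheses around the first argument keep the compiler
    // from parsing this line as a function declaration.
    std::string result((std::istreambuf_iterator<char>(in)),
                       std::istreambuf_iterator<char>());
    return result;
}
```

Because these are single-pass input iterators, the string constructor cannot measure the range up front, which is why the string has to grow as it goes.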
An interesting alternative would be:
This could be very rapid, if the implementation has done a good job on the operator<< you're using, and in how it grows the string within the ostringstream. Some earlier implementations (and maybe some more recent ones as well) were very bad at this, however.
In general, performance using a std::string will depend on how efficient the implementation is in growing a string; the implementation cannot determine how large to make it initially. You might want to compare the first algorithm using the same code with std::vector<char> instead of std::string, or, if you can make a good estimate of the maximum size, using reserve, or something like:
memcpy cannot read from a file, and with a good compiler, will not be as fast as using std::copy (with the same data types).
I tend to use the second solution, above, with the << on the rdbuf(), but that's partially for historical reasons; I got used to doing this (using istrstream) before the STL was added to the standard library. For that matter, you might want to experiment with istrstream and a pre-allocated buffer (supposing you can find an appropriate size for the buffer).
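To illustrate the pre-sizing idea (using std::string rather than istrstream), here is a minimal sketch that sizes the string once and reads into it directly. It assumes that tellg() on a file opened in binary mode gives a usable byte count, which holds on common platforms but is not strictly guaranteed, and that &result[0] points at contiguous storage, which C++11 guarantees and which works in practice with older VS and GCC libraries. The function name read_presized is made up.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: allocates the string once, then fills it with one read().
std::string read_presized(const char* path)
{
    // std::ios::ate opens the file positioned at its end.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return std::string();

    // Assumption: the end position equals the file size in bytes.
    const std::streamoff size = in.tellg();
    if (size <= 0) return std::string();

    std::string result(static_cast<std::size_t>(size), '\0');
    in.seekg(0, std::ios::beg);
    in.read(&result[0], static_cast<std::streamsize>(result.size()));

    // Trim in case fewer bytes were read than the estimate.
    result.resize(static_cast<std::size_t>(in.gcount()));
    return result;
}
```

This avoids both the repeated reallocations and the intermediate buffer, at the cost of relying on the size estimate.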