How do I read a file into a std::string, i.e., read the whole file at once?
Text or binary mode should be specified by the caller. The solution should be standard-compliant, portable and efficient. It should not needlessly copy the string's data, and it should avoid reallocations of memory while reading the string.
One way to do this would be to stat the file size, resize the std::string and fread() into the std::string's const_cast<char*>()'ed data(). This requires the std::string's data to be contiguous, which the standard did not guarantee before C++11, although it appears to be the case for all known implementations. What is worse, if the file is read in text mode, the std::string's size may not equal the file's size.
A fully correct, standard-compliant and portable solution could be constructed using std::ifstream's rdbuf() into a std::ostringstream and from there into a std::string. However, this could copy the string data and/or needlessly reallocate memory. Are all relevant standard library implementations smart enough to avoid all unnecessary overhead? Is there another way to do it? Did I miss some hidden Boost function that already provides the desired functionality?
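For reference, the rdbuf()-based approach described above might look like the following sketch. The function name slurp_rdbuf and the `path` parameter are assumptions made for illustration; the copy made by str() is exactly the overhead the question is worried about.

```cpp
#include <fstream>
#include <ios>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: name and `path` parameter are illustrative.
std::string slurp_rdbuf(const std::string& path, bool is_binary)
{
    std::ios_base::openmode mode = std::ios_base::in;
    if (is_binary) mode |= std::ios_base::binary;

    std::ifstream in(path.c_str(), mode);
    if (!in) throw std::runtime_error("cannot open " + path);

    std::ostringstream out;
    out << in.rdbuf();   // streams the whole file into the ostringstream's buffer
    return out.str();    // str() returns a copy of that buffer
}
```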
Please show how you would implement void slurp(std::string& data, bool is_binary), taking into account the discussion above.
Use
or something very close. I don't have a stdlib reference open to double-check myself.
Yes, I understand I didn't write the slurp function as asked.
Something like this shouldn't be too bad:
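A minimal sketch of one way this could look, assuming a hypothetical `path` parameter and function name: the stream is sized with seekg()/tellg(), the string reserves that many characters, and the contents are then appended one character at a time through std::istreambuf_iterator.

```cpp
#include <fstream>
#include <ios>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical helper: name and `path` parameter are illustrative.
std::string slurp_reserve(const std::string& path, bool is_binary)
{
    std::ios_base::openmode mode = std::ios_base::in;
    if (is_binary) mode |= std::ios_base::binary;

    std::ifstream in(path.c_str(), mode);
    if (!in) throw std::runtime_error("cannot open " + path);

    // Ask the stream how long it is; tellg() reports -1 on failure.
    in.seekg(0, std::ios_base::end);
    const std::streamoff size = static_cast<std::streamoff>(in.tellg());
    in.clear();                       // recover if the size query failed
    in.seekg(0, std::ios_base::beg);

    std::string data;
    if (size > 0) data.reserve(static_cast<std::string::size_type>(size));

    // Character-by-character copy; the reserve above avoids regrowth.
    for (std::istreambuf_iterator<char> it(in), end; it != end; ++it)
        data.push_back(*it);
    return data;
}
```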
The advantage here is that we do the reserve first so we won't have to grow the string as we read things in. The disadvantage is that we do it char by char. A smarter version could grab the whole read buf and then call underflow.
The shortest variant:
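A sketch of what such a one-liner might look like, assuming an already opened std::ifstream; the wrapper function slurp_iter is hypothetical and exists only to make the snippet self-contained. The string is constructed directly from a pair of std::istreambuf_iterator objects.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical wrapper around the iterator-pair constructor.
std::string slurp_iter(std::ifstream& in)
{
    // A default-constructed istreambuf_iterator acts as the end-of-stream marker.
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```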
It requires the header <iterator>.
There were some reports that this method is slower than preallocating the string and using std::istream::read. However, on a modern compiler with optimisations enabled this no longer seems to be the case, though the relative performance of various methods seems to be highly compiler dependent.
I do not have enough reputation to comment directly on responses using tellg().
Please be aware that tellg() can return -1 on error. If you're passing the result of tellg() as an allocation parameter, you should sanity check the result first.
An example of the problem:
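As an illustration only, the sketch below computes the element count that an unchecked `std::vector<char> buf(file.tellg());` would end up requesting. The helper name unchecked_size is hypothetical. It avoids performing the allocation itself and just shows the signed-to-unsigned conversion.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>

// Hypothetical helper: the size an unchecked
// `std::vector<char> buf(file.tellg());` would ask for.
std::size_t unchecked_size(std::ifstream& file)
{
    file.seekg(0, std::ios_base::end);
    // tellg() yields -1 when the stream is in a failed state,
    // for example because the file could not be opened.
    const std::streamoff pos = static_cast<std::streamoff>(file.tellg());
    // Converting -1 to an unsigned type wraps to its maximum value.
    return static_cast<std::size_t>(pos);
}
```

For a stream that failed to open, the returned value is the largest value std::size_t can hold, which is what the vector constructor would then try to allocate.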
In the above example, if tellg() encounters an error it will return -1. Implicit casting between signed (i.e. the result of tellg()) and unsigned (i.e. the arg to the vector<char> constructor) will result in your vector erroneously allocating a very large number of bytes. (Probably 4294967295 bytes, or 4GB.)
Modifying paxos1977's answer to account for the above:
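A sketch of a tellg()-based read with the sanity check added. This is not a reproduction of the answer being modified; the function name, the `path` parameter, and the choice to throw on failure are all assumptions.

```cpp
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>

// Hypothetical helper: name, `path`, and the exception are illustrative.
std::string slurp_checked(const std::string& path)
{
    // Open at the end so tellg() reports the file size.
    std::ifstream file(path.c_str(),
                       std::ios_base::in | std::ios_base::binary | std::ios_base::ate);

    const std::streamoff size = static_cast<std::streamoff>(file.tellg());
    if (size < 0)   // open or tellg() failed: do not allocate
        throw std::runtime_error("cannot determine size of " + path);

    file.seekg(0, std::ios_base::beg);
    std::string data(static_cast<std::string::size_type>(size), '\0');
    if (!data.empty()) {
        file.read(&data[0], static_cast<std::streamsize>(size));
        // Keep only what was actually read.
        data.resize(static_cast<std::string::size_type>(file.gcount()));
    }
    return data;
}
```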
This solution adds error checking to the rdbuf()-based method.
I'm adding this answer because adding error-checking to the original method is not as trivial as you'd expect. The original method uses stringstream's insertion operator (str_stream << file_stream.rdbuf()). The problem is that this sets the stringstream's failbit when no characters are inserted. That can be due to an error or it can be due to the file being empty. If you check for failures by inspecting the failbit, you'll encounter a false positive when you read an empty file. How do you disambiguate a legitimate failure to insert any characters from a "failure" to insert any characters because the file is empty?
You might think to explicitly check for an empty file, but that's more code and associated error checking.
Checking for the failure condition str_stream.fail() && !str_stream.eof() doesn't work, because the insertion operation doesn't set the eofbit (on either the ostringstream or the ifstream).
So, the solution is to change the operation. Instead of using ostringstream's insertion operator (<<), use ifstream's extraction operator (>>), which does set the eofbit. Then check for the failure condition file_stream.fail() && !file_stream.eof().
Importantly, when file_stream >> str_stream.rdbuf() encounters a legitimate failure, it shouldn't ever set eofbit (according to my understanding of the specification). That means the above check is sufficient to detect legitimate failures.
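Put together, the check described above might be sketched as follows. The function name and the `path` parameter are assumptions, and the behaviour on an empty file relies on the extraction operator setting eofbit as described in this answer.

```cpp
#include <fstream>
#include <ios>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: name and `path` parameter are illustrative.
std::string slurp_extract(const std::string& path, bool is_binary)
{
    std::ios_base::openmode mode = std::ios_base::in;
    if (is_binary) mode |= std::ios_base::binary;

    std::ifstream file_stream(path.c_str(), mode);
    if (!file_stream) throw std::runtime_error("cannot open " + path);

    std::ostringstream str_stream;
    // Extraction into the ostringstream's buffer; reaching the end
    // of the file sets eofbit on file_stream.
    file_stream >> str_stream.rdbuf();

    // failbit without eofbit is treated as a genuine read failure;
    // an empty file sets both and is accepted.
    if (file_stream.fail() && !file_stream.eof())
        throw std::runtime_error("error reading " + path);

    return str_stream.str();
}
```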