In the code below, I have a corrupt "hello.bz2" that has stray characters beyond the EOF.
Is there a way to make the boost::iostreams::copy() call throw?
#include <fstream>
#include <iostream>
#include <boost/iostreams/filtering_streambuf.hpp>
#include <boost/iostreams/copy.hpp>
#include <boost/iostreams/filter/bzip2.hpp>

int main()
{
    using namespace std;
    using namespace boost::iostreams;
    ifstream file("hello.bz2", ios_base::in | ios_base::binary);
    filtering_streambuf<input> in;
    in.push(bzip2_decompressor());
    in.push(file);
    boost::iostreams::copy(in, cout);
}
EDIT: Please ignore the line that has so far attracted the most attention: the EOF. Please assume I am working with a corrupted bzip2 file. I wrote "EOF" because of the error I get when I run bzcat on the file:
bzcat hello.bz2
hello world
bzcat: hello.bz2: trailing garbage after EOF ignored
Research
std::ios_base::failure is "the base class for the types of all objects thrown as exceptions, by functions in the Iostreams library, to report errors detected during stream buffer operations."
Looking at the boost docs:
bzip2_error is a specific exception thrown when using the bzip2 filter, which inherits from std::ios_base::failure. As you can see, it is constructed by passing in an integer representing the error code. It also has a method error() which returns the error code it was constructed with.
The docs list bzip2 error codes as the following:
Code
EDIT: I also want to clarify that boost::iostreams::copy() itself will not be the one throwing the exception here; the bzip2 filter will. Only the stream or its filters throw. copy() just drives them, so any exception they raise propagates out of the copy() call.
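The same mechanism can be shown with the standard library alone. This is only an illustrative sketch: failing_buf, copy_all and copy_and_report are hypothetical names, and failing_buf merely stands in for a filter that hits corrupt input. It is not how Boost implements copy().

```cpp
#include <ios>
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>
#include <utility>

// Hypothetical stand-in for a decompression filter: serves the bytes in
// `good`, then throws from underflow() as if the next chunk were corrupt.
class failing_buf : public std::streambuf {
public:
    explicit failing_buf(std::string good) : data_(std::move(good)) {
        char* begin = &data_[0];
        setg(begin, begin, begin + data_.size());
    }
protected:
    int_type underflow() override {
        throw std::ios_base::failure("corrupt input");
    }
private:
    std::string data_;
};

// Simplified copy loop with no error handling of its own: whatever the
// source throws simply passes through to the caller.
void copy_all(std::streambuf& src, std::ostream& dst) {
    typedef std::streambuf::traits_type traits;
    for (traits::int_type c = src.sbumpc();
         !traits::eq_int_type(c, traits::eof());
         c = src.sbumpc()) {
        dst.put(traits::to_char_type(c));
    }
}

// Copies until the source throws, then marks where the failure surfaced.
std::string copy_and_report(const std::string& good) {
    failing_buf src(good);
    std::ostringstream out;
    try {
        copy_all(src, out);
    } catch (const std::ios_base::failure&) {
        out << " [source threw]";
    }
    return out.str();
}
```

Here copy_all never throws on its own account; the exception originates in the stream buffer and is caught by whoever called the copy, which is the situation with the bzip2 filter.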
EDIT 2: It appears the problem is with bzip2_decompressor_impl, as you expected. I have reproduced the endless spinning loop when the bz2 file is empty. It took me a little while to figure out how to build Boost and link against the bzip2, zlib, and iostreams libraries to see whether I could replicate your results.
test.cpp:
debugging:
There is a loop that drives the bzip2 decompression at symmetric.hpp:109:
bzip2_decompressor_impl's filter method (bzip2.hpp:344) gets called at symmetric.hpp:117:
I think the problem is simple: bzip2_decompressor_impl's eof_ flag never gets set. Unless it's supposed to happen in some magic way I don't understand, the flag is owned by the bzip2_decompressor_impl class and is only ever set to false. So when we do this:
we get a spinning loop that never ends, because we never break when EOF is hit. This is certainly a bug, because other programs (like vim) have no problem opening a text file created in a similar manner. However, I am able to get the filter to throw when the bz2 file is "corrupted":
Sometimes you have to take open source code with a grain of salt. It is more likely that your bz2 files will be corrupted and throw properly. However, the /dev/null case is a serious bug. We should report it to the Boost developers so they can fix it.
How do you have stray characters beyond the end of the file?
If you mean that the file has garbage data in it, how would the decompression algorithm be able to tell whether or not the data is garbage, in order to make a decision to throw?