Decompressing archives with boost and filtering st

Posted 2019-07-19 17:56

Question:

I am working on decompressing large files that contain specific blocks of data, compressed in various ways. I wrote the following code:

// input_file - path to file
std::ifstream file(input_file, std::ios_base::in | std::ios_base::binary);
// move to the beginning of the n-th data block, which is compressed with zlib
file.seekg(offset, std::ios_base::beg);
boost::iostreams::filtering_streambuf<boost::iostreams::input> in;
in.push(boost::iostreams::zlib_decompressor());
in.push(file);
// write decompressed data to output file
boost::iostreams::copy(in, output);

My understanding is this line

boost::iostreams::copy(in, output); 

will start decompressing and copying data until the end of the file, which is unwanted in that case.

Importantly, I know the exact offset and length of the compressed data.

Boost documentation says that:

A model of Source can be defined as follows:

struct Source {
    typedef char        char_type;
    typedef source_tag  category;
    std::streamsize read(char* s, std::streamsize n) 
    {
        // Read up to n characters from the input 
        // sequence into the buffer s, returning   
        // the number of characters read, or -1 
        // to indicate end-of-sequence.
    }
};

I wanted to inherit from the ifstream class, override its read method, and inside that method count how many bytes have been read and return -1 once there is no more data in that chunk. Unfortunately, it does not seem to work.

I wrote:

class ifstream_t : public std::ifstream{
     public:
     ifstream_t(const std::string& fp, std::ios_base::openmode mode = std::ios_base::in) : std::ifstream(fp, mode){}
     std::streamsize read(char* s, std::streamsize n) {
         // calculate remaining bytes 
         return -1;
     }   
};

and used it in:

ifstream_t file(this->fp, std::ios_base::in | std::ios_base::binary);
boost::iostreams::filtering_streambuf<boost::iostreams::input> in;
in.push(boost::iostreams::zlib_decompressor());
in.push(file);
boost::iostreams::copy(in, output);

The read method of my class is never invoked.

Answer 1:

My understanding is this line

 boost::iostreams::copy(in, output);

will start decompressing and copying data until the end of the file, which is unwanted in that case.

I just tested this, and that's not the case. The decompressor correctly detects the end of stream when the compressed data is completed.
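
The reason this works is that the zlib format (RFC 1950) is self-delimiting: a 2-byte header, a deflate stream whose last block is flagged as final, and an Adler-32 trailer. If you want a cheap sanity check that your offset really points at the start of a zlib stream, something like the following sketch could be used. The function name is hypothetical (it is not part of zlib or Boost), and a match only means the two bytes are plausible, since random data can pass the check by chance.

```cpp
#include <cstdint>

// Hypothetical helper: returns true if the two bytes form a plausible
// zlib (RFC 1950) header. This is a sanity check, not a guarantee.
bool looks_like_zlib_header(std::uint8_t cmf, std::uint8_t flg) {
    // CM (low 4 bits of CMF): compression method, 8 means deflate.
    const bool deflate = (cmf & 0x0F) == 8;
    // CINFO (high 4 bits of CMF): window size exponent, at most 7 (32 KiB).
    const bool window_ok = (cmf >> 4) <= 7;
    // FCHECK: CMF*256 + FLG must be a multiple of 31.
    const bool check_ok = ((cmf << 8) | flg) % 31 == 0;
    return deflate && window_ok && check_ok;
}
```

You would read the two bytes at offset, test them, and seek back to offset before pushing the file onto the filter chain.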

I created a file with some random data sandwiching its own compressed source:¹

(dd if=/dev/urandom bs=1 count=$((0x3214a)); cat main.cpp | zlib-flate -compress; dd if=/dev/urandom bs=1 count=$((0x3214a))) > input.txt 

When using the program with hardcoded offset and that file:

Live On Coliru

#include <boost/iostreams/filter/zlib.hpp>
#include <boost/iostreams/filtering_streambuf.hpp>
#include <boost/iostreams/copy.hpp>
#include <fstream>
#include <iostream>
#include <string>

int main() {
    static std::string const input_file = "input.txt";
    static size_t      const offset     = 0x3214a;
    std::ostream& output = std::cout;

    // input_file - path to file
    std::ifstream file(input_file, std::ios_base::in | std::ios_base::binary);

    // move to the beginning of the n-th data block, which is compressed with zlib
    file.seekg(offset, std::ios_base::beg);
    boost::iostreams::filtering_streambuf<boost::iostreams::input> in;

    in.push(boost::iostreams::zlib_decompressor());
    in.push(file);

    // write decompressed data to output file
    boost::iostreams::copy(in, output);
}

This happily reproduces its own source, as you can see live on Coliru.


¹ zlib-flate is absent on Coliru, so I used Python:

python -c 'import zlib; import sys; sys.stdout.write(zlib.compress(sys.stdin.read()))'


Tags: c++ c++11 boost