C++ binary files and iterators: getting away with

Posted 2019-05-02 00:24

This answer points out that C++ is not well suited to iterating over a binary file, but that is exactly what I need right now. In short, I need to operate on files in a "binary" way (yes, all files are binary, even the .txt ones). I'm writing something that operates on image files, so I need to read well-structured files where the data is arranged in a specific way.

I would like to read the entire file in a data structure such as std::vector<T> so I can almost immediately close the file and work with the content in memory without caring about disk I/O anymore.

Right now, the best way to perform a complete iteration over a file according to the standard library is something along the lines of

std::ifstream ifs(filename, std::ios::binary);
for (std::istreambuf_iterator<char, std::char_traits<char>> it(ifs.rdbuf());
     it != std::istreambuf_iterator<char, std::char_traits<char>>(); ++it) {
    // do something with *it;
}
ifs.close();

or use std::copy, but even with std::copy you are still using istreambuf iterators (so, if I understand the C++ documentation correctly, the code above is basically reading one byte per call).

So the question is: how do I write a custom iterator? What should I inherit from?

I assume this is also important when writing a file to disk, and I assume I could use the same iterator class for writing; if I'm wrong, please feel free to correct me.

2 Answers
成全新的幸福
#2 · 2019-05-02 00:47

My suggestion is not to use a custom stream, stream-buffer or stream-iterator.

#include <fstream>

struct Data {
    short a;
    short b;
    int   c;
};

std::istream& operator >> (std::istream& stream, Data& data) {
    static_assert(sizeof(Data) == 2*sizeof(short) + sizeof(int), "Invalid Alignment");
    if(stream.read(reinterpret_cast<char*>(&data), sizeof(Data))) {
        // Consider endian
    }
    else {
        // Error
    }
    return stream;
}

int main(int argc, char* argv[])
{
    // Open the input file in binary mode (the original snippet forgot to open it).
    std::ifstream stream(argv[1], std::ios::binary);
    Data data;
    while(stream >> data) {
        // Process
    }
    if(stream.fail()) {
        // Error (EOF is good)
    }
    return 0;
}

You could dare to make a stream-buffer iterator that reads elements bigger than the underlying char_type, but consider:

  • What if the data has an invalid format?
  • What if the data is incomplete at EOF?

The state of the stream is not maintained by the buffer or the iterator.

Lonely孤独者°
#3 · 2019-05-02 00:58

It is possible to optimize std::copy() for std::istreambuf_iterator<char>, but hardly any implementation does. Just deriving from something won't do the trick either, because that isn't how iterators work.

The most effective built-in approach is probably to simply dump the file into a std::ostringstream and then get a std::string from it:

std::ostringstream out;
out << file.rdbuf();
std::string content = out.str();

If you want to avoid travelling through a std::string, you could write a stream buffer that dumps the content directly into a memory area or a std::vector<unsigned char>, and use the same output operation as above.

The std::istreambuf_iterator<char>s could, in principle, have a backdoor into the stream buffer and bypass character-wise operations. Without such a backdoor you won't be able to speed anything up using these iterators. You could create an iterator on top of stream buffers that uses the stream buffer's sgetn() to fill an internal buffer. In that case you'd pretty much need a version of std::copy() that deals with segments (i.e., each fill of the buffer) efficiently. Short of either, I'd just read the file into a buffer using the stream buffer and iterate over that.
