I'm currently working on a small project which requires loading messages from a file. The messages are stored sequentially in the file and files can become huge, so loading the entire file content into memory is impractical.
Therefore we decided to implement a FileReader class that can move to specific elements in the file quickly and load them on request. It is commonly used along the following lines:
SpecificMessage m;
FileReader fr;
fr.open("file.bin");
fr.moveTo(120); // Move to Message #120
fr.read(&m); // Try deserializing as SpecificMessage
The FileReader per se works great. Therefore we thought about adding STL-compliant iterator support as well: a random access iterator that provides read-only references to specific messages. It would be used in the following way:
for (auto iter = fr.begin<SpecificMessage>(); iter != fr.end<SpecificMessage>(); ++iter) {
// ...
}
Remark: the above assumes that the file only contains messages of type SpecificMessage. We've been using boost::iterator_facade to simplify the implementation.
Now my question boils down to: how do I implement the iterator correctly, given that FileReader does not actually hold a sequence of messages internally but loads them on request?
What we've tried so far:
Storing the message as an iterator member
This approach stores the message in the iterator instance, which works great for simple use cases but fails for more complex ones. E.g. std::reverse_iterator has a dereference operation that looks like this
reference operator*() const
{ // return designated value
_RanIt _Tmp = current;
return (*--_Tmp);
}
This breaks our approach because the returned reference points to a message stored in the temporary iterator _Tmp, which is destroyed when operator* returns.
Making the reference type equal the value type
@DDrmmr in the comments suggested making the reference type equal the value type, so that a copy of the internally stored object is returned. However, I think this is not valid for the reverse iterator which implements the -> operator as
pointer operator->() const {
return (&**this);
}
which dereferences itself, calls operator*, which then returns a copy as a temporary, and finally returns the address of that temporary.
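To make the by-value idea concrete, here is a minimal sketch rather than the FileReader from the question. It assumes a hypothetical in-memory FakeReader and a hypothetical Message type, returns messages by value from operator*, and lets operator-> return a small hypothetical ArrowProxy that owns a copy, so that it->text stays valid for the duration of the expression. Because reference is not an lvalue reference here, the sketch only advertises the input iterator category and does not claim to be usable with std::reverse_iterator.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical message type; stands in for SpecificMessage.
struct Message {
    std::string text;
};

// Hypothetical in-memory stand-in for FileReader: "loads" message i on request.
class FakeReader {
public:
    explicit FakeReader(std::size_t count) : count_(count) {}
    std::size_t size() const { return count_; }
    Message load(std::size_t index) const {
        assert(index < count_);
        return Message{"msg#" + std::to_string(index)};
    }
private:
    std::size_t count_;
};

// Owns a copy of the message so that it->member has something to point at.
class ArrowProxy {
public:
    explicit ArrowProxy(Message m) : m_(std::move(m)) {}
    const Message* operator->() const { return &m_; }
private:
    Message m_;
};

class ByValueIterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = Message;
    using difference_type = std::ptrdiff_t;
    using pointer = ArrowProxy;
    using reference = Message;  // a value, not an lvalue reference

    ByValueIterator(const FakeReader* reader, std::size_t index)
        : reader_(reader), index_(index) {}

    // Each dereference loads a fresh copy; nothing is stored in the iterator.
    reference operator*() const { return reader_->load(index_); }
    pointer operator->() const { return ArrowProxy(reader_->load(index_)); }

    ByValueIterator& operator++() { ++index_; return *this; }
    ByValueIterator operator++(int) { ByValueIterator old = *this; ++index_; return old; }

    friend bool operator==(const ByValueIterator& a, const ByValueIterator& b) {
        return a.reader_ == b.reader_ && a.index_ == b.index_;
    }
    friend bool operator!=(const ByValueIterator& a, const ByValueIterator& b) {
        return !(a == b);
    }
private:
    const FakeReader* reader_;
    std::size_t index_;
};

// Collects every message text by walking the iterator pair.
std::vector<std::string> collect_texts(const FakeReader& reader) {
    std::vector<std::string> out;
    ByValueIterator end(&reader, reader.size());
    for (ByValueIterator it(&reader, 0); it != end; ++it) {
        out.push_back(it->text);  // proxy temporary lives until the end of this statement
    }
    return out;
}
```

The trade-off in this sketch is that every dereference reloads the message, so code that dereferences the same position repeatedly pays for it each time.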
Storing the message externally
Alternatively I thought about storing the message externally:
SpecificMessage m;
auto iter = fr.begin<SpecificMessage>(&m);
// ...
which also seems to be flawed for
auto iter2 = iter + 2;
which will have both iter2 and iter point to the same content.
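The aliasing problem can be reproduced with a small sketch. Everything here is hypothetical: ExternalBufferIterator is not the real iterator, Message is reduced to a single int, and "loading" just writes the index into the caller-provided buffer. It only illustrates that two iterators sharing one external buffer cannot expose two different messages at once.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical message type, reduced to one field for the demonstration.
struct Message {
    int id;
};

// Hypothetical iterator that deserializes into a caller-provided buffer.
class ExternalBufferIterator {
public:
    ExternalBufferIterator(Message* buffer, std::size_t index)
        : buffer_(buffer), index_(index) {
        assert(buffer != nullptr);
    }

    // "Loads" the message for index_ into the shared external buffer.
    const Message& operator*() const {
        buffer_->id = static_cast<int>(index_);
        return *buffer_;
    }

    // The new iterator copies the buffer pointer, so both share one Message.
    ExternalBufferIterator operator+(std::size_t n) const {
        return ExternalBufferIterator(buffer_, index_ + n);
    }
private:
    Message* buffer_;
    std::size_t index_;
};

// Returns true when dereferencing iter2 changes what the reference from iter sees.
bool second_deref_clobbers_first() {
    Message storage{};
    ExternalBufferIterator iter(&storage, 0);
    ExternalBufferIterator iter2 = iter + 2;
    const Message& first = *iter;    // storage.id is 0 at this point
    const Message& second = *iter2;  // storage.id becomes 2, visible through first
    return &first == &second && first.id == 2;
}
```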
As I hinted in my other answer, you could consider using memory mapped files. In the comment you asked:
Well, if your SpecificMessage is a POD type, you could just iterate over the raw memory directly. If not, you could have a deserialization helper (as you already have) and use Boost transform_iterator to do the deserialization on demand.
Note that we can make the memory mapped file managed, effectively meaning that you can just use it as a regular heap, and you can store all standard containers. This includes node-based containers (map<>, e.g.), dynamic-size containers (e.g. vector<>) in addition to the fixed-size containers (array<>) - and any combinations of those.
Here's a demo that takes a simple SpecificMessage that contains a string, and (de)serializes it directly into shared memory:
The part that interests you would be the consuming part:
So this prints each 13th message, in reverse order, followed by a random blob.
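Separately from that demo, the POD case mentioned at the start of this answer can be sketched with the standard library alone. This is an illustration under assumptions, not the demo's code: PodMessage is a hypothetical fixed-size record (the real SpecificMessage layout is not given), a std::vector<unsigned char> stands in for the mapped file contents, and PodView and make_bytes are names made up for this example. Records are copied out with std::memcpy only when they are requested.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical fixed-size record; the real message layout is an assumption.
struct PodMessage {
    std::uint32_t id;
    std::uint32_t payload;
};
static_assert(std::is_trivially_copyable<PodMessage>::value,
              "memcpy-based loading needs a trivially copyable type");

// Read-only view over a byte range holding PodMessage records back to back.
class PodView {
public:
    PodView(const unsigned char* data, std::size_t size_in_bytes)
        : data_(data), count_(size_in_bytes / sizeof(PodMessage)) {}

    std::size_t size() const { return count_; }

    // Copies record i out of the raw bytes; nothing is decoded until asked for.
    PodMessage operator[](std::size_t i) const {
        assert(i < count_);
        PodMessage m;
        std::memcpy(&m, data_ + i * sizeof(PodMessage), sizeof(PodMessage));
        return m;
    }
private:
    const unsigned char* data_;
    std::size_t count_;
};

// Builds a byte buffer of n records (stand-in for the mapped file contents).
std::vector<unsigned char> make_bytes(std::uint32_t n) {
    std::vector<unsigned char> bytes(n * sizeof(PodMessage));
    for (std::uint32_t i = 0; i < n; ++i) {
        PodMessage m{i, i * 10};
        std::memcpy(bytes.data() + i * sizeof(PodMessage), &m, sizeof(PodMessage));
    }
    return bytes;
}
```

With a real mapping, the pointer and length passed to PodView would come from the mapped region instead of a vector; random access by index then costs one small copy per message.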
Full Demo
The sample online uses the lines of the sources as "messages".
Live On Coliru
You are having issues because your iterator does not conform to the forward iterator requirements. Specifically:
- *i must be an lvalue reference to value_type or const value_type ([forward.iterators]/1.3)
- *i cannot be a reference to an object stored in the iterator itself, due to the requirement that two iterators are equal if and only if they are bound to the same object ([forward.iterators]/6)
Yes, these requirements are a huge pain in the butt, and yes, that means that things like std::vector<bool>::iterator are not random access iterators even though some standard library implementations incorrectly claim that they are.
EDIT: The following suggested solution is horribly broken, in that dereferencing a temporary iterator returns a reference to an object that may not live until the reference is used. For example, after
auto& foo = *(i + 1);
the object referenced by foo may have been released. The implementation of reverse_iterator referenced in the OP will cause the same problem.
I'd suggest that you split your design into two classes: FileCache that holds the file resources and a cache of loaded messages, and FileCache::iterator that holds a message number and lazily retrieves it from the FileCache when dereferenced. The implementation could be something as simple as storing a container of weak_ptr<Message> in FileCache and a shared_ptr<Message> in the iterator: Simple demo
Boost PropertyMap
You could avoid writing the bulk of the code using Boost PropertyMap:
Live On Coliru
Sample output is
Using Memory Mapped Files
You could store a map of index -> BLOB objects in a shared vector<array<byte, N>>, flat_map<size_t, std::vector<uint8_t> > or similar.
So, now you only have to deserialize from myshared_map[index].data() (begin() and end() in case the BLOB size varies)
I have to admit I may not fully understand the trouble you have with holding the current MESSAGE as a member of Iter. I would associate each iterator with the FileReader it should read from and implement it as a lightweight encapsulation of a read index for FileReader::(read|moveTo). The most important method to override is boost::iterator_facade<...>::advance(...) which modifies the current index and tries to pull a new MESSAGE from the FileReader. If this fails it flags the iterator as invalid and dereferencing will fail.
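A rough sketch of that idea, written without Boost so it stands alone, might look as follows. The names are assumptions: FakeReader is a hypothetical in-memory stand-in that mimics the moveTo/read calls from the question, and IndexIterator only borrows the advance and dereference names from the iterator_facade core functions instead of deriving from it. Note that it keeps the current message as a member, so the lifetime concern raised in the question about references into temporary iterators still applies.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical message type; stands in for MESSAGE.
struct Message {
    std::string text;
};

// Hypothetical stand-in for FileReader with moveTo/read, backed by memory.
class FakeReader {
public:
    explicit FakeReader(std::vector<std::string> items)
        : items_(std::move(items)), pos_(0) {}
    bool moveTo(std::size_t index) {
        pos_ = index;
        return index < items_.size();
    }
    bool read(Message* out) {
        if (pos_ >= items_.size()) return false;
        out->text = items_[pos_];
        return true;
    }
private:
    std::vector<std::string> items_;
    std::size_t pos_;
};

// Lightweight wrapper around a read index plus the currently loaded message.
class IndexIterator {
public:
    IndexIterator(FakeReader* reader, std::ptrdiff_t index)
        : reader_(reader), index_(index), valid_(false) {
        load();
    }

    // Moves the read index by n and tries to pull the message at the new index.
    void advance(std::ptrdiff_t n) {
        index_ += n;
        load();
    }

    bool valid() const { return valid_; }

    // Dereferencing an iterator that failed to load is a programming error.
    const Message& dereference() const {
        assert(valid_);
        return current_;
    }
private:
    void load() {
        valid_ = index_ >= 0 &&
                 reader_->moveTo(static_cast<std::size_t>(index_)) &&
                 reader_->read(&current_);
    }
    FakeReader* reader_;
    std::ptrdiff_t index_;
    Message current_;
    bool valid_;
};

// Walks forward from index 0 until the iterator flags itself as invalid.
std::vector<std::string> read_all(FakeReader& reader) {
    std::vector<std::string> out;
    for (IndexIterator it(&reader, 0); it.valid(); it.advance(1)) {
        out.push_back(it.dereference().text);
    }
    return out;
}
```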