可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
I'm currently working on a small project which requires loading messages from a file. The messages are stored sequentially in the file and files can become huge, so loading the entire file content into memory is unrewarding.
Therefore we decided to implement a FileReader
class that is capable of moving to specific elements in the file quickly and load them on request. Commonly used something along the following lines
SpecificMessage m;
FileReader fr;
fr.open("file.bin");
fr.moveTo(120); // Move to Message #120
fr.read(&m); // Try deserializing as SpecificMessage
The FileReader per se works great. Therefore we thought about adding STL compliant iterator support as well: A random access iterator that provides read-only references to specific messages. Used in the following way
for (auto iter = fr.begin<SpecificMessage>(); iter != fr.end<SpecificMessage>(); ++iter) {
// ...
}
Remark: the above assumes that the file only contains messages of type SpecificMessage. We've been using boost::iterator_facade
to simplify the implementation.
Now my question boils down to: how to implement the iterator correctly? Since FileReader
does not actually hold a sequence of messages internally, but loads them on request.
What we've tried so far:
Storing the message as an iterator member
This approach stores the message in the iterator instance. Which works great for simple use-cases but fails for more complex uses. E.g. std::reverse_iterator
has a dereference operation that looks like this
reference operator*() const
{ // return designated value
_RanIt _Tmp = current;
return (*--_Tmp);
}
This breaks our approach as a reference to a message from a temporary iterator is returned.
Making the reference type equal the value type
@DDrmmr in the comments suggested making the reference type equal the value type, so that a copy of the internally stored object is returned. However, I think this is not valid for the reverse iterator which implements the -> operator as
pointer operator->() const {
return (&**this);
}
which derefs itself, calls the *operator which then returns a copy of a temporary and finally returns the address of this temporary.
Storing the message externally
Alternatively I though about storing the message externally:
SpecificMessage m;
auto iter = fr.begin<SpecificMessage>(&m);
// ...
which also seems to be flawed for
auto iter2 = iter + 2
which will have both iter2
and iter
point to the same content.
回答1:
As I hinted in my other answer, you could consider using memory mapped files. In the comment you asked:
As far as memory mapped files is concerned, this seems not what I want to have, as how would you provide an iterator over SpecificMessages for them?
Well, if your SpecificMessage is a POD type, you could just iterate over the raw memory directly. If not, you could have a deserialization helper (as you already have) and use Boost transform_iterator
to do the deserialization on demand.
Note that we can make the memory mapped file managed, effectively meaning that you can just use it as a regular heap, and you can store all standard containers. This includes node-based containers (map<>
, e.g.), dynamic-size containers (e.g. vector<>
) in addition to the fixed-size containers (array<>
) - and any combinations of those.
Here's a demo that takes a simple SpecificMessage
that contains a string, and (de)derializes it directly into shared memory:
using blob_t = shm::vector<uint8_t>;
using shared_blobs = shm::vector<blob_t>;
The part that interests you would be the consuming part:
bip::managed_mapped_file mmf(bip::open_only, DBASE_FNAME);
shared_blobs* table = mmf.find_or_construct<shared_blobs>("blob_table")(mmf.get_segment_manager());
using It = boost::transform_iterator<LazyLoader<SpecificMessage>, shared_blobs::const_reverse_iterator>;
// for fun, let's reverse the blobs
for (It first(table->rbegin()), last(table->rend()); first < last; first+=13)
std::cout << "blob: '" << first->contents << "'\n";
// any kind of random access is okay, though:
auto random = rand() % table->size();
SpecificMessage msg;
load(table->at(random), msg);
std::cout << "Random blob #" << random << ": '" << msg.contents << "'\n";
So this prints each 13th message, in reverse order, followed by a random blob.
Full Demo
The sample online uses the lines of the sources as "messages".
Live On Coliru
#include <boost/interprocess/file_mapping.hpp>
#include <boost/interprocess/managed_mapped_file.hpp>
#include <boost/container/scoped_allocator.hpp>
#include <boost/interprocess/containers/vector.hpp>
#include <iostream>
#include <boost/iterator/transform_iterator.hpp>
#include <boost/range/iterator_range.hpp>
static char const* DBASE_FNAME = "database.map";
namespace bip = boost::interprocess;
namespace shm {
using segment_manager = bip::managed_mapped_file::segment_manager;
template <typename T> using allocator = boost::container::scoped_allocator_adaptor<bip::allocator<T, segment_manager> >;
template <typename T> using vector = bip::vector<T, allocator<T> >;
}
using blob_t = shm::vector<uint8_t>;
using shared_blobs = shm::vector<blob_t>;
struct SpecificMessage {
// for demonstration purposes, just a string; could be anything serialized
std::string contents;
// trivial save/load serialization code:
template <typename Blob>
friend bool save(Blob& blob, SpecificMessage const& msg) {
blob.assign(msg.contents.begin(), msg.contents.end());
return true;
}
template <typename Blob>
friend bool load(Blob const& blob, SpecificMessage& msg) {
msg.contents.assign(blob.begin(), blob.end());
return true;
}
};
template <typename Message> struct LazyLoader {
using type = Message;
Message operator()(blob_t const& blob) const {
Message result;
if (!load(blob, result)) throw std::bad_cast(); // TODO custom excepion
return result;
}
};
///////
// for demo, create some database contents
void create_database_file() {
bip::file_mapping::remove(DBASE_FNAME);
bip::managed_mapped_file mmf(bip::open_or_create, DBASE_FNAME, 1ul<<20); // Even sparse file size is limited on Coliru
shared_blobs* table = mmf.find_or_construct<shared_blobs>("blob_table")(mmf.get_segment_manager());
std::ifstream ifs("main.cpp");
std::string line;
while (std::getline(ifs, line)) {
table->emplace_back();
save(table->back(), SpecificMessage { line });
}
std::cout << "Created blob table consisting of " << table->size() << " blobs\n";
}
///////
void display_random_messages() {
bip::managed_mapped_file mmf(bip::open_only, DBASE_FNAME);
shared_blobs* table = mmf.find_or_construct<shared_blobs>("blob_table")(mmf.get_segment_manager());
using It = boost::transform_iterator<LazyLoader<SpecificMessage>, shared_blobs::const_reverse_iterator>;
// for fun, let's reverse the blobs
for (It first(table->rbegin()), last(table->rend()); first < last; first+=13)
std::cout << "blob: '" << first->contents << "'\n";
// any kind of random access is okay, though:
auto random = rand() % table->size();
SpecificMessage msg;
load(table->at(random), msg);
std::cout << "Random blob #" << random << ": '" << msg.contents << "'\n";
}
int main()
{
#ifndef CONSUMER_ONLY
create_database_file();
#endif
srand(time(NULL));
display_random_messages();
}
回答2:
You are having issues because your iterator does not conform to the forward iterator requirements. Specifically:
*i
must be an lvalue reference to value_type
or const value_type
([forward.iterators]/1.3)
*i
cannot be a reference to an object stored in the iterator itself, due to the requirement that two iterators are equal if and only if they are bound to the same object ([forward.iterators]/6)
Yes, these requirements are a huge pain in the butt, and yes, that means that things like std::vector<bool>::iterator
are not random access iterators even though some standard library implementations incorrectly claim that they are.
EDIT: The following suggested solution is horribly broken, in that dereferencing a temporary iterator returns a reference to an object that may not live until the reference is used. For example, after auto& foo = *(i + 1);
the object referenced by foo
may have been released. The implementation of reverse_iterator
referenced in the OP will cause the same problem.
I'd suggest that you split your design into two classes: FileCache
that holds the file resources and a cache of loaded messages, and FileCache::iterator
that holds a message number and lazily retrieves it from the FileCache
when dereferenced. The implementation could be something as simple as storing a container of weak_ptr<Message>
in FileCache
and a shared_ptr<Message>
in the iterator: Simple demo
回答3:
I have to admit I may not fully understand the trouble you have with holding the current MESSAGE as a member of Iter. I would associate each iterator with the FileReader it should read from and implement it as a lightweight encapsulation of a read index for FileReader::(read|moveTo). The most important method to overwtite is boost::iterator_facade<...>::advance(...)
which modifies the current index and tries to pull a new MESSAGE from the FileReader If this fails it flags the the iterator as invalid and dereferencing will fail.
template<class MESSAGE,int STEP>
class message_iterator;
template<class MESSAGE>
class FileReader {
public:
typedef message_iterator<MESSAGE, 1> const_iterator;
typedef message_iterator<MESSAGE,-1> const_reverse_iterator;
FileReader();
bool open(const std::string & rName);
bool moveTo(int n);
bool read(MESSAGE &m);
// get the total count of messages in the file
// helps us to find end() and rbegin()
int getMessageCount();
const_iterator begin() {
return const_iterator(this,0);
}
const_iterator end() {
return const_iterator(this,getMessageCount());
}
const_reverse_iterator rbegin() {
return const_reverse_iterator(this,getMessageCount()-1);
}
const_reverse_iterator rend() {
return const_reverse_iterator(this,-1);
}
};
// declaration of message_iterator moving over MESSAGE
// STEP is used to specify STEP size and direction (e.g -1 == reverse)
template<class MESSAGE,int STEP=1>
class message_iterator
: public boost::iterator_facade<
message_iterator<MESSAGE>
, const MESSAGE
, boost::random_access_traversal_tag
>
{
typedef boost::iterator_facade<
message_iterator<MESSAGE>
, const MESSAGE
, boost::random_access_traversal_tag
> super;
public:
// constructor associates an iterator with its FileReader and a given position
explicit message_iterator(FileReader<MESSAGE> * p=NULL,int n=0): _filereader(p),_idx(n),_valid(false) {
advance(0);
}
bool equal(const message_iterator & i) const {
return i._filereader == _filereader && i._idx == _idx;
}
void increment() {
advance(+1);
}
void decrement() {
advance(-1);
}
// overwrite with central functionality. Move to a given relative
// postion and check wether the position can be read. If move/read
// fails we flag the iterator as incalid.
void advance(int n) {
_idx += n*STEP;
if(_filereader!=NULL) {
if( _filereader->moveTo( _idx ) && _filereader->read(_m)) {
_valid = true;
return;
}
}
_valid = false;
}
// Return a ref to the currently cached MESSAGE. Throw
// an acception if positioning at this location in advance(...) failes.
typename super::reference dereference() const {
if(!_valid) {
throw std::runtime_error("access to invalid pos");
}
return _m;
}
private:
FileReader<MESSAGE> * _filereader;
int _idx;
bool _valid;
MESSAGE _m;
};
回答4:
Boost PropertyMap
You could avoid writing the bulk of the code using Boost PropertyMap:
Live On Coliru
#include <boost/property_map/property_map.hpp>
#include <boost/property_map/function_property_map.hpp>
using namespace boost;
struct SpecificMessage {
// add some data
int index; // just for demo
};
template <typename Message>
struct MyLazyReader {
typedef Message type;
std::string fname;
MyLazyReader(std::string fname) : fname(fname) {}
Message operator()(size_t index) const {
Message m;
// FileReader fr;
// fr.open(fname);
// fr.moveTo(index); // Move to Message
// fr.read(&m); // Try deserializing as SpecificMessage
m.index = index; // just for demo
return m;
}
};
#include <iostream>
int main() {
auto lazy_access = make_function_property_map<size_t>(MyLazyReader<SpecificMessage>("file.bin"));
for (int i=0; i<10; ++i)
std::cout << lazy_access[rand()%256].index << "\n";
}
Sample output is
103
198
105
115
81
255
74
236
41
205
Using Memory Mapped Files
You could store a map of index -> BLOB objects in a shared vector<array<byte, N>>
, flat_map<size_t, std::vector<uint8_t> >
or similar.
So, now you only have to deserialize from myshared_map[index].data()
(begin()
and end()
in case the BLOB size varies)