When I read from files in C++11 I map them into memory using:
boost::interprocess::file_mapping* fm = new boost::interprocess::file_mapping(path, boost::interprocess::read_only);
boost::interprocess::mapped_region* region = new boost::interprocess::mapped_region(*fm, boost::interprocess::read_only);
char* bytes = static_cast<char*>(region->get_address());
This is fine when I want to read byte by byte extremely fast. However, I have created a CSV file which I would like to map into memory, read line by line, and split each line on the comma.
Is there a way I can do this with a few modifications of my above code?
(I am mapping to memory because I have an awful lot of memory and I do not want any bottleneck with disk/IO streaming).
Simply create an istringstream from your memory-mapped bytes and parse that:
Note that on many systems memory mapping doesn't provide any speed benefit over sequential reads. In both cases you end up reading the data from disk page by page, probably with the same amount of read-ahead, and both the IO latency and the bandwidth will be the same. Whether you have lots of memory or not makes no difference. Also, depending on the system, memory mapping, even read-only, can lead to surprising behaviours (e.g. reserving swap space) that sometimes keep people busy troubleshooting.
Here's my take on "fast enough". It zips through 116 MiB of CSV (2.5 million lines [1]) in ~1 second.
The result is then randomly accessible with zero copying, so there is no overhead (unless pages are swapped out).
Here's the parser in all its glory:
The only tricky thing (and the only optimization there) is the semantic action that constructs a CsvField from the source iterator and the matched number of characters. Here's the main:
Printing: you can use the values just like std::string, which prints e.g.:
The fully working sample is here Live On Coliru
[1] I created the file by repeatedly appending the output of … to csv.txt, until it counted 2.5 million lines.