Many of my programs output huge volumes of data for me to review in Excel. The best way to view all these files is to use a tab-delimited text format. Currently I use this chunk of code to get it done:
ofstream output(fileName.c_str());
for (int j = 0; j < dim; j++)
{
    for (int i = 0; i < dim; i++)
        output << arrayPointer[j * dim + i] << " ";
    output << endl;
}
This seems to be a very slow operation. Is there a more efficient way of writing text files like this to the hard drive?
Update:
Taking the two suggestions into account, the new code is this:
ofstream output(fileName.c_str());
for (int j = 0; j < dim; j++)
{
    for (int i = 0; i < dim; i++)
        output << arrayPointer[j * dim + i] << "\t";
    output << "\n";
}
output.close();
This writes to the HD at 500 KB/s.
But this writes to the HD at 50 MB/s:
{
output.open(fileName.c_str(), std::ios::binary | std::ios::out);
output.write(reinterpret_cast<char*>(arrayPointer), std::streamsize(dim * dim * sizeof(double)));
output.close();
}
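Since Excel ultimately needs text, a follow-up question is what to do with the fast binary dump. One option (a sketch of my own, not from the original answers; file names are placeholders) is to keep the binary write for speed and run a separate binary-to-text conversion pass only for the matrices you actually want to open in Excel:

```cpp
#include <fstream>
#include <vector>

// Sketch: dump the matrix in binary (fast), then convert to
// tab-delimited text in a separate pass only when needed.
void dumpBinary(const double* data, int dim, const char* file)
{
    std::ofstream out(file, std::ios::binary);
    out.write(reinterpret_cast<const char*>(data),
              std::streamsize(dim) * dim * sizeof(double));
}

void binaryToTabText(const char* binFile, const char* txtFile, int dim)
{
    std::vector<double> data(std::size_t(dim) * dim);
    std::ifstream in(binFile, std::ios::binary);
    in.read(reinterpret_cast<char*>(data.data()),
            std::streamsize(data.size() * sizeof(double)));

    std::ofstream out(txtFile);
    for (int j = 0; j < dim; j++) {
        for (int i = 0; i < dim; i++)
            out << data[j * dim + i] << '\t';
        out << '\n';
    }
}
```

This way the expensive text formatting happens once per file you review, not once per file you generate.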
I decided to test JPvdMerwe's claim that C stdio is faster than C++ IO streams. (Spoiler: yes, but not necessarily by much.) To do this, I used the following test programs:
Common wrapper code, omitted from programs below:
Program 1: normal synchronized C++ IO streams
Program 2: unsynchronized C++ IO streams
Same as program 1, except with std::cout.sync_with_stdio(false); prepended.
Program 3: C stdio printf()
All programs were compiled with GCC 4.8.4 on Ubuntu Linux, using the following command:
and timed using the command:
Here are the results of the test on my laptop (measured in wall clock time):
I also ran the same test with g++ -O2 to test the effect of optimization, and got the following results:

Program 1 with -O2: 3.118s (= 100%)
Program 2 with -O2: 2.943s (= 94%)
Program 3 with -O2: 2.734s (= 88%)

(The last line is not a fluke; program 3 consistently runs slower for me with -O2 than without it!)

Thus, my conclusion is that, based on this test, C stdio is indeed about 10% to 25% faster for this task than (synchronized) C++ IO. Using unsynchronized C++ IO saves about 5% to 10% over synchronized IO, but is still slower than stdio.
PS. I tried a few other variations, too:

- Using std::endl instead of "\n" is, as expected, slightly slower, but the difference is less than 5% for the parameter values given above. However, printing more but shorter output lines (e.g. -DROWS=1000000 -DCOLS=10) makes std::endl more than 30% slower than "\n".
- Piping the output to a normal file instead of /dev/null slows down all the programs by about 0.2s, but makes no qualitative difference to the results.
- Increasing the line count by a factor of 10 also yields no surprises; the programs all take about 10 times longer to run, as expected.
- Prepending std::cout.sync_with_stdio(false); to program 3 has no noticeable effect.
- Using (double)(i-j) (and "%g\t" for printf()) slows down all three programs a lot! Notably, program 3 is still fastest, taking only 9.3s where programs 1 and 2 each took a bit over 14s, a speedup of nearly 40%! (And yes, I checked, the outputs are identical.) Using -O2 makes no significant difference either.

Does it have to be written in C? If not, there are many tools already written in C, e.g. (g)awk (usable on Unix/Windows), that do the job of file parsing really well, even on big files.
Use '\t' instead of " "
Use C IO; it's a lot faster than C++ IO. I've heard of people in programming contests timing out purely because they used C++ IO and not C IO. Just change %d to be the correct type.

Don't use endl. It will flush the stream buffers, which is potentially very inefficient. Instead:
It may be faster to do it this way:
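The original snippet for this answer was also lost. One common approach along these lines (my sketch under that assumption, with hypothetical names) is to format everything into a memory buffer first and hand the whole thing to a single write() call:

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Sketch: format all rows into one in-memory buffer, then write it
// to disk in a single call instead of many small stream insertions.
void writeBuffered(const char* fileName, const double* a, int dim)
{
    std::string buf;
    buf.reserve(std::size_t(dim) * dim * 16); // rough per-value estimate
    char tmp[64];
    for (int j = 0; j < dim; j++) {
        for (int i = 0; i < dim; i++) {
            std::snprintf(tmp, sizeof tmp, "%g\t", a[j * dim + i]);
            buf += tmp;
        }
        buf += '\n';
    }
    std::ofstream out(fileName, std::ios::binary);
    out.write(buf.data(), std::streamsize(buf.size()));
}
```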