I was trying to solve a problem on InterviewStreet. After some time I determine that I was actually spending the bulk of my time reading the input. This particular question had a lot of input, so that makes some amount of sense. What doesn't make sense is why the varying methods of input had such different performances:
Initially I had:
std::string command;
std::cin >> command;
Replacing it made it noticeably faster:
char command[5];
cin.ignore();
cin.read(command, 5);
Rewriting everything to use scanf made it even faster
char command;
scanf("get_%c", &command);
All told I cut the time reading the input down by about a 1/3.
I'm wondering there is such a variation in performance between these different methods. Additionally, I'm wondering why using gprof didn't highlight the time I was spending in I/O, rather seeming to point the blame to my algorithm.
There is a big variation in these routines because console input speed almost never matters.
And where it does (Unix shell) the code is written in C, reads directly from the stdin device and is efficient.
At the risk of being downvoted, I/O streams are, in general, slower and bulkier than their C counterparts. That's not a reason to avoid using them though in many purposes as they are safer (ever run into a scanf or printf bug? Not very pleasant) and more general (ex: overloaded insertion operator allowing you to output user-defined types). But I'd also say that's not a reason to use them dogmatically in very performance-critical code.
I do find the results a bit surprising though. Out of the three you listed, I would have suspected this to be fastest:
char command[5];
cin.ignore();
cin.read(command, 5);
Reason: no memory allocations needed and straightforward reading of a character buffer. That's also true of your C example below, but calling scanf to read a single character repeatedly isn't anywhere close to optimal either even at the conceptual level, as scanf must parse the format string you passed in each time. I'd be interested in the details of your I/O code as it seems that there is a reasonable possibility of something wrong happening when scanf calls to read a single character turn out to be the fastest. I just have to ask and without meaning to offend, but is the code truly compiled and linked with optimizations on?
Now as to your first example:
std::string command;
std::cin >> command;
We can expect this to be quite a bit slower than optimal for the reason that you're working with a variable-sized container (std::string) which will have to involve some heap allocations to read in the desired buffer. When it comes to stack vs. heap issues, the stack is always significantly faster, so if you can anticipate the maximum buffer size needed in a particular case, a simple character buffer on the stack will beat std::string for input (even if you used reserve). This is likewise true of an array on the stack as opposed to std::vector but these containers are best used for cases where you cannot anticipate the size in advance. Where std::string can be faster would be cases where people might be tempted to call strlen repeatedly where storing and maintaining a size variable would be better.
As to the details of gprof, it should be highlighting those issues. Are you looking at the full call graph as opposed to a flat profile? Naturally the flat profile could be misleading in this case. I'd have to know some further details on how you are using gprof to give a better answer.
gprof only samples during CPU time, not during blocked time.
So, a program may spend an hour doing I/O, and a microsecond doing computation, and gprof will only see the microsecond.
For some reason, this isn't well known.
By default, the standard iostreams are configured to work together and with the C stdio library — in practice this means using cin
and cout
for things other than interactive input and output tend to be slow.
To get good performance using cin
and cout
, you need to disable the synchronization with stdio. For high performance input, you might even want to untie the streams.
See the following stackoverflow question for more details.
How to get IOStream to perform better?