I was hanging out in my profiler for a while trying to figure out how to speed up a common log parser that was bottlenecked on date parsing, and I tried various algorithms to speed it up.
The fastest thing I tried was also by far the most readable, but it's potentially non-standard C.
It worked quite well in GCC, icc, and my really old and picky SGI compiler. Since it's such a readable optimization, where won't it do what I want?
static int parseMonth(const char *input) {
    int rv = -1;
    int inputInt = 0;
    int i = 0;
    for (i = 0; i < 4 && input[i]; i++) {
        inputInt = (inputInt << 8) | input[i];
    }
    switch (inputInt) {
        case 'Jan/': rv = 0;  break;
        case 'Feb/': rv = 1;  break;
        case 'Mar/': rv = 2;  break;
        case 'Apr/': rv = 3;  break;
        case 'May/': rv = 4;  break;
        case 'Jun/': rv = 5;  break;
        case 'Jul/': rv = 6;  break;
        case 'Aug/': rv = 7;  break;
        case 'Sep/': rv = 8;  break;
        case 'Oct/': rv = 9;  break;
        case 'Nov/': rv = 10; break;
        case 'Dec/': rv = 11; break;
    }
    return rv;
}
As mentioned by others, that code throws a bunch of warnings and is probably not endian-safe.
Was your original date parser hand-written as well? Have you tried strptime(3)?
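For reference, a minimal strptime sketch, assuming an Apache-style timestamp (the sample string and the format are my guesses at your log format):

#define _XOPEN_SOURCE 700   /* expose strptime() */
#include <stdio.h>
#include <string.h>
#include <time.h>

int main(void) {
    /* Hypothetical log timestamp; adjust the format string to your logs. */
    const char *stamp = "10/Oct/2000:13:55:36";
    struct tm tm;
    memset(&tm, 0, sizeof tm);

    if (strptime(stamp, "%d/%b/%Y:%H:%M:%S", &tm) != NULL) {
        /* tm_mon is 0-based, just like your parseMonth() return value. */
        printf("month = %d\n", tm.tm_mon);   /* prints "month = 9" for Oct */
    }
    return 0;
}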
I only know what the C Standard says about this (C99):

"The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined." (6.4.4.4/10, taken from a draft)

So it's implementation-defined, meaning it is not guaranteed to work the same everywhere, but the behavior must be documented by the implementation. For example, if int is only 16 bits wide in a particular implementation, then 'Jan/' can't be represented the way you intend (char must be at least 8 bits, while a character constant is always of type int).
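A quick check (my own snippet, not from the standard) shows what a given implementation actually does with the constant; on gcc the two values typically come out identical, but only the explicitly shifted form is guaranteed to:

#include <stdio.h>

int main(void) {
    /* 'Jan/' has an implementation-defined value; the shifted version is
       assembled in a well-defined order for comparison. */
    unsigned long literal = (unsigned long)'Jan/';
    unsigned long shifted = ((unsigned long)'J' << 24) |
                            ((unsigned long)'a' << 16) |
                            ((unsigned long)'n' << 8)  |
                             (unsigned long)'/';
    printf("'Jan/'  = 0x%08lX\n", literal);
    printf("shifted = 0x%08lX\n", shifted);
    return 0;
}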
National Instruments' CVI 8.5 for Windows compiler fails on your original code with multiple warnings and errors; it succeeds on Jonathan's code.
The fact that a four-character constant is equivalent to a particular 32-bit integer is a non-standard feature often seen in compilers for MS Windows and Mac computers (and PalmOS, AFAICR).
On these systems a four-character string is commonly used as a tag for identifying chunks of data files, or as an application / data-type identifier (e.g. "APPL").
It's then a convenience for the developer that such a string can be stored in various data structures without worrying about zero-byte termination, pointers, etc.
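As an illustration of how such tags tend to be used, a small sketch (the FOURCC macro and the chunk_header struct are my own, not from any particular file format):

#include <stdint.h>
#include <stdio.h>

/* Pack a tag in a fixed byte order rather than relying on 'APPL',
   whose value is implementation-defined. */
#define FOURCC(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8)  |  (uint32_t)(d))

/* A made-up chunk header, loosely modelled on tagged file formats. */
struct chunk_header {
    uint32_t tag;     /* e.g. FOURCC('A','P','P','L') */
    uint32_t length;  /* payload size in bytes */
};

int main(void) {
    struct chunk_header h = { FOURCC('A','P','P','L'), 128 };
    if (h.tag == FOURCC('A','P','P','L'))
        printf("application chunk, %u bytes\n", (unsigned)h.length);
    return 0;
}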
I get warnings, but no errors (gcc). It seems to compile and operate fine. It may not work on big-endian systems, though!
I wouldn't suggest this method, though. Perhaps you could XOR the characters together instead of OR-shifting, to produce a single byte, and then switch on that byte (or, faster, use a LUT keyed on the first N bits).
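If you want to keep the switch but avoid both the byte-order worry and the implementation-defined constants, another option (distinct from the XOR idea above) is to build the case labels with explicit shifts, in the same order the loop assembles inputInt. A rough sketch, where PACK4 is my own macro name and a 32-bit int is still assumed, as in the original:

#include <stdio.h>

/* Pack four characters high byte first, matching the shift loop, so the
   case labels no longer depend on multi-character constants. */
#define PACK4(a, b, c, d) \
    (((int)(unsigned char)(a) << 24) | ((int)(unsigned char)(b) << 16) | \
     ((int)(unsigned char)(c) << 8)  |  (int)(unsigned char)(d))

static int parseMonthPortable(const char *input) {
    int v = 0;
    int i;
    for (i = 0; i < 4 && input[i]; i++)
        v = (v << 8) | (unsigned char)input[i];

    switch (v) {
    case PACK4('J','a','n','/'): return 0;
    case PACK4('F','e','b','/'): return 1;
    case PACK4('M','a','r','/'): return 2;
    case PACK4('A','p','r','/'): return 3;
    case PACK4('M','a','y','/'): return 4;
    case PACK4('J','u','n','/'): return 5;
    case PACK4('J','u','l','/'): return 6;
    case PACK4('A','u','g','/'): return 7;
    case PACK4('S','e','p','/'): return 8;
    case PACK4('O','c','t','/'): return 9;
    case PACK4('N','o','v','/'): return 10;
    case PACK4('D','e','c','/'): return 11;
    default:                     return -1;
    }
}

int main(void) {
    printf("%d\n", parseMonthPortable("Aug/12"));  /* prints 7 */
    return 0;
}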
I'd sure love to see the profiling that shows this is your most significant bottleneck, but in any case, if you're going to pull something like this, use a union instead of 50 instructions' worth of looping and shifting. Here's a little example program; I'll leave it to you to fit it into your program.
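Along those lines, a minimal sketch of the union approach, assuming the input always has at least four bytes (the union, the helper, and the sample input are my own, and it trades the switch for a small table):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Four input bytes viewed either as characters or as one integer. */
union month_tag {
    char     c[4];
    uint32_t i;
};

/* Build the comparison key through the same union, so input and key go
   through the same byte order, whatever it is on this machine. */
static uint32_t month_key(const char *s) {
    union month_tag t;
    memcpy(t.c, s, 4);
    return t.i;
}

static int parse_month_union(const char *input) {
    static const char *names[12] = {
        "Jan/", "Feb/", "Mar/", "Apr/", "May/", "Jun/",
        "Jul/", "Aug/", "Sep/", "Oct/", "Nov/", "Dec/"
    };
    union month_tag t;
    int m;

    /* One four-byte copy instead of the shift loop; assumes the input
       always has at least four bytes, as a log line would. */
    memcpy(t.c, input, 4);

    for (m = 0; m < 12; m++)
        if (t.i == month_key(names[m]))
            return m;
    return -1;
}

int main(void) {
    printf("%d\n", parse_month_union("Oct/01/2024"));  /* prints 9 */
    return 0;
}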