Is there a C compiler that fails to compile this? (page 2 of answers)

I was hanging out in my profiler for a while trying to figure out how to speed up a common log parser which was bottlenecked around the date parsing, and I tried various algorithms to speed things up.

The thing I tried that was fastest for me was also by far the most readable, but potentially non-standard C.

This worked quite well in GCC, icc, and my really old and picky SGI compiler. As it's a quite readable optimization, where doesn't it do what I want?

static int parseMonth(const char *input) {
    int rv=-1;
    int inputInt=0;
    int i=0;

    for(i=0; i<4 && input[i]; i++) {
        inputInt = (inputInt << 8) | input[i];
    }

    switch(inputInt) {
        case 'Jan/': rv=0; break;
        case 'Feb/': rv=1; break;
        case 'Mar/': rv=2; break;
        case 'Apr/': rv=3; break;
        case 'May/': rv=4; break;
        case 'Jun/': rv=5; break;
        case 'Jul/': rv=6; break;
        case 'Aug/': rv=7; break;
        case 'Sep/': rv=8; break;
        case 'Oct/': rv=9; break;
        case 'Nov/': rv=10; break;
        case 'Dec/': rv=11; break;
    }
    return rv;
}

Tags: c

13 answers

等我变得足够好

Reply #2 · 2020-02-03 07:15

As mentioned by others, that code throws a bunch of warnings and is probably not endian-safe.

Was your original date parser hand-written as well? Have you tried strptime(3)?

Emotional °昔

Reply #3 · 2020-02-03 07:17

I only know what the C Standard says about this (C99):

The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined. If an integer character constant contains a single character or escape sequence, its value is the one that results when an object with type char whose value is that of the single character or escape sequence is converted to type int.

(6.4.4.4/10 taken from a draft)

So it's implementation-defined, meaning it is not guaranteed to work the same everywhere, but the behavior must be documented by the implementation. For example, if int is only 16 bits wide in a particular implementation, then 'Jan/' can't be represented the way you intend (char must be at least 8 bits, while a character constant is always of type int).
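One way to keep the switch while avoiding the implementation-defined constant is to build each case label from four single-character constants, whose values the standard does pin down. The following is only a sketch: the names TAG4 and parse_month_portable are made up for illustration and do not come from the question.

```c
#include <stdint.h>

/* Hypothetical helper: build a 32-bit tag from four single-character
   constants. Each argument is an ordinary one-character constant, so the
   result does not depend on how a compiler evaluates 'Jan/'. */
#define TAG4(a, b, c, d)                          \
    (((uint32_t)(unsigned char)(a) << 24) |       \
     ((uint32_t)(unsigned char)(b) << 16) |       \
     ((uint32_t)(unsigned char)(c) << 8)  |       \
      (uint32_t)(unsigned char)(d))

/* Returns 0..11 for input starting with "Jan/".."Dec/", -1 otherwise. */
static int parse_month_portable(const char *input) {
    uint32_t key = 0;
    int i;

    /* Same shift-and-or loop as the question, but on an unsigned
       32-bit value so the result is the same for any width of int. */
    for (i = 0; i < 4 && input[i]; i++) {
        key = (key << 8) | (unsigned char)input[i];
    }

    switch (key) {
        case TAG4('J', 'a', 'n', '/'): return 0;
        case TAG4('F', 'e', 'b', '/'): return 1;
        case TAG4('M', 'a', 'r', '/'): return 2;
        case TAG4('A', 'p', 'r', '/'): return 3;
        case TAG4('M', 'a', 'y', '/'): return 4;
        case TAG4('J', 'u', 'n', '/'): return 5;
        case TAG4('J', 'u', 'l', '/'): return 6;
        case TAG4('A', 'u', 'g', '/'): return 7;
        case TAG4('S', 'e', 'p', '/'): return 8;
        case TAG4('O', 'c', 't', '/'): return 9;
        case TAG4('N', 'o', 'v', '/'): return 10;
        case TAG4('D', 'e', 'c', '/'): return 11;
        default: return -1;
    }
}
```

The case labels are still integer constant expressions, so the compiler can generate the same kind of switch as before. The labels are just spelled less compactly.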

Fickle 薄情

Reply #4 · 2020-02-03 07:17

National Instruments' CVI 8.5 compiler for Windows fails on your original code with multiple warnings:

  Warning: Excess characters in multibyte character literal ignored.

and errors of the form:

  Duplicate case label '77'.

It succeeds on Jonathan's code.

乱世女痞

Reply #5 · 2020-02-03 07:19

The fact that a four-character constant is equivalent to a particular 32-bit integer is a non-standard feature often seen on compilers for MS Windows and Mac computers (and PalmOS, AFAICR).

On these systems a four-character string is commonly used as a tag for identifying chunks of data files, or as an application / data-type identifier (e.g. "APPL").

It's a convenience then for the developer that they can store such a string into various data-structures without worrying about zero-byte termination, pointers, etc.

手持菜刀，她持情操

Reply #6 · 2020-02-03 07:20

I get warnings, but no errors (gcc). Seems to compile and operate fine. May not work for big-endian systems, though!

I wouldn't suggest this method, though. Perhaps you can xor instead of or-shift, to create a single byte. Then use the case statement on a byte (or, faster, use a LUT of the first N bits).
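A rough sketch of the lookup-table idea follows, with one substitution stated up front. XOR-ing the three letters does not give a unique byte in ASCII: "Jul" and "Aug" both reduce to 0x53. So this sketch uses the sum of the second and third letters instead, which happens to be distinct for all twelve English abbreviations. The function name parse_month_lut and the table layout are illustrative assumptions, and the code assumes ASCII input.

```c
#include <string.h>

/* Returns 0..11 for input starting with "Jan".."Dec", -1 otherwise.
   Assumes ASCII. The key is (s[1] + s[2]) - 199, where 199 is the
   smallest such sum ('e' + 'b' in "Feb") and 229 is the largest
   ('o' + 'v' in "Nov"), giving table indices 0..30. */
static int parse_month_lut(const char *s) {
    /* Each used slot stores month number + 1; 0 marks an unused slot. */
    static const unsigned char lut[31] = {
        [8]  = 1,  /* Jan: 'a' + 'n' = 207 */
        [0]  = 2,  /* Feb: 'e' + 'b' = 199 */
        [12] = 3,  /* Mar: 'a' + 'r' = 211 */
        [27] = 4,  /* Apr: 'p' + 'r' = 226 */
        [19] = 5,  /* May: 'a' + 'y' = 218 */
        [28] = 6,  /* Jun: 'u' + 'n' = 227 */
        [26] = 7,  /* Jul: 'u' + 'l' = 225 */
        [21] = 8,  /* Aug: 'u' + 'g' = 220 */
        [14] = 9,  /* Sep: 'e' + 'p' = 213 */
        [16] = 10, /* Oct: 'c' + 't' = 215 */
        [30] = 11, /* Nov: 'o' + 'v' = 229 */
        [1]  = 12  /* Dec: 'e' + 'c' = 200 */
    };
    static const char names[12][4] = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
    };
    int key, m;

    /* Need at least three characters before indexing s[1] and s[2]. */
    if (!s[0] || !s[1] || !s[2]) return -1;

    key = (unsigned char)s[1] + (unsigned char)s[2] - 199;
    if (key < 0 || key > 30) return -1;

    m = lut[key];
    /* The key is not a full match on its own, so confirm all three letters. */
    if (m == 0 || memcmp(s, names[m - 1], 3) != 0) return -1;
    return m - 1;
}
```

The final memcmp is what rejects inputs such as "Jbm", which lands in the same slot as "Jan". Whether this beats the switch would need to be measured in the same profiler.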

家丑人穷心不美

Reply #7 · 2020-02-03 07:22

I'd sure love to see the profiling that shows this is your most significant bottleneck, but in any case, if you're going to pull something like this, use a union instead of 50 instructions looping and shifting. Here's a little example program; I'll leave it to you to fit it into your program.

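/* union -- demonstrate union for characters */

#include <stdio.h>

/* The char array and the int share the same storage. */
union c4_i {
    char c4[5];   /* four characters plus a terminating '\0' */
    int  i ;      /* the first sizeof(int) bytes viewed as an integer */
} ;

union c4_i ex;

int main (void){
    ex.c4[0] = 'a';
    ex.c4[1] = 'b';
    ex.c4[2] = 'c';
    ex.c4[3] = 'd';
    ex.c4[4] = '\0';
    /* %x expects an unsigned int, hence the cast */
    printf("%s 0x%08x\n", ex.c4, (unsigned)ex.i );
    return 0;
}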

Here's example output:

bash $ ./union
abcd 0x64636261
bash $
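The output above has 'a' in the least significant byte, which is what a little-endian machine produces. The union reads the bytes as they sit in memory, so its integer value depends on byte order, whereas shifting builds the same value on any machine. A small sketch of the difference, using the made-up helper names pack_shift and pack_memcpy (memcpy stands in for the union read here):

```c
#include <stdint.h>
#include <string.h>

/* Byte-order independent: the first character always ends up in the
   most significant byte. Expects at least four readable bytes. */
static uint32_t pack_shift(const char *s) {
    uint32_t v = 0;
    int i;
    for (i = 0; i < 4; i++) {
        v = (v << 8) | (unsigned char)s[i];
    }
    return v;
}

/* Byte-order dependent: copies the four bytes exactly as they sit in
   memory, which is what reading the union's int member amounts to. */
static uint32_t pack_memcpy(const char *s) {
    uint32_t v;
    memcpy(&v, s, sizeof v);
    return v;
}
```

For "abcd", pack_shift gives 0x61626364 everywhere, while pack_memcpy gives 0x64636261 on a little-endian machine and 0x61626364 on a big-endian one. Any constants compared against the union value would have to account for that.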
