Is there a C compiler that fails to compile this?

2020-02-03 06:19发布

I was hanging out in my profiler for a while trying to figure out how to speed up a common log parser which was bottlenecked around the date parsing, and I tried various algorithms to speed things up.

The thing I tried that was fastest for me was also by far the most readable, but potentially non-standard C.

This worked quite well in GCC, icc, and my really old and picky SGI compiler. As it's a quite readable optimization, where doesn't it do what I want?

static int parseMonth(const char *input) {
    int rv=-1;
    int inputInt=0;
    int i=0;

    for(i=0; i<4 && input[i]; i++) {
        inputInt = (inputInt << 8) | input[i];
    }

    switch(inputInt) {
        case 'Jan/': rv=0; break;
        case 'Feb/': rv=1; break;
        case 'Mar/': rv=2; break;
        case 'Apr/': rv=3; break;
        case 'May/': rv=4; break;
        case 'Jun/': rv=5; break;
        case 'Jul/': rv=6; break;
        case 'Aug/': rv=7; break;
        case 'Sep/': rv=8; break;
        case 'Oct/': rv=9; break;
        case 'Nov/': rv=10; break;
        case 'Dec/': rv=11; break;
    }
    return rv;
}

标签: c
13条回答
Lonely孤独者°
2楼-- · 2020-02-03 07:06

Comeau compiler

Comeau C/C++ 4.3.10.1 (Oct  6 2008 11:28:09) for ONLINE_EVALUATION_BETA2
Copyright 1988-2008 Comeau Computing.  All rights reserved.
MODE:strict errors C99 

"ComeauTest.c", line 11: warning: multicharacter character literal (potential
          portability problem)
          case 'Jan/': rv=0; break;
               ^

"ComeauTest.c", line 12: warning: multicharacter character literal (potential
          portability problem)
          case 'Feb/': rv=1; break;
               ^

"ComeauTest.c", line 13: warning: multicharacter character literal (potential
          portability problem)
          case 'Mar/': rv=2; break;
               ^

"ComeauTest.c", line 14: warning: multicharacter character literal (potential
          portability problem)
          case 'Apr/': rv=3; break;
               ^

"ComeauTest.c", line 15: warning: multicharacter character literal (potential
          portability problem)
          case 'May/': rv=4; break;
               ^

"ComeauTest.c", line 16: warning: multicharacter character literal (potential
          portability problem)
          case 'Jun/': rv=5; break;
               ^

"ComeauTest.c", line 17: warning: multicharacter character literal (potential
          portability problem)
          case 'Jul/': rv=6; break;
               ^

"ComeauTest.c", line 18: warning: multicharacter character literal (potential
          portability problem)
          case 'Aug/': rv=7; break;
               ^

"ComeauTest.c", line 19: warning: multicharacter character literal (potential
          portability problem)
          case 'Sep/': rv=8; break;
               ^

"ComeauTest.c", line 20: warning: multicharacter character literal (potential
          portability problem)
          case 'Oct/': rv=9; break;
               ^

"ComeauTest.c", line 21: warning: multicharacter character literal (potential
          portability problem)
          case 'Nov/': rv=10; break;
               ^

"ComeauTest.c", line 22: warning: multicharacter character literal (potential
          portability problem)
          case 'Dec/': rv=11; break;
               ^

"ComeauTest.c", line 1: warning: function "parseMonth" was declared but never
          referenced
  static int parseMonth(const char *input) {
             ^
查看更多
一夜七次
3楼-- · 2020-02-03 07:08

You're just computing a hash of those four characters. Why not predefine some integer constants that compute the hash in the same way and use those? Same readability and you're not depending on any implementation specific idiosyncrasies of the compiler.

uint32_t MONTH_JAN = 'J' << 24 + 'a' << 16 + 'n' << 8 + '/';
uint32_t MONTH_FEB = 'F' << 24 + 'e' << 16 + 'b' << 8 + '/';

...

static uint32_t parseMonth(const char *input) {
    uint32_t rv=-1;
    uint32_t inputInt=0;
    int i=0;

    for(i=0; i<4 && input[i]; i++) {
        inputInt = (inputInt << 8) | (input[i] & 0x7f); // clear top bit
    }

    switch(inputInt) {
        case MONTH_JAN: rv=0; break;
        case MONTH_FEB: rv=1; break;

        ...
    }

    return rv;
}
查看更多
贼婆χ
4楼-- · 2020-02-03 07:09
char *months = "Jan/Feb/Mar/Apr/May/Jun/Jul/Aug/Sep/Oct/Nov/Dec/";
char *p = strnstr(months, input, 4);
return p ? (p - months) / 4 : -1;
查看更多
SAY GOODBYE
5楼-- · 2020-02-03 07:09

There are at least 3 things that keep this program from being portable:

  1. Multi-character constants are implementation-defined so different compilers may handle them differently.
  2. A byte can be more than 8 bits, there is plenty of hardware where the smallest addressable unit of memory is 16 or even 32 bits, you often find this in DSPs for example. If a byte is more than 8 bits then so will char since char is by definition one byte long; your program will not function properly on such systems.
  3. Lastly, there are many machines where int is only 16-bits (which is the smallest size allowed for int) including embedded devices and legacy machines, your program will fail on these machines as well.
查看更多
对你真心纯属浪费
6楼-- · 2020-02-03 07:09

Machine word size issues aside, your compiler may promote input[i] to a negative integer which will just set the upper bits of inputInt with or operation, so I suggest you to be explicit about signedness of char variables.

But since in US, no one cares about the 8th bit, it is probably a non-issue for you.

查看更多
Rolldiameter
7楼-- · 2020-02-03 07:14
if ( !input[0] || !input[1] || !input[2] || input[3] != '/' )
    return -1;

switch ( input[0] )
{
    case 'F': return 1; // Feb
    case 'S': return 8; // Sep
    case 'O': return 9; // Oct
    case 'N': return 10; // Nov
    case 'D': return 11; // Dec;
    case 'A': return input[1] == 'p' ? 3 : 7; // Apr, Aug
    case 'M': return input[2] == 'r' ? 2 : 4; // Mar, May
    default: return input[1] == 'a' ? 0 : (input[2] == 'n' ? 5 : 6); // Jan, Jun, Jul
}

Slightly less readable and not so much validating, but perhaps even faster, no?

查看更多
登录 后发表回答