I'm reading through some emulator code and I've encountered something truly odd:
switch (reg){
case 'eax':
/* and so on*/
}
How is this possible? I thought you could only switch
on integral types. Is there some macro trickery going on?
As others have said, this is an int constant and its actual value is implementation-defined.

I assume the rest of the code stores the same constant in reg in one place and switches on it in another.
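A minimal sketch of that assumed shape, for illustration only: the function names decode_reg and reg_index and the returned indices are hypothetical and do not come from the emulator in the question. Most compilers warn about the multi-character constants, and the block only compiles where 'eax' and 'ebx' get distinct values.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical first part: choose the constant that identifies a register. */
int decode_reg(const char *name)
{
    if (strcmp(name, "eax") == 0)
        return 'eax';   /* multi-character constant, type int */
    if (strcmp(name, "ebx") == 0)
        return 'ebx';
    return 0;           /* assumed marker for "unknown register" */
}

/* Hypothetical second part: the switch from the question. */
int reg_index(int reg)
{
    switch (reg) {
    case 'eax':
        return 0;
    case 'ebx':
        return 1;
    default:
        return -1;
    }
}

/* Within one compiler, the two spellings of 'eax' always agree. */
void check_round_trip(void)
{
    assert(reg_index(decode_reg("eax")) == 0);
}
```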
You can be sure that 'eax' in the first part has the same value as 'eax' in the second part, so it all works out, right? ... wrong.
In a comment, @Davislor lists some possible values for 'eax'. The first potential value is just 'e', ignoring the other two characters. The problem is that the program probably uses 'eax', 'ebx', and so on. If all these constants have the same value as 'e', you end up with several case labels that share one value.

This doesn't look too good, does it?
The good part about "implementation-defined" is that the programmer can check the documentation of their compiler and see if it does something sensible with these constants. If it does, home free.
The bad part is that some other poor fellow can take the code and try to compile it using some other compiler. Instant compile error. The program is not portable.
As @zwol pointed out in the comments, the situation is not quite as bad as I thought: in the bad case the code doesn't compile. This will at least give you an exact file name and line number for the problem. Still, you will not have a working program.
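Besides reading the compiler documentation, one way to see what a given compiler does is to compare 'eax' against a value packed by hand. The sketch below is only an illustration: pack3 and multichar_is_packed are hypothetical helpers, and the layout of 8 bits per character with the first character in the most significant position is an assumption, not something the standard guarantees.

```c
#include <assert.h>

/* Hypothetical helper: pack three characters explicitly, 8 bits each,
   with the first character in the most significant position. */
unsigned long pack3(unsigned char a, unsigned char b, unsigned char c)
{
    return ((unsigned long)a << 16) | ((unsigned long)b << 8) | (unsigned long)c;
}

/* Returns 1 if this compiler gives 'eax' the hand-packed value, 0 otherwise.
   The standard allows either answer. */
int multichar_is_packed(void)
{
    return (unsigned long)'eax' == pack3('e', 'a', 'x');
}

/* The hand-packed value is fully defined: only the last byte is set here. */
void check_pack3(void)
{
    assert(pack3(0, 0, 'x') == (unsigned long)'x');
}
```

Unlike 'eax', the result of pack3 is the same on every compiler, so it can also serve as a portable replacement when the packed form itself is wanted.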
(Only you can answer the "macro trickery" part - unless you paste up more code. But there's not much here for macros to work on - formally you are not allowed to redefine keywords; the behaviour on doing that is undefined.)
In order to achieve program readability, the witty developer is exploiting implementation-defined behaviour.

'eax' is not a string, but a multi-character constant. Note very carefully the single quotation characters around eax. Most likely it is giving you an int in your case that's unique to that combination of characters. (Quite often each character occupies 8 bits in a 32-bit int.) And everyone knows you can switch on an int!

Finally, a standard reference:
The C99 standard says:
According to the C Standard (6.8.4.2 The switch statement)
and (6.6 Constant expressions)
Now what is 'eax'?

The C Standard (6.4.4.4 Character constants)

So 'eax' is an integer character constant according to paragraph 10 of the same section.

So according to the first mentioned quote it can be an operand of an integer constant expression that may be used as a case label.
Pay attention to the fact that a character constant (enclosed in single quotes) has type int and is not the same as a string literal (a sequence of characters enclosed in double quotes), which has a character array type.

The code fragment uses a historical oddity called a multi-character character constant, also referred to as multi-chars. 'eax' is an integer constant whose value is implementation-defined.

Here is an interesting page on multi-chars and how they can be used but should not:
http://www.zipcon.net/~swhite/docs/computers/languages/c_multi-char_const.html
Looking back further away into the rearview mirror, here is how the original C manual by Dennis Ritchie from the good old days ( https://www.bell-labs.com/usr/dmr/www/cman.pdf ) specified character constants.
The last phrase is all you need to remember about this curious construction: Character constants with more than one character are inherently machine-dependent and should be avoided.
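If the goal is only a readable switch, an enumeration gives the same readability without relying on implementation-defined values. This is a sketch of one possible alternative, not what the emulator in the question does; the names reg_id, REG_EAX, REG_EBX, REG_ECX and reg_name are made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical register identifiers; each enumerator is a distinct
   integer constant on every conforming compiler. */
enum reg_id { REG_EAX, REG_EBX, REG_ECX };

/* Map an identifier back to its textual name. */
const char *reg_name(enum reg_id reg)
{
    switch (reg) {
    case REG_EAX: return "eax";
    case REG_EBX: return "ebx";
    case REG_ECX: return "ecx";
    }
    return "?";   /* reached only for values outside the enumeration */
}

/* Small self-check: the labels are distinct and map to the right names. */
void check_reg_name(void)
{
    assert(strcmp(reg_name(REG_EBX), "ebx") == 0);
}
```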