Non-ASCII characters in C

Posted 2019-02-22 09:00

Question:

I was looking at Google Go's runtime source code (at https://go.googlecode.com/hg/src/pkg/runtime/), and it seems they use a special character, ·, in their function names (see, for example, https://go.googlecode.com/hg/src/pkg/runtime/cgocall.c). Is this accepted across major compilers? It's not ANSI C, is it? Or is it just some macro magic?

Thank you!

Answer 1:

C90 doesn't allow additional characters in identifiers (beyond those in the basic character set). C99 does, both through universal character names (\uXXXX and \UXXXXXXXX) and through an implementation-defined set of other characters.

6.4.2.1/1 in C99:

identifier:
    identifier-nondigit
    identifier identifier-nondigit
    identifier digit
identifier-nondigit:
    nondigit
    universal-character-name
    other implementation-defined characters
nondigit: one of
    _ a b c d e f g h i j k l m
    n o p q r s t u v w x y z
    A B C D E F G H I J K L M
    N O P Q R S T U V W X Y Z
digit: one of
    0 1 2 3 4 5 6 7 8 9

I don't know how well this is supported by C implementations, but I do know that the Plan 9 C compiler could handle other characters in identifiers before this was standardized.



Answer 2:

Do you mean the dot? It's character code 183 (0xB7) in ISO 8859-1 (ISO Latin-1), an extended ASCII code corresponding (apparently) to the Georgian comma, also known as the "middle dot" (Unicode U+00B7 MIDDLE DOT). It is actually a legal character.



Answer 3:

The C99 Standard "allows" (for sufficiently small values of "allow") "strange characters":

5.1.1.2 Translation phases

1 The precedence among the syntax rules of translation is specified by the following phases.

  1. Physical source file multibyte characters are mapped, in an implementation-defined manner, to the source character set (introducing new-line characters for end-of-line indicators) if necessary. Trigraph sequences are replaced by corresponding single-character internal representations.


Answer 4:

Using that middle dot is discussed here:

http://code.google.com/p/go/issues/detail?id=793

Basically, using that dot is not part of the language spec, but there are some cases where it is necessary: bootstrapping, the runtime, or assembly.