Where is it specified whether Unicode identifiers

2019-02-16 21:32发布

问题:

I wanted to write some educational code in Haskell with Unicode characters (non-Latin) in the identifiers. (So that the identifiers look nice and natural for speakers of a natural language other than English which is not using the Latin characters in its writing.) So, I set out for finding an appropriate Haskell implementation that would allow this.

But where is this feature specified in the language specification? How would I refer to this feature when looking for a conforming implementation? (And which Haskell implemenations are known to actually support Unicode identifiers?)

It turned out that one Haskell implementation did accept my code with Unicode identifiers, whereas another one failed to accept it. I would like it if there were a way to formalize this requirement of my code, in a form of a language feature switch perhaps, so that if I or someone else tries to run my code, it would be immediately clear whether his implementation is missing the required feature and hence he should look for another one. (There could be also a wiki page for this feature--"Unicode identifiers", which would list which of the existing implementations support it, so that one would know where to go if one needs it.)

(BTW, I have put a "syntax" tag on this question, but I actually perceive it to be an issue of the level of lexing, a lower level than the syntax of a language. Is there a tag here for features of the lexing level of a language, rather than for features of the syntax specification of a language?)

回答1:

The Online Report documents this under Lexemes. It also notes early on that "Haskell uses the Unicode character set. However, source programs are currently biased toward the ASCII character set used in earlier versions of Haskell.".

Actual compilers may or may not support Unicode identifiers. GHC does, but you need to keep in mind that Unicode codepoints must obey the same rules as ASCII characters: types must start with a codepoint which is classed as uppercase or titlecase, variables as lowercase (although de facto this is relaxed to alphabetic and not uppercase/titlecase; this might be worth asking for a clarification from the language committee), operators must be punctuation or symbol. (This means that you can't declare types in Arabic, for example, unless you prefix them with a character in some other script that is uppercase/titlecase.)

As to collecting Unicode support information: while I don't know of a single page that provides it, searching for "unicode" on the Haskell Wiki finds information about Unicode support in a number of Haskell compilers.