
What's allowed in a Perl 6 identifier?

Posted 2019-04-06 02:01

Question:

Synopsis 2 says:

An identifier is composed of an alphabetic character followed by any sequence of alphanumeric characters. The definitions of alphabetic and numeric include appropriate Unicode characters. Underscore is always considered alphabetic. An identifier may also contain isolated apostrophes or hyphens provided the next character is alphabetic.

Syntax in the Perl 6 docs says:

Identifiers are a grammatical building block that occur in several places. An identifier is a primitive name, and must start with an alphabetic character (or an underscore), followed by zero or more word characters (alphabetic, underscore or number). You can also embed dashes - or single quotes ' in the middle, but not two in a row.

The phrase "appropriate Unicode characters" assumes that we already know what "appropriate" means.

I find that too vague if I'm going to choose characters beyond ASCII. I find this production in Perl6::Grammar, but not the definition of <.ident>:

token identifier {
    <.ident> [ <.apostrophe> <.ident> ]*
}

But this is circular too: you have to know what an identifier is to define an identifier. So, where is <.ident>?

raiph points out that <.ident> is the ident method in QRegex::Cursor, but that defines it in terms of nqp::const::CCLASS_WORD. Now I have to track that down.


I tried to use U+00B2 (SUPERSCRIPT TWO) (General categories No, Other_Number) because I wanted to pass around the result of an expensive square operation, and hey, Perl 6 is supposed to allow this:

my $a² = $a**2;

But it turns out that ², along with the other superscripts, is an operator. That's fine, but ² and the like aren't listed as operators, or under Int or the behavior Int inherits:

$ perl6 -e 'my $Δ² = 6; say $*PERL; say $Δ²'
Use of uninitialized value of type Any in numeric context
  in block <unit> at -e line 1
Cannot modify an immutable Int
  in block <unit> at -e line 1

$ perl6 -e 'my $Δ = 6; say $*PERL; say $Δ²'
Perl 6 (6.c)
36

$ perl6 -e 'my $Δ = 6; say $*PERL; say $Δ³'
Perl 6 (6.c)
216

$ perl6 -e 'my $Δ = 6; say $*PERL; say $Δ⁹'
Perl 6 (6.c)
10077696

But I can't use ½ U+00BD (VULGAR FRACTION ONE HALF) (General categories of No and Other_Number):

$ perl6 -e 'my $Δ½ = 6; say $*PERL; say $Δ½'
===SORRY!=== Error while compiling -e
Bogus postfix
at -e:1
------> my $Δ⏏½ = 6; say $*PERL; say $Δ½
    expecting any of:
        constraint
        infix
        infix stopper
        postfix
        statement end
        statement modifier
        statement modifier loop

But what if I don't put a number in?

$ perl6 -e 'my $Δ = "foo"; say $*PERL; say $Δ²'
Cannot convert string to number: base-10 number must begin with valid digits or '.' in '⏏foo' (indicated by ⏏)

in block at -e line 1

Actually thrown at:

in block at -e line 1

I was worried that someone defining a postfix operator could break the language, but this seems to work:

$ perl6 -e 'multi sub postfix:<Δ>(Int $n) { 137 }; say  6Δ;'
137

$ perl6 -e 'multi sub postfix:<Δ>(Int $n) { 137 }; my $ΔΔ = 6; say $ΔΔ;'
6

$ perl6 -e 'multi sub postfix:<Δ>(Int $n) { 137 }; my $Δ = 6; say $ΔΔ;'
===SORRY!=== Error while compiling -e
Variable '$ΔΔ' is not declared
at -e:1
------> fix:<Δ>(Int $n) { 137 }; my $Δ = 6; say ⏏$ΔΔ;

So, what's going on there?

Answer 1:

The grammar has an identifier defined as

token apostrophe {
    <[ ' \- ]>
}

token identifier {
    <.ident> [ <.apostrophe> <.ident> ]*
}

with ident a method on cursors which accepts input that starts with a CCLASS_ALPHABETIC character or an underscore _ and continues with zero or more CCLASS_WORD characters.
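
To try the rule out, here is a minimal sketch that wraps the two tokens quoted above in a small grammar of its own. The grammar name Ident and the sample strings are assumptions made for illustration; only the two token bodies come from the production above, and <.ident> is the built-in rule that every grammar inherits.

```raku
# Hypothetical wrapper grammar; the name Ident is an assumption for this sketch.
grammar Ident {
    token TOP        { <identifier> }
    # The next two tokens are copied from the production quoted above.
    token apostrophe { <[ ' \- ]> }
    token identifier { <.ident> [ <.apostrophe> <.ident> ]* }
}

# .parse succeeds only when the whole string matches TOP.
for 'foo-bar', "isn't", 'foo--bar', 'foo-9', '9foo' -> $name {
    say $name, ': ', so Ident.parse($name);
}
```

With these tokens, the first two samples should be accepted and the last three rejected: a hyphen or apostrophe has to be followed by something that can start an ident, and a digit cannot.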

These classes are implemented in MoarVM and map to various Unicode categories.

Specifically, CCLASS_ALPHABETIC checks for Letter, Lowercase; Letter, Uppercase; Letter, Titlecase; Letter, Modifier and Letter, Other.

CCLASS_WORD additionally accepts characters of category Number, Decimal Digit as well as underscores.
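
One way to check a particular character against these categories is to ask for its General_Category and compare that with what the built-in <ident> rule accepts. This is only a sketch: the sample strings are chosen for illustration, and it assumes that <ident> in an ordinary regex behaves like the cursor method described above.

```raku
# Each sample is a candidate identifier body, without the sigil.
for 'Δ', '_', '9', '²', '½', 'Δ9', 'Δ²' -> $s {
    # .uniprop with no argument reports the General_Category;
    # .comb splits the sample so every character gets looked up.
    my $categories = $s.comb.map(*.uniprop).join(' ');
    # Anchored match: the whole sample has to be a single ident.
    my $ok = so $s ~~ / ^ <ident> $ /;
    say $s, "\t", $categories, "\t", ($ok ?? 'ident' !! 'not an ident');
}
```

² and ½ are both in category No (Number, Other), which is in neither class, so neither can appear in an identifier. ² only appeared to work in the question because it parses as a postfix operator.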

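As to why postfix operators do not break identifiers, that's due to longest token matching: the parser takes the longest identifier it can, so $ΔΔ is read as a single variable name rather than as $Δ followed by the postfix Δ.

If you want to call a postfix operator Δ on a variable $Δ, you have to add a backslash, i.e.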

multi sub postfix:<Δ>(Int $n) { 137 };
my $Δ = 6;
say $Δ\Δ;

or an 'unspace'

say $Δ\   Δ;


Tags: unicode perl6