While reading the XML grammar for Perl 6 (https://github.com/supernovus/exemel/blob/master/lib/XML/Grammar.pm6), I am having some difficulty understanding the following token.
token pident {
<!before \d> [ \d+ <.ident>* || <.ident>+ ]+ % '-'
}
More specifically, I don't understand `<.ident>`. There are no other definitions of `ident`, so I am assuming it is a reserved term, though I can't find a proper definition on perl6.org. Does anyone know what this means?

In general, the place to look is the Perl 6 documentation. That's part of a regex, and you can find it in the definition of character classes. It matches Perl 6 identifiers. The `.` in front of `ident` suppresses capture.

First, as jj noted, `<.ident>` only matches, it doesn't capture, because of the `.`. For the rest of this answer I'll generally omit the `.` because it makes no difference to the rule's meaning besides the capture aspect.

The `<ident>` in the code you quoted calls a "rule" which does an equivalent of:

What the official doc says

The official doc says, in effect:

But this is misleading doc.

- It's not a character class. (Character class rules match a single character in that character class. `<ident>` matches one or more characters that fit a pattern, albeit one that involves character classes.)
- All rules are default rules. (I think the default comment is there to emphasize that you can write your own `<ident>` rule if you don't like the built-in pattern, something that's also true but generally much less sensical for rules that correspond to canonical character classes such as `<lower>`.)
- It doesn't match identifiers, or more accurately it fails to match many Perl 6 identifiers. (See the "How the `<ident>` rule is used in matching Perl 6 identifiers" section at the end of this answer for an explanation of what it does as part of matching an identifier.)

The rest of this answer

Feel free to consider this question sufficiently answered and move on to something else. For those interested in more detail I've written the following sections:

- The high level logic that calls the `<ident>` rule
- The mid level logic that's in the `<ident>` rule
- The machine level code that implements the `<ident>` rule
- The specification of the `<ident>` rule
- How the `<ident>` rule is used in matching Perl 6 identifiers

The high level logic that calls the `<ident>` rule

The `grammar XML::Grammar` statement introduces a user defined Perl 6 grammar.

- A grammar is a class. ("Grammars are really just slightly specialized classes".)
- A rule is a method. (`say rule { ... } ~~ Method; # True`.)
- Any class created using a declaration of the form `grammar foo { ... }` inherits from the `Grammar` class which in turn inherits from the `Match` class.
- In the Rakudo Perl 6 compiler, the `Match` class `does` the role `NQPMatchRole`.
- `NQPMatchRole` includes an `ident` rule (in this case with a regular `method` declaration).
- Because `grammar XML::Grammar` does not declare an `ident` rule, the `<ident>` call dispatches to the method in `NQPMatchRole` following normal method dispatch logic.

The mid level logic that's in the `<ident>` rule

`NQPMatchRole` is written in the nqp language, a subset of Perl 6 used to bootstrap the full Perl 6, and the heart of NQP, a compiler toolkit.

Excerpting and reformatting just the most salient code from the `ident` method declaration that's written in nqp, the rule begins with:

This matches if the first character is either a `_` (`95` is the ASCII code / Unicode codepoint for an underscore) or a character matching a character class defined in NQP called `CCLASS_ALPHABETIC`.

The other bit of salient code is:

This matches zero or more subsequent characters in the character class `CCLASS_WORD`.

A search of NQP for `CCLASS_ALPHABETIC` shows several matches. The most useful seems to be an NQP test file. While this file makes it clear that `CCLASS_WORD` is a superset of `CCLASS_ALPHABETIC`, it doesn't make it clear what those classes actually match.

The machine level code that implements the `<ident>` rule

NQP targets multiple "backends" or concrete virtual machines.
Given the relative paucity of Rakudo/NQP doc/tests of what these rules and character classes actually match, one has to look at one of these backends to verify what's what.
MoarVM is the only formally supported backend.
A search of MoarVM for `CCLASS` shows several matches.

The important one seems to be ops.c which includes a `switch (cclass)` statement which in turn includes cases for `MVM_CCLASS_ALPHABETIC` and `MVM_CCLASS_WORD` that correspond to NQP's similarly named constants.

According to the code's comments:

- `CCLASS_ALPHABETIC` currently matches exactly the same characters as the full Perl 6 or NQP `<:L>` rule, i.e. the characters Unicode has classified as "Letters". I think that means `<alpha>` is equivalent to the union of `CCLASS_ALPHABETIC` and `_` (underscore).
- `CCLASS_WORD` matches the same plus `<:Nd>`, i.e. decimal digits (in any human language, not just English). I think that means the full Perl 6 level / NQP `<alnum>` rule is equivalent to `CCLASS_WORD`.

The specification of the `<ident>` rule

The official specification of Perl 6 is embodied in roast [1].
A search of roast for `ident` shows several matches.

Most use `<ident>` only incidentally. The specification requires that they work as shown, but you won't understand what `<ident>` is supposed to do by looking at incidental usage.

Three tests clearly test `<ident>` itself. One of those is essentially redundant, leaving two. I see no changes between the `6.c` and `6.c.errata` versions of these two matches:

From S05-mass/rx.t:

`ok` tests that its first argument returns `True`. This call tests that `<ident>` skips `2+3` and matches `ab2`.

From S05-mass/charsets.t:

`is` tests that its first argument matches its second. This call tests what the `ident` rule matches from a string consisting of the first 256 Unicode codepoints (the Latin-1 character set).

Here's a variation of this test that more clearly shows the matching that happens:

prints:

`<ident>` actually matches a lot more than just a hundred or so characters from Latin-1. So, while the above tests cover what `<ident>` is officially specified/tested to match, they clearly don't cover the full picture.

So let's look at the other source commonly associated with "specification", the historical Perl 6 design docs.
First, we note the warning at the top of the design docs:
The term "specs" in this warning is short for "specification". As already explained, the official specification test suite is roast.
(Some people still think of these historical design docs as "specifications" too, and refer to them as "specs", but a more apt way to look at things is that "specs" as applied to the design docs is short for "speculations" or perhaps "specious nonsense" if the former doesn't make the point clear enough that they're not something to be fully relied upon.)
A search for `ident` in design.perl6.org shows several matches.

The most useful match is in the Predefined Subrules section of S05:

So, now we see where the docs got their notion of the meaning of `<ident>` from.

How the `<ident>` rule is used in matching Perl 6 identifiers

In nqp's grammar, which is defined in NQP's Grammar.nqp, there's:

In Perl 6's grammar, which is defined in Rakudo's Grammar.nqp, there's code that looks slightly different but has the exact same effect:
So `<identifier>` matches a pattern that includes one or more `<ident>`s.

The `ident` method is in `NQPMatchRole`, which means it's a built-in that's part of the rule namespace of users' grammars.

The `identifier` methods are not exported by either Perl 6 or nqp so they're not part of the rule namespace of users' grammars.

If I write my own `indentifier` token we can see it in action:

displays:

To summarize the above and some other considerations:

- `<ident>` matches just parts of what `<identifier>` matches (though they're the same for the simplest names). Consider `is-prime` which is a Perl 6 identifier but contains two `<ident>` matches;
- `<identifier>` matches just parts of "Perl 6 identifiers" (though they're the same for the simplest names). Consider `infix:<+>` which is sometimes also referred to as a Perl 6 identifier but requires both an `<identifier>` match and a colon pair pattern match;
- Perl 6 identifiers match just parts of names (though they're the same for the simplest names). Consider `Foo-Bar::Baz-Qux` which contains two `<identifier>` matches (each in turn containing two `<ident>` matches).

[1] The official specification of Perl 6 is a test suite called roast -- the Repository Of All Specification Tests. The latest version of a specific branch of roast defines a specific version of Perl 6. So far, there have only been two official branches/versions of roast, and therefore Perl 6. The first was/is `6.c` aka `6.Christmas`. This was cut on Christmas day 2015 and has been deliberately left frozen since that day. The second is `6.c.errata`, which very conservatively adds corrections to `6.c` deemed backwards compatible enough and/or too important not to be available as the current official recommended version of Perl 6. An "officially compliant" Perl 6 compiler passes some official branch of roast. The Rakudo compiler passes `6.c.errata`.

If you read all the tests involving a feature in, say, the `6.c.errata` branch of roast, then you'll have read a full definition of the officially specified meaning of that feature for the `6.c.errata` version of the Perl 6 language.
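
To tie this back to the token in the question, here is a minimal sketch that can be run with a Perl 6 compiler such as Rakudo. The `Demo` grammar and its `TOP` token are hypothetical scaffolding I've added for illustration; only `pident` is copied from the question, and the sample strings are my own. It shows `pident` accepting hyphen-separated runs of `<ident>` matches, the `<!before \d>` lookahead rejecting a leading digit, and the difference the `.` makes to capturing.

```raku
# Hypothetical demo grammar; only the pident token comes from the question.
grammar Demo {
    token TOP { <pident> }
    token pident {
        <!before \d> [ \d+ <.ident>* || <.ident>+ ]+ % '-'
    }
}

# Two <ident> matches joined by a single hyphen parse as one pident.
say so Demo.parse('xml-stylesheet');            # True

# A leading digit is rejected by the <!before \d> lookahead.
say so Demo.parse('1abc');                      # False

# <ident> stores a named capture; <.ident> matches without capturing.
say ('ab2' ~~ / <ident>  /).hash.elems;         # 1
say ('ab2' ~~ / <.ident> /).hash.elems;         # 0
```

Because `ident` is inherited rather than declared in `Demo`, the `<.ident>` calls resolve through ordinary method dispatch, as described in the high level logic section.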