I am trying to parse a CSV file, and I am trying to access a named regex inside a proto regex in Perl 6. The result turns out to be Nil. What is the proper way to do it?
grammar rsCSV {
    regex TOP { ( \s* <oneCSV> \s* \, \s* )* }
    proto regex oneCSV {*}
    regex oneCSV:sym<noQuote> { <-[\"]>*? }
    regex oneCSV:sym<quoted> { \" .*? \" } # use non-greedy match
}
my $input = prompt("Enter csv line: ");
my $m1 = rsCSV.parse($input);
say "===========================";
say $m1;
say "===========================";
say "1 " ~ $m1<oneCSV><quoted>; # this fails; it is "Nil"
say "2 " ~ $m1[0];
say "3 " ~ $m1[0][2];
Thank you very much !!
lisprog
Detailed discussion complementing Christoph's answer
"I am trying to parse a csv file"
Perhaps you are focused on learning Perl 6 parsing and are writing some throwaway code. But if you want industrial-strength CSV parsing out of the box, please be aware of the Text::CSV modules[1].
"I am trying to access a named regex"
If you are learning Perl 6 parsing, please be aware of jnthn's grammar tracer and debugger[2].
"in proto regex in Perl6"
Your issue is unrelated to it being a proto regex.
Instead, the issue is that, while the match object corresponding to your named capture is stored in the overall match object you stored in $m1, it is not stored precisely where you are looking for it.
Where do match objects corresponding to captures appear?
To see what's going on, I'll start by simulating what you were trying to do. I'll use a regex that declares just one capture, a "named" (aka "Associative") capture that matches the string ab.
given 'ab'
{
    my $m1 = m/ $<named-capture> = ( ab ) /;
    say $m1<named-capture>;
    # 「ab」
}
The match object corresponding to the named capture is stored where you'd presumably expect it to appear within $m1, at $m1<named-capture>.
But you were getting Nil with $m1<oneCSV>. What gives?
Why your $m1<oneCSV> did not work
There are two types of capture: named (aka "Associative") and numbered (aka "Positional"). The parens you wrote in your regex that surrounded <oneCSV> introduced a numbered capture:
given 'ab'
{
    my $m1 = m/ ( $<named-capture> = ( ab ) ) /; # extra parens added
    say $m1[0]<named-capture>;
    # 「ab」
}
The parens in / ( ... ) / declare a single top level numbered capture. If it matches, then the corresponding match object is stored in $m1[0]. (If your regex looked like / ... ( ... ) ... ( ... ) ... ( ... ) ... / then another match object corresponding to what matches the second pair of parentheses would be stored in $m1[1], another in $m1[2] for the third, and so on.)
The match result for $<named-capture> = ( ab ) is then stored inside $m1[0]. That's why say $m1[0]<named-capture> works.
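As a minimal sketch of the point about several top level groups, the following uses three pairs of parentheses. The variable name $m and the input string are made up for illustration; only the numbering behavior comes from the discussion above.

```perl6
# Three top level numbered captures, counted from zero, left to right.
my $m = 'ab-cd-ef' ~~ / (ab) '-' (cd) '-' (ef) /;

say $m[0];   # 「ab」
say $m[1];   # 「cd」
say $m[2];   # 「ef」
```

Each pair of parentheses gets its own slot in the overall match object, regardless of what is matched between the groups.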
So far so good. But this is only half the story...
Why $m1[0]<oneCSV> in your code would not work either
While $m1[0]<named-capture> works in the code immediately above, you would still not get a match object from $m1[0]<oneCSV> in your original code. That is because you also asked for multiple matches of the zeroth capture by applying a * quantifier to it:
given 'ab'
{
    my $m1 = m/ ( $<named-capture> = ( ab ) )* /; # * is a quantifier
    say $m1[0][0]<named-capture>;
    # 「ab」
}
Because the * quantifier asks for multiple matches, Perl 6 writes a list of match objects into $m1[0]. (In this case there's only one such match, so you end up with a list of length 1, i.e. just $m1[0][0], and not $m1[0][1], $m1[0][2], etc.)
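To see the list grow beyond one element, here is a small sketch in which the quantified group matches three times. The input 'ababab' and the variable name $m are assumptions chosen for illustration; the regex is the same one used above.

```perl6
# The quantified group matches three times, so $m[0] holds three match objects.
my $m = 'ababab' ~~ / ( $<named-capture> = ( ab ) )* /;

say $m[0].elems;               # 3
say $m[0][0]<named-capture>;   # 「ab」
say $m[0][2]<named-capture>;   # 「ab」
```

The first index selects the numbered capture, the second selects one repetition of it, and only then is the named capture reachable.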
Summary
captures nest;
a capture quantified by either * or + corresponds to two levels of nesting, not just one.
In your original code, you'd have to write say $m1[0][0]<oneCSV>; to get to the match object you're looking for.
[1] Install relevant modules and write use Text::CSV; (for a pure Perl 6 implementation) or use Text::CSV:from<Perl5>; (for a Perl 5 plus XS implementation) at the start of your code. (talk slides (click on top word, e.g. "csv", to advance through slides), video, Perl 6 module, Perl 5 XS module.)
[2] Install relevant modules and write use Grammar::Tracer; or use Grammar::Debugger; at the start of your code. (talk slides, video, modules.)
The match for <oneCSV> lives within the scope of the capture group, which you get via $m1[0].
As the group is quantified with *, the result will again be a list, i.e. you need another indexing operation to get at a match object, e.g. $m1[0][0] for the first one.
The named capture can then be accessed by name, e.g. $m1[0][0]<oneCSV>. This will already contain the match result of the appropriate branch of the proto regex.
If you want the whole list of matches instead of a specific one, you can use >> or map, e.g. $m1[0]>>.<oneCSV>.
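A rough sketch of both approaches, using a simplified stand-in regex rather than the grammar from the question. The capture name field, the input string, and the variable names are hypothetical and chosen only for illustration.

```perl6
# A quantified group holding a named capture; each repetition is one field.
my $m = 'ab,cd,' ~~ / ( $<field> = ( \w+ ) ',' )* /;

# map: walk the list of repetitions and stringify each named capture.
my @fields = $m[0].map({ ~$_<field> });
say @fields;            # [ab cd]

# >>: the hyper form of the same lookup, giving the match objects themselves.
say $m[0]>>.<field>;
```

The map form makes it easy to transform each match along the way (here, to a plain string), while the >> form is the shorter choice when the match objects themselves are wanted.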