I am starting to write a BibTeX parser. The first thing I would like to do is to parse a braced item. A braced item could be an author field or a title, for example. There might be nested braces within the field. The following code does not handle nested braces:
use v6;
my $str = q:to/END/;
author={Belayneh, M. and Geiger, S. and Matth{\"{a}}i, S.K.},
END
$str .= chomp;
grammar ExtractBraced {
    rule TOP {
        'author=' <braced-item> .*
    }
    rule braced-item { '{' <-[}]>* '}' }
}
ExtractBraced.parse( $str ).say;
Output:
「author={Belayneh, M. and Geiger, S. and Matth{\"{a}}i, S.K.},」
braced-item => 「{Belayneh, M. and Geiger, S. and Matth{\"{a}」
Now, to make the parser accept nested braces, I would like to keep a counter of the opening braces that have not been closed yet: each opening brace increments the counter and each closing brace decrements it. When the counter reaches zero, the complete item has been parsed.
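The counting idea can be tried out on its own before wiring it into a grammar. The following is a minimal sketch in plain Raku, not a grammar action: the sub name extract-braced is hypothetical, scanning starts at the first {, and every brace counts (escaped braces such as \{ get no special treatment, an assumption that may not hold for all BibTeX input).

```raku
use v6;

# Hypothetical helper: return the first balanced {...} group in $s,
# found by counting braces, or Nil if the braces never balance.
sub extract-braced(Str $s) {
    my $start = $s.index('{');
    return Nil without $start;          # no opening brace at all

    my $depth = 0;
    for $start ..^ $s.chars -> $i {
        my $c = $s.substr($i, 1);
        if $c eq '{' {
            $depth++;                   # one more unclosed brace
        }
        elsif $c eq '}' {
            $depth--;                   # one brace closed
            # Counter back at zero: the outermost item is complete
            return $s.substr($start, $i - $start + 1) if $depth == 0;
        }
    }
    Nil;                                # ran out of input: unbalanced
}

my $str = 'author={Belayneh, M. and Geiger, S. and Matth{\"{a}}i, S.K.},';
say extract-braced($str);
```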
To follow this idea, I tried to split up the braced-item regex so that a grammar action runs on each character (the action method for the braced-item-char regex below should then handle the brace counter):
grammar ExtractBraced {
    rule TOP {
        'author=' <braced-item> .*
    }
    rule braced-item { '{' <braced-item-char>* '}' }
    rule braced-item-char { <-[}]> }
}
However, now the parsing suddenly fails. It is probably a silly mistake, but I cannot see why it should fail now.
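The failure comes from declaring braced-item-char as a rule. A rule implies :sigspace, so the space before the closing } in rule braced-item-char { <-[}]> } becomes an implicit <.ws> after every character. The default ws is <!ww> \s*, which cannot match between two word characters, so braced-item-char already fails on the B of Belayneh (the next character is e). The * therefore matches zero characters, '}' does not match B, and the parse fails. The sketch below checks this by turning only braced-item-char into a token; the grammar names AsRule and AsToken are made up for the comparison.

```raku
use v6;

my $str = 'author={Belayneh, M. and Geiger, S. and Matth{\"{a}}i, S.K.},';

# As in the question: braced-item-char is a rule, so it is
# effectively <-[}]> <.ws>, and <.ws> fails inside a word.
grammar AsRule {
    rule TOP { 'author=' <braced-item> .* }
    rule braced-item { '{' <braced-item-char>* '}' }
    rule braced-item-char { <-[}]> }
}

# Identical except that braced-item-char is a token (no implicit <.ws>).
grammar AsToken {
    rule TOP { 'author=' <braced-item> .* }
    rule braced-item { '{' <braced-item-char>* '}' }
    token braced-item-char { <-[}]> }
}

say so AsRule.parse($str);                  # False
say so AsToken.parse($str);                 # True
say AsToken.parse($str)<braced-item>.Str;   # still stops at the first }
```

The token version behaves like the first grammar in the question: it stops at the first }, so nesting is still not handled.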
Without knowing how you want the resulting data to look, I would change it to something like this:
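A sketch of one such grammar, assuming the goal is simply to capture the complete braced value: braced-item recurses into itself at every nested {, so the braces balance through recursion rather than through an explicit counter. Using token instead of rule avoids the implicit <.ws> calls.

```raku
use v6;

my $str = 'author={Belayneh, M. and Geiger, S. and Matth{\"{a}}i, S.K.},';

grammar ExtractBraced {
    # token, not rule: no implicit <.ws>, so spaces are ordinary characters
    token TOP { 'author=' <braced-item> .* }

    # '{', then any mix of non-brace runs and nested braced items, then '}'.
    # The recursive call keeps the braces balanced.
    token braced-item {
        '{' [ <-[{}]>+ || <before '{'> <.braced-item> ]* '}'
    }
}

say ExtractBraced.parse($str)<braced-item>.Str;
```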
If you want a bit more structure, it might look more like this:
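A hedged sketch of a more structured variant: the names ExtractBracedTree and text are hypothetical, and the alias <author=braced-item> stores the top-level item under author. Plain runs of characters and nested groups get their own captures, so the match tree mirrors the brace nesting.

```raku
use v6;

my $str = 'author={Belayneh, M. and Geiger, S. and Matth{\"{a}}i, S.K.},';

grammar ExtractBracedTree {
    token TOP { 'author=' <author=braced-item> .* }

    # Captured (not dotted) subrules: every text run and every nested
    # group shows up in the match tree. Inside the quantified group,
    # $<text> and $<braced-item> are lists of matches.
    token braced-item { '{' [ <text> || <braced-item> ]* '}' }

    token text { <-[{}]>+ }
}

my $m = ExtractBracedTree.parse($str);
say $m<author>;
```

For example, $m<author><braced-item>[0] is {\"{a}}, and its own <braced-item>[0] is the innermost {a}.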
Note that the + on <-[{}]>+ is an optimization, as is <before '{'>; both can be omitted and it will still work.
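To check that claim, here is the recursive idea with both optimizations removed (the grammar name is hypothetical): it consumes one non-brace character per iteration and recurses without the lookahead. The lookahead only saves a subrule call, since the nested braced-item fails anyway unless the next character is {.

```raku
use v6;

my $str = 'author={Belayneh, M. and Geiger, S. and Matth{\"{a}}i, S.K.},';

grammar ExtractBracedPlain {
    token TOP { 'author=' <braced-item> .* }

    # One character at a time (no +), and no <before '{'> guard:
    # the recursive braced-item starts with '{' and fails otherwise.
    token braced-item { '{' [ <-[{}]> || <.braced-item> ]* '}' }
}

say ExtractBracedPlain.parse($str)<braced-item>.Str;
```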