I am trying to write a macro to expand a set of rules into code that performs token matching, but I am unable to generate the proper code without causing macro expansion errors. I know that I can handle this in other ways, but the key question here is not how to parse tokens but rather how to write a macro that can recursively expand a token tree with match arms.
The idea is that we want to read a token from the string and print it out. More code needs to be added to turn it into something more useful, but this example serves to illustrate the situation:
#[derive(Debug, PartialEq)]
enum Digit {
    One,
    Two,
    Three,
    Ten,
    Eleven,
}

#[test]
fn test1(buf: &str) {
    let buf = "111";
    let token = parse!(buf, {
        '1' => Digit::One,
        '2' => Digit::Two,
        '3' => Digit::Three,
    });
    assert_eq!(token, Some(Digit::One));
}
The code we want to generate from this example is:
fn test1(buf: &str) {
    let token = {
        let mut chars = buf.chars().peekable();
        match chars.peek() {
            Some(&'1') => {
                chars.next().unwrap();
                Some(Digit::One)
            }
            Some(&'2') => {
                chars.next().unwrap();
                Some(Digit::Two)
            }
            Some(&'3') => {
                chars.next().unwrap();
                Some(Digit::Three)
            }
            Some(_) | None => None,
        }
    };
    assert_eq!(token, Some(Digit::One));
}
Ignore the fact that we do not read more tokens from the string and hence the chars.next().unwrap() is not very useful. It will be useful later.
The macro for generating the above code is straightforward:
macro_rules! parse {
    ($e:expr, { $($p:pat => $t:expr),+ }) => {
        {
            let mut chars = $e.chars().peekable();
            match chars.peek() {
                $(Some(&$p) => {
                    chars.next().unwrap();
                    Some($t)
                },)+
                Some(_) | None => None
            }
        }
    };
}
Let us now extend this example to handle slightly more advanced matching: it should read multiple characters using lookahead, consuming further characters only if they match certain patterns. If they do not match, the extra characters should not be read. We create a token tree with match arms in a similar way to the previous example, but here we want to support a recursive structure:
#[test]
fn test2() {
    let buf = "111";
    let token = parse!(buf, {
        '1' => {
            '0' => Digit::Ten,
            '1' => Digit::Eleven,
            _ => Digit::One,
        },
        '2' => Digit::Two,
        '3' => Digit::Three
    });
    assert_eq!(token, Some(Digit::Eleven));
}
The code we want to generate from this example is:
fn test2() {
    let buf = "111";
    let token = {
        let mut chars = buf.chars().peekable();
        match chars.peek() {
            Some(&'1') => {
                chars.next().unwrap();
                match chars.peek() {
                    Some(&'0') => {
                        chars.next().unwrap();
                        Some(Digit::Ten)
                    },
                    Some(&'1') => {
                        chars.next().unwrap();
                        Some(Digit::Eleven)
                    },
                    Some(_) | None => Some(Digit::One)
                }
            },
            Some(&'2') => {
                chars.next().unwrap();
                Some(Digit::Two)
            },
            Some(&'3') => {
                chars.next().unwrap();
                Some(Digit::Three)
            },
            Some(_) | None => None,
        }
    };
    assert_eq!(token, Some(Digit::Eleven));
}
A macro to handle this could work roughly like this:
macro_rules! expand {
    ($t:tt) => {{
        chars.next().unwrap();
        inner!($t)
    }};
    ($e:expr) => {{
        chars.next().unwrap();
        Some($e)
    }};
}

macro_rules! inner {
    ($i:ident, { $($p:pat => ???),+ }) => {
        match $i.peek() {
            $( Some(&$p) => expand!($i, ???), )+
            Some(_) | None => None
        }
    };
}

macro_rules! parse {
    ($e:expr, $t:tt) => {
        {
            let mut chars = $e.chars().peekable();
            inner!(chars, $t)
        }
    };
}
However, I am unable to find something to replace the ??? in the inner! macro with something that matches either an expression or a token tree.
Something like $e:expr will not be able to match a token tree at this point.
Something like $t:tt does not match the enum constant Digit::Two, which is a perfectly valid expression.
Something like $($rest:tt)* as a generic matcher will fail since the Kleene-star closure is greedy and will try to match the following comma.
A recursive macro matching the items one by one, e.g., a pattern along the lines of { $p:pat => $t:expr, $($rest:tt)* }, cannot be expanded inside the match statement in the inner! macro, since that position expects something that syntactically looks like ... => ..., so this expansion gives an error claiming that it expects a => after the macro:
match $e.peek() {
    Some(&$p) => ...$t...,
    inner!($rest)
                 ^ Expect => here
}
This looks like one of the syntactic requirements mentioned in the book.
Changing the syntax of the matching part does not allow use of the pat fragment specifier, since that needs to be followed by a => (according to the macro chapter in the book).
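To see why a single $t:tt cannot capture Digit::Two while it can capture a whole braced block, it may help to count token trees. The following is only a small sketch; count_tts is a hypothetical helper introduced for illustration and is not part of the macros above:

```rust
// Hypothetical helper: counts the token trees in its input by
// peeling them off one at a time.
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

#[test]
fn token_tree_counts() {
    // `Digit::Two` is three token trees: `Digit`, `::` and `Two`,
    // so a lone `$t:tt` cannot match it.
    assert_eq!(count_tts!(Digit::Two), 3);
    // A braced block is a single token tree, whatever it contains.
    assert_eq!(count_tts!({ '0' => Digit::Ten, _ => Digit::One }), 1);
}
```

The macro never evaluates its input, so Digit does not need to be in scope for this to compile.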
When you need to branch based on different matches inside repetitions like this, you need to do incremental parsing.
So.
This is the entry point for the macro. It sets up the outer-most layer, and feeds the input into a general parsing rule. We pass down chars so the deeper layers can find it.
Termination rule: once we run out of input (modulo some commas), dump the accumulated match arm code fragments into a match expression, and append the final catch-all arm.
Alternately, if the catch-all arm is specified, use that.
This handles the recursion. If we see a block, we advance $chars and parse the contents of the block with an empty code accumulator. The result of all this is appended to the current accumulator (i.e. $($arms)).
The non-recursive case.
And, for completeness, the rest of the test code. Note that I had to change test1, as it wasn't a valid test.
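The rules described above can be sketched roughly as follows. This is a minimal reconstruction from the descriptions, not necessarily the exact code: the helper name parse_rules, the @parse marker, and the trick of appending a comma before recursing are assumptions made for this sketch. A block on the right-hand side of an arm is always treated as nested rules, so a block expression cannot be used as a result here.

```rust
#[derive(Debug, PartialEq)]
enum Digit {
    One,
    Two,
    Three,
    Ten,
    Eleven,
}

macro_rules! parse {
    // Entry point: set up the outer-most layer and feed the body into the
    // general parsing rules. `chars` is passed down so the deeper layers can
    // find it. A comma is appended so every item is followed by one.
    ($buf:expr, { $($body:tt)* }) => {{
        let mut chars = $buf.chars().peekable();
        parse_rules!(@parse chars, {}, $($body)* ,)
    }};
}

// Hypothetical helper (name and `@parse` marker are assumptions).
macro_rules! parse_rules {
    // Termination: out of input (modulo some commas). Dump the accumulated
    // arms into a `match` and append the final catch-all arm.
    (@parse $chars:ident, { $($arms:tt)* }, $(,)*) => {
        match $chars.peek() {
            $($arms)*
            _ => None,
        }
    };
    // Termination with a user-specified catch-all arm.
    (@parse $chars:ident, { $($arms:tt)* }, _ => $e:expr $(,)*) => {
        match $chars.peek() {
            $($arms)*
            _ => Some($e),
        }
    };
    // Recursion: on a block, advance `$chars` and parse the block's contents
    // with an empty accumulator; append the result to the current arms.
    (@parse $chars:ident, { $($arms:tt)* }, $p:pat => { $($block:tt)* }, $($tail:tt)*) => {
        parse_rules!(@parse $chars, {
            $($arms)*
            Some(&$p) => {
                $chars.next().unwrap();
                parse_rules!(@parse $chars, {}, $($block)* ,)
            },
        }, $($tail)*)
    };
    // The non-recursive case: a plain expression on the right-hand side.
    (@parse $chars:ident, { $($arms:tt)* }, $p:pat => $e:expr, $($tail:tt)*) => {
        parse_rules!(@parse $chars, {
            $($arms)*
            Some(&$p) => {
                $chars.next().unwrap();
                Some($e)
            },
        }, $($tail)*)
    };
}

#[test]
fn test1() {
    let buf = "111";
    let token = parse!(buf, {
        '1' => Digit::One,
        '2' => Digit::Two,
        '3' => Digit::Three,
    });
    assert_eq!(token, Some(Digit::One));
}

#[test]
fn test2() {
    let buf = "111";
    let token = parse!(buf, {
        '1' => {
            '0' => Digit::Ten,
            '1' => Digit::Eleven,
            _ => Digit::One,
        },
        '2' => Digit::Two,
        '3' => Digit::Three
    });
    assert_eq!(token, Some(Digit::Eleven));
}
```

The order of the rules matters in this sketch: the explicit catch-all rule has to come before the two $p:pat rules, because _ is itself a valid pattern, and the block rule has to come before the expression rule so that a nested set of arms is not handed to the expression parser.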