Is this reserve declaration hard to understand or

2019-08-21 02:57发布

问题:

I've got a problem with using a reserve (backslash) declaration for priority disambiguation. Below is a self-contained example. The production 'Ipv4Address' is a strict subset of 'Domain0'. In parsing URL's, though, you want dotted-quad addresses to be handled differently than domain names, so you want to split 'Domain0' into two parts; 'Domain1' is one of those two parts. The test suite included, however, is failing at 't3()', where 'Domain1' is accepting an IP address, which looks like it should be excluded.

Is this a problem with the reserve declaration, or is this a defect in the current version of Rascal? I'm on the 0.10.x unstable branch at present, per advice to see if that corrected a different problem (with the Tutor). I haven't checked with the stable branch since keeping them both installed means parallel Eclipse environments, which I haven't been motivated to do.

module grammar_test

import ParseTree;

syntax Domain0 = { Subdomain '.' }+;
syntax Domain1 = Domain0 \ IPv4Address ;
lexical Subdomain = [0-9A-Za-z]+ | [0-9A-Za-z]+'-'[a-zA-Z0-9\-]*[a-zA-Z0-9] ;
lexical IPv4Address = DecimalOctet '.' DecimalOctet '.' DecimalOctet '.' DecimalOctet ; 
lexical DecimalOctet = [0-9] | [1-9][0-9] | '1'[0-9][0-9] | '2'[0-4][0-9] | '25'[0-5] ;

test bool t1()
{
    return parseAccept(#IPv4Address, "192.168.0.1");
}   

test bool t2()
{
    return parseAccept(#Domain0, "192.168.0.1");
}   

test bool t3()
{
    return parseReject(#Domain1, "192.168.0.1");
}   

bool parseAccept( type[&T<:Tree] begin, str input )
{
    try
    {
        parse(begin, input, allowAmbiguity=false);
    }
    catch ParseError(loc _):
    {
        return false;
    }
    return true;
}

bool parseReject( type[&T<:Tree] begin, str input )
{
    try
    {
        parse(begin, input, allowAmbiguity=false);
    }
    catch ParseError(loc _):
    {
        return true;
    }
    return false;
}

This example has been cut down from larger code. I first encountered the error in a larger scope. Using the rule "IPv4Address | Domain1" was throwing an Ambiguity exception, which I tracked down to the behavior that "Domain1" was accepting something it shouldn't be. Curiously "IPv4Address > Domain1" was also throwing Ambiguity, but I'm guessing this has the same root cause as the present isolated example.

回答1:

The difference operator for keyword reservations currently only works correctly if the right-hand side is a finite language expressed as disjunction of literal keywords like "if" | "then" | "while" or a non-terminal which is defined like that: lexical X = "if" | "then" | "while". And then you can writeA \ X` for some effect.

For other types of non-terminals the parser is just generated but the \ constraint has no effect. You wrote Domain0 \ IPv4Address and IPv3Address does not hold to the above assumption.

(We should either add a warning about that or generate a parser which can implement the full semantics of language difference; but that's for another time).

Admittedly such a powerful difference operator could be used to express an some order of preference between non-terminals. Alas.

Possible (sketches of) solutions:

  • stage two passes solution: parse the input using the more general Subdomain syntax, then pattern and match rewrite in a single pass all quadruples to IPv4Address
  • maximal munch solution: adapt the grammar using follow restrictions to implement eager behavior for the IPv4Address, like {Subdomain !>> [.][0-9] "."}+ or something in that vain.


标签: rascal