Detect parenthesis in BinaryExpression

2019-03-29 08:10发布

问题:

I am building a expression analyser from which I would like to generate database query code, I've gotten quite far but am stuck parsing BinaryExpressions accurately. It's quite easy to break them up into Left and Right but I need to detect parenthesis and generate my code accordingly and I cannot see how to do this.

An example [please ignore the flawed logic :)]:

a => a.Line2 != "1" && (a.Line2 == "a" || a.Line2 != "b") && !a.Line1.EndsWith("a")

I need to detect the 'set' in the middle and preserve their grouping but I cannot see any difference in the expression to a normal BinaryExpression during parsing (I would hate to check the string representation for parenthesis)

Any help would be appreciated.

(I should probably mention that I'm using C#)

--Edit-- I failed to mention that I'm using the standard .Net Expression classes to build the expressions (System.Linq.Expressions namespace)

--Edit2-- Ok I'm not parsing text into code, I'm parsing code into text. So my Parser class has a method like this:

void FilterWith<T>(Expression<Func<T, bool>> filterExpression);

which allows you to write code like this:

FilterWith<Customer>(c => c.Name =="asd" && c.Surname == "qwe");

which is quite easy to parse using the standard .Net classes, my challenge is parsing this expression:

FilterWith<Customer>(c => c.Name == "asd" && (c.Surname == "qwe" && c.Status == 1) && !c.Disabled)

my challenge is to keep the expressions between parenthesis as a single set. The .Net classes correctly splits the parenthesis parts from the others but gives no indication that it is a set due to the parenthesis.

回答1:

I haven't used Expression myself, but if it works anything like any other AST, then the problem is easier to solve than you make it out to be. As another commentor pointed out, just put parentheses around all of your binary expressions and then you won't have to worry about order of operations issues.

Alternatively, you could check to see if the expression you are generating is at a lower precedence than the containing expression and if so, put parenthesis around it. So if you have a tree like this [* 4 [+ 5 6]] (where tree nodes are represented recursively as [node left-subtree right-subtree]), you would know when writing out the [+ 4 5] tree that it was contained inside a * operation, which is higher precedence than a + operation and thus requires than any of its immediate subtrees be placed in parentheses. The pseudo-code could be something like this:

function parseBinary(node) {
    if(node.left.operator.precedence < node.operator.precedence)
        write "(" + parseBinary(node.left) + ")"
    else
        write parseBinary(node.left)
    write node.operator
    // and now do the same thing for node.right as you did for node.left above    
}

You'll need to have a table of precedence for the various operators, and a way to get at the operator itself to find out what it is and thence what its precedence is. However, I imagine you can figure that part out.



回答2:

When building a expression analyzer, you need first a parser, and for that you need a tokenizer.

A tokenizer is a piece of code that reading an expression, generates tokens (which can be valid or invalid), for a determined syntax.

So your parser, using the tokenizer, reads the expression in the established order (left-to right, right-to-left, top-to-bottom, whatever you choose) and creates a tree that maps the expression.

Then the analyzer interprets the tree into an expression, giving its definitive meaning.