Validate a Boolean expression with brackets in C#

2020-04-02 18:25发布

问题:

I want to validate a string in C# that contains a Boolean expression with brackets. The string should only contain numbers 1-9, round brackets, "OR" , "AND".

Examples of good strings:

"1 AND 2"

"2 OR 4"

"4 AND (3 OR 5)"

"2"

And so on...

I am not sure if Regular Expression are flexible enough for this task. Is there a nice short way of achieving this in C# ?

回答1:

It's probably simpler to do this with a simple parser. But you can do this with .NET regex by using balancing groups and realizing that if the brackets are removed from the string you always have a string matched by a simple expression like ^\d+(?:\s+(?:AND|OR)\s+\d+)*\z.

So all you have to do is use balancing groups to make sure that the brackets are balanced (and are in the right place in the right form).

Rewriting the expression above a bit:

(?x)^
OPENING
\d+
CLOSING
(?:
    \s+(?:AND|OR)\s+
    OPENING
    \d+
    CLOSING
)*
BALANCED
\z

((?x) makes the regex engine ignore all whitespace and comments in the pattern, so it can be made more readable.)

Where OPENING matches any number (0 included) of opening brackets:

\s* (?: (?<open> \( ) \s* )*

CLOSING matches any number of closing brackets also making sure that the balancing group is balanced:

\s* (?: (?<-open> \) ) \s* )*

and BALANCED performs a balancing check, failing if there are more open brackets then closed:

(?(open)(?!))

Giving the expression:

(?x)^
\s* (?: (?<open> \( ) \s* )*
\d+
\s* (?: (?<-open> \) ) \s* )*
(?:
    \s+(?:AND|OR)\s+
    \s* (?: (?<open> \( ) \s* )*
    \d+
    \s* (?: (?<-open> \) ) \s* )*
)*
(?(open)(?!))
\z

If you do not want to allow random spaces remove every \s*.

Example

See demo at IdeOne. Output:

matched: '2'
matched: '1 AND 2'
matched: '12 OR 234'
matched: '(1) AND (2)'
matched: '(((1)) AND (2))'
matched: '1 AND 2 AND 3'
matched: '1 AND (2 OR (3 AND 4))'
matched: '1 AND (2 OR 3) AND 4'
matched: ' ( 1    AND ( 2 OR  ( 3 AND    4 )  )'
matched: '((1 AND 7) OR 6) AND ((2 AND 5) OR (3 AND 4))'
matched: '(1)'
matched: '(((1)))'
failed:  '1 2'
failed:  '1(2)'
failed:  '(1)(2)'
failed:  'AND'
failed:  '1 AND'
failed:  '(1 AND 2'
failed:  '1 AND 2)'
failed:  '1 (AND) 2'
failed:  '(1 AND 2))'
failed:  '(1) AND 2)'
failed:  '(1)() AND (2)'
failed:  '((1 AND 7) OR 6) AND (2 AND 5) OR (3 AND 4))'
failed:  '((1 AND 7) OR 6) AND ((2 AND 5 OR (3 AND 4))'
failed:  ''


回答2:

If you just want to validate the input string, you can write a simple parser. Each method consumes a certain kind of input (digit, brackets, operator) and returns the remaining string after matching. An exception is thrown if no match can be made.

public class ParseException : Exception { }

public static class ExprValidator
{
    public static bool Validate(string str)
    {
        try
        {
            string term = Term(str);
            string stripTrailing = Whitespace(term);

            return stripTrailing.Length == 0;
        }
        catch(ParseException) { return false; }
    }

    static string Term(string str)
    {
        if(str == string.Empty) return str;
        char current = str[0];

        if(current == '(')
        {
            string term = LBracket(str);
            string rBracket = Term(term);
            string temp = Whitespace(rBracket);
            return RBracket(temp);
        }
        else if(Char.IsDigit(current))
        {
            string rest = Digit(str);
            try
            {
                //possibly match op term
                string op = Op(rest);
                return Term(op);
            }
            catch(ParseException) { return rest; }
        }
        else if(Char.IsWhiteSpace(current))
        {
            string temp = Whitespace(str);
            return Term(temp);
        }
        else throw new ParseException();
    }

    static string Op(string str)
    {
        string t1 = Whitespace_(str);
        string op = MatchOp(t1);
        return Whitespace_(op);
    }

    static string MatchOp(string str)
    {
        if(str.StartsWith("AND")) return str.Substring(3);
        else if(str.StartsWith("OR")) return str.Substring(2);
        else throw new ParseException();
    }

    static string LBracket(string str)
    {
        return MatchChar('(')(str);
    }

    static string RBracket(string str)
    {
        return MatchChar(')')(str);
    }

    static string Digit(string str)
    {
        return MatchChar(Char.IsDigit)(str);
    }

    static string Whitespace(string str)
    {
        if(str == string.Empty) return str;

        int i = 0;
        while(i < str.Length && Char.IsWhiteSpace(str[i])) { i++; }

        return str.Substring(i);
    }

    //match at least one whitespace character
    static string Whitespace_(string str)
    {
        string stripFirst = MatchChar(Char.IsWhiteSpace)(str);
        return Whitespace(stripFirst);
    }

    static Func<string, string> MatchChar(char c)
    {
        return MatchChar(chr => chr == c);
    }

    static Func<string, string> MatchChar(Func<char, bool> pred)
    {
        return input => {
            if(input == string.Empty) throw new ParseException();
            else if(pred(input[0])) return input.Substring(1);
            else throw new ParseException();
        };
    }
}


回答3:

Pretty simply:

At first stage you must determ lexems (digit, bracket or operator) with simple string comparsion.

At second stage you must define variable of count of closed bracket (bracketPairs), which can be calculated by the following algorithm for each lexem:

if current lexem is '(', then bracketPairs++;

if current lexem is ')', then bracketPairs--.

Else do not modify bracketPairs.

At the end if all lexems are known and bracketPairs == 0 then input expression is valid.

The task is a bit more complex, if it's necesery to build AST.



回答4:

what you want are "balanced groups", with them you can get all bracet definitions, then you just need a simple string parsing

http://blog.stevenlevithan.com/archives/balancing-groups

http://msdn.microsoft.com/en-us/library/bs2twtah.aspx#balancing_group_definition



回答5:

If you consider a boolean expression as generated by a formal grammar writing a parser is easier.

I made an open source library to interpret simple boolean expressions. You can take a look at it on GitHub, in particular look at the AstParser class and Lexer.



回答6:

ANTLR Parser Generator?

a short way of achieving this in C#

Although it may be an overkill if its just numbers and OR + AND