Tokenizing math expression with functions in C#

Posted 2019-09-01 05:15

Question:

I figured this would be easy to find, but I haven't been successful.

I need to be able to tokenize the following expression

(4 + 5) + myfunc('two words', 3, 5)

into

(
4
+
5
)
+
myfunc
(
'two words'
,
3
,
5
)

It seems like this is probably a common need, but I haven't been able to find any good documentation on it. Is this something I could do using regex? Does anybody know of an existing way to do this?

I'm using C#, but if you have the answer in another language, don't be shy.

Thanks in advance.

Answer 1:

If you are looking for a robust and powerful solution, you should definitely look into a lexical analyzer generator (like ANTLR). However, if all you need is tokenization of simple expressions like the one you provided, you can achieve this pretty easily:

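        // TODO Refactor and optimize this function
        // Requires: using System.Collections.Generic; using System.Linq;
        public IList<string> TokenizeExpression(string expr)
        {
            // TODO Add all your delimiters here
            var delimiters = new[] { '(', '+', ')', ',' };
            var buffer = string.Empty;
            var ret = new List<string>();
            var inQuotes = false;
            foreach (var c in expr)
            {
                // Each single quote opens or closes a quoted string
                if (c == '\'') inQuotes = !inQuotes;
                if (inQuotes)
                {
                    // Keep spaces and delimiters that appear inside quotes
                    buffer += c;
                }
                else if (delimiters.Contains(c) || char.IsWhiteSpace(c))
                {
                    // A delimiter or whitespace ends the current token
                    if (buffer.Length > 0) ret.Add(buffer);
                    buffer = string.Empty;
                    if (!char.IsWhiteSpace(c)) ret.Add(c.ToString());
                }
                else
                {
                    buffer += c;
                }
            }
            // Flush the last token when the expression does not end with a delimiter
            if (buffer.Length > 0) ret.Add(buffer);
            return ret;
        }

Example:

TokenizeExpression("(4 + 5) + myfunc('two words', 3, 5)") Count = 14

[0]: "("
[1]: "4"
[2]: "+"
[3]: "5"
[4]: ")"
[5]: "+"
[6]: "myfunc"
[7]: "("
[8]: "'two words'"
[9]: ","
[10]: "3"
[11]: ","
[12]: "5"
[13]: ")"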
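
On the regex part of the question: for expressions this simple, a single pattern with one alternative per token kind can also work. The following is only a sketch, not a definitive implementation. The class name RegexTokenizer, the number and identifier formats, and the extra operators -, * and / are assumptions that go beyond what the question specifies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class RegexTokenizer
{
    // One alternative per token kind (assumed formats).
    private static readonly Regex TokenPattern = new Regex(
        @"'[^']*'" +          // single-quoted string, inner spaces preserved
        @"|\d+(?:\.\d+)?" +   // integer or decimal number
        @"|[A-Za-z_]\w*" +    // identifier such as a function name
        @"|[-+*/(),]");       // operators, parentheses and comma

    public static IList<string> Tokenize(string expr)
    {
        // Each successive match is one token; whitespace between tokens never matches.
        return TokenPattern.Matches(expr)
                           .Cast<Match>()
                           .Select(m => m.Value)
                           .ToList();
    }

    public static void Main()
    {
        var tokens = Tokenize("(4 + 5) + myfunc('two words', 3, 5)");
        // prints: ( | 4 | + | 5 | ) | + | myfunc | ( | 'two words' | , | 3 | , | 5 | )
        Console.WriteLine(string.Join(" | ", tokens));
    }
}
```

Note that characters matching none of the alternatives are silently skipped, so this does not report malformed input. If you need error reporting or nested grammar rules, a real lexer is the better choice.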