I figured this would be easy to find, but I haven't been successful.
I need to be able to tokenize the following expression
(4 + 5) + myfunc('two words', 3, 5)
into
(
4
+
5
)
+
myfunc
(
'two words'
,
3
,
5
)
It seems like this is probably a common need, but I haven't been able to find any good documentation on it. Is this something I could do using regex? Does anybody know of an existing way to do this?
I'm using C#, but if you have the answer in another language, don't be shy.
Thanks in advance.
If you are looking for a robust and powerful solution, you should definitely look into a lexer/parser generator (like ANTLR). However, if all you need is to tokenize simple expressions like the one you provided, you can do it fairly easily:
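// Requires: using System.Collections.Generic; using System.Linq; using System.Text;
// TODO Refactor and optimize this function
public IList<string> TokenizeExpression(string expr)
{
    // TODO Add all your delimiters here
    var delimiters = new[] { '(', '+', ')', ',' };
    var buffer = new StringBuilder();
    var ret = new List<string>();
    var inQuotes = false;
    foreach (var c in expr)
    {
        // A single quote opens or closes a string literal; the quote itself stays in the token
        if (c == '\'') inQuotes = !inQuotes;

        if (inQuotes)
        {
            // Inside a literal, keep everything, including spaces and delimiters
            buffer.Append(c);
        }
        else if (delimiters.Contains(c))
        {
            if (buffer.Length > 0) ret.Add(buffer.ToString());
            ret.Add(c.ToString());
            buffer.Clear();
        }
        else if (!char.IsWhiteSpace(c))
        {
            // Whitespace outside of quotes is dropped
            buffer.Append(c);
        }
    }
    // Flush a trailing token that is not followed by a delimiter (e.g. "4+5")
    if (buffer.Length > 0) ret.Add(buffer.ToString());
    return ret;
}
Example:
TokenizeExpression("(4 + 5) + myfunc('two words', 3, 5)") Count = 14
[0]: "("
[1]: "4"
[2]: "+"
[3]: "5"
[4]: ")"
[5]: "+"
[6]: "myfunc"
[7]: "("
[8]: "'two words'"
[9]: ","
[10]: "3"
[11]: ","
[12]: "5"
[13]: ")"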
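Since the question also asks whether regex could do this: for expressions this simple, probably yes. Below is a minimal sketch, not a definitive implementation. The class name RegexTokenizer and the pattern are my own assumptions about what counts as a token (single-quoted strings without escape handling, identifiers, integers, and the characters ( ) + ,); you would need to extend the pattern for other operators or for decimal numbers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class RegexTokenizer
{
    // Alternatives are tried left to right at each position:
    //   '[^']*'        a single-quoted string, spaces included (no escape handling)
    //   [A-Za-z_]\w*   an identifier such as myfunc
    //   \d+            an integer literal
    //   [()+,]         one delimiter or operator character
    private static readonly Regex TokenPattern =
        new Regex(@"'[^']*'|[A-Za-z_]\w*|\d+|[()+,]");

    public static IList<string> Tokenize(string expr)
    {
        // Anything that matches no alternative (whitespace, unknown symbols)
        // is skipped silently rather than reported as an error.
        return TokenPattern.Matches(expr)
                           .Cast<Match>()
                           .Select(m => m.Value)
                           .ToList();
    }
}
```

Calling RegexTokenizer.Tokenize("(4 + 5) + myfunc('two words', 3, 5)") gives 14 tokens, with 'two words' kept as a single token. Keep in mind that this approach ignores unrecognized characters without complaint, so if you need error reporting or nested grammar rules, a real lexer is the safer route.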