Regular expression for String.Format-like utility

Posted 2020-04-16 08:33

Question:

I'm writing a class called StringTemplate that formats objects the way String.Format does, but with names instead of indexes for the placeholders. Here's an example:

string s = StringTemplate.Format("Hello {Name}. Today is {Date:D}, and it is {Date:T}.",
                                 new { Name = "World", Date = DateTime.Now });

To achieve this result, I look for placeholders and replace them with indexes. I then pass the resulting format string to String.Format.

This works fine, except when there are doubled braces, which are an escape sequence. The desired behavior, which is the same as String.Format's, is described below:

  • "Hello {Name}" should be formatted as "Hello World"
  • "Hello {{Name}}" should be formatted as "Hello {Name}"
  • "Hello {{{Name}}}" should be formatted as "Hello {World}"
  • "Hello {{{{Name}}}}" should be formatted as "Hello {{Name}}"

And so on...
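These are the rules String.Format already applies to positional placeholders. Here is a minimal check, assuming index 0 stands in for Name. EscapeDemo, Templates and Render are hypothetical names used only for illustration:

```csharp
using System;

public static class EscapeDemo
{
    // Positional equivalents of the four named cases listed above.
    public static readonly string[] Templates =
    {
        "Hello {0}",       // Hello World
        "Hello {{0}}",     // Hello {0}
        "Hello {{{0}}}",   // Hello {World}
        "Hello {{{{0}}}}", // Hello {{0}}
    };

    // String.Format reads "{{" and "}}" as literal braces and "{0}" as the first argument.
    public static string Render(string template) => string.Format(template, "World");
}
```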

But my current regular expression doesn't detect the escape sequence and always treats the substring between braces as a placeholder, so I get results like "Hello {0}".

Here's my current regular expression:

private static Regex _regex = new Regex(@"{(?<key>\w+)(?<format>:[^}]+)?}", RegexOptions.Compiled);

How can I modify this regular expression to ignore escaped braces? What seems really hard is that I have to detect placeholders depending on whether the number of braces is odd or even... I can't think of a simple way to do it with a regular expression. Is it even possible?


For completeness, here's the full code of the StringTemplate class:

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class StringTemplate
{
    private string _template;
    private static Regex _regex = new Regex(@"{(?<key>\w+)(?<format>:[^}]+)?}", RegexOptions.Compiled);

    public StringTemplate(string template)
    {
        if (template == null)
            throw new ArgumentNullException("template");
        this._template = template;
    }

    public static implicit operator StringTemplate(string s)
    {
        return new StringTemplate(s);
    }

    public override string ToString()
    {
        return _template;
    }

    public string Format(IDictionary<string, object> values)
    {
        if (values == null)
        {
            throw new ArgumentNullException("values");
        }

        Dictionary<string, int> indexes = new Dictionary<string, int>();
        object[] array = new object[values.Count];
        int i = 0;
        foreach (string key in values.Keys)
        {
            array[i] = values[key];
            indexes.Add(key, i++);
        }

        MatchEvaluator evaluator = (m) =>
        {
            if (m.Success)
            {
                string key = m.Groups["key"].Value;
                string format = m.Groups["format"].Value;
                int index = -1;
                if (indexes.TryGetValue(key, out index))
                {
                    return string.Format("{{{0}{1}}}", index, format);
                }
            }
            return string.Format("{{{0}}}", m.Value);
        };

        string templateWithIndexes = _regex.Replace(_template, evaluator);
        return string.Format(templateWithIndexes, array);
    }

    private static IDictionary<string, object> MakeDictionary(object obj)
    {
        Dictionary<string, object> dict = new Dictionary<string, object>();
        foreach (var prop in obj.GetType().GetProperties())
        {
            dict.Add(prop.Name, prop.GetValue(obj, null));
        }
        return dict;
    }

    public string Format(object values)
    {
        return Format(MakeDictionary(values));
    }

    public static string Format(string template, IDictionary<string, object> values)
    {
        return new StringTemplate(template).Format(values);
    }


    public static string Format(string template, object values)
    {
        return new StringTemplate(template).Format(values);
    }
}

Answer 1:

You can use a regex to match a balanced run of braces, then work out what to do with the braces from their count. Remember that .NET regexes aren't strictly "regular": balancing groups let them match nested, balanced constructs.

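using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

class Program {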
    static void Main(string[] args) {
        var d = new Dictionary<string, string> { { "Name", "World" } };
        var t = new Test();
        Console.WriteLine(t.Replace("Hello {Name}", d));
        Console.WriteLine(t.Replace("Hello {{Name}}", d));
        Console.WriteLine(t.Replace("Hello {{{Name}}}", d));
        Console.WriteLine(t.Replace("Hello {{{{Name}}}}", d));
        Console.ReadKey();
    }
}

class Test {

    private Regex MatchNested = new Regex(
        @"\{ (?>
                ([^{}]+)
              | \{ (?<D>)
              | \} (?<-D>)
              )*
              (?(D)(?!))
           \}",
             RegexOptions.IgnorePatternWhitespace
           | RegexOptions.Compiled 
           | RegexOptions.Singleline);

    public string Replace(string input, Dictionary<string, string> vars) {
        Matcher matcher = new Matcher(vars);
        return MatchNested.Replace(input, matcher.Replace);
    }

    private class Matcher {

        private Dictionary<string, string> Vars;

        public Matcher(Dictionary<string, string> vars) {
            Vars = vars;
        }

        public string Replace(Match m) {
            string name = m.Groups[1].Value;
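            // Braces on each side: everything in the match except the name, halved.
            int length = (m.Groups[0].Length - name.Length) / 2;
            // Odd count: the innermost pair is a real placeholder; even: every brace is an escape.
            string inner = (length % 2) == 0 ? name : Vars[name];
            // Each remaining "{{" / "}}" pair collapses to a single literal brace.
            return MakeString(inner, length / 2);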
        }

        private string MakeString(string inner, int braceCount) {
            StringBuilder sb = new StringBuilder(inner.Length + (braceCount * 2));
            sb.Append('{', braceCount);
            sb.Append(inner);
            sb.Append('}', braceCount);
            return sb.ToString();
        }

    }

}
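To see what the balancing-group pattern actually captures, here is a small sketch that runs the same expression (written on one line, without IgnorePatternWhitespace) and recovers the brace count per side the way Matcher.Replace does. BalancedBraces and Inspect are hypothetical names, and the value tuple assumes C# 7 or later:

```csharp
using System.Text.RegularExpressions;

public static class BalancedBraces
{
    // Same pattern as MatchNested above: (?<D>) pushes on '{', (?<-D>) pops on '}',
    // and (?(D)(?!)) fails the match if any '{' is left unbalanced.
    private static readonly Regex MatchNested = new Regex(
        @"\{(?>([^{}]+)|\{(?<D>)|\}(?<-D>))*(?(D)(?!))\}");

    // Returns the text inside the braces and the number of braces on each side.
    public static (string Name, int BracesPerSide) Inspect(string input)
    {
        Match m = MatchNested.Match(input);
        string name = m.Groups[1].Value;
        return (name, (m.Length - name.Length) / 2);
    }
}
```

An odd count (1 or 3) means the innermost braces form a real placeholder; an even count (2 or 4) means every brace is escaped.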


Answer 2:

Parity is generally very easy to decide with regular expressions. For example, this expression, anchored to the whole input, matches any run of As whose length is even (including zero), but not one whose length is odd:

(AA)*

So all you need to do is find expressions that match only an odd number of {s and of }s:

{({{)*
}(}})* 

(ignoring the need to escape the braces). Adding this idea to your current expression yields something like

{({{)*(?<key>\w+)(?<format>:[^}]+)?}(}})*

However, this doesn't match the number of braces on the two sides. For example, a run of three opening braces will pair with a single closing brace, as in {{{Name}, because both counts are odd. Regular expressions in the strict sense can't count, so you're not going to find an expression that matches the counts the way you want.
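The parity test itself works when the pattern is anchored, but inside the unanchored combined expression it breaks down. A small sketch to see this; OddBraceDemo and its members are hypothetical names, and the braces are escaped with backslashes:

```csharp
using System.Text.RegularExpressions;

public static class OddBraceDemo
{
    // Anchored: one '{' plus any number of "{{" pairs, i.e. an odd-length run.
    public static bool IsOddOpenRun(string s) => Regex.IsMatch(s, @"^\{(\{\{)*\z");

    // The combined expression from this answer, with the braces escaped.
    private static readonly Regex Combined = new Regex(
        @"\{(\{\{)*(?<key>\w+)(?<format>:[^}]+)?\}(\}\})*");

    // Returns the first substring the combined expression matches, or null.
    public static string FirstMatch(string input)
    {
        Match m = Combined.Match(input);
        return m.Success ? m.Value : null;
    }
}
```

FirstMatch("{{{Name}") returns the whole input, pairing three opening braces with one closing brace as described above. FirstMatch("{{Name}}") returns "{Name}": an unanchored search can also start in the middle of an escaped run, which is what the lookarounds (?<!{) and (?!}) in the last answer guard against.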

Really, what you should be doing is parsing the string with a custom parser that counts instances of { but not instances of {{, and matches them against instances of } but not }} on the other side. I think you'll find this is how string formatting in .NET works behind the scenes anyway, as regular expressions aren't suited to parsing nested structures of any kind.

Or you can use both ideas in concert: match potential tokens with a regular expression, then validate their braces balance using a quick check on the resulting match. That would probably end up being confusing and indirect, though. You're usually better off writing your own parser for this kind of scenario.
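A minimal hand-written parser along those lines might look like this. It is a sketch, not how .NET implements String.Format: TemplateParser and Format are hypothetical names, it assumes placeholders of the form {name} or {name:format} with no braces inside the format, and it throws FormatException on unknown keys or stray braces.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class TemplateParser
{
    // Scans the template once: "{{" and "}}" become literal braces,
    // and "{name}" or "{name:format}" is replaced by the matching value.
    public static string Format(string template, IDictionary<string, object> values)
    {
        var sb = new StringBuilder(template.Length);
        int i = 0;
        while (i < template.Length)
        {
            char c = template[i];
            bool doubled = i + 1 < template.Length && template[i + 1] == c;

            if ((c == '{' || c == '}') && doubled)
            {
                sb.Append(c);   // escaped brace: emit one, skip both
                i += 2;
            }
            else if (c == '{')
            {
                int end = template.IndexOf('}', i + 1);
                if (end < 0)
                    throw new FormatException("Unclosed placeholder at index " + i);

                string body = template.Substring(i + 1, end - i - 1);
                int colon = body.IndexOf(':');
                string key = colon < 0 ? body : body.Substring(0, colon);
                string format = colon < 0 ? null : body.Substring(colon + 1);

                if (!values.TryGetValue(key, out object value))
                    throw new FormatException("Unknown key '" + key + "'");

                // Apply the format specifier when the value supports one.
                sb.Append(value is IFormattable f
                    ? f.ToString(format, CultureInfo.CurrentCulture)
                    : Convert.ToString(value, CultureInfo.CurrentCulture));
                i = end + 1;
            }
            else if (c == '}')
            {
                throw new FormatException("Unescaped '}' at index " + i);
            }
            else
            {
                sb.Append(c);
                i++;
            }
        }
        return sb.ToString();
    }
}
```

Because escapes are consumed pairwise from the left, the parity question never arises: "{{{Name}}}" is read as an escaped "{", a placeholder, and an escaped "}".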



Answer 3:

I eventually used a technique similar to what Gavin suggested.

I changed the regular expression so that it matches all the braces around the placeholder:

private static Regex _regex = new Regex(@"(?<open>{+)(?<key>\w+)(?<format>:[^}]+)?(?<close>}+)", RegexOptions.Compiled);

And I changed the logic of the MatchEvaluator so that it handles escaped braces properly:

        MatchEvaluator evaluator = (m) =>
        {
            if (m.Success)
            {
                string open = m.Groups["open"].Value;
                string close = m.Groups["close"].Value;
                string key = m.Groups["key"].Value;
                string format = m.Groups["format"].Value;

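                // An even run of '{' is entirely escaped: leave it for String.Format to unescape.
                if (open.Length % 2 == 0)
                    return m.Value;

                // Otherwise the innermost brace on each side belongs to the placeholder.
                // RemoveLastChar is a helper not shown here; presumably s.Substring(0, s.Length - 1).
                open = RemoveLastChar(open);
                close = RemoveLastChar(close);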

                int index = -1;
                if (indexes.TryGetValue(key, out index))
                {
                    return string.Format("{0}{{{1}{2}}}{3}", open, index, format, close);
                }
                else
                {
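                    // Unknown key: emit it as escaped literal text, keeping the format inside the braces
                    // (placing it after the first "}}" would leave an unescaped '}' when a format is present).
                    return string.Format("{0}{{{{{1}{2}}}}}{3}", open, key, format, close);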
                }
            }
            return m.Value;
        };

I rely on String.Format to throw a FormatException if necessary. I wrote a few unit tests, and so far it seems to work fine...
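For anyone who wants to try this technique outside the class, here is a self-contained sketch: the regex above plus the same evaluator logic, with RemoveLastChar written inline as Substring and, for unknown keys, the format specifier kept inside the escaped braces. It is a reconstruction, not the author's exact code; EscapeAwareTemplate and FormatNamed are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EscapeAwareTemplate
{
    // Capture every brace around the placeholder, as in the answer above.
    private static readonly Regex Placeholder = new Regex(
        @"(?<open>{+)(?<key>\w+)(?<format>:[^}]+)?(?<close>}+)", RegexOptions.Compiled);

    public static string FormatNamed(string template, IDictionary<string, object> values)
    {
        // Map each key to a positional index for String.Format.
        var indexes = new Dictionary<string, int>();
        var args = new object[values.Count];
        int i = 0;
        foreach (var pair in values)
        {
            args[i] = pair.Value;
            indexes.Add(pair.Key, i++);
        }

        string withIndexes = Placeholder.Replace(template, m =>
        {
            string open = m.Groups["open"].Value;
            string close = m.Groups["close"].Value;
            string key = m.Groups["key"].Value;
            string format = m.Groups["format"].Value;

            // An even run of '{' is entirely escaped: leave it for String.Format.
            if (open.Length % 2 == 0)
                return m.Value;

            // Drop the innermost brace on each side; the rest stay as escapes.
            open = open.Substring(0, open.Length - 1);
            close = close.Substring(0, close.Length - 1);

            return indexes.TryGetValue(key, out int index)
                ? open + "{" + index + format + "}" + close
                : open + "{{" + key + format + "}}" + close;
        });

        return string.Format(withIndexes, args);
    }
}
```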

Thanks everyone for your help!



Answer 4:

I came across a similar problem. In my case the key was purely numeric and there was no format option. The following Regex was doing the trick:

Regex r = new Regex(@"
  (?<! { ) 
    { (?<before> (?: {{ )* ) 
      (?<key> \d+) 
    } (?<after>  (?: }} )* )
  (?! } )
", RegexOptions.Compiled | RegexOptions.IgnorePatternWhitespace);

It allowed me to simply replace a given numeric key, surrounded by curly braces, like this:

s = r.Replace(s, "${before}_replacement_${after}");
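Here is a runnable sketch of the same regex, written on one line with the braces escaped. It uses a MatchEvaluator instead of a replacement string so that only one given key is replaced. The original replaces every numeric key; NumericKeyReplacer and ReplaceKey are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class NumericKeyReplacer
{
    // (?<!\{) and (?!\}) stop a match from starting or ending inside a longer brace run,
    // so only runs with an odd number of braces on each side match.
    // "before" and "after" capture the extra "{{" / "}}" pairs so they are kept.
    private static readonly Regex Pattern = new Regex(
        @"(?<!\{)\{(?<before>(?:\{\{)*)(?<key>\d+)\}(?<after>(?:\}\})*)(?!\})",
        RegexOptions.Compiled);

    // Replaces the placeholder for the given key; other numeric keys are left untouched.
    public static string ReplaceKey(string s, string key, string replacement)
    {
        return Pattern.Replace(s, m => m.Groups["key"].Value == key
            ? m.Groups["before"].Value + replacement + m.Groups["after"].Value
            : m.Value);
    }
}
```

The surviving pairs are still escapes, so "{{{0}}}" becomes "{{X}}", which String.Format would render as {X} if the result is later passed to it.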