Unescape escaped string?

Posted 2019-06-25 06:02

Question:

We store a ContentDelimiter config (which we use to delimit the content) in the database as a string. It could be a tab, i.e. \t, or a new line, \r\n.

Later on we would like to use this config. How would I go about converting \t (which is a string, not a char) into a tab char?

Example:

string delimiterConfig =  config.GetDelimiter();
char[] delimiter = ConvertConfig(delimiterConfig);

What would ConvertConfig look like so that it parses all escaped strings back into chars, so that the string "\t" becomes the \t char?

Are there any elegant solutions that avoid case statements and Replace?

Answer 1:

Here's an elegant solution with a switch statement, the Regex.Replace Method and a custom MatchEvaluator:

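using System;
using System.Text.RegularExpressions;

// Verbatim string: the escapes below are literal backslash sequences, not control characters.
var input = @"This is indented:\r\n\tHello World";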

var output = Regex.Replace(input, @"\\[rnt]", m =>
{
    switch (m.Value)
    {
    case @"\r": return "\r";
    case @"\n": return "\n";
    case @"\t": return "\t";
    default: return m.Value;
    }
});

Console.WriteLine(output);

Output:

This is indented:
        Hello World
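
The same Regex.Replace approach can also be written with a lookup table instead of a switch, which may be closer to what the question asks for. This is only a minimal sketch: the escapeMap table and the body of ConvertConfig are assumptions made for illustration, and it uses top-level statements, so it assumes C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical lookup table: escaped text as stored in the database -> real character.
var escapeMap = new Dictionary<string, string>
{
    [@"\r"] = "\r",
    [@"\n"] = "\n",
    [@"\t"] = "\t",
};

// Hypothetical ConvertConfig: replaces each known escape and returns the result as chars.
char[] ConvertConfig(string delimiterConfig)
{
    // The pattern only matches \r, \n and \t, so every match has an entry in the map.
    string unescaped = Regex.Replace(delimiterConfig, @"\\[rnt]", m => escapeMap[m.Value]);
    return unescaped.ToCharArray();
}

char[] delimiter = ConvertConfig(@"\r\n");
Console.WriteLine(delimiter.Length); // prints 2
```

Supporting another escape then means adding one entry to the table and one letter to the character class in the pattern.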


Answer 2:

If by a "better" solution you mean a faster one:

    static String Replace(String input)
    {
        if (input.Length <= 1) return input;

        // the input string can only get shorter
        // so init the buffer so we won't have to reallocate later
        char[] buffer = new char[input.Length];
        int outIdx = 0;
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (c == '\\')
            {
                if (i < input.Length - 1)
                {
                    switch (input[i + 1])
                    {
                        case 'n':
                            buffer[outIdx++] = '\n';
                            i++;
                            continue;
                        case 'r':
                            buffer[outIdx++] = '\r';
                            i++;
                            continue;
                        case 't':
                            buffer[outIdx++] = '\t';
                            i++;
                            continue;
                    }
                }
            }

            buffer[outIdx++] = c;
        }

        return new String(buffer, 0, outIdx);
    }

This is significantly faster than using Regex, especially when I tested it against this input:

var input = new String('\\', 0x1000);

If by "better" you mean easier to read and maintain, then the Regex solution probably wins. There might also be bugs in my solution; I didn't test it very thoroughly.



Answer 3:

For the limited set of basic ASCII delimiters you also have a simple solution:

Regex.Unescape(input)

You can read all about it in the MSDN documentation, but basically it converts escaped Regex metacharacters and whitespace escapes (such as \t, \r and \n) back into the characters they stand for.

Be aware that it throws an ArgumentException on unrecognized escape sequences.
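
As a rough illustration, the ConvertConfig method from the question could be sketched on top of Regex.Unescape like this. The fallback to the raw text when an escape is not recognized is an assumption about the desired behavior, not something the question specifies, and the snippet uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical ConvertConfig built on Regex.Unescape.
// Falls back to the raw text when the stored value contains an unrecognized escape.
static char[] ConvertConfig(string delimiterConfig)
{
    try
    {
        return Regex.Unescape(delimiterConfig).ToCharArray();
    }
    catch (ArgumentException)
    {
        // Regex.Unescape rejects sequences it does not recognize; keep the text as stored.
        return delimiterConfig.ToCharArray();
    }
}

char[] tab = ConvertConfig(@"\t");
Console.WriteLine(tab.Length);     // prints 1
Console.WriteLine(tab[0] == '\t'); // prints True
```

Note that Regex.Unescape also strips the backslash from escaped metacharacters (for example, \. becomes .), so it may not be a good fit if a delimiter is meant to contain a literal backslash.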



Answer 4:

If by "better" you were referring to the lack of supported escape sequences, then I suggest you check out my response to the question titled "Evaluate escaped string", which handles standard escape sequences, octal escape sequences, and Unicode escape sequences. I hope you find that solution more elegant and better suited to your needs.



Answer 5:

What about the ToCharArray method?

string x = "\r\n";
char[] delimiter = x.ToCharArray();