C# Extension Method - String Split that also accepts an Escape Character

Posted 2019-01-24 10:47

I'd like to write an extension method for the .NET String class. I'd like it to be a special variation on the Split method, one that takes an escape character to prevent splitting the string when an escape character is used before the separator.

What's the best way to write this? I'm curious about the best non-regex way to approach it.
Something with a signature like...

public static string[] Split(this string input, string separator, char escapeCharacter)
{
   // ...
}

UPDATE: Because it came up in one of the comments, the escaping...

In C#, escaping a non-special character gives the compiler error CS1009: Unrecognized escape sequence.

In IE JScript the escape characters are thrown out, unless you try \u, in which case you get an "Expected hexadecimal digit" error. I tested Firefox and it has the same behavior.

I'd like this method to be pretty forgiving and follow the JavaScript model. If you escape a non-separator character, it should just "kindly" remove the escape character.
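A minimal non-regex sketch of the behavior described in the update, assuming these rules: an escape character before the separator keeps the separator as literal text, an escape character before any other character is dropped (so a doubled escape character yields one literal escape character), and a trailing escape character with nothing after it is kept. The class name `EscapedSplitExtensions`, the method name `SplitEscaped` and the helper `MatchesAt` are hypothetical. The method is deliberately not called `Split`: on .NET Core 2.0 and later, a call such as `s.Split(",", '\\')` binds to the built-in `string.Split(string, int, StringSplitOptions)` overload (the `char` converts implicitly to `int`), and an applicable instance method is always preferred over an extension method.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class EscapedSplitExtensions
{
    // Hypothetical name, chosen so that no built-in string.Split overload can shadow it.
    public static string[] SplitEscaped(this string input, string separator, char escapeCharacter)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        if (string.IsNullOrEmpty(separator))
            throw new ArgumentException("Separator must not be empty.", nameof(separator));

        var parts = new List<string>();
        var current = new StringBuilder();
        int i = 0;

        while (i < input.Length)
        {
            char ch = input[i];
            if (ch == escapeCharacter && i + 1 < input.Length)
            {
                if (MatchesAt(input, i + 1, separator))
                {
                    // Escaped separator: keep the separator as text, drop the escape.
                    current.Append(separator);
                    i += 1 + separator.Length;
                }
                else
                {
                    // Escape before anything else: drop the escape, keep the next character.
                    current.Append(input[i + 1]);
                    i += 2;
                }
            }
            else if (MatchesAt(input, i, separator))
            {
                // Unescaped separator: close the current part.
                parts.Add(current.ToString());
                current.Clear();
                i += separator.Length;
            }
            else
            {
                // Ordinary character, or a trailing escape with nothing after it.
                current.Append(ch);
                i++;
            }
        }

        parts.Add(current.ToString());
        return parts.ToArray();
    }

    // True when value occurs in text starting at index (ordinal, character by character).
    private static bool MatchesAt(string text, int index, string value)
    {
        if (index + value.Length > text.Length) return false;
        for (int k = 0; k < value.Length; k++)
        {
            if (text[index + k] != value[k]) return false;
        }
        return true;
    }
}
```

For example, `@"a\,b,c".SplitEscaped(",", '\\')` returns `a,b` and `c`, and `@"a\\,b".SplitEscaped(",", '\\')` returns `a\` and `b`. It is a single left-to-right scan with one `StringBuilder`, so it also handles multi-character separators.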

10 answers
做自己的国王
Reply #2 · 2019-01-24 11:45
public static string[] Split(this string input, string separator, char escapeCharacter)
{
    // Unique placeholder that is extremely unlikely to occur in the input.
    string placeholder = Guid.NewGuid().ToString();
    string escapedSeparator = escapeCharacter.ToString() + separator;

    // Hide every escaped separator, split on the remaining ones, then restore them.
    input = input.Replace(escapedSeparator, placeholder);
    string[] result = input.Split(new[] { separator }, StringSplitOptions.None);
    for (int i = 0; i < result.Length; i++)
    {
        result[i] = result[i].Replace(placeholder, escapedSeparator);
    }

    return result;
}

Probably not the best way of doing it, but it's another alternative. Basically, everywhere the escape+separator sequence is found, replace it with a GUID (any other random string would do, it doesn't matter). Then use the built-in Split function, and finally replace the GUID in each element of the array with the escape+separator sequence again.
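The placeholder approach cannot distinguish an escaped separator from an escaped escape character that is followed by a real separator, because `Replace` matches the two-character sequence wherever it occurs. The sketch below repeats the same technique as a plain static method with the hypothetical name `SplitWithPlaceholder` (in a hypothetical class `PlaceholderSplitDemo`) so the behavior can be checked directly.

```csharp
using System;

public static class PlaceholderSplitDemo
{
    // Same placeholder technique as above, as a plain static method with a hypothetical
    // name so that none of the built-in string.Split overloads can shadow it.
    public static string[] SplitWithPlaceholder(string input, string separator, char escapeCharacter)
    {
        // Placeholder that is extremely unlikely to appear in the input.
        string placeholder = Guid.NewGuid().ToString();
        string escapedSeparator = escapeCharacter.ToString() + separator;

        // Hide escaped separators, split on the rest, then put the escaped separators back.
        string[] result = input.Replace(escapedSeparator, placeholder)
                               .Split(new[] { separator }, StringSplitOptions.None);
        for (int i = 0; i < result.Length; i++)
        {
            result[i] = result[i].Replace(placeholder, escapedSeparator);
        }
        return result;
    }
}
```

`SplitWithPlaceholder(@"a\,b,c", ",", '\\')` returns `a\,b` and `c` as intended, but `SplitWithPlaceholder(@"a\\,b", ",", '\\')` returns the single piece `a\\,b` instead of splitting after the escaped backslash. A character-by-character scan that tracks the escape state, as in the next answer, avoids this.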

Root(大扎)
Reply #3 · 2019-01-24 11:47

I had this problem as well and didn't find a solution. So I wrote such a method myself:

    public static IEnumerable<string> Split(
        this string text, 
        char separator, 
        char escapeCharacter)
    {
        var builder = new StringBuilder(text.Length);

        bool escaped = false;
        foreach (var ch in text)
        {
            if (separator == ch && !escaped)
            {
                yield return builder.ToString();
                builder.Clear();
            }
            else
            {
                // separator is removed, escape characters are kept
                builder.Append(ch);
            }
            // set escaped for next cycle, 
            // or reset unless escape character is escaped.
            escaped = escapeCharacter == ch && !escaped;
        }
        yield return builder.ToString();
    }

It is meant to be used together with Escape, which escapes the separator and the escape character, and Unescape, which removes the escape characters again:

    public static string Escape(this string text, string controlChars, char escapeCharacter)
    {
        var builder = new StringBuilder(text.Length + 3);
        foreach (var ch in text)
        {
            if (controlChars.Contains(ch))
            {
                builder.Append(escapeCharacter);
            }
            builder.Append(ch);
        }
        return builder.ToString();
    }

    public static string Unescape(this string text, char escapeCharacter)
    {
        var builder = new StringBuilder(text.Length);
        bool escaped = false;
        foreach (var ch in text)
        {
            escaped = escapeCharacter == ch && !escaped;
            if (!escaped)
            {
                builder.Append(ch);
            }
        }
        return builder.ToString();
    }

Examples for escape / unescape

separator = ','
escapeCharacter = '\\'
// controlChars is always separator + escapeCharacter

@"AB,CD\EF\," <=> @"AB\,CD\\EF\\\,"

Split:

@"AB,CD\,EF\\,GH\\\,IJ" => [@"AB", @"CD\,EF\\", @"GH\\\,IJ"]

So to use it: call Escape on each item before joining, and Unescape on each piece after splitting.
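A round-trip sketch of how the three methods fit together: escape each item, join with the plain separator, split on unescaped separators, then unescape each piece. It copies the answer's logic under hypothetical names (class `EscapeRoundTrip`, method `SplitOnUnescaped`). The split method is renamed because `text.Split(',', '\\')` resolves to the instance method `string.Split(params char[])`, so an extension declared as `Split(this string, char, char)` is never reached with that call syntax. `Escape` uses `IndexOf` instead of `Contains` only so that it also compiles on .NET Framework without `System.Linq`.

```csharp
using System.Collections.Generic;
using System.Text;

public static class EscapeRoundTrip
{
    // Splits on separators that are not escaped; escape characters are kept for Unescape.
    public static IEnumerable<string> SplitOnUnescaped(this string text, char separator, char escapeCharacter)
    {
        var builder = new StringBuilder(text.Length);
        bool escaped = false;
        foreach (var ch in text)
        {
            if (ch == separator && !escaped)
            {
                yield return builder.ToString();
                builder.Clear();
            }
            else
            {
                builder.Append(ch);
            }
            // An escape character escapes the next one, unless it is itself escaped.
            escaped = ch == escapeCharacter && !escaped;
        }
        yield return builder.ToString();
    }

    // Prefixes every control character (separator or escape character) with the escape character.
    public static string Escape(this string text, string controlChars, char escapeCharacter)
    {
        var builder = new StringBuilder(text.Length + 3);
        foreach (var ch in text)
        {
            if (controlChars.IndexOf(ch) >= 0) builder.Append(escapeCharacter);
            builder.Append(ch);
        }
        return builder.ToString();
    }

    // Removes escape characters, keeping an escaped escape character once.
    public static string Unescape(this string text, char escapeCharacter)
    {
        var builder = new StringBuilder(text.Length);
        bool escaped = false;
        foreach (var ch in text)
        {
            escaped = ch == escapeCharacter && !escaped;
            if (!escaped) builder.Append(ch);
        }
        return builder.ToString();
    }
}
```

With `,` as separator and `\` as escape character, the items `AB,CD`, `EF\` and `GH` escape and join to `AB\,CD,EF\\,GH`; calling `SplitOnUnescaped(',', '\\')` on that string and then `Unescape('\\')` on each piece gives back the original three items.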

chillily
Reply #4 · 2019-01-24 11:50

Here is a solution if you want to remove the escape character.

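public static IEnumerable<string> Split(this string input, 
                                        string separator, 
                                        char escapeCharacter) {
    // string.Split has no Split(string[]) overload, so the options argument is required.
    string[] splitted = input.Split(new[] { separator }, StringSplitOptions.None);
    StringBuilder sb = null;

    for (int i = 0; i < splitted.Length; i++) {
        string subString = splitted[i];
        // A piece ending in the escape character had its following separator escaped.
        // The last piece has no separator after it, so it is never merged.
        bool isLast = i == splitted.Length - 1;
        if (!isLast && subString.EndsWith(escapeCharacter.ToString())) {
            if (sb == null)
                sb = new StringBuilder();
            // Drop the escape character but keep the escaped separator as text.
            sb.Append(subString, 0, subString.Length - 1).Append(separator);
        } else {
            if (sb == null)
                yield return subString;
            else {
                sb.Append(subString);
                yield return sb.ToString();
                sb = null;
            }
        }
    }
}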
干净又极端
Reply #5 · 2019-01-24 11:50

You can try something like this, although for performance-critical tasks I would suggest implementing it with unsafe code.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class StringExtensions
{
    public static string[] Split(this string text, char escapeChar, params char[] seperator)
    {
        return Split(text, escapeChar, seperator, int.MaxValue, StringSplitOptions.None);
    }

    public static string[] Split(this string text, char escapeChar, char[] seperator, int count)
    {
        return Split(text, escapeChar, seperator, count, StringSplitOptions.None);
    }

    public static string[] Split(this string text, char escapeChar, char[] seperator, StringSplitOptions options)
    {
        return Split(text, escapeChar, seperator, int.MaxValue, options);
    }

    public static string[] Split(this string text, char escapeChar, char[] seperator, int count, StringSplitOptions options)
    {
        if (text == null)
        {
            throw new ArgumentNullException(nameof(text));
        }

        if (count < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(count));
        }

        if (text.Length == 0 || count == 0)
        {
            return new string[0];
        }

        var segments = new List<string>();

        bool previousCharIsEscape = false;
        var segment = new StringBuilder();

        for (int i = 0; i < text.Length; i++)
        {
            if (previousCharIsEscape)
            {
                previousCharIsEscape = false;

                if (seperator.Contains(text[i]))
                {
                    // Drop the escape character when it escapes a separator character.
                    segment.Append(text[i]);
                    continue;
                }

                // Retain the escape character when it escapes any other character.
                segment.Append(escapeChar);
                segment.Append(text[i]);
                continue;
            }

            if (text[i] == escapeChar)
            {
                previousCharIsEscape = true;
                continue;
            }

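            // Split only while fewer than count - 1 segments exist; after that, separators
            // are kept as text so the last segment holds the rest of the string.
            if (seperator.Contains(text[i]) && segments.Count < count - 1)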
            {
                if (options != StringSplitOptions.RemoveEmptyEntries || segment.Length != 0)
                {
                    // Only add empty segments when options allow.
                    segments.Add(segment.ToString());
                }

                segment = new StringBuilder();
                continue;
            }

            segment.Append(text[i]);
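        }

        // A trailing escape character has nothing to escape; keep it as text.
        if (previousCharIsEscape)
        {
            segment.Append(escapeChar);
        }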

        if (options != StringSplitOptions.RemoveEmptyEntries || segment.Length != 0)
        {
            // Only add empty segments when options allow.
            segments.Add(segment.ToString());
        }

        return segments.ToArray();
    }
}
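One caveat for the `Split(this string text, char escapeChar, params char[] seperator)` overload above (and for the char-based `Split` in an earlier answer): `string` already has the instance method `Split(params char[])`, so a call written as `text.Split('\\', ',')` binds to that instance method, treats the backslash as just another separator, and never reaches the extension. The demo below shows this with a hypothetical stand-in extension in a hypothetical class `ShadowingDemo`. Calling the method through its class name, for example `StringExtensions.Split(text, '\\', ',')`, avoids the problem; as far as I can tell, passing the separators as an explicit array (`text.Split('\\', new[] { ',' })`) also reaches the extension, because no instance overload accepts that argument list.

```csharp
public static class ShadowingDemo
{
    // Hypothetical extension with the same shape as Split(this string, char, params char[]) above.
    public static string[] Split(this string text, char escapeChar, params char[] separators)
    {
        return new[] { "extension method was called" };
    }

    public static string[] CallWithExtensionSyntax(string text)
    {
        // Binds to the instance method string.Split(params char[]): '\\' becomes just
        // another separator and the extension above is never considered.
        return text.Split('\\', ',');
    }

    public static string[] CallWithStaticSyntax(string text)
    {
        // Naming the class explicitly always reaches the extension method.
        return ShadowingDemo.Split(text, '\\', ',');
    }
}
```

With the input `@"a\b,c"`, `CallWithExtensionSyntax` returns `a`, `b` and `c` from the built-in method, while `CallWithStaticSyntax` returns the extension's marker string.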