Replacing multiple characters in a string, the fastest way?

Posted 2019-06-26 23:12

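I am importing a number of records with multiple string fields from an old database into a new one. It seems to be very slow, and I suspect that is because I do this: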

foreach (var oldObj in oldDB)
{
    NewObject newObj = new NewObject();
    newObj.Name = oldObj.Name.Trim().Replace('^', 'Č').Replace('@', 'Ž').Replace('[', 'Š')
        .Replace(']', 'Ć').Replace('`', 'ž').Replace('}', 'ć')
        .Replace('~', 'č').Replace('{', 'š').Replace('\\', 'Đ');
    newObj.Surname = oldObj.Surname.Trim().Replace('^', 'Č').Replace('@', 'Ž').Replace('[', 'Š')
        .Replace(']', 'Ć').Replace('`', 'ž').Replace('}', 'ć')
        .Replace('~', 'č').Replace('{', 'š').Replace('\\', 'Đ');
    newObj.Address = oldObj.Address.Trim().Replace('^', 'Č').Replace('@', 'Ž').Replace('[', 'Š')
        .Replace(']', 'Ć').Replace('`', 'ž').Replace('}', 'ć')
        .Replace('~', 'č').Replace('{', 'š').Replace('\\', 'Đ');
    newObj.Note = oldObj.Note.Trim().Replace('^', 'Č').Replace('@', 'Ž').Replace('[', 'Š')
        .Replace(']', 'Ć').Replace('`', 'ž').Replace('}', 'ć')
        .Replace('~', 'č').Replace('{', 'š').Replace('\\', 'Đ');
    /*
    ... some processing ...
    */
}

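I have read some posts and articles on the web and have seen many different opinions on this. Some say it would be better to use a regex with a MatchEvaluator, some say it is best to leave it as is.

I could simply write a benchmark myself, but I decided to ask here in case someone else has been wondering about the same thing, or in case someone already knows the answer.

So what is the fastest way to do this in C#?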

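EDIT

I have posted a benchmark as an answer (Answer 1 below). At first glance it looks like Richard's way might be the fastest. However, neither his way nor Marc's would actually replace anything, because of a wrong regex pattern. After correcting the pattern from

@"\^@\[\]`\}~\{\\" 

to

@"\^|@|\[|\]|`|\}|~|\{|\\" 

it looks as if the old way, with chained .Replace() calls, is the fastest after all.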
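
To see why the first pattern replaces nothing, here is a minimal sketch. The class and method names are made up for illustration; only the two pattern strings come from the edit above. The first pattern is a plain concatenation, so it only matches when all nine characters appear consecutively in that exact order, while the second is an alternation that matches any one of them.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; not part of the original benchmark.
static class PatternDemo
{
    // Concatenation: matches only the literal nine-character run ^@[]`}~{\
    public const string Sequence = @"\^@\[\]`\}~\{\\";

    // Alternation: matches any single one of the nine characters.
    public const string Alternation = @"\^|@|\[|\]|`|\}|~|\{|\\";

    // Returns how many matches the pattern finds in the input.
    public static int CountMatches(string pattern, string input)
    {
        return Regex.Matches(input, pattern).Count;
    }
}
```

With the benchmark's first test string, "A^@[BCD", CountMatches returns 0 for Sequence and 3 for Alternation. A regex that never matches has very little work to do, which may be why it looked so fast.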

Answer 1:

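Thanks for your input, guys. I wrote a quick and dirty benchmark to test your suggestions. I tested processing 4 strings over 500,000 iterations and ran 4 passes. The results are as follows: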

*** Pass 1
Old (Chained String.Replace()) way completed in 814 ms
logicnp (ToCharArray) way completed in 916 ms
oleksii (StringBuilder) way completed in 943 ms
André Christoffer Andersen (Lambda w/ Aggregate) way completed in 2551 ms
Richard (Regex w/ MatchEvaluator) way completed in 215 ms
Marc Gravell (Static Regex) way completed in 1008 ms

*** Pass 2
Old (Chained String.Replace()) way completed in 786 ms
logicnp (ToCharArray) way completed in 920 ms
oleksii (StringBuilder) way completed in 905 ms
André Christoffer Andersen (Lambda w/ Aggregate) way completed in 2515 ms
Richard (Regex w/ MatchEvaluator) way completed in 217 ms
Marc Gravell (Static Regex) way completed in 1025 ms

*** Pass 3
Old (Chained String.Replace()) way completed in 775 ms
logicnp (ToCharArray) way completed in 903 ms
oleksii (StringBuilder) way completed in 931 ms
André Christoffer Andersen (Lambda w/ Aggregate) way completed in 2529 ms
Richard (Regex w/ MatchEvaluator) way completed in 214 ms
Marc Gravell (Static Regex) way completed in 1022 ms

*** Pass 4
Old (Chained String.Replace()) way completed in 799 ms
logicnp (ToCharArray) way completed in 908 ms
oleksii (StringBuilder) way completed in 938 ms
André Christoffer Andersen (Lambda w/ Aggregate) way completed in 2592 ms
Richard (Regex w/ MatchEvaluator) way completed in 225 ms
Marc Gravell (Static Regex) way completed in 1050 ms

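The code for this benchmark is below. Please review the code and confirm that @Richard has got the fastest way. Note that I have not checked whether the outputs are correct; I assumed they are.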

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Diagnostics;
using System.Text.RegularExpressions;

namespace StringReplaceTest
{
    class Program
    {
        static string test1 = "A^@[BCD";
        static string test2 = "E]FGH\\";
        static string test3 = "ijk`l}m";
        static string test4 = "nopq~{r";

        static readonly Dictionary<char, string> repl =
            new Dictionary<char, string> 
            { 
                {'^', "Č"}, {'@', "Ž"}, {'[', "Š"}, {']', "Ć"}, {'`', "ž"}, {'}', "ć"}, {'~', "č"}, {'{', "š"}, {'\\', "Đ"} 
            };

        static readonly Regex replaceRegex;

        static Program() // static initializer 
        {
            StringBuilder pattern = new StringBuilder().Append('[');
            foreach (var key in repl.Keys)
                pattern.Append(Regex.Escape(key.ToString()));
            pattern.Append(']');
            replaceRegex = new Regex(pattern.ToString(), RegexOptions.Compiled);
        }

        public static string Sanitize(string input)
        {
            return replaceRegex.Replace(input, match =>
            {
                return repl[match.Value[0]];
            });
        } 

        static string DoGeneralReplace(string input) 
        { 
            var sb = new StringBuilder(input);
            return sb.Replace('^', 'Č').Replace('@', 'Ž').Replace('[', 'Š').Replace(']', 'Ć').Replace('`', 'ž').Replace('}', 'ć').Replace('~', 'č').Replace('{', 'š').Replace('\\', 'Đ').ToString(); 
        }

        //Method for replacing chars with a mapping 
        static string Replace(string input, IDictionary<char, char> replacementMap)
        {
            return replacementMap.Keys
                .Aggregate(input, (current, oldChar)
                    => current.Replace(oldChar, replacementMap[oldChar]));
        } 

        static void Main(string[] args)
        {
            for (int i = 1; i < 5; i++)
                DoIt(i);
        }

        static void DoIt(int n)
        {
            Stopwatch sw = new Stopwatch();
            int idx = 0;

            Console.WriteLine("*** Pass " + n.ToString());
            // old way
            sw.Start();
            for (idx = 0; idx < 500000; idx++)
            {
                string result1 = test1.Replace('^', 'Č').Replace('@', 'Ž').Replace('[', 'Š').Replace(']', 'Ć').Replace('`', 'ž').Replace('}', 'ć').Replace('~', 'č').Replace('{', 'š').Replace('\\', 'Đ');
                string result2 = test2.Replace('^', 'Č').Replace('@', 'Ž').Replace('[', 'Š').Replace(']', 'Ć').Replace('`', 'ž').Replace('}', 'ć').Replace('~', 'č').Replace('{', 'š').Replace('\\', 'Đ');
                string result3 = test3.Replace('^', 'Č').Replace('@', 'Ž').Replace('[', 'Š').Replace(']', 'Ć').Replace('`', 'ž').Replace('}', 'ć').Replace('~', 'č').Replace('{', 'š').Replace('\\', 'Đ');
                string result4 = test4.Replace('^', 'Č').Replace('@', 'Ž').Replace('[', 'Š').Replace(']', 'Ć').Replace('`', 'ž').Replace('}', 'ć').Replace('~', 'č').Replace('{', 'š').Replace('\\', 'Đ');
            }
            sw.Stop();
            Console.WriteLine("Old (Chained String.Replace()) way completed in " + sw.ElapsedMilliseconds.ToString() + " ms");

            Dictionary<char, char> replacements = new Dictionary<char, char>();
            replacements.Add('^', 'Č');
            replacements.Add('@', 'Ž');
            replacements.Add('[', 'Š');
            replacements.Add(']', 'Ć');
            replacements.Add('`', 'ž');
            replacements.Add('}', 'ć');
            replacements.Add('~', 'č');
            replacements.Add('{', 'š');
            replacements.Add('\\', 'Đ');

            // logicnp way
            sw.Reset();
            sw.Start();
            for (idx = 0; idx < 500000; idx++)
            {
                char[] charArray1 = test1.ToCharArray();
                for (int i = 0; i < charArray1.Length; i++)
                {
                    char newChar;
                    if (replacements.TryGetValue(test1[i], out newChar))
                        charArray1[i] = newChar;
                }
                string result1 = new string(charArray1);

                char[] charArray2 = test2.ToCharArray();
                for (int i = 0; i < charArray2.Length; i++)
                {
                    char newChar;
                    if (replacements.TryGetValue(test2[i], out newChar))
                        charArray2[i] = newChar;
                }
                string result2 = new string(charArray2);

                char[] charArray3 = test3.ToCharArray();
                for (int i = 0; i < charArray3.Length; i++)
                {
                    char newChar;
                    if (replacements.TryGetValue(test3[i], out newChar))
                        charArray3[i] = newChar;
                }
                string result3 = new string(charArray3);

                char[] charArray4 = test4.ToCharArray();
                for (int i = 0; i < charArray4.Length; i++)
                {
                    char newChar;
                    if (replacements.TryGetValue(test4[i], out newChar))
                        charArray4[i] = newChar;
                }
                string result4 = new string(charArray4);
            }
            sw.Stop();
            Console.WriteLine("logicnp (ToCharArray) way completed in " + sw.ElapsedMilliseconds.ToString() + " ms");

            // oleksii way
            sw.Reset();
            sw.Start();
            for (idx = 0; idx < 500000; idx++)
            {
                string result1 = DoGeneralReplace(test1);
                string result2 = DoGeneralReplace(test2);
                string result3 = DoGeneralReplace(test3);
                string result4 = DoGeneralReplace(test4);
            }
            sw.Stop();
            Console.WriteLine("oleksii (StringBuilder) way completed in " + sw.ElapsedMilliseconds.ToString() + " ms");

            // André Christoffer Andersen way
            sw.Reset();
            sw.Start();
            for (idx = 0; idx < 500000; idx++)
            {
                string result1 = Replace(test1, replacements);
                string result2 = Replace(test2, replacements);
                string result3 = Replace(test3, replacements);
                string result4 = Replace(test4, replacements);
            }
            sw.Stop();
            Console.WriteLine("André Christoffer Andersen (Lambda w/ Aggregate) way completed in " + sw.ElapsedMilliseconds.ToString() + " ms");

            // Richard way
            sw.Reset();
            sw.Start();
            Regex reg = new Regex(@"\^|@|\[|\]|`|\}|~|\{|\\");
            MatchEvaluator eval = match =>
            {
                switch (match.Value)
                {
                    case "^": return "Č";
                    case "@": return "Ž";
                    case "[": return "Š";
                    case "]": return "Ć";
                    case "`": return "ž";
                    case "}": return "ć";
                    case "~": return "č";
                    case "{": return "š";
                    case "\\": return "Đ";
                    default: throw new Exception("Unexpected match!");
                }
            };
            for (idx = 0; idx < 500000; idx++)
            {
                string result1 = reg.Replace(test1, eval);
                string result2 = reg.Replace(test2, eval);
                string result3 = reg.Replace(test3, eval);
                string result4 = reg.Replace(test4, eval);
            }
            sw.Stop();
            Console.WriteLine("Richard (Regex w/ MatchEvaluator) way completed in " + sw.ElapsedMilliseconds.ToString() + " ms");

            // Marc Gravell way
            sw.Reset();
            sw.Start();
            for (idx = 0; idx < 500000; idx++)
            {
                string result1 = Sanitize(test1);
                string result2 = Sanitize(test2);
                string result3 = Sanitize(test3);
                string result4 = Sanitize(test4);
            }
            sw.Stop();
            Console.WriteLine("Marc Gravell (Static Regex) way completed in " + sw.ElapsedMilliseconds.ToString() + " ms\n");
        }
    }
}


Answer 2:

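"The fastest way"

The only way to know is to compare the performance yourself. Try it as in the question, then with StringBuilder, and also with Regex.Replace.

But micro-benchmarks do not consider the scope of the whole system. If this method is only a small fraction of the overall system, its performance probably does not matter to the application's overall performance.

Some notes:

  1. Using String as above will (I assume) create lots of intermediate strings: more work for the GC. But it is simple.
  2. Using StringBuilder allows the same underlying data to be modified with each replacement. This creates less garbage. It is almost as simple as using String.
  3. Using a regex is the most complex (because you need code to work out the replacement), but it allows a single expression. I would expect this to be slower unless the list of replacements is very large and replacements are rare in the input string (i.e. most Replace calls replace nothing and only cost a search through the string).

I expect #2 would be slightly quicker over repeated use (thousands of times) due to less GC load.

For the regex approach you need something like: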

newObj.Name = Regex.Replace(oldObj.Name.Trim(), @"[@^\[\]`}~{\\]", match => {
  switch (match.Value) {
    case "^": return "Č";
    case "@": return "Ž";
    case "[": return "Š";
    case "]": return "Ć";
    case "`": return "ž";
    case "}": return "ć";
    case "~": return "č";
    case "{": return "š";
    case "\\": return "Đ";
    default: throw new Exception("Unexpected match!");
  }
});

This can be made reusable by parameterising it with a Dictionary<char,char> that holds the replacements, together with a reusable MatchEvaluator.
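
A minimal sketch of what such a reusable version might look like. The CharReplacer class and its members are hypothetical names chosen for this example. It builds an alternation pattern rather than a character class, so that Regex.Escape alone is enough to make every key safe.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; the class and member names are illustrative only.
sealed class CharReplacer
{
    private readonly Dictionary<char, char> map;
    private readonly Regex regex;            // null when there is nothing to replace
    private readonly MatchEvaluator evaluator;

    public CharReplacer(IDictionary<char, char> replacements)
    {
        map = new Dictionary<char, char>(replacements);
        if (map.Count == 0)
            return; // an empty pattern would match everywhere, so skip the regex

        // Build an alternation such as \^|@|\[ from the dictionary keys.
        var parts = new List<string>();
        foreach (char key in map.Keys)
            parts.Add(Regex.Escape(key.ToString()));
        regex = new Regex(string.Join("|", parts));

        // Each match is exactly one character, so look it up directly.
        evaluator = m => map[m.Value[0]].ToString();
    }

    public string Replace(string input)
    {
        return regex == null ? input : regex.Replace(input, evaluator);
    }
}
```

The instance would be created once and reused for every field, so the pattern is only parsed a single time. Whether this beats chained Replace calls would still have to be measured.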



Answer 3:

Try this:

Dictionary<char, char> replacements = new Dictionary<char, char>();
// populate replacements

string str = "mystring";
char[] charArray = str.ToCharArray();

for (int i = 0; i < charArray.Length; i++)
{
    char newChar;
    if (replacements.TryGetValue(str[i], out newChar))
        charArray[i] = newChar;
}

string newStr = new string(charArray);


Answer 4:

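One possible solution would be to use the StringBuilder class for this.

You can first refactor the code into a single method:

public string DoGeneralReplace(string input)
{
    var sb = new StringBuilder(input);
    sb.Replace("^", "Č")
      .Replace("@", "Ž") ...;
    // return the result once all replacements have been applied
    return sb.ToString();
}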


//usage
foreach (var oldObj in oldDB)
{
    NewObject newObj = new NewObject();
    newObj.Name = DoGeneralReplace(oldObj.Name);
    ...
}


Answer 5:

You can use a lambda expression for this, using Aggregate over the char map:

  //Method for replacing chars with a mapping
  static string Replace(string input, IDictionary<char, char> replacementMap) {
      return replacementMap.Keys
          .Aggregate(input, (current, oldChar) 
              => current.Replace(oldChar, replacementMap[oldChar]));
  }

You can run it as follows:

  private static void Main(string[] args) {
      //Char to char map using <oldChar, newChar>
      var charMap = new Dictionary<char, char>();
      charMap.Add('-', 'D'); charMap.Add('|', 'P'); charMap.Add('@', 'A');

      //Your input string
      string myString = "asgjk--@dfsg||jshd--f@jgsld-kj|rhgunfh-@-nsdflngs";

      //Your own replacement method
      myString = Replace(myString, charMap);

      //out: myString = "asgjkDDAdfsgPPjshdDDfAjgsldDkjPrhgunfhDADnsdflngs"
  }


Answer 6:

Well, I would try doing something like:

    static readonly Dictionary<char, string> replacements =
       new Dictionary<char, string>
    {
        {']',"Ć"}, {'~', "č"} // etc
    };
    static readonly Regex replaceRegex;
    static YourUtilityType() // static initializer
    {
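        StringBuilder pattern = new StringBuilder().Append('[');
        // Regex.Escape does not escape ']', which would otherwise close the
        // character class early, so escape it by hand.
        foreach(var key in replacements.Keys)
            pattern.Append(Regex.Escape(key.ToString()).Replace("]", @"\]"));
        pattern.Append(']');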
        replaceRegex = new Regex(pattern.ToString(), RegexOptions.Compiled);
    }
    public static string Sanitize(string input)
    {
        return replaceRegex.Replace(input, match =>
        {
            return replacements[match.Value[0]];
        });
    }

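This has a single place to maintain (at the top), and builds a pre-compiled Regex to handle the replacements. All the overhead is incurred only once (hence static).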



Answer 7:

A hybrid approach using IndexOfAny and StringBuilder:

protected String ReplaceChars(String sIn)
{
    int replChar = sIn.IndexOfAny(badChars);
    if (replChar < 0)
        return sIn;

    // Don't even bother making a copy unless you know you have something to swap
    StringBuilder sb = new StringBuilder(sIn, 0, replChar, sIn.Length + 10);
    while (replChar >= 0 && replChar < sIn.Length)
    {
        char? c = sIn[replChar];
        string s = null;
        // This approach lets you swap a char for a string or to remove some
        // If you had a straight char for char swap, you could just have your repl chars in an array with the same ordinals and do it all in 2 lines matching the ordinals.
        switch (c)
        {
            case '^': c = 'Č'; break;
            ...
            case '\ufeff': c = null; break;
        }
        if (s != null) sb.Append(s);
        else if (c != null) sb.Append(c.Value);

        replChar++; // skip over what we just replaced
        if (replChar < sIn.Length)
        {
            int nextRepChar = sIn.IndexOfAny(badChars, replChar);
            sb.Append(sIn, replChar, (nextRepChar > 0 ? nextRepChar : sIn.Length) - replChar);
            replChar = nextRepChar;
        }
    }
    return sb.ToString();
}
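
The comment in the code above mentions a simpler variant for a straight char-for-char swap: keep the replacement chars in an array with the same ordinals as the chars being searched for. A rough sketch of that idea follows. The OrdinalSwap class and its From and To arrays are names invented here, and the mapping is copied from the question as an assumption about what badChars would contain.

```csharp
using System;

// Hypothetical sketch of the "matching ordinals" idea; names are illustrative.
static class OrdinalSwap
{
    // Index i in From maps to index i in To (mapping taken from the question).
    private static readonly char[] From = { '^', '@', '[', ']', '`', '}', '~', '{', '\\' };
    private static readonly char[] To   = { 'Č', 'Ž', 'Š', 'Ć', 'ž', 'ć', 'č', 'š', 'Đ' };

    public static string ReplaceChars(string input)
    {
        int pos = input.IndexOfAny(From);
        if (pos < 0)
            return input; // nothing to replace, so no copy is made

        char[] buffer = input.ToCharArray();
        while (pos >= 0)
        {
            // Swap the char for the one at the same ordinal in To.
            buffer[pos] = To[Array.IndexOf(From, buffer[pos])];
            // Keep searching the original string, just past this position.
            pos = pos + 1 < buffer.Length ? input.IndexOfAny(From, pos + 1) : -1;
        }
        return new string(buffer);
    }
}
```

Because every replacement is one char for one char, the output has the same length as the input and a plain char array can stand in for the StringBuilder. This has not been benchmarked against the other answers.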


Source: Replacing multiple characters in a string, the fastest way?