Fastest way to implement Duplicate Character Remov

2020-02-08 09:01发布

In C#, what is the fastest way to detect duplicate characters in a String and remove them (removal including 1st instance of the duplicated character)?

Example Input: nbHHkRvrXbvkn

Example Output: RrX

标签: c# .net string
5条回答
放我归山
2楼-- · 2020-02-08 09:39

Here is a pretty fast one preserving order. But I'd be a little worried about how LINQ does Group and Where:

string s = "nbHHkRvrXbvkn";
Console.WriteLine( 
    s.ToCharArray()
        .GroupBy(c => c)
        .Where(g => g.Count() == 1)
        .Aggregate(new StringBuilder(), (b, g) => b.Append(g.Key)));

Edit: This one beats Luke's in some cases still slower than dtb's, but it preserves the order

private static string MyMethod(string s)
{
    StringBuilder sb = new StringBuilder(s.Length);
    foreach (var g in s.ToCharArray().GroupBy(c => c))
        if (g.Count() == 1) sb.Append(g.Key);

    return sb.ToString();
}
查看更多
女痞
3楼-- · 2020-02-08 09:42

Fastest as in fewest-lines-of-code:

var s = "nbHHkRvrXbvkn";
var duplicates = s.Where(ch => s.Count(c => c == ch) > 1);
var result = new string(s.Except(duplicates).ToArray()); // = "RrX"

Fastest as in fastest-performance would probably be something like this (does not preserve order):

var h1 = new HashSet<char>();
var h2 = new HashSet<char>();

foreach (var ch in "nbHHkRvrXbvkn")
{
    if (!h1.Add(ch))
    {
        h2.Add(ch);
    }
}

h1.ExceptWith(h2); // remove duplicates

var chars = new char[h1.Count];
h1.CopyTo(chars);
var result = new string(chars); // = "RrX"

Performance test

When in doubt -- test it :)

Yuriy Faktorovich's answer        00:00:00.2360900
Luke's answer                     00:00:00.2225683
My 'few lines' answer             00:00:00.5318395
My 'fast' answer                  00:00:00.1842144
查看更多
家丑人穷心不美
4楼-- · 2020-02-08 09:47

This one should be pretty quick (and it preserves the original order):

public static string RemoveDuplicates(string source)
{
    HashSet<char> found = new HashSet<char>();
    HashSet<char> dupes = new HashSet<char>();

    foreach (char c in source)
    {
        if (!found.Add(c))
        {
            dupes.Add(c);
        }
    }

    StringBuilder sb = new StringBuilder(source.Length);
    foreach (char c in source)
    {
        if (!dupes.Contains(c))
        {
            sb.Append(c);
        }
    }
    return sb.ToString();
}
查看更多
疯言疯语
5楼-- · 2020-02-08 09:52

This preserves order and, based on my tests, is 4 times faster than using a HashSet. This assumes your character range is 0-255 but you can extend that easily. If you're planning on using this in a loop, move the int[] c = new int[255]; out and in the function do an Array.Clear(c,0,255).


        private static string RemoveDuplicates(string s)
        {
            int[] c = new int[255];
            for (int i = 0; i < s.Length; i++)
            {
                c[s[i]]++;
            }
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < s.Length; i++)
            {
                if (c[s[i]] == 1) sb.Append(s[i]);
            }
            return sb.ToString();
        }
查看更多
手持菜刀,她持情操
6楼-- · 2020-02-08 09:58

This algorithm is general, can be applied to any language

  1. create a map(HashTable) char->int that holds the count of each character found, initially empty
  2. scan the string once to populate the map.
  3. create a new empty string that'll hold the output, may be you'll need to use a StringBuilder.
  4. scan the string(orthe map, whichever is shorter) copying only characters which an occurrence of 1 to the output string/StringBuilder
查看更多
登录 后发表回答