How to replace multiple white spaces with one whit

Posted 2019-01-03 02:14

Let's say I have a string such as:

"Hello     how are   you           doing?"

I would like a function that turns multiple spaces into one space.

So I would get:

"Hello how are you doing?"

I know I could use regex or call

string s = "Hello     how are   you           doing?".Replace("  ", " ");

But I would have to call it multiple times to make sure all sequential whitespaces are replaced with only one.

Is there already a built in method for this?
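
To illustrate the limitation described in the question, here is a minimal sketch showing that a single `Replace("  ", " ")` pass only shrinks each run of spaces rather than collapsing it. The class and method names (`SinglePassDemo`, `OnePass`) are made up for this illustration.

```csharp
using System;

static class SinglePassDemo
{
    // One Replace pass: each non-overlapping pair of spaces becomes one space,
    // so a run of five spaces is reduced to three, not to one.
    public static string OnePass(string input)
    {
        return input.Replace("  ", " ");
    }
}
```

Calling `SinglePassDemo.OnePass("a     b")` (five spaces) leaves three spaces between the letters, which is why repeated calls would be needed with this approach.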

#2 · 2019-01-03 02:52
string cleanedString = System.Text.RegularExpressions.Regex.Replace(dirtyString,@"\s+"," ");
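
A small sketch of how the one-liner above might be wrapped for repeated use, assuming the pattern is applied often enough that caching the `Regex` instance is worthwhile. The names `WhitespaceCollapser`, `WhitespaceRun` and `Collapse` are hypothetical.

```csharp
using System.Text.RegularExpressions;

static class WhitespaceCollapser
{
    // Built once and reused across calls.
    private static readonly Regex WhitespaceRun =
        new Regex(@"\s+", RegexOptions.Compiled);

    // Replaces every run of whitespace (spaces, tabs, line breaks) with one space.
    public static string Collapse(string input)
    {
        return WhitespaceRun.Replace(input, " ");
    }
}
```

Note that `\s` matches tabs and line breaks as well as spaces, so this also turns a lone tab or newline into a space. If only literal spaces should be touched, a pattern such as `" {2,}"` is the narrower choice.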
#3 · 2019-01-03 02:54

A fast extra whitespace remover... This is the fastest one and is based on Felipe Machado's in-place copy.

static string InPlaceCharArray(string str)
{
    var len = str.Length;
    var src = str.ToCharArray();
    int dstIdx = 0;
    bool lastWasWS = false;
    for (int i = 0; i < len; i++)
    {
        var ch = src[i];
        if (ch == '\u0020')
        {
            // Keep only the first space of a run.
            if (lastWasWS == false)
            {
                src[dstIdx++] = ch;
                lastWasWS = true;
            }
        }
        else
        {
            lastWasWS = false;
            src[dstIdx++] = ch;
        }
    }
    return new string(src, 0, dstIdx);
}

The benchmarks...

InPlaceCharArraySpaceOnly by Felipe Machado on CodeProject 2015 and modified by Sunsetquest for multi-space removal. Time: 3.75 Ticks

InPlaceCharArray by Felipe Machado 2015 and slightly modified by Sunsetquest for multi-space removal. Time: 6.50 Ticks (supports tabs also)

SplitAndJoinOnSpace by Jon Skeet. Time: 13.25 Ticks

StringBuilder by fubo Time: 13.5 Ticks (supports tabs also)

Regex with compile by Jon Skeet. Time: 17 Ticks

StringBuilder by David S 2013 Time: 30.5 Ticks

Regex with non-compile by Brandon Time: 63.25 Ticks

StringBuilder by user214147 Time: 77.125 Ticks

Regex with non-compile by Tim Hoolihan Time: 147.25 Ticks

The Benchmark code...

using System;
using System.Text.RegularExpressions;
using System.Diagnostics;
using System.Threading;
using System.Text;

static class Program
{
    public static void Main(string[] args)
    {
        long seed = ConfigProgramForBenchmarking();

        Stopwatch sw = new Stopwatch();

        string warmup = "This is   a Warm  up function for best   benchmark results." + seed;
        string input1 = "Hello World,    how are   you           doing?" + seed;
        string input2 = "It\twas\t \tso    nice  to\t\t see you \tin 1950.  \t" + seed;
        string correctOutput1 = "Hello World, how are you doing?" + seed;
        string correctOutput2 = "It\twas\tso nice to\tsee you in 1950. " + seed;
        string output1, output2;

        //warm-up timer function
        sw.Restart();
        sw.Stop();

        sw.Restart();
        sw.Stop();
        long baseVal = sw.ElapsedTicks;

        // InPlace Replace by Felipe Machado but modified by Ryan for multi-space removal (
        output1 = InPlaceCharArraySpaceOnly(warmup);
        sw.Restart();
        output1 = InPlaceCharArraySpaceOnly(input1);
        output2 = InPlaceCharArraySpaceOnly(input2);
        sw.Stop();
        Console.WriteLine("InPlaceCharArraySpaceOnly : " + (sw.ElapsedTicks - baseVal));
        Console.WriteLine("  Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL "));
        Console.WriteLine("  Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL "));

        // InPlace Replace by Felipe R. Machado and slightly modified by Ryan for multi-space removal (
        output1 = InPlaceCharArray(warmup);
        sw.Restart();
        output1 = InPlaceCharArray(input1);
        output2 = InPlaceCharArray(input2);
        sw.Stop();
        Console.WriteLine("InPlaceCharArray: " + (sw.ElapsedTicks - baseVal));
        Console.WriteLine("  Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL "));
        Console.WriteLine("  Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL "));

        //Regex with non-compile Tim Hoolihan (
        output1 = Regex.Replace(warmup, @"\s+", " ");
        sw.Restart();
        output1 = Regex.Replace(input1, @"\s+", " ");
        output2 = Regex.Replace(input2, @"\s+", " ");
        sw.Stop();
        Console.WriteLine("Regex by Tim Hoolihan: " + (sw.ElapsedTicks - baseVal));
        Console.WriteLine("  Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL "));
        Console.WriteLine("  Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL "));

        //Regex with compile by Jon Skeet (
        output1 = MultipleSpaces.Replace(warmup, " ");
        sw.Restart();
        output1 = MultipleSpaces.Replace(input1, " ");
        output2 = MultipleSpaces.Replace(input2, " ");
        sw.Stop();
        Console.WriteLine("Regex with compile by Jon Skeet: " + (sw.ElapsedTicks - baseVal));
        Console.WriteLine("  Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL "));
        Console.WriteLine("  Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL "));

        //Split And Join by Jon Skeet (
        output1 = SplitAndJoinOnSpace(warmup);
        sw.Restart();
        output1 = SplitAndJoinOnSpace(input1);
        output2 = SplitAndJoinOnSpace(input2);
        sw.Stop();
        Console.WriteLine("Split And Join by Jon Skeet: " + (sw.ElapsedTicks - baseVal));
        Console.WriteLine("  Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL "));
        Console.WriteLine("  Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL "));

        //Regex by Brandon (
        output1 = Regex.Replace(warmup, @"\s{2,}", " ");
        sw.Restart();
        output1 = Regex.Replace(input1, @"\s{2,}", " ");
        output2 = Regex.Replace(input2, @"\s{2,}", " ");
        sw.Stop();
        Console.WriteLine("Regex by Brandon: " + (sw.ElapsedTicks - baseVal));
        Console.WriteLine("  Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL "));
        Console.WriteLine("  Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL "));

        //StringBuilder by user214147 (
        output1 = user214147(warmup);
        sw.Restart();
        output1 = user214147(input1);
        output2 = user214147(input2);
        sw.Stop();
        Console.WriteLine("StringBuilder by user214147: " + (sw.ElapsedTicks - baseVal));
        Console.WriteLine("  Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL "));
        Console.WriteLine("  Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL "));

        //StringBuilder by fubo (
        output1 = fubo(warmup);
        sw.Restart();
        output1 = fubo(input1);
        output2 = fubo(input2);
        sw.Stop();
        Console.WriteLine("StringBuilder by fubo: " + (sw.ElapsedTicks - baseVal));
        Console.WriteLine("  Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL "));
        Console.WriteLine("  Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL "));

        //StringBuilder by David S 2013 (
        output1 = SingleSpacedTrim(warmup);
        sw.Restart();
        output1 = SingleSpacedTrim(input1);
        output2 = SingleSpacedTrim(input2);
        sw.Stop();
        Console.WriteLine("StringBuilder(SingleSpacedTrim) by David S: " + (sw.ElapsedTicks - baseVal));
        Console.WriteLine("  Trial1:(spaces only) " + (output1 == correctOutput1 ? "PASS " : "FAIL "));
        Console.WriteLine("  Trial2:(spaces+tabs) " + (output2 == correctOutput2 ? "PASS " : "FAIL "));
    }

    // InPlace Replace by Felipe Machado and slightly modified by Ryan for multi-space removal (
    static string InPlaceCharArray(string str)
    {
        var len = str.Length;
        var src = str.ToCharArray();
        int dstIdx = 0;
        bool lastWasWS = false;
        for (int i = 0; i < len; i++)
        {
            var ch = src[i];
            if (ch == '\u0020')
            {
                if (lastWasWS == false)
                {
                    src[dstIdx++] = ch;
                    lastWasWS = true;
                }
            }
            else
            {
                lastWasWS = false;
                src[dstIdx++] = ch;
            }
        }
        return new string(src, 0, dstIdx);
    }

    // InPlace Replace by Felipe R. Machado but modified by Ryan for multi-space removal (
    static string InPlaceCharArraySpaceOnly(string str)
    {
        var len = str.Length;
        var src = str.ToCharArray();
        int dstIdx = 0;
        bool lastWasWS = false; //Added line
        for (int i = 0; i < len; i++)
        {
            var ch = src[i];
            switch (ch)
            {
                case '\u0020': //SPACE
                case '\u00A0': //NO-BREAK SPACE
                case '\u1680': //OGHAM SPACE MARK
                case '\u2000': //EN QUAD
                case '\u2001': //EM QUAD
                case '\u2002': //EN SPACE
                case '\u2003': //EM SPACE
                case '\u2004': //THREE-PER-EM SPACE
                case '\u2005': //FOUR-PER-EM SPACE
                case '\u2006': //SIX-PER-EM SPACE
                case '\u2007': //FIGURE SPACE
                case '\u2008': //PUNCTUATION SPACE
                case '\u2009': //THIN SPACE
                case '\u200A': //HAIR SPACE
                case '\u202F': //NARROW NO-BREAK SPACE
                case '\u205F': //MEDIUM MATHEMATICAL SPACE
                case '\u3000': //IDEOGRAPHIC SPACE
                case '\u2028': //LINE SEPARATOR
                case '\u2029': //PARAGRAPH SEPARATOR
                case '\u0009': //[ASCII Tab]
                case '\u000A': //[ASCII Line Feed]
                case '\u000B': //[ASCII Vertical Tab]
                case '\u000C': //[ASCII Form Feed]
                case '\u000D': //[ASCII Carriage Return]
                case '\u0085': //NEXT LINE
                    if (lastWasWS == false) //Added line
                    {
                        src[dstIdx++] = ch; //Added line
                        lastWasWS = true; //Added line
                    }
                    continue;
                default:
                    lastWasWS = false; //Added line
                    src[dstIdx++] = ch;
                    break;
            }
        }
        return new string(src, 0, dstIdx);
    }

    static readonly Regex MultipleSpaces =
        new Regex(@" {2,}", RegexOptions.Compiled);

    //Split And Join by Jon Skeet (
    static string SplitAndJoinOnSpace(string input)
    {
        string[] split = input.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", split);
    }

    //StringBuilder by user214147 (
    public static string user214147(string S)
    {
        string s = S.Trim();
        bool iswhite = false;
        int iwhite;
        int sLength = s.Length;
        StringBuilder sb = new StringBuilder(sLength);
        foreach (char c in s.ToCharArray())
        {
            if (Char.IsWhiteSpace(c))
            {
                if (iswhite)
                {
                    //Continuing whitespace ignore it.
                    continue;
                }
                else
                {
                    //New WhiteSpace

                    //Replace whitespace with a single space.
                    sb.Append(" ");
                    //Set iswhite to True and any following whitespace will be ignored
                    iswhite = true;
                }
            }
            else
            {
                sb.Append(c.ToString());
                //reset iswhitespace to false
                iswhite = false;
            }
        }
        return sb.ToString();
    }

    //StringBuilder by fubo (
    public static string fubo(this string Value)
    {
        StringBuilder sbOut = new StringBuilder();
        if (!string.IsNullOrEmpty(Value))
        {
            bool IsWhiteSpace = false;
            for (int i = 0; i < Value.Length; i++)
            {
                if (char.IsWhiteSpace(Value[i])) //Comparison with WhiteSpace
                {
                    if (!IsWhiteSpace) //Comparison with previous Char
                    {
                        sbOut.Append(Value[i]);
                        IsWhiteSpace = true;
                    }
                }
                else
                {
                    IsWhiteSpace = false;
                    sbOut.Append(Value[i]);
                }
            }
        }
        return sbOut.ToString();
    }

    //David S. 2013 (
    public static String SingleSpacedTrim(String inString)
    {
        StringBuilder sb = new StringBuilder();
        Boolean inBlanks = false;
        foreach (Char c in inString)
        {
            switch (c)
            {
                case '\r':
                case '\n':
                case '\t':
                case ' ':
                    if (!inBlanks)
                    {
                        inBlanks = true;
                        sb.Append(' ');
                    }
                    continue;
                default:
                    inBlanks = false;
                    sb.Append(c);
                    break;
            }
        }
        return sb.ToString().Trim();
    }

    /// <summary>
    /// We want to run this item with max priority to lower the odds of
    /// the OS doing program context switches in the middle of our code.
    /// source: 
    /// </summary>
    /// <returns>random seed</returns>
    private static long ConfigProgramForBenchmarking()
    {
        //prevent the JIT Compiler from optimizing Fkt calls away
        long seed = Environment.TickCount;
        //use the second Core/Processor for the test
        Process.GetCurrentProcess().ProcessorAffinity = new IntPtr(2);
        //prevent "Normal" Processes from interrupting Threads
        Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.High;
        //prevent "Normal" Threads from interrupting this thread
        Thread.CurrentThread.Priority = ThreadPriority.Highest;
        return seed;
    }
}

Benchmark notes: Release Mode, no-debugger attached, i7 processor, avg of 4 runs, only short strings tested

#4 · 2019-01-03 02:55

Here is the solution I work with, without Regex or String.Split.

public static string TrimWhiteSpace(this string Value)
{
    StringBuilder sbOut = new StringBuilder();
    if (!string.IsNullOrEmpty(Value))
    {
        bool IsWhiteSpace = false;
        for (int i = 0; i < Value.Length; i++)
        {
            if (char.IsWhiteSpace(Value[i])) //Comparison with WhiteSpace
            {
                if (!IsWhiteSpace) //Comparison with previous Char
                {
                    sbOut.Append(Value[i]);
                    IsWhiteSpace = true;
                }
            }
            else
            {
                IsWhiteSpace = false;
                sbOut.Append(Value[i]);
            }
        }
    }
    return sbOut.ToString();
}

so you can:

string cleanedString = dirtyString.TrimWhiteSpace();
#5 · 2019-01-03 02:58


VB:

String.Join(" ", Linha.Split(" "c).Where(Function(x) x <> ""))

C#:

string.Join(" ", Linha.Split(' ').Where(x => x != ""));

Enjoy the power of LINQ =D
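
A related sketch of the split-and-join idea without LINQ, assuming it is acceptable to drop leading and trailing whitespace as well. It relies on `StringSplitOptions.RemoveEmptyEntries` to discard the empty pieces that consecutive separators produce. The names `SplitJoin` and `Collapse` are hypothetical.

```csharp
using System;

static class SplitJoin
{
    public static string Collapse(string input)
    {
        // A null separator array makes Split treat any whitespace character as a delimiter.
        string[] words = input.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", words);
    }
}
```

Because empty entries are removed, whitespace at the start or end of the input disappears entirely, which differs from the regex and in-place approaches that keep a single separator there.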

#6 · 2019-01-03 03:02

A regular expression would be the easiest way. If you write the regex correctly, you won't need multiple calls.

Change it to this:

s = System.Text.RegularExpressions.Regex.Replace(s, @"\s{2,}", " ");
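
The patterns that appear in this thread are not interchangeable, so here is a brief sketch contrasting them. The class and method names are invented for the comparison; only `Regex.Replace` and the three patterns come from the answers.

```csharp
using System.Text.RegularExpressions;

static class PatternComparison
{
    // Any run of whitespace, including a single tab or newline, becomes one space.
    public static string AnyWhitespaceRun(string s) => Regex.Replace(s, @"\s+", " ");

    // Only runs of two or more whitespace characters are replaced; a lone tab is kept.
    public static string TwoOrMoreWhitespace(string s) => Regex.Replace(s, @"\s{2,}", " ");

    // Only runs of two or more literal spaces are replaced; tabs are never touched.
    public static string TwoOrMoreSpaces(string s) => Regex.Replace(s, @" {2,}", " ");
}
```

For the input `"a\tb  c"`, the first method gives `"a b c"` while the other two give `"a\tb c"`, so the choice depends on whether single tabs and line breaks should be normalized too.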
#7 · 2019-01-03 03:06

While the existing answers are fine, I'd like to point out one approach which doesn't work:

public static string DontUseThisToCollapseSpaces(string text)
{
    while (text.IndexOf("  ") != -1)
    {
        text = text.Replace("  ", " ");
    }
    return text;
}

This can loop forever. Anyone care to guess why? (I only came across this when it was asked as a newsgroup question a few years ago... someone actually ran into it as a problem.)
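
One explanation commonly given for this puzzle is that `IndexOf(string)` performs a culture-sensitive search while `Replace(string, string)` is ordinal, so for some inputs and runtimes (for example, a zero-width character sitting between two spaces) the two can disagree about whether a double space is present. The sketch below assumes that explanation and makes the loop test ordinal as well; the names `OrdinalCollapse` and `CollapseSpaces` are hypothetical.

```csharp
using System;

static class OrdinalCollapse
{
    public static string CollapseSpaces(string text)
    {
        // Ordinal search, so the loop test agrees with the ordinal Replace below.
        while (text.IndexOf("  ", StringComparison.Ordinal) != -1)
        {
            // Every pass removes at least one character, so the loop terminates.
            text = text.Replace("  ", " ");
        }
        return text;
    }
}
```

This still makes several passes over long runs of spaces, so it is a correctness fix for the loop rather than a fast approach.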
