How to replace multiple white spaces with one whit

2019-01-03 02:14发布

Let's say I have a string such as:

"Hello     how are   you           doing?"

I would like a function that turns multiple spaces into one space.

So I would get:

"Hello how are you doing?"

I know I could use regex or call

string s = "Hello     how are   you           doing?".replace("  "," ");

But I would have to call it multiple times to make sure all sequential whitespaces are replaced with only one.

Is there already a built in method for this?

13条回答
狗以群分
2楼-- · 2019-01-03 02:40

Smallest solution:

var regExp=/\s+/g, newString=oldString.replace(regExp,' ');

查看更多
放我归山
3楼-- · 2019-01-03 02:41

This question isn't as simple as other posters have made it out to be (and as I originally believed it to be) - because the question isn't quite precise as it needs to be.

There's a difference between "space" and "whitespace". If you only mean spaces, then you should use a regex of " {2,}". If you mean any whitespace, that's a different matter. Should all whitespace be converted to spaces? What should happen to space at the start and end?

For the benchmark below, I've assumed that you only care about spaces, and you don't want to do anything to single spaces, even at the start and end.

Note that correctness is almost always more important than performance. The fact that the Split/Join solution removes any leading/trailing whitespace (even just single spaces) is incorrect as far as your specified requirements (which may be incomplete, of course).

The benchmark uses MiniBench.

using System;
using System.Text.RegularExpressions;
using MiniBench;

internal class Program
{
    public static void Main(string[] args)
    {

        int size = int.Parse(args[0]);
        int gapBetweenExtraSpaces = int.Parse(args[1]);

        char[] chars = new char[size];
        for (int i=0; i < size/2; i += 2)
        {
            // Make sure there actually *is* something to do
            chars[i*2] = (i % gapBetweenExtraSpaces == 1) ? ' ' : 'x';
            chars[i*2 + 1] = ' ';
        }
        // Just to make sure we don't have a \0 at the end
        // for odd sizes
        chars[chars.Length-1] = 'y';

        string bigString = new string(chars);
        // Assume that one form works :)
        string normalized = NormalizeWithSplitAndJoin(bigString);


        var suite = new TestSuite<string, string>("Normalize")
            .Plus(NormalizeWithSplitAndJoin)
            .Plus(NormalizeWithRegex)
            .RunTests(bigString, normalized);

        suite.Display(ResultColumns.All, suite.FindBest());
    }

    private static readonly Regex MultipleSpaces = 
        new Regex(@" {2,}", RegexOptions.Compiled);

    static string NormalizeWithRegex(string input)
    {
        return MultipleSpaces.Replace(input, " ");
    }

    // Guessing as the post doesn't specify what to use
    private static readonly char[] Whitespace =
        new char[] { ' ' };

    static string NormalizeWithSplitAndJoin(string input)
    {
        string[] split = input.Split
            (Whitespace, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", split);
    }
}

A few test runs:

c:\Users\Jon\Test>test 1000 50
============ Normalize ============
NormalizeWithSplitAndJoin  1159091 0:30.258 22.93
NormalizeWithRegex        26378882 0:30.025  1.00

c:\Users\Jon\Test>test 1000 5
============ Normalize ============
NormalizeWithSplitAndJoin  947540 0:30.013 1.07
NormalizeWithRegex        1003862 0:29.610 1.00


c:\Users\Jon\Test>test 1000 1001
============ Normalize ============
NormalizeWithSplitAndJoin  1156299 0:29.898 21.99
NormalizeWithRegex        23243802 0:27.335  1.00

Here the first number is the number of iterations, the second is the time taken, and the third is a scaled score with 1.0 being the best.

That shows that in at least some cases (including this one) a regular expression can outperform the Split/Join solution, sometimes by a very significant margin.

However, if you change to an "all whitespace" requirement, then Split/Join does appear to win. As is so often the case, the devil is in the detail...

查看更多
三岁会撩人
4楼-- · 2019-01-03 02:43

I'm sharing what I use, because it appears I've come up with something different. I've been using this for a while and it is fast enough for me. I'm not sure how it stacks up against the others. I uses it in a delimited file writer and run large datatables one field at a time through it.

    public static string NormalizeWhiteSpace(string S)
    {
        string s = S.Trim();
        bool iswhite = false;
        int iwhite;
        int sLength = s.Length;
        StringBuilder sb = new StringBuilder(sLength);
        foreach(char c in s.ToCharArray())
        {
            if(Char.IsWhiteSpace(c))
            {
                if (iswhite)
                {
                    //Continuing whitespace ignore it.
                    continue;
                }
                else
                {
                    //New WhiteSpace

                    //Replace whitespace with a single space.
                    sb.Append(" ");
                    //Set iswhite to True and any following whitespace will be ignored
                    iswhite = true;
                }  
            }
            else
            {
                sb.Append(c.ToString());
                //reset iswhitespace to false
                iswhite = false;
            }
        }
        return sb.ToString();
    }
查看更多
我欲成王,谁敢阻挡
5楼-- · 2019-01-03 02:44

As already pointed out, this is easily done by a regular expression. I'll just add that you might want to add a .trim() to that to get rid of leading/trailing whitespace.

查看更多
ゆ 、 Hurt°
6楼-- · 2019-01-03 02:44
Regex regex = new Regex(@"\W+");
string outputString = regex.Replace(inputString, " ");
查看更多
做自己的国王
7楼-- · 2019-01-03 02:51

There is no way built in to do this. You can try this:

private static readonly char[] whitespace = new char[] { ' ', '\n', '\t', '\r', '\f', '\v' };
public static string Normalize(string source)
{
   return String.Join(" ", source.Split(whitespace, StringSplitOptions.RemoveEmptyEntries));
}

This will remove leading and trailing whitespce as well as collapse any internal whitespace to a single whitespace character. If you really only want to collapse spaces, then the solutions using a regular expression are better; otherwise this solution is better. (See the analysis done by Jon Skeet.)

查看更多
登录 后发表回答