Best way to specify whitespace in a String.Split o

Posted 2019-01-01 06:40

Question:

I am splitting a string based on whitespace as follows:

string myStr = "The quick brown fox jumps over the lazy dog";

char[] whitespace = new char[] { ' ', '\t' };
string[] ssizes = myStr.Split(whitespace);

It's irksome to define the char[] array everywhere in my code that I want to do this. Is there a more efficient way that doesn't require creating the character array (which is error-prone if copied to different places)?

Answer 1:

If you just call:

string[] ssize = myStr.Split(null);

or:

string[] ssize = myStr.Split(new char[0]);

then white-space is assumed to be the splitting character. From the string.Split(char[]) method's documentation page:

If the separator parameter is null or contains no characters, white-space characters are assumed to be the delimiters. White-space characters are defined by the Unicode standard and return true if they are passed to the Char.IsWhiteSpace method.

Always, always, always read the documentation!
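
To make the quoted behavior concrete, here is a minimal sketch. The names `WhitespaceSplitDemo` and `SplitOnWhitespace` are illustrative and not part of the answer. It passes an empty array rather than null, which also sidesteps any overload ambiguity a bare null could cause on some framework versions.

```csharp
using System;

public static class WhitespaceSplitDemo
{
    // Hypothetical helper: an empty separator array makes Split treat every
    // character for which char.IsWhiteSpace returns true as a delimiter.
    public static string[] SplitOnWhitespace(string input)
    {
        return input.Split(new char[0]);
    }

    public static void Main()
    {
        // Space, tab, and newline all act as delimiters here.
        string[] parts = SplitOnWhitespace("The quick\tbrown\nfox");
        Console.WriteLine(string.Join("|", parts)); // The|quick|brown|fox
    }
}
```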



Answer 2:

Yes, there is a need for one more answer here!

All the solutions thus far address the rather limited domain of canonical input, to wit: a single whitespace character between elements (though tip of the hat to @cherno for at least mentioning the problem). But I submit that in all but the most obscure scenarios, splitting all of these should yield identical results:

string myStrA = "The quick brown fox jumps over the lazy dog";
string myStrB = "The  quick  brown  fox  jumps  over  the  lazy  dog";
string myStrC = "The quick brown fox      jumps over the lazy dog";
string myStrD = "   The quick brown fox jumps over the lazy dog";

String.Split (in any of the flavors shown throughout the other answers here) simply does not work well unless you attach the RemoveEmptyEntries option with either of these:

myStr.Split(new char[0], StringSplitOptions.RemoveEmptyEntries)
myStr.Split(new char[] {' ','\t'}, StringSplitOptions.RemoveEmptyEntries)

As the illustration reveals, omitting the option yields four different results (labeled A, B, C, and D) vs. the single result from all four inputs when you use RemoveEmptyEntries:

[Figure: String.Split results for inputs A through D, with and without RemoveEmptyEntries]

Of course, if you don't like using options, just use the regex alternative :-)

Regex.Split(myStr, @"\s+").Where(s => s != string.Empty)
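
The claim can be checked with a short sketch like the one below. The class and method names (`RemoveEmptyDemo`, `Tokens`, `TokensRegex`) and the shortened sample inputs are assumptions made for illustration; only the two splitting expressions come from the answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RemoveEmptyDemo
{
    // Split on any whitespace and drop the empty entries that adjacent,
    // leading, or trailing delimiters would otherwise produce.
    public static string[] Tokens(string input)
    {
        return input.Split(new char[0], StringSplitOptions.RemoveEmptyEntries);
    }

    // Regex-based equivalent; the filter removes the empty string that
    // Regex.Split yields when the input starts or ends with whitespace.
    public static string[] TokensRegex(string input)
    {
        return Regex.Split(input, @"\s+").Where(s => s != string.Empty).ToArray();
    }

    public static void Main()
    {
        string[] inputs =
        {
            "The quick brown fox",      // single spaces
            "The  quick  brown  fox",   // doubled spaces
            "The quick\t \tbrown fox",  // mixed tabs and spaces
            "   The quick brown fox  "  // leading and trailing spaces
        };

        foreach (string input in inputs)
        {
            // Every input prints the same line: The|quick|brown|fox
            Console.WriteLine(string.Join("|", Tokens(input)));
        }
    }
}
```

Both helpers should return the same four tokens for all of these inputs; which one to prefer is largely a matter of taste, though the regex version pays for pattern matching.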


Answer 3:

According to the documentation:

If the separator parameter is null or contains no characters, white-space characters are assumed to be the delimiters. White-space characters are defined by the Unicode standard and return true if they are passed to the Char.IsWhiteSpace method.

So just call myStr.Split(). There's no need to pass in anything because separator is a params array.



Answer 4:

Why don't you use:

string[] ssizes = myStr.Split(' ', '\t');


Answer 5:

Note that adjacent whitespace will NOT be treated as a single delimiter, even when using String.Split(null). If any of your tokens are separated by multiple spaces or tabs, you'll get empty strings returned in your array.

From the documentation:

Each element of separator defines a separate delimiter character. If two delimiters are adjacent, or a delimiter is found at the beginning or end of this instance, the corresponding array element contains Empty.
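
A tiny sketch of that pitfall, assuming an input with two consecutive spaces. `AdjacentDelimiterDemo` and `SplitKeepEmpty` are hypothetical names used only for this illustration.

```csharp
using System;

public static class AdjacentDelimiterDemo
{
    // Whitespace split without StringSplitOptions.RemoveEmptyEntries,
    // so empty entries between adjacent delimiters are kept.
    public static string[] SplitKeepEmpty(string input)
    {
        return input.Split(new char[0]);
    }

    public static void Main()
    {
        // Two spaces between "a" and "b" produce an empty middle element.
        string[] parts = SplitKeepEmpty("a  b");
        Console.WriteLine(parts.Length);              // 3
        Console.WriteLine(parts[1] == string.Empty);  // True
    }
}
```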



Answer 6:

So don't copy and paste! Extract a function to do your splitting and reuse it.

public static string[] SplitWhitespace (string input)
{
    char[] whitespace = new char[] { ' ', '\t' };
    return input.Split(whitespace);
}

Code reuse is your friend.



Answer 7:

Why don't you just do this:

var ssizes = myStr.Split(" \t".ToCharArray());

It seems there is a method String.ToCharArray() in .NET 4.0!

EDIT: As VMAtm has pointed out, the method already existed in .NET 2.0!



Answer 8:

You can just do:

string myStr = "The quick brown fox jumps over the lazy dog";
string[] ssizes = myStr.Split(' ');

MSDN has more examples and references:

http://msdn.microsoft.com/en-us/library/b873y76a.aspx



Answer 9:

Can't you do it inline?

var sizes = subject.Split(new char[] { ' ', '\t' });

Otherwise, if you do this exact thing often, you could always create a constant or something similar containing that char array.

As others have noted, according to the documentation you can also use null or an empty array. When you do that, it will use whitespace characters automatically.

var sizes = subject.Split(null);


Answer 10:

If repeating the same code is the issue, write an extension method on the String class that encapsulates the splitting logic.
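
One way such an extension method might look is sketched below. The class name `StringSplitExtensions` and the method name `SplitOnWhitespace` are assumptions, and dropping empty entries is a design choice made for this sketch rather than something the answer specifies.

```csharp
using System;

public static class StringSplitExtensions
{
    // Hypothetical extension method: splits on any whitespace character
    // and discards the empty entries caused by repeated delimiters.
    public static string[] SplitOnWhitespace(this string input)
    {
        if (input == null)
        {
            throw new ArgumentNullException(nameof(input));
        }

        return input.Split(new char[0], StringSplitOptions.RemoveEmptyEntries);
    }
}
```

With this in scope, a call site reduces to `myStr.SplitOnWhitespace()`, so the separator definition lives in exactly one place.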



Tags: c# string