Filter all types of whitespace in PHP

I know that there are many types of space (em space, en space, thin space, non-breaking space, etc.), but all of the ones I mentioned have HTML entities (at least, PHP's htmlentities() returns something like &emsp; for them).

But, what about those spaces that have no HTML entities?
Example: http://iorbix.com/1001-p-Nuno-Peralta
Look at the nickname of this account. It has many " " (spaces) at the front, which are visible for us (this doesn't happen with the  ).

I have already tried filtering with regular expressions using \x escapes, and with str_replace() using the space itself as the argument, with no luck at all!

Do you have any suggestion on how to filter ALL types of whitespace?

Answer 1:

By default, \s will not match whitespace characters with values above 127. To get at those, you can instead make good use of other UTF-8-aware sequences.
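To illustrate the difference, here is a minimal sketch that counts \s matches on the same string with and without the u modifier. The sample string is made up for the example, and the "\u{...}" escape assumes PHP 7 or later.

```php
<?php
// "\u{2003}" is an em space, stored in UTF-8 as the three bytes E2 80 83.
// The sample value is illustrative only.
$sample = "\u{2003}nick";

// Without the u modifier the subject is scanned byte by byte,
// and none of those three bytes is an ASCII whitespace character.
$withoutU = preg_match_all('/\s/', $sample);

// With the u modifier the subject is decoded as UTF-8 before matching,
// so the em space is seen as a single whitespace character.
$withU = preg_match_all('/\s/u', $sample);

printf("without u: %d, with u: %d\n", $withoutU, $withU);
```

This assumes the default "C" locale; under other locales the byte-wise behaviour of \s for bytes above 127 may differ.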

(Standard disclaimer: I'm skimming the PCRE source code to compile the lists below, I may miss a character or type something incorrectly. Please forgive me.)

\p{Zs} matches:

U+0020 Space
U+00A0 No-break space
U+1680 Ogham space mark
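U+180E Mongolian vowel separator (reclassified from Zs to Cf in Unicode 6.3, so newer PCRE builds may no longer match it with \p{Zs})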
U+2000 En quad
U+2001 Em quad
U+2002 En space
U+2003 Em space
U+2004 Three-per-em space
U+2005 Four-per-em space
U+2006 Six-per-em space
U+2007 Figure space
U+2008 Punctuation space
U+2009 Thin space
U+200A Hair space
U+202F Narrow no-break space
U+205F Medium mathematical space
U+3000 Ideographic space

\h (Horizontal whitespace) matches the same as \p{Zs} above, plus

U+0009 Horizontal tab.
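As a rough sketch of how \h might be used, the following collapses each run of horizontal whitespace into one ordinary space while leaving line breaks alone. The helper name collapse_horizontal() is hypothetical, and the "\u{...}" escapes assume PHP 7 or later.

```php
<?php
/**
 * Collapse every run of horizontal whitespace (\h) into a single
 * ASCII space. Line breaks are not touched because \h excludes them.
 * collapse_horizontal() is a hypothetical helper name.
 */
function collapse_horizontal(string $text): string
{
    $result = preg_replace('/\h+/u', ' ', $text);

    // preg_replace() returns null on error (for example, invalid UTF-8
    // with the u modifier); fall back to the untouched input then.
    return $result ?? $text;
}

// No-break space, em space and tab between "a" and "b"; newline before "c".
echo collapse_horizontal("a\u{00A0}\u{2003}\tb\nc"), "\n";
```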

Similarly for matching vertical whitespace there are a few options.

\p{Zl} matches U+2028 Line separator.

\p{Zp} matches U+2029 Paragraph separator.

\v (Vertical whitespace) matches \p{Zl}, \p{Zp} and the following

U+000A Linefeed
U+000B Vertical tab
U+000C Formfeed
U+000D Carriage return
U+0085 Next line
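One possible use of \v is splitting text into lines regardless of which line-break character was used. This is only a sketch; split_lines() is a hypothetical helper name, and the "\u{...}" escapes assume PHP 7 or later.

```php
<?php
/**
 * Split text on any run of vertical whitespace (\v).
 * Because the pattern is \v+, consecutive breaks such as "\r\n"
 * or blank lines are treated as one separator.
 * split_lines() is a hypothetical helper name.
 */
function split_lines(string $text): array
{
    $parts = preg_split('/\v+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split() returns false on error (for example, invalid UTF-8).
    return $parts === false ? [$text] : $parts;
}

// CRLF, line separator (U+2028) and next line (U+0085) as breaks.
print_r(split_lines("a\r\nb\u{2028}c\u{0085}d"));
```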

Going back to the beginning, in UTF-8 mode (i.e. using the u pattern modifier) \s will match any character that \p{Z} matches (which is anything that \p{Zs}, \p{Zl} and \p{Zp} will match), plus

U+0009 Horizontal tab
U+000A Linefeed
U+000C Formfeed
U+000D Carriage return

To cut a long story short (I bet you read all of the above, didn't you?) you might want to use \s but make sure to be in UTF-8 mode like /\s/u. Putting that to some practical use, to filter out those matching whitespace characters from a string you would do something like

$new_string = preg_replace('/\s/u', '', $old_string);
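One thing worth guarding against: with the u modifier, preg_replace() returns null when the subject is not valid UTF-8. A slightly more defensive version of the line above could look like this sketch, where strip_whitespace() is a hypothetical helper name and the sample string is made up ("\u{...}" escapes assume PHP 7 or later).

```php
<?php
/**
 * Remove every character matched by \s in UTF-8 mode.
 * strip_whitespace() is a hypothetical helper name.
 */
function strip_whitespace(string $input): string
{
    $result = preg_replace('/\s/u', '', $input);

    if ($result === null) {
        // With the u modifier, a subject that is not valid UTF-8
        // makes preg_replace() fail instead of returning a string.
        throw new InvalidArgumentException(
            'Whitespace filtering failed (PCRE error code ' . preg_last_error() . ')'
        );
    }

    return $result;
}

// Ideographic space, em space, plain space, no-break space and tab.
echo strip_whitespace("\u{3000}\u{2003} Nuno\u{00A0}Peralta\t"), "\n";
```

Throwing on failure is a design choice for the sketch; returning the input unchanged or logging the error would be equally reasonable depending on the application.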

Finally, if you really, really care about the vertical whitespace characters which aren't included in the \s list above (VT and NEL), then you can use the character class [\s\v] to match all 26 of the whitespace characters listed above.
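For a case like the nickname in the question, where the unwanted whitespace sits at the start or end of the string, the [\s\v] class can be anchored so that only leading and trailing whitespace is removed. This is a sketch under that assumption; trim_unicode() is a hypothetical helper name, and the "\u{...}" escapes assume PHP 7 or later.

```php
<?php
/**
 * Trim leading and trailing whitespace using the [\s\v] class in
 * UTF-8 mode. Whitespace inside the string is left untouched.
 * trim_unicode() is a hypothetical helper name.
 */
function trim_unicode(string $input): string
{
    $result = preg_replace('/^[\s\v]+|[\s\v]+$/u', '', $input);

    // preg_replace() returns null on error (for example, invalid UTF-8);
    // fall back to the untouched input in that case.
    return $result ?? $input;
}

// Next line (U+0085), vertical tab and em space in front,
// paragraph separator (U+2029) at the end.
echo trim_unicode("\u{0085}\u{000B}\u{2003}Nuno Peralta\u{2029}"), "\n";
```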

Answer 2:

They are all plain spaces (returning character code 32) that can be caught with regular expressions or trim().

Try this:

$text = preg_replace('/\s{2,}/', ' ', $text);

Answer 3:

$result = preg_replace('/\s/', '', $yourString);

See http://www.php.net/manual/en/regexp.reference.backslash.php for more information on \s.