Trim whitespace ASCII character “194” from string

Posted 2020-06-09 17:17

Recently ran into a very odd issue where my database contains strings with what appear to be normal whitespace characters but are in fact something else.

For instance, applying trim() to the string:

"TEST "

is getting me:

"TEST "

as a result. So I copy and paste the last character in the string and:

echo ord(' ');
194

194? According to ASCII tables that should be a visible character, not whitespace. So I'm just confused at this point. Why does this character appear to be whitespace, and how can I trim() characters like this when trim() fails?

6 answers
够拽才男人
#2 · 2020-06-09 18:03

It's more likely to be a two-byte 194 160 sequence, which is the UTF-8 encoding of a NO-BREAK SPACE codepoint (the equivalent of the &nbsp; entity in HTML).

It's really not a space, even though it looks like one. (You'll see it won't word-wrap, for instance.) A regular expression match for \s will match it in Unicode mode (the u modifier in PHP), but a plain comparison with a space won't; nor will trim() remove it.
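To check what is actually stored, it can help to dump the raw bytes instead of relying on how the string renders. The following is a minimal sketch; byte_values() is a hypothetical helper name, and $sample is an assumed stand-in for a value read from the database.

```php
<?php
// Hypothetical helper: list each byte of a string as a decimal value.
function byte_values(string $s): array
{
    // unpack('C*') returns 1-indexed unsigned byte values; reindex from 0.
    return array_values(unpack('C*', $s));
}

// Assumed sample: "TEST" followed by a UTF-8 encoded NO-BREAK SPACE.
$sample = "TEST\xC2\xA0";

echo bin2hex($sample), PHP_EOL;                    // 54455354c2a0
echo implode(' ', byte_values($sample)), PHP_EOL;  // 84 69 83 84 194 160
echo ord(substr($sample, 4)), PHP_EOL;             // 194, ord() reads only the first byte
```

This also explains the 194 in the question: ord() only looks at the first byte of a multi-byte character.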

To replace NO-BREAK spaces with a normal space, you should be able to do something like:

$string = str_replace("\xC2\xA0", " ", $string);

or

$string = str_replace("\xC2\xA0", "", $string);

to remove them.

做自己的国王
#3 · 2020-06-09 18:06

Had the same issue. Solved it with

trim($str, ' ' . chr(194) . chr(160))
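One caveat: the second argument of trim() is a list of single bytes, not a sequence, so it can also strip a lone 0xA0 or 0xC2 byte that belongs to another UTF-8 character at the edge of the string. A possible alternative, sketched here under the assumption that the input is valid UTF-8, is a Unicode-mode regular expression; trim_nbsp() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: trim whitespace and NO-BREAK SPACE from both ends.
function trim_nbsp(string $s): string
{
    // The u modifier makes PCRE treat the subject as UTF-8, so \x{00A0}
    // matches the whole two-byte sequence rather than individual bytes.
    $result = preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', $s);

    // preg_replace() returns null on invalid UTF-8; fall back to the input.
    return $result ?? $s;
}

// "voilà" ends in the bytes C3 A0, so a byte-list trim damages it.
$word = "voil\xC3\xA0";
var_dump(trim($word, " \xC2\xA0") === "voil\xC3");   // bool(true): the à is broken
var_dump(trim_nbsp($word . "\xC2\xA0") === $word);   // bool(true): the à survives
```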
Summer. ? 凉城
#4 · 2020-06-09 18:13
php -r 'print_r(json_encode(" "));'
"\u00a0"
$string = str_replace("\u{00a0}", "", $string); //not \u{c2a0}
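The difference between the two escapes can be checked directly. As a small sketch (the variable names are only illustrative): "\u{...}" takes a Unicode code point and is available from PHP 7.0, while "\x.." takes raw bytes.

```php
<?php
$nbspEscape = "\u{00A0}";  // code point U+00A0, stored as UTF-8
$nbspBytes  = "\xC2\xA0";  // the same character written as raw bytes
$wrong      = "\u{C2A0}";  // code point U+C2A0, a different character

var_dump($nbspEscape === $nbspBytes);  // bool(true)
echo bin2hex($nbspEscape), PHP_EOL;    // c2a0
echo bin2hex($wrong), PHP_EOL;         // ec8aa0, three bytes, not an NBSP
echo json_encode($nbspBytes), PHP_EOL; // "\u00a0"
```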
甜甜的少女心
#5 · 2020-06-09 18:14

You can try:

PHP trim

$foo = "TEST ";
$foo = trim($foo);

PHP str_replace

$foo = "TEST ";
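$foo = str_replace(chr(194) . chr(160), '', $foo);

IMPORTANT: Replace the full two-byte sequence chr(194).chr(160), or "\u{00A0}" in double quotes (PHP 7+). Removing chr(194) alone leaves a stray chr(160) byte behind.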

PHP preg_replace

$foo = "TEST ";
$foo = preg_replace('#(^\s+|\s+$)#u', '', $foo);

OR, matching the two-byte sequence directly (I'm not sure if it will work well)

$foo = "TEST ";
$foo = preg_replace('#\xC2\xA0#', '', $foo);
够拽才男人
#6 · 2020-06-09 18:15

You probably got the original data from Excel/CSV. I import from that format into my MySQL database, and it took me hours to figure out why values came in padded and trim() didn't appear to work (I had to check every character in each CSV column string). It seems Excel adds chr(32) + chr(194) + chr(160) to "fill" the column, which at first sight looks like all spaces at the end. This is what worked for me to get a clean string to load into the database:

  // convert to utf8
  $value = iconv("ISO-8859-15", "UTF-8", $data[$c]);
  // excel adds 194+160 to fill up!
  $value = rtrim($value,chr(32).chr(194).chr(160));
  // sanitize (escape etc)
  $value = $dbc->sanitize($value);
一夜七次
#7 · 2020-06-09 18:17

Thought I should contribute an answer of my own, since it has now become clear to me what was happening. The problem originates from dealing with HTML that contains a non-breaking space entity, &nbsp;. Once you load the content into PHP's DOMDocument(), all entities are converted to their decoded values, and upon parsing it you end up with a non-breaking space character. In any event, even in a different scenario, the following method is another option for converting these to regular spaces:

$foo = str_replace('&nbsp;',' ',htmlentities($foo));

This works by first converting the non-breaking space into its HTML entity, and then to a regular space. The contents of $foo can now be easily trimmed as normal. Note that htmlentities() also encodes any other special characters in $foo.
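Because htmlentities() encodes every character that has a named entity, the one-liner leaves things like < or accented letters in entity form. If the original text is wanted back, one option is to decode again afterwards. This is a sketch that assumes valid UTF-8 input; nbsp_to_space() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: turn NO-BREAK SPACEs into regular spaces via entities.
function nbsp_to_space(string $s): string
{
    // Encode everything, so each NBSP becomes the literal text "&nbsp;".
    $encoded = htmlentities($s, ENT_QUOTES, 'UTF-8');

    // Swap only the NBSP entity for a regular space.
    $encoded = str_replace('&nbsp;', ' ', $encoded);

    // Decode the remaining entities back to their original characters.
    return html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');
}

$foo = "TEST\xC2\xA0";
var_dump(trim(nbsp_to_space($foo)));  // string(4) "TEST"
```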
