Recently ran into a very odd issue where my database contains strings with what appear to be normal whitespace characters but are in fact something else.
For instance, applying trim() to the string:
"TEST "
is getting me:
"TEST "
as a result. So I copy and paste the last character in the string and:
echo ord(' ');
194
194? According to ASCII tables that should be ┬. So I'm just confused at this point. Why does this character appear to be whitespace, and how can I trim() characters like this when trim() fails?
It's more likely to be the two-byte sequence 194 160, which is the UTF-8 encoding of the NO-BREAK SPACE code point, U+00A0 (the equivalent of the &nbsp; entity in HTML). ord() only reports the first byte of the string, which is why you see 194.
It's really not a space, even though it looks like one. (You'll see it won't word-wrap, for instance.) A regular expression match for \s can match it when the pattern uses the u (UTF-8) modifier, but a plain comparison with a space won't; nor will trim() remove it.
To replace NO-BREAK spaces with a normal space, or to remove them entirely, you should be able to do something like a str_replace() on the "\xC2\xA0" byte sequence.
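A minimal sketch of both options, assuming the string is UTF-8 encoded. The sample value and the variable names ($nbsp, $string, $replaced, $removed) are made up for illustration:

```php
<?php
// Hypothetical sample: "TEST" followed by a UTF-8 NO-BREAK SPACE (bytes 0xC2 0xA0).
$nbsp = "\xC2\xA0";
$string = "TEST" . $nbsp;

// ord() only looks at the first byte, which is why it reports 194.
echo ord($nbsp), "\n";       // 194
echo bin2hex($nbsp), "\n";   // c2a0

// Plain trim() leaves the two-byte sequence alone.
var_dump(trim($string) === $string);   // bool(true)

// Option 1: turn every NO-BREAK SPACE into a regular space, then trim as usual.
$replaced = trim(str_replace($nbsp, ' ', $string));

// Option 2: remove every NO-BREAK SPACE outright.
$removed = str_replace($nbsp, '', $string);

var_dump($replaced, $removed);
```

Note that str_replace() touches NO-BREAK SPACEs anywhere in the string, not only at the ends, which may or may not be what you want.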
Had the same issue. Solved it with
You can try with:
PHP trim
PHP str_replace
PHP preg_replace
OR (I'm not sure if it will work well)
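As a rough sketch of two of the functions named above, trim() with an explicit character list and preg_replace() with the u modifier could be used along these lines. The sample string and the variable names are made up for illustration, and the input is assumed to be UTF-8:

```php
<?php
// Hypothetical input: NO-BREAK SPACE padding on both sides, encoded as UTF-8.
$string = "\xC2\xA0 TEST \xC2\xA0";

// trim() with an explicit character list. The list is treated as single
// bytes, so this strips any leading/trailing 0xC2, 0xA0, or default
// whitespace byte, not the two-byte sequence as a unit.
$viaTrim = trim($string, " \t\n\r\0\x0B\xC2\xA0");

// preg_replace() with the u modifier works on code points instead of bytes:
// \x{00A0} is NO-BREAK SPACE, and the anchors limit the match to both ends.
$viaPreg = preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', $string);

var_dump($viaTrim, $viaPreg);
```

The byte-level trim() is the shorter of the two, but it can damage a string that ends in a character whose last byte happens to be 0xA0 (for example "à", which is 0xC3 0xA0 in UTF-8). The preg_replace() version avoids that because it matches whole code points.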
You probably got the original data from Excel/CSV. I'm importing from that format into my MySQL db, and it took me hours to figure out why the values came padded and trim() didn't appear to work (I had to check every character in each CSV column string). In fact, it seems Excel adds chr(32) + chr(194) + chr(160) to "fill" the column, which at first sight looks like all spaces at the end. This is what worked for me to get a clean string to load into the db:
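A sketch of how that kind of cleanup might look for a CSV row, assuming UTF-8 input. The helper name cleanField() and the sample line are hypothetical and only illustrate the padding described above:

```php
<?php
// Hypothetical helper (not from the original post): cleans one CSV field.
function cleanField(string $value): string
{
    // chr(194) . chr(160) is the UTF-8 NO-BREAK SPACE; chr(32) is a plain space.
    $nbsp = chr(194) . chr(160);

    // Normalise the padding to plain spaces, then let trim() do its usual job.
    return trim(str_replace($nbsp, chr(32), $value));
}

// Made-up line imitating the chr(32) + chr(194) + chr(160) padding.
$line = "Alice" . chr(32) . chr(194) . chr(160) . ",42" . chr(194) . chr(160);
$row = array_map('cleanField', str_getcsv($line, ',', '"', '\\'));

var_dump($row);
```

This also turns any NO-BREAK SPACE inside a field into a regular space, which is usually harmless for imported spreadsheet data but worth knowing about.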
Thought I should contribute an answer of my own since it has now become clear to me what was happening. The problem originates from dealing with HTML that contains a non-breaking space entity, &nbsp;. Once you load the content into PHP's DOMDocument, all entities are converted to their decoded values, and upon parsing it you end up with a non-breaking space character.
In any event, even in a different scenario, another option for converting these to regular spaces is to first convert the non-breaking space into its HTML entity and then replace that entity with a regular space. The contents of $foo can then be trimmed as normal.
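A minimal sketch of that entity round trip, assuming $foo holds UTF-8 text. The sample value is made up, and the final html_entity_decode() step is an addition here so that other characters come back in their original form:

```php
<?php
// Hypothetical value: text ending in a UTF-8 NO-BREAK SPACE.
$foo = "caf\xC3\xA9 & TEST\xC2\xA0";

// 1. Encode: the NO-BREAK SPACE becomes the named entity &nbsp;.
$encoded = htmlentities($foo, ENT_QUOTES, 'UTF-8');

// 2. Swap the entity for a regular space.
$encoded = str_replace('&nbsp;', ' ', $encoded);

// 3. Assumption: decode the remaining entities again, then trim as normal.
$foo = trim(html_entity_decode($encoded, ENT_QUOTES, 'UTF-8'));

var_dump($foo);
```

Because htmlentities() escapes a literal "&" as &amp;, text that already contained the characters "&nbsp;" is left untouched by step 2.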