Well, the subject says everything. I'm using json_encode to convert some UTF-8 data to JSON, and I need to transfer it to some layer that is currently ASCII-only. So I wonder whether I need to make that layer UTF-8 aware, or whether I can leave it as it is.
Looking at the JSON RFC, UTF-8 is also a valid charset in JSON output, although not recommended, i.e. some implementations can leave UTF-8 data inside. The question is whether PHP's implementation dumps everything as ASCII or opts to leave something as UTF-8.
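Unlike JSON support in some other languages, json_encode() does not generate anything other than ASCII by default: non-ASCII characters are written as \uXXXX escapes. (Since PHP 5.4 the JSON_UNESCAPED_UNICODE flag makes it emit raw UTF-8 instead, but only if you pass it explicitly.)
According to the JSON article in Wikipedia, Unicode characters in strings are always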
The examples in the PHP Manual on json_encode() seem to confirm this.
So any UTF-8 character outside ASCII should be escaped in \uXXXX form, like this:
\u0027
(Note, as @Ignacio points out in the comments, that this is the recommended way to deal with those characters, not a required one.)
However, json_decode() will convert the escapes back to their UTF-8 byte values, so you may get in trouble there if the decoded data has to pass through the ASCII-only layer again.
If you need to be sure, take a look at iconv(), which could convert your UTF-8 string into ASCII (dropping any unsupported characters) beforehand.
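To check the behaviour described above on your own installation, here is a minimal sketch. The sample array and the regex-based ASCII check are my own illustration, not anything taken from the manual, and the input is written with byte escapes so the result does not depend on how the source file is saved.

```php
<?php
// Hypothetical sample data; the strings are UTF-8 given as raw bytes.
$data = [
    'name' => "Zo\xC3\xAB",     // "Zoë"
    'city' => "Krak\xC3\xB3w",  // "Kraków"
];

// Default call, no flags: non-ASCII characters come out as \uXXXX escapes.
$json = json_encode($data);
echo $json, "\n";

// True when every byte of the JSON text is in the 7-bit ASCII range.
$isAscii = preg_match('/^[\x00-\x7F]*$/', $json) === 1;
var_dump($isAscii);

// Decoding turns the escapes back into the original UTF-8 bytes.
$decoded = json_decode($json, true);
var_dump($decoded === $data);
```

If both var_dump() calls print bool(true), the encoded text is safe for the ASCII-only layer, while anything that consumes the decoded values still has to handle UTF-8.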
Well, json_encode returns a string. According to the PHP documentation for string:
So for the time being you do not need to worry about making it UTF-8 aware. Of course, you still might want to think about this anyway, to future-proof your code.
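A small sketch of why future-proofing may still matter, assuming PHP 5.4 or later: a PHP string is just a sequence of bytes, and the same json_encode() call stops producing pure ASCII as soon as someone passes JSON_UNESCAPED_UNICODE. The variable names are made up for the example.

```php
<?php
// "é" as UTF-8: one character, two bytes.
$s = "\xC3\xA9";

// strlen() counts bytes, not characters.
var_dump(strlen($s));

// Default output is ASCII: a quoted \u00e9 escape.
$escaped = json_encode($s);
var_dump($escaped);

// With the flag, the raw UTF-8 bytes are kept inside the quotes.
$raw = json_encode($s, JSON_UNESCAPED_UNICODE);
var_dump(strlen($raw));
```

So the ASCII-only layer is fine only as long as nobody adds that flag to the call.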