Well, the subject says everything. I'm using json_encode to convert some UTF8 data to JSON and I need to transfer it to some layer that is currently ASCII-only. So I wonder whether I need to make it UTF-8 aware, or can I leave it as it is.
Looking at JSON rfc, UTF8 is also valid charset in JSON output, although not recommended, i.e. some implemenatations can leave UTF8 data inside. The question is whether PHP's implementation dumps everthing as ASCII or opts to leave something as UTF-8.
Unlike JSON support in other languages, json_encode()
does not have the ability to generate anything other than ASCII.
According to the JSON article in Wikipedia, Unicode characters in strings are always
double-quoted Unicode with backslash escaping
The examples in the PHP Manual on json_encode()
seem to confirm this.
So any UTF-8 character outside ASCII/ANSI should be escaped like this: \u0027
(note, as @Ignacio points out in the comments, that this is the recommended way to deal with those characters, not a required one)
However, I suppose json_decode()
will convert the characters back to their byte values? You may get in trouble there.
If you need to be sure, take a look at iconv() that could convert your UTF-8 String into ASCII (dropping any unsupported characters) beforehand.
Well, json_encode
returns a string. According to the PHP documentation for string:
A string is series of characters. Before PHP 6, a character is the same as a byte. That is, there are exactly 256 different characters possible. This also implies that PHP has no native support of Unicode. See utf8_encode() and utf8_decode() for some basic Unicode functionality.
So for the time being you do not need to worry about making it UTF-8 aware. Of course you still might want to think about this anyway, to future-proof your code.