This is my environment: Client -> iOS app, Server -> PHP and MySQL.
Data is sent from client to server via HTTP POST.
Data is sent from server to client as JSON.
I would like to add support for emojis, or any utf8mb4 character in general, and I'm looking for the right way to deal with this in my scenario.
My questions are the following:
1. Does POST allow utf8mb4, or should I convert the data in the client to plain utf8?
2. If my DB has collation and character set utf8mb4, does that mean I should be able to store 'raw' emojis?
3. Should I work in the DB with utf8mb4, or is it safer/better/more widely supported to work in utf8 and encode symbols? If so, which encoding method should I use so that it works flawlessly in Objective-C and PHP (and Java for the future Android version)?
Right now my DB uses utf8mb4, but I get errors when trying to store a raw emoji. On the other hand, I can store non-utf8 symbols such as ¿ or á.
When I retrieve these symbols in PHP, I first need to execute SET CHARACTER SET utf8 (if I get them in utf8mb4, the json_decode function doesn't work); then such symbols are encoded (e.g., ¿ is encoded as \u00bf).
MySQL's utf8 charset is not actually UTF-8; it's a subset of UTF-8 that only supports the Basic Multilingual Plane (characters up to U+FFFF, at most 3 bytes each). Most emoji use code points above U+FFFF. MySQL's utf8mb4 is actual UTF-8, which can encode all of those code points (using up to 4 bytes each). Outside of MySQL there's no such thing as "utf8mb4"; there's just UTF-8. So:
> Does POST allow utf8mb4, or should I convert the data in the client to plain utf8?
Again, there's no such thing as "utf8mb4". HTTP POST requests support any raw bytes; if your client sends UTF-8 encoded data, you're fine.
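To make the utf8/utf8mb4 distinction concrete, here is a small Java sketch (Java being the language planned for the Android client). It checks whether a string would fit MySQL's 3-byte utf8 (every code point at most U+FFFF), shows that an emoji such as U+1F600 needs 4 bytes of UTF-8, and percent-encodes it the way an application/x-www-form-urlencoded POST body would carry it. The class and method names are hypothetical, and URLEncoder.encode(String, Charset) assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    // MySQL's 3-byte "utf8" can only hold code points up to U+FFFF (the BMP).
    static final int MAX_MYSQL_UTF8_CODE_POINT = 0xFFFF;

    /** Hypothetical helper: true if every code point in s fits MySQL's 3-byte utf8. */
    static boolean fitsMysqlUtf8(String s) {
        return s.codePoints().allMatch(cp -> cp <= MAX_MYSQL_UTF8_CODE_POINT);
    }

    /** Hypothetical helper: byte length of s as real UTF-8 (utf8mb4 in MySQL terms). */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // U+1F600 GRINNING FACE, built from its code point (a surrogate pair in Java).
        String emoji = new String(Character.toChars(0x1F600));
        System.out.printf("code point: U+%X%n", emoji.codePointAt(0)); // U+1F600
        System.out.println("UTF-8 bytes: " + utf8Length(emoji));         // 4
        System.out.println("fits MySQL utf8: " + fitsMysqlUtf8(emoji));   // false
        // Inverted question mark and a-acute are both inside the BMP.
        System.out.println("BMP text fits MySQL utf8: " + fitsMysqlUtf8("\u00bf\u00e1")); // true

        // A form-encoded POST body simply carries the UTF-8 bytes, percent-encoded.
        System.out.println(URLEncoder.encode(emoji, StandardCharsets.UTF_8)); // %F0%9F%98%80
    }
}
```

Nothing in the POST step needs a special "utf8mb4" mode: the emoji is just four UTF-8 bytes on the wire, and only the MySQL side has to be able to hold them.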
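> If my DB has collation and character set utf8mb4, does that mean I should be able to store 'raw' emojis?
Yes.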
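> Should I work in the DB with utf8mb4, or is it safer/better/more widely supported to work in utf8 and encode symbols?
God no, use raw UTF-8 (utf8mb4) for all that is holy.
> When I retrieve these symbols in PHP, I first need to execute SET CHARACTER SET utf8
Well, there's your problem: channeling your data through MySQL's utf8 charset will discard any characters above U+FFFF. Use utf8mb4 all the way through MySQL.
> if I get them in utf8mb4, the json_decode function doesn't work
You'll have to specify what that means exactly. PHP's JSON functions should be able to handle any Unicode code point just fine, as long as it's valid UTF-8.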
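For reference, the escaped form seen in the question is normal JSON. By default PHP's json_encode writes each non-ASCII character as a \uXXXX escape and a character above U+FFFF as a UTF-16 surrogate pair, so ¿ becomes \u00bf and 😀 (U+1F600) becomes \ud83d\ude00; a conforming decoder turns either back into the original character. The Java sketch below uses a hypothetical escapeJsonString helper that imitates only that escaping behaviour. It is not PHP's implementation (PHP also escapes "/" by default, for example).

```java
public class JsonEscape {
    /**
     * Hypothetical helper: wraps s in quotes as a JSON string literal, escaping quote,
     * backslash, control characters and every non-ASCII UTF-16 unit as a lowercase
     * hex escape, roughly like PHP json_encode's default output.
     */
    static String escapeJsonString(String s) {
        StringBuilder out = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') {
                out.append('\\').append(c);
            } else if (c < 0x20 || c >= 0x80) {
                // Java strings are UTF-16, so a code point above U+FFFF is already two
                // chars (a surrogate pair) here and produces two consecutive escapes.
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.append('"').toString();
    }

    public static void main(String[] args) {
        String invertedQuestionMark = "\u00bf";                        // U+00BF, inside the BMP
        String grinningFace = new String(Character.toChars(0x1F600)); // U+1F600, outside the BMP
        System.out.println(escapeJsonString(invertedQuestionMark));
        System.out.println(escapeJsonString(grinningFace));
        // A decoder recombines the two escaped surrogates into the original code point.
        int decoded = Character.toCodePoint((char) 0xD83D, (char) 0xDE00);
        System.out.printf("U+%X%n", decoded);
    }
}
```

Running main prints "\u00bf" and "\ud83d\ude00" (with the surrounding quotes), then U+1F600. On the PHP side, passing JSON_UNESCAPED_UNICODE to json_encode emits raw UTF-8 instead of these escapes; both forms decode to the same string.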
Use utf8mb4 throughout MySQL:
- SET NAMES utf8mb4 on the connection
- CHARACTER SET utf8mb4 in the table/column definitions
Use UTF-8 throughout everything else.
Also, ¿ and á are not non-utf8 symbols: they are (or at least can be) encoded in utf8 (utf8mb4).
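On a Java client (for example the future Android app), "UTF-8 throughout" mostly means naming UTF-8 explicitly at every encode and decode step instead of relying on a platform default. The sketch below shows what a mismatch looks like: ¿ and á are ordinary 2-byte UTF-8 characters inside the BMP, so MySQL's utf8 and utf8mb4 can both store them, but if some layer decodes those bytes as ISO-8859-1 they come out as Â¿ and Ã¡. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class CharsetMismatch {
    /** Encodes text as UTF-8, then decodes the bytes as Latin-1, as a misconfigured layer might. */
    static String misdecodeAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    /** Same bytes, decoded with the charset they were actually written in. */
    static String roundTripUtf8(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String sample = "\u00bf\u00e1"; // inverted question mark and a-acute, both in the BMP
        System.out.println(sample.getBytes(StandardCharsets.UTF_8).length); // 4 (2 bytes each)
        System.out.println(roundTripUtf8(sample).equals(sample));          // true
        // Each 2-byte sequence turns into two Latin-1 characters (mojibake).
        System.out.println(misdecodeAsLatin1(sample).equals("\u00c2\u00bf\u00c3\u00a1")); // true
    }
}
```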