I have a string that looks like this:
Now, when my app shoves this string into its utf8 mysql database column, it looks like this in the MySQL CLI:
If I select convert(mystring using utf8mb4), it still looks like this.
And if I turn it to hex using select hex(mystring) from mytable; it looks like this:
C3A2CB9CE282ACC3AFC2B8C28FC3B0C5B8C592CB86C3B0C5B8C592C5A0C3B0C5B8C592C281C3B0C5B8E280A1C2ACC3B0C5B8E280A1C2A7
Now, let's say I want to find strings with that wave emoji in them. Well, the hex for the wave emoji is F09F8C8A. But F09F8C8A isn't in the hex above, so something like select * from mytable where hex(mystring) like '%F09F8C8A%'; doesn't work.
Any suggestions?
I call that "double encoding". Your client was sending utf8-encoded bytes, but the connection told MySQL they were latin1 characters, so MySQL converted each byte to utf8 separately for the utf8 column. That way a 3-byte utf8 character got converted to 6 or more bytes in the database.
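To make the double encoding concrete, here is a minimal Python sketch that pushes the wave emoji through the same two steps. It assumes the misread charset behaves like Windows-1252 (which is what MySQL calls latin1); the variable names are only illustrative.

```python
# Sketch: reproduce "double encoding" for the wave emoji (U+1F30A).
# Assumption: the bytes were misread as cp1252 (MySQL's latin1).

wave = "\U0001F30A"

utf8_bytes = wave.encode("utf-8")        # bytes the client actually sent
misread = utf8_bytes.decode("cp1252")    # each byte taken as one latin1 character
stored = misread.encode("utf-8")         # those characters re-encoded for the utf8 column

utf8_hex = utf8_bytes.hex().upper()
stored_hex = stored.hex().upper()

# Hex dump copied from the question.
question_hex = (
    "C3A2CB9CE282ACC3AFC2B8C28FC3B0C5B8C592CB86C3B0C5B8C592C5A0"
    "C3B0C5B8C592C281C3B0C5B8E280A1C2ACC3B0C5B8E280A1C2A7"
)

print(utf8_hex)                    # F09F8C8A
print(stored_hex)                  # C3B0C5B8C592C5A0
print(stored_hex in question_hex)  # True
```

The second value does occur in the hex from the question, so until the data is repaired, a pattern like '%C3B0C5B8C592C5A0%' is the one that would match the wave emoji.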
You need to fix both the client and the data in the table(s).
This link discusses the issues and the fixes: http://mysql.rjweb.org/doc.php/charcoll (Sorry, there is no brief summary of how to fix your problems.)
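As an illustration of what repairing the data has to undo, the following Python sketch reverses the double encoding for the value in the question. It is not the procedure from the link, just a byte-level demonstration; the function name undo_double_encoding is hypothetical, and it assumes the misread charset was MySQL's latin1, i.e. cp1252 with the five unassigned bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) mapped to the same-numbered code points.

```python
def undo_double_encoding(stored: bytes) -> str:
    """Reverse utf8 -> (misread as latin1) -> utf8 for one stored value.

    Hypothetical helper; assumes MySQL's latin1 (cp1252 plus pass-through
    for the bytes that cp1252 leaves unassigned).
    """
    text = stored.decode("utf-8")
    raw = bytearray()
    for ch in text:
        try:
            raw += ch.encode("cp1252")
        except UnicodeEncodeError:
            # Characters such as U+0081 and U+008F have no cp1252 byte,
            # so fall back to the same-numbered latin-1 byte.
            raw += ch.encode("latin-1")
    return raw.decode("utf-8")


# Hex dump copied from the question.
stored_hex = (
    "C3A2CB9CE282ACC3AFC2B8C28FC3B0C5B8C592CB86C3B0C5B8C592C5A0"
    "C3B0C5B8C592C281C3B0C5B8E280A1C2ACC3B0C5B8E280A1C2A7"
)

recovered = undo_double_encoding(bytes.fromhex(stored_hex))
recovered_hex = recovered.encode("utf-8").hex().upper()

print(recovered_hex)               # E29880EFB88FF09F8C88F09F8C8AF09F8C81F09F87ACF09F87A7
print("F09F8C8A" in recovered_hex) # True
```

Once the value is back to single-encoded utf8, the wave emoji's F09F8C8A shows up in the hex, which is what the original LIKE query was looking for.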