I have a MySQL table that holds text in multiple languages, one language per field.
My character set is utf8 (collation utf8_general_ci).
When I look at the table in phpMyAdmin, a Bulgarian page looks like this:
За наÑ
This is a title. The same title shows up on the website like this:
За нас (this is correct)
What am I doing wrong?
OK, try to execute these queries before your actual fetching of the records:
Afterwards, proceed with your own queries. The queries above must, of course, be run in the context of your current database connection.
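A statement commonly used for this purpose in MySQL is `SET NAMES`, which sets the character set the server assumes for the connection. The sketch below is an assumption about the kind of query meant, not necessarily the exact one; `utf8mb4`, the helper name `use_utf8_connection`, and the `pages` table are illustrative, and the small `RecordingCursor` class only stands in for a real DB-API cursor so the example runs without a database.

```python
class RecordingCursor:
    """Stand-in for a DB-API cursor; it only records the statements it is given."""

    def __init__(self):
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)


def use_utf8_connection(cursor, charset="utf8mb4"):
    """Tell the server which character set this connection sends and expects."""
    # Allow only simple identifiers, since the name is placed directly in the SQL.
    if not charset.isalnum():
        raise ValueError(f"unexpected charset name: {charset!r}")
    cursor.execute(f"SET NAMES {charset}")


cursor = RecordingCursor()
use_utf8_connection(cursor)                # run once per connection, before fetching
cursor.execute("SELECT title FROM pages")  # hypothetical table and column
print(cursor.statements)
```

With a real driver, the same call would go through a cursor of the connection that later fetches the records. Whether this removes the mismatch depends on how the data was originally written to the table.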
What character set do the fields in your table use? Can you please share the relevant part of the SHOW CREATE TABLE output for these fields?
Since ISO-8859-1 (latin1) was long the default database charset for MySQL and the server mostly performs no conversion on it, people effectively use it as BINARY and simply store UTF-8 encoded Cyrillic in it. This works with web development tools because they bind to the field, receive the data as UTF-8 encoded bytes, and put those bytes unconverted into a web page that declares UTF-8 as its output encoding. The data just passes through without ever being properly encoded for the database. Of course, this causes all kinds of problems for operations inside the database (e.g. getting the character length versus the byte length, or sorting properly), but for basic store/retrieve operations it appears to work. This is very typical behavior for non-localized web apps that assume they are working with ASCII, or ISO-8859-1 at most.

The remedy is to create a new set of tables that use the UTF-8 encoding, explicitly transcode the wrongly labeled UTF-8 data to wide characters, and then insert the result into the UTF-8 tables so that the database knows which encoding is really in use.
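The pass-through effect and the repair step can be reproduced outside the database. The following is a minimal Python sketch, not MySQL-specific code; the variable names are illustrative, and Python's `latin-1` codec stands in for the database's ISO-8859-1 interpretation.

```python
# Simulate UTF-8 bytes being stored in, and read back from, an ISO-8859-1 column.
original = "За нас"

# The web app sends UTF-8 bytes; the database keeps them unchanged.
stored_bytes = original.encode("utf-8")

# Anything that trusts the declared ISO-8859-1 charset sees one character per byte.
as_seen_by_db = stored_bytes.decode("latin-1")

# Character count and byte count disagree, which breaks length and sorting logic.
print(len(original))       # 6 characters in the real text
print(len(as_seen_by_db))  # 11 "characters" from the database's point of view

# Repair: recover the raw bytes, then decode them with the encoding really used.
repaired = as_seen_by_db.encode("latin-1").decode("utf-8")
print(repaired == original)  # True
```

The last step is the transcoding that has to happen before the text is written into a table that is really declared as UTF-8.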
It looks like the data is UTF-8 encoded, so it displays correctly on a web page declared as UTF-8 but not in a program that cannot handle UTF-8 or has not been set to use it.
For example, the characters Ð° that occur twice are U+00D0 U+00B0. The bytes 0xD0 and 0xB0 are the UTF-8 form of the Cyrillic small letter a, U+0430, which appears in the corresponding positions in the correct text. So apparently UTF-8 data is being misinterpreted according to ISO-8859-1, Windows-1252, or some similar 8-bit encoding.
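The byte-level reasoning can be checked with a short Python sketch. It assumes the misreading is plain ISO-8859-1 (Python's `latin-1` codec); Windows-1252 differs from it only for bytes in the range 0x80 to 0x9F.

```python
import unicodedata

letter = "\u0430"  # CYRILLIC SMALL LETTER A
utf8_bytes = letter.encode("utf-8")
print(utf8_bytes.hex())  # d0b0

# Reading those two bytes as ISO-8859-1 yields two separate Latin-1 characters.
misread = utf8_bytes.decode("latin-1")
for ch in misread:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
# U+00D0 LATIN CAPITAL LETTER ETH
# U+00B0 DEGREE SIGN

# The same pair shows up once for each Cyrillic "а" in the title.
title = "За нас"
misread_title = title.encode("utf-8").decode("latin-1")
print(misread_title.count(misread))  # 2
```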