After viewing my prod logs, I have some error mentionning :
[2012-08-31 15:56:43] request.CRITICAL: Doctrine\DBAL\DBALException:
An exception occurred while executing 'SELECT t0.username ....... FROM fos_user t0 WHERE t0.username = ?'
with params {"1":"Nrv\u29e7Kasi"}:
SQLSTATE[HY000]: General error: 1267 Illegal mix of collations (latin1_swedish_ci,IMPLICIT)
and (utf8_general_ci,COERCIBLE) for operation '='
Alghout i have UTF-8 default under the doctrine cfg :
doctrine:
dbal:
charset: UTF8
It seems that all my MySQL Tables are in latin1_swedish_ci
, so my question is :
Can I manually change the collation to utf8_general_ci
for all my tables without any complications/precautions ?
Thats right. I ran into this problem and the best quick and fast solution is
Simply convert table's character set by command as follows,
It is helpful to understand the following definitions:
A character encoding details how each symbol is represented in binary (and therefore stored in the computer). For example, the symbol
é
(U+00E9, latin small letter E with acute) is encoded as0xc3a9
in UTF-8 (which MySQL callsutf8
) and0xe9
in Windows-1252 (which MySQL callslatin1
).A character set is the alphabet of symbols that can be represented using a given character encoding. Confusingly, the term is also used to mean the same as character encoding.
A collation is an ordering on a character set, so that strings can be compared. For example: MySQL's
latin1_swedish_ci
collation treats most accented variations of a character as equivalent to the base character, whereas itslatin1_general_ci
collation will order them before the next base character but not equivalent (there are other, more significant, differences too: such as the order of characters likeå
,ä
,ö
andß
).MySQL will decide which collation should be applied to a given expression as documented under Collation of Expressions: in particular, the collation of a column takes precedence over that of a string literal.
The
WHERE
clause of your query compares the following strings:a value in
fos_user.username
, encoded in the column's character set (Windows-1252) and expressing a preference for its collationlatin1_swedish_ci
(with a coercibility value of 2); withthe string literal
'Nrv⧧Kasi'
, encoded in the connection's character set (UTF-8, as configured by Doctrine) and expressing a preference for the connection's collationutf8_general_ci
(with a coercibility value of 4).Since the first of these strings has a lower coercibility value than the second, MySQL attempts to perform the comparison using that string's collation:
latin1_swedish_ci
. To do so, MySQL attempts to convert the second string tolatin1
—but since the⧧
character does not exist in that character set, the comparison fails.Warning
One should pause for a moment to consider how the column is currently encoded: you are attempting to filter for records where
fos_user.username
is equal to a string that contains a character which cannot exist in that column!If you believe that the column does contain such characters, then you probably wrote to the column whilst the connection character encoding was set to something (e.g.
latin1
) that caused MySQL to interpret the received byte sequence as characters which are all in the Windows-1252 character set.If this is the case, before continuing any further you should fix your data!
convert such columns to the character encoding that was used on data insertion, if different to the incumbent encoding:
drop the encoding information associated with such columns by converting them to the
binary
character set:associate with such columns the encoding in which data was actually transmitted by converting them to the relevant character set.
Note that, if converting from a multi-byte encoding, you may need to increase the size of the column (or even change its type) in order to accomodate the maximum possible length of the converted string.
Once one is certain that the columns are correctly encoded, one could force the comparison to be conducted using a Unicode collation by either—
explicitly converting the value
fos_user.username
to a Unicode character set:forcing the string literal to have a lower coercibility value than the column (will cause an implicit conversion of the column's value to UTF-8):
Or one could, as you say, permanently convert the column(s) to a Unicode encoding and set its collation appropriately.
The principle consideration is that Unicode encodings take up more space than single-byte character sets, so:
more storage may be required;
comparisons may be slower; and
index prefix lengths may need to be adjusted (note that the maximum is in bytes, so may represent fewer characters than previously).
Also, be aware that, as documented under
ALTER TABLE
Syntax: