I have an Arabic string, say:
txt = u'Arabic (\u0627\u0644\u0637\u064a\u0631\u0627\u0646)'
I want to write this Arabic text to a MySQL database. I tried using
txt = smart_str(txt)
or
txt = txt.encode('utf-8')
Neither of these worked, as they converted the string to
u'Arabic (\xd8\xa7\xd9\x84\xd8\xb7\xd9\x8a\xd8\xb1\xd8\xa7\xd9\x86)'
Also, my database character set is already set to utf-8:
ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_unicode_ci;
Because of these new code points, my database displays the characters of the encoded text instead of the Arabic. Please help. I want my Arabic text to be retained.
Also, does a quick export of this Arabic text from the MySQL database write the same Arabic text into files, or will it convert it back to unicode?
I used the following code to insert:
cur.execute("INSERT INTO tab1(id, username, text, created_at) VALUES (%s, %s, %s, %s)", (smart_str(id), smart_str(user_name), smart_str(text), date))
Before this, when I didn't use smart_str, it threw an error saying only 'latin-1' is allowed.
Let me clarify a few things, because it will help you along in the future as well.
This is not an Arabic string. This is a unicode object, made up of unicode code points. If you were to simply print it, and your terminal supports Arabic, you would see Arabic (الطيران) rather than the \u escapes.
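As a small illustration of what "a unicode object with code points" means, the sketch below inspects the object from the question without printing any Arabic, so it runs on any terminal. The variable names `arabic_part` and `names` are only for illustration.

```python
import unicodedata

txt = u'Arabic (\u0627\u0644\u0637\u064a\u0631\u0627\u0646)'

# The object holds code points (numbers), not bytes: 9 ASCII characters
# plus 7 Arabic letters.
print(len(txt))

# Slice out the letters between the parentheses and look each one up.
arabic_part = txt[8:-1]
names = [unicodedata.name(ch) for ch in arabic_part]
for ch, name in zip(arabic_part, names):
    print(hex(ord(ch)), name)
```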
Now, to get the same output, Arabic (الطيران), in your database, you need to encode the string. Encoding takes these code points and converts them to bytes so that computers know what to do with them.
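A minimal sketch of that encoding step, using the string from the question. The names `encoded` and `decoded` are only for illustration.

```python
txt = u'Arabic (\u0627\u0644\u0637\u064a\u0631\u0627\u0646)'

# Code points -> bytes. In UTF-8 each ASCII character takes one byte and
# each of these Arabic letters takes two.
encoded = txt.encode('utf-8')
print(len(txt), len(encoded))

# Bytes -> code points. Decoding with the same encoding restores the text.
decoded = encoded.decode('utf-8')
print(decoded == txt)
```

The byte sequence produced here is the same `\xd8\xa7\xd9\x84...` run shown in the question; it is the correct UTF-8 form of the text, not a corrupted version of it.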
The most common encoding is utf-8, because it supports all the characters of English, plus a lot of other languages (including Arabic). There are others too; for example, windows-1256 also supports Arabic. Some encodings have no entries for some of those numbers (called code points), and when you try to encode with one of them you get a UnicodeEncodeError, like the 'latin-1' error you saw. What that is telling you is that some number in the unicode object does not exist in the latin-1 table, so the program doesn't know how to convert it to bytes.
Computers store bytes. So when storing or transmitting information, you always need to encode/decode it correctly.
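To make the difference between encodings concrete, here is a short sketch that tries latin-1 (which has no Arabic letters) and windows-1256 (which does). It uses Python's `cp1256` codec name for windows-1256; the variable names are illustrative.

```python
txt = u'Arabic (\u0627\u0644\u0637\u064a\u0631\u0627\u0646)'

# latin-1 only covers code points 0-255, so the Arabic letters cannot be encoded.
error = None
try:
    txt.encode('latin-1')
except UnicodeEncodeError as exc:
    error = exc
    print(type(exc).__name__, 'at position', exc.start)

# windows-1256 has an entry for every letter here, one byte per character.
cp1256_bytes = txt.encode('cp1256')
print(len(cp1256_bytes))
print(cp1256_bytes.decode('cp1256') == txt)
```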
This encode/decode step is sometimes called the unicode sandwich: everything outside is bytes, everything inside is unicode.
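One way to picture the sandwich is a function that decodes at its input edge, works only on unicode, and encodes at its output edge. The function `tag_text` below is a hypothetical example, not something from the question.

```python
def tag_text(raw_bytes):
    """Decode on the way in, work on unicode, encode on the way out."""
    text = raw_bytes.decode('utf-8')                   # bytes -> unicode (input edge)
    tagged = u'[%d chars] %s' % (len(text), text)      # all processing on unicode
    return tagged.encode('utf-8')                      # unicode -> bytes (output edge)


incoming = u'Arabic (\u0627\u0644\u0637\u064a\u0631\u0627\u0646)'.encode('utf-8')
outgoing = tag_text(incoming)
print(len(incoming), len(outgoing))
```

Because the length is taken after decoding, it counts characters rather than bytes, which is the kind of mistake the sandwich is meant to prevent.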
With that out of the way, you need to encode the data correctly before you send it to your database; to do that, encode each unicode value as utf-8 before passing it to the query.
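A hedged sketch of that step, reusing the INSERT statement from the question. The helpers `to_utf8` and `insert_row` are hypothetical names, and `cur` is assumed to be a DB-API cursor such as the one the question already uses. This follows the Python 2 style of the question; on Python 3 drivers it is more usual to pass text directly and set the connection charset, so check your driver's documentation before copying this.

```python
text_type = type(u'')  # unicode on Python 2, str on Python 3

INSERT_SQL = ("INSERT INTO tab1(id, username, text, created_at) "
              "VALUES (%s, %s, %s, %s)")


def to_utf8(value):
    """Encode text values to UTF-8 bytes; leave other values untouched."""
    if isinstance(value, text_type):
        return value.encode('utf-8')
    return value


def insert_row(cur, row_id, user_name, text, created_at):
    """Encode every text parameter, then run the parameterized INSERT."""
    params = tuple(to_utf8(v) for v in (row_id, user_name, text, created_at))
    cur.execute(INSERT_SQL, params)
    return params
```

Keeping the values as parameters, rather than formatting them into the SQL string, leaves the escaping to the driver.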
To confirm that it is being inserted correctly, make sure you are using mysql from a terminal or application that supports Arabic; otherwise, even if it's inserted correctly, you will see garbage characters when your program displays it.
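The "garbage characters" usually come from reading correct UTF-8 bytes with the wrong encoding. The sketch below reproduces that, on the assumption that the viewer interprets the bytes as latin-1; the variable names are illustrative.

```python
txt = u'Arabic (\u0627\u0644\u0637\u064a\u0631\u0627\u0646)'
stored = txt.encode('utf-8')          # what a correct insert stores

# A viewer that assumes latin-1 turns every byte into its own character.
garbled = stored.decode('latin-1')
print(len(txt), len(garbled))

# The stored bytes were never damaged: reversing the wrong decode and
# decoding as UTF-8 recovers the original text.
recovered = garbled.encode('latin-1').decode('utf-8')
print(recovered == txt)
```

The `garbled` value is the same `u'Arabic (\xd8\xa7...)'` object shown in the question, which suggests the bytes there were correct and only their interpretation was wrong.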
Just execute SET names utf8 before executing your INSERT.
Your question is very similar to this SO post, which you should read.
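A minimal sketch of running SET NAMES first, assuming `cur` is a DB-API cursor on an open MySQL connection. The helper name `execute_with_utf8_names` is hypothetical.

```python
def execute_with_utf8_names(cur, sql, params):
    """Tell the server the connection speaks utf8, then run the statement."""
    cur.execute("SET NAMES utf8")
    cur.execute(sql, params)
```

It would be called with the same statement and parameters as the INSERT in the question. SET NAMES applies to the whole session, so it only needs to run once per connection rather than before every INSERT.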