How to correctly insert utf-8 characters into a My

I am confused about how to store strings that contain characters outside the UK English character set I am used to.

Here is my example.

I have this name: Bientôt l'été

This is how I created my table:

CREATE TABLE MyTable(
    `my_id` INT(10) unsigned NOT NULL,
    `my_name` TEXT CHARACTER SET utf8 NOT NULL,
    PRIMARY KEY(`my_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

I am trying to insert the string into a MySQL table using this simplified Python script:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import MySQLdb

mystring = "Bientôt l'été"

myinsert = [ { "name" : mystring.encode("utf-8").strip()[:65535], "id" : 1 } ]

con = None
con = MySQLdb.connect('localhost', 'abc', 'def', 'ghi');
cur = con.cursor()
sql = "INSERT INTO `MyTable` ( `my_id`, `my_name` ) VALUES ( %(id)s, %(name)s ) ; "
cur.executemany( sql, myinsert )
con.commit()
if con: con.close()

When I then read the name back from the database, it appears as: BientÃ´t l'Ã©tÃ©

I want it to read: Bientôt l'été

How do I get the python script/MySQL database to do this? I think this is something to do with the character set and how it is set but I can't find a simple web page that explains this without any technical jargon. I've been struggling with this for hours!

I have looked at the character set variables and I see that character_set_server is set to latin1, but I don't know whether this is the problem or how to change it:

mysql> show variables like 'char%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+

Tags: python mysql utf-8

4 Answers

闹够了就滚

#2 · 2020-06-09 08:30

Your problem is with how you display the data when you read it from the database. You are looking at UTF-8 data misinterpreted as Latin 1.

>>> "Bient\xf4t l'\xe9t\xe9"
"Bientôt l'été"
>>> "Bient\xf4t l'\xe9t\xe9".encode('utf8').decode('latin1')
"BientÃ´t l'Ã©tÃ©"

The above encodes a unicode string to UTF-8 and then misinterprets the result as Latin 1 (ISO 8859-1). The ô and é codepoints, which were each encoded to two UTF-8 bytes, are reinterpreted as two Latin 1 codepoints each.
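
The misreading described above can be reproduced and, for text that was only misread once, reversed. This is a minimal sketch assuming Python 3, where str is already Unicode; the variable names are illustrative and do not come from the original post.

```python
# Python 3 sketch: reproduce the mojibake and undo it.
original = "Bient\u00f4t l'\u00e9t\u00e9"  # Bientôt l'été

# Encode to UTF-8 bytes, then (wrongly) read those bytes as Latin-1.
utf8_bytes = original.encode("utf-8")
garbled = utf8_bytes.decode("latin-1")

# Each accented character has become two Latin-1 characters,
# so the garbled string is longer than the original.
print(len(original), len(garbled))

# Reverse the mistake: go back to the raw bytes, then decode as UTF-8.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == original)
```

The repair step only works when the stored bytes are intact UTF-8 that was merely displayed with the wrong character set; it is a diagnostic aid rather than a substitute for configuring the connection correctly.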

Since you are running Python 2, you shouldn't .encode() data that is already encoded. It would be better to insert unicode objects, so you want to decode instead:

myinsert = [ { "name" : mystring.decode("utf-8").strip()[:65535], "id" : 1 } ]

By calling .encode() on already encoded data, you are asking Python to first decode the data (using the default encoding) so that it can then encode it for you. With the stock ASCII default, that implicit decode would raise a UnicodeDecodeError on the non-ASCII bytes. If the default on your Python has been changed to latin1, you would see the same effect as above: UTF-8 data interpreted as Latin 1 before being re-encoded to UTF-8.
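
To make the implicit step concrete, here is a sketch in Python 3 that spells out what Python 2 does behind the scenes when .encode() is called on a byte string. The helper name py2_style_encode is hypothetical and exists only for illustration.

```python
def py2_style_encode(raw, default_encoding):
    """Mimic Python 2's str.encode('utf-8'): implicit decode, then encode."""
    return raw.decode(default_encoding).encode("utf-8")


# The bytes Python 2 would hold for the literal in a UTF-8 source file.
raw = "Bient\u00f4t l'\u00e9t\u00e9".encode("utf-8")

# With a Latin-1 default, every UTF-8 byte is read as one character and
# encoded again, which produces double-encoded data.
double_encoded = py2_style_encode(raw, "latin-1")
print(len(raw), len(double_encoded))

# With the stock ASCII default, the implicit decode fails instead.
try:
    py2_style_encode(raw, "ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True
print(ascii_failed)
```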

You may want to read up on Python and Unicode:

The Python Unicode HOWTO
Pragmatic Unicode by Ned Batchelder
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky


男人必须洒脱

#3 · 2020-06-09 08:37

Set the default client character set:

<?php
$con=mysqli_connect("localhost","my_user","my_password","my_db");
// Check connection
if (mysqli_connect_errno())
  {
  echo "Failed to connect to MySQL: " . mysqli_connect_error();
  }

// Change character set to utf8
mysqli_set_charset($con,"utf8");
mysqli_close($con);
?>
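
For the Python script in the question, a comparable setting can be passed when connecting. This is a sketch, assuming the MySQLdb (mysqlclient) driver, whose connect() accepts charset and use_unicode keyword arguments. The helper names and credentials are placeholders, and nothing here has been run against the asker's server.

```python
def utf8_connection_kwargs(host, user, passwd, db):
    """Build MySQLdb.connect() keyword arguments that request a UTF-8 session."""
    return {
        "host": host,
        "user": user,
        "passwd": passwd,
        "db": db,
        "charset": "utf8",     # character set used for the connection
        "use_unicode": True,   # return text columns as unicode objects
    }


def connect_utf8(host, user, passwd, db):
    """Open a connection. MySQLdb is imported here so that the helper
    above can be inspected without the driver installed."""
    import MySQLdb
    return MySQLdb.connect(**utf8_connection_kwargs(host, user, passwd, db))


# Placeholder credentials copied from the question's script.
kwargs = utf8_connection_kwargs("localhost", "abc", "def", "ghi")
print(kwargs["charset"])
```

With a connection opened this way, unicode objects should be usable as query parameters without calling .encode() first.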


放我归山

#4 · 2020-06-09 08:39

<?php
// Send the page itself as UTF-8:
header("Content-Type: text/html; charset=UTF-8");

// Create the connection first (the mysql_* query calls need an open link):
$CNN=mysql_connect("localhost","usr_urdu","123") or die('Unable to Connect');
$DB=mysql_select_db('db_urdu',$CNN) or die('Unable to select DB');

// Then switch the connection to UTF-8:
mysql_query("SET NAMES 'utf8'", $CNN);
mysql_query('SET CHARACTER SET utf8', $CNN);


三岁会撩人

#5 · 2020-06-09 08:47

Did you try running this query first: set names utf8;

#!/usr/bin/python
# -*- coding: utf-8 -*-

import MySQLdb

mystring = "Bientôt l'été"

myinsert = [{ "name": mystring.encode("utf-8").strip()[:65535], "id": 1 }]

con = MySQLdb.connect('localhost', 'abc', 'def', 'ghi');
cur = con.cursor()

cur.execute("set names utf8;")     # <--- add this line,

sql = "INSERT INTO `MyTable` ( `my_id`, `my_name` ) VALUES ( %(id)s, %(name)s ) ; "
cur.executemany( sql, myinsert )
con.commit()
if con: con.close()

