I run Python's built-in HTTP server (the Python 3 successor of SimpleHTTPServer) on Python 3.6.4 64-bit with this command:
python -m http.server --cgi
Then I create a form in test.py and submit it to test_form_action.py, which should print the input text.
cgi-bin/test.py
# coding=utf-8
from __future__ import unicode_literals, absolute_import
print("Content-Type: text/html") # HTML is following
print()
reshtml = '''<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html" charset="utf-8"/>
</head>
<body>
<div style="text-align: center;">
<form action="/cgi-bin/test_form_action.py" method="POST"
target="_blank">
输入:<input type="text" id= "id" name="name"/></td>
<button type="submit">Submit</button>
</form>
</div>
</body>
</html>'''
print(reshtml)
cgi-bin/test_form_action.py
# coding=utf-8
from __future__ import unicode_literals, absolute_import
# Import modules for CGI handling
import cgi, cgitb
cgitb.enable()
if __name__ == '__main__':
    print("Content-Type: text/html") # HTML is following
    print()
    form = cgi.FieldStorage()
    print(form)
    id = form.getvalue("id")
    name = form.getvalue("name")
    print(id)
When I visit http://127.0.0.1:8000/cgi-bin/test.py, the Chinese characters "输入" are not displayed correctly; they look like "����". I have to manually change the text encoding of the page from "Unicode" to "Chinese Simplified" in Firefox to make the Chinese characters display normally.
That's weird, since I put charset="utf-8" in cgi-bin/test.py.
Furthermore, when I type some Chinese into the input field and submit the form, the page returned by cgi-bin/test_form_action.py is blank.
Meanwhile, an error shows up in the Windows terminal where I run the server:
127.0.0.1 - - [23/Mar/2018 23:43:32] b'Error in sys.excepthook:\r\nTraceback (most recent call last):\r\n File "E:\Python\Python36\Lib\cgitb.py", line 268, in __call__\r\n self.handle((etype, evalue, etb))\r\n File "E:\Python\Python36\Lib\cgitb.py", line 288, in handle\r\n self.file.write(doc + \'\n\')\r\nUnicodeEncodeError: \'gbk\' codec can\'t encode character \'\ufffd\' in position 1894: illegal multibyte sequence\r\n\r\nOriginal exception was:\r\nTraceback (most recent call last):\r\n File "G:\Python\Project\VideoHelper\cgi-bin\test_form_action.py", line 13, in <module>\r\n print(form)\r\nUnicodeEncodeError: \'gbk\' codec can\'t encode character \'\ufffd\' in position 52: illegal multibyte sequence\r\n'
127.0.0.1 - - [23/Mar/2018 23:43:32] CGI script exit status 0x1
When you use the print() function, Python converts the strings to bytes, i.e. it encodes them using a default codec. The choice of this default depends on the environment; in your case it seems to be GBK (judging from the error message).
In the HTML page your CGI script returns, you specify the codec ("charset") as UTF-8. You can of course change this to GBK, but that will only solve your first problem (display of test.py), not the second one (encoding error in test_form_action.py). Instead, it's probably better to get Python to send UTF-8-encoded data on STDOUT.
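To illustrate the mismatch, here is a minimal sketch that runs outside the CGI setup. The variable names and the helper can_encode are made up for this example; it only assumes that the default codec on your machine behaves like Python's "gbk" codec, as the error message suggests.

```python
# -*- coding: utf-8 -*-
# Illustration only: the names below are hypothetical, not from the question.

text = "输入"

gbk_bytes = text.encode("gbk")     # what a GBK-encoded stdout would send
utf8_bytes = text.encode("utf-8")  # what charset="utf-8" tells the browser to expect

# A browser that trusts charset="utf-8" decodes the GBK bytes as UTF-8.
# Byte sequences that are invalid in UTF-8 become U+FFFD replacement characters.
garbled = gbk_bytes.decode("utf-8", errors="replace")


def can_encode(s, codec):
    """Return True if the string s can be encoded with the given codec."""
    try:
        s.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# U+FFFD has no GBK representation, so printing it to a GBK stdout fails,
# while UTF-8 can represent every Unicode character.
gbk_ok = can_encode("\ufffd", "gbk")
utf8_ok = can_encode("\ufffd", "utf-8")
```

You can check which codec print() actually uses in your environment by looking at sys.stdout.encoding.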
One approach is to replace all occurrences of print(...) with code that encodes the text as UTF-8 and writes the resulting bytes to sys.stdout.buffer.
Alternatively, you can replace sys.stdout with a re-encoded wrapper, without changing the print() occurrences.
Note: These two solutions don't work in Python 2.x (you'd have to omit the .buffer part there). I'm writing this because your code has from __future__ import statements, which have no use in code that is run with Python 3 exclusively.
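The two approaches could look roughly like the sketch below. The function names write_utf8 and make_utf8_writer are placeholders chosen for this example, and both take the binary stream as a parameter so they can be tried without touching the real STDOUT; in a CGI script you would pass sys.stdout.buffer.

```python
# -*- coding: utf-8 -*-
# Sketch for Python 3 only; helper names are hypothetical.
import io


def write_utf8(binary_stream, text):
    """Approach 1: skip print() and write UTF-8 bytes to the binary layer."""
    binary_stream.write(text.encode("utf-8"))
    binary_stream.flush()


def make_utf8_writer(binary_stream):
    """Approach 2: wrap the binary layer in a text stream that encodes as UTF-8.

    The returned object can be assigned to sys.stdout, so existing
    print() calls keep working unchanged.
    """
    return io.TextIOWrapper(binary_stream, encoding="utf-8", line_buffering=True)
```

In a CGI script, the first approach would be used as write_utf8(sys.stdout.buffer, reshtml) in place of print(reshtml), and the second as sys.stdout = make_utf8_writer(sys.stdout.buffer) near the top of the script, before the first print() call.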