For the past few days I've been trying to learn Python on App Engine, and I keep running into problems with ASCII and UTF-8 encoding. The latest issue is as follows:
I have the following code for a simple chat room, taken from the book 'Code in the Cloud':
from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app
import datetime
# START: MainPage
class ChatMessage(object):
    def __init__(self, user, msg):
        self.user = user
        self.message = msg
        self.time = datetime.datetime.now()
    def __str__(self):
        return "%s (%s): %s" % (self.user, self.time, self.message)
Messages = []
class ChatRoomPage(webapp.RequestHandler):
    def get(self):
        self.response.headers["Content-Type"] = "text/html"
        self.response.out.write("""
            <html>
            <head>
            <title>MarkCC's AppEngine Chat Room</title>
            </head>
            <body>
            <h1>Welcome to MarkCC's AppEngine Chat Room</h1>
            <p>(Current time is %s)</p>
            """ % (datetime.datetime.now()))
        # Output the set of chat messages
        global Messages
        for msg in Messages:
            self.response.out.write("<p>%s</p>" % msg)
        self.response.out.write("""
            <form action="" method="post">
            <div><b>Name:</b>
            <textarea name="name" rows="1" cols="20"></textarea></div>
            <p><b>Message</b></p>
            <div><textarea name="message" rows="5" cols="60"></textarea></div>
            <div><input type="submit" value="Send ChatMessage"></input></div>
            </form>
            </body>
            </html>
            """)
# END: MainPage
# START: PostHandler
    def post(self):
        chatter = self.request.get("name")
        msg = self.request.get("message")
        global Messages
        Messages.append(ChatMessage(chatter, msg))
        # Now that we've added the message to the chat, we'll redirect
        # to the root page, which will make the user's browser refresh to
        # show the chat including their new message.
        self.redirect('/')
# END: PostHandler
# START: Frame
chatapp = webapp.WSGIApplication([('/', ChatRoomPage)])
def main():
    run_wsgi_app(chatapp)
if __name__ == "__main__":
    main()
# END: Frame
It works fine in English. However, as soon as I add any non-ASCII characters, all sorts of problems start.
First of all, so that the page can actually display the characters in HTML, I added a meta tag with charset=UTF-8.
Curiously, if you enter non-ASCII letters in the chat, the program processes them nicely and displays them with no issues. However, it fails to load if I put any non-ASCII letters into the page layout itself within the script. I figured out that adding a UTF-8 encoding declaration would help, so I added # -*- coding: utf-8 -*-. This was not enough: I had also forgotten to save the file in UTF-8 format. Once I did that, the program started running.
That would be a happy end to the story, alas...
It doesn't work.
Long story short, this code:
# -*- coding: utf-8 -*-
from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app
import datetime
# START: MainPage
class ChatMessage(object):
    def __init__(self, user, msg):
        self.user = user
        self.message = msg
        self.time = datetime.datetime.now()
    def __str__(self):
        return "%s (%s): %s" % (self.user, self.time, self.message)
Messages = []
class ChatRoomPage(webapp.RequestHandler):
    def get(self):
        self.response.headers["Content-Type"] = "text/html"
        self.response.out.write("""
            <html>
            <head>
            <title>Witaj w pokoju czatu MarkCC w App Engine</title>
            <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
            </head>
            <body>
            <h1>Witaj w pokoju czatu MarkCC w App Engine</h1>
            <p>(Dokladny czas Twojego logowania to: %s)</p>
            """ % (datetime.datetime.now()))
        # Output the set of chat messages
        global Messages
        for msg in Messages:
            self.response.out.write("<p>%s</p>" % msg)
        self.response.out.write("""
            <form action="" method="post">
            <div><b>Twój Nick:</b>
            <textarea name="name" rows="1" cols="20"></textarea></div>
            <p><b>Twoja Wiadomość</b></p>
            <div><textarea name="message" rows="5" cols="60"></textarea></div>
            <div><input type="submit" value="Send ChatMessage"></input></div>
            </form>
            </body>
            </html>
            """)
# END: MainPage
# START: PostHandler
    def post(self):
        chatter = self.request.get(u"name")
        msg = self.request.get(u"message")
        global Messages
        Messages.append(ChatMessage(chatter, msg))
        # Now that we've added the message to the chat, we'll redirect
        # to the root page, which will make the user's browser refresh to
        # show the chat including their new message.
        self.redirect('/')
# END: PostHandler
# START: Frame
chatapp = webapp.WSGIApplication([('/', ChatRoomPage)])
def main():
    run_wsgi_app(chatapp)
if __name__ == "__main__":
    main()
# END: Frame
fails to process anything I write in the chat application when it's running. The page loads, but the moment I submit a message (even one using only ASCII characters) I receive this error message:
  File "D:\Python25\lib\StringIO.py", line 270, in getvalue
    self.buf += ''.join(self.buflist)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 64: ordinal not in range(128)
In other words, if I want to be able to use any characters within the application, I cannot put non-English ones in my interface. Or, the other way round, I can use non-English characters within the app only if I don't save the file as UTF-8. How can I make it all work together?
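For reference, the error in the traceback can be reproduced without App Engine. This is only a sketch under an assumption: that the response buffer ends up joining a UTF-8 byte string from the source file (such as the "Twój Nick" label) with Unicode text built from the form input, which on Python 2 triggers an implicit ASCII decode. The decode is written out explicitly here so the snippet behaves the same on Python 3; `page_label`, `chat_line`, and `decode_as_ascii` are made-up names for illustration.

```python
# -*- coding: utf-8 -*-

# Hypothetical stand-ins for the two kinds of string the handler writes.
page_label = u"Twój Nick".encode("utf-8")  # byte string, as stored in a UTF-8 source file
chat_line = u"<p>anna: hello</p>"           # Unicode text, like the formatted chat message


def decode_as_ascii(raw):
    """Return the error text from decoding raw as ASCII, or None if it succeeds."""
    try:
        raw.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


# The UTF-8 encoding of the letter o-acute starts with byte 0xc3,
# which is outside the ASCII range, so the ASCII decode fails.
error = decode_as_ascii(page_label)
print(error)

# Decoding with the codec the bytes were actually written in works,
# and the result can be combined with other Unicode text.
combined = page_label.decode("utf-8") + chat_line
```

The reported message names byte 0xc3, the same byte as in the traceback, which suggests the failure comes from the non-ASCII literals in the page layout rather than from the text typed into the form.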
Your strings contain Unicode characters, but they're not Unicode strings; they're byte strings. You need to prefix each one with u (as in u"foo") in order to make them into Unicode strings. If you ensure all your strings are Unicode strings, you should eliminate that error.
You should also specify the encoding in the Content-Type header rather than in a meta tag, for example by sending "text/html; charset=utf-8" as the header value.
Note that your life would be a lot easier if you used a templating system instead of writing HTML inline with your Python code.
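A sketch of the two suggestions above, kept free of App Engine imports so that it runs standalone. The `headers` dictionary is a stand-in for `self.response.headers`, and `render_page` is a hypothetical helper, not part of the book's code. Whether you encode the final text yourself or let the framework do it depends on the framework, so treat the last step as an assumption.

```python
# -*- coding: utf-8 -*-

# Stand-in for self.response.headers (an assumption for this sketch).
headers = {}
# Declare the encoding in the HTTP header instead of a <meta> tag.
headers["Content-Type"] = "text/html; charset=utf-8"


def render_page(messages):
    """Build the page body from Unicode literals only (note the u prefix)."""
    parts = [u"<h1>Witaj w pokoju czatu</h1>"]
    for msg in messages:
        parts.append(u"<p>%s</p>" % msg)
    parts.append(u"<div><b>Twój Nick:</b></div>")
    parts.append(u"<p><b>Twoja Wiadomość</b></p>")
    # Every piece is Unicode, so joining needs no implicit ASCII decode.
    return u"".join(parts)


body = render_page([u"anna: cześć"])
# Encode once, at the boundary, when bytes are needed for the response.
payload = body.encode("utf-8")
```

Keeping everything as Unicode until the very end means there is only one place where the encoding matters, and it matches the charset declared in the header.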
@Thomas K. Thank you for your guidance here. Thanks to you I was able to come up with a solution, maybe, as you said, a slightly roundabout one, so the credit for the answer should go to you. The following line of code:
Should look like this:
Basically, I have to encode all the UTF-8 strings to ASCII.
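The two code lines referred to above did not survive in this copy of the post, so the following is not the poster's code. It is only a guess at the general idea, assuming the change was to encode the Unicode values coming from the form into byte strings, so that every piece written to the response has the same type. `to_utf8_bytes`, `chatter`, and `label` are hypothetical names.

```python
# -*- coding: utf-8 -*-


def to_utf8_bytes(value):
    """Return value as a UTF-8 byte string; byte strings pass through unchanged."""
    if isinstance(value, bytes):
        return value
    return value.encode("utf-8")


# Hypothetical form input (Unicode) and a label from a UTF-8 source file (bytes).
chatter = u"Józef"
label = u"Twój Nick: ".encode("utf-8")

# Both operands are byte strings now, so no implicit ASCII decode is needed.
line = label + to_utf8_bytes(chatter)
```

Strictly speaking, a step like this produces UTF-8 bytes rather than ASCII, since ASCII cannot represent characters such as ó. It works, but it is the reverse of the approach suggested in the answer, which keeps everything as Unicode instead.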