Google App Engine TextProperty and UTF-8: When to

Posted 2019-04-01 00:05

Question:

I am on Google App Engine (Python 2.5) with Django templates and the webapp framework.

db.TextProperty, UTF-8, Unicode, and decoding/encoding have confused me a great deal. I would really appreciate it if some experts could offer suggestions. I have googled all night and still have many questions.

What I am trying to do:

[utf-8 form input] => [Python, Store in db.TextProperty] => [When Needed, Replace Japanese with English] => [HTML, UTF-8]

Following the answer to "Zipping together unicode strings in Python", I have put this line at the top of every file:

# -*- coding: utf-8 -*-

and all .py files are saved as UTF-8.

Here is my code:

#Model.py
class MyModel(db.Model):
  content = db.TextProperty()

#Main.py
def post(self):
    content=cgi.escape(self.request.get('content'))
    #what is the type of content? Unicode? Str? or Other?
    obj = MyModel(content=content)
    #obj = MyModel(content=unicode(content))
    #obj = MyModel(content=unicode(content,'utf-8'))
    #which one is the best?
    obj.put()

#Replace one Japanese word with English word in the content
content=obj.content
#what is the type of content here? db.Text? Unicode? Str? or Other?
#content=unicode(obj.content, 'utf-8') #Is this necessary?
content=content.replace(u'ひと',u'hito')

#Output to HTML
self.response.out.write(template.render(path, {'content':content}))
#self.response.out.write(template.render(path, {'content':content.encode('utf-8')}))

Hope some Google App Engine engineer can see this question and offer some help. Thanks a lot!

Answer 1:

First, read this. And this.

In a nutshell, whenever you're dealing with a text string in your app, it should be a unicode string. You should encode into a byte string (an instance of 'str' instead of 'unicode') when you want to send data as bytes - for instance, over HTTP, and you should decode from a byte string when you receive bytes that represent text (and you know their encoding). The only operations you should ever be doing on a byte string that contains encoded text are to decode or encode them.
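The decode-on-input, encode-on-output rule can be sketched as follows. This is a minimal illustration, not webapp's actual code: `receive` and `send` are hypothetical helper names, and the UTF-8 default is an assumption. It is written with `u''` and `b''` literals so it also runs on Python 3, where the text type is called `str` and the byte string type is `bytes`.

```python
# -*- coding: utf-8 -*-

def receive(raw_bytes, encoding='utf-8'):
    """Decode bytes arriving from outside into a text (unicode) string."""
    return raw_bytes.decode(encoding)


def send(text, encoding='utf-8'):
    """Encode a text string into bytes just before it leaves the app."""
    return text.encode(encoding)


# Stand-in for the bytes a form post would deliver.
incoming = u'ひとり'.encode('utf-8')

text = receive(incoming)                 # text string from here on
text = text.replace(u'ひと', u'hito')    # all text work happens on text strings
outgoing = send(text)                    # bytes again, only at the boundary
```

Everything between `receive` and `send` handles text strings only, so the `replace` call never sees encoded bytes.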

Fortunately, most frameworks get this right; webapp and webapp2, for instance (I can see you're using webapp) should return unicode strings from all the request methods, and encode any strings you pass to them appropriately. Make sure all the strings you're responsible for are unicode, and you should be fine.

Note that a byte string can store any sort of data - encoded text, an executable, an image, random bytes, encrypted data, and so forth. Without metadata, such as the knowledge that it's text and what encoding it's in, you cannot sensibly do anything with it other than store and retrieve it.
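To see why the encoding has to be known, the sketch below decodes the same six bytes two ways. The byte values are the UTF-8 encoding of ひと. Latin-1 is picked only as an example of a wrong guess, because it accepts any byte sequence without raising an error.

```python
# Six bytes; nothing in them says which encoding produced them.
data = b'\xe3\x81\xb2\xe3\x81\xa8'

as_utf8 = data.decode('utf-8')       # the two characters that were intended
as_latin1 = data.decode('latin-1')   # also "succeeds", but yields six junk characters

print(len(as_utf8), len(as_latin1))
```

Both calls return without error, which is why a wrong guess tends to show up later as garbled output rather than as an exception.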

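Don't ever try to decode a unicode string, or encode a byte string; it will not do what you expect, and things will go horribly wrong. In Python 2, both calls first run an implicit ASCII conversion in the opposite direction, so they raise UnicodeEncodeError or UnicodeDecodeError as soon as the data contains a non-ASCII character.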

Regarding the datastore, db.Text is a subclass of unicode; to all intents and purposes it is a unicode string - it's only different so the datastore can tell it shouldn't be indexed. Likewise, db.Blob is a subclass of str, for storing byte strings.
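The subclass relationship can be illustrated without the App Engine SDK. In this sketch, `Text` is a hypothetical stand-in for db.Text, not the real class; `type(u'')` resolves to `unicode` on Python 2 and `str` on Python 3, so the snippet runs on either.

```python
# -*- coding: utf-8 -*-

TextType = type(u'')   # unicode on Python 2, str on Python 3


class Text(TextType):
    """Hypothetical stand-in for db.Text: a marker subclass of the text type."""


stored = Text(u'ひとり')

# The subclass instance passes anywhere a text string is expected.
is_text = isinstance(stored, TextType)

# String methods return a plain text string, not the subclass.
replaced = stored.replace(u'ひと', u'hito')
```

Under that assumption, a value read back from a TextProperty can be passed straight to `replace` with no extra `unicode(...)` call, and the result is an ordinary unicode string.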



Answer 2:

Try

db.Text("text", encoding="utf-8")

This let me save UTF-8 text into the TextProperty(). The encoding argument tells db.Text how to decode the value when it is a byte string (str).

For details, see: https://developers.google.com/appengine/docs/python/datastore/typesandpropertyclasses?hl=en#Text