MongoDB insertion shows 'strings in documents must be valid UTF-8'

Posted 2019-07-24 13:20

Question:

This is my code:

        for code, data in dict_data.items(): 

            try:
                collection2.insert({'_id':code,'data':data})

            except Exception as e:
                print code,'>>>>>>>', str(e)
                sys.exit()

It exited with:

         524715 >>>>>>> strings in documents must be valid UTF-8

I could only find the error by wrapping the insert in try/except. dict_data is a large dictionary containing values calculated from another collection.

How can I fix this?

Thanks.

Answer 1:

If you are using PyMongo with Python 2.x, every str value you store must contain valid UTF-8 bytes; otherwise, use unicode strings. See: http://api.mongodb.org/python/current/tutorial.html#a-note-on-unicode-strings
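To see which values break this rule, instead of discovering them one insert at a time through try/except, you can walk the data before inserting it. The helper below, find_invalid_utf8, is a hypothetical sketch and not part of PyMongo. It assumes the data is built from dicts, lists, tuples and byte strings. It is written to run on both Python 2 (where bytes is an alias of str) and Python 3.

```python
def find_invalid_utf8(obj, path=()):
    """Return the paths of byte strings in obj that are not valid UTF-8.

    Recurses into dicts (checking keys and values), lists and tuples.
    A path is a tuple of the keys/indexes leading to the bad item; a bad
    dict key is reported with the same path as its value.
    """
    bad = []
    if isinstance(obj, bytes):
        try:
            obj.decode('utf-8')
        except UnicodeDecodeError:
            bad.append(path)
    elif isinstance(obj, dict):
        for key, value in obj.items():
            bad.extend(find_invalid_utf8(key, path + (key,)))
            bad.extend(find_invalid_utf8(value, path + (key,)))
    elif isinstance(obj, (list, tuple)):
        for i, item in enumerate(obj):
            bad.extend(find_invalid_utf8(item, path + (i,)))
    return bad


# Example: 'name' is valid UTF-8, 'city' and the second tag are not.
data = {'name': b'caf\xc3\xa9', 'city': b'Z\xfcrich', 'tags': [b'ok', b'\xff']}
print(find_invalid_utf8(data))  # reports ('city',) and ('tags', 1)
```

Running it over each value of dict_data (for example, only for code 524715) tells you exactly which field holds the offending bytes.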

If data is a dict containing multiple strings, you can convert all of them to unicode with the following function:

    import sys

    def convert2unicode(mydict):
        # Decode every byte string (Python 2 str) value in place, recursing
        # into nested dicts; bytes that are not valid UTF-8 become U+FFFD.
        for k, v in mydict.iteritems():
            if isinstance(v, str):
                mydict[k] = unicode(v, 'utf-8', 'replace')
            elif isinstance(v, dict):
                convert2unicode(v)

    for code, data in dict_data.items():
        try:
            convert2unicode(data)
            collection2.insert({'_id': code, 'data': data})
        except Exception as e:
            print code, '>>>>>>>', str(e)
            sys.exit()

The code above converts all str values to unicode but leaves the keys untouched, and it does not descend into lists. Depending on the root cause, you may need to convert the keys as well.
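If the keys, or strings inside lists, also need converting, a non-mutating variant could look like the following. to_unicode is a hypothetical helper, not a PyMongo function. It assumes every byte string in the structure is meant to be text (on Python 3, PyMongo stores bytes as BSON binary, so decoding changes what gets stored). Like the previous sketch, it runs on Python 2 and Python 3.

```python
def to_unicode(obj, encoding='utf-8', errors='replace'):
    """Return a copy of obj with every byte string decoded to text.

    Dict keys and values, list items and tuple items are converted
    recursively; any other object is returned unchanged.
    """
    if isinstance(obj, bytes):
        return obj.decode(encoding, errors)
    if isinstance(obj, dict):
        return {to_unicode(k, encoding, errors): to_unicode(v, encoding, errors)
                for k, v in obj.items()}
    if isinstance(obj, list):
        return [to_unicode(item, encoding, errors) for item in obj]
    if isinstance(obj, tuple):
        return tuple(to_unicode(item, encoding, errors) for item in obj)
    return obj


doc = {b'caf\xe9': [b'Z\xfcrich', 42]}
print(to_unicode(doc))                      # invalid UTF-8 bytes become U+FFFD
print(to_unicode(doc, encoding='latin-1'))  # decoded as Latin-1 instead
```

When you know the real source encoding (for example Latin-1), passing it recovers the original characters instead of replacing them with U+FFFD; Latin-1 decoding never fails because every byte maps to a character. If two different byte keys decode to the same text, one of them is lost in the resulting dict.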