pymongo error: bson.errors.InvalidBSON: 'utf8'

Posted 2019-05-06 23:14

Question:

tasks = list(self.collection.find().sort('_id',pymongo.DESCENDING).limit(1000))

I ran into a problem while using pymongo. The call above raises the following error:

  File "D:\Python27\lib\site-packages\pymongo-3.2.1-py2.7-win-amd64.egg\pymongo\cursor.py", line 1097, in next
  File "D:\Python27\lib\site-packages\pymongo-3.2.1-py2.7-win-amd64.egg\pymongo\cursor.py", line 1039, in _refresh
  File "D:\Python27\lib\site-packages\pymongo-3.2.1-py2.7-win-amd64.egg\pymongo\cursor.py", line 903, in __send_message
  File "D:\Python27\lib\site-packages\pymongo-3.2.1-py2.7-win-amd64.egg\pymongo\helpers.py", line 133, in _unpack_response
bson.errors.InvalidBSON: 'utf8' codec can't decode byte 0xa1 in position 25: invalid start byte

tasks = self.collection.find().sort('_id', pymongo.DESCENDING).limit(1000)
for task in tasks:  # iterating the cursor this way raises the same error
    pass

task = self.collection.find_one()  # this raises the same error too

I stepped into the pymongo source to find the cause. The problem seems to come from the following code:

    result = {"cursor_id": struct.unpack("<q", response[4:12])[0],
          "starting_from": struct.unpack("<i", response[12:16])[0],
          "number_returned": struct.unpack("<i", response[16:20])[0],
          "data": bson.decode_all(response[20:], codec_options)}

This is in pymongo's helpers.py, at line 133. Inside bson.decode_all, the failure appears to come from decoding the 'oid', which is the document's _id in MongoDB. When I copied the document into an otherwise identical document with a new _id, I could fetch it successfully.

How can I solve this problem while keeping the "for task in tasks:" style?

pymongo version used: 3.2.1
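
For readers unfamiliar with the message in the traceback, here is a minimal sketch using only the Python standard library (no pymongo involved). The byte string `payload` is a made-up stand-in for the stored data, built so that byte 0xa1 sits at index 25 as in the error above. Note that Python 3 names the codec 'utf-8' rather than 'utf8'.

```python
# Hypothetical payload: 25 ASCII bytes followed by 0xa1, mirroring
# "byte 0xa1 in position 25" from the traceback.
payload = b"a" * 25 + b"\xa1"

try:
    payload.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# 0xa1 can only be a continuation byte in UTF-8, so it cannot start a character.
print(error.reason)  # invalid start byte
print(error.start)   # 25

# The same byte is valid in a single-byte encoding such as Latin-1,
# which suggests the string was written in a non-UTF-8 encoding.
print(repr(payload[25:].decode("latin-1")))
```

If the stored strings were written by a client using a legacy encoding, this is the kind of byte sequence that a strict UTF-8 decoder rejects.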

Answer 1:

I recently had a similar error message, and it is quite hard to find help about it.

Quick fix

I solved my problem by downgrading pymongo to a version below 3.0. The pymongo changelog advertises "A rewritten pure Python BSON implementation" in version 3.0. I found that the new implementation has trouble handling Python UTF-8 and unicode encoding when serializing to BSON.

Analysis

It seems that the error comes from invalid BSON in your DB ... similar to this. Maybe you should post your error there.



Answer 2:

I'm using Python 3.6, pymongo 3.4.0.

According to the documentation, you can clone a collection with the 'with_options' method, which does the trick for me:

    import bson
    col_article = col_article.with_options(codec_options=bson.CodecOptions(unicode_decode_error_handler="ignore"))
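
To illustrate what the handler value changes, here is a small standard-library sketch. It assumes that `unicode_decode_error_handler` accepts the same handler names as Python's own `bytes.decode` ('strict', 'ignore', 'replace'); the byte string `raw` and the helper `decode_with` are made up for the illustration and are not part of pymongo.

```python
# Hypothetical field value containing one byte (0xa1) that is not valid UTF-8.
raw = b"id-\xa1-ok"

def decode_with(handler):
    """Decode `raw` as UTF-8 with the named codec error handler."""
    try:
        return raw.decode("utf-8", errors=handler)
    except UnicodeDecodeError as exc:
        return "error: " + exc.reason

results = {name: decode_with(name) for name in ("strict", "ignore", "replace")}

print(results["strict"])         # error: invalid start byte
print(results["ignore"])         # id--ok  (the bad byte is dropped)
print(repr(results["replace"]))  # 'id-\ufffd-ok'  (U+FFFD marks the bad byte)
```

With "ignore" the bad bytes disappear without a trace, so "replace" may be preferable when you want to see later which values were affected.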


Answer 3:

You need to pass the unicode_decode_error_handler argument to MongoClient and use at least pymongo 3.5.1.

from pymongo import MongoClient

if __name__ == '__main__':
    # Skip bytes that are not valid UTF-8 instead of raising InvalidBSON.
    client = MongoClient(
        host="whatever_your_host_is",
        maxPoolSize=50,
        unicode_decode_error_handler='ignore'
    )

    my_db = client['my_db']
    collection = my_db['my_collection']

    cursor = collection.find({"whatever": "some_stuff"})

    for document in cursor:
        print(document)

It looks like 'ignore' is the default on Python 2.7, but on Python 3.6.1 you have to set it yourself. This ignores the Unicode errors and lets the cursor continue iterating; pymongo does its best to reconstruct the documents, leaving the undecodable bytes out of the affected strings.
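
Since dropping bytes silently can hide which records are damaged, one possible follow-up is to connect with the 'replace' handler instead and then scan the decoded documents for the U+FFFD replacement character. This is only a sketch under that assumption; the helper `damaged_paths` and the sample document `doc` are made up for illustration and do not come from pymongo.

```python
REPLACEMENT = "\ufffd"  # inserted by the 'replace' handler for each bad byte sequence

def damaged_paths(value, path=""):
    """Yield dotted paths of string values that contain U+FFFD."""
    if isinstance(value, str):
        if REPLACEMENT in value:
            yield path
    elif isinstance(value, dict):
        for key, item in value.items():
            child = f"{path}.{key}" if path else str(key)
            yield from damaged_paths(item, child)
    elif isinstance(value, (list, tuple)):
        for index, item in enumerate(value):
            child = f"{path}.{index}" if path else str(index)
            yield from damaged_paths(item, child)

# Hypothetical document as it might look after decoding with 'replace'.
doc = {
    "_id": 1,
    "title": "ok",
    "meta": {"author": "J\ufffdrg", "tags": ["a", "b\ufffd"]},
}

print(list(damaged_paths(doc)))  # ['meta.author', 'meta.tags.1']
```

The reported paths tell you which fields to rewrite with properly encoded text, after which the default strict decoding should work again for those documents.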