multiprocessing.pool.MaybeEncodingError: Error sen

Posted 2020-07-27 03:06

Question:

I get the following error:

multiprocessing.pool.MaybeEncodingError: Error sending result: '<multiprocessing.pool.ExceptionWithTraceback object at 0x7f758760d6a0>'. Reason: 'TypeError("cannot serialize '_io.BufferedReader' object",)'

When running this code:

from operator import itemgetter
from multiprocessing import Pool
import wget


def f(args):
    print(args[1])
    wget.download(args[1], "tests/" + target + '/' + str(args[0]), bar=None)

if __name__ == "__main__":
    a = Pool(2)
    a.map(f, list(enumerate(urls))) #urls is a list of urls.

What does the error mean and how can I fix it?

Answer 1:

First, a couple of pieces of advice:

Always check how well a project is maintained. Apparently the wget package is not.
Check which libraries a package uses under the hood, in case something like this happens.

Now, to the issue.

Apparently wget uses urllib.request to make its requests. After some testing, I concluded that it doesn't handle all HTTP status codes. More specifically, it breaks when the HTTP status is, for example, 304. This is why you should use a library with a higher-level interface. Even the official urllib.request documentation says so:

The Requests package is recommended for a higher-level HTTP client interface.
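As for what the error message itself means: Pool pickles whatever a worker returns or raises in order to send it back to the parent process. If the exception raised inside the worker holds an open stream (an HTTP error that keeps a reference to the response body would be one likely case), pickling fails and Pool reports MaybeEncodingError instead of the original exception. A minimal sketch of that failure, where FakeHTTPError and can_pickle are hypothetical names made up for illustration and are not part of wget or urllib:

```python
import io
import pickle


class FakeHTTPError(Exception):
    """Hypothetical stand-in for an HTTP error that keeps the response stream."""

    def __init__(self, message, fp):
        super().__init__(message, fp)
        self.fp = fp


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, the step Pool performs on results."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


plain_error = ValueError("bad url")
stream_error = FakeHTTPError("HTTP 304", io.BufferedReader(io.BytesIO(b"")))

print(can_pickle(plain_error))   # True
print(can_pickle(stream_error))  # False
```

The second case fails for the same reason given in the traceback: an '_io.BufferedReader' object cannot be pickled.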

So, without further ado, here is a working snippet.

Just change the file path to wherever you want to save the files.

from multiprocessing import Pool

import shutil
import requests


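def f(args):
    # args is an (index, url) pair produced by enumerate(urls).
    index, url = args
    print(args)
    resp = requests.get(url, stream=True)
    # Name the output file after the index and stream the body straight to disk.
    # The handle is called "out" so it does not shadow the function name f.
    with open(str(index), 'wb') as out:
        shutil.copyfileobj(resp.raw, out)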

if __name__ == "__main__":
    a = Pool(2)
    a.map(f, enumerate(urls))  # urls is a list of urls.

The shutil library is used for file manipulation. In this case, shutil.copyfileobj streams the response data into a file object.
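If some downloads may still fail, one way to keep Pool from ever having to pickle an exception is to catch it inside the worker and return plain strings instead. The following is only a sketch under that assumption: run_safely and failing_download are hypothetical helpers, not part of multiprocessing or requests, and the URL is a placeholder.

```python
import io
import pickle
import traceback


def run_safely(func, *args):
    """Call func(*args) and report the outcome as a tuple of plain values.

    Returns ("ok", result) on success, or ("error", traceback_text) if func
    raised. The exception object itself never leaves the worker process.
    Assumes func's own return value is picklable.
    """
    try:
        return ("ok", func(*args))
    except Exception:
        return ("error", traceback.format_exc())


def failing_download(url):
    # Hypothetical downloader that fails the way the traceback suggests:
    # its exception carries an open stream, which cannot be pickled.
    raise RuntimeError("download failed: " + url, io.BufferedReader(io.BytesIO(b"")))


status, detail = run_safely(failing_download, "http://example.invalid/file")
print(status)  # error

# The (status, detail) tuple contains only strings, so it pickles without trouble.
payload = pickle.dumps((status, detail))
```

With the snippet above, a top-level function such as def worker(args): return run_safely(f, args) could be passed to a.map in place of f, and the parent process could then check which entries came back with the "error" status.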