I get the following error:
multiprocessing.pool.MaybeEncodingError: Error sending result: '<multiprocessing.pool.ExceptionWithTraceback object at 0x7f758760d6a0>'. Reason: 'TypeError("cannot serialize '_io.BufferedReader' object",)'
When running this code:
from operator import itemgetter
from multiprocessing import Pool
import wget
def f(args):
print(args[1])
wget.download(args[1], "tests/" + target + '/' + str(args[0]), bar=None)
if __name__ == "__main__":
a = Pool(2)
a.map(f, list(enumerate(urls))) #urls is a list of urls.
What does the error mean and how can I fix it?
First couple of advices:
- You should always check how well is project maintained. Apparently
wget
package is not.
- You should check which libs is package using, in case something like this happens.
Now, to the issue.
Apparently wget
uses urllib.request
for making request. After some testing, I concluded that it doesn't handle all HTTP status codes. More specifically, it somehow breaks when HTTP status is, for example, 304
. This is why you have to use libraries with higher level interface. Even the urllib.request
says this in official documentation:
The Requests package is recommended for a higher-level HTTP client interface.
So, without further ado, here is the working snippet.
You can just update with where you want to save files.
from multiprocessing import Pool
import shutil
import requests
def f(args):
print(args)
req = requests.get(args[1], stream=True)
with open(str(args[0]), 'wb') as f:
shutil.copyfileobj(req.raw, f)
if __name__ == "__main__":
a = Pool(2)
a.map(f, enumerate(urls)) # urls is a list of urls.
shutil
lib is used for file manipulation. In this case, to stream the data to a file object.