I have a soup from BeautifulSoup that I cannot pickle. When I try to pickle the object, the Python interpreter crashes silently, so the failure cannot be handled as an exception. I need to pickle the object in order to return it using the multiprocessing package (which pickles objects to pass them between processes). How can I troubleshoot or work around the problem?

Unfortunately, I cannot post the HTML for the page (it is not publicly available), and I have been unable to find a reproducible example of the problem. I have tried to isolate the problem by looping over the soup and pickling individual components; the smallest thing that produces the error is <class 'BeautifulSoup.NavigableString'>. When I print the object, it prints u'\n'.
In fact, as suggested by dekomote, you only have to take advantage of the fact that you can always convert a soup to a unicode string and then convert that string back to a soup.
So IMHO you should not try to pass soup objects through the multiprocessing package, but simply the strings representing the soups.
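To make the string round trip concrete, here is a minimal sketch. It assumes Python 3 and the newer bs4 package rather than the BeautifulSoup 3 module from the question; the function names clean_page and clean_pages and the script-stripping step are made up for illustration. The worker parses inside the child process and returns plain markup, and the parent rebuilds the soups from those strings.

```python
from multiprocessing import Pool

# Assumption: the bs4 package; the question itself uses the older BeautifulSoup 3 module.
from bs4 import BeautifulSoup


def clean_page(html):
    """Worker: parse inside the child process and return the markup as a plain str."""
    soup = BeautifulSoup(html, "html.parser")
    # Hypothetical processing step: drop every <script> tag.
    for tag in soup.find_all("script"):
        tag.decompose()
    # Return a string instead of the soup; strings pickle without trouble.
    return str(soup)


def clean_pages(pages, processes=2):
    """Parent: send strings out, get strings back, rebuild the soups locally."""
    with Pool(processes) as pool:
        cleaned = pool.map(clean_page, pages)
    return [BeautifulSoup(markup, "html.parser") for markup in cleaned]


if __name__ == "__main__":
    pages = [
        "<html><head><title>A</title><script>x()</script></head>"
        "<body><p>one</p></body></html>"
    ]
    for soup in clean_pages(pages):
        print(soup.title.get_text())
```

Only str objects cross the process boundary here, so NavigableString never reaches pickle. The cost is that each page is parsed twice, once in the worker and once in the parent.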
If you do not need the BeautifulSoup object itself, but only some product of the soup, e.g. a text string, you can remove the BeautifulSoup attributes from your larger object before pickling by adding the following code to your class definition:
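A minimal sketch of what such code could look like, which is not necessarily the snippet the answer originally had in mind. The class name PageResult and the attribute names soup and title are hypothetical. It assumes the extracted value has already been converted to a plain string. The idea is to define __getstate__ so that the soup attribute is left out of the pickled state.

```python
import pickle
import threading


class PageResult:
    """Hypothetical container holding a soup plus a plain-text product of it."""

    def __init__(self, soup, title):
        self.soup = soup    # the BeautifulSoup object, not safe to pickle
        self.title = title  # plain str extracted from the soup

    def __getstate__(self):
        # Copy the instance dict and leave the soup out of the pickled state.
        state = self.__dict__.copy()
        state.pop("soup", None)
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # The soup does not survive the round trip; mark it as absent.
        self.soup = None


if __name__ == "__main__":
    # A lock stands in for the soup here, since it cannot be pickled either.
    result = PageResult(soup=threading.Lock(), title="Example")
    restored = pickle.loads(pickle.dumps(result))
    print(restored.title, restored.soup)
```

The original object keeps its soup; only the copy that arrives in the other process has soup set to None, so any code on the receiving side must work from the extracted strings.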
The class NavigableString is not serializable with pickle or cPickle, which multiprocessing uses. You should be able to serialize this class with dill, however. dill has a superset of the pickle interface, and can serialize most of Python. multiprocessing will still fail, unless you use a fork of multiprocessing which uses dill, called pathos.multiprocessing. Get the code here: https://github.com/uqfoundation.
For more information see: What can multiprocessing and dill do together?
http://matthewrocklin.com/blog/work/2013/12/05/Parallelism-and-Serialization/
http://nbviewer.ipython.org/gist/minrk/5241793