I noticed that this may be caused by BeautifulSoup or by a recursive data structure. However, the data structure that triggers the error looks fine to me:
class Movie:
    def __init__(self, name="", dscore=0, mscore=0, durl="", murl=""):
        self.name = name
        self.dscore = float(dscore)
        self.mscore = float(mscore)
        self.durl = durl
        self.murl = murl

    def __str__(self):
        return unicode(self.name) + u' / ' + unicode(self.dscore) + u' / ' + unicode(self.mscore) \
            + u' / ' + unicode(self.durl) + u' / ' + unicode(self.murl)
The statements causing the problem are:
DataDict['MovieInfo'] = MovieInfo
and
pickle.dump(DataDict, f, True)
Here is the function:
def SaveData():
    global LinkUrlQueue
    global MovieSet
    global MovieInfo
    global LinkUrlSet
    global MovieUrlQueue
    DataDict = {}
    DataDict['LinkUrlSet'] = LinkUrlSet
    DataDict['MovieSet'] = MovieSet
    #DataDict['MovieInfo'] = MovieInfo
    DataDict['LinkUrlQueue'] = LinkUrlQueue
    DataDict['MovieUrlQueue'] = MovieUrlQueue
    # Dump the movie list as plain UTF-8 text, one movie per line.
    f = open('MovieInfo.txt', 'wb')
    for item in MovieInfo:
        f.write(item.__str__().encode('utf8') + '\n'.encode('utf8'))
    f.close()
    try:
        print 'saving data...'
        f = open('spider.dat', 'wb')
        pickle.dump(DataDict, f, True)
        f.close()
    except IOError as e:
        # IOError exposes the error code as e.errno (there is no e.no attribute).
        print 'IOError, error no: %d' % e.errno
        print 'saved to spider2.dat'
        pickle.dump(DataDict, open('spider2.dat', 'wb'))
        time.sleep(10)
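One way to narrow down which entry breaks the dump, instead of commenting out keys by hand, is to pickle each value on its own. This is only a rough sketch written for Python 3 (the code above is Python 2); `find_unpicklable` and the sample data are made up for illustration and are not part of spider.py.

```python
import pickle


def find_unpicklable(data):
    """Return {key: error description} for every value that pickle rejects."""
    failures = {}
    for key, value in data.items():
        try:
            pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL)
        except Exception as exc:  # e.g. PicklingError, TypeError, RecursionError
            failures[key] = "%s: %s" % (type(exc).__name__, exc)
    return failures


# Made-up sample data: a lambda stands in for a value pickle cannot serialize.
sample = {
    "LinkUrlSet": {"http://example.com/a"},
    "MovieInfo": [lambda: None],
}
report = find_unpicklable(sample)
for key, message in report.items():
    print(key, "->", message)
```

Calling it on the real dictionary before `pickle.dump` would list the keys whose values need to be cleaned up first.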
My complete source code:
spider.py: http://paste.ubuntu.com/7149731/
fetch.py: http://paste.ubuntu.com/7149732/
You can just download and run them.
Coding style suggestions are also welcome.
Well... I finally solved the problem myself.
The cause is that pickle cannot handle BeautifulSoup objects. As far as I can tell, they hold references to the rest of the parse tree, so pickle ends up walking the whole document. More generally, objects that come out of an HTML parser seem to be a poor fit for pickle.
I realized that when passing arguments into my functions, I should convert them with str() or unicode() before assigning them, instead of leaving them as BeautifulSoup objects.
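A minimal sketch of that fix, written for Python 3 (in Python 2 the conversion would be `unicode(...)`). `TreeString` is a hypothetical stand-in for a parser string type that keeps a link to its tree, so the example runs without BeautifulSoup installed. The sample values are made up, and the sketch pickles the attribute dictionary rather than the instance so that it does not depend on where `Movie` is defined.

```python
import pickle


class TreeString(str):
    """Hypothetical stand-in for a parser string that links back to its tree."""

    def __new__(cls, value, parent=None):
        obj = super().__new__(cls, value)
        obj.parent = parent  # back-reference that would drag the tree into pickle
        return obj


class Movie:
    def __init__(self, name="", dscore=0, mscore=0, durl="", murl=""):
        # str(...) returns a plain built-in str, dropping the parser-specific
        # subclass together with the references attached to it.
        self.name = str(name)
        self.dscore = float(dscore)
        self.mscore = float(mscore)
        self.durl = str(durl)
        self.murl = str(murl)


def roundtrip(obj):
    """Pickle and unpickle obj in memory."""
    return pickle.loads(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


scraped_name = TreeString("Some Movie", parent=object())  # made-up value
movie = Movie(name=scraped_name, dscore="8.1", mscore="7.9")
restored = roundtrip(vars(movie))  # only plain str and float values remain
print(type(movie.name).__name__, restored["name"])
```

Doing the conversion once in `__init__` means every caller gets picklable fields, whatever type the scraper hands in.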
Thanks, everyone!