Python cPickle: load fails with UnpicklingError

Posted 2019-05-31 17:15

Question:

I've made a pickle file using the following.

from PIL import Image
import pickle
import os
import numpy
import time

trainpixels = numpy.empty([80000,6400])
trainlabels = numpy.empty(80000)
validpixels = numpy.empty([10000,6400])
validlabels = numpy.empty(10000)
testpixels = numpy.empty([10408,6400])
testlabels = numpy.empty(10408)

i=0
tr=0
va=0
te=0
for (root, dirs, filenames) in os.walk(indir1):
    print 'hello'
    for f in filenames:
            try:
                    im = Image.open(os.path.join(root,f))
                    Imv=im.load()
                    x,y=im.size
                    pixelv = numpy.empty(6400)
                    ind=0
                    for ii in range(x):
                            for j in range(y):
                                    temp=float(Imv[j,ii])
                                    temp=float(temp/255.0)
                                    pixelv[ind]=temp
                                    ind+=1
                    if i<40000:
                            trainpixels[tr]=pixelv
                            tr+=1
                    elif i<45000:
                            validpixels[va]=pixelv
                            va+=1
                    else:
                            testpixels[te]=pixelv
                            te+=1
                    print str(i)+'\t'+str(f)
                    i+=1
            except IOError:
                    continue
trainimage=(trainpixels,trainlabels)
validimage=(validpixels,validlabels)
testimage=(testpixels,testlabels)

output=open('data.pkl','wb')

pickle.dump(trainimage,output)
pickle.dump(validimage,output)
pickle.dump(testimage,output)

Now I'm unpickling it with the load_data() function from http://www.deeplearning.net/tutorial/code/logistic_sgd.py, which is called when running http://www.deeplearning.net/tutorial/code/rbm.py

but it fails with the following error.

cPickle.UnpicklingError: A load persistent id instruction was encountered,
but no persistent_load function was specified.

It seems the data structure doesn't match what the loader expects, but I can't figure out what it should be.

For reference, the pickle file is over 16 GB, and over 1 GB after gzip compression.

Answer 1:

In my experience, pickling and unpickling have to mirror each other. Here you don't unpickle the same way you pickled, so it cannot work. Your code calls pickle.dump three times on the same file, so the file holds three separate objects, one after the other. To read them back you have to read sequentially: open the file for unpickling, then call pickle.load once per object, in the order they were dumped.

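import gzip
import cPickle

# dataset is the path to the gzip-compressed pickle file
with gzip.open(dataset, 'rb') as f:
    train_set = cPickle.load(f)  # first object dumped
    valid_set = cPickle.load(f)  # second object dumped
    test_set = cPickle.load(f)   # third object dumped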

You might want to try simpler code first, where train_set, valid_set, and test_set are simple picklable objects (doing both the pickling and the unpickling through gzip), just to be sure the round trip works.
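A minimal sketch of such a sanity check is below. The three small nested lists are hypothetical stand-ins for the real (pixels, labels) pairs, and the file name tiny_data.pkl.gz is made up for illustration. It is written to run on Python 3; on Python 2, cPickle should accept the same dump and load calls.

```python
import gzip
import os
import pickle  # on Python 2, cPickle offers the same dump/load calls
import tempfile

# Hypothetical stand-ins for the real (pixels, labels) pairs.
train_set = ([[0.0, 0.5], [1.0, 0.25]], [0, 1])
valid_set = ([[0.1, 0.2]], [1])
test_set = ([[0.9, 0.8]], [0])

# Hypothetical file name, placed in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "tiny_data.pkl.gz")

# Write: three separate dumps into one gzip stream.
with gzip.open(path, "wb") as f:
    pickle.dump(train_set, f, protocol=2)
    pickle.dump(valid_set, f, protocol=2)
    pickle.dump(test_set, f, protocol=2)

# Read: one load per dump, in the same order as the dumps.
with gzip.open(path, "rb") as f:
    loaded_train = pickle.load(f)
    loaded_valid = pickle.load(f)
    loaded_test = pickle.load(f)

# Check that every object survived the round trip unchanged.
round_trip_ok = (loaded_train == train_set
                 and loaded_valid == valid_set
                 and loaded_test == test_set)
print(round_trip_ok)
```

If this small round trip works, the same write and read pattern can be tried on the real arrays. A loader that calls load only once would instead need the writer to dump a single tuple holding all three sets.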