Unable to retrieve data after using dill or pickle

Posted 2020-05-07 05:28

Question:

I dumped a Jupyter Notebook session using dill.dump_session(filename), and at one point it told me that the disk was full. I made some space on the disk and tried again, but now I am unable to load the session back using dill.load_session(filename).

I get the following error:

~/.local/lib/python3.6/site-packages/dill/_dill.py in load_session(filename, main)
    408         unpickler._main = main
    409         unpickler._session = True
--> 410         module = unpickler.load()
    411         unpickler._session = False
    412         main.__dict__.update(module.__dict__)

EOFError: Ran out of input

The file (i.e. filename) is about 30 GB in size.

How can I retrieve my data from the file?

BTW, I’m running all this on Google Cloud, and it’s costing me a fortune to keep the instance up and running.

I have tried using undill and other unpickling methods.

For example, I tried this:

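    import pickle

    def load_scores(file):  # load_scores: hypothetical name for the wrapper
        open(file, 'a').close()  # create the file if it does not exist yet
        try:
            with open(file, "rb") as Score_file:
                unpickler = pickle.Unpickler(Score_file)
                scores = unpickler.load()
                return scores
        except EOFError:
            return None  # empty (or truncated) file: nothing to load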

But I got this error:

      6         with open(file, "rb") as Score_file:
      7             unpickler = pickle.Unpickler(Score_file);
----> 8             scores = unpickler.load();
      9
     10             return scores

ModuleNotFoundError: No module named '__builtin__'

Answer 1:

I know this probably isn't the answer you want to hear, but it sounds like you may have a corrupt pickle file. "EOFError: Ran out of input" means the unpickler reached the end of the file while it still expected more data, which fits a dump that was cut short when the disk filled up. If that's the case, you can get the data back only by editing the file by hand, which requires understanding what the pickled strings are and how they are structured. There are some very rare cases in which an object will dump but not load; however, it's much more likely that you have a corrupt file. Either way, the resolution is the same: a hand edit is the only way to potentially save what you have pickled.
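
Before attempting a hand edit, one way to confirm that the file really is truncated (a sketch, not something this answer itself provides) is to walk the pickle opcodes with the standard library's pickletools.genops, which decodes the stream without building any objects. The function name check_pickle_stream is hypothetical. It reports whether a STOP opcode was reached and the byte offset of the last opcode that decoded cleanly, which is roughly where a hand edit would have to pick up. This assumes the session was written as a single pickle; on a 30 GB file the scan reads the whole stream and may take a long time.

```python
import io
import pickle
import pickletools


def check_pickle_stream(f):
    """Scan the pickle opcodes in the binary file-like object f.

    Nothing is unpickled; only the opcode stream is decoded.
    Returns (complete, last_pos): complete is True if a STOP opcode was
    reached, and last_pos is the byte offset of the last opcode whose
    argument was decoded successfully (None if not even one was).
    """
    last_pos = None
    try:
        for opcode, _arg, pos in pickletools.genops(f):
            last_pos = pos
            if opcode.name == "STOP":
                return True, last_pos
    except ValueError:
        # genops raises ValueError when the data ends before STOP or an
        # opcode/argument cannot be decoded, e.g. after a write cut short.
        pass
    return False, last_pos


if __name__ == "__main__":
    # Demo on an in-memory pickle and a copy truncated halfway through.
    data = pickle.dumps({"scores": list(range(1000))})
    print(check_pickle_stream(io.BytesIO(data)))
    print(check_pickle_stream(io.BytesIO(data[: len(data) // 2])))
```

On the real file you would open it with open(filename, "rb") and pass the file object in; a result of (False, offset) would support the truncation diagnosis.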

Also, note that if you use dump_session, you really should use load_session, since it performs a sequence of steps on top of a standard load that reverse what dump_session does. That is not really relevant to this issue, though; the problem is most likely an incomplete or corrupt pickle file.
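
The ModuleNotFoundError from the plain-pickle attempt in the question fits this advice. A minimal stdlib-only sketch of the mechanism: the stock pickle unpickler translates the Python 2 module name __builtin__ to builtins only for protocol 0 to 2 streams (its fix_imports behavior); in a protocol 3 or later stream, a global reference to __builtin__ simply fails to import. The hand-built byte strings and the helper try_load are illustrative assumptions, not bytes from the asker's file, and the premise that the dill session file contains such a reference (which dill's own loader knows how to resolve) is likewise an assumption here.

```python
import pickle

# Hand-built pickles that each contain one global reference, __builtin__.len,
# followed by STOP. Only the protocol byte after the PROTO opcode differs.
PROTO2_REF = b"\x80\x02c__builtin__\nlen\n."
PROTO3_REF = b"\x80\x03c__builtin__\nlen\n."


def try_load(data):
    """Unpickle data with the stock unpickler; return the object or the error."""
    try:
        return pickle.loads(data)
    except ModuleNotFoundError as exc:
        return exc


if __name__ == "__main__":
    # Protocol 2: fix_imports maps __builtin__ to builtins, so this is len.
    print(try_load(PROTO2_REF))
    # Protocol 3: no mapping is applied, so importing __builtin__ fails.
    print(try_load(PROTO3_REF))
```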