Design of a python pickleable object that describe

Posted 2019-05-09 21:41

I would like to create a class that describes a file resource and then pickle it. This part is straightforward. To be concrete, say I have a class "A" with methods that operate on a file. I can pickle an instance as long as it does not contain a file handle. However, I want to be able to create a file handle in order to access the resource described by "A". If "A" has an "open()" method that opens the file and stores the handle for later use, then "A" is no longer picklable. (Opening the file involves some non-trivial indexing in third-party code that cannot be cached, so closing and reopening when needed is not without expense.) I could code "A" as a factory that generates file handles to the described file, but that could result in multiple file handles accessing the file contents simultaneously. I could use another class "B" to handle the opening of the file in class "A", including locking, etc. I am probably overthinking this, but any hints would be appreciated.

Tags: python pickle
3 answers
We Are One
#2 · 2019-05-09 22:25

The question isn't entirely clear, but it looks like:

  • you have a third-party module which has picklable classes
  • those classes may contain references to files, which makes the classes themselves not picklable because open files aren't picklable.

Essentially, you want to make open files picklable. You can do this fairly easily, with certain caveats. Here's an incomplete but functional sample:

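import pickle

class PicklableFile(object):
    """Wrap an open file so it pickles as name, mode and position."""

    def __init__(self, fileobj):
        self.fileobj = fileobj

    def __getattr__(self, key):
        # Only called when normal lookup fails: delegate to the wrapped file.
        # The guard avoids infinite recursion while fileobj is not yet set.
        if key == 'fileobj':
            raise AttributeError(key)
        return getattr(self.fileobj, key)

    def __getstate__(self):
        # Replace the unpicklable file object with what is needed to reopen it.
        ret = self.__dict__.copy()
        ret['_file_name'] = self.fileobj.name
        ret['_file_mode'] = self.fileobj.mode
        ret['_file_pos'] = self.fileobj.tell()
        del ret['fileobj']
        return ret

    def __setstate__(self, state):
        # Reopen the file and restore the saved position.
        self.fileobj = open(state['_file_name'], state['_file_mode'])
        self.fileobj.seek(state['_file_pos'])
        del state['_file_name']
        del state['_file_mode']
        del state['_file_pos']
        self.__dict__.update(state)

# /tmp/blah must already exist and contain some text.
f = PicklableFile(open("/tmp/blah"))
print(f.readline())
data = pickle.dumps(f)
f2 = pickle.loads(data)   # reopens the file at the saved position
print(f2.read())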

Caveats and notes, some obvious, some less so:

  • This class should operate directly on the file object you got from open. If you're using wrapper classes on files, like gzip.GzipFile, those should go above this, not below it. Logically, treat this as a decorator class on top of file.
  • If the file doesn't exist when you unpickle, it can't be unpickled and will throw an exception.
  • If it's a different file, the behavior may or may not make sense.
  • If the file mode includes file creation ('w+'), and the file doesn't exist, it'll be created; we don't know what file permissions to use, since that's not stored with the file. If this is important--it probably shouldn't be--then store the correct permissions in the class when you first create it.
  • If the file isn't seekable, trying to seek to the old position may raise IOError; if you're using a file like that you'll need to decide how to handle that.
  • The file classes in Python 2 and Python 3 are different; there's no file class in Python 3. Even if you're only using Python 2 right now, don't subclass file.
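Two of the caveats above (a missing file and a non-seekable file) could be handled along these lines. This is only a sketch for Python 3: `reopen_at` is a hypothetical helper name, and turning a missing file into `pickle.UnpicklingError` is one possible policy rather than anything pickle itself requires.

```python
import pickle


def reopen_at(name, mode, pos):
    """Hypothetical helper: reopen a file and restore its position if possible."""
    try:
        fileobj = open(name, mode)
    except FileNotFoundError as exc:
        # Assumed policy: report a vanished file as an unpickling failure.
        raise pickle.UnpicklingError("file %r no longer exists" % name) from exc
    if fileobj.seekable():
        fileobj.seek(pos)
    # Non-seekable files (pipes, some devices) are left at their current position.
    return fileobj
```

A `__setstate__` method could call such a helper instead of calling `open` and `seek` directly.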

I'd steer away from doing this; having pickled data dependent on external files not changing and staying in the same place is brittle. This makes it difficult to even relocate files, since your pickled data won't make sense.

#3 · 2019-05-09 22:30

If you open a pointer to a file, pickle it, then attempt to reconstitute it later, there is no guarantee that the file will still be available for opening.

To elaborate, the file pointer really represents a connection to the file. Just like a database connection, you can't "pickle" the other end of the connection, so this won't work.

Is it possible to keep the file pointer around in memory in its own process instead?

再贱就再见
#4 · 2019-05-09 22:39

It sounds like you know you can't pickle the handle, and you're ok with that, you just want to pickle the part that can be pickled. As your object stands now, it can't be pickled because it has the handle. Do I have that right? If so, read on.

The pickle module lets your class describe its own state for exactly these cases. Define your own __getstate__ method: the pickler will invoke it to get the state to be pickled. Only if the method is missing does it fall back to the default of pickling all of the instance's attributes.
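As a minimal sketch of that idea, assuming the handle lives in a single attribute: `FileResource`, `path` and `_handle` are hypothetical names standing in for the asker's class "A", and the plain `open` call stands in for the expensive third-party open.

```python
import pickle


class FileResource:
    """Hypothetical stand-in for class "A": describes a file, opens it lazily."""

    def __init__(self, path):
        self.path = path
        self._handle = None  # live handle, never pickled

    def open(self):
        # Open once and reuse the handle, so the costly setup is not repeated.
        if self._handle is None:
            self._handle = open(self.path, "rb")
        return self._handle

    def close(self):
        if self._handle is not None:
            self._handle.close()
            self._handle = None

    def __getstate__(self):
        # Pickle everything except the open handle.
        state = self.__dict__.copy()
        state["_handle"] = None
        return state

    def __setstate__(self, state):
        # The unpickled copy starts closed; open() recreates the handle on demand.
        self.__dict__.update(state)
```

With this shape, `pickle.loads(pickle.dumps(resource))` yields a copy that describes the same path but holds no handle until its own `open()` is called. The original object's handle is left untouched.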
