Smart caching of expensive objects in Python

Posted 2019-06-07 23:03

I have a directory of ordered images. Typically my code will use data from a sequential subset of them (e.g. images 5-10), and the naive options for accessing these are:

  1. Create a wrapper object with a method that loads the image when needed and reads my data (e.g. a pixel value). This has little memory overhead but will be slow, since it has to load the image again on every access.

  2. Store all the images in memory. This will be fast but obviously there's a limit to how many images we can store.

I would like to find:

  • A way to define how to read the image corresponding to an index or a path, which then lets me access, say, magic_image_collection[index] without having to worry about whether it returns an object already in memory or reads it afresh. Ideally it would keep the relevant images, or the n most recently accessed ones, in memory.
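The interface described above could be sketched with the standard library's functools.lru_cache, which keeps the results of the maxsize most recently used calls and discards the least recently used one. This is a minimal sketch under assumptions: ImageCollection, paths and load are hypothetical names, and load stands in for whatever image reader you actually use.

```python
import functools


class ImageCollection:
    """Index-based access that keeps the `maxsize` most recently used images in memory.

    `load` is a hypothetical callable mapping a path to an image object;
    plug in your imaging library's reader here.
    """

    def __init__(self, paths, load, maxsize=16):
        self.paths = list(paths)
        # wrap the loader in an LRU cache that belongs to this instance only
        self._cached_load = functools.lru_cache(maxsize=maxsize)(load)

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        # the cache is keyed by path, so the same file is never held twice
        return self._cached_load(self.paths[index])
```

Callers just write images[i]; whether that hits memory or the disk is decided by the cache. Calling images._cached_load.cache_info() shows hits, misses and the current size, which helps when tuning maxsize.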

2 Answers
We Are One
#2 · 2019-06-07 23:37

You can subclass the built-in dict and define the __missing__ method, which dict calls whenever a requested key is absent, to invoke a loading function:

from collections import deque
class ImageDict(dict):
    def __init__(self, maxsize=100):
        super().__init__()
        self.maxsize = maxsize
        # keys in load order, oldest on the left
        self._queue = deque()
    def __missing__(self, key):
        # dict.__getitem__ calls this only when key is absent
        self[key] = img = self.load(key)
        return img
    def load(self, key):
        # evict the oldest entry from both the queue and the dict
        if len(self._queue) >= self.maxsize:
            self.pop(self._queue.popleft())
        # append this key as the newest entry in the queue
        self._queue.append(key)
        # implement image loading here and return the image instance
        print('loading', key)
        return 'Image for %s' % key

And the output (loading happens only when the key is not present yet):

>>> d = ImageDict()
>>> d[3]
loading 3
'Image for 3'
>>> d[3]
'Image for 3'
>>> d['bleh']
loading bleh
'Image for bleh'
>>> d['bleh']
'Image for bleh'

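The queue in load() limits the dict to the N most recently loaded entries (100 in this example) by purging the oldest one. Note that this is first-in, first-out: reading a key that is already cached does not move it to the back of the queue. For true least-recently-used behaviour, a key would also have to move to the newest position on every hit.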

Animai°情兽
#3 · 2019-06-07 23:43

Weakrefs aren't what you want: a weakref lets you refer to an object without preventing the garbage collector from collecting (i.e. destroying) it once only weak references to it remain. In other words, if you create and store only weakrefs to some object, it will be collected almost immediately (in CPython, as soon as its last strong reference disappears), and you won't have benefited from it.
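A minimal demonstration of that point, assuming CPython; Image here is a hypothetical stand-in for an expensive image object. Once the only strong reference is dropped, the weakref no longer returns the object:

```python
import gc
import weakref


class Image:
    """Hypothetical stand-in for an expensive image object."""


img = Image()
ref = weakref.ref(img)
print(ref() is img)  # True: a strong reference (img) still exists

del img              # drop the only strong reference
gc.collect()         # CPython frees the object on del already; this is harmless
print(ref())         # None: the weakref did not keep the image alive
```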

I'd go with your option #1 above. Modern operating systems keep an in-memory cache of recently accessed files (or parts of them), so you pay the cost of reading a file from disk once; after that, subsequent reads of the file are served from RAM and are nearly as fast as if the data were already in your application's memory (you still pay to decode the image on each access). The filesystem cache is usually an LRU-style cache, so frequently accessed items tend to stay in memory, while infrequently accessed items tend to be evicted (and are reloaded from disk if needed again). In most cases it is sufficient to rely on the operating system's implementation of this logic rather than writing your own (especially since you then don't have to write and maintain code to do it!).
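A minimal sketch of option #1 under stated assumptions: the collection stores only file paths and re-reads a file on every access, leaving caching to the OS. LazyImageCollection and read_bytes are hypothetical names, and the code assumes the file names sort into the intended order (e.g. zero-padded numbers).

```python
import os


class LazyImageCollection:
    """Option #1: keep only file paths and re-read the file on every access.

    Repeated reads of a recently used file are normally served from the OS
    page cache, so only the first read should have to touch the disk.
    """

    def __init__(self, directory, load):
        # assumes file names sort into the intended order (e.g. img005, img006, ...)
        self.paths = sorted(
            os.path.join(directory, name) for name in os.listdir(directory)
        )
        self.load = load  # hypothetical callable: path -> image object

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        # no in-process caching: decoding cost is paid on every access
        return self.load(self.paths[index])


def read_bytes(path):
    """Trivial loader for illustration; swap in your image library's reader."""
    with open(path, 'rb') as f:
        return f.read()
```

With a real directory you would write something like images = LazyImageCollection(some_dir, read_bytes) and then [images[i] for i in range(5, 11)] for a sequential subset. Only integer indexes are handled; a slice would pass a list of paths to load.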
