Is there an established memoize on-disk decorator?

Posted 2019-03-20 14:57

Question:

I have been searching for a Python module that offers a memoize decorator with the following capabilities:

  • Stores cache on disk to be reused among subsequent program runs.
  • Works for any pickle-able arguments, most importantly numpy arrays.
  • (Bonus) checks whether arguments are mutated in function calls.

I found a few small code snippets for this task and could probably implement one myself, but I would prefer an established package. I also found IncPy, but that does not seem to work with the standard Python interpreter.

Ideally, I would like to have something like functools.lru_cache plus cache storage on disk. Can someone point me to a suitable package for this?

Answer 1:

I don't know of any memoize decorator that takes care of all that, but you might want to have a look at ZODB. It's a persistence system built on top of pickle that provides some additional features, including the ability to move objects from memory to disk when they aren't being used and the ability to save only objects that have been modified.

Edit: As a follow-up for the comment. A memoization decorator isn't supported out of the box by ZODB. However, I think you can:

  • Implement your own persistent class
  • Use a memoization decorator on the methods you need (any standard implementation should work, but it probably needs to be modified to make sure that the object's dirty bit, _p_changed in ZODB, is set)

After that, if you create an object of that class and add it to a ZODB database, when you execute one of the memoized methods, the object will be marked as dirty and changes will be saved to the database in the next transaction commit operation.



Answer 2:

I realize this is a 2-year-old question, and that this wouldn't count as an "established" decorator, but…

Extending functools.lru_cache this way is simple enough that you really don't need to worry about only using established code. The functools docs link to the source because, in addition to being useful in its own right, it works as sample code.

So, what do you need to add? Add a filename parameter. At run time, pickle.load the file into the cache, falling back to {} if that fails. Add a cache_save function that uses pickle.dump to write the cache to the file while holding the lock. Attach that function to wrapper the same way as the existing ones (cache_info, etc.).
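
The steps above can be sketched roughly as follows. This is a minimal illustration, not the lru_cache source: it uses a plain unbounded dict instead of LRU eviction, and the names disk_memoize and cache_save are hypothetical. The cache is written with pickle.dump, the standard-library call for pickling to a file. Like lru_cache, it assumes hashable arguments, so numpy arrays would need a different key scheme.

```python
import functools
import pickle
import threading


def disk_memoize(filename):
    """Memoize a function and keep its cache in a pickle file (hypothetical helper)."""

    def decorator(func):
        lock = threading.RLock()
        try:
            # Load the cache left behind by a previous run, if there is one.
            with open(filename, "rb") as f:
                cache = pickle.load(f)
        except (OSError, EOFError, pickle.UnpicklingError):
            cache = {}  # no usable file yet: start with an empty cache

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Assumption: all arguments are hashable, as lru_cache requires.
            key = (args, tuple(sorted(kwargs.items())))
            with lock:
                if key in cache:
                    return cache[key]
            result = func(*args, **kwargs)
            with lock:
                cache[key] = result
            return result

        def cache_save():
            # Write the whole cache to disk while holding the lock.
            with lock:
                with open(filename, "wb") as f:
                    pickle.dump(cache, f)

        # Attach the saver to the wrapper, the way lru_cache attaches cache_info.
        wrapper.cache_save = cache_save
        return wrapper

    return decorator
```

With this sketch, a function decorated with @disk_memoize("results.pkl") keeps its results in memory, and calling its cache_save() attribute writes them to results.pkl so that the next program run starts with a warm cache.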

If you want to save the cache automatically, instead of leaving it up to the caller, that's easy; it's just a matter of when to do so. Any option you come up with—atexit.register, adding a save_every argument so it saves every save_every misses, …—is trivial to implement. In this answer I showed how little work it takes. Or you can get a complete working version (to customize, or to use as-is) on GitHub.
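
As a rough sketch of the two automatic options mentioned above, the variant below registers the save with atexit.register and also saves after every save_every misses. The name autosave_memoize is hypothetical, locking is left out to keep the example short, and the cache is again a plain dict rather than a real LRU cache.

```python
import atexit
import functools
import pickle


def autosave_memoize(filename, save_every=None):
    """Disk-backed memoizer that saves itself (hypothetical helper).

    The cache is written at interpreter exit and, if save_every is set,
    after every save_every cache misses.
    """

    def decorator(func):
        try:
            with open(filename, "rb") as f:
                cache = pickle.load(f)
        except (OSError, EOFError, pickle.UnpicklingError):
            cache = {}  # nothing to reuse: start empty
        misses_since_save = 0

        def cache_save():
            nonlocal misses_since_save
            with open(filename, "wb") as f:
                pickle.dump(cache, f)
            misses_since_save = 0

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            nonlocal misses_since_save
            # Assumption: all arguments are hashable.
            key = (args, tuple(sorted(kwargs.items())))
            if key in cache:
                return cache[key]
            result = cache[key] = func(*args, **kwargs)
            misses_since_save += 1
            # Periodic save: triggered by the number of misses since the last save.
            if save_every is not None and misses_since_save >= save_every:
                cache_save()
            return result

        atexit.register(cache_save)  # final save when the interpreter exits
        wrapper.cache_save = cache_save
        return wrapper

    return decorator
```

The atexit hook only runs on a normal interpreter shutdown, so a small save_every value is the safer choice when the process may be killed.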

There are other ways you could extend it—put some save-related statistics (last save time, hits and misses since last save, …) in the cache_info, copy the cache and save it in a background thread instead of saving it inline, etc. But I can't think of anything that would be worth doing that wouldn't be easy.
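
For instance, the background-save idea might look something like the sketch below. The function name save_in_background and the stats dict are hypothetical. The copy is shallow, so it assumes cached values are not mutated after being stored.

```python
import pickle
import threading
import time


def save_in_background(cache, filename, lock, stats):
    """Snapshot the cache under the lock, then pickle the snapshot on a worker thread."""
    with lock:
        snapshot = dict(cache)  # shallow copy; callers keep using the live cache

    def _write():
        with open(filename, "wb") as f:
            pickle.dump(snapshot, f)
        # Save-related statistic that could later be reported via cache_info.
        stats["last_save_time"] = time.time()

    worker = threading.Thread(target=_write)
    worker.start()
    return worker  # the caller may join() it, e.g. before exiting
```

Only the copy happens while the lock is held, so memoized calls are blocked for the duration of dict(cache) rather than for the whole disk write.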