memoize to disk - python - persistent memoization

Posted 2019-02-03 00:26

Is there a way to memoize the output of a function to disk?

I have a function

def getHtmlOfUrl(url):
    ... # expensive computation

and would like to do something like:

getHtmlMemoized = memoizeToFile(getHtmlOfUrl, "file.dat")

and then call getHtmlMemoized(url), so as to do the expensive computation only once for each url.

7 answers
Anthone
#2 · 2019-02-03 01:06

You can use the cache_to_disk package:

    from cache_to_disk import cache_to_disk

    @cache_to_disk(3)
    def my_func(a, b, c, d=None):
        results = ...
        return results

This will cache the results for 3 days, specific to the arguments a, b, c and d. The results are stored in a pickle file on your machine, and unpickled and returned next time the function is called. After 3 days, the pickle file is deleted until the function is re-run. The function will be re-run whenever the function is called with new arguments. More info here: https://github.com/sarenehan/cache_to_disk
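
The behaviour described above (pickle the result, reuse it until it expires) can be sketched with the standard library alone. This is not how cache_to_disk is implemented; `cache_for_days` and its `file_name` parameter are hypothetical names, and the sketch assumes the arguments are hashable and the results picklable.

```python
import functools
import os
import pickle
import time


def cache_for_days(days, file_name):
    """Hypothetical decorator: keep results in one pickle file for `days` days."""
    max_age = days * 24 * 60 * 60  # lifetime in seconds

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Load the whole cache: {key: (timestamp, result)}.
            cache = {}
            if os.path.exists(file_name):
                with open(file_name, "rb") as f:
                    cache = pickle.load(f)
            # Assumes every argument is hashable.
            key = (args, tuple(sorted(kwargs.items())))
            entry = cache.get(key)
            if entry is not None and time.time() - entry[0] < max_age:
                return entry[1]
            # Missing or expired: recompute and write the cache back.
            result = func(*args, **kwargs)
            cache[key] = (time.time(), result)
            with open(file_name, "wb") as f:
                pickle.dump(cache, f)
            return result
        return wrapper
    return decorator
```

Reading and rewriting the whole file on every call is simple but slow for large caches; only unpickle files you wrote yourself, since pickle can execute arbitrary code on load.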

在下西门庆
#3 · 2019-02-03 01:14

Something like this should do:

import json

class Memoize(object):
    def __init__(self, func):
        self.func = func
        self.memo = {}

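    def load_memo(self, filename):
        # Merge previously saved results into the in-memory cache.
        with open(filename) as f:
            self.memo.update(json.load(f))

    def save_memo(self, filename):
        with open(filename, 'w') as f:
            json.dump(self.memo, f)

    def __call__(self, *args):
        # JSON object keys must be strings, so key on the serialized arguments.
        key = json.dumps(args)
        if key not in self.memo:
            self.memo[key] = self.func(*args)
        return self.memo[key]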

Basic usage:

your_mem_func = Memoize(your_func)
your_mem_func.load_memo('yourdata.json')
#  do your stuff with your_mem_func

If you want to write your "cache" to a file after using it -- to be loaded again in the future:

your_mem_func.save_memo('yournewdata.json')
戒情不戒烟
#4 · 2019-02-03 01:16

Assuming that your data is JSON-serializable, this code should work:

import os, json

def json_file(fname):
    def decorator(function):
        def wrapper(*args, **kwargs):
            if os.path.isfile(fname):
                with open(fname, 'r') as f:
                    ret = json.load(f)
            else:
                # Compute first so a failing call does not leave an empty cache file.
                ret = function(*args, **kwargs)
                with open(fname, 'w') as f:
                    json.dump(ret, f)
            return ret
        return wrapper
    return decorator

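Decorate getHtmlOfUrl and then simply call it; if it has been run previously, you will get the cached data. Note that the file holds a single value and the arguments are ignored, so every later call returns the first result stored in fname, whatever url is passed.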

Checked with python 2.x and python 3.x

倾城 Initia
#5 · 2019-02-03 01:21

A cleaner solution powered by Python's shelve module. The advantage is that the cache is updated on disk in real time using the well-known dict syntax, and it is exception-proof (no need to handle an annoying KeyError).

import shelve
def shelve_it(file_name):
    d = shelve.open(file_name)

    def decorator(func):
        def new_func(param):
            if param not in d:
                d[param] = func(param)
            return d[param]

        return new_func

    return decorator

@shelve_it('cache.shelve')
def expensive_function(param):
    pass

This way the function is computed just once for each param. Subsequent calls with the same param return the stored result.
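
shelve keys must be strings, so the decorator above only works when param is a string (which is fine for a URL). A minimal sketch of a variant that accepts several arguments, assuming their `repr` is stable between runs; `shelve_memoize` is a hypothetical name, not part of the answer above:

```python
import functools
import shelve


def shelve_memoize(file_name):
    """Hypothetical variant of shelve_it that accepts any repr-able arguments."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # shelve keys must be strings, so build one from the arguments.
            key = repr((args, sorted(kwargs.items())))
            # Opening per call keeps the file closed (and flushed) between calls.
            with shelve.open(file_name) as db:
                if key not in db:
                    db[key] = func(*args, **kwargs)
                return db[key]
        return wrapper
    return decorator
```

Opening the shelf on every call costs a little time but avoids leaving the file open for the lifetime of the process, which the original version does.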

The star"
#6 · 2019-02-03 01:23

Python offers a very elegant way to do this - decorators. Basically, a decorator is a function that wraps another function to provide additional functionality without changing the function source code. Your decorator can be written like this:

import json

def persist_to_file(file_name):

    def decorator(original_func):

        try:
            with open(file_name, 'r') as f:
                cache = json.load(f)
        except (IOError, ValueError):
            cache = {}

        def new_func(param):
            if param not in cache:
                cache[param] = original_func(param)
                # Rewrite the whole cache file after every new result.
                with open(file_name, 'w') as f:
                    json.dump(cache, f)
            return cache[param]

        return new_func

    return decorator

Once you've got that, 'decorate' the function using @-syntax and you're ready.

@persist_to_file('cache.dat')
def html_of_url(url):
    ...  # your function code

Note that this decorator is intentionally simplified and may not work for every situation, for example, when the source function accepts or returns data that cannot be json-serialized.
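
A few concrete cases of that limitation, shown as a small standalone demonstration (the sample values are made up for illustration):

```python
import json

# Integer keys come back as strings after a round trip, so a lookup by the
# original integer misses the cached entry.
cache = {1: "one"}
restored = json.loads(json.dumps(cache))
print(restored)        # {'1': 'one'}
print(1 in restored)   # False

# Tuples are returned as lists.
print(json.loads(json.dumps((1, 2))))   # [1, 2]

# Tuple keys and bytes values cannot be serialized at all.
failed = []
for bad in ({(1, 2): "x"}, {"page": b"<html>"}):
    try:
        json.dumps(bad)
    except TypeError as exc:
        failed.append(bad)
        print("not JSON-serializable:", exc)
```

For a function that takes a URL string and returns HTML text, none of these cases arise, so the JSON cache works as intended.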

More on decorators: How to make a chain of function decorators?

And here's how to make the decorator save the cache just once, at exit time:

import json, atexit

def persist_to_file(file_name):

    try:
        with open(file_name, 'r') as f:
            cache = json.load(f)
    except (IOError, ValueError):
        cache = {}

    def save_cache():
        with open(file_name, 'w') as f:
            json.dump(cache, f)

    atexit.register(save_cache)

    def decorator(func):
        def new_func(param):
            if param not in cache:
                cache[param] = func(param)
            return cache[param]
        return new_func

    return decorator
劳资没心,怎么记你
#7 · 2019-02-03 01:28

The Artemis library has a module for this. (you'll need to pip install artemis-ml)

You decorate your function:

from artemis.fileman.disk_memoize import memoize_to_disk

@memoize_to_disk
def fcn(a, b, c = None):
    results = ...
    return results

Internally, it makes a hash out of input arguments and saves memo-files by this hash.
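
The idea of one memo file per argument hash can be sketched as follows. This is not Artemis's actual code; `memoize_by_hash` and `cache_dir` are hypothetical names, and the sketch assumes the arguments have a stable `repr` and the results are picklable.

```python
import functools
import hashlib
import os
import pickle


def memoize_by_hash(cache_dir):
    """Hypothetical decorator: one memo file per distinct set of arguments."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            os.makedirs(cache_dir, exist_ok=True)
            # Hash a stable text form of the call to get the memo file name.
            call = repr((func.__name__, args, sorted(kwargs.items())))
            digest = hashlib.sha256(call.encode("utf-8")).hexdigest()
            path = os.path.join(cache_dir, digest + ".pkl")
            if os.path.exists(path):
                with open(path, "rb") as f:
                    return pickle.load(f)
            result = func(*args, **kwargs)
            with open(path, "wb") as f:
                pickle.dump(result, f)
            return result
        return wrapper
    return decorator
```

Compared with a single cache file, each call reads or writes only its own small file, so nothing has to be rewritten when a new result is added.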
