Say I have a function that runs a SQL query and returns a dataframe:
import pandas.io.sql as psql
import sqlalchemy

query_string = "select a from table;"

def run_my_query(my_query):
    # username, host, port and database are hard-coded here
    engine = sqlalchemy.create_engine(
        'postgresql://{username}@{host}:{port}/{database}'.format(
            username=username, host=host, port=port, database=database))
    df = psql.read_sql(my_query, engine)
    return df

# Run the query (this is what I want to memoize)
df = run_my_query(query_string)
I would like to:
- Be able to memoize my query above with one cache entry per value of query_string (i.e. per query)
- Be able to force a cache reset on demand (e.g. based on some flag), e.g. so that I can update my cache if I think that the database has changed.
Yes, you can do this with joblib (this example basically pastes itself):
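A minimal sketch of how that might look, assuming a recent joblib where Memory takes a location argument. The body of run_my_query is a hypothetical stand-in that returns plain rows so the snippet runs without a database; in practice it would be the create_engine / read_sql body from the question. The call_count counter is also an assumption, there only to show when the function body really executes.

```python
from tempfile import mkdtemp

from joblib import Memory

# Cache results on disk in a throwaway directory (an assumption; any
# writable path would do).
cachedir = mkdtemp()
memory = Memory(location=cachedir, verbose=0)

call_count = 0  # illustration only: counts real executions of the body


@memory.cache
def run_my_query(my_query):
    # Hypothetical stand-in for the real body, which would build the
    # engine and return psql.read_sql(my_query, engine).
    global call_count
    call_count += 1
    return [("row for", my_query)]


run_my_query("select a from table;")  # body runs, result is stored
run_my_query("select a from table;")  # same argument: served from the cache
run_my_query("select b from table;")  # different query: separate cache entry

# Force a reset, e.g. when you suspect the database has changed.
memory.clear(warn=False)
run_my_query("select a from table;")  # body runs again after the reset
```

joblib keys the cache on the function's arguments, so each distinct query string gets its own entry, and the results persist on disk between Python sessions as long as the same location is reused.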
You can clear the cache using memory.clear().

Note you could also use functools.lru_cache, or even do it "manually" with a simple dict kept as a mutable default argument. In that case you could clear the cache with run_my_query.__defaults__[0].clear() (func_defaults in Python 2). Not sure I'd recommend this though, just thought it was a fun example.
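Here is a rough sketch of both in-memory alternatives, using only the standard library. The names fetch_rows, run_my_query_lru and run_my_query_dict are hypothetical, and fetch_rows is a stand-in for the real database call so the snippet runs on its own.

```python
from functools import lru_cache


def fetch_rows(my_query):
    # Hypothetical stand-in for the database round trip; counts its calls
    # so we can see when the cache is bypassed.
    fetch_rows.calls += 1
    return [("row for", my_query)]


fetch_rows.calls = 0


# Option 1: functools.lru_cache, one entry per distinct query string.
@lru_cache(maxsize=None)
def run_my_query_lru(my_query):
    return fetch_rows(my_query)


# Option 2: a dict used as a mutable default argument. The same dict
# object is shared across calls, which is what makes it work as a cache.
def run_my_query_dict(my_query, cache={}):
    if my_query not in cache:
        cache[my_query] = fetch_rows(my_query)
    return cache[my_query]


q = "select a from table;"

run_my_query_lru(q)
run_my_query_lru(q)             # served from the cache
run_my_query_lru.cache_clear()  # force a reset
run_my_query_lru(q)             # calls fetch_rows again

run_my_query_dict(q)
run_my_query_dict(q)                       # served from the dict
run_my_query_dict.__defaults__[0].clear()  # force a reset
run_my_query_dict(q)                       # calls fetch_rows again
```

Both variants live only in memory and are lost when the process exits. They also hand back the same cached object on every hit, so if the result is a dataframe that you modify in place, the cached copy changes too.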