Is there an implementation for Python pandas that caches data on disk so I can avoid reproducing it every time?
In particular, is there a caching method for `get_yahoo_data` for financial data?
A big plus would be:
- very few lines of code to write
- the ability to merge newly downloaded data into the persisted series for the same source
There are many ways to achieve this, but probably the easiest is to use the built-in methods for writing and reading Python pickles. You can use `pandas.DataFrame.to_pickle` to store the DataFrame on disk and `pandas.read_pickle` to read the stored DataFrame back from disk. The same methods also work for a `pandas.Series`.
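A minimal sketch of the pickle round trip, for both a DataFrame and a Series. The sample data, the file names, and the `load_or_build` helper are assumptions made up for illustration; only `to_pickle` and `read_pickle` come from pandas.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data standing in for downloaded prices.
df = pd.DataFrame(
    {"open": [10.0, 10.5, 11.0], "close": [10.4, 10.9, 11.2]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
)

cache_dir = tempfile.mkdtemp()  # any writable directory would do

# DataFrame: write to disk, then read back.
df_path = os.path.join(cache_dir, "prices.pkl")
df.to_pickle(df_path)
df_restored = pd.read_pickle(df_path)

# Series: exactly the same two calls.
series = df["close"]
series_path = os.path.join(cache_dir, "close.pkl")
series.to_pickle(series_path)
series_restored = pd.read_pickle(series_path)


def load_or_build(path, build):
    """Hypothetical helper: return the pickle at `path`, or build and cache it."""
    if os.path.exists(path):
        return pd.read_pickle(path)
    obj = build()
    obj.to_pickle(path)
    return obj


calls = []  # records how often the expensive step actually runs


def build_prices():
    calls.append(1)
    return df


cached_path = os.path.join(cache_dir, "cached_prices.pkl")
first = load_or_build(cached_path, build_prices)   # builds and writes the file
second = load_or_build(cached_path, build_prices)  # served from disk
```

With a helper like this, the slow download or computation runs only when the cache file is missing; deleting the file forces a refresh.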
Depending on your requirements, there are a dozen ways to do this, in both directions: CSV, Excel, JSON, Python pickle format, HDF5, and even SQL with a database.
In terms of lines of code, writing to or reading from many of these formats takes just one line in each direction. Python and pandas already keep the code about as clean as it can be, so you need not worry much about that.
I think there is no single solution that fits all requirements; it really is case by case.
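To illustrate the one-line-per-direction point, here is a rough sketch of three of those formats. The sample frame and file names are invented for the example. Excel and HDF5 follow the same pattern (`to_excel`/`read_excel`, `to_hdf`/`read_hdf`) but need optional dependencies, so they are left out here.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"ticker": ["AAA", "BBB"], "close": [10.5, 20.25]})
out_dir = tempfile.mkdtemp()

# CSV: human-readable, but dtypes are re-inferred on read.
csv_path = os.path.join(out_dir, "prices.csv")
df.to_csv(csv_path, index=False)
from_csv = pd.read_csv(csv_path)

# JSON: handy for interchange with other tools.
json_path = os.path.join(out_dir, "prices.json")
df.to_json(json_path, orient="records")
from_json = pd.read_json(json_path, orient="records")

# Pickle: Python-only, keeps index and dtypes as they were.
pkl_path = os.path.join(out_dir, "prices.pkl")
df.to_pickle(pkl_path)
from_pickle = pd.read_pickle(pkl_path)
```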
If you want to update the stock prices daily and keep them for later use, I prefer pandas with SQL queries. Of course, this adds a few lines of code to set up the DB connection.
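One possible shape for the SQL approach, using the standard-library `sqlite3` module so no database server is needed. The table name, column names, sample rows, and the `append_prices` helper are assumptions for illustration, not part of any pandas or Yahoo API.

```python
import os
import sqlite3
import tempfile

import pandas as pd

# A file-backed SQLite database, so the data survives between runs.
db_path = os.path.join(tempfile.mkdtemp(), "prices.db")
conn = sqlite3.connect(db_path)


def append_prices(conn, frame, table="prices"):
    """Hypothetical helper: append newly downloaded rows to the stored table."""
    frame.to_sql(table, conn, if_exists="append", index=False)


# Hypothetical daily downloads for the same source.
day1 = pd.DataFrame({"date": ["2024-01-02"], "ticker": ["AAA"], "close": [10.4]})
day2 = pd.DataFrame({"date": ["2024-01-03"], "ticker": ["AAA"], "close": [10.9]})

append_prices(conn, day1)
append_prices(conn, day2)

# Later: pull back only what is needed with a query.
history = pd.read_sql_query(
    "SELECT date, ticker, close FROM prices WHERE ticker = ? ORDER BY date",
    conn,
    params=("AAA",),
)
conn.close()
```

Appending with `if_exists="append"` keeps the earlier rows in place, which covers the case of adding new data to an already persisted series. This sketch does not deduplicate, so loading the same day twice would store it twice.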