Python pandas persistent cache

Is there an implementation for Python pandas that caches data on disk, so I can avoid regenerating it every time?

In particular, is there a caching method for get_yahoo_data for financial data?

Big pluses would be:

very few lines of code to write
the ability to merge newly downloaded data from the same source into the persisted series

Tags: pandas caching persistence financial

2 answers

Rolldiameter

#2 · 2019-07-05 03:27

There are many ways to achieve this, but probably the easiest is to use the built-in methods for writing and reading Python pickles: pandas.DataFrame.to_pickle stores the DataFrame on disk and pandas.read_pickle reads the stored DataFrame back.

An example for a pandas.DataFrame:

import pandas

# Store your DataFrame
df.to_pickle('cached_dataframe.pkl') # will be stored in current directory

# Read your DataFrame
df = pandas.read_pickle('cached_dataframe.pkl') # read from current directory

The same methods also work for pandas.Series:

# Store your Series
series.to_pickle('cached_series.pkl') # will be stored in current directory

# Read your Series
series = pandas.read_pickle('cached_series.pkl') # read from current directory
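To cover both extras from the question (few lines, and merging new downloads into the persisted series), the two calls above can be wrapped in a small load-or-download helper. This is only a sketch: `cached_fetch` and `fake_fetch` are made-up names, and it assumes your downloader (whatever you use for the Yahoo data) can take a start date and returns a DataFrame indexed by date.

```python
import os
import tempfile
from typing import Callable, Optional

import pandas as pd


def cached_fetch(path: str,
                 fetch: Callable[[Optional[pd.Timestamp]], pd.DataFrame]) -> pd.DataFrame:
    """Return the cached DataFrame at `path`, topped up with newly fetched rows.

    `fetch(start)` is a hypothetical downloader: it receives the last cached
    date (None if nothing is cached yet) and returns a date-indexed DataFrame.
    """
    if os.path.exists(path):
        cached = pd.read_pickle(path)
        new = fetch(cached.index.max())  # only ask for data from the last cached date on
        combined = pd.concat([cached, new])
        # If a date appears twice, keep the freshly downloaded row (it may have been revised)
        combined = combined[~combined.index.duplicated(keep="last")]
    else:
        combined = fetch(None)  # first run: download everything
    combined = combined.sort_index()
    combined.to_pickle(path)  # persist for the next call
    return combined


# Demo with a fake downloader (hypothetical stand-in for a real Yahoo download)
def fake_fetch(start: Optional[pd.Timestamp]) -> pd.DataFrame:
    idx = pd.date_range("2019-07-01", periods=5, freq="D")
    df = pd.DataFrame({"Close": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=idx)
    return df if start is None else df[df.index >= start]


cache_path = os.path.join(tempfile.mkdtemp(), "prices.pkl")
first = cached_fetch(cache_path, fake_fetch)   # no cache yet: fetches all five rows
second = cached_fetch(cache_path, fake_fetch)  # cache hit: re-fetches only from 2019-07-05
```

The second call reads the pickle and asks the downloader only for rows from the last cached date onward, so repeated runs stay cheap. Keeping the last duplicate means a revised price for the most recent day replaces the stale one.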


再贱就再见

#3 · 2019-07-05 03:33

Depending on your requirements, there are a dozen ways to save data and load it back: CSV, Excel, JSON, Python pickle, HDF5, and even SQL databases, etc.

In terms of lines of code, writing and reading most of these formats takes just one line in each direction (e.g. to_csv/read_csv). Python and pandas already keep the code about as clean as it gets, so you don't need to worry much about that.
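For example, a round trip through CSV, JSON and pickle is one writer call and one reader call each. A quick sketch with made-up prices and temporary file names; CSV needs a hint (`index_col`, `parse_dates`) to rebuild the date index and JSON needs the same `orient` on both sides, while pickle restores the object as it was.

```python
import os
import tempfile

import pandas as pd

# Small made-up price table indexed by date (placeholder data)
df = pd.DataFrame(
    {"Close": [10.0, 10.5, 10.25]},
    index=pd.date_range("2019-07-01", periods=3, freq="D", name="Date"),
)
folder = tempfile.mkdtemp()

# CSV: human-readable; tell read_csv which column is the (date) index
df.to_csv(os.path.join(folder, "prices.csv"))
from_csv = pd.read_csv(os.path.join(folder, "prices.csv"), index_col="Date", parse_dates=True)

# JSON: data interchange; orient="split" stores index, columns and data separately
df.to_json(os.path.join(folder, "prices.json"), orient="split")
from_json = pd.read_json(os.path.join(folder, "prices.json"), orient="split")

# Pickle: exact Python object round trip (only load pickles you trust)
df.to_pickle(os.path.join(folder, "prices.pkl"))
from_pickle = pd.read_pickle(os.path.join(folder, "prices.pkl"))
```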

I don't think any single solution fits every requirement; it really is case by case:

for human readability of saved data: CSV, Excel
for binary serialization of Python objects: Pickle
for data interchange: JSON
for long-term storage and incremental updates: SQL
etc.

If you want to update the stock prices daily and keep them for later use, I prefer pandas with SQL queries. Of course, this adds a few lines of code to set up the DB connection:

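import pandas as pd
from sqlalchemy import create_engine

new_data = getting_daily_price()  # placeholder for your own download function
# Any database SQLAlchemy supports works; just change the URL
# A file-backed SQLite DB persists between runs ('sqlite:///:memory:' is lost on exit)
engine = create_engine('sqlite:///prices.db')
with engine.begin() as conn:  # begin() commits the write when the block exits
    new_data.to_sql('table_name', conn, if_exists='append')  # To write; append instead of failing if the table exists
    df = pd.read_sql_table('table_name', conn)  # To read the whole table back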
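Since the table is meant to grow daily, you may want to skip rows that are already stored before appending. A minimal sketch, assuming the data has a DatetimeIndex named `Date`; it swaps SQLAlchemy for the standard-library `sqlite3` connection (which `to_sql` and `read_sql` also accept), and `append_new_rows` and the `prices` table are illustrative names. Point `sqlite3.connect` at a file rather than `:memory:` to keep the data between runs.

```python
import sqlite3

import pandas as pd


def append_new_rows(conn: sqlite3.Connection, table: str, new_data: pd.DataFrame) -> int:
    """Append only rows later than the latest "Date" already stored; return rows written.

    Assumes `new_data` has a DatetimeIndex named "Date" (hypothetical schema).
    """
    exists = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' AND name=?", (table,)
    ).fetchone()
    if exists:
        last = conn.execute(f'SELECT MAX("Date") FROM "{table}"').fetchone()[0]
        if last is not None:
            # Keep only rows strictly newer than what is already in the table
            new_data = new_data[new_data.index > pd.Timestamp(last)]
    if len(new_data):
        new_data.to_sql(table, conn, if_exists="append")  # the index becomes the "Date" column
        conn.commit()
    return len(new_data)


conn = sqlite3.connect(":memory:")  # use a file path such as "prices.db" to persist
idx = pd.date_range("2019-07-01", periods=5, freq="D", name="Date")
prices = pd.DataFrame({"Close": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=idx)

written_first = append_new_rows(conn, "prices", prices.iloc[:3])  # first run: 3 rows
written_second = append_new_rows(conn, "prices", prices)          # only the 2 later rows
stored = pd.read_sql('SELECT * FROM "prices" ORDER BY "Date"', conn,
                     index_col="Date", parse_dates=["Date"])
```

Comparing against `MAX("Date")` relies on pandas writing the timestamps to SQLite as ISO-formatted text, which sorts chronologically.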

