Which is better, using Pandas built-in method or pickle.dump?
The standard pickle method looks like this:
pickle.dump(my_dataframe, open('test_pickle.p', 'wb'))
The Pandas built-in method looks like this:
my_dataframe.to_pickle('test_pickle.p')
Thanks to @qwwqwwq I discovered that pandas has a built-in to_pickle
method for dataframes. I did a quick time test:
In [1]: %timeit pickle.dump(df, open('test_pickle.p', 'wb'))
10 loops, best of 3: 91.8 ms per loop
In [2]: %timeit df.to_pickle('testpickle.p')
10 loops, best of 3: 88 ms per loop
So it seems that the built-in is only marginally faster (88 ms vs. 91.8 ms, roughly 4%). To me, this is useful because it means it's probably not worth refactoring existing code to use the built-in. Hope this helps someone!
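For anyone who wants to repeat the comparison outside IPython, here is a rough sketch using the standard timeit module. The helper names (dump_with_pickle, dump_with_pandas, compare) and the random test DataFrame are my own assumptions for illustration, not part of the original test, and the numbers will depend on your data and machine. It also checks that both files load back to the same DataFrame.

```python
import os
import pickle
import tempfile
import timeit

import numpy as np
import pandas as pd


def dump_with_pickle(df, path):
    # Hypothetical helper: plain pickle.dump, with the file closed explicitly.
    with open(path, "wb") as fh:
        pickle.dump(df, fh)


def dump_with_pandas(df, path):
    # Hypothetical helper: the pandas built-in.
    df.to_pickle(path)


def compare(df, repeats=5):
    """Return (best pickle.dump time, best to_pickle time, files_match)."""
    with tempfile.TemporaryDirectory() as tmp:
        p_pickle = os.path.join(tmp, "via_pickle.p")
        p_pandas = os.path.join(tmp, "via_pandas.p")
        # Best of `repeats` single runs, in seconds.
        t_pickle = min(timeit.repeat(lambda: dump_with_pickle(df, p_pickle),
                                     number=1, repeat=repeats))
        t_pandas = min(timeit.repeat(lambda: dump_with_pandas(df, p_pandas),
                                     number=1, repeat=repeats))
        # Both files should round-trip to an equal DataFrame.
        files_match = pd.read_pickle(p_pickle).equals(pd.read_pickle(p_pandas))
    return t_pickle, t_pandas, files_match


if __name__ == "__main__":
    # Assumed test data: 100,000 rows of random floats.
    rng = np.random.default_rng(0)
    df = pd.DataFrame(rng.random((100_000, 10)))
    t_pickle, t_pandas, files_match = compare(df)
    print(f"pickle.dump:  {t_pickle * 1000:.1f} ms")
    print(f"df.to_pickle: {t_pandas * 1000:.1f} ms")
    print(f"round-trip equal: {files_match}")
```

Since either file can be read back with pd.read_pickle, switching between the two methods later shouldn't require converting files already on disk, at least as long as no compression is involved.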