msgpack
in Pandas is supposed to be a replacement for pickle
.
Per the Pandas docs on msgpack:
This is a lightweight portable binary format, similar to binary JSON, that is highly space efficient, and provides good performance both on the writing (serialization), and reading (deserialization).
I find, however, that its performance does not appear to stack up against pickle.
df = pd.DataFrame(np.random.randn(10000, 100))
>>> %timeit df.to_pickle('test.p')
10 loops, best of 3: 22.4 ms per loop
>>> %timeit df.to_msgpack('test.msg')
10 loops, best of 3: 36.4 ms per loop
>>> %timeit pd.read_pickle('test.p')
100 loops, best of 3: 10.5 ms per loop
>>> %timeit pd.read_msgpack('test.msg')
10 loops, best of 3: 24.6 ms per loop
Question: Asides from potential security issues with pickle, what are the benefits of msgpack over pickle? Is pickle still the preferred method of serializing data, or do better alternatives currently exist?