Question:

I have a pandas data frame, called df.

I want to save this in a gzipped format. One way to do this is the following:

import gzip
import pandas

# df.save() was removed in later pandas versions; df.to_pickle() is its replacement.
df.to_pickle('filename.pickle')
with open('filename.pickle', 'rb') as f_in, gzip.open('filename.pickle.gz', 'wb') as f_out:
    f_out.writelines(f_in)

However, this requires me to first create a file called filename.pickle. Is there a way to do this more directly, i.e., without creating the filename.pickle?

When I want to load a DataFrame that has been gzipped, I have to go through the same intermediate step of creating an uncompressed pickle file. For example, to read filename2.pickle.gz, which is a gzipped pandas DataFrame, I know of the following method:

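# Decompress into an intermediate, uncompressed file (plain open, not gzip.open).
with gzip.open('filename2.pickle.gz', 'rb') as f_in, open('filename2.pickle', 'wb') as f_out:
    f_out.writelines(f_in)

# pandas.load() was removed in later pandas versions; pandas.read_pickle() replaces it.
df2 = pandas.read_pickle('filename2.pickle')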

Can this be done without creating filename2.pickle first?

Answer 1:

We plan to add better serialization with compression eventually. Stay tuned to pandas development.

Answer 2:

Better serialization with compression has since been added to pandas (starting in pandas 0.20.0). Here is an example of how it can be used:

df.to_csv("my_file.gz", compression="gzip")

For more information, such as the other compression formats available, see the pandas documentation.
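A minimal round-trip sketch, assuming pandas 0.20.0 or later: to_pickle and read_pickle accept a compression parameter just like to_csv and read_csv, so a DataFrame can be written to and read back from a gzip-compressed file without any uncompressed intermediate. The example DataFrame and the file names are placeholders, and a temporary directory keeps the sketch from leaving files behind.

```python
import os
import tempfile

import pandas as pd

# Placeholder DataFrame standing in for df.
df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})

with tempfile.TemporaryDirectory() as tmp:
    pkl_path = os.path.join(tmp, "df.pickle.gz")
    csv_path = os.path.join(tmp, "df.csv.gz")

    # Pickle straight into a gzip-compressed file; no uncompressed file is created.
    df.to_pickle(pkl_path, compression="gzip")
    df_from_pickle = pd.read_pickle(pkl_path, compression="gzip")

    # CSV works the same way; index_col=0 restores the index that to_csv wrote.
    df.to_csv(csv_path, compression="gzip")
    df_from_csv = pd.read_csv(csv_path, compression="gzip", index_col=0)

    # Read the first two bytes to confirm the file really is gzip (magic 1f 8b).
    with open(pkl_path, "rb") as f:
        pkl_magic = f.read(2)
```

With the default compression="infer", to_pickle and read_pickle pick gzip from a ".gz" file extension, so passing compression="gzip" explicitly mainly documents intent.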

Answer 3:

For some reason, the Python zlib module can decompress gzip data, but it cannot directly compress to that format, at least as far as is documented. This is despite the remarkably misleading documentation page header, "Compression compatible with gzip".

You can compress to the zlib format instead, using zlib.compress or zlib.compressobj, and then strip the zlib header and trailer and add a gzip header and trailer, since the zlib and gzip formats wrap the same compressed (deflate) data format. This gives you data in the gzip format. The zlib header is fixed at two bytes (when no preset dictionary is used) and the trailer at four bytes, so they are easy to strip. Then prepend a basic ten-byte gzip header, "\x1f\x8b\x08\0\0\0\0\0\0\xff" (C string format), and append an eight-byte trailer: the CRC-32 of the uncompressed data followed by the uncompressed length modulo 2^32, each as a four-byte little-endian integer. The CRC can be computed using zlib.crc32.
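The rewrapping recipe can be sketched in Python as follows. The function name zlib_to_gzip and the sample payload are hypothetical; the trailer carries both the CRC-32 and the uncompressed length, as the gzip format (RFC 1952) requires. The result is checked by decompressing it with the standard gzip module and with zlib itself, which accepts gzip input when wbits is 16 + zlib.MAX_WBITS.

```python
import gzip
import struct
import zlib

# Basic 10-byte gzip header: magic bytes 1f 8b, compression method 8 (deflate),
# no flags, zero modification time, no extra flags, OS byte 0xff ("unknown").
GZIP_HEADER = b"\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\xff"


def zlib_to_gzip(data: bytes, level: int = 6) -> bytes:
    """Hypothetical helper: zlib-compress data, then rewrap the deflate stream as gzip."""
    zlib_stream = zlib.compress(data, level)
    # Drop the 2-byte zlib header and the 4-byte Adler-32 trailer; what remains
    # is the raw deflate data that both formats share.
    deflate_data = zlib_stream[2:-4]
    # gzip trailer: CRC-32 of the uncompressed data, then its length modulo 2**32,
    # each packed as a 4-byte little-endian unsigned integer.
    trailer = struct.pack("<II", zlib.crc32(data) & 0xFFFFFFFF, len(data) & 0xFFFFFFFF)
    return GZIP_HEADER + deflate_data + trailer


payload = b"pandas dataframe bytes " * 100
gz = zlib_to_gzip(payload)

# The standard gzip module accepts the result...
assert gzip.decompress(gz) == payload
# ...and so does zlib itself when told to expect a gzip wrapper.
assert zlib.decompress(gz, 16 + zlib.MAX_WBITS) == payload
```

Current Python documentation also describes zlib.compressobj(wbits=31) (any value from 25 to 31) as producing output with a basic gzip header and trailer, which avoids the manual rewrapping on versions where that is supported.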

Answer 4:

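You can dump the DataFrame into a bytes string using pickle.dumps and then write it to disk with gzip:

import gzip
import pickle

# Compression level 3 trades some compression ratio for speed (the default is 9).
with gzip.GzipFile('filename.pickle.gz', 'wb', 3) as file:
    file.write(pickle.dumps(df))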
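A streaming variant of the same idea, offered as a sketch: pickle.dump can write straight into the file object returned by gzip.open, so neither an uncompressed file nor a separate bytes string is needed, and pickle.load reads it back the same way. The helper names save_gzipped_pickle and load_gzipped_pickle are hypothetical, and a plain dict stands in for df so the sketch runs without pandas; a DataFrame pickles the same way.

```python
import gzip
import os
import pickle
import tempfile


def save_gzipped_pickle(obj, path, compresslevel=9):
    """Hypothetical helper: pickle obj directly into a gzip-compressed file."""
    with gzip.open(path, "wb", compresslevel=compresslevel) as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_gzipped_pickle(path):
    """Hypothetical helper: read back an object written by save_gzipped_pickle."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)


# Stand-in for a DataFrame so the sketch runs without pandas.
data = {"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "filename.pickle.gz")
    save_gzipped_pickle(data, path, compresslevel=3)
    restored = load_gzipped_pickle(path)
    # Every gzip file starts with the magic bytes 1f 8b.
    with open(path, "rb") as raw:
        magic = raw.read(2)
```

Streaming through the file object avoids holding a second full copy of the pickled bytes in memory, which matters for large DataFrames.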
