Python: fastest way to write pandas DataFrame to E

2019-01-25 11:39发布

I need to export 24 pandas data frames ( 140 columns x 400 rows) to Excel, each into a different sheet.

I am using pandas’ built-in ExcelWriter. Running 24 scenarios, it takes:

51 seconds to write to an .xls file (using xlwt)

86 seconds to write to an .xlsx file (using XlsxWriter)

141 seconds to write to an .xlsm file (using openpyxl)

21 seconds to just run the program (no Excel output)

The problem with writing to .xls is that the spreadsheet contains no formatting styles, so if I open it in Excel, select a column, and click on the ‘comma’ button to format the numbers, it tells me: ‘style comma not found’. I don’t get this problem writing to an .xlsx, but that’s even slower.

Any suggestions on how to make the exporting faster? I can’t be the first one to have this problem, yet after hours of searching forums and websites I haven’t found any definite solution.

The only thing I can think of is to use Python to export to csv files, and then write an Excel macro to merge all the CSVs into a single spreadsheet.

The .xls file is 10 MB, and the .xlsx 5.2 MB

Thanks!

2条回答
兄弟一词,经得起流年.
2楼-- · 2019-01-25 12:10

For what it's worth, this is how I format the output in xlwt. The documentation is (or at least was) pretty spotty so I had to guess most of this!

import xlwt

style = xlwt.XFStyle()
style.font.name = 'Courier'
style.font.height = 180
style.num_format_str = '#,##0'

# ws0 is a worksheet
ws0.write( row, col, value, style )

Also, I believe I duplicated your error message when attempting to format the resulting spreadsheet in excel (office 2010 version). It's weird, but some of the drop down tool bar format options work and some don't. But it looks like they all work fine if I go to "format cells" via a right click.

查看更多
老娘就宠你
3楼-- · 2019-01-25 12:29

Here is a benchmark for different Python to Excel modules.

And here is the output for 140 columns x (400 x 24) rows using the latest version of the modules at the time of posting:

Versions:
    python      : 2.7.7
    openpyxl    : 2.0.5
    pyexcelerate: 0.6.3
    xlsxwriter  : 0.5.7
    xlwt        : 0.7.5

Dimensions:
    Rows = 9600 (400 x 24)
    Cols = 140

Times:
    pyexcelerate          :  11.85
    xlwt                  :  17.64
    xlsxwriter (optimised):  21.63
    xlsxwriter            :  26.76
    openpyxl   (optimised):  95.18
    openpyxl              : 119.29

As with any benchmark the results will depend on Python/module versions, CPU, RAM and Disk I/O and on the benchmark itself. So make sure to verify these results for your own setup.

Also, since you asked specifically about Pandas, please note that PyExcelerate isn't supported.

查看更多
登录 后发表回答