I need to export 24 pandas data frames ( 140 columns x 400 rows) to Excel, each into a different sheet.
I am using pandas’ built-in ExcelWriter
. Running 24 scenarios, it takes:
51 seconds to write to an .xls
file (using xlwt
)
86 seconds to write to an .xlsx
file (using XlsxWriter
)
141 seconds to write to an .xlsm
file (using openpyxl
)
21 seconds to just run the program (no Excel output)
The problem with writing to .xls
is that the spreadsheet contains no formatting styles, so if I open it in Excel, select a column, and click on the ‘comma’ button to format the numbers, it tells me: ‘style comma not found’. I don’t get this problem writing to an .xlsx
, but that’s even slower.
Any suggestions on how to make the exporting faster? I can’t be the first one to have this problem, yet after hours of searching forums and websites I haven’t found any definite solution.
The only thing I can think of is to use Python to export to csv files, and then write an Excel macro to merge all the CSVs into a single spreadsheet.
The .xls
file is 10 MB, and the .xlsx
5.2 MB
Thanks!
For what it's worth, this is how I format the output in xlwt. The documentation is (or at least was) pretty spotty so I had to guess most of this!
Also, I believe I duplicated your error message when attempting to format the resulting spreadsheet in excel (office 2010 version). It's weird, but some of the drop down tool bar format options work and some don't. But it looks like they all work fine if I go to "format cells" via a right click.
Here is a benchmark for different Python to Excel modules.
And here is the output for 140 columns x (400 x 24) rows using the latest version of the modules at the time of posting:
As with any benchmark the results will depend on Python/module versions, CPU, RAM and Disk I/O and on the benchmark itself. So make sure to verify these results for your own setup.
Also, since you asked specifically about Pandas, please note that PyExcelerate isn't supported.