csv & xlsx files import to pandas data frame: spee

2019-04-10 16:43发布

问题:

Reading data (just 20000 numbers) from a xlsx file takes forever:

import pandas as pd
xlsxfile = pd.ExcelFile("myfile.xlsx")
data = xlsxfile.parse('Sheet1', index_col = None, header = None)

takes about 9 seconds.

If I save the same file in csv format it takes ~25ms:

import pandas as pd
csvfile = "myfile.csv"
data = pd.read_csv(csvfile, index_col = None, header = None)

Is this an issue of openpyxl or am I missing something? Are there any alternatives?

回答1:

xlrd has support for .xlsx files, and this answer suggests that at least the beta version of xlrd with .xlsx support was quicker than openpyxl.

The current stable version of Pandas (11.0) uses openpyxl for .xlsx files, but this has been changed for the next release. If you want to give it a go, you can download the dev version from GitHub