How to obtain sheet names from XLS files without l

2019-01-31 17:39发布

问题:

I'm currently using pandas to read an Excel file and present its sheet names to the user, so he can select which sheet he would like to use. The problem is that the files are really big (70 columns x 65k rows), taking up to 14s to load on a notebook (the same data in a CSV file is taking 3s).

My code in panda goes like this:

xls = pandas.ExcelFile(path)
sheets = xls.sheet_names

I tried xlrd before, but obtained similar results. This was my code with xlrd:

xls = xlrd.open_workbook(path)
sheets = xls.sheet_names

So, can anybody suggest a faster way to retrieve the sheet names from an Excel file than reading the whole file?

回答1:

you can use the xlrd library and open the workbook with the "on_demand=True" flag, so that the sheets won't be loaded automaticaly.

Than you can retrieve the sheet names in a similar way to pandas:

import xlrd
xls = xlrd.open_workbook(r'<path_to_your_excel_file>', on_demand=True)
print xls.sheet_names() # <- remeber: xlrd sheet_names is a function, not a property


回答2:

As pandas uses xlrd for reading Excel sheets and passes all keyword arguments from pandas.ExcelFile to xlrd.open_workbook, pd.ExcelFile accepts the on_demand flag as well. Therefore, it is not necessary to load xlrd separately:

xls = pandas.ExcelFile(path, on_demand = True)
sheets = xls.sheet_names