I'm currently using pandas to read an Excel file and present its sheet names to the user, so he can select which sheet he would like to use. The problem is that the files are really big (70 columns x 65k rows), taking up to 14s to load on a notebook (the same data in a CSV file is taking 3s).
My code in panda goes like this:
xls = pandas.ExcelFile(path)
sheets = xls.sheet_names
I tried xlrd before, but obtained similar results. This was my code with xlrd:
xls = xlrd.open_workbook(path)
sheets = xls.sheet_names
So, can anybody suggest a faster way to retrieve the sheet names from an Excel file than reading the whole file?
you can use the xlrd library and open the workbook with the "on_demand=True" flag, so that the sheets won't be loaded automaticaly.
Than you can retrieve the sheet names in a similar way to pandas:
import xlrd
xls = xlrd.open_workbook(r'<path_to_your_excel_file>', on_demand=True)
print xls.sheet_names() # <- remeber: xlrd sheet_names is a function, not a property
As pandas
uses xlrd
for reading Excel sheets and passes all keyword arguments from pandas.ExcelFile
to xlrd.open_workbook
, pd.ExcelFile
accepts the on_demand
flag as well. Therefore, it is not necessary to load xlrd
separately:
xls = pandas.ExcelFile(path, on_demand = True)
sheets = xls.sheet_names