I know beforehand which columns of an Excel file I don't need, and I'd like to skip them when reading the file to improve performance. Something like this:
import pandas as pd
df = pd.read_excel('large_excel_file.xlsx', skip_cols=['col_a', 'col_b',...,'col_zz'])
I can't find anything like this in the documentation. Is there any workaround?
If your version of pandas allows it (check first whether you can pass a function to usecols; read_excel accepts a callable there from pandas 0.24 on), I would try something like:
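A runnable sketch of that call, assuming pandas 0.24 or later with openpyxl installed. The in-memory workbook, its header names and the helper make_sample_xlsx are hypothetical stand-ins for the real file; the third header cell is left blank on purpose.

```python
import io

import pandas as pd
from openpyxl import Workbook


def make_sample_xlsx() -> bytes:
    """Hypothetical helper: build a tiny workbook whose third header cell is blank."""
    wb = Workbook()
    ws = wb.active
    ws.append(["id", "name", None, "score"])  # None leaves the header cell empty
    ws.append([1, "a", "junk", 10])
    ws.append([2, "b", "junk", 20])
    buf = io.BytesIO()
    wb.save(buf)
    return buf.getvalue()


# pandas labels a blank header cell "Unnamed: <position>", so this callable
# keeps every column whose header does not contain "Unnamed".
df = pd.read_excel(
    io.BytesIO(make_sample_xlsx()),
    engine="openpyxl",
    usecols=lambda name: "Unnamed" not in str(name),
)
print(list(df.columns))  # ['id', 'name', 'score']
```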
This should skip all columns without header names, since pandas labels those "Unnamed: N". You could replace the 'Unnamed' check with a list of column names you do not want.
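A sketch of that list-based variant, which emulates the skip_cols option wished for in the question (skip_cols itself is not a read_excel parameter). It assumes pandas 0.24+ with openpyxl; SKIP, the column names and make_sample_xlsx are hypothetical. A set is used so each membership test is constant time.

```python
import io

import pandas as pd
from openpyxl import Workbook

# Hypothetical names of columns we never want to load.
SKIP = {"col_a", "col_b"}


def make_sample_xlsx() -> bytes:
    """Hypothetical helper: a small in-memory workbook standing in for the real file."""
    wb = Workbook()
    ws = wb.active
    ws.append(["col_a", "keep_1", "col_b", "keep_2"])
    ws.append([1, 2, 3, 4])
    buf = io.BytesIO()
    wb.save(buf)
    return buf.getvalue()


# The callable is evaluated once per header name; only names it accepts are parsed.
df = pd.read_excel(
    io.BytesIO(make_sample_xlsx()),
    engine="openpyxl",
    usecols=lambda name: name not in SKIP,
)
print(list(df.columns))  # ['keep_1', 'keep_2']
```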
You can use the following technique:
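One possible first step, assuming pandas with nrows support in read_excel and openpyxl installed: read only the header row to learn the column names, then drop the unwanted ones. SKIP and make_sample_xlsx are hypothetical; depending on the pandas version, the reader may still scan more of the sheet than the header.

```python
import io

import pandas as pd
from openpyxl import Workbook

# Hypothetical list of columns known in advance to be unneeded.
SKIP = {"col_a", "col_b"}


def make_sample_xlsx() -> bytes:
    """Hypothetical helper: a small in-memory workbook standing in for the large file."""
    wb = Workbook()
    ws = wb.active
    ws.append(["col_a", "keep_1", "col_b", "keep_2"])
    for i in range(3):
        ws.append([i, i * 10, i * 100, i * 1000])
    buf = io.BytesIO()
    wb.save(buf)
    return buf.getvalue()


# nrows=0 returns an empty DataFrame that still carries the header names.
header = pd.read_excel(io.BytesIO(make_sample_xlsx()), engine="openpyxl", nrows=0)
keep = [name for name in header.columns if name not in SKIP]
print(keep)  # ['keep_1', 'keep_2']
```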
and then
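A sketch of the follow-up read, assuming pandas 0.24+ (where usecols accepts a list of column names) and openpyxl. It repeats the header read so the block runs on its own; SKIP and make_sample_xlsx are hypothetical. Because keep is derived from the file's own header, every name in it exists, so read_excel should not reject the list.

```python
import io

import pandas as pd
from openpyxl import Workbook

# Hypothetical list of columns known in advance to be unneeded.
SKIP = {"col_a", "col_b"}


def make_sample_xlsx() -> bytes:
    """Hypothetical helper: a small in-memory workbook standing in for the large file."""
    wb = Workbook()
    ws = wb.active
    ws.append(["col_a", "keep_1", "col_b", "keep_2"])
    for i in range(3):
        ws.append([i, i * 10, i * 100, i * 1000])
    buf = io.BytesIO()
    wb.save(buf)
    return buf.getvalue()


data = make_sample_xlsx()

# Header-only read to find the column names to keep.
header = pd.read_excel(io.BytesIO(data), engine="openpyxl", nrows=0)
keep = [name for name in header.columns if name not in SKIP]

# Full read restricted to the kept names; the skipped columns are not parsed.
df = pd.read_excel(io.BytesIO(data), engine="openpyxl", usecols=keep)
print(df.shape)  # (3, 2)
```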