I have a large SPSS file (a little over 1 million records, just under 150 columns) that I want to convert to a Pandas DataFrame.
It takes a few minutes to convert the file to a list, then another couple of minutes to convert that list to a DataFrame, then another few minutes to set the column headers.
Are there any optimizations I'm missing?
import pandas as pd
import savReaderWriter as spss

raw_data = spss.SavReader('largefile.sav', returnHeader=True)  # this is fast
raw_data_list = list(raw_data)  # this is slow
data = pd.DataFrame(raw_data_list)  # this is slow
data = data.rename(columns=data.loc[0]).iloc[1:]  # setting column headers; this is slow too
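For reference, here is a variation I've been considering (a minimal sketch, assuming the first row yielded with returnHeader=True is the header, as documented): passing the header straight to the DataFrame constructor should at least skip the separate rename/iloc pass, though the list conversion remains the dominant cost.

import pandas as pd
import savReaderWriter as spss

# SavReader can be used as a context manager, which closes the file afterwards
with spss.SavReader('largefile.sav', returnHeader=True) as reader:
    records = list(reader)  # still the slow part

header, rows = records[0], records[1:]
data = pd.DataFrame(rows, columns=header)  # header set at construction time

I've also noticed SavReader accepts a rawMode=True argument that skips conversions such as date formatting, which might speed up the iteration itself, but I haven't measured whether it helps here.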