I have this code to create a swarmplot from the data in a DataFrame:
import numpy as np
import pandas as pd
import seaborn as sns

df = pd.DataFrame({"Refined_ID": some_id_list,
                   "Refined_Age": age_list,
                   "Name": name_list})
# Boolean mask: True for rows that contain at least one string value
select = df.apply(lambda row: any(isinstance(e, str) for e in row), axis=1)
# Keep only the rows not flagged by the mask (copy to avoid SettingWithCopyWarning)
dfAnalysis = df[~select].copy()
# Treat empty strings as missing, drop incomplete rows, then cast ages to int
dfAnalysis['Refined_Age'] = dfAnalysis['Refined_Age'].replace('', np.nan)
dfAnalysis = dfAnalysis.dropna()
dfAnalysis['Refined_Age'] = dfAnalysis['Refined_Age'].astype(int)
# print(dfAnalysis)
print(type(dfAnalysis['Refined_Age'].iloc[0]))
g = sns.swarmplot(x=dfAnalysis['Refined_ID'], y=dfAnalysis['Refined_Age'],
                  hue=dfAnalysis['Name'], orient="v")
g.set_xticklabels(g.get_xticklabels(), rotation=30)
# print(g)
It's taking a crazy amount of time to run (14 hours and counting!). How can I speed it up? Also, why is the code so slow in the first place?
The three lists used to build the DataFrame come from a CouchDB database with about 320k documents.
UPDATE 1
I had intended to plot only the first 20 categories but left out the code that does so.
The line should have been:
x = dfAnalysis['Refined_ID'].iloc[:20]
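One caveat with that line: `.iloc[:20]` keeps the first 20 rows, not the first 20 distinct IDs, and slicing only `x` leaves `y` and `hue` at full length. Below is a minimal sketch of what I think I actually want, which is to keep every row that belongs to one of the first 20 categories. The data here is a made-up stand-in (the real lists come from CouchDB), and `df_analysis`, `first_ids` and `subset` are just illustrative names.

```python
import pandas as pd

# Hypothetical stand-in data: 1000 rows spread over 50 distinct IDs.
df_analysis = pd.DataFrame({
    "Refined_ID": [i % 50 for i in range(1000)],
    "Refined_Age": [20 + (i % 60) for i in range(1000)],
    "Name": ["name_%d" % (i % 3) for i in range(1000)],
})

# First 20 distinct IDs, in order of first appearance.
first_ids = df_analysis["Refined_ID"].drop_duplicates().iloc[:20]

# Keep every row whose ID is one of those 20 categories,
# so x, y and hue all stay the same length.
subset = df_analysis[df_analysis["Refined_ID"].isin(first_ids)]

print(subset["Refined_ID"].nunique(), len(subset))
```

The filtered frame could then be passed to seaborn as `sns.swarmplot(data=subset, x="Refined_ID", y="Refined_Age", hue="Name")`, so the plot only has to place the points for those 20 categories.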