The huge_list parameter is something like [[12, 12, 14], [43, 356, 23]]. My code to convert the list of lists into a list of sets is:
cpdef list_to_set(list huge_list):
    cdef list ids
    cdef list final_ids = []
    for ids in huge_list:
        final_ids.append(set(ids))
    return final_ids
I have 2,800 list elements, each with 30,000 ids. It takes around 19 seconds. How can I improve the performance?
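For reference, the same conversion written as a plain-Python list comprehension (the sample data below is illustrative, not the real 2,800 x 30,000 input):

```python
# Hypothetical small sample in the same shape as huge_list:
huge_list = [[12, 12, 14], [43, 356, 23]]

# set() already runs at C speed in CPython, so the Cython loop above
# mostly pays for the per-row set() construction itself; the equivalent
# pure-Python idiom is a comprehension:
final_ids = [set(ids) for ids in huge_list]
print(final_ids)
```

This is only a baseline sketch for comparison, not a claimed speedup.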
EDIT 1:
Instead of set I used numpy.unique as below, and numpy speeds it up by ~7 seconds:

df['ids'] = df['ids'].apply(lambda x: numpy.unique(x))
Now it takes 14 seconds (previously it was ~20 seconds). I don't think this time is acceptable yet. :|
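One caveat worth noting: numpy.unique returns a sorted ndarray, not a set, so downstream membership tests and comparisons behave differently. A minimal sketch of the difference (sample values are illustrative):

```python
import numpy as np

ids = [12, 12, 14, 43, 356, 23, 23]

u = np.unique(ids)   # sorted ndarray of the unique values
s = set(ids)         # unordered set of the unique values

print(u.tolist())    # unique values in sorted order
print(sorted(s))     # same values once sorted
```

Both deduplicate, but `x in s` is an O(1) hash lookup while `x in u` is a linear scan over the array, so the right choice depends on how the ids are used afterwards.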