The huge_list parameter is something like [[12,12,14],[43,356,23]], and my code to convert each inner list to a set is:
cpdef list_to_set(list huge_list):
    cdef list ids
    cdef list final_ids = []
    for ids in huge_list:
        final_ids.append(set(ids))
    return final_ids
I have 2,800 list elements, each with 30,000 ids. It takes around 19 seconds. How can I improve the performance?
EDIT 1:
Instead of set I used numpy.unique as below, which speeds things up by ~7 seconds:
df['ids'] = df['ids'].apply(lambda x: numpy.unique(x))
Now it takes 14 seconds (previously it was ~20 seconds). I don't think this time is acceptable yet. :|
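Note that numpy.unique returns a sorted ndarray rather than a Python set, e.g.:

>>> import numpy
>>> numpy.unique([12, 12, 14])
array([12, 14])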
Cython cannot speed this up. Most of the time is spent building the sets, i.e. calculating the hash values of your elements and storing them in hash tables. That work is already done in C, so no speed-up is possible there. The pure Python version would lead to the same result.
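For reference, a minimal sketch of that pure Python equivalent (the same logic as the Cython function, just without the type declarations):

def list_to_set(huge_list):
    # build a set from each inner list, exactly as the Cython version does
    final_ids = []
    for ids in huge_list:
        final_ids.append(set(ids))
    return final_ids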
If you just want to convert the nested lists to sets, you can simply use the map function:
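# a one-liner sketch, assuming huge_list as in the question:
final_ids = list(map(set, huge_list))

Note that in Python 3 map returns a lazy iterator, so it is wrapped in list() here to get a list of sets back.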