why converting list into set in Cython takes so mu

Posted 2019-09-13 23:09

Question:

The huge_list parameter looks something like [[12,12,14],[43,356,23]]. My code to convert each inner list to a set is:

cpdef list_to_set(list huge_list):
    cdef list ids
    cdef list final_ids=[]
    for ids in huge_list:
        final_ids.append(set(ids))

    return final_ids

I have 2800 list elements, each containing 30,000 ids. The conversion takes around 19 seconds. How can I improve performance?


EDIT 1:
Instead of set, I used numpy.unique as shown below, which made it about 7 seconds faster:

df['ids'] = df['ids'].apply(lambda x: numpy.unique(x))

Now it takes 14 seconds (previously it was ~20 seconds). I don't think this is acceptable yet. :|

Answer 1:

Cython cannot speed anything up here. Most of the time is spent building the sets, i.e. calculating the hash values of your elements and storing them in hash tables. This is already done in C, so no speedup is possible. The pure Python version:

final_ids = [set(ids) for ids in huge_list]

would lead to the same result.
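For scale, 2800 lists of 30,000 ids is 84 million elements, so 19 seconds works out to roughly 0.23 microseconds per element. A minimal sketch for checking where the time goes on your own machine, assuming a much smaller, randomly generated stand-in for huge_list (the sizes, the seed, and the helper names with_comprehension and with_map are made up for illustration):

```python
import random
import timeit

# Hypothetical stand-in for huge_list, scaled down so the script runs quickly.
random.seed(0)
huge_list = [[random.randrange(100_000) for _ in range(3_000)] for _ in range(50)]

def with_comprehension(data):
    # Plain Python: one set() call per inner list.
    return [set(ids) for ids in data]

def with_map(data):
    # Same work driven by map(); list() forces evaluation in Python 3.
    return list(map(set, data))

result_a = with_comprehension(huge_list)
result_b = with_map(huge_list)

# Both variants spend their time inside set(), which is implemented in C.
t_comp = timeit.timeit(lambda: with_comprehension(huge_list), number=5)
t_map = timeit.timeit(lambda: with_map(huge_list), number=5)
print(f"comprehension: {t_comp:.3f}s  map: {t_map:.3f}s")
```

If the two timings come out close to each other, that supports the point above: the loop around set() is not the bottleneck, so compiling that loop with Cython has little left to optimize.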



Answer 2:

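If you just want to convert the nested lists to sets, you can simply use the map function. In Python 3, map returns a lazy iterator, so wrap it in list() if you need an actual list:

final_ids = list(map(set, huge_list))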