The huge_list parameter is something like [[12, 12, 14], [43, 356, 23]]. My code to convert the list of lists into a list of sets is:
cpdef list_to_set(list huge_list):
    cdef list ids
    cdef list final_ids = []
    for ids in huge_list:
        final_ids.append(set(ids))
    return final_ids
I have 2,800 list elements, each with 30,000 ids. It takes around 19 seconds. How can I improve the performance?
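For reference, the same conversion written as a plain-Python list comprehension (the sample data below is illustrative, not the real 2,800 x 30,000 input):

```python
# Hypothetical small sample in the same shape as huge_list:
huge_list = [[12, 12, 14], [43, 356, 23]]

# set() already runs at C speed in CPython, so the Cython loop above
# mostly pays for the per-row set() construction itself; the equivalent
# pure-Python idiom is a comprehension:
final_ids = [set(ids) for ids in huge_list]
print(final_ids)
```

This is only a baseline sketch for comparison, not a claimed speedup.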
EDIT 1:
Instead of set I used numpy.unique as below, and numpy speeds it up by ~7 seconds:

df['ids'] = df['ids'].apply(lambda x: numpy.unique(x))
Now it takes 14 seconds (previously it was ~20 seconds). I don't think this time is acceptable yet. :|
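One caveat worth noting: numpy.unique returns a sorted ndarray, not a set, so downstream membership tests and comparisons behave differently. A minimal sketch of the difference (sample values are illustrative):

```python
import numpy as np

ids = [12, 12, 14, 43, 356, 23, 23]

u = np.unique(ids)   # sorted ndarray of the unique values
s = set(ids)         # unordered set of the unique values

print(u.tolist())    # unique values in sorted order
print(sorted(s))     # same values once sorted
```

Both deduplicate, but `x in s` is an O(1) hash lookup while `x in u` is a linear scan over the array, so the right choice depends on how the ids are used afterwards.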