As a follow-up to this question here (thanks MSeifert for your help), I ran into the problem that I have to mask a numpy array new_values with an index array new_vals_idx before passing the masked array on to update val_dict.
I tried to apply the array masking to the solutions MSeifert proposed in his answer to the old post, but the performance is not satisfactory.
The arrays and dicts I used for the following examples are:
import numpy as np

val_dict = {'a': 5.0, 'b': 18.8, 'c': -55/2}
for i in range(200):
    val_dict[str(i)] = i
    val_dict[i] = i**2

keys = ('b', 123, '89', 'c')  # dict keys to update
new_values = np.arange(1, 51, 1) / 1.0  # array with new values which has to be masked
new_vals_idx = np.array((0, 3, 5, -1))  # masking (index) array
valarr = np.zeros((new_vals_idx.shape[0]))  # preallocation for masked array
length = new_vals_idx.shape[0]
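Just to make explicit what I mean by "masking" here: it is plain NumPy fancy indexing, so with the example values above the masked array looks like this (shown only for clarity, not part of the benchmarks):

new_values[new_vals_idx]
# -> array([ 1.,  4.,  6., 50.])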
To make my code snippets easier to compare with my old question, I'll stick to the function naming of MSeifert's answer. These are my attempts to get the best performance out of Python/Cython (the other answers were left out because their performance was too poor):
def old_for(val_dict, keys, new_values, new_vals_idx, length):
    for i in range(length):
        val_dict[keys[i]] = new_values[new_vals_idx[i]]

%timeit old_for(val_dict, keys, new_values, new_vals_idx, length)
# 1000000 loops, best of 3: 1.6 µs per loop
def old_for_w_valarr(val_dict, keys, new_values, valarr, new_vals_idx, length):
    valarr = new_values[new_vals_idx]
    for i in range(length):
        val_dict[keys[i]] = valarr[i]

%timeit old_for_w_valarr(val_dict, keys, new_values, valarr, new_vals_idx, length)
# 100000 loops, best of 3: 2.33 µs per loop
def new2_w_valarr(val_dict, keys, new_values, valarr, new_vals_idx, length):
    valarr = new_values[new_vals_idx].tolist()
    for key, val in zip(keys, valarr):
        val_dict[key] = val

%timeit new2_w_valarr(val_dict, keys, new_values, valarr, new_vals_idx, length)
# 100000 loops, best of 3: 2.01 µs per loop
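For reference, the same idea can also be written as a one-liner with dict.update; I have not timed this variant separately, so take it only as a sketch of the approach (the name new2_update is just for illustration):

def new2_update(val_dict, keys, new_values, new_vals_idx):
    # build (key, value) pairs from the masked array and update the dict in one call
    val_dict.update(zip(keys, new_values[new_vals_idx].tolist()))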
Cython functions:
%load_ext cython

%%cython
import numpy as np
cimport numpy as np

cpdef new3_cy(dict val_dict, tuple keys, double[:] new_values, int[:] new_vals_idx, Py_ssize_t length):
    cdef Py_ssize_t i
    cdef double val  # this gives about 10 µs speed boost compared to directly assigning it to val_dict
    for i in range(length):
        val = new_values[new_vals_idx[i]]
        val_dict[keys[i]] = val

%timeit new3_cy(val_dict, keys, new_values, new_vals_idx, length)
# 1000000 loops, best of 3: 1.38 µs per loop

cpdef new3_cy_mview(dict val_dict, tuple keys, double[:] new_values, int[:] new_vals_idx, Py_ssize_t length):
    cdef Py_ssize_t i
    cdef int[:] mview_idx = new_vals_idx
    cdef double[:] mview_vals = new_values
    for i in range(length):
        val_dict[keys[i]] = mview_vals[mview_idx[i]]

%timeit new3_cy_mview(val_dict, keys, new_values, new_vals_idx, length)
# 1000000 loops, best of 3: 1.38 µs per loop
# NOT WORKING:
cpdef new2_cy_mview(dict val_dict, tuple keys, double[:] new_values, int[:] new_vals_idx, Py_ssize_t length):
    cdef double [new_vals_idx] masked_vals = new_values
    for key, val in zip(keys, masked_vals.tolist()):
        val_dict[key] = val

cpdef new2_cy_mask(dict val_dict, tuple keys, double[:] new_values, valarr, int[:] new_vals_idx, Py_ssize_t length):
    valarr = new_values[new_vals_idx]
    for key, val in zip(keys, valarr.tolist()):
        val_dict[key] = val
The Cython functions new3_cy and new3_cy_mview do not seem to be considerably faster than old_for. Passing valarr to avoid array construction inside the function (as it is going to be called several million times) even seems to slow it down.
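For the record, what I had in mind when passing the preallocated valarr is an in-place fill, so that no new array object is created on every call; roughly something like this (just a sketch of the intent, not timed, and the function name is only illustrative):

def old_for_w_valarr_inplace(val_dict, keys, new_values, valarr, new_vals_idx, length):
    # write the masked values into the preallocated buffer instead of rebinding the local name
    np.take(new_values, new_vals_idx, out=valarr)  # or: valarr[:] = new_values[new_vals_idx]
    for i in range(length):
        val_dict[keys[i]] = valarr[i]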
Masking in new2_cy_mask with the new_vals_idx array in Cython gives me the error 'Invalid index for memoryview specified, type int[:]'. Is there any type like Py_ssize_t for arrays of indexes?
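In NumPy terms, what I want that failing line to express is ordinary fancy indexing of the values array by the index array; inside the Cython function that would look roughly like this (only a sketch of the intent, converting the typed memoryviews back to ndarrays first):

# typed memoryviews expose the buffer protocol, so np.asarray() gives back ndarrays
masked_vals = np.asarray(new_values)[np.asarray(new_vals_idx)]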
Trying to create a masked memoryview in new2_cy_mview gives me the error 'Cannot assign type 'double[:]' to 'double [__pyx_v_new_vals_idx]''. Is there even something like a masked memoryview? I wasn't able to find any information on this topic...
Comparing the timing results with those from my old question, I guess that the array masking is the step taking up most of the time. And as it is most likely already highly optimized in numpy, there is probably not much I can do about it. But the slowdown is so huge that there must (hopefully) be a better way to do it.
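To check whether the masking step really dominates, I would time just that part in isolation, e.g.:

%timeit new_values[new_vals_idx]
%timeit new_values[new_vals_idx].tolist()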
Any help is appreciated! Thanks in advance!