I converted to cython a python function by just adding some types and compiling it.
I was getting small numerical differences between the results of the python and cython functions.
After some work I found that the differences came from accessing a numpy array using unsigned int instead of int.
I was using unsigned int indices to speed up access according to: http://docs.cython.org/src/userguide/numpy_tutorial.html#tuning-indexing-further
anyway I thought it was harmless to use unsigned ints.
See this code:
cpdef function(np.ndarray[np.float32_t, ndim=2] response, max_loc):
cdef unsigned int x, y
x, y = int(max_loc[0]), int(max_loc[1])
x2, y2 = int(max_loc[0]), int(max_loc[1])
print response[y,x], type(response[y,x]), response.dtype
print response[y2,x2], type(response[y2,x2]), response.dtype
print 2*(response[y,x] - min(response[y,x-1], response[y,x+1]))
print 2*(response[y2,x2] - min(response[y2,x2-1], response[y2,x2+1]))
prints:
0.959878861904 <type 'float'> float32
0.959879 <type 'numpy.float32'> float32
1.04306024313
1.04306030273
Why does this happen?!!! is it a bug?
Ok, as requested here is a SSCCE with the same types and values that I used in my original function
cpdef function():
cdef unsigned int x, y
max_loc2 = np.asarray([ 15., 25.], dtype=float)
cdef np.ndarray[np.float32_t, ndim=2] response2 = np.zeros((49,49), dtype=np.float32)
x, y = int(max_loc2[0]), int(max_loc2[1])
x2, y2 = int(max_loc2[0]), int(max_loc2[1])
response2[y,x] = 0.959878861904
response2[y,x-1] = 0.438348740339
response2[y,x+1] = 0.753262758255
print response2[y,x], type(response2[y,x]), response2.dtype
print response2[y2,x2], type(response2[y2,x2]), response2.dtype
print 2*(response2[y,x] - min(response2[y,x-1], response2[y,x+1]))
print 2*(response2[y2,x2] - min(response2[y2,x2-1], response2[y2,x2+1]))
prints
0.959878861904 <type 'float'> float32
0.959879 <type 'numpy.float32'> float32
1.04306024313
1.04306030273
I use python 2.7.3 cython 0.18 and msvc9 express
I modified the example in the question to make it simpler to read the generated C source for the module. I'm only interested in seeing the logic that creates Python float
objects instead of getting np.float32
objects from the response
array.
I'm using pyximport
to compile the extension module. It saves the generated C file in a subdirectory of ~/.pyxbld
(probably %userprofile%\.pyxbld
on Windows).
import numpy as np
import pyximport
pyximport.install(setup_args={'include_dirs': [np.get_include()]})
open('_tmp.pyx', 'w').write('''
cimport numpy as np
cpdef function(np.ndarray[np.float32_t, ndim=2] response, max_loc):
cdef unsigned int p_one, q_one
p_one = int(max_loc[0])
q_one = int(max_loc[1])
p_two = int(max_loc[0])
q_two = int(max_loc[1])
r_one = response[q_one, p_one]
r_two = response[q_two, p_two]
''')
import _tmp
assert(hasattr(_tmp, 'function'))
Here's the generated C code for the section of interest (a bit reformatted to make it easier to read). It turns out that when you use C unsigned int
index variables, the generated code grabs the data directly from the array buffer and calls PyFloat_FromDouble
, which coerces it to double
. On the other hand, when you use Python int
index variables, it takes the generic approach. It forms a tuple and calls PyObject_GetItem
. This way allows the ndarray
to correctly honor the np.float32
dtype.
#define __Pyx_BufPtrStrided2d(type, buf, i0, s0, i1, s1) \
(type)((char*)buf + i0 * s0 + i1 * s1)
/* "_tmp.pyx":9
* p_two = int(max_loc[0])
* q_two = int(max_loc[1])
* r_one = response[q_one, p_one] # <<<<<<<<<<<<<<
* r_two = response[q_two, p_two]
*/
__pyx_t_3 = __pyx_v_q_one;
__pyx_t_4 = __pyx_v_p_one;
__pyx_t_5 = -1;
if (unlikely(__pyx_t_3 >= (size_t)__pyx_bshape_0_response))
__pyx_t_5 = 0;
if (unlikely(__pyx_t_4 >= (size_t)__pyx_bshape_1_response))
__pyx_t_5 = 1;
if (unlikely(__pyx_t_5 != -1)) {
__Pyx_RaiseBufferIndexError(__pyx_t_5);
{
__pyx_filename = __pyx_f[0];
__pyx_lineno = 9;
__pyx_clineno = __LINE__;
goto __pyx_L1_error;
}
}
__pyx_t_1 = PyFloat_FromDouble((
*__Pyx_BufPtrStrided2d(
__pyx_t_5numpy_float32_t *,
__pyx_bstruct_response.buf,
__pyx_t_3, __pyx_bstride_0_response,
__pyx_t_4, __pyx_bstride_1_response)));
if (unlikely(!__pyx_t_1)) {
__pyx_filename = __pyx_f[0];
__pyx_lineno = 9;
__pyx_clineno = __LINE__;
goto __pyx_L1_error;
}
__Pyx_GOTREF(__pyx_t_1);
__pyx_v_r_one = __pyx_t_1;
__pyx_t_1 = 0;
/* "_tmp.pyx":10
* q_two = int(max_loc[1])
* r_one = response[q_one, p_one]
* r_two = response[q_two, p_two] # <<<<<<<<<<<<<<
*/
__pyx_t_1 = PyTuple_New(2);
if (unlikely(!__pyx_t_1)) {
__pyx_filename = __pyx_f[0];
__pyx_lineno = 10;
__pyx_clineno = __LINE__;
goto __pyx_L1_error;
}
__Pyx_GOTREF(((PyObject *)__pyx_t_1));
__Pyx_INCREF(__pyx_v_q_two);
PyTuple_SET_ITEM(__pyx_t_1, 0, __pyx_v_q_two);
__Pyx_GIVEREF(__pyx_v_q_two);
__Pyx_INCREF(__pyx_v_p_two);
PyTuple_SET_ITEM(__pyx_t_1, 1, __pyx_v_p_two);
__Pyx_GIVEREF(__pyx_v_p_two);
__pyx_t_2 = PyObject_GetItem(
((PyObject *)__pyx_v_response),
((PyObject *)__pyx_t_1));
if (!__pyx_t_2) {
__pyx_filename = __pyx_f[0];
__pyx_lineno = 10;
__pyx_clineno = __LINE__;
goto __pyx_L1_error;
}
__Pyx_GOTREF(__pyx_t_2);
__Pyx_DECREF(((PyObject *)__pyx_t_1));
__pyx_t_1 = 0;
__pyx_v_r_two = __pyx_t_2;
__pyx_t_2 = 0;
Playing around with this on my machine, I don't see a difference. I'm using the ipython notebook with cython magic:
In [1]:
%load_ext cythonmagic
In [12]:
%%cython
import numpy as np
cimport numpy as np
cpdef function(np.ndarray[np.float32_t, ndim=2] response, max_loc):
cdef unsigned int x, y
x, y = int(max_loc[0]), int(max_loc[1])
x2, y2 = int(max_loc[0]), int(max_loc[1])
#return 2*(response[y,x] - min(response[y,x-1], response[y,x+1])), 2*(response[y2,x2] - min(response[y2,x2-1], response[y2,x2+1]))
print response[y,x], type(response[y,x]), response.dtype
print response[y2,x2], type(response[y2,x2]), response.dtype
print 2*(response[y,x] - min(response[y,x-1], response[y,x+1]))
print 2*(response[y2,x2] - min(response[y2,x2-1], response[y2,x2+1]))
In [13]:
a = np.random.normal(size=(10,10)).astype(np.float32)
m = [3,2]
function(a,m)
0.586090564728 <type 'float'> float32
0.586091 <type 'numpy.float32'> float32
4.39655685425
4.39655685425
The first pair of results, the difference is just the output precision of the print statement. What version of Cython are you using? The indices are extremely unlikely to effect the answer since it is just accessing a fixed length of memory that the data attribute of the numpy array is storing.