I am struggling a bit with the many int data types in Cython: `np.int`, `np.int_`, `np.int_t`, `int`.

I guess `int` in pure Python is equivalent to `np.int_`, so where does `np.int` come from? I cannot find it in the NumPy documentation. Also, why does `np.int_` exist given that we already have `int`?

In Cython, I guess `int` becomes a C type when used as `cdef int` or `ndarray[int]`, and when used as `int()` it stays the Python caster?

Is `np.int_` equivalent to `long` in C? So is `cdef long` identical to `cdef np.int_`?

Under what circumstances should I use `np.int_t` instead of `np.int`? E.g. `cdef np.int_t`, `ndarray[np.int_t]` ...

Can someone briefly explain how the wrong use of those types would affect the performance of compiled Cython code?

It's a bit complicated because the names have different meanings depending on the context.
int
In Python
`int` is normally just a Python type. It has arbitrary precision, meaning that you can store any conceivable integer in it (as long as you have enough memory).

However, when you use it as the `dtype` of a NumPy array, it is interpreted as `np.int_` [1], which does not have arbitrary precision: it has the same size as C's `long` (at least in NumPy 1.x; NumPy 2.0 made it pointer-sized, which differs from `long` only on 64-bit Windows). That also means that `dtype=int` and `dtype=np.int_` are equivalent.
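
The two points above can be checked with a minimal sketch along these lines. It assumes NumPy is installed and imported as `np`; the variable names `big`, `a` and `b` are only illustrative, and the reported item size depends on the platform and NumPy version.

```python
import numpy as np

# A plain Python int has arbitrary precision.
big = 2 ** 100
print(big)

# As a dtype, the built-in int is interpreted as np.int_.
a = np.array([1, 2, 3], dtype=int)
b = np.array([1, 2, 3], dtype=np.int_)
print(a.dtype == b.dtype)

# Unlike a Python int, each array element has a fixed size in bytes.
print(a.dtype.itemsize)
```
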

As Cython type identifier
Here it has another meaning: it stands for the C type `int`, which has limited precision (typically 32 bits). You can use it as a Cython type, for example when defining variables with `cdef`, as the return or argument type of `cdef` or `cpdef` functions, as the "generic" type for `ndarray`, and for type casting. And probably many more.
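
To get a feel for what "limited precision" means without compiling anything, here is a rough sketch that uses the standard-library `ctypes` module as a stand-in for a C `int`. This is not Cython code, and the wrap-around shown is simply how `ctypes` handles out-of-range values; it only illustrates that a C `int` has a fixed width while a Python `int` does not.

```python
import ctypes

# Width of the platform's C int in bits (typically 32).
bits = ctypes.sizeof(ctypes.c_int) * 8

# A Python int can easily hold a value one past the largest C int ...
too_big = 2 ** (bits - 1)

# ... but ctypes does no overflow checking, so storing it in a
# fixed-width C int wraps around to the most negative value.
wrapped = ctypes.c_int(too_big).value

print(bits, too_big, wrapped)
```
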

In Cython but as Python type
You can still call `int` and you'll get a "Python int" (of arbitrary precision), or use it for `isinstance` or as the `dtype` argument for `np.array`. Here the context is important, so converting to a Python `int` is different from converting to a C `int`.

np.int
Actually this is very easy. It's just an alias for `int`, so everything from above applies to `np.int` as well. However, you can't use it as a type identifier except when you use it on the `cimport`ed package. In that case it represents the Python integer type: an argument `obj` declared this way is expected to be a Python integer, not a NumPy type.

My advice regarding `np.int`: avoid it whenever possible. In Python code it's equivalent to `int`, and in Cython code it's also equivalent to Python's `int`, but if used as a type identifier it will probably confuse you and everyone who reads the code! It certainly confused me... (The Python-level alias `np.int` was deprecated in NumPy 1.20 and removed in NumPy 1.24, which is one more reason to avoid it.)

np.int_
Actually it only has one meaning: it's a Python type that represents a NumPy scalar type. You can call it like Python's `int`, or use it to specify the `dtype`, for example with `np.array`. But you cannot use it as a type identifier in Cython.
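
Both Python-level uses might look roughly like this. It is a small sketch assuming NumPy is imported as `np`; the names `x` and `arr` are only illustrative.

```python
import numpy as np

# Call it like Python's int to create a NumPy integer scalar.
x = np.int_(10)
print(type(x))

# Or pass it as the dtype when creating an array.
arr = np.array([1, 2, 3], dtype=np.int_)
print(arr.dtype)

# The scalar is an instance of NumPy's abstract integer type.
print(isinstance(x, np.integer))
```
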
cnp.int_t
It's the type identifier version of `np.int_`. That means you can't use it as a `dtype` argument, but you can use it as the type in `cdef` declarations. The type identifier with the trailing `_t` actually represents the element type of an array whose dtype is the same name without the trailing `_t`. You can't interchange them in Cython code!

Notes
There are several more numeric types in NumPy. Each NumPy dtype has a corresponding Cython type identifier, and a C type identifier that can also be used in Cython; the mapping is basically taken from the NumPy documentation and the Cython NumPy `pxd` file.

Actually there are Cython types for `np.bool_`: `cnp.npy_bool` and `bint`, but neither can currently be used for NumPy arrays. For scalars, `cnp.npy_bool` will just be an unsigned integer while `bint` will be a boolean. Not sure what's going on there...

[1] Taken from the NumPy documentation, "Data type objects".

`np.int_` is the default integer type (as defined in the NumPy docs); on a 64-bit system this would be a C `long`. `np.intc` corresponds to the C `int`, either `int32` or `int64`. `np.int` is an alias for the built-in `int` function.

The Cython data types should reflect C data types, so `cdef int a` is a C `int`, and so on.

As for `np.int_t`, that is the Cython compile-time equivalent of the NumPy `np.int_` data type; `np.int64_t` is the Cython compile-time equivalent of `np.int64`.
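
If you are unsure how these types map on your own machine, a quick check such as the following can help. It is only a sketch, assuming NumPy is imported as `np` and using the standard-library `ctypes` module to query the size of the platform's C `int`; the printed widths vary by platform and NumPy version.

```python
import ctypes
import numpy as np

# Print the width in bits of a few NumPy integer dtypes on this platform.
for name in ("intc", "int_", "int32", "int64"):
    dtype = np.dtype(getattr(np, name))
    print(name, dtype.itemsize * 8, "bits")

# np.intc is meant to match the platform's C int.
intc_matches_c_int = np.dtype(np.intc).itemsize == ctypes.sizeof(ctypes.c_int)
print(intc_matches_c_int)
```
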