Passing/Returning Cython Memoryviews vs NumPy Arrays

I am writing Python code to accelerate a region properties function for labeled objects in a binary image. The following code will calculate the number of border pixels of a labeled object in a binary image given the indices of the object. The main() function will cycle through all labeled objects in a binary image 'mask' and calculate the number of border pixels for each one.

I am wondering what the best way is to pass or return my variables in this Cython code. The variables are either NumPy arrays or typed memoryviews. I've experimented with passing and returning the variables in the different formats, but cannot work out which is the best or most efficient. I am new to Cython, so memoryviews are still fairly abstract to me, and whether there is a difference between the two methods remains a mystery. The images I am working with contain 100,000+ labeled objects, so operations such as these need to be fairly efficient.

To summarize:

When, if ever, should I pass/return my variables as typed memoryviews rather than NumPy arrays for very repetitive computations? Is one way best, or are they exactly the same?

```
%%cython --annotate

import numpy as np
import cython
cimport numpy as np

DTYPE = np.intp
ctypedef np.intp_t DTYPE_t

@cython.boundscheck(False)
@cython.wraparound(False)
def erode(DTYPE_t [:,:] img):

    # Image dimensions
    cdef Py_ssize_t height, width
    cdef DTYPE_t local_min
    height = img.shape[0]
    width = img.shape[1]

    # Zero-padded copy of the image
    padded_np = np.zeros((height+2, width+2), dtype=DTYPE)
    cdef DTYPE_t[:,:] padded = padded_np
    padded[1:height+1, 1:width+1] = img

    # Eroded image
    eroded_np = np.zeros((height, width), dtype=DTYPE)
    cdef DTYPE_t[:,:] eroded = eroded_np

    cdef Py_ssize_t i, j
    for i in range(height):
        for j in range(width):
            # Minimum of the pixel and its 4 neighbours (cross-shaped erosion)
            local_min = min(padded[i+1,j+1], padded[i,j+1], padded[i+1,j],
                            padded[i+1,j+2], padded[i+2,j+1])
            eroded[i,j] = local_min
    return eroded_np


@cython.boundscheck(False)
@cython.wraparound(False)
def border_image(region_np):

    # Memoryview of region_np
    cdef DTYPE_t [:,:] region = region_np

    # Image dimensions
    cdef Py_ssize_t ymax, xmax, y, x
    ymax = region.shape[0]
    xmax = region.shape[1]

    # Erode image
    eroded_image_np = erode(region_np)
    cdef DTYPE_t[:,:] eroded_image = eroded_image_np

    # Border image: object pixels that the erosion removes
    border_np = np.zeros((ymax, xmax), dtype=DTYPE)
    cdef DTYPE_t[:,:] border = border_np
    for y in range(ymax):
        for x in range(xmax):
            border[y,x] = region[y,x] - eroded_image[y,x]
    return border_np.sum()


@cython.boundscheck(False)
@cython.wraparound(False)
def main(DTYPE_t[:,:] mask, int numobjects, Py_ssize_t[:,:] indices):

    # Memoryview of boundary pixels
    boundary_pixels_np = np.zeros(numobjects, dtype=DTYPE)
    cdef DTYPE_t[:] boundary_pixels = boundary_pixels_np

    # Loop through each object's bounding box
    cdef Py_ssize_t y_from, y_to, x_from, x_to, i
    cdef DTYPE_t[:,:] region
    for i in range(numobjects):
        y_from = indices[i,0]
        y_to = indices[i,1]
        x_from = indices[i,2]
        x_to = indices[i,3]
        region = mask[y_from:y_to, x_from:x_to]
        boundary_pixels[i] = border_image(region)

    return boundary_pixels_np
```
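Whichever way the arguments are typed, it helps to have a plain NumPy version to check the Cython output against and to generate test inputs. The sketch below is my own, not part of the original code: `bounding_boxes`, `erode_ref` and `border_counts_ref` are hypothetical helper names, and it assumes a label image with objects numbered 1..numobjects (0 = background) plus a 0/1 `mask` derived from it. `scipy.ndimage.find_objects` is another way to get the bounding boxes.

```python
import numpy as np


def bounding_boxes(labels, numobjects):
    """(numobjects, 4) array of (y_from, y_to, x_from, x_to) rows.

    Assumes labels 1..numobjects for objects and 0 for background.
    """
    ys, xs = np.nonzero(labels)
    obj = labels[ys, xs] - 1                  # 0-based object index per pixel
    y_from = np.full(numobjects, labels.shape[0], dtype=np.intp)
    x_from = np.full(numobjects, labels.shape[1], dtype=np.intp)
    y_to = np.zeros(numobjects, dtype=np.intp)
    x_to = np.zeros(numobjects, dtype=np.intp)
    # Unbuffered per-object min/max over all foreground pixels
    np.minimum.at(y_from, obj, ys)
    np.maximum.at(y_to, obj, ys + 1)          # exclusive upper bound
    np.minimum.at(x_from, obj, xs)
    np.maximum.at(x_to, obj, xs + 1)
    return np.stack([y_from, y_to, x_from, x_to], axis=1)


def erode_ref(img):
    """Cross-shaped (4-neighbour) minimum with zero padding, like erode()."""
    p = np.pad(img, 1, mode="constant", constant_values=0)
    return np.minimum.reduce([
        p[1:-1, 1:-1],  # centre
        p[:-2, 1:-1],   # up
        p[2:, 1:-1],    # down
        p[1:-1, :-2],   # left
        p[1:-1, 2:],    # right
    ])


def border_counts_ref(mask, indices):
    """Should match main(mask, len(indices), indices), computed with NumPy."""
    counts = np.zeros(len(indices), dtype=np.intp)
    for i, (y0, y1, x0, x1) in enumerate(indices):
        region = mask[y0:y1, x0:x1]
        counts[i] = (region - erode_ref(region)).sum()
    return counts


# Small example: a 3x3 square and a 2x2 square
labels = np.zeros((6, 8), dtype=np.intp)
labels[1:4, 1:4] = 1
labels[1:3, 5:7] = 2
mask = (labels > 0).astype(np.intp)
indices = bounding_boxes(labels, 2)
print(indices.tolist())                  # [[1, 4, 1, 4], [1, 3, 5, 7]]
print(border_counts_ref(mask, indices))  # [8 4]
```

The `indices` array it produces has dtype `np.intp`, which should be accepted by the `Py_ssize_t[:,:]` parameter of `main()` on common platforms.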

Memoryviews are a more recent addition to Cython, designed to be an improvement compared to the original np.ndarray syntax. For this reason they're slightly preferred. It usually doesn't make too much difference which you use though. Here are a few notes:

Speed

For speed it makes very little difference: in my experience memoryviews as function parameters are marginally slower, but it's hardly worth worrying about.

Generality

Memoryviews are designed to work with any type that exposes Python's buffer interface (for example the standard library array module). Typing as np.ndarray only works with numpy arrays. In principle memoryviews can support an even wider range of memory layouts, which can make interfacing with C code easier (in practice I've never actually seen this be useful).
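Python's built-in `memoryview` and NumPy use the same buffer protocol that Cython's typed memoryviews rely on, so a quick pure-Python check illustrates the generality point. This is only an analogy and compiles no Cython; the idea is that a Cython function typed as `double[:]` should accept `buf` below directly, whereas one typed as `np.ndarray[double]` would reject it because it is not an ndarray.

```python
import array

import numpy as np

# A stdlib array.array is not a NumPy array, but it exports the buffer interface
buf = array.array("d", [1.0, 2.0, 3.0])

# Python's built-in memoryview reads that buffer directly
mv = memoryview(buf)
print(mv.format, mv.shape)   # d (3,)

# NumPy can wrap the same memory without copying
arr = np.frombuffer(buf, dtype=np.float64)
arr[0] = 10.0
print(buf[0])                # 10.0, because arr and buf share memory
```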

As a return value

When returning an array from Cython to Python code, the user will probably be happier with a numpy array than with a memoryview. If you're working with memoryviews you can do either:

```
return np.asarray(mview)  # numpy array wrapping the memoryview's data
return mview.base         # the object the memoryview was created from
```
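The two options differ when the memoryview is a slice. As far as I know, Cython's `.base` returns the object the memoryview was originally acquired from, so for a slice it hands back the whole original array, while `np.asarray(mview)` wraps only the viewed region. Python's built-in `memoryview` behaves analogously through its `.obj` attribute, which lets the difference be shown in plain Python (the built-in view here stands in for a Cython one):

```python
import numpy as np

data = np.arange(6, dtype=np.float64)
mv = memoryview(data)        # built-in memoryview, standing in for a Cython one
sub = mv[2:4]                # a slice of the view

as_array = np.asarray(sub)   # wraps exactly the sliced elements, no copy
print(as_array)              # [2. 3.]
print(np.shares_memory(as_array, data))  # True

# .obj (the built-in analogue of Cython's .base) is the original exporter,
# i.e. the whole 6-element array, not just the slice
print(sub.obj is data)       # True
```

So `np.asarray(mview)` is the safer default when the memoryview might not cover the whole array.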

Ease of compiling

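If you're using np.ndarray (which requires `cimport numpy`), you have to set the include directory with np.get_include() in your setup.py file. You don't have to do this with memoryviews, which often means you can skip setup.py and just use the `cythonize` command-line tool or pyximport for simpler projects. This only helps if the module avoids `cimport numpy` entirely: the code in the question cimports numpy just for `np.intp_t`, and `Py_ssize_t` could be used instead.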
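For the np.ndarray case, a minimal setup.py might look like the sketch below. `region_props` / `region_props.pyx` is a hypothetical module name; the calls (`setuptools.Extension`, `Cython.Build.cythonize`, `numpy.get_include`) are the standard ones.

```python
# setup.py (sketch; "region_props.pyx" is a hypothetical file name)
import numpy as np
from Cython.Build import cythonize
from setuptools import Extension, setup

extensions = [
    Extension(
        "region_props",
        ["region_props.pyx"],
        # Needed because the .pyx file does `cimport numpy`
        include_dirs=[np.get_include()],
    )
]

setup(ext_modules=cythonize(extensions))
```

It would typically be built with `python setup.py build_ext --inplace`. A module that only uses memoryviews and no `cimport numpy` can drop the `include_dirs` entry.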

Parallelization

This is the big advantage of memoryviews compared to numpy arrays (if you want to use it). It does not require the global interpreter lock to take slices of a memoryview but it does for a numpy array. This means that the following code outline can work in parallel with a memoryview:

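```
from cython.parallel import prange
import numpy as np

cdef void somefunc(double[:] x) noexcept nogil:
    pass  # implementation goes here

def run():
    # prange only runs in parallel if compiled with OpenMP (e.g. -fopenmp)
    cdef double[:, :] arr_2d = np.ones((100, 10))
    cdef Py_ssize_t i
    for i in prange(arr_2d.shape[0], nogil=True):
        somefunc(arr_2d[i, :])  # slicing a memoryview needs no GIL
```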

If you aren't using Cython's parallel functionality this doesn't apply.

`cdef` classes

You can use memoryviews as attributes of cdef classes but not np.ndarrays. You can (of course) use numpy arrays as untyped object attributes instead.