I'm trying to develop a small convolutional neural network framework in Python. The code for the convolutional node already works (slowly), and I would like to speed it up. The hotspots are the loops where the convolutional filter is moved across the image, so I chose Cython to speed up those loops.
The obvious small annotations (cdef for all local variables, disabling boundscheck) shaved barely 10% off my runtime. That seemed strange to me: from what I read online, those alone should already let Cython work its magic.
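By "obvious small annotations" I mean this kind of thing (an illustrative toy, not my actual kernel):

@cython.boundscheck(False)
@cython.wraparound(False)
def scale(double[:, :] a, double factor):
    cdef Py_ssize_t i, j
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            a[i, j] *= factor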
Unfortunately the code lives inside a class and relies heavily on the attributes of that class, so I decided to convert it into a cdef class. This means that all class attributes have to be declared with cdef. Apparently Cython doesn't support the np.ndarray[...] buffer syntax for class attributes, so I declared all NumPy arrays as typed memoryviews (double[:, :, :, :] and so on).
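For reference, the attribute declarations now look roughly like this (abbreviated; the names match the forward method below):

cdef class ConvNode:
    cdef double[:, :, :, :] x, y       # input/output buffers as typed memoryviews
    cdef double[:, :] W, b, in_cols
    cdef int batch_size, in_colors, in_width, in_height
    cdef int out_colors, out_width, out_height
    cdef int filter_size, stride, padding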
Until then the code worked fine, with all unit tests passing. The compilation to a .pyd (I'm working under Windows) also still works, but running the code now raises a TypeError:
TypeError: only length-1 arrays can be converted to Python scalars
Here is some code. It is the entire forward method of my convolutional node, which might be too much and not easily readable. You probably only need the very last line; that's where the error happens:
@cython.boundscheck(False)
@cython.nonecheck(False)
def forward(self):
    # im2col: x -> in_cols
    # padding
    cdef np.ndarray[DTYPE_t, ndim=4] x_padded = np.zeros((self.batch_size, self.in_colors, self.in_width + self.padding*2, self.in_height + self.padding*2))
    if self.padding > 0:
        x_padded[:, :, self.padding:self.in_width + self.padding, self.padding:self.in_height + self.padding] = self.x
    else:
        x_padded[:] = self.x

    # allocating new field
    cdef np.ndarray[DTYPE_t, ndim=4] rec_fields = np.empty((self.filter_size**2 * self.in_colors, self.batch_size, self.out_width, self.out_height))

    # copying receptive fields
    cdef int w, h
    for w, h in np.ndindex((self.out_width, self.out_height)):
        rec_fields[:, :, w, h] = x_padded[:, :, w*self.stride:w*self.stride + self.filter_size, h*self.stride:h*self.stride + self.filter_size] \
            .reshape((self.batch_size, self.filter_size**2 * self.in_colors)) \
            .T

    self.in_cols = rec_fields.reshape((self.filter_size**2 * self.in_colors, self.batch_size * self.out_width * self.out_height))

    # linear node: in_cols -> out_cols
    cdef np.ndarray[DTYPE_t, ndim=2] out_cols = np.dot(self.W, self.in_cols) + self.b

    # col2im: out_cols -> out_image -> y
    cdef np.ndarray[DTYPE_t, ndim=4] out_image = out_cols.reshape((self.out_colors, self.batch_size, self.out_width, self.out_height))
    self.y[:] = out_image.transpose(1, 0, 2, 3)
This last call to transpose is the line marked in the exception. I can't explain this. Do memoryviews behave differently when transposed?
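For comparison, the same pattern with plain ndarrays works as far as I can tell, so the memoryview layer seems to be what makes the difference:

import numpy as np

y = np.empty((2, 3, 4, 5))
out_image = np.arange(2*3*4*5, dtype=np.float64).reshape((3, 2, 4, 5))
y[:] = out_image.transpose(1, 0, 2, 3)   # no error with plain NumPy arrays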
UPDATE:
I'm sure that the dimensions are defined correctly. A dimension mismatch produces a different runtime error; I can't check right now, but it was something like "got 4-dim, expected 2-dim". I have to say that I'm extremely impressed by Cython's type system: this kind of runtime type information in a Python exception is rather useful. Sadly, it doesn't explain why the transpose above fails.
UPDATE:
There's a complication with the arrays: they must never be replaced by new arrays, only used as shared references and written into in place.
It's a little difficult to explain: at the core of the neural network is a loop which calls the method forward() on all nodes in the network consecutively:
for node in self.nodes:
    node.forward()
In this method each node looks at its input data, does some computations and writes to its output. It relies on the fact that the input already contains the correct data.
For the setup of my network I store the nodes in the right order and connect them manually:
node2.x = node1.y
Now if I write
self.y[:] = data
in the forward method of node1, then node2 automatically has the correct input. This requires careful programming: the forward methods must be called in the right order, and the output arrays must never be rebound, only written into.
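To make that contract concrete, here is a toy version of the wiring in plain Python (simplified, not my actual classes):

import numpy as np

class Node:
    def __init__(self, shape):
        self.x = None                # bound to the previous node's y during setup
        self.y = np.empty(shape)     # allocated once, never rebound

    def forward(self):
        self.y[:] = self.x * 2.0     # write in place so downstream references stay valid

node1 = Node((4, 4))
node2 = Node((4, 4))
node1.x = np.ones((4, 4))
node2.x = node1.y                    # node2 reads node1's output buffer directly
for node in (node1, node2):
    node.forward()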
The alternative would be a huge structure in which I store the output of each node and pass this data around. That would create lots of boilerplate code and mess up the forward and backward passes.
UPDATE:
The last few lines in forward now look like this:
cdef np.ndarray[DTYPE_t, ndim=4] out_image = out_cols.reshape((self.out_colors, self.batch_size, self.out_width, self.out_height))
cdef double[:, :, :, :] temp
temp = out_image.transpose(1, 0, 2, 3)
self.y[...] = temp
The assignment to temp fails with the same TypeError message.