Theano gradient with respect to a matrix row

As the title suggests, I would like to compute the gradient with respect to a single row of a matrix. In code:

import numpy.random as rng
import theano.tensor as T
from theano import function

t_x = T.matrix('X')
t_w = T.matrix('W')
t_y = T.dot(t_x, t_w.T)

# t_g = T.grad(t_y[0,0], t_x[0])  # what I want, but raises DisconnectedInputError
t_g = T.grad(t_y[0,0], t_x)       # works, but the result contains a lot of unnecessary zeros

f = function([t_x, t_w], [t_y, t_g])
y,g = f(rng.randn(2,5), rng.randn(7,5))

As the comments indicate, the code works without any problems when I compute the gradient with respect to the entire matrix. In this case the gradient is computed correctly, but the result has non-zero entries only in row 0 (because the other rows of x do not appear in the equations for the first row of y).
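To make the zero pattern concrete, here is a small NumPy sketch (plain finite differences, not Theano) using the same shapes as above. The names `y00`, `grad`, and `eps` are only illustrative; the check assumes nothing beyond y = X W^T.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 5))
w = rng.standard_normal((7, 5))

def y00(x_mat):
    # Entry (0, 0) of X @ W.T, the scalar being differentiated.
    return (x_mat @ w.T)[0, 0]

# Central finite differences over every entry of X.
eps = 1e-6
grad = np.zeros_like(x)
for idx in np.ndindex(*x.shape):
    step = np.zeros_like(x)
    step[idx] = eps
    grad[idx] = (y00(x + step) - y00(x - step)) / (2 * eps)

# Row 0 of the gradient matches w[0]; every other row is zero.
print(np.allclose(grad[0], w[0], atol=1e-6))
print(np.allclose(grad[1:], 0.0))
```

Only one row of the full-size gradient carries information, which is why the output for the whole matrix is wasteful when X has many rows.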

I have found this question, which suggests storing each row of the matrix in a separate variable and building the graph from these variables. In my setting, though, I do not know in advance how many rows X will have.

Does anybody have an idea how to get the gradient with respect to a single row of a matrix, or how I could omit the extra zeros in the output? Suggestions on how to stack an arbitrary number of vectors would work as well, I guess.

I realised that it is possible to get rid of the zeros when computing derivatives with respect to the entries in row i:

t_g = T.grad(t_y[i,0], t_x)[i]

and for computing the Jacobian, I found out that

t_g = T.jacobian(t_y[i], t_x)[:,i]

does the trick. However, it seems to have a rather heavy impact on computation speed.
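As a sanity check on what that Jacobian slice should contain, the following NumPy sketch (again finite differences rather than Theano) builds the block of derivatives of row i of y with respect to row i of x. The helper `y_row` and the choice `i = 1` are illustrative assumptions, and the comparison against `w` relies only on y[i] = W x[i].

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 5))
w = rng.standard_normal((7, 5))
i = 1  # row of interest (illustrative choice)

def y_row(x_mat):
    # Row i of X @ W.T, a vector with one entry per row of W.
    return (x_mat @ w.T)[i]

# Central finite differences with respect to the entries of x[i] only.
eps = 1e-6
jac = np.zeros((w.shape[0], x.shape[1]))
for k in range(x.shape[1]):
    step = np.zeros_like(x)
    step[i, k] = eps
    jac[:, k] = (y_row(x + step) - y_row(x - step)) / (2 * eps)

print(jac.shape)                        # one row per output, one column per entry of x[i]
print(np.allclose(jac, w, atol=1e-6))   # compare with differentiating W x[i] by hand
```

For these shapes the block has 7 rows and 5 columns, so a symbolic result of any other shape would indicate that the wrong axis was sliced.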

It would also be possible to approach this problem mathematically. Since row i of t_y is the product of t_w with row i of t_x, the Jacobian of t_y[i] w.r.t. t_x[i] is simply the transpose of t_w.T, which is t_w itself (the transpose of the transpose is the original matrix). Thus, the computation would be as simple as