Theano max_pool_3d

Question:

How do I extend Theano's downsample.max_pool_2d_same_size in order to pool not only within a feature map, but also between feature maps, in an efficient manner?

Let's say I have 3 feature maps, each of size 10x10, i.e. a 4D tensor of shape (1, 3, 10, 10). First, max-pool each (10, 10) feature map with non-overlapping (2, 2) windows. The result is 3 sparse feature maps, still (10, 10), but with most values equal to zero: within each (2, 2) window, at most one value is greater than zero. This is what downsample.max_pool_2d_same_size does.
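To make that concrete, here is a small NumPy-only illustration (not Theano, and with made-up values) of what "same-size" pooling on a single 4x4 map with non-overlapping (2, 2) windows looks like:

import numpy as np

# Toy 4x4 feature map (hypothetical values).
fmap = np.array([[1, 2, 0, 3],
                 [4, 0, 1, 1],
                 [0, 5, 2, 0],
                 [1, 1, 0, 6]], dtype=np.float32)

# "Same-size" max pooling: zero everything except the maximum of each
# non-overlapping (2, 2) window, keeping the original (4, 4) shape.
pooled = np.zeros_like(fmap)
for i in range(0, 4, 2):
    for j in range(0, 4, 2):
        window = fmap[i:i + 2, j:j + 2]
        r, c = np.unravel_index(window.argmax(), window.shape)
        pooled[i + r, j + c] = window[r, c]

print(pooled)
# [[0. 0. 0. 3.]
#  [4. 0. 0. 0.]
#  [0. 5. 0. 0.]
#  [0. 0. 0. 6.]]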

Next, I want to compare each maximum of a given (2, 2) window with the maxima of the same window position in all other feature maps, and keep only the maxima across all feature maps. The result is again 3 feature maps of shape (10, 10), with nearly all values equal to zero.
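Here is a rough NumPy sketch (untested, just to pin down what I mean) of that cross-map step, assuming pooled is the per-map same-size result of shape (batch, channels, height, width) with height and width divisible by the window size:

import numpy as np

def cross_map_max(pooled, ws=2):
    # Keep a value only if it is the maximum over *all* channels within its
    # (ws, ws) window; everything else becomes zero. Ties are all kept.
    b, c, h, w = pooled.shape
    view = pooled.reshape(b, c, h // ws, ws, w // ws, ws)
    window_max = view.max(axis=(1, 3, 5), keepdims=True)  # max over channels and window
    return (view * (view == window_max)).reshape(b, c, h, w)

The same idea should be expressible in Theano with reshape plus T.max over those axes, but I don't know whether that is the efficient way to do it.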

Is there a fast way of doing this? I wouldn't mind other max-pooling functions, but I need the exact locations of the maxima for pooling/unpooling purposes (but that's another topic).

Answer 1:

I solved it using Lasagne with cuDNN. Here are some minimal examples of how to get the indices of a max-pooling operation (2D and 3D). The trick is to take the gradient of the pooled output with respect to the pooling input while supplying ones via known_grads; that gradient is 1 exactly at the positions the pooling selected and 0 everywhere else. See https://groups.google.com/forum/#!topic/lasagne-users/BhtKsRmFei4

import numpy as np
import theano
import theano.tensor as T
from theano.tensor.type import TensorType
from theano.configparser import config
import lasagne

def tensor5(name=None, dtype=None):
    """Create a symbolic 5D tensor variable (older Theano has no built-in tensor5)."""
    if dtype is None:
        dtype = config.floatX
    ttype = TensorType(dtype, (False, False, False, False, False))
    return ttype(name)

def max_pooling_2d():
    input_var = T.tensor4('input')
    input_layer = lasagne.layers.InputLayer(shape=(None, 2, 4, 4), input_var=input_var)
    max_pool_layer = lasagne.layers.MaxPool2DLayer(input_layer, pool_size=(2, 2))

    pool_in, pool_out = lasagne.layers.get_output([input_layer, max_pool_layer])
    # Gradient of sum(pool_out) w.r.t. pool_in (via known_grads of ones):
    # 1.0 at the positions selected by the max pooling, 0.0 everywhere else.
    indices = T.grad(None, wrt=pool_in, known_grads={pool_out: T.ones_like(pool_out)})
    get_indices_fn = theano.function([input_var], indices, allow_input_downcast=True)

    data = np.random.randint(low=0, high=9, size=32).reshape((1, 2, 4, 4))
    indices = get_indices_fn(data)
    print(data, "\n\n", indices)

def max_pooling_3d():
    input_var = tensor5('input')
    input_layer = lasagne.layers.InputLayer(shape=(1, 1, 2, 4, 4), input_var=input_var)
    # 5 input dimensions: (batchsize, channels, 3 spatial dimensions)
    max_pool_layer = lasagne.layers.dnn.MaxPool3DDNNLayer(input_layer, pool_size=(2, 2, 2))

    pool_in, pool_out = lasagne.layers.get_output([input_layer, max_pool_layer])
    # Same gradient trick as in the 2D case: the result is a mask marking the
    # positions picked by the 3D max pooling.
    indices = T.grad(None, wrt=pool_in, known_grads={pool_out: T.ones_like(pool_out)})
    get_indices_fn = theano.function([input_var], indices, allow_input_downcast=True)

    data = np.random.randint(low=0, high=9, size=32).reshape((1, 1, 2, 4, 4))
    indices = get_indices_fn(data)
    print(data, "\n\n", indices)
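For completeness, one possible way to run the two examples (my assumption about how to drive them; the 3D variant needs a GPU with cuDNN because it uses lasagne.layers.dnn):

if __name__ == '__main__':
    # Each call prints the random input followed by the "indices" tensor: a mask
    # of the same shape as the input, 1.0 where the pooling picked its maxima
    # and 0.0 everywhere else.
    max_pooling_2d()
    max_pooling_3d()  # requires cuDNN (lasagne.layers.dnn)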