
MXNet: nn.Activation vs nd.relu?

Published 2019-07-19 02:46

Question:

I am new to MXNet (I am using it in Python3)

Their tutorial series encourages you to define your own Gluon blocks.

So let's say this is your block (a common convolutional structure):

import mxnet as mx

class CNN1D(mx.gluon.Block):
    def __init__(self, **kwargs):
        super(CNN1D, self).__init__(**kwargs)
        with self.name_scope():
            self.cnn = mx.gluon.nn.Conv1D(10, 1)
            self.bn = mx.gluon.nn.BatchNorm()
            self.ramp = mx.gluon.nn.Activation(activation='relu')

    def forward(self, x):
        x = mx.nd.relu(self.cnn(x))
        x = mx.nd.relu(self.bn(x))
        x = mx.nd.relu(self.ramp(x))
        return x

This mirrors the structure of their example. What is the difference between mx.nd.relu and mx.gluon.nn.Activation?

Should it be

x = self.ramp(x)

instead of

x = mx.nd.relu(self.ramp(x))

Answer 1:

It appears that

mx.gluon.nn.Activation(activation=<act>)

is a wrapper for calling a host of the underlying activations from the NDArray module.
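A quick way to see this (a minimal sketch; the shape and values are arbitrary) is to compare the layer's output against the NDArray function directly:

import mxnet as mx

x = mx.nd.random.uniform(-1, 1, shape=(3, 4))

layer = mx.gluon.nn.Activation('relu')    # parameterless layer, callable without initialize()
print((layer(x) == mx.nd.relu(x)).min())  # 1.0 -> identical outputs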

Thus, in principle, it does not matter whether the forward definition uses

x = self.ramp(x)

or

x = mx.nd.relu(x)

or

x = mx.nd.relu(self.ramp(x))

as relu simply takes the max of 0 and the passed value, so applying it multiple times changes nothing beyond a slight increase in runtime compared to a single call.

So in this case it doesn't really matter. Of course, with other activation functions, stacking multiple calls might change the result.
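For example (a minimal sketch with arbitrary input), relu applied twice matches relu applied once:

import mxnet as mx

x = mx.nd.random.uniform(-1, 1, shape=(2, 5))

once = mx.nd.relu(x)
twice = mx.nd.relu(mx.nd.relu(x))
print((once == twice).min())  # 1.0 -> relu is idempotent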

In MXNet's documentation, nd.relu is used in the forward definition when defining gluon.Blocks. This might carry slightly less overhead than mx.gluon.nn.Activation(activation='relu').

Flavor-wise, the gluon module is meant to be the high-level abstraction. So my opinion is that when defining a block you should declare ramp = mx.gluon.nn.Activation(activation=<act>) rather than use nd.<act>(x), and then call self.ramp(x) in the forward definition.
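Applied to the block from the question, that style would look roughly like this (a sketch, not an official pattern; only the forward changes):

class CNN1D(mx.gluon.Block):
    def __init__(self, **kwargs):
        super(CNN1D, self).__init__(**kwargs)
        with self.name_scope():
            self.cnn = mx.gluon.nn.Conv1D(10, 1)
            self.bn = mx.gluon.nn.BatchNorm()
            self.ramp = mx.gluon.nn.Activation(activation='relu')

    def forward(self, x):
        # activation applied once, through the layer declared above
        return self.ramp(self.bn(self.cnn(x)))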

However, given that at this point all the custom Block tutorials and documentation stick to the relu activation, whether or not this will have lasting consequences remains to be seen.

Altogether, mx.gluon.nn.Activation appears to be a Gluon-level way of calling the activation functions from the NDArray module.



Answer 2:

mx.gluon.nn.Activation wraps around mx.ndarray.Activation; see the Gluon source code.

However, when using Gluon to build a neural net, it is recommended that you stick to the Gluon API rather than branch off arbitrarily into the lower-level MXNet API, which may cause issues as Gluon evolves and potentially changes (e.g. stops using mx.nd under the hood).
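For instance, a purely Gluon version of the same stack could be built with nn.HybridSequential and nn.Activation, so nothing in the model touches mx.nd directly (a sketch; layer sizes and input shape are arbitrary):

import mxnet as mx
from mxnet.gluon import nn

net = nn.HybridSequential()
with net.name_scope():
    net.add(nn.Conv1D(10, 1),
            nn.BatchNorm(),
            nn.Activation('relu'))
net.initialize()

x = mx.nd.random.uniform(shape=(4, 3, 20))  # (batch, channels, width)
print(net(x).shape)                         # (4, 10, 20)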