I am new to MXNet (I am using it in Python3).
Their tutorial series encourages you to define your own gluon blocks.
So let's say this is your block (a common convolution structure):
import mxnet as mx

class CNN1D(mx.gluon.Block):
    def __init__(self, **kwargs):
        super(CNN1D, self).__init__(**kwargs)
        with self.name_scope():
            self.cnn = mx.gluon.nn.Conv1D(10, 1)
            self.bn = mx.gluon.nn.BatchNorm()
            self.ramp = mx.gluon.nn.Activation(activation='relu')

    def forward(self, x):
        x = mx.nd.relu(self.cnn(x))
        x = mx.nd.relu(self.bn(x))
        x = mx.nd.relu(self.ramp(x))
        return x
This mirrors the structure of their example.
What is the difference between mx.nd.relu and mx.gluon.nn.Activation?
Should it be x = self.ramp(x) instead of x = mx.nd.relu(self.ramp(x))?
mx.gluon.nn.Activation wraps around mx.ndarray.Activation; see the Gluon source code.
However, when using Gluon to build a neural net, it is recommended that you use the Gluon API and not branch off into the lower-level MXNet API arbitrarily - that may cause issues as Gluon evolves and potentially changes (e.g. it could stop using mx.nd under the hood).
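As a quick sanity check (a minimal sketch, assuming a recent MXNet 1.x install), both paths give the same result on a toy array; the Activation block holds no parameters, so it can be called directly:

import mxnet as mx

x = mx.nd.array([-2.0, -0.5, 0.0, 1.5])

ramp = mx.gluon.nn.Activation(activation='relu')  # parameter-free block
print(ramp(x))        # [0.  0.  0.  1.5]
print(mx.nd.relu(x))  # [0.  0.  0.  1.5]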
It appears that mx.gluon.nn.Activation is a wrapper for calling a host of the underlying activations from the NDArray module. Thus - in principle - it does not matter whether the forward definition uses self.ramp(x), mx.nd.relu(x), or even mx.nd.relu(self.ramp(x)): relu simply takes the maximum of 0 and the passed value, so applying it multiple times yields the same result as a single call, aside from a slight increase in runtime.
So in this case it does not really matter. Of course, with other activation functions stacking multiple calls might have an impact.
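A small sketch of that point (toy values, printed numbers are approximate): relu is idempotent, while an activation such as sigmoid is not, so stacking it would change the output.

import mxnet as mx

x = mx.nd.array([-3.0, 0.0, 2.0])

# relu: a second application changes nothing
print(mx.nd.relu(x))                # [0. 0. 2.]
print(mx.nd.relu(mx.nd.relu(x)))    # [0. 0. 2.]

# sigmoid: stacking it shifts the values
print(mx.nd.sigmoid(x))                  # ~[0.047 0.5   0.881]
print(mx.nd.sigmoid(mx.nd.sigmoid(x)))   # ~[0.512 0.622 0.707]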
In MXNet's documentation they use nd.relu in the forward definition when defining gluon.Blocks. This might carry slightly less overhead than using mx.gluon.nn.Activation(activation='relu').
Flavor-wise, the gluon module is meant to be the high-level abstraction. Therefore I am of the opinion that when defining a block one should use ramp = mx.gluon.nn.Activation(activation=<act>) instead of nd.<act>(x), and then call self.ramp(x) in the forward definition. However, given that at this point all custom Block tutorials / documentation stick to the relu activation, whether or not this will have lasting consequences remains to be seen.

All together, the use of mx.gluon.nn.Activation seems to be a way to call activation functions from the NDArray module from within the Gluon module.
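Following that opinion, a minimal rewrite of the block from the question could look like the sketch below (assuming the intended structure is conv -> batchnorm -> relu, with the activation applied exactly once via the stored Gluon block):

import mxnet as mx

class CNN1D(mx.gluon.Block):
    def __init__(self, **kwargs):
        super(CNN1D, self).__init__(**kwargs)
        with self.name_scope():
            self.cnn = mx.gluon.nn.Conv1D(10, 1)
            self.bn = mx.gluon.nn.BatchNorm()
            self.ramp = mx.gluon.nn.Activation(activation='relu')

    def forward(self, x):
        x = self.cnn(x)
        x = self.bn(x)
        x = self.ramp(x)  # single relu via the Gluon Activation block
        return x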