CNTK has relu, hardmax, softmax, sigmoid, and all that good stuff, but I'm building a regression-based algorithm and the final layer needs to predict two or more regression outputs. So I need n nodes, and the activation should just be a run-of-the-mill linear activation. I see I can set the activation to None; is that in fact the correct thing to do?
import cntk

# feature_var is the model's input variable, defined elsewhere.
with cntk.layers.default_options(activation=cntk.ops.relu, pad=True):
    z = cntk.models.Sequential([
        cntk.models.LayerStack(2, lambda: [
            cntk.layers.Convolution((3, 3), 64),
            cntk.layers.Convolution((3, 3), 64),
            cntk.layers.MaxPooling((3, 3), (2, 2))
        ]),
        cntk.models.LayerStack(2, lambda i: [
            cntk.layers.Dense([256, 128][i]),
            cntk.layers.Dropout(0.5)
        ]),
        cntk.layers.Dense(4, activation=None)  # final layer: 4 regression outputs, no nonlinearity
    ])(feature_var)
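My understanding, as a minimal sketch (assuming a CNTK 2.x install; the input dimension, variable names, and the squared-error loss below are just illustrative, not part of my actual model), is that Dense(..., activation=None) applies only the affine transform W*x + b, which would be exactly the linear output a regression head needs:

    import cntk

    feature_var = cntk.input_variable(10)  # hypothetical 10-dimensional input
    label_var = cntk.input_variable(4)     # 4 real-valued regression targets

    # With activation=None, Dense should compute only W*x + b (no nonlinearity).
    output = cntk.layers.Dense(4, activation=None)(feature_var)

    # For regression, pair the linear output with squared error rather than cross-entropy.
    loss = cntk.squared_error(output, label_var)

If that's right, training this against real-valued targets with squared_error should behave like ordinary linear regression on top of the learned features.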