TensorFlow custom activation function

Posted 2019-08-17 23:46

I implemented a network with TensorFlow and created the model with the following code:

def multilayer_perceptron(x, weights, biases):
    layer_1 = tf.add(tf.matmul(x, weights["h1"]), biases["b1"])
    layer_1 = tf.nn.relu(layer_1)
    out_layer = tf.add(tf.matmul(layer_1, weights["out"]), biases["out"])
    return out_layer

I initialize the weights and biases as follows:

weights = {
    "h": tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    "out": tf.Variable(tf.random_normal([n_hidden_1, n_classes]))
    }

biases = {
    "b": tf.Variable(tf.random_normal([n_hidden_1])),
    "out": tf.Variable(tf.random_normal([n_classes]))
    }

Now I want to use a custom activation function. Therefore I replaced tf.nn.relu(layer_1) with a custom activation function custom_sigmoid(layer_1) which is defined as:

def custom_sigmoid(x):
    # one beta per unit of the layer the activation is applied to
    beta = tf.Variable(tf.random.normal([int(x.get_shape()[1])]))
    return tf.sigmoid(beta * x)

Here beta is meant to be a trainable parameter. I realized that this cannot work as written, since I don't know how to implement the derivative so that TensorFlow can use it.

Question: How can I use a custom activation function in TensorFlow? I would really appreciate any help.

2 Answers
时光不老,我们不散
#2 · 2019-08-18 00:11

I'll try to answer my own question. Here is what I did, and it seems to work:

First I define a custom activation function:

def custom_sigmoid(x, beta_weights):
    return tf.sigmoid(beta_weights * x)

Then I create weights for the activation function:

beta_weights = {
    "beta1": tf.Variable(tf.random_normal([n_hidden_1]))
    }

Finally I add beta_weights to my model function and replace the activation function in multilayer_perceptron():

def multilayer_perceptron(x, weights, biases, beta_weights):
    layer_1 = tf.add(tf.matmul(x, weights["h1"]), biases["b1"])
    #layer_1 = tf.nn.relu(layer_1) # Old
    layer_1 = custom_sigmoid(layer_1, beta_weights["beta1"]) # New
    out_layer = tf.add(tf.matmul(layer_1, weights["out"]), biases["out"])
    return out_layer
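
To make the whole model trainable end to end, here is a minimal sketch of how it can be wired up (TF 1.x style). It assumes the weights, biases and beta_weights dictionaries defined above; the placeholder shapes, the softmax cross-entropy loss and the gradient-descent optimizer are only examples, not part of the original model:

import tensorflow as tf

# Placeholders for inputs and one-hot labels (shapes assumed)
x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])

logits = multilayer_perceptron(x, weights, biases, beta_weights)

# Any differentiable loss works; softmax cross-entropy is only an example
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=logits))

# beta_weights["beta1"] is an ordinary trainable tf.Variable, so the
# optimizer updates it together with all the other weights and biases
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)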
一纸荒年 Trace。
#3 · 2019-08-18 00:17

That's the beauty of automatic differentiation! You don't need to know how to compute the derivative of your function, as long as you build it entirely from TensorFlow constructs that are themselves differentiable (only a few TensorFlow ops are non-differentiable).

For everything else, TensorFlow computes the derivative for you: any combination of inherently differentiable operations can be used, and you never need to think about the gradient yourself. You can validate this with tf.gradients in a test case, to confirm that TensorFlow is computing the gradient with respect to your cost function.
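
For example, here is a small self-contained check (a sketch of my own, with made-up shapes and a dummy cost) that tf.gradients produces a gradient for beta through the custom sigmoid:

import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 3])
beta = tf.Variable(tf.ones([3]))
activation = tf.sigmoid(beta * x)             # the custom activation
loss = tf.reduce_mean(tf.square(activation))  # a dummy cost function

# tf.gradients returns a real tensor (not None) because multiply, sigmoid,
# square and reduce_mean are all differentiable ops
grad_beta = tf.gradients(loss, beta)[0]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    feed = {x: np.random.rand(4, 3).astype(np.float32)}
    print(sess.run(grad_beta, feed_dict=feed))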

Here's a really nice explanation of automatic differentiation for the curious:

https://alexey.radul.name/ideas/2013/introduction-to-automatic-differentiation/

You can make sure that beta is a trainable parameter by checking that it exists in the collection tf.GraphKeys.TRAINABLE_VARIABLES; that means the optimizer will compute its derivative w.r.t. the cost and update it (if it's not in that collection, you should investigate).
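
A minimal sketch of that check, assuming the beta_weights dictionary from the answer above:

# The trainable collection holds every variable the optimizer will update
trainable = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)
print(beta_weights["beta1"] in trainable)  # should print True

# tf.trainable_variables() returns the same collection
for v in tf.trainable_variables():
    print(v.name, v.shape)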
