Activation function for output layer for regressio

2020-03-01 03:36发布

I have been experimenting with neural networks these days. I have come across a general question regarding the activation function to use. This might be a well known fact to but I couldn't understand properly. A lot of the examples and papers I have seen are working on classification problems and they either use sigmoid (in binary case) or softmax (in multi-class case) as the activation function in the out put layer and it makes sense. But I haven't seen any activation function used in the output layer of a regression model.

So my question is that is it by choice we don't use any activation function in the output layer of a regression model as we don't want the activation function to limit or put restrictions on the value. The output value can be any number and as big as thousands so the activation function like sigmoid to tanh won't make sense. Or is there any other reason? Or we actually can use some activation function which are made for these kind of problems?

3条回答
The star\"
2楼-- · 2020-03-01 04:15

Just use a linear activation function without limiting the output value range unless you have some reasonable assumption about it.

查看更多
祖国的老花朵
3楼-- · 2020-03-01 04:22

for linear regression type of problem, you can simply create the Output layer without any activation function as we are interested in numerical values without any transformation.

more info :

https://machinelearningmastery.com/regression-tutorial-keras-deep-learning-library-python/

for classification : You can use sigmoid, tanh, Softmax etc.

查看更多
走好不送
4楼-- · 2020-03-01 04:38

If you have, say, a Sigmoid as an activation function in output layer of your NN you will never get any value less than 0 and greater than 1.

Basically if the data your're trying to predict are distributed within that range you might approach with a Sigmoid function and test if your prediction performs well on your training set.

Even more general, when predict a data you should come up with the function that represents your data in the most effective way.

Hence if your real data does not fit Sigmoid function well you have to think of any other function (e.g. some polynomial function, or periodic function or any other or a combination of them) but you also should always care of how easily you will build your cost function and evaluate derivatives.

查看更多
登录 后发表回答