I have seen a few different mean squared error loss functions in various posts for regression models in Tensorflow:
loss = tf.reduce_sum(tf.pow(prediction - Y,2))/(n_instances)
loss = tf.reduce_mean(tf.squared_difference(prediction, Y))
loss = tf.nn.l2_loss(prediction - Y)
What are the differences between these?
The first and the second loss functions calculate the same thing, but in a slightly different way. The third function calculates something completely different. You can see this by running the three of them side by side:
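A minimal sketch in TF 1.x style (the shape and the random inputs are arbitrary placeholders), evaluating all three expressions on the same pair of tensors:

import numpy as np
import tensorflow as tf

shape = (100, 6, 12)                    # arbitrary example shape
prediction = tf.random_normal(shape=shape)
Y = tf.random_normal(shape=shape)
n_instances = float(np.prod(shape))     # total number of elements

loss1 = tf.reduce_sum(tf.pow(prediction - Y, 2)) / n_instances
loss2 = tf.reduce_mean(tf.squared_difference(prediction, Y))
loss3 = tf.nn.l2_loss(prediction - Y)

with tf.Session() as sess:
    # loss1 and loss2 print (almost) the same value,
    # while loss3 is on a completely different scale
    print(sess.run([loss1, loss2, loss3]))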
Now you can verify that the 1st and the 2nd calculate the same thing (in theory) by noticing that tf.pow(a - b, 2) is the same as tf.squared_difference(a, b), and that reduce_mean is the same as reduce_sum / number_of_elements.
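As a quick sanity check of both claims (again a TF 1.x sketch with arbitrary shapes):

import numpy as np
import tensorflow as tf

a = tf.constant(np.random.rand(3, 4), dtype=tf.float32)
b = tf.constant(np.random.rand(3, 4), dtype=tf.float32)

with tf.Session() as sess:
    p, s = sess.run([tf.pow(a - b, 2), tf.squared_difference(a, b)])
    print(np.allclose(p, s))    # True: the two expressions are element-wise identical
    m, r = sess.run([tf.reduce_mean(a), tf.reduce_sum(a) / (3 * 4)])
    print(np.isclose(m, r))     # True for a tensor this small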
The thing is that computers can't calculate everything exactly. To see what numerical instabilities can do to your calculations, take a look at the example below: the answer should obviously be 1, but you will get something like [1.0, 0.26843545].
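A sketch of the kind of experiment that shows this (TF 1.x again; the tensor size is arbitrary and the exact numbers you see will depend on your hardware and TensorFlow version):

import tensorflow as tf

# A float32 vector of 10**8 ones: the mean is obviously 1.
a = tf.ones([10**8], dtype=tf.float32)

with tf.Session() as sess:
    # Accumulating that many float32 values loses precision, so the
    # sum-then-divide result drifts well below 1, while reduce_mean
    # stays at (or very close to) 1.
    print(sess.run([tf.reduce_mean(a), tf.reduce_sum(a) / 10**8]))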
Regarding your last function, the documentation for tf.nn.l2_loss says that it computes half the L2 norm of a tensor without the sqrt: output = sum(t ** 2) / 2. So if you want it to calculate the same thing (in theory) as the first one, you need to scale it appropriately:
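Something along these lines (a sketch, reusing the variable names from the question):

# 2 * l2_loss undoes the built-in factor of 1/2; dividing by n_instances
# applies the same normalisation as the first formula.
loss = 2 * tf.nn.l2_loss(prediction - Y) / n_instances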
I would say that the third equation is different, while the 1st and 2nd are formally the same but behave differently due to numerical concerns.
I think that the 3rd equation (using l2_loss) is just returning 1/2 of the squared Euclidean norm, that is, the sum of the element-wise squares of the input, which is x = prediction - Y. You are not dividing by the number of samples anywhere. Thus, if you have a very large number of samples, the computation may overflow (returning Inf).
The other two are formally the same, computing the mean of the element-wise squared x tensor. However, while the documentation does not specify it explicitly, it is very likely that reduce_mean uses an algorithm adapted to avoid overflowing with a very large number of samples. In other words, it likely does not try to sum everything first and then divide by N, but uses some kind of rolling mean that can adapt to an arbitrary number of samples without necessarily causing an overflow.
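For illustration, a rolling mean of the kind speculated about here could look like the following sketch (plain NumPy, purely hypothetical; it is not claimed to be what reduce_mean actually does internally):

import numpy as np

def rolling_mean(values):
    # Incremental update: mean_k = mean_{k-1} + (x_k - mean_{k-1}) / k.
    # The estimate stays on the scale of the data itself, so it cannot
    # overflow the way an explicit running sum can.
    mean = np.float32(0.0)
    for k, v in enumerate(values, start=1):
        mean += (v - mean) / np.float32(k)
    return mean

# Values large enough that their float32 sum exceeds the float32 maximum.
x = np.full(10**5, 1e35, dtype=np.float32)

print(x.sum())           # inf: summing first overflows float32
print(rolling_mean(x))   # ~1e35: the rolling mean stays finite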