I've been going through a few tutorials on using neural networks for keypoint detection. I've noticed that for the inputs (images) it's very common to divide by 255 (normalizing to [0, 1], since pixel values fall between 0 and 255). But for the targets (X/Y coordinates) it seems more common to normalize to [-1, 1]. Is there a reason for this disparity?
X = np.vstack(df['Image'].values) / 255. # scale pixel values to [0, 1]
y = (y - 48) / 48 # scale target coordinates to [-1, 1]
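For reference, here is a minimal sketch of those two scalings together with the inverse needed to turn predictions back into pixel coordinates. It assumes 96x96 images (so that 48 is half the image size); the helper names and sample values are made up for illustration and are not from any tutorial.

```python
import numpy as np

IMAGE_SIZE = 96  # assumption: 96x96 images, so half the size is 48


def scale_pixels(pixels):
    """Map 8-bit pixel values from [0, 255] to [0, 1]."""
    return np.asarray(pixels, dtype=np.float32) / 255.0


def scale_targets(coords, size=IMAGE_SIZE):
    """Map coordinates from [0, size] to [-1, 1]."""
    half = size / 2.0
    return (np.asarray(coords, dtype=np.float32) - half) / half


def unscale_targets(scaled, size=IMAGE_SIZE):
    """Invert scale_targets: map [-1, 1] back to [0, size]."""
    half = size / 2.0
    return np.asarray(scaled, dtype=np.float32) * half + half


# Made-up sample values covering both ends of each range
pixels = np.array([0, 128, 255])
coords = np.array([0.0, 48.0, 96.0])

scaled_pixels = scale_pixels(pixels)
scaled_coords = scale_targets(coords)
restored = unscale_targets(scaled_coords)
```

The inverse matters in practice: a network trained on scaled targets predicts in [-1, 1], and those predictions have to be mapped back before they can be drawn on the image.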
I think the most common image normalization for neural networks in general is to subtract the mean of the image and divide by its standard deviation.
I think keypoint detection problems should not be too different.
It might be interesting to see the differences in performance. My guess is that subtracting the mean and dividing by the std (which centers the values around zero, similar in spirit to a [-1, 1] range) will converge more quickly than a [0, 1] normalization.
This is because the biases the model has to learn will be smaller, so they take less time to reach if they are initialised at 0.
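To make the comparison concrete, here is a rough sketch of both normalizations on a fake batch of images. The random data and the use of dataset-wide statistics are assumptions for illustration (per-image statistics are another option). Note that standardization gives zero mean and unit variance, but it does not strictly bound the values to [-1, 1].

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumption: a fake batch of 16 grayscale 8-bit images of size 96x96
images = rng.integers(0, 256, size=(16, 96, 96)).astype(np.float32)

# Option 1: rescale to [0, 1]; the mean stays well above zero
scaled = images / 255.0

# Option 2: standardize with statistics computed over the whole batch
mean = images.mean()
std = images.std()
standardized = (images - mean) / std

print("scaled mean:", scaled.mean())
print("standardized mean/std:", standardized.mean(), standardized.std())
print("standardized min/max:", standardized.min(), standardized.max())
```

With [0, 1] scaling the inputs are all non-negative, whereas the standardized inputs are centered on zero, which is the property the convergence argument relies on.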
In my view, in theory it should not make much of a difference how you normalise the values.
In practice, though, these details do matter in ML.
Normalising the pixel range from 0-255 to 0-1 speeds up convergence. You can also scale the inputs to between -1 and 1. I have used this range in a lot of problems without any issues.
But for the output it is a little trickier. Using the range 0 to 1 is not a good idea because of the activation function you are using. ReLU is max(0, x), so its nonlinearity only comes into play when it also receives negative values; that is the whole point of ReLU. Tanh, likewise, produces values between -1 and 1. With targets in the range 0 to 1, the only choice you are left with is the sigmoid function, which does not perform as well as ReLU or tanh. The problems with sigmoid are vanishing gradients and the fact that it is not zero-centered, which leads to somewhat zig-zagging gradient updates for the weights. You can look for it here.
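The sigmoid vs. tanh points can be checked numerically. This is only an illustrative sketch: the input grid is arbitrary, and the derivative formulas are the standard closed forms for the two functions.

```python
import numpy as np


def sigmoid(x):
    """Logistic sigmoid, with outputs in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


# Arbitrary symmetric grid of pre-activation values
x = np.linspace(-5.0, 5.0, 101)

sig_out = sigmoid(x)
tanh_out = np.tanh(x)

# Closed-form derivatives of each activation
d_sig = sig_out * (1.0 - sig_out)   # never exceeds 0.25
d_tanh = 1.0 - tanh_out ** 2        # reaches 1.0 at x = 0

print("sigmoid output range:", sig_out.min(), sig_out.max())
print("tanh output range:", tanh_out.min(), tanh_out.max())
print("max sigmoid gradient:", d_sig.max())
print("max tanh gradient:", d_tanh.max())
```

Sigmoid outputs are all positive (not zero-centered) and its gradient is at most 0.25, so gradients shrink quickly when several such layers are stacked. Tanh is centered on zero and has a peak gradient of 1.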