I was trying to build a multiple regression model to predict housing prices using the following features:
[bedrooms, bathrooms, sqft_living, view, grade] = [0.09375, 0.266667, 0.149582, 0.0, 0.6]
I scaled the features to the [0, 1] range using sklearn.preprocessing.MinMaxScaler (the values above are one example row after scaling).
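For reference, the scaling step looks roughly like this (the DataFrame values here are made up just to show the shape of the code):

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy data standing in for my real training set (values are made up)
df = pd.DataFrame({'bedrooms': [3, 4], 'bathrooms': [2, 3],
                   'sqft_living': [1800, 2600], 'view': [0, 1], 'grade': [7, 9]})

scaler = MinMaxScaler()
X_train = scaler.fit_transform(df)  # every feature ends up in [0, 1]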
I used Keras to build the model:
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

def build_model(X_train):
    model = Sequential()
    # One small hidden layer; input shape comes from the training data
    model.add(Dense(5, activation='relu', input_shape=X_train.shape[1:]))
    model.add(Dense(1))  # single linear output for the predicted price
    optimizer = Adam(learning_rate=0.001)
    model.compile(loss='mean_squared_error', optimizer=optimizer)
    return model
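For completeness, the training call looks roughly like this (the epoch count and batch size are just what I have been experimenting with):

import numpy as np

y_train = np.array([450000.0, 8000000.0])  # raw sale prices (made-up values)

model = build_model(X_train)
history = model.fit(X_train, y_train, epochs=400, batch_size=32, verbose=1)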
When I train the model, the loss values are insanely high, somewhere between 4 and 40 trillion, and they only drop by about a million per epoch, which makes training infeasibly slow. At first I tried increasing the learning rate, but it didn't help much. Then I did some searching and found that others have used a log-MSE loss function, so I tried it and my model seemed to work fine (the loss started at about 140 and went down to 0.2 after 400 epochs).
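To make "log-MSE" concrete, I mean something along these lines, using either Keras's built-in MSLE loss or a custom log-of-MSE (this is a sketch, not necessarily the exact loss I used):

from keras import backend as K
from keras.optimizers import Adam

def log_mse(y_true, y_pred):
    # log of the mean squared error; epsilon avoids log(0)
    return K.log(K.mean(K.square(y_pred - y_true)) + K.epsilon())

model = build_model(X_train)
model.compile(loss=log_mse, optimizer=Adam(learning_rate=0.001))
# or, with the built-in loss:
# model.compile(loss='mean_squared_logarithmic_error', optimizer=Adam(learning_rate=0.001))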
My question is: should I always just switch to log-MSE when I see very large MSE values in a linear/multiple regression problem, or are there other things I can do to fix this issue?
My guess as to why this happened is that the scales of my predictor and response variables are vastly different: the X's are all between 0 and 1, while the largest Y goes up to 8 million. (Am I supposed to scale down my Y's as well, and then scale the predictions back up?)
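To make that last question concrete, I mean something like this: scale y into [0, 1] for training and invert the transform on the predictions (the names and numbers here are placeholders):

import numpy as np
from sklearn.preprocessing import MinMaxScaler

y_train = np.array([[450000.0], [8000000.0]])  # raw prices, reshaped to 2D for the scaler

y_scaler = MinMaxScaler()
y_train_scaled = y_scaler.fit_transform(y_train)   # train the network on this instead

# model.fit(X_train, y_train_scaled, epochs=400)
# preds_scaled = model.predict(X_test)
# preds = y_scaler.inverse_transform(preds_scaled)  # back to dollar amounts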