How can I get the weights to converge so that the MSE is consistently low?

Posted 2019-08-31 00:03

Here is my code:

# Imports assumed for this snippet (Keras with the TensorFlow backend)
from keras import backend as K
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt

for _ in range(5):
    # Rebuild and retrain the same model from scratch on each iteration
    K.clear_session()
    model = Sequential()

    model.add(LSTM(256, input_shape=(None, 1)))
    model.add(Dropout(0.2))

    model.add(Dense(256))
    model.add(Dropout(0.2))

    model.add(Dense(1))

    # use the canonical 'rmsprop' identifier; note that 'accuracy' is not a
    # meaningful metric for a regression loss
    model.compile(loss='mean_squared_error', optimizer='rmsprop', metrics=['accuracy'])
    hist = model.fit(x_train, y_train, epochs=20, batch_size=64, verbose=0,
                     validation_data=(x_val, y_val))

    p = model.predict(x_test)
    print(mean_squared_error(y_test, p))

    plt.plot(y_test)
    plt.plot(p)
    plt.legend(['testY', 'p'], loc='upper right')
    plt.show()

Total params: 330,241; training samples: 2,264

And below is the result:

[result plots (testY vs. p) for the five runs omitted]

I haven't changed anything.

I only ran the for loop.

As you can see in the picture, the resulting MSE is huge, even though all I did was re-run the for loop.

I think the fundamental reason for this problem is that the optimizer cannot find the global minimum; it finds a local minimum and converges there instead. I say this because, after checking all the loss curves, the loss no longer decreases significantly (after about 20 epochs). So in order to solve this problem, I have to find the global minimum. How should I do this?

I tried adjusting the batch_size and the number of epochs. I also tried changing the hidden layer size and the number of LSTM units, adding a kernel_initializer, changing the optimizer, etc., but could not get any meaningful result.

I wonder how I can solve this problem.

Your valuable opinions and thoughts will be very much appreciated.

If you want to see the full source, here is the link: https://gist.github.com/Lay4U/e1fc7d036356575f4d0799cdcebed90e

2 answers

叛逆 · 2019-08-31 00:23

If you want to always start from the same point, you should set a seed. You can do it like this if you use the TensorFlow backend in Keras:

from numpy.random import seed
seed(1)                        # fix NumPy's random number generator
from tensorflow import set_random_seed
set_random_seed(2)             # fix TensorFlow's graph-level seed (TF 1.x API)
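
If you are on TensorFlow 2.x / tf.keras, where set_random_seed no longer exists, a rough equivalent sketch would be:

import random
import numpy as np
import tensorflow as tf

random.seed(1)          # Python's built-in RNG
np.random.seed(1)       # NumPy RNG
tf.random.set_seed(2)   # TensorFlow RNG (TF 2.x replacement for set_random_seed)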

If you want to learn why you get different results from ML/DL models, I recommend this article.

小情绪 Triste * · 2019-08-31 00:35

From your example, the problem simply comes from the fact that you have over 100 times more parameters than you have samples. If you reduce the size of your model, you will see less variance.
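
For instance, a minimal sketch of a much smaller model (the layer sizes here are purely illustrative, not tuned values):

from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout

small_model = Sequential()
small_model.add(LSTM(32, input_shape=(None, 1)))   # 32 units instead of 256
small_model.add(Dropout(0.2))
small_model.add(Dense(1))
small_model.compile(loss='mean_squared_error', optimizer='rmsprop')
# roughly 4,400 trainable parameters instead of ~330,000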

The wider question you are asking is actually very interesting and is usually not covered in tutorials. Nearly all machine learning models are stochastic by nature, so the output predictions will change slightly every time you retrain, which means you will always have to ask the question: which model do I deploy to production?

Off the top of my head there are two things you can do:

  • Choose the first model trained on all the data (after cross-validation, ...)
  • Build an ensemble of models that all have the same hyper-parameters and implement a simple voting strategy (a minimal sketch follows this list)
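
A minimal sketch of the second option, assuming models is a list of already-trained Keras models and x_test is your test input (for a regression target, averaging the predictions plays the role of voting):

import numpy as np

def ensemble_predict(models, x):
    # Stack each model's predictions and average them;
    # averaging is the regression analogue of majority voting.
    preds = np.stack([m.predict(x) for m in models], axis=0)
    return preds.mean(axis=0)

p = ensemble_predict(models, x_test)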
