Restore keras seq2seq model

Published 2020-06-23 08:06

I'm working with the keras seq2seq example here: https://github.com/keras-team/keras/blob/master/examples/lstm_seq2seq.py

I would like to persist the vocabulary and the decoder so that I can load them later and apply them to new sequences.

While the code calls model.save(), that alone is insufficient: the decoding setup references a number of other variables that are deep pointers into the trained model:

encoder_model = Model(encoder_inputs, encoder_states)

decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs, state_h, state_c = decoder_lstm(
    decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs] + decoder_states)

I would like to adapt this code to determine encoder_inputs, encoder_states, latent_dim, and decoder_inputs from a model loaded from disk. It's OK to assume I know the model architecture in advance. Is there a straightforward way to do this?
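For reference, the example ends training with model.save('s2s.h5'), so reconstruction can start from that file. A small sketch (assuming that file name) of how to inspect the loaded model and find the layer indices used in the update below:

    from keras.models import load_model

    model = load_model('s2s.h5')  # file written by the example's model.save()
    model.summary()               # shows input_1, input_2, lstm_1, lstm_2, dense_1

    # Map layer indices to names to know which layers to pull tensors from.
    for i, layer in enumerate(model.layers):
        print(i, layer.name, type(layer).__name__)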

Update: I have made some progress using the decoder construction code and pulling out the layer inputs/outputs as needed.

encoder_inputs = model.input[0]  # input_1
decoder_inputs = model.input[1]  # input_2
encoder_outputs, state_h_enc, state_c_enc = model.layers[2].output  # lstm_1
_, state_h_dec, state_c_dec = model.layers[3].output  # lstm_2
decoder_outputs = model.layers[4].output  # dense_1

encoder_states = [state_h_enc, state_c_enc]
encoder_model = Model(encoder_inputs, encoder_states)

latent_dim = 256 # TODO: infer this from the model. Should match lstm_1 outputs.

decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]

decoder_states = [state_h_dec, state_c_dec]

decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs] + decoder_states)

However, when I try to construct the decoder model, I encounter this error:

RuntimeError: Graph disconnected: cannot obtain value for tensor Tensor("input_1:0", shape=(?, ?, 96), dtype=float32) at layer "input_1". The following previous layers were accessed without issue: []

As a test I tried Model(decoder_inputs, decoder_outputs) with the same result. It's not clear to me what is disconnected from the graph, since these layers were loaded from the model.

2 Answers

老娘就宠你 · 2020-06-23 08:56

OK, I solved this problem and the decoder is producing reasonable results. In my code above I missed a couple of details in the decoder step: the reconstruction has to call() the saved LSTM and Dense layers again to wire them into a new graph, because the output tensors pulled from model.layers[...] belong to the training graph, whose decoder LSTM takes its initial state from the encoder. In addition, the new decoder state inputs need unique names so they don't collide with input_1 and input_2 (this detail smells like a Keras bug).

from keras.models import Model, load_model
from keras.layers import Input

model = load_model('s2s.h5')  # the trained model saved by the example

encoder_inputs = model.input[0]  # input_1
encoder_outputs, state_h_enc, state_c_enc = model.layers[2].output  # lstm_1
encoder_states = [state_h_enc, state_c_enc]
encoder_model = Model(encoder_inputs, encoder_states)

latent_dim = model.layers[2].units  # hidden size, read from the encoder LSTM

decoder_inputs = model.input[1]  # input_2
decoder_state_input_h = Input(shape=(latent_dim,), name='input_3')
decoder_state_input_c = Input(shape=(latent_dim,), name='input_4')
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_lstm = model.layers[3]  # lstm_2
decoder_outputs, state_h_dec, state_c_dec = decoder_lstm(
    decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h_dec, state_c_dec]
decoder_dense = model.layers[4]  # dense_1
decoder_outputs = decoder_dense(decoder_outputs)

decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs] + decoder_states)
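With both models rebuilt, inference proceeds exactly as in the example's decode_sequence(). A condensed sketch of that loop, assuming num_decoder_tokens, target_token_index, reverse_target_char_index, and max_decoder_seq_length have been restored from training:

    import numpy as np

    def decode_sequence(input_seq):
        # Encode the input sentence into the initial state vectors [h, c].
        states_value = encoder_model.predict(input_seq)

        # Start with the start-of-sequence character ('\t' in the example).
        target_seq = np.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, target_token_index['\t']] = 1.0

        decoded_sentence = ''
        while True:
            output_tokens, h, c = decoder_model.predict([target_seq] + states_value)

            # Greedily pick the most likely next character.
            sampled_token_index = np.argmax(output_tokens[0, -1, :])
            sampled_char = reverse_target_char_index[sampled_token_index]
            decoded_sentence += sampled_char

            if sampled_char == '\n' or len(decoded_sentence) > max_decoder_seq_length:
                break

            # Feed the sampled character and updated states back in.
            target_seq = np.zeros((1, 1, num_decoder_tokens))
            target_seq[0, 0, sampled_token_index] = 1.0
            states_value = [h, c]

        return decoded_sentence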

A big drawback of this reconstruction code is that it assumes the full architecture is known in advance. I would eventually like to be able to load an architecture-agnostic decoder.
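One small step in that direction is to locate the layers by type rather than by hard-coded index. A sketch that still assumes the example's shape (one encoder LSTM, one decoder LSTM, one Dense):

    from keras.layers import LSTM, Dense

    # In the example's graph the encoder LSTM is built before the decoder LSTM,
    # so they come back in that order here.
    lstm_layers = [layer for layer in model.layers if isinstance(layer, LSTM)]
    encoder_lstm, decoder_lstm = lstm_layers
    decoder_dense = next(layer for layer in model.layers if isinstance(layer, Dense))
    latent_dim = encoder_lstm.units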

[account banned] · 2020-06-23 09:08

At the point in the Keras seq2seq example where inference begins, you already have finished encoder and decoder models. You can save the architecture and weights of these models to disk and load them later. The following works for me:

Save the models to disk:

with open('encoder_model.json', 'w', encoding='utf8') as f:
    f.write(encoder_model.to_json())
encoder_model.save_weights('encoder_model_weights.h5')

with open('decoder_model.json', 'w', encoding='utf8') as f:
    f.write(decoder_model.to_json())
decoder_model.save_weights('decoder_model_weights.h5')

Later, load the encoder and decoder:

from keras.models import model_from_json

def load_model(model_filename, model_weights_filename):
    # Rebuild the architecture from JSON, then restore the trained weights.
    with open(model_filename, 'r', encoding='utf8') as f:
        model = model_from_json(f.read())
    model.load_weights(model_weights_filename)
    return model

encoder = load_model('encoder_model.json', 'encoder_model_weights.h5')
decoder = load_model('decoder_model.json', 'decoder_model_weights.h5')

During prediction you will also need other data, such as the number of encoder/decoder tokens and the dictionaries mapping characters to indices. You can save these to a file after training and load them later, just as with the models.
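For example, the character tables and sizes could be pickled after training and restored before inference. A sketch, assuming the variable names from the example:

    import pickle

    # After training: save everything decode_sequence() needs besides the models.
    metadata = {
        'num_encoder_tokens': num_encoder_tokens,
        'num_decoder_tokens': num_decoder_tokens,
        'max_decoder_seq_length': max_decoder_seq_length,
        'input_token_index': input_token_index,
        'target_token_index': target_token_index,
    }
    with open('seq2seq_metadata.pkl', 'wb') as f:
        pickle.dump(metadata, f)

    # Later, before inference:
    with open('seq2seq_metadata.pkl', 'rb') as f:
        metadata = pickle.load(f)
    reverse_target_char_index = {
        i: char for char, i in metadata['target_token_index'].items()}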
