Function call stack: keras_scratch_graph Error

2020-07-06 02:44发布

问题:

I am reimplementing a text2speech project. I am facing a Function call stack : keras_scratch_graph error in decoder part. The network architecture is from Deep Voice 3 paper.

I am using keras from TF 2.0 on Google Colab. Below is the code for Decoder Keras Model.

y1 = tf.ones(shape = (16, 203, 320))
def Decoder(name = "decoder"):
    # Decoder Prenet
    din = tf.concat((tf.zeros_like(y1[:, :1, -hp.mel:]), y1[:, :-1, -hp.mel:]), 1)
    keys = K.Input(shape = (180, 256), batch_size = 16, name = "keys")
    vals = K.Input(shape = (180, 256), batch_size = 16, name = "vals")
    prev_max_attentions_li = tf.ones(shape=(hp.dlayer, hp.batch_size), dtype=tf.int32)
    #prev_max_attentions_li = K.Input(tensor = prev_max_attentions_li)
    for i in range(hp.dlayer):
        dpout = K.layers.Dropout(rate = 0 if i == 0 else hp.dropout)(din)
        fc_out = K.layers.Dense(hp.char_embed, activation = 'relu')(dpout)

    print("=======================================================================================================")
    print("The FC value is ", fc_out)
    print("=======================================================================================================")

    query_pe = K.layers.Embedding(hp.Ty, hp.char_embed)(tf.tile(tf.expand_dims(tf.range(hp.Ty // hp.r), 0), [hp.batch_size, 1]))
    key_pe = K.layers.Embedding(hp.Tx, hp.char_embed)(tf.tile(tf.expand_dims(tf.range(hp.Tx), 0), [hp.batch_size, 1]))

    alignments_li, max_attentions_li = [], []
    for i in range(hp.dlayer):
        dpout = K.layers.Dropout(rate = 0)(fc_out)
        queries = K.layers.Conv1D(hp.datten_size, hp.dfilter, padding = 'causal', dilation_rate = 2**i)(dpout)
        fc_out = (queries + fc_out) * tf.math.sqrt(0.5)
        print("=======================================================================================================")
        print("The FC value is ", fc_out)
        print("=======================================================================================================")
        queries = fc_out + query_pe
        keys += key_pe

        tensor, alignments, max_attentions = Attention(name = "attention")(queries, keys, vals, prev_max_attentions_li[i])

        fc_out = (tensor + queries) * tf.math.sqrt(0.5)

        alignments_li.append(alignments)
        max_attentions_li.append(max_attentions)

    decoder_output = fc_out

    dpout = K.layers.Dropout(rate = 0)(decoder_output)
    mel_logits = K.layers.Dense(hp.mel * hp.r)(dpout)

    dpout = K.layers.Dropout(rate = 0)(fc_out)
    done_output = K.layers.Dense(2)(dpout)

    return K.Model(inputs = [keys, vals], outputs = [mel_logits, done_output, decoder_output, alignments_li, max_attentions_li], name = name)

decode = Decoder()
kin = tf.ones(shape = (16, 180, 256))
vin = tf.ones(shape = (16, 180, 256))
print(decode(kin, vin))
tf.keras.utils.plot_model(decode, to_file = "decoder.png", show_shapes = True)

When I test with some data, it shows the error messages below. It's going to be some problem with "fc_out", but I dun know how to pass "fc_out" output from the first for loop to the second for loop? Any answer would be appreciated.

File "Decoder.py", line 60, in <module>
    decode = Decoder()
  File "Decoder.py", line 33, in Decoder
    dpout = K.layers.Dropout(rate = 0)(fc_out)
  File "/Users/ydc/dl-npm/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 596, in __call__
    base_layer_utils.create_keras_history(inputs)
  File "/Users/ydc/dl-npm/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer_utils.py", line 199, in create_keras_history
    _, created_layers = _create_keras_history_helper(tensors, set(), [])
  File "/Users/ydc/dl-npm/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer_utils.py", line 245, in _create_keras_history_helper
    layer_inputs, processed_ops, created_layers)
  File "/Users/ydc/dl-npm/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer_utils.py", line 245, in _create_keras_history_helper
    layer_inputs, processed_ops, created_layers)
  File "/Users/ydc/dl-npm/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer_utils.py", line 245, in _create_keras_history_helper
    layer_inputs, processed_ops, created_layers)
  File "/Users/ydc/dl-npm/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer_utils.py", line 243, in _create_keras_history_helper
    constants[i] = backend.function([], op_input)([])
  File "/Users/ydc/dl-npm/lib/python3.7/site-packages/tensorflow/python/keras/backend.py", line 3510, in __call__
    outputs = self._graph_fn(*converted_inputs)
  File "/Users/ydc/dl-npm/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 572, in __call__
    return self._call_flat(args)
  File "/Users/ydc/dl-npm/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 671, in _call_flat
    outputs = self._inference_function.call(ctx, args)
  File "/Users/ydc/dl-npm/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 445, in call
    ctx=ctx)
  File "/Users/ydc/dl-npm/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.FailedPreconditionError:  Error while reading resource variable _AnonymousVar19 from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/_AnonymousVar19/N10tensorflow3VarE does not exist.
     [[node dense_7/BiasAdd/ReadVariableOp (defined at Decoder.py:33) ]] [Op:__inference_keras_scratch_graph_566]

Function call stack:
keras_scratch_graph

回答1:

My situation is tensorflow sample code works fine in Google colab but not in my machine as I got keras_scratch_graph error.

Then i add this Python code at the beginning and it works fine.

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Restrict TensorFlow to only use the fourth GPU
        tf.config.experimental.set_visible_devices(gpus[0], 'GPU')

        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)

By default, TensorFlow maps nearly all of the GPU memory of all GPUs (subject to CUDA_VISIBLE_DEVICES) visible to the process.

In some cases it is desirable for the process to only allocate a subset of the available memory, or to only grow the memory usage as is needed by the process.

For example, you want to train multiple small models with one GPU at the same time. By calling tf.config.experimental.set_memory_growth, which attempts to allocate only as much GPU memory in needed for the runtime allocations: it starts out allocating very little memory, and as the program gets run and more GPU memory is needed, we extend the GPU memory region allocated to the TensorFlow process.

Hope it helps!



回答2:

I think it's a thing about the gpu. look at the traceback:

File "/Users/ydc/dl-npm/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 572, in __call__
    return self._call_flat(args)

tf is calling on eager execution, which means that gpu will be used if the version is available. I had the same issue when I was testing a dense network:

inputs=Input(shape=(100,)
             )
x=Dense(32, activation='relu')(inputs)
x=Dense(32, activation='relu')(x)
x=Dense(32, activation='relu')(x)
outputs=Dense(10, activation='softmax')(x)
model=Model(inputs=inputs, outputs=outputs)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
t=tf.zeros([1,100])
model.predict(t, steps=1, batch_size=1)

... and it gave a similar traceback, also linking to eager execution. Then when I disabled gpu using the following line:

tf.config.experimental.set_visible_devices([], 'GPU')

... the code ran just fine. See if this would help solve the issue. Btw, does colab even support gpu? I didn't even know.



回答3:

I was getting similar error. I reduced the batch size and the error disappeared. I don't know why but it worked for me. I am guessing something related to over stacking.



回答4:

If you use Tensorflow-GPU, then add:

physical_devices = tf.config.experimental.list_physical_devices('GPU')
print("physical_devices-------------", len(physical_devices))
tf.config.experimental.set_memory_growth(physical_devices[0], True)

In addition, you can reduce your batch_size or change another computer or cloud services, like google colab, amazon cloud to run your codes because I think this is because the limitation of memory.



回答5:

The following Tensorflow and Keras version removed my issue:

tensorflow==2.2.0
keras=2.3.1