Issues when applying model.predict() inside data generator

Posted 2019-08-27 04:19

Question:

Basically I am implementing a model that uses perceptual loss to perform single-image super-resolution. I constructed my full model so that the input first passes through the main model, then feeds into a pretrained VGG16, and the output of layers[5] of VGG16 is the final output of the full model. I pass the pre-trained VGG16 model to my data generator in order to prepare the ground-truth targets for the perceptual loss on the fly. However, I encountered a ValueError during training with fit_generator.

I have tried writing my own loop that generates the data for each batch and uses train_on_batch instead, and that works fine. However, I do want the benefit of use_multiprocessing with fit_generator.
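For reference, the working manual loop looks roughly like this. This is only a sketch, assuming x_train, y_train, lossModel and fullModel are defined as in the code further down; the batch size and epoch count are illustrative.

import numpy as np

# Illustrative manual training loop that avoids fit_generator entirely.
batch_size = 4
for epoch in range(1):
    indexes = np.arange(len(x_train))
    np.random.shuffle(indexes)
    for i in range(len(x_train) // batch_size):
        idx = indexes[i * batch_size:(i + 1) * batch_size]
        # Targets are the VGG16 layers[5] activations of the ground-truth images
        y = lossModel.predict_on_batch(y_train[idx])
        loss = fullModel.train_on_batch(x_train[idx], y)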

Here is the generator I have written. I pass lossModel to the generator and use it to generate the targets for training with perceptual loss.

class DataGenerator(keras.utils.Sequence):
    'Generates data for Keras'
    def __init__(self, x_train, y_train, lossModel, batch_size=4, shuffle=True):
        'Initialization'
        self.x_train = x_train
        self.y_train = y_train
        self.lossModel = lossModel
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.on_epoch_end()

    def __len__(self):
        'Denotes the number of batches per epoch'
        return int(np.floor(len(self.x_train) / self.batch_size))

    def __getitem__(self, index):
        'Generate one batch of data'
        # Slice the shuffled indexes for this batch
        idx = self.indexes[index*self.batch_size:(index+1)*self.batch_size]
        x = self.x_train[idx]
        # Targets: VGG16 feature maps of the ground-truth images
        y = self.lossModel.predict_on_batch(self.y_train[idx])
        return x, y

    def on_epoch_end(self):
        'Updates indexes after each epoch'
        self.indexes = np.arange(len(self.x_train))
        if self.shuffle == True:
            np.random.shuffle(self.indexes)

And here I construct the model.

### Create Image Transformation Model ###
mainModel = ResnetBuilder.build((3,72,72), 5, basic_block, [1, 1, 1, 1, 1])

### Create Loss Model (VGG16) ###
lossModel = VGG16(include_top=False, weights='imagenet', input_tensor=None, input_shape=(288,288,3))
lossModel.trainable=False
for layer in lossModel.layers:
    layer.trainable=False

### Create New Loss Model (Use Relu2-2 layer output for perceptual loss)
lossModel = Model(lossModel.inputs,lossModel.layers[5].output)
lossModelOutputs = lossModel(mainModel.output)

### Create Full Model ###
fullModel = Model(mainModel.input, lossModelOutputs)

### Compile Full Model
fullModel.compile(loss='mse', optimizer='adam', metrics=['mse'])
trained_epochs=0
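As a quick sanity check of the shapes involved (illustrative; the expected values follow from the dimensions described below):

# Illustrative shape check: the full model maps 72x72x3 inputs to the
# 144x144x128 feature maps of VGG16's layers[5] (block2_conv2) output.
print(mainModel.input_shape)   # expected: (None, 72, 72, 3)
print(mainModel.output_shape)  # expected: (None, 288, 288, 3)
print(fullModel.output_shape)  # expected: (None, 144, 144, 128)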

The error occurs during fit_generator(). Note that the dimension of my input is (72,72,3), the outputs from VGG.layers[5] are (144,144,128), and my y_train contains ground-truth images of shape (288,288,3).

# Generators
training_generator = DataGenerator(x_train, y_train, lossModel, batch_size=4, shuffle=True)
# Train model on dataset
fullModel.fit_generator(generator=training_generator, use_multiprocessing=True, workers=6)
Epoch 1/1
---------------------------------------------------------------------------
RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/lucien/anaconda3/envs/fyp/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/lucien/anaconda3/envs/fyp/lib/python3.6/site-packages/keras/utils/data_utils.py", line 401, in get_index
    return _SHARED_SEQUENCES[uid][i]
  File "/home/lucien/sr-perceptual/my_classes.py", line 26, in __getitem__
    y = self.lossModel.predict_on_batch(self.y_train[idx,])
  File "/home/lucien/anaconda3/envs/fyp/lib/python3.6/site-packages/keras/engine/training.py", line 1273, in predict_on_batch
    self._make_predict_function()
  File "/home/lucien/anaconda3/envs/fyp/lib/python3.6/site-packages/keras/engine/training.py", line 554, in _make_predict_function
    **kwargs)
  File "/home/lucien/anaconda3/envs/fyp/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2744, in function
    return Function(inputs, outputs, updates=updates, **kwargs)
  File "/home/lucien/anaconda3/envs/fyp/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2546, in __init__
    with tf.control_dependencies(self.outputs):
  File "/home/lucien/anaconda3/envs/fyp/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 5004, in control_dependencies
    return get_default_graph().control_dependencies(control_inputs)
  File "/home/lucien/anaconda3/envs/fyp/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 4543, in control_dependencies
    c = self.as_graph_element(c)
  File "/home/lucien/anaconda3/envs/fyp/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3490, in as_graph_element
    return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
  File "/home/lucien/anaconda3/envs/fyp/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3569, in _as_graph_element_locked
    raise ValueError("Tensor %s is not an element of this graph." % obj)
ValueError: Tensor Tensor("block2_conv2/Relu:0", shape=(?, 144, 144, 128), dtype=float32) is not an element of this graph.
"""

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
<ipython-input-10-4a040e0935cf> in <module>
      1 # Train model on dataset
----> 2 fullModel.fit_generator(generator=training_generator, use_multiprocessing=True, workers=6)

~/anaconda3/envs/fyp/lib/python3.6/site-packages/keras/legacy/interfaces.py in wrapper(*args, **kwargs)
     89                 warnings.warn('Update your `' + object_name + '` call to the ' +
     90                               'Keras 2 API: ' + signature, stacklevel=2)
---> 91             return func(*args, **kwargs)
     92         wrapper._original_function = func
     93         return wrapper

~/anaconda3/envs/fyp/lib/python3.6/site-packages/keras/engine/training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
   1416             use_multiprocessing=use_multiprocessing,
   1417             shuffle=shuffle,
-> 1418             initial_epoch=initial_epoch)
   1419 
   1420     @interfaces.legacy_generator_methods_support

~/anaconda3/envs/fyp/lib/python3.6/site-packages/keras/engine/training_generator.py in fit_generator(model, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
    179             batch_index = 0
    180             while steps_done < steps_per_epoch:
--> 181                 generator_output = next(output_generator)
    182 
    183                 if not hasattr(generator_output, '__len__'):

~/anaconda3/envs/fyp/lib/python3.6/site-packages/keras/utils/data_utils.py in get(self)
    599         except Exception as e:
    600             self.stop()
--> 601             six.reraise(*sys.exc_info())
    602 
    603 

~/anaconda3/envs/fyp/lib/python3.6/site-packages/six.py in reraise(tp, value, tb)
    691             if value.__traceback__ is not tb:
    692                 raise value.with_traceback(tb)
--> 693             raise value
    694         finally:
    695             value = None

~/anaconda3/envs/fyp/lib/python3.6/site-packages/keras/utils/data_utils.py in get(self)
    593         try:
    594             while self.is_running():
--> 595                 inputs = self.queue.get(block=True).get()
    596                 self.queue.task_done()
    597                 if inputs is not None:

~/anaconda3/envs/fyp/lib/python3.6/multiprocessing/pool.py in get(self, timeout)
    642             return self._value
    643         else:
--> 644             raise self._value
    645 
    646     def _set(self, i, obj):

ValueError: Tensor Tensor("block2_conv2/Relu:0", shape=(?, 144, 144, 128), dtype=float32) is not an element of this graph.

Answer 1:

The problem here is the multi-worker setup: when the 6 workers call predict_on_batch, the predict function is built in a context whose default graph is not the graph in which block2_conv2/Relu:0 was created, so the tensor cannot be found.

The failure comes from _make_predict_function(). You can see where it happens in your own traceback: File "/home/lucien/anaconda3/envs/fyp/lib/python3.6/site-packages/keras/engine/training.py", line 1273, in predict_on_batch calls self._make_predict_function().

Some ways you can remove the error:

  • Use the Theano backend.
  • Call model._make_predict_function() right after loading the trained model.
  • Use a global model and a saved graph, as in the functions below:

Functions:

import tensorflow as tf

def load_model():
    global model
    model = yourmodel(weights=xx111122)  # placeholder: build/load your own model here
    # this is key: save the graph right after loading the model
    global graph
    graph = tf.get_default_graph()

While predicting:

with graph.as_default():
    preds = model.predict(image)
    # ... etc
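Applied to the generator from the question, a minimal sketch could look like the class below. It is not a definitive fix: it only combines the _make_predict_function() and graph.as_default() ideas from above, and it mainly helps when the workers run as threads (use_multiprocessing=False); with separate processes the TensorFlow session may still not survive forking.

import numpy as np
import tensorflow as tf
import keras

class DataGenerator(keras.utils.Sequence):
    'Generates data for Keras, predicting VGG16 targets inside the workers'
    def __init__(self, x_train, y_train, lossModel, batch_size=4, shuffle=True):
        self.x_train = x_train
        self.y_train = y_train
        self.lossModel = lossModel
        # Build the predict function eagerly and remember the graph it lives in,
        # so later calls from the workers reuse the same graph.
        self.lossModel._make_predict_function()
        self.graph = tf.get_default_graph()
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.on_epoch_end()

    def __len__(self):
        return int(np.floor(len(self.x_train) / self.batch_size))

    def __getitem__(self, index):
        idx = self.indexes[index*self.batch_size:(index+1)*self.batch_size]
        x = self.x_train[idx]
        # Run the VGG16 prediction inside the graph captured in __init__
        with self.graph.as_default():
            y = self.lossModel.predict_on_batch(self.y_train[idx])
        return x, y

    def on_epoch_end(self):
        self.indexes = np.arange(len(self.x_train))
        if self.shuffle:
            np.random.shuffle(self.indexes)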