While trying to train a GAN for image generation I ran into a problem that I cannot explain.
When training the generator, the loss returned by train_on_batch drops straight to zero after just 2 or 3 iterations. After investigating I noticed some strange behavior of the train_on_batch method.
When I check the following:
noise = np.random.uniform(-1.0, 1.0, size=[batch_size, gen_noise_length])
predictions = GAN.stackedModel.predict(noise)
This returns values all close to zero as I would expect since the generator is not trained yet.
However:
y = np.ones([batch_size, 1])
noise = np.random.uniform(-1.0, 1.0, size=[batch_size, gen_noise_length])
loss = GAN.stackedModel.train_on_batch(noise, y)
here the loss is almost zero, even though the targets are all ones. When I run:
y = np.ones([batch_size, 1])
noise = np.random.uniform(-1.0, 1.0, size=[batch_size, gen_noise_length])
loss = GAN.stackedModel.test_on_batch(noise, y)
the returned loss is high as I would expect.
What is going on with the train_on_batch method? I'm really clueless here...
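To narrow it down, I also probed the output of the stacked model in training mode versus inference mode, since layers such as BatchNormalization and Dropout behave differently in the two phases, and train_on_batch computes its loss in training mode while predict and test_on_batch run in inference mode. This is only a diagnostic sketch using the Keras 2.x backend-function idiom; GAN.stackedModel, batch_size and gen_noise_length are the names from my code, everything else is standard Keras:

import numpy as np
from keras import backend as K

# Backend function that lets us pick the learning phase explicitly:
# 1 = training mode (BatchNormalization uses batch statistics),
# 0 = inference mode (BatchNormalization uses its moving averages).
probe = K.function([GAN.stackedModel.input, K.learning_phase()],
                   [GAN.stackedModel.output])

noise = np.random.uniform(-1.0, 1.0, size=[batch_size, gen_noise_length])
out_train = probe([noise, 1])[0]  # what train_on_batch "sees"
out_test = probe([noise, 0])[0]   # what predict / test_on_batch "see"
print(out_train.mean(), out_test.mean())

If these two outputs differ a lot, the gap between train_on_batch and test_on_batch would come from the training/inference behaviour of the layers rather than from the weight update itself.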
edit
My loss is binary cross-entropy and I build the model like this:
def createStackedModel(self):
    # Build stacked GAN model
    gan_in = Input([self.noise_length])
    H = self.genModel(gan_in)
    gan_V = self.disModel(H)
    GAN = Model(gan_in, gan_V)
    opt = RMSprop(lr=0.0001, decay=3e-8)
    GAN.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
    return GAN
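For comparison, the pattern I have seen in other DCGAN examples sets the discriminator to non-trainable before compiling the stacked model, so that train_on_batch on the stacked model only updates the generator and cannot push the discriminator towards always predicting one. The following is only a sketch of that pattern, reusing the attribute names from my code (genModel, disModel, noise_length); the method name createStackedModelFrozenDisc is hypothetical, and whether my GANBuilder.py already does the equivalent is exactly what I am unsure about:

from keras.layers import Input
from keras.models import Model
from keras.optimizers import RMSprop

def createStackedModelFrozenDisc(self):
    # Freeze the (separately compiled) discriminator inside the stacked model
    # so that only the generator weights are updated by train_on_batch.
    self.disModel.trainable = False
    gan_in = Input([self.noise_length])
    H = self.genModel(gan_in)
    gan_V = self.disModel(H)
    GAN = Model(gan_in, gan_V)
    opt = RMSprop(lr=0.0001, decay=3e-8)
    GAN.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
    return GAN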
edit 2
The generator is constructed by stacking several blocks like the following, each containing a BatchNormalization layer:
self.G.add(UpSampling2D())
self.G.add(Conv2DTranspose(int(depth/8), 5, padding='same'))
self.G.add(BatchNormalization(momentum=0.5))
self.G.add(Activation('relu'))
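For context, here is a rough sketch of how the whole generator is assembled from those blocks. Only the repeated block itself is copied from my actual code; the depth value, the initial Dense/Reshape projection and the tanh output layer are illustrative guesses:

from keras.models import Sequential
from keras.layers import Dense, Reshape, Activation, BatchNormalization
from keras.layers import UpSampling2D, Conv2DTranspose

def createGenerator(self):
    depth = 256  # illustrative value, not necessarily my exact setting
    dim = 7      # illustrative spatial size before upsampling
    self.G = Sequential()
    # Project the noise vector onto a small feature map
    self.G.add(Dense(dim * dim * depth, input_dim=self.noise_length))
    self.G.add(BatchNormalization(momentum=0.5))
    self.G.add(Activation('relu'))
    self.G.add(Reshape((dim, dim, depth)))
    # Repeated upsampling blocks, each with its own BatchNormalization
    self.G.add(UpSampling2D())
    self.G.add(Conv2DTranspose(int(depth/2), 5, padding='same'))
    self.G.add(BatchNormalization(momentum=0.5))
    self.G.add(Activation('relu'))
    self.G.add(UpSampling2D())
    self.G.add(Conv2DTranspose(int(depth/4), 5, padding='same'))
    self.G.add(BatchNormalization(momentum=0.5))
    self.G.add(Activation('relu'))
    # Single-channel output in [-1, 1]
    self.G.add(Conv2DTranspose(1, 5, padding='same'))
    self.G.add(Activation('tanh'))
    return self.G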
edit 3
I uploaded my code to https://gitlab.com/benjamingraf24/DCGAN/. Apparently the problem results from the way I build the GAN network, so there must be something wrong in GANBuilder.py. However, I can't find it...