Gradients are always zero

I have written an algorithm using the TensorFlow framework and ran into a problem: tf.train.Optimizer.compute_gradients(loss) returns zero gradients for all weights. A second problem: if I set the batch size larger than about 5, tf.histogram_summary for the weights throws an error saying that some of the values are NaN.

I cannot provide a reproducible example here, because my code is quite bulky and I am not experienced enough with TF to make it shorter. I will paste some fragments instead.

Main loop:

import tensorflow as tf

# Placeholders for a batch of input images and their target vectors.
images_ph = tf.placeholder(tf.float32, shape=some_shape)
labels_ph = tf.placeholder(tf.float32, shape=some_shape)
output = inference(BATCH_SIZE, images_ph)
# Store the loss tensor under its own name so it does not shadow the loss() function.
total_loss = loss(labels_ph, output)
train_op = train(total_loss, global_step)
session = tf.Session()
session.run(tf.initialize_all_variables())

for i in xrange(MAX_STEPS):
    images, labels = train_dataset.get_batch(BATCH_SIZE, yolo.INPUT_SIZE, yolo.OUTPUT_SIZE)
    session.run([total_loss, train_op], feed_dict={images_ph: images, labels_ph: labels})

Train op (this is where the problem occurs):

def train(total_loss, global_step):
    opt = tf.train.AdamOptimizer()
    # compute_gradients returns a list of (gradient, variable) pairs.
    grads = opt.compute_gradients(total_loss)

    # Here the gradients are all zeros
    for grad, var in grads:
        if grad is not None:
            tf.histogram_summary("gradients/" + var.op.name, grad)

    return opt.apply_gradients(grads, global_step=global_step)
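
To look at the gradient values directly rather than only through the histogram summaries, a small helper along these lines could be used. This is a minimal sketch in plain Python: it assumes the gradient tensors have already been evaluated (for example by fetching them with session.run) and flattened into ordinary lists of floats. The name summarize_gradient and the sample values are hypothetical and are not part of my code.

```python
import math

def summarize_gradient(values):
    """Return basic statistics for a flat list of gradient values."""
    # Keep only finite entries for the zero count and the magnitude check.
    finite = [v for v in values if not (math.isnan(v) or math.isinf(v))]
    return {
        "count": len(values),
        "nan": sum(1 for v in values if math.isnan(v)),
        "inf": sum(1 for v in values if math.isinf(v)),
        "zero": sum(1 for v in finite if v == 0.0),
        "max_abs": max((abs(v) for v in finite), default=0.0),
    }

# Hypothetical values standing in for one evaluated gradient tensor.
example = [0.0, 0.0, 1e-12, float("nan")]
stats = summarize_gradient(example)
print(stats)
```

Separating exact zeros from very small values and from NaN should help tell whether the gradients are truly zero or only too small to show up in the histogram.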

Loss (the loss is calculated correctly, since it changes from sample to sample):

def loss(labels, output):
    return tf.reduce_mean(tf.squared_difference(labels, output))

Inference: a stack of convolutional layers with ReLU activations, followed by 3 fully connected layers, with a sigmoid activation in the last layer. All weights are initialized from a truncated normal distribution. All labels are fixed-length vectors of real numbers in the range [0, 1].
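
One thing that may be worth ruling out with this setup is saturation of the final sigmoid. The sketch below is only an illustration in plain Python for a single output unit, not a diagnosis of my network: it computes the derivative of the squared difference (sigmoid(z) - label)**2 with respect to the pre-activation z. The function names and the chosen z values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mse_grad_wrt_preactivation(z, label):
    """d/dz of (sigmoid(z) - label)**2 for a single output unit."""
    y = sigmoid(z)
    # Chain rule: 2 * (y - label) * sigmoid'(z), where sigmoid'(z) = y * (1 - y).
    return 2.0 * (y - label) * y * (1.0 - y)

# The gradient shrinks toward zero as the pre-activation z grows.
for z in (0.0, 5.0, 20.0, 40.0):
    print(z, mse_grad_wrt_preactivation(z, label=0.0))
```

If the pre-activations of the last layer are large in magnitude, the factor y * (1 - y) becomes tiny, and every gradient that flows back through that layer is scaled by it.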

Thanks in advance for any help! If you have any hypotheses about my problem, please share them and I will try them out. I can also share the whole code if you like.