I have some files that include image filepaths and features, and some of the images may be missing or corrupt. I'm wondering how to robustly handle errors, by skipping these images and removing them from the queue.
I notice that simply catching the error and continuing causes the queue to output the same image again, so the program repeatedly errors out on that image. Is there a way to dequeue the image on error?
Also, I have a 'tf.Print()' statement to log the filename, but the 'Result:' line in my log shows that the valid image was processed with no corresponding print output. Why does 'tf.Print()' only print the name of the nonexistent file, not the correctly processed file?
Below is a small example, with the same error-handling code as my larger program:
Code:
#!/usr/bin/python3
import tensorflow as tf
example_filename = 'example.csv'
max_iterations = 20
### Create the graph ###
filename_container_queue = tf.train.string_input_producer([ example_filename ])
filename_container_reader = tf.TextLineReader()
_, filename_container_contents = filename_container_reader.read(filename_container_queue)
image_filenames = tf.decode_csv(filename_container_contents, [ tf.constant('', shape=[1], dtype=tf.string) ])
# decode_jpeg only works on a single image at a time
image_filename_batch = tf.train.shuffle_batch([ image_filenames ], batch_size=1, capacity=100, min_after_dequeue=0)
image_filename = tf.reshape(image_filename_batch, [1])
image_filenames_queue = tf.train.string_input_producer(image_filename)
image_reader = tf.WholeFileReader()
_, image_contents = image_reader.read(image_filenames_queue)
image = tf.image.decode_jpeg(tf.Print(image_contents, [ image_filename ]), channels=3)
counter = tf.count_up_to(tf.Variable(tf.constant(0)), max_iterations)
result_op = tf.reduce_mean(tf.image.convert_image_dtype(image, tf.float32), [0,1]) # Output average Red, Green, Blue values.
init_op = tf.initialize_all_variables()
### Run the graph ###
print("Running graph")
with tf.Session() as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    sess.run([ init_op ])
    n = 0
    try:
        while not coord.should_stop():
            try:
                result, n = sess.run([ result_op, counter ])
                print("Result:", result)
            except tf.errors.NotFoundError as e:
                print("Skipping file due to image not existing")
                # coord.request_stop(e) <--- We only want to skip, not stop the entire process.
    except tf.errors.OutOfRangeError as e:
        print('Done training -- epoch limit reached after %d iterations' % n)
        coord.request_stop(e)
    finally:
        coord.request_stop()
        coord.join(threads)
Data:
example.csv contains:
/home/mburge/Pictures/junk/109798.jpg
nonexistent.jpg
Program Output:
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
Running graph
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 0 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.8475
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 6.83GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:972] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
I tensorflow/core/kernels/logging_ops.cc:79] [nonexistent.jpg]
Result: [ 0.33875707 0.39879724 0.28882763]
Skipping file due to image not existing
Skipping file due to image not existing
Skipping file due to image not existing
Skipping file due to image not existing
Skipping file due to image not existing
Skipping file due to image not existing
Skipping file due to image not existing
Skipping file due to image not existing
Skipping file due to image not existing
Skipping file due to image not existing
Skipping file due to image not existing
Skipping file due to image not existing
W tensorflow/core/framework/op_kernel.cc:968] Not found: nonexistent.jpg
[[Node: ReaderRead_1 = ReaderRead[_class=["loc:@WholeFileReader", "loc:@input_producer_1"], _device="/job:localhost/replica:0/task:0/cpu:0"](WholeFileReader, input_producer_1)]]
Skipping file due to image not existing
Skipping file due to image not existing
Skipping file due to image not existing
Skipping file due to image not existing
Skipping file due to image not existing
Skipping file due to image not existing
Skipping file due to image not existing
Done training -- epoch limit reached after 0 iterations
You can manually define a dequeue op on the filename queue. Later, if you find a problem with reading a file, run that op to dequeue the file from the filename queue.
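A minimal sketch of that skip-and-dequeue pattern, written in plain Python so it runs without TensorFlow. The helper `run_skipping_bad_files`, the `MissingFileError` class, and the deque-backed stand-in queue are all hypothetical names introduced for illustration. The stand-in only mimics the behaviour reported in the question (a failed read leaves the filename at the head of the queue); it is not a model of TensorFlow's internals. In the question's script the two callables would presumably wrap `sess.run([ result_op, counter ])` and `sess.run(dequeue_op)`, assuming `dequeue_op = image_filenames_queue.dequeue()` was defined while building the graph.

```python
from collections import deque


class MissingFileError(Exception):
    """Hypothetical stand-in for tf.errors.NotFoundError."""


def run_skipping_bad_files(run_step, dequeue_bad_file, skip_errors, max_steps):
    """Call run_step up to max_steps times; on a skip error, drop the bad file.

    run_step:         one unit of work, e.g. lambda: sess.run([result_op, counter])
    dequeue_bad_file: removes the current file from the filename queue,
                      e.g. lambda: sess.run(dequeue_op)  (assumed wiring)
    skip_errors:      exception type(s) that mean "skip this file"
    """
    results, skipped = [], 0
    for _ in range(max_steps):
        try:
            results.append(run_step())
        except skip_errors:
            skipped += 1
            # Without this call the same filename would be read again
            # on the next iteration and fail in exactly the same way.
            dequeue_bad_file()
    return results, skipped


# Stand-in filename queue: the second entry does not "exist".
filenames = deque(["good_a.jpg", "nonexistent.jpg", "good_b.jpg"])
existing = {"good_a.jpg", "good_b.jpg"}


def read_next():
    name = filenames[0]  # peek, so a failed read leaves the name queued
    if name not in existing:
        raise MissingFileError(name)
    return filenames.popleft()


def drop_head():
    filenames.popleft()


results, skipped = run_skipping_bad_files(
    read_next, drop_head, MissingFileError, max_steps=len(filenames))
print(results, skipped)
```

Keeping the dequeue inside the `except` branch means a successful read is never followed by an extra dequeue, so valid files are not dropped by accident.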