Triplet, Siamese and Softmax in Tensorflow

Published 2020-07-23 03:49

Question:

I'd like to compare the performance of the following types of CNNs on two different large image data sets. The goal is to measure the similarity between two images, neither of which was seen during training. I have access to 2 GPUs and 16 CPU cores.

  1. Triplet CNN (Input: Three images, Label: encoded in position)
  2. Siamese CNN (Input: Two images, Label: one binary label)
  3. Softmax CNN for Feature Learning (Input: One image, Label: one integer label)
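To make the three input/label layouts above concrete, here is a toy numpy sketch of what a single training example looks like in each setup (the shapes are hypothetical placeholders, not from the original question):

```python
import numpy as np

# Hypothetical toy size: 32x32 grayscale images.
H, W = 32, 32

# 1. Triplet: one example is (anchor, positive, negative);
#    the label is encoded in the position of the three images.
triplet = np.zeros((3, H, W), dtype=np.float32)

# 2. Siamese: one example is a pair of images plus one binary label
#    (1 = same class, 0 = different class).
pair = np.zeros((2, H, W), dtype=np.float32)
pair_label = np.int32(1)

# 3. Softmax: one image plus one integer class label.
image = np.zeros((H, W), dtype=np.float32)
label = np.int32(4)

print(triplet.shape, pair.shape, image.shape)  # (3, 32, 32) (2, 32, 32) (32, 32)
```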

For the softmax network I can store the data in a binary format (label and image stored sequentially) and then read it back with a TensorFlow reader.
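A minimal sketch of that sequential binary layout, using only the standard library and numpy (the record layout here, a 4-byte little-endian label followed by raw image bytes, is an assumption for illustration; TensorFlow itself is not required to write the file):

```python
import struct
import numpy as np

H, W = 4, 4  # tiny images, just for the sketch

def write_records(path, images, labels):
    """Store each record as: int32 label, then H*W raw image bytes."""
    with open(path, "wb") as f:
        for img, lab in zip(images, labels):
            f.write(struct.pack("<i", lab))          # 4-byte label
            f.write(img.astype(np.uint8).tobytes())  # image bytes

def read_records(path, n):
    """Read n fixed-length records back in the same order."""
    records = []
    with open(path, "rb") as f:
        for _ in range(n):
            (lab,) = struct.unpack("<i", f.read(4))
            img = np.frombuffer(f.read(H * W), dtype=np.uint8).reshape(H, W)
            records.append((lab, img))
    return records

images = [np.full((H, W), i, dtype=np.uint8) for i in range(3)]
labels = [7, 1, 2]
write_records("records.bin", images, labels)
out = read_records("records.bin", 3)
print([lab for lab, _ in out])  # [7, 1, 2]
```

Because every record has a fixed byte length, a file like this maps naturally onto TensorFlow's fixed-length record reader (`tf.FixedLengthRecordReader` in the TF1 API).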

To use the same method for the triplet and siamese networks, I'd have to generate the combinations in advance and store them to disk. That would create a large overhead, both in the time it takes to create the file and in disk space. How can it be done on the fly?

Another easy option would be to use feed_dict, but that would be slow. The problem would therefore be solved if it were possible to run the same function I'd use for feed_dict in parallel and convert the result to a TensorFlow tensor as a last step. But as far as I know such a conversion does not exist, so one has to read the files with a TensorFlow reader in the first place and do the whole process with TensorFlow methods. Is this correct?

Answer 1:

Short answer: do the pair/triplet creation online with numpy. There is no need to convert the result to a tensor, because the feed_dict argument already accepts numpy arrays.
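A minimal sketch of online pair sampling with numpy, where nothing is written to disk; the placeholder names in the final comment (`left_ph`, `right_ph`, `label_ph`) are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def next_siamese_batch(images, labels, batch_size):
    """Sample (left, right, same?) pairs on the fly from an in-memory set."""
    labels = np.asarray(labels)
    n = len(labels)
    left = rng.integers(0, n, size=batch_size)
    right = rng.integers(0, n, size=batch_size)
    same = (labels[left] == labels[right]).astype(np.float32)
    return images[left], images[right], same

# Toy data: 10 "images" of shape (8, 8) spread over 3 classes.
images = rng.random((10, 8, 8)).astype(np.float32)
labels = rng.integers(0, 3, size=10)

l, r, y = next_siamese_batch(images, labels, batch_size=4)
print(l.shape, r.shape, y.shape)  # (4, 8, 8) (4, 8, 8) (4,)

# These numpy arrays can go straight into feed_dict, e.g.:
# sess.run(train_op, feed_dict={left_ph: l, right_ph: r, label_ph: y})
```

The same pattern extends to triplets: sample an anchor, then one index with the same label and one with a different label.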

The best option would be to use tf.nn.embedding_lookup() on already existing batches, in combination with itertools to create the indices of the pairs. For a naïve, non-optimal solution you can look at the gen_batches_siamese.py script in my GitHub repository, where I reimplemented the Caffe siamese example. It will obviously be less efficient than using TensorFlow queues, but my advice would be to try this baseline first before moving to a pure TensorFlow solution.
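The embedding_lookup idea can be sketched in numpy, since tf.nn.embedding_lookup is essentially a gather: build all within-batch pair indices with itertools, then gather the corresponding embeddings. The array values here are toy data, not from the linked script:

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
batch_size, emb_dim = 4, 3
embeddings = rng.random((batch_size, emb_dim)).astype(np.float32)
batch_labels = np.array([0, 1, 0, 1])

# All unordered within-batch pairs: C(4, 2) = 6 index pairs.
pair_idx = np.array(list(itertools.combinations(range(batch_size), 2)))
left, right = pair_idx[:, 0], pair_idx[:, 1]

# In TensorFlow this gather would be:
#   left_emb  = tf.nn.embedding_lookup(embeddings, left)
#   right_emb = tf.nn.embedding_lookup(embeddings, right)
left_emb = embeddings[left]
right_emb = embeddings[right]
same = (batch_labels[left] == batch_labels[right]).astype(np.float32)

print(pair_idx.shape, left_emb.shape)  # (6, 2) (6, 3)
```

This way a plain softmax-style batch of single images is turned into all of its pairs inside the graph, so no pair files ever need to exist on disk.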