Summary: Using the new tf.contrib.data.Dataset doubles the size of my graph protobuf file and I'm unable to visualize the graph in TensorBoard.
The details:
I'm trying out the new TensorFlow tf.contrib.data.Dataset
functionality together with the tf.contrib.learn.Experiment
framework. My input data is defined by input functions that return tensors of features and labels.
If I create my input function with the tf.train.slice_input_producer function, as in the following code block (full code here), then my resulting graph.pbtxt file is 620M and the .meta files are around 165M in size.
def train_inputs():
    with tf.name_scope('Training_data'):
        # Load the whole MNIST training set into graph constants.
        x = tf.constant(mnist.train.images.reshape([-1, 28, 28, 1]))
        y = tf.constant(mnist.train.labels)
        # Queue-based pipeline: slice into single examples, then shuffle-batch.
        sliced_input = tf.train.slice_input_producer(
            tensor_list=[x, y], shuffle=True)
        return tf.train.shuffle_batch(
            sliced_input, batch_size=batch_size,
            capacity=10000, min_after_dequeue=batch_size*10)
Now if I create my input function with the new tf.contrib.data.Dataset.from_tensor_slices, as in the following code block (full code here), then my resulting graph.pbtxt file doubles in size to 1.3G and the .meta files double in size to 330M.
def train_inputs():
    with tf.name_scope('Training_data'):
        images = mnist.train.images.reshape([-1, 28, 28, 1])
        labels = mnist.train.labels
        # Dataset-based pipeline built directly from the in-memory arrays.
        dataset = tf.contrib.data.Dataset.from_tensor_slices(
            (images, labels))
        dataset = dataset.repeat(None)  # Infinite
        dataset = dataset.shuffle(buffer_size=10000)
        dataset = dataset.batch(batch_size)
        iterator = dataset.make_one_shot_iterator()
        next_example, next_label = iterator.get_next()
        return next_example, next_label
Because the graph.pbtxt file is now so big, TensorBoard takes ages to parse it, and I'm unable to debug my model graph visually.
I found in the Dataset documentation that this increase in size comes from "the contents of the array will be copied multiple times", and that the suggested solution is to use placeholders. In that case, however, I would need to feed the numpy arrays into the placeholders from an active session in order to initialize the iterator:
sess.run(iterator.initializer, feed_dict={features_placeholder: features, labels_placeholder: labels})
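For context, the graph-side half of that pattern looks roughly like this, where features and labels are the in-memory numpy arrays:
features_placeholder = tf.placeholder(features.dtype, features.shape)
labels_placeholder = tf.placeholder(labels.dtype, labels.shape)

dataset = tf.contrib.data.Dataset.from_tensor_slices(
    (features_placeholder, labels_placeholder))
# ... repeat / shuffle / batch as in the snippet above ...
iterator = dataset.make_initializable_iterator()
next_example, next_label = iterator.get_next()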
Running that initialiser, however, seems to be out of my control when using the tf.contrib.learn.Experiment framework. How can I run the iterator's initialiser within the Experiment framework? Or is there a workaround for using the Dataset API without increasing my graph size?
I found a solution to my problem using tf.train.SessionRunHook. I create a SessionRunHook object that initialises the iterator after the session is created; the initialiser function itself is set when the Dataset iterator is created inside the input function.
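In outline it looks roughly like this; the exact code is in the full example linked at the end, and names such as IteratorInitializerHook and get_train_inputs are just how I refer to the pieces here:
import tensorflow as tf


class IteratorInitializerHook(tf.train.SessionRunHook):
    """Hook that initialises the Dataset iterator once the session exists."""

    def __init__(self):
        super(IteratorInitializerHook, self).__init__()
        self.iterator_initializer_func = None

    def after_create_session(self, session, coord):
        # Runs exactly once, right after the session has been created,
        # which is the earliest point where the iterator can be initialised.
        self.iterator_initializer_func(session)


def get_train_inputs(batch_size, mnist_data):
    """Returns an input function and the hook that initialises its iterator."""
    iterator_initializer_hook = IteratorInitializerHook()

    def train_inputs():
        with tf.name_scope('Training_data'):
            images = mnist_data.train.images.reshape([-1, 28, 28, 1])
            labels = mnist_data.train.labels

            # Placeholders keep the numpy arrays out of the graph definition.
            images_placeholder = tf.placeholder(images.dtype, images.shape)
            labels_placeholder = tf.placeholder(labels.dtype, labels.shape)

            dataset = tf.contrib.data.Dataset.from_tensor_slices(
                (images_placeholder, labels_placeholder))
            dataset = dataset.repeat(None)  # Infinite
            dataset = dataset.shuffle(buffer_size=10000)
            dataset = dataset.batch(batch_size)

            iterator = dataset.make_initializable_iterator()
            next_example, next_label = iterator.get_next()

            # The hook will call this once the session has been created.
            iterator_initializer_hook.iterator_initializer_func = \
                lambda sess: sess.run(
                    iterator.initializer,
                    feed_dict={images_placeholder: images,
                               labels_placeholder: labels})

            return next_example, next_label

    return train_inputs, iterator_initializer_hook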
And I pass the hook objects to the train_monitors and eval_hooks parameters of tf.contrib.learn.Experiment.
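The wiring is roughly the following; estimator, train_steps and get_test_inputs stand in for the corresponding pieces of the full example:
train_input_fn, train_input_hook = get_train_inputs(batch_size, mnist)
eval_input_fn, eval_input_hook = get_test_inputs(batch_size, mnist)

experiment = tf.contrib.learn.Experiment(
    estimator=estimator,
    train_input_fn=train_input_fn,
    eval_input_fn=eval_input_fn,
    train_steps=train_steps,
    train_monitors=[train_input_hook],  # Initialises the training iterator.
    eval_hooks=[eval_input_hook])       # Initialises the evaluation iterator.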
The resulting graph.pbtxt file is now only 500K, while the .meta files are only 244K. Full example here.