TensorFlow execution time

Posted 2019-01-28 18:00

I have a function in a Python script that I call many times (https://github.com/sankhaMukherjee/NNoptExpt/blob/dev/src/lib/NNlib/NNmodel.py). I have simplified it significantly for this example.

def errorValW(self, X, y, weights):

    errVal = None

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())

        nW = len(self.allW)
        W = weights[:nW] 
        B = weights[nW:]

        for i in range(len(W)):
            sess.run(tf.assign( self.allW[i], W[i] ))

        for i in range(len(B)):
            sess.run(tf.assign( self.allB[i], B[i] ))

        errVal = sess.run(self.err, 
            feed_dict = {self.Inp: X, self.Op: y})

    return errVal

I call this function many times from another function. According to the program log, the function takes progressively longer to run. A partial log is shown below:

21:37:12,634 - ... .errorValW ... - Finished the function [errorValW] in 1.477610e+00 seconds
21:37:14,116 - ... .errorValW ... - Finished the function [errorValW] in 1.481470e+00 seconds
21:37:15,608 - ... .errorValW ... - Finished the function [errorValW] in 1.490914e+00 seconds
21:37:17,113 - ... .errorValW ... - Finished the function [errorValW] in 1.504651e+00 seconds
21:37:18,557 - ... .errorValW ... - Finished the function [errorValW] in 1.443876e+00 seconds
21:37:20,183 - ... .errorValW ... - Finished the function [errorValW] in 1.625608e+00 seconds
21:37:21,719 - ... .errorValW ... - Finished the function [errorValW] in 1.534915e+00 seconds
... many lines later  
22:59:26,524 - ... .errorValW ... - Finished the function [errorValW] in 9.576592e+00 seconds
22:59:35,991 - ... .errorValW ... - Finished the function [errorValW] in 9.466405e+00 seconds
22:59:45,708 - ... .errorValW ... - Finished the function [errorValW] in 9.716456e+00 seconds
22:59:54,991 - ... .errorValW ... - Finished the function [errorValW] in 9.282923e+00 seconds
23:00:04,407 - ... .errorValW ... - Finished the function [errorValW] in 9.415035e+00 seconds

Has anyone else experienced anything like this? It is totally baffling to me ...

Edit: For reference, the initializer for the class is shown below. I suspect that the graph for the result variable is progressively increasing in size. I have seen a similar problem when saving models with tf.train.Saver(tf.trainable_variables()): the size of the saved file keeps increasing. I am not sure whether I am making a mistake in how I define the model ...

def __init__(self, inpSize, opSize, layers, activations):

    self.inpSize = inpSize
    self.Inp     = tf.placeholder(dtype=tf.float32, shape=inpSize, name='Inp')
    self.Op      = tf.placeholder(dtype=tf.float32, shape=opSize, name='Op')

    self.allW    = []
    self.allB    = []

    self.result  = None

    prevSize = inpSize[0]
    for i, l in enumerate(layers):
        tempW = tf.Variable( 0.1*(np.random.rand(l, prevSize) - 0.5), dtype=tf.float32, name='W_{}'.format(i) )
        tempB = tf.Variable( 0, dtype=tf.float32, name='B_{}'.format(i) )

        self.allW.append( tempW )
        self.allB.append( tempB )

        if i == 0:
            self.result = tf.matmul( tempW, self.Inp ) + tempB
        else:
            self.result = tf.matmul( tempW, self.result ) + tempB

        prevSize = l

        if activations[i] is not None:
            self.result = activations[i]( self.result )

    self.err = tf.sqrt(tf.reduce_mean((self.Op - self.result)**2))


    return

1 answer
贪生不怕死
Reply #2 · 2019-01-28 18:33

You are calling tf.assign inside the session context. This adds new ops to your graph every time errorValW runs, and execution slows down as the graph grows. As a rule of thumb, avoid creating TensorFlow ops while executing a model on data, since that code usually sits inside a loop and the graph then keeps growing. In my experience, adding even "a few" ops per call at execution time can cause an extreme slowdown.

Note that tf.assign is an op like any other. You should define it once beforehand (when creating the model/building the graph) and then run the same op repeatedly after launching the session.
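One way to check whether the graph is really growing is to count its operations before and after a loop that calls tf.assign. The sketch below is only an illustration, not the asker's code: the helper name count_ops_after_assigns and the single scalar variable are made up for the example, and it assumes a TensorFlow build that ships the tensorflow.compat.v1 module.

```python
import tensorflow.compat.v1 as tf  # assumption: TF 1.x-style API via the compat module

tf.disable_eager_execution()


def count_ops_after_assigns(n_calls):
    """Create a fresh tf.assign op per iteration; return (ops before, ops after)."""
    graph = tf.Graph()
    with graph.as_default():
        var = tf.Variable(0.0, dtype=tf.float32, name="v")
        init = tf.global_variables_initializer()
        before = len(graph.get_operations())
        with tf.Session(graph=graph) as sess:
            sess.run(init)
            for step in range(n_calls):
                # Anti-pattern: every iteration adds new nodes to the graph.
                sess.run(tf.assign(var, float(step)))
        after = len(graph.get_operations())
    return before, after


before, after = count_ops_after_assigns(50)
print("ops before: {}, ops after: {}".format(before, after))
```

If the second number rises as n_calls increases, ops are being created at run time rather than once when the graph is built.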

I don't know what exactly you are trying to achieve in your code snippet, but consider the following:

...
with tf.Session() as sess:
    sess.run(tf.assign(some_var, a_value))

could be replaced by

a_placeholder = tf.placeholder(type_for_a_value, shape_for_a_value)
assign_op = tf.assign(some_var, a_placeholder)
...
with tf.Session() as sess:
    sess.run(assign_op, feed_dict={a_placeholder: a_value})

where a_placeholder should have the same dtype/shape as some_var. I have to admit I haven't tested this snippet so please let me know if there are issues, but this should be about right.
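Applied to a class with lists of weights and biases like the one in the question, the same idea might look roughly like the sketch below. TinyModel, placeholders, assign_ops, total and set_weights are hypothetical names chosen for the example, and the model is cut down to one weight matrix and one bias. It again assumes the tensorflow.compat.v1 module is available. The initializer op is also built only once here, because tf.global_variables_initializer() creates a new op each time it is called. The call to graph.finalize() is optional: it makes the graph read-only, so any op accidentally created later fails loudly instead of silently growing the graph.

```python
import numpy as np
import tensorflow.compat.v1 as tf  # assumption: TF 1.x-style API via the compat module

tf.disable_eager_execution()


class TinyModel(object):
    """Hypothetical stand-in for the asker's class: one weight matrix, one bias."""

    def __init__(self):
        self.graph = tf.Graph()
        with self.graph.as_default():
            self.allW = [tf.Variable(np.zeros((2, 3)), dtype=tf.float32, name="W_0")]
            self.allB = [tf.Variable(0.0, dtype=tf.float32, name="B_0")]
            variables = self.allW + self.allB

            # Build one placeholder and one assign op per variable, exactly once.
            self.placeholders = [
                tf.placeholder(v.dtype.base_dtype, shape=v.shape) for v in variables
            ]
            self.assign_ops = [
                tf.assign(v, p) for v, p in zip(variables, self.placeholders)
            ]
            # Stand-in for self.err: the sum of all variable entries.
            self.total = tf.add_n([tf.reduce_sum(v) for v in variables])
            init = tf.global_variables_initializer()

        # Make the graph read-only; adding ops after this raises an error.
        self.graph.finalize()
        self.sess = tf.Session(graph=self.graph)
        self.sess.run(init)

    def set_weights(self, weights):
        """Feed new values through the pre-built assign ops; no new nodes are added."""
        feed = dict(zip(self.placeholders, weights))
        self.sess.run(self.assign_ops, feed_dict=feed)
        return self.sess.run(self.total)


model = TinyModel()
n_ops = len(model.graph.get_operations())
for k in range(5):
    total = model.set_weights([np.full((2, 3), k, dtype=np.float32), np.float32(1.0)])
print("graph size unchanged:", len(model.graph.get_operations()) == n_ops)
```

Keeping one session open for the lifetime of the object, instead of opening a new tf.Session on every call, is a separate design choice in this sketch. It is not part of the original answer.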
