TensorFlow: creating variables in fn of tf.map_fn

Posted 2019-08-17 23:01

Question:

I have a question about variable initialization inside tf.map_fn.

I was trying to apply some highway layers separately to each individual element of a tensor, so I figured map_fn might be the best way to do it.

segment_list = tf.reshape(raw_segment_embedding,[batch_size*seqlen,embed_dim])
segment_embedding = tf.map_fn(lambda x: stack_highways(x, hparams), segment_list)

Now the problem is that my fn, i.e. stack_highways, creates variables, and for some reason TensorFlow fails to initialize those variables and gives this error:

W = tf.Variable(tf.truncated_normal(W_shape, stddev=0.1), name='weight')

ValueError: Initializer for variable body/model/parallel_0/body/map/while/highway_layer0/weight/ is from inside a control-flow construct, such as a loop or conditional. When creating a variable inside a loop or conditional, use a lambda as the initializer. 

I am pretty clueless now. Based on the error, I suppose it is not about scope, but I have no idea how to use a lambda as the initializer (I don't even know exactly what that means). Below is the implementation of stack_highways; any advice would be much appreciated.

def weight_bias(W_shape, b_shape, bias_init=0.1):
  """Fully connected highway layer adopted from 
     https://github.com/fomorians/highway-fcn/blob/master/main.py
  """
  W = tf.Variable(tf.truncated_normal(W_shape, stddev=0.1), name='weight')
  b = tf.Variable(tf.constant(bias_init, shape=b_shape), name='bias')
  return W, b




def highway_layer(x, size, activation, carry_bias=-1.0):
  """Fully connected highway layer adopted from 
     https://github.com/fomorians/highway-fcn/blob/master/main.py
  """
  W, b = weight_bias([size, size], [size])
  with tf.name_scope('transform_gate'):
    W_T, b_T = weight_bias([size, size], [size], bias_init=carry_bias)

    H = activation(tf.matmul(x, W) + b, name='activation')
    T = tf.sigmoid(tf.matmul(x, W_T) + b_T, name='transform_gate')
    C = tf.subtract(1.0, T, name="carry_gate")  # tf.sub was renamed in TF 1.0

    y = tf.add(tf.multiply(H, T), tf.multiply(x, C), name='y') # y = (H * T) + (x * C)
    return y




def stack_highways(x, hparams):
  """Create highway networks, this would not create
  a padding layer in the bottom and the top, it would 
  just be layers of highways.


  Args:
    x: a raw_segment_embedding
    hparams: run hyperparameters


  Returns:
    y: a segment_embedding
  """
  highway_size = hparams.highway_size
  activation = hparams.highway_activation #tf.nn.relu
  carry_bias_init = hparams.highway_carry_bias
  prev_y = None
  y = None
  for i in range(highway_size):
    with tf.name_scope("highway_layer{}".format(i)) as scope:
      if i == 0: # first, input layer
        prev_y = highway_layer(x, highway_size, activation, carry_bias=carry_bias_init)
      elif i == highways - 1: # last, output layer
        y = highway_layer(prev_y, highway_size, activation, carry_bias=carry_bias_init)
      else: # hidden layers
        prev_y = highway_layer(prev_y, highway_size, activation, carry_bias=carry_bias_init)
  return y

Warmest Regards, Colman

Answer 1:

TensorFlow provides two main ways of initializing variables:

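  1. "lambda" initializers: callables that return the initial value and are only invoked when the variable is created, e.g. tf.Variable(lambda: tf.truncated_normal(W_shape, stddev=0.1), name='weight'). TF also provides many nicely packaged ones, such as tf.truncated_normal_initializer.
  2. Initialization by tensor values: you pass an already-built tensor. This is what you are currently doing.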

The error message is saying that you need to use the first type of initializer when creating variables inside a while_loop (which map_fn uses internally). (In general, lambda initializers seem more robust to me.)

Additionally, tf.get_variable has historically been preferred over tf.Variable when variables are created inside control flow.

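So I suspect you can resolve your issue by changing your weight_bias function to something like the following. Note that tf.get_variable ignores tf.name_scope, so the enclosing scopes in highway_layer and stack_highways would also need to become tf.variable_scope for each layer's "weight" and "bias" to get distinct names: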

def weight_bias(W_shape, b_shape, bias_init=0.1):
  """Fully connected highway layer adopted from 
     https://github.com/fomorians/highway-fcn/blob/master/main.py
  """
  W = tf.get_variable("weight", shape=W_shape,
          initializer=tf.truncated_normal_initializer(stddev=0.1))
  b = tf.get_variable("bias", shape=b_shape,
          initializer=tf.constant_initializer(bias_init))
  return W, b

Hope that helps!