I've run into an issue trying to use TensorFlow's feature_column mappings inside a function passed to the Dataset map method. It happens when trying to one-hot encode categorical string features of a Dataset as part of the input pipeline using Dataset.map. The error I'm getting is: tensorflow.python.framework.errors_impl.FailedPreconditionError: Table already initialized.
The following code is a basic example that recreates the problem:
import numpy as np
import tensorflow as tf
from tensorflow.contrib.lookup import index_table_from_tensor

# generate tfrecords with two string categorical features and write to file
vlists = dict(season=['Spring', 'Summer', 'Fall', 'Winter'],
              day=['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat'])

writer = tf.python_io.TFRecordWriter('test.tfr')
for s, d in zip(np.random.choice(vlists['season'], 50),
                np.random.choice(vlists['day'], 50)):
    example = tf.train.Example(
        features=tf.train.Features(
            feature={
                'season': tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[s.encode()])),
                'day': tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[d.encode()]))
            }
        )
    )
    serialized = example.SerializeToString()
    writer.write(serialized)
writer.close()
Now there's a tfrecord file in the cwd called test.tfr with 50 records, where each record consists of two string features, 'season' and 'day'. The following then creates a Dataset that parses the tfrecords and produces batches of size 4:
def parse_record(element):
    feats = {
        'season': tf.FixedLenFeature((), tf.string),
        'day': tf.FixedLenFeature((), tf.string)
    }
    return tf.parse_example(element, feats)

fname = tf.placeholder(tf.string, [])
ds = tf.data.TFRecordDataset(fname)
# batch before map so parse_example receives a vector of serialized records
ds = ds.batch(4).map(parse_record)
At this point, if you create an iterator and call get_next on it several times, it works as expected, and you would see output like this on each run:
sess = tf.Session()
iterator = ds.make_initializable_iterator()
nxt = iterator.get_next()
sess.run(tf.tables_initializer())
sess.run(iterator.initializer, feed_dict={fname: 'test.tfr'})
sess.run(nxt)
# output of run(nxt) would look like
# {'day': array([b'Sat', b'Thu', b'Fri', b'Thu'], dtype=object), 'season': array([b'Winter', b'Winter', b'Fall', b'Summer'], dtype=object)}
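For completeness, this is roughly how I'm pulling several batches in a row; nothing fancy, just a sketch that stops when the file is exhausted:

# sketch: pull batches repeatedly until the dataset is exhausted
while True:
    try:
        print(sess.run(nxt))
    except tf.errors.OutOfRangeError:
        break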
However, if I want to use feature_columns to one-hot encode those categoricals as a Dataset transformation using map, it runs once, producing correct output, but every subsequent call to run(nxt) gives the Table already initialized error, e.g.:
# using the same Dataset ds from above
season_enc = tf.feature_column.categorical_column_with_vocabulary_list(
    key='season', vocabulary_list=vlists['season'])
season_col = tf.feature_column.indicator_column(season_enc)
day_enc = tf.feature_column.categorical_column_with_vocabulary_list(
    key='day', vocabulary_list=vlists['day'])
day_col = tf.feature_column.indicator_column(day_enc)
cols = [season_col, day_col]

def _encode(element, feat_cols=cols):
    return tf.feature_column.input_layer(element, feat_cols)

ds1 = ds.map(_encode)
iterator = ds1.make_initializable_iterator()
nxt = iterator.get_next()
sess.run(tf.tables_initializer())
sess.run(iterator.initializer, feed_dict={fname:'test.tfr'})
sess.run(nxt)
# first run will produce correct one hot encoded output
sess.run(nxt)
# second run will generate
W tensorflow/core/framework/op_kernel.cc:1192] Failed precondition: Table already initialized.
2018-01-25 19:29:55.802358: W tensorflow/core/framework/op_kernel.cc:1192] Failed precondition: Table already initialized.
2018-01-25 19:29:55.802612: W tensorflow/core/framework/op_kernel.cc:1192] Failed precondition: Table already initialized.
tensorflow.python.framework.errors_impl.FailedPreconditionError: Table already initialized.
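For what it's worth, my expectation (and it's only an assumption on my part) is that building input_layer on the iterator's output instead of inside map would create the lookup tables once in the outer graph; this sketch is what I mean:

# sketch (assumption): apply input_layer to the iterator output rather than
# inside map, so the lookup tables are created once in the outer graph
iterator = ds.make_initializable_iterator()
features = iterator.get_next()
encoded = tf.feature_column.input_layer(features, cols)
sess.run(tf.tables_initializer())
sess.run(iterator.initializer, feed_dict={fname: 'test.tfr'})
print(sess.run(encoded))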
However, if I try to do the one-hot encoding manually, without feature_columns, as below, then it only works if the tables are created outside the function passed to map; otherwise it gives the same error as above:
# using same original Dataset ds
tables = dict(season=index_table_from_tensor(vlists['season']),
              day=index_table_from_tensor(vlists['day']))

def to_dummy(element):
    s = tables['season'].lookup(element['season'])
    d = tables['day'].lookup(element['day'])
    return (tf.one_hot(s, depth=len(vlists['season']), axis=-1),
            tf.one_hot(d, depth=len(vlists['day']), axis=-1))

ds2 = ds.map(to_dummy)
iterator = ds2.make_initializable_iterator()
nxt = iterator.get_next()
sess.run(tf.tables_initializer())
sess.run(iterator.initializer, feed_dict={fname:'test.tfr'})
sess.run(nxt)
It seems as if it has something to do with the scope or namespace of the index lookup tables created by feature_columns, but I'm not sure how to figure out what's happening here. I've tried changing where and when the feature_column objects are defined, but it hasn't made a difference.
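In case it helps with debugging, this is roughly how I've been poking at the graph to see where the tables end up (just a sketch; matching ops by the substring 'table' is a guess on my part):

# sketch: list table-related ops and the collected table initializers
for op in tf.get_default_graph().get_operations():
    if 'table' in op.name.lower():
        print(op.name, op.type)
print(tf.get_collection(tf.GraphKeys.TABLE_INITIALIZERS))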