TensorFlow feature column for variable list of val

Posted 2019-04-06 07:48

Question:

The TensorFlow docs make it clear how to use tf.feature_column.categorical_column_with_vocabulary_list to create a feature column that takes a string as input and outputs a one-hot vector. For example:

import tensorflow as tf

vocabulary_feature_column = (
    tf.feature_column.categorical_column_with_vocabulary_list(
        key="vocab_feature",
        vocabulary_list=["kitchenware", "electronics", "sports"]))

Let's say "kitchenware" maps to [1,0,0] and "electronics" maps to [0,1,0]. My question is about using a list of strings as a feature. For example, if the feature value were ["kitchenware","electronics"], the desired output would be [1,1,0]. The input list length is not fixed, but the output dimension is.

The use case is a straight bag-of-words type model (obviously with a much larger vocabulary list!).

What is the correct way to implement this?

Answer 1:

You should use tf.feature_column.indicator_column, which wraps a categorical column and produces a dense multi-hot vector. See https://www.tensorflow.org/versions/master/api_docs/python/tf/feature_column/indicator_column
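
To make the expected mapping concrete, here is a minimal plain-Python sketch of the encoding the question asks for. It does not use TensorFlow and is not how indicator_column is implemented internally; the helper name multi_hot is hypothetical and only illustrates that a list of any length maps to a vector whose size is fixed by the vocabulary.

```python
from collections import Counter


def multi_hot(values, vocabulary):
    """Count how often each vocabulary entry occurs in `values`.

    Strings that are not in the vocabulary are ignored, so the output
    length always equals len(vocabulary), whatever the input length.
    (Hypothetical helper for illustration, not a TensorFlow API.)
    """
    counts = Counter(values)
    return [float(counts[token]) for token in vocabulary]


vocabulary = ["kitchenware", "electronics", "sports"]
print(multi_hot(["kitchenware", "electronics"], vocabulary))  # [1.0, 1.0, 0.0]
print(multi_hot(["sports"], vocabulary))                      # [0.0, 0.0, 1.0]
```

Because the sketch counts occurrences, a repeated value yields an entry greater than 1 rather than a strict 0/1 flag.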



Answer 2:

Here is an example of how to feed data to the indicator column:

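import tensorflow as tf  # TensorFlow 1.x API (tf.Session, input_layer)

# Each row is one example holding a list of string values.
features = {'letter': [['A','A'], ['C','D'], ['E','F'], ['G','A'], ['X','R']]}

# Map strings to vocabulary indices; values outside the vocabulary are dropped.
letter_feature = tf.feature_column.categorical_column_with_vocabulary_list(
    "letter", ["A", "B", "C"], dtype=tf.string)

# Turn the categorical column into a dense multi-hot (count) vector.
indicator = tf.feature_column.indicator_column(letter_feature)
tensor = tf.feature_column.input_layer(features, [indicator])

with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    # The vocabulary lookup table must be initialized before use.
    session.run(tf.tables_initializer())
    print(session.run([tensor]))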

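This outputs the following. Note that the first row is [2., 0., 0.] because 'A' appears twice: the indicator column counts occurrences, and values outside the vocabulary contribute nothing.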

[array([[2., 0., 0.],
       [0., 0., 1.],
       [0., 0., 0.],
       [1., 0., 0.],
       [0., 0., 0.]], dtype=float32)]
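
Two gaps remain between this example and the original question: every row here has exactly two values, and repeated values produce counts rather than 0/1 flags. The plain-Python sketch below shows one way to handle both. It right-pads ragged lists into a rectangular batch and clips counts to 1. The helper names pad_rows and binarize are hypothetical. Padding with the empty string assumes that TensorFlow's categorical columns treat '' as a missing value for dense string inputs, which should be verified against the TensorFlow version in use.

```python
def pad_rows(rows, pad_value=""):
    """Right-pad each list so every row has the length of the longest one.

    Assumption: the downstream categorical column ignores `pad_value`.
    """
    width = max((len(row) for row in rows), default=0)
    return [list(row) + [pad_value] * (width - len(row)) for row in rows]


def binarize(counts):
    """Clip per-token counts to 0/1 so a repeated value counts only once."""
    return [[1.0 if c > 0 else 0.0 for c in row] for row in counts]


rows = [["kitchenware", "electronics"], ["sports"], []]
print(pad_rows(rows))
# [['kitchenware', 'electronics'], ['sports', ''], ['', '']]

print(binarize([[2.0, 0.0, 0.0], [0.0, 0.0, 1.0]]))
# [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
```

Inside a TensorFlow graph, the same clipping could presumably be applied to the dense output with tf.minimum(tensor, 1.0) instead of a Python loop.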