TensorFlow feature column for variable list of values

Posted 2019-04-06 07:50

From the TensorFlow docs, it's clear how to use tf.feature_column.categorical_column_with_vocabulary_list to create a feature column that takes a string as input and outputs a one-hot vector. For example:

import tensorflow as tf

vocabulary_feature_column = tf.feature_column.categorical_column_with_vocabulary_list(
    key="vocab_feature",
    vocabulary_list=["kitchenware", "electronics", "sports"])

Let's say "kitchenware" maps to [1,0,0] and "electronics" maps to [0,1,0]. My question is about having a list of strings as a feature. For example, if the feature value were ["kitchenware","electronics"], the desired output would be [1,1,0]. The length of the input list is not fixed, but the output dimension is.

The use case is a straightforward bag-of-words model (with a much larger vocabulary list, obviously).
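To make the intended semantics concrete, here is a minimal plain-Python sketch of the multi-hot (bag-of-words) encoding described above. The helper `multi_hot` is hypothetical and not a TensorFlow API; it assumes that a repeated word still yields 1 and that words outside the vocabulary are simply ignored.

```python
# Hypothetical helper illustrating the desired bag-of-words encoding;
# plain Python, not part of the TensorFlow API.
def multi_hot(values, vocabulary):
    """Return a fixed-length 0/1 list with a 1 at the index of every
    vocabulary entry present in `values`; unknown values are ignored."""
    index = {word: i for i, word in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for value in values:
        i = index.get(value)
        if i is not None:
            vector[i] = 1
    return vector


vocabulary = ["kitchenware", "electronics", "sports"]
print(multi_hot(["kitchenware"], vocabulary))                 # [1, 0, 0]
print(multi_hot(["kitchenware", "electronics"], vocabulary))  # [1, 1, 0]
print(multi_hot(["sports", "sports", "toys"], vocabulary))    # [0, 0, 1]
```

The output length always equals the vocabulary size, regardless of how many values the input list contains.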

What is the correct way to implement this?

2 answers
你好瞎i
Reply #2 · 2019-04-06 07:57
劳资没心,怎么记你
Reply #3 · 2019-04-06 08:06

Here is an example of how to feed list-valued data to an indicator column:

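# TF 1.x API; under TF 2.x use the tf.compat.v1 equivalents with eager execution disabled.
import tensorflow as tf

# Each example is a list of strings; values outside the vocabulary ('D', 'E', ...) are ignored.
features = {'letter': [['A','A'], ['C','D'], ['E','F'], ['G','A'], ['X','R']]}

letter_feature = tf.feature_column.categorical_column_with_vocabulary_list(
    "letter", ["A", "B", "C"], dtype=tf.string)

# indicator_column turns the categorical column into a dense vector of per-value counts.
indicator = tf.feature_column.indicator_column(letter_feature)
tensor = tf.feature_column.input_layer(features, [indicator])

with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    # The vocabulary lookup is backed by a table that must be initialized.
    session.run(tf.tables_initializer())
    print(session.run([tensor]))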

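Which outputs the following. Each in-vocabulary value adds 1 at its index, so the repeated 'A' in the first row gives a count of 2, while out-of-vocabulary values are dropped: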

[array([[2., 0., 0.],
       [0., 0., 1.],
       [0., 0., 0.],
       [1., 0., 0.],
       [0., 0., 0.]], dtype=float32)]
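The counting behaviour can be reproduced in plain Python as a sanity check. This is a sketch, not TensorFlow code: `indicator_counts` is a hypothetical helper, and it assumes the column's defaults (no OOV buckets, so unknown strings are dropped). Its optional `binary` flag clips counts to 1, which yields the [1,1,0]-style multi-hot vector asked for in the question.

```python
# Hypothetical plain-Python re-implementation of the counting the indicator
# column performs (assumes no OOV buckets, so unknown strings are dropped).
# Not TensorFlow code.
def indicator_counts(rows, vocabulary, binary=False):
    index = {word: i for i, word in enumerate(vocabulary)}
    result = []
    for row in rows:
        vector = [0.0] * len(vocabulary)
        for value in row:
            i = index.get(value)
            if i is not None:
                vector[i] += 1.0  # duplicates accumulate, like indicator_column
        if binary:
            # Clip counts to 1 to get a multi-hot vector instead of counts.
            vector = [min(v, 1.0) for v in vector]
        result.append(vector)
    return result


letters = [['A', 'A'], ['C', 'D'], ['E', 'F'], ['G', 'A'], ['X', 'R']]
print(indicator_counts(letters, ["A", "B", "C"]))
print(indicator_counts(letters, ["A", "B", "C"], binary=True))
```

Inside the TensorFlow graph itself, one way to get the same clipping would be to apply `tf.minimum(tensor, 1.0)` to the `input_layer` output.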