From the TensorFlow docs, it's clear how to use tf.feature_column.categorical_column_with_vocabulary_list
to create a feature column that takes a string as input and outputs a one-hot vector. For example:
    import tensorflow as tf

    # Categorical column that maps each known string to an integer id.
    vocabulary_feature_column = tf.feature_column.categorical_column_with_vocabulary_list(
        key="vocab_feature",
        vocabulary_list=["kitchenware", "electronics", "sports"])
Let's say "kitchenware" maps to [1,0,0] and "electronics" maps to [0,1,0]. My question is related to having a list of strings as a feature. For example, if the feature value was ["kitchenware","electronics"] then the desired output would be [1,1,0]. The input list length is not fixed but the output dimension is.
The use case is a straight bag-of-words type model (obviously with a much larger vocabulary list!).
What is the correct way to implement this?
You should use tf.feature_column.indicator_column; see https://www.tensorflow.org/versions/master/api_docs/python/tf/feature_column/indicator_column. It wraps a categorical column and produces a multi-hot representation of it.
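To make the target encoding concrete, here is a minimal, library-free sketch of the multi-hot vector described in the question. It is not TensorFlow code and not the indicator column's implementation: the helper name `multi_hot` is hypothetical, and the sketch assumes that strings outside the vocabulary are simply ignored.

```python
from typing import List, Sequence

# Same vocabulary as in the question.
VOCABULARY = ["kitchenware", "electronics", "sports"]


def multi_hot(values: Sequence[str], vocabulary: Sequence[str]) -> List[int]:
    """Hypothetical helper: encode a variable-length list of strings as a
    fixed-length 0/1 vector with one slot per vocabulary entry."""
    index = {token: i for i, token in enumerate(vocabulary)}
    encoded = [0] * len(vocabulary)
    for value in values:
        position = index.get(value)
        # Assumption: out-of-vocabulary strings are skipped.
        if position is not None:
            encoded[position] = 1
    return encoded


print(multi_hot(["kitchenware", "electronics"], VOCABULARY))  # [1, 1, 0]
```

The output length depends only on the vocabulary size, so inputs of any length (including an empty list) map to a vector of the same dimension.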
Here is an example of how to feed data to the indicator column:
Which outputs: