System/Version Info
- Canned census example committed on 2017_06_22_15:06:37.
- TensorFlow 1.2.
- Python 3
- GCP ML Engine 1.2
Approach
Fabrice, I had the same question as you and it took me a while to figure this one out (with the generous help of Eli). I took a slightly different approach. Instead of trying to create an instance key, I made the assumption that the instance key would be in the data (training, evaluation, and prediction).
Here, I use the gender field as the instance key. Obviously, I would not use the gender field in reality as an instance key, I'm only using it here for illustration purposes.
Other than those changes described here, am not making any updates to any other functions or constants from the original script other than to change some things from python 2 to python 3, e.g., changing dict.iteritems() to dict.items().
Here is a gist of my modified model.py file. I did not make any changes to the task.py file.
Updating the key_model_fn_gen()
function
This code relies on guidance I got from Eli. The insight for me was that I need to modify the output_alternatives dictionary in order to return the key and that I do not need to modify the predictions dictionary. (Additionally, I learned that I could get the params as an attribute of the estimator from your (Fabrice's) example, thanks for that.)
KEY = 'gender'
def key_model_fn_gen(estimator):
def _model_fn(features, labels, mode):
key = features.pop(KEY)
params = estimator.params
model_fn_ops = estimator._model_fn(features=features, labels=labels, mode=mode, params=params)
model_fn_ops.output_alternatives[None][1]['key'] = key
return model_fn_ops
return _model_fn
Updating the build_estimator()
function
- I remove gender from
deep_columns
list and wide_columns
list so that it is not used as a feature for training and evaluation.
- I modify the return to include the key wrapper per Eli's guidance.
- I get the model_dir as an attribute of config.
Here is the full code:
def build_estimator(config, embedding_size=8, hidden_units=None):
(gender, race, education, marital_status, relationship,
workclass, occupation, native_country, age,
education_num, capital_gain, capital_loss, hours_per_week) = INPUT_COLUMNS
"""Build an estimator."""
# Reused Transformations.
# Continuous columns can be converted to categorical via bucketization
age_buckets = tf.feature_column.bucketized_column(
age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])
# Wide columns and deep columns.
wide_columns = [
# Interactions between different categorical features can also
# be added as new virtual features.
tf.feature_column.crossed_column(
['education', 'occupation'], hash_bucket_size=int(1e4)),
tf.feature_column.crossed_column(
[age_buckets, race, 'occupation'], hash_bucket_size=int(1e6)),
tf.feature_column.crossed_column(
['native_country', 'occupation'], hash_bucket_size=int(1e4)),
native_country,
education,
occupation,
workclass,
marital_status,
relationship,
age_buckets,
]
deep_columns = [
# Use indicator columns for low dimensional vocabularies
tf.feature_column.indicator_column(workclass),
tf.feature_column.indicator_column(education),
tf.feature_column.indicator_column(marital_status),
tf.feature_column.indicator_column(relationship),
tf.feature_column.indicator_column(race),
# Use embedding columns for high dimensional vocabularies
tf.feature_column.embedding_column(
native_country, dimension=embedding_size),
tf.feature_column.embedding_column(occupation, dimension=embedding_size),
age,
education_num,
capital_gain,
capital_loss,
hours_per_week,
]
return tf.contrib.learn.Estimator(
model_fn=key_model_fn_gen(
tf.contrib.learn.DNNLinearCombinedClassifier(
config=config,
linear_feature_columns=wide_columns,
dnn_feature_columns=deep_columns,
dnn_hidden_units=hidden_units or [100, 70, 50, 25],
fix_global_step_increment_bug=True)
),
model_dir=config.model_dir
)
Input format for batch predictions
After the version has been uploaded to ML Engine, the prediction input takes the following form:
{"native_country":" United-States","race":" Black","age":"44","relationship":" Other-relative","gender":" Male","marital_status":" Never-married","hours_per_week":"32","capital_gain":"0","education_num":"9","education":" HS-grad","occupation":" Other-service","capital_loss":"0","workclass":" Private"}
{"native_country":" United-States","race":" White","age":"35","relationship":" Not-in-family","gender":" Male","marital_status":" Divorced","hours_per_week":"40","capital_gain":"0","education_num":"9","education":" HS-grad","occupation":" Craft-repair","capital_loss":"0","workclass":" Private"}
{"native_country":" United-States","race":" White","age":"20","relationship":" Husband","gender":" Male","marital_status":" Married-civ-spouse","hours_per_week":"40","capital_gain":"0","education_num":"10","education":" Some-college","occupation":" Craft-repair","capital_loss":"0","workclass":" Private"}
{"native_country":" United-States","race":" White","age":"43","relationship":" Husband","gender":" Male","marital_status":" Married-civ-spouse","hours_per_week":"50","capital_gain":"0","education_num":"10","education":" Some-college","occupation":" Farming-fishing","capital_loss":"0","workclass":" Self-emp-not-inc"}
{"native_country":" England","race":" White","age":"33","relationship":" Husband","gender":" Male","marital_status":" Married-civ-spouse","hours_per_week":"40","capital_gain":"0","education_num":"13","education":" Bachelors","occupation":" Farming-fishing","capital_loss":"0","workclass":" Private"}
{"native_country":" United-States","race":" White","age":"38","relationship":" Unmarried","gender":" Female","marital_status":" Divorced","hours_per_week":"56","capital_gain":"0","education_num":"13","education":" Bachelors","occupation":" Prof-specialty","capital_loss":"0","workclass":" Private"}
{"native_country":" United-States","race":" White","age":"53","relationship":" Not-in-family","gender":" Female","marital_status":" Never-married","hours_per_week":"35","capital_gain":"8614","education_num":"14","education":" Masters","occupation":" ?","capital_loss":"0","workclass":" ?"}
{"native_country":" China","race":" Asian-Pac-Islander","age":"64","relationship":" Husband","gender":" Male","marital_status":" Married-civ-spouse","hours_per_week":"60","capital_gain":"0","education_num":"14","education":" Masters","occupation":" Prof-specialty","capital_loss":"2057","workclass":" Private"}
Output format of batch prediction
After completing the batch prediction job, I get the following output:
{"probabilities": [0.9633187055587769, 0.036681365221738815], "classes": ["0", "1"], "key": [" Male"]}
{"probabilities": [0.9452069997787476, 0.05479296296834946], "classes": ["0", "1"], "key": [" Male"]}
{"probabilities": [0.8586776852607727, 0.1413223296403885], "classes": ["0", "1"], "key": [" Male"]}
{"probabilities": [0.7370017170906067, 0.2629982531070709], "classes": ["0", "1"], "key": [" Male"]}
{"probabilities": [0.48797568678855896, 0.5120242238044739], "classes": ["0", "1"], "key": [" Male"]}
{"probabilities": [0.8111950755119324, 0.18880495429039001], "classes": ["0", "1"], "key": [" Female"]}
{"probabilities": [0.5560402274131775, 0.4439597725868225], "classes": ["0", "1"], "key": [" Female"]}
{"probabilities": [0.3235422968864441, 0.6764576435089111], "classes": ["0", "1"], "key": [" Male"]}