Using a trained TensorFlow model to make predictions

Posted 2020-05-09 01:13

Question:

I'm looking at the source code from this TensorFlow article, which explains how to create a wide-and-deep learning model: https://www.tensorflow.org/versions/r1.3/tutorials/wide_and_deep

Here is the link to the Python source code: https://github.com/tensorflow/tensorflow/blob/r1.3/tensorflow/examples/learn/wide_n_deep_tutorial.py

The goal is to train a model that predicts whether someone earns more or less than $50k a year, given their census information.

As instructed, I run it with this command:

python wide_n_deep_tutorial.py --model_type=wide_n_deep

The result that I get is the following:

$ python wide_n_deep.py --model_type=wide_n_deep
Training data is downloaded to /tmp/tmp_pwqo2h8
Test data is downloaded to /tmp/tmph6jcimik
2018-01-03 05:34:12.236038: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
WARNING:tensorflow:enqueue_data was called with num_epochs and num_threads > 1. num_epochs is applied per thread, so this will produce more epochs than you probably intend. If you want to limit epochs, use one thread.
WARNING:tensorflow:enqueue_data was called with shuffle=False and num_threads > 1. This will create multiple threads, all reading the array/dataframe in order. If you want examples read in order, use one thread; if you want multiple threads, enable shuffling.
WARNING:tensorflow:Casting <dtype: 'float32'> labels to bool.
WARNING:tensorflow:Casting <dtype: 'float32'> labels to bool.
model directory = /tmp/tmp_ab6cfsf
accuracy: 0.808673
accuracy_baseline: 0.763774
auc: 0.841373
auc_precision_recall: 0.66043
average_loss: 0.418642
global_step: 2000
label/mean: 0.236226
loss: 41.8154
prediction/mean: 0.251593
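For reading the evaluation output: TensorFlow's canned classifiers report accuracy_baseline as the accuracy of always predicting the more common class, i.e. max(label_mean, 1 - label_mean), where label/mean is the fraction of test examples labelled >50K. A quick arithmetic check using only the numbers printed above:

```python
# Values copied from the evaluation output above.
label_mean = 0.236226  # fraction of test examples whose label is 1 (>50K)
accuracy = 0.808673

# Accuracy of a "model" that always predicts the majority class (<=50K here).
baseline = max(label_mean, 1 - label_mean)

print(round(baseline, 6))             # matches accuracy_baseline
print(round(accuracy - baseline, 6))  # how much the trained model beats it
```

So the trained model is about 4.5 accuracy points better than always answering "<=50K".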

Various articles I've seen online talk about loading a .ckpt file. When I look in my model directory, I see these files:

$ ls /tmp/tmp_ab6cfsf
checkpoint  eval  events.out.tfevents.1514957651.ml-1  graph.pbtxt  model.ckpt-1.data-00000-of-00001  model.ckpt-1.index  model.ckpt-1.meta  model.ckpt-2000.data-00000-of-00001  model.ckpt-2000.index  model.ckpt-2000.meta

I'm guessing the one I should use is model.ckpt-1.meta. Is that correct?

I'm also confused about how to use this model and feed it data. I've looked at this article on TensorFlow's website: https://www.tensorflow.org/versions/r1.3/programmers_guide/saved_model

It says: "Note that Estimators automatically saves and restores variables (in the model_dir)." I'm not sure what that means in this context.

How can I supply new records in the format of the census data, minus the income column (since that is what the model is supposed to predict), and get predictions for them? It's not obvious to me how to combine the two TensorFlow articles to make predictions with the trained model.

Answer 1:

Have a look at the official blog posts from the TensorFlow team (part 1 and part 3), which explain well how to use an Estimator.

In particular, they explain how to make predictions on custom input, using the built-in predict method of Estimators:

estimator = tf.estimator.Estimator(model_fn, ...)

predict_input_fn = ...  # define this using tf.data

predict_results = estimator.predict(predict_input_fn)
for idx, prediction in enumerate(predict_results):
    print(idx)
    for key in prediction:
        print("...{}: {}".format(key, prediction[key]))
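This is also what "Estimators automatically save and restore variables (in the model_dir)" means: you never load a .meta file yourself. When the Estimator is built with the model_dir of the trained model, predict restores the most recent checkpoint, which is the one named in the small text file called checkpoint (tf.train.latest_checkpoint(model_dir) performs the same lookup). The number in each file name is the global step at which it was written, so model.ckpt-1 comes from the very start of training and model.ckpt-2000 from the end (global_step: 2000 in the output); the .meta file holds only the graph, while the variable values live in the .data/.index pair. A standard-library sketch of that lookup; the helper names are hypothetical and the checkpoint file contents are an assumed example:

```python
import re


def latest_checkpoint_prefix(checkpoint_file_text):
    """Return the prefix recorded as model_checkpoint_path, or None if absent."""
    match = re.search(r'^model_checkpoint_path:\s*"([^"]+)"',
                      checkpoint_file_text, re.MULTILINE)
    return match.group(1) if match else None


def global_step(prefix):
    """Extract the trailing step number, e.g. 2000 from 'model.ckpt-2000'."""
    return int(prefix.rsplit("-", 1)[1])


# Assumed contents of /tmp/tmp_ab6cfsf/checkpoint after the run in the question.
sample = '''model_checkpoint_path: "/tmp/tmp_ab6cfsf/model.ckpt-2000"
all_model_checkpoint_paths: "/tmp/tmp_ab6cfsf/model.ckpt-1"
all_model_checkpoint_paths: "/tmp/tmp_ab6cfsf/model.ckpt-2000"
'''

prefix = latest_checkpoint_prefix(sample)
print(prefix)
print(global_step(prefix))
```

Note that the checkpoint is identified by its prefix (model.ckpt-2000), not by any single file.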

For your example, we can create a predict input function that reads an additional CSV file. Suppose we have a file called "predict.csv" containing three examples (for instance, the first three lines of "test.csv" without the labels). It would look like this:

predict.csv:

...skip this line...
25, Private, 226802, 11th, 7, Never-married, Machine-op-inspct, Own-child, Black, Male, 0, 0, 40, United-States
38, Private, 89814, HS-grad, 9, Married-civ-spouse, Farming-fishing, Husband, White, Male, 0, 0, 50, United-States
28, Local-gov, 336951, Assoc-acdm, 12, Married-civ-spouse, Protective-serv, Husband, White, Male, 0, 0, 40, United-States
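One way to produce such a file from the downloaded test data is sketched below with the standard library only. It assumes each data line ends with the label column and contains no quoted commas (true of the rows shown above); the helper make_predict_csv is hypothetical, and the sample rows reuse the "...skip this line..." placeholder with illustrative labels:

```python
import io


def make_predict_csv(src, dst, n_rows=3):
    """Copy the first line of src, then up to n_rows data lines without their label.

    src and dst are open text files (or io.StringIO objects). The first line is
    copied unchanged because predict_input_fn reads the file with skiprows=1.
    Returns the number of data lines written.
    """
    dst.write(src.readline())  # the line that skiprows=1 will skip
    written = 0
    for line in src:
        if written == n_rows:
            break
        line = line.strip()
        if not line:  # ignore blank lines
            continue
        # Drop everything after the last comma, i.e. the income_bracket label.
        dst.write(line.rsplit(",", 1)[0] + "\n")
        written += 1
    return written


# Stand-in for the test file; the label values are illustrative.
test_file = io.StringIO(
    "...skip this line...\n"
    "25, Private, 226802, 11th, 7, Never-married, Machine-op-inspct, Own-child, Black, Male, 0, 0, 40, United-States, <=50K.\n"
    "38, Private, 89814, HS-grad, 9, Married-civ-spouse, Farming-fishing, Husband, White, Male, 0, 0, 50, United-States, <=50K.\n"
    "28, Local-gov, 336951, Assoc-acdm, 12, Married-civ-spouse, Protective-serv, Husband, White, Male, 0, 0, 40, United-States, >50K.\n"
)
out = io.StringIO()
written = make_predict_csv(test_file, out)
print(out.getvalue())
```

With real files, open the test data for reading and predict.csv for writing and pass both handles in place of the StringIO objects.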

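import pandas as pd
import tensorflow as tf

# build_estimator, CSV_COLUMNS and FLAGS are defined in wide_n_deep_tutorial.py.
# Run with --model_dir pointing at the trained model (e.g. /tmp/tmp_ab6cfsf) so the
# estimator restores its latest checkpoint instead of starting from an empty directory.
estimator = build_estimator(FLAGS.model_dir, FLAGS.model_type)

def predict_input_fn(data_file):
    """Input builder function for prediction (no labels)."""
    df_data = pd.read_csv(
        tf.gfile.Open(data_file),
        names=CSV_COLUMNS[:-1],  # drop the last name, "income_bracket", which is the label
        skipinitialspace=True,
        engine="python",
        skiprows=1)  # skip the first line of the file
    # remove rows containing NaN elements
    df_data = df_data.dropna(how="any", axis=0)
    # y=None because there are no labels; shuffle=False keeps predictions in file order
    return tf.estimator.inputs.pandas_input_fn(x=df_data, y=None, shuffle=False)

predict_file_name = "wide_n_deep/predict.csv"
predict_results = estimator.predict(input_fn=predict_input_fn(predict_file_name))
for idx, prediction in enumerate(predict_results):
    print(idx)
    for key in prediction:
        print("...{}: {}".format(key, prediction[key]))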
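Each prediction printed by the loop is a dict. To turn one into an income bracket, here is a minimal sketch, assuming the dict has a "probabilities" entry (the printed keys show exactly what your TensorFlow version returns) and that class index 1 means ">50K", as in the tutorial's input_fn, which sets the label to 1 when income_bracket contains ">50K". The helper to_income_bracket and the sample probabilities are hypothetical:

```python
# Hypothetical prediction dicts shaped like the ones printed above;
# only the "probabilities" entry is used, and the numbers are made up.
sample_predictions = [
    {"probabilities": [0.93, 0.07]},
    {"probabilities": [0.41, 0.59]},
]


def to_income_bracket(prediction, threshold=0.5):
    """Map a prediction dict to the tutorial's income labels.

    Assumes index 1 of "probabilities" is the probability of >50K.
    """
    p_over_50k = prediction["probabilities"][1]
    return ">50K" if p_over_50k > threshold else "<=50K"


brackets = [to_income_bracket(p) for p in sample_predictions]
print(brackets)
```

The 0.5 threshold matches the default decision rule for a binary classifier; raising it trades recall on >50K for precision.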