Predict in Tensorflow estimator using input fn

2020-05-28 10:50发布

I use the tutorial code from https://github.com/tensorflow/tensorflow/blob/r1.3/tensorflow/examples/learn/wide_n_deep_tutorial.py and the code works fine until I tried to make a prediction instead of just evaluate it. I tried to make another function for prediction that look like this (by just removing parameter y):

def input_fn_predict(data_file, num_epochs, shuffle):
  """Input builder function."""
  df_data = pd.read_csv(
      tf.gfile.Open(data_file),
      names=CSV_COLUMNS,
      skipinitialspace=True,
      engine="python",
      skiprows=1)
  # remove NaN elements
  df_data = df_data.dropna(how="any", axis=0)
  labels = df_data["income_bracket"].apply(lambda x: ">50K" in x).astype(int)
  return tf.estimator.inputs.pandas_input_fn( #removed paramter y
      x=df_data,
      batch_size=100,
      num_epochs=num_epochs,
      shuffle=shuffle,
      num_threads=5)

And to call it like this:

predictions = m.predict(
      input_fn=input_fn_predict(test_file_name, num_epochs=1, shuffle=True)
  )
  for i, p in enumerate(predictions):
      print(i, p)
  • Am I doing it right?
  • Why do I get the prediction 81404 instead of 16282(number of line in test file)?
  • Each line contains something like this:

{'probabilities': array([ 0.78595656, 0.21404342], dtype=float32), 'logits': array([-1.3007226], dtype=float32), 'classes': array(['0'], dtype=object), 'class_ids': array([0]), 'logistic': array([ 0.21404341], dtype=float32)}

How do I read that?

1条回答
何必那么认真
2楼-- · 2020-05-28 11:16

You need to set shuffle=False since to predict new label, you need to maintain data order.

Below is my code to run the prediction (I've tested it). The input file is like test data (in csv), but there is no label column.



    def predict_input_fn(data_file):
        global CSV_COLUMNS
        CSV_COLUMNS = CSV_COLUMNS[:-1]
        df_data = pd.read_csv(
            tf.gfile.Open(data_file),
            names=CSV_COLUMNS,
            skipinitialspace=True,
            engine='python',
            skiprows=1
        )

        # remove NaN elements
        df_data = df_data.dropna(how='any', axis=0)

        return tf.estimator.inputs.pandas_input_fn(
            x=df_data,
            num_epochs=1,
           shuffle=False
        )

To call it:



    predict_file_name = 'tutorials/data/adult.predict'
    results = m.predict(
        input_fn=predict_input_fn(predict_file_name)
    )
    for result in results:
        print 'result: {}'.format(result)

The prediction result for one sample is below:



    {
        'probabilities': array([0.78595656, 0.21404342], dtype = float32),
        'logits': array([-1.3007226], dtype = float32),
        'classes': array(['0'], dtype = object),
        'class_ids': array([0]),
        'logistic': array([0.21404341], dtype = float32)
    }

What each field means are

  • 'probabilities': array([0.78595656, 0.21404342], dtype = float32).
    It predicts the output label is class-0 (in this case <=50K) with confidence 0.78595656
  • 'logits': array([-1.3007226], dtype = float32)
    The value of z in equation 1/(1+e^(-z)) is -1.3.
  • 'classes': array(['0'], dtype = object)
    The class label is 0
查看更多
登录 后发表回答