I was able to deploy TensorFlow for Poets onto Cloud ML Engine by creating a SavedModel with this script by rhaertel80:
import tensorflow as tf
from tensorflow.python.saved_model import signature_constants
from tensorflow.python.saved_model import tag_constants
from tensorflow.python.saved_model import builder as saved_model_builder

input_graph = 'retrained_graph.pb'
saved_model_dir = 'my_model'

with tf.Graph().as_default() as graph:
    # Read in the exported graph.
    with tf.gfile.FastGFile(input_graph, 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
        tf.import_graph_def(graph_def, name='')

    # Define the SavedModel signature (inputs and outputs).
    in_image = graph.get_tensor_by_name('DecodeJpeg/contents:0')
    inputs = {'image_bytes': tf.saved_model.utils.build_tensor_info(in_image)}

    out_classes = graph.get_tensor_by_name('final_result:0')
    outputs = {'prediction': tf.saved_model.utils.build_tensor_info(out_classes)}

    signature = tf.saved_model.signature_def_utils.build_signature_def(
        inputs=inputs,
        outputs=outputs,
        method_name=signature_constants.PREDICT_METHOD_NAME
    )

    with tf.Session(graph=graph) as sess:
        # Save out the SavedModel.
        b = saved_model_builder.SavedModelBuilder(saved_model_dir)
        b.add_meta_graph_and_variables(
            sess,
            [tag_constants.SERVING],
            signature_def_map={signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY: signature})
        b.save()
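A minimal sketch to sanity-check the export locally before uploading (assumes the my_model directory written above):

import tensorflow as tf
from tensorflow.python.saved_model import tag_constants

with tf.Session(graph=tf.Graph()) as sess:
    # Load the SavedModel back and print its serving signature.
    meta_graph = tf.saved_model.loader.load(sess, [tag_constants.SERVING], 'my_model')
    print(meta_graph.signature_def['serving_default'])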
The current version of TensorFlow for Poets uses the MobileNet architecture, which did not work with the above script, so I fell back to the default Inception v3 by not specifying an architecture and then ran the script again, which completed successfully. I uploaded the resulting SavedModel to my bucket, created a new model and version from the console, pointed the version at the bucket directory, and used runtime version 1.5.
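For reference, the same model and version creation can also be done programmatically with the client library the test script below already uses; a hedged sketch, where the bucket path gs://my-bucket/my_model is an assumption:

from googleapiclient import discovery

ml = discovery.build('ml', 'v1')
project = 'projects/edocoto-186909'

# Create the model resource...
ml.projects().models().create(
    parent=project, body={'name': 'flower_inception'}).execute()

# ...then a version pointing at the SavedModel directory in the bucket.
ml.projects().models().versions().create(
    parent=project + '/models/flower_inception',
    body={'name': 'v1',
          'deploymentUri': 'gs://my-bucket/my_model',  # assumed bucket path
          'runtimeVersion': '1.5'}).execute()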
After deploying my model successfully, I wrote a short script to test it:
from oauth2client.client import GoogleCredentials
from googleapiclient import discovery
from googleapiclient import errors

# Store your full project ID in a variable in the format the API needs.
projectID = 'projects/{}'.format('edocoto-186909')

# Build a representation of the Cloud ML API.
ml = discovery.build('ml', 'v1')

# Build the fully qualified model name for the request.
name1 = 'projects/{}/models/{}'.format('edocoto-186909', 'flower_inception')

# Create a request to call projects.predict.
request = ml.projects().predict(
    name=name1,
    body={'instances': [{'image_bytes': {'b64': b64imagedata}, 'key': '0'}]})
print(request)

# Make the call.
try:
    response = request.execute()
    print(response)
except errors.HttpError as err:
    # Something went wrong; print out some information.
    print('There was an error making the prediction. Check the details:')
    print(err._get_reason())
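(Here b64imagedata is assumed to hold the base64-encoded bytes of the test image; a minimal sketch with a hypothetical filename:)

import base64

with open('flower.jpg', 'rb') as f:  # hypothetical test image
    b64imagedata = base64.b64encode(f.read()).decode('utf-8')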
This gave the following error:
{'error': "Prediction failed: Expected tensor name: image_bytes, got tensor name: [u'image_bytes', u'key']."}
I removed the key field:
body={'instances': {'image_bytes': {'b64': b64imagedata}}})
and now I get the following error:
{'error': 'Prediction failed: Error during model execution: AbortionError(code=StatusCode.INVALID_ARGUMENT, details="NodeDef mentions attr \'dilations\' not in Op<name=Conv2D; signature=input:T, filter:T -> output:T; attr=T:type,allowed=[DT_HALF, DT_FLOAT]; attr=strides:list(int); attr=use_cudnn_on_gpu:bool,default=true; attr=padding:string,allowed=["SAME", "VALID"]; attr=data_format:string,default="NHWC",allowed=["NHWC", "NCHW"]>; NodeDef: conv/Conv2D = Conv2D[T=DT_FLOAT, _output_shapes=[[1,149,149,32]], data_format="NHWC", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 2, 2, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Mul, conv/conv2d_params). (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).\n\t [[Node: conv/Conv2D = Conv2D[T=DT_FLOAT, _output_shapes=[[1,149,149,32]], data_format="NHWC", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 2, 2, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Mul, conv/conv2d_params)]]")'}
I have no idea what to do now, and any help would be appreciated.
Edit 1: The error above suggested the graph had been generated by a newer TensorFlow than the one serving it (the dilations attribute on Conv2D was not recognized), so I retrained the model on TensorFlow 1.5, redeployed it to Cloud ML, ran the above script, and now I am getting this error:
{u'error': u'Prediction failed: Error during model execution: AbortionError(code=StatusCode.INVALID_ARGUMENT, details="contents must be scalar, got shape [1]\n\t [[Node: DecodeJpeg = DecodeJpeg[_output_shapes=[[?,?,3]], acceptable_fraction=1, channels=3, dct_method="", fancy_upscaling=true, ratio=1, try_recover_truncated=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_DecodeJpeg/contents_0_0)]]")'}
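This is a shape mismatch: as the error says, DecodeJpeg expects a scalar string, but ML Engine feeds instances as a batch, so the input tensor arrives with shape [1]. The converter script below resolves it by squeezing away the batch dimension; the key idea in isolation:

import tensorflow as tf

# ML Engine feeds a batch of strings, shape (None,)...
image_bytes = tf.placeholder(tf.string, shape=(None,))
# ...but DecodeJpeg/contents expects a single scalar string, shape ().
coerced = tf.squeeze(image_bytes)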
Edit 2: After a long while, and thanks to the efforts of rhaertel80, I have successfully deployed to ML Engine. Here is the final converter script referenced here, courtesy of rhaertel80:
import tensorflow as tf
# Requires a TensorFlow version that provides tf.saved_model.simple_save.
export_dir = 'my_model2'
retrained_graph = 'retrained_graph.pb'
label_count = 5

class Model(object):
    def __init__(self, label_count):
        self.label_count = label_count

    def export(self, output_dir):
        with tf.Session(graph=tf.Graph()) as sess:
            # This will be our input that accepts a batch of inputs.
            image_bytes = tf.placeholder(tf.string, name='input', shape=(None,))
            # Force it to be a single input; will raise an error if we send a batch.
            coerced = tf.squeeze(image_bytes)
            # When we import the graph, connect `coerced` to `DecodeJpeg/contents:0`.
            input_map = {'DecodeJpeg/contents:0': coerced}

            with tf.gfile.GFile(retrained_graph, 'rb') as f:
                graph_def = tf.GraphDef()
                graph_def.ParseFromString(f.read())
                tf.import_graph_def(graph_def, input_map=input_map, name='')

            # A pass-through 'key' input so each prediction can be matched to its request.
            keys_placeholder = tf.placeholder(tf.string, shape=[None])
            inputs = {'image_bytes': image_bytes, 'key': keys_placeholder}

            keys = tf.identity(keys_placeholder)
            outputs = {
                'key': keys,
                'prediction': tf.get_default_graph().get_tensor_by_name('final_result:0')}

            tf.saved_model.simple_save(sess, output_dir, inputs, outputs)

model = Model(label_count)
model.export(export_dir)
The main difference from rhaertel80's code is the change from DecodeJPGInput:0 to DecodeJpeg/contents:0, since the former was giving an error stating that there is no such tensor in the graph.
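If you hit the same missing-tensor error, the actual input name can be read straight from the frozen graph; a minimal sketch:

import tensorflow as tf

# List node names in the frozen graph to find the real input tensor name.
graph_def = tf.GraphDef()
with tf.gfile.GFile('retrained_graph.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())
for node in graph_def.node:
    if 'DecodeJpeg' in node.name:
        print(node.name)

With the key input restored, the original request body from the test script above (including 'key') should be accepted again, e.g.:

request = ml.projects().predict(
    name=name1,
    body={'instances': [{'image_bytes': {'b64': b64imagedata}, 'key': '0'}]})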