Context:
I have a simple classifier based on tf.estimator.DNNClassifier that takes text and output probabilities over an intent tags. I am able to train an export the model to a serveable as well as serve the serveable using tensorflow serving. The problem is this servable is too big (around 1GB) and so I wanted to try some tensorflow graph transforms to try to reduce the size of the files being served.
Problem:
I understand how to take the saved_model.pb
and use freeze_model.py to create a new .pb
file that can be used to call transforms on. The result of these transforms (a .pb
file as well) is not a servable and cannot be used with tensorflow serving.
How can a developer go from:
saved model -> graph transforms -> back to a servable
There's documentation that suggests that this is certainly possible, but its not at all intuitive from the docs as to how to do this.
What I've Tried:
import tensorflow as tf
from tensorflow.saved_model import simple_save
from tensorflow.saved_model import signature_constants
from tensorflow.saved_model import tag_constants
from tensorflow.tools.graph_transforms import TransformGraph
with tf.Session(graph=tf.Graph()) as sess_meta:
meta_graph_def = tf.saved_model.loader.load(
sess_meta,
[tag_constants.SERVING],
"/model/path")
graph_def = meta_graph_def.graph_def
other_graph_def = TransformGraph(
graph_def,
["Placeholder"],
["dnn/head/predictions/probabilities"],
["quantize_weights"])
with tf.Graph().as_default():
graph = tf.get_default_graph()
tf.import_graph_def(other_graph_def)
in_tensor = graph.get_tensor_by_name(
"import/Placeholder:0")
out_tensor = graph.get_tensor_by_name(
"import/dnn/head/predictions/probabilities:0")
inputs = {"inputs": in_tensor}
outputs = {"outputs": out_tensor}
simple_save(sess_meta, "./new", inputs, outputs)
My idea was to load the servable, extract the graph_def from the meta_graph_def, transform the graph_def and then try to recreate the servable. This seems to be the incorrect approach.
Is there a way to successfully perform transforms (to reduce file size at inference) on a graph from an exported servable, and then recreate a servable with the transformed graph?
Thanks.
Update (2018-08-28):
Found contrib.meta_graph_transform() which looks promising.
Update (2018-12-03):
A related github issue I opened that seems to be resolved in a detailed blog post which is listed at the end of the ticket.